PromDA

Source code for ACL 2022 Paper "Prompt-based Data Augmentation for Low-Resource NLU Tasks"


Prompt-based Data Augmentation for Low-Resource NLU Tasks

This repository is the official implementation of PromDA: Prompt-based Data Augmentation for Low-Resource NLU Tasks.


Requirements

To install requirements:

conda create --name exp --file requirements.txt

Pre-training Soft Prompt

To obtain the C4 realnewslike split, run:

python get_large_pre_training_c4_data.py

To pre-train the soft prompt, run:

python pre_train_t5.py --config model_config/pre_train_keyword_pt.yml --serialization-dir pretrain_web_page_keyword_t5_short --train

On a single NVIDIA A100 GPU, pre-training takes about 24 hours.

We also provide the checkpoint that we used in the paper here. Place the folder pretrain_web_page_keyword_t5_short in the root directory of this project.
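
Before launching the augmentation scripts, it may help to confirm that the checkpoint folder sits where the project expects it. The following is a minimal sketch, assuming only that the folder named above must exist directly under the project root. The helper name checkpoint_ready is hypothetical and not part of this repository, and it does not validate the files inside the folder.

```python
from pathlib import Path

# Folder name taken from this README.
CHECKPOINT_DIR = "pretrain_web_page_keyword_t5_short"


def checkpoint_ready(project_root="."):
    """Return True if the checkpoint folder exists and is non-empty.

    Hypothetical helper: it checks only that the folder is present and
    contains at least one entry, not that the contents are valid.
    """
    folder = Path(project_root) / CHECKPOINT_DIR
    return folder.is_dir() and any(folder.iterdir())


if __name__ == "__main__":
    print("checkpoint found" if checkpoint_ready() else "checkpoint missing")
```

Run it from the project root; a "checkpoint missing" message suggests the folder was unpacked elsewhere or pre-training has not finished.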

Run PromDA

To run the full data augmentation experiments, follow the instructions below.

Sequence Labelling Tasks

To run the wikiann experiments on the first GPU under the shot-10 setting with random seed 18, run:

bash script/run_few_shot_bert_prefix.sh 0 18 10 wikiann 1000

Sentence Classification Tasks

To run the sst2 experiments on the first GPU under the shot-10 setting with random seed 18, run:

bash script/run_few_shot_bert_prefix_sen_cls.sh 0 18 10 sst2 1000
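
To repeat either experiment under several random seeds, the command lines can be generated programmatically. The sketch below only builds the commands and does not run them. It assumes the positional arguments are, in order, GPU index, random seed, shot count, and dataset name, as the two examples above suggest. The final argument (1000) is not explained in this README and is passed through unchanged. The helper name build_commands and the extra seed are illustrative and not part of this repository.

```python
import shlex

# Script paths as given in this README.
SEQ_LABEL_SCRIPT = "script/run_few_shot_bert_prefix.sh"
SEN_CLS_SCRIPT = "script/run_few_shot_bert_prefix_sen_cls.sh"


def build_commands(script, gpu, seeds, shot, dataset, last_arg=1000):
    """Build one argv list per seed.

    Assumed argument order: GPU index, seed, shot, dataset, then the
    trailing value (1000 in the README examples; meaning not documented here).
    """
    return [
        ["bash", script, str(gpu), str(seed), str(shot), dataset, str(last_arg)]
        for seed in seeds
    ]


if __name__ == "__main__":
    # Seed 18 comes from the README; 42 is an arbitrary extra seed for illustration.
    for argv in build_commands(SEQ_LABEL_SCRIPT, 0, [18, 42], 10, "wikiann"):
        print(shlex.join(argv))
    for argv in build_commands(SEN_CLS_SCRIPT, 0, [18, 42], 10, "sst2"):
        print(shlex.join(argv))
```

To execute the runs instead of printing them, each argv list can be passed to subprocess.run(argv, check=True) from the project root.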
