BTmPG
Code for the paper Pushing Paraphrase Away from Original Sentence: A Multi-Round Paraphrase Generation Approach by Zhe Lin and Xiaojun Wan, accepted to Findings of ACL 2021. Please contact me at linzhe@pku.edu.cn with any questions.
Dependencies
PyTorch 1.4
NLTK 3.5
Model
<img src="https://github.com/L-Zhe/BTmPG/blob/main/img/model.jpg?raw=true" width = "800" alt="overview" align=center />
Create Vocabulary
First, build a vocabulary from your corpora with the following command:
python createVocab.py --file ~/context/train.tgt ~/context/train.src \
--save_path ~/context/vocab.pkl \
--vocab_num 50000
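For reference, a vocabulary build of this kind typically counts tokens across the source and target files, reserves a few special tokens, and keeps the `vocab_num` most frequent words. The sketch below illustrates that idea only; it is not the repo's actual `createVocab.py`, and the special-token names and pickle layout are assumptions:

```python
from collections import Counter
import pickle

def build_vocab(files, vocab_num):
    # Count whitespace-separated tokens across all corpora files.
    counter = Counter()
    for path in files:
        with open(path, encoding="utf-8") as f:
            for line in f:
                counter.update(line.split())
    # Reserve slots for special tokens (names assumed, not from the repo),
    # then keep the most frequent words up to vocab_num entries total.
    specials = ["<pad>", "<unk>", "<bos>", "<eos>"]
    words = [w for w, _ in counter.most_common(vocab_num - len(specials))]
    return {w: i for i, w in enumerate(specials + words)}

# Example (paths are placeholders):
# vocab = build_vocab(["train.src", "train.tgt"], 50000)
# with open("vocab.pkl", "wb") as f:
#     pickle.dump(vocab, f)
```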
Train
You can train the model with the following command:
python train.py --cuda --cuda_num 5 \
--train_source ~/context/train.src \
--train_target ~/context/train.tgt \
--test_source ~/context/test.src \
--test_target ~/context/test.tgt \
--vocab_path ~/context/vocab.pkl \
--batch_size 32 \
--epoch 100 \
--num_rounds 2 \
--max_length 110 \
--clip_length 100 \
--model_save_path ~/context/output/model.pth \
--generation_save_path ~/context/output
Inference
After training, use the following command to generate multi-round paraphrases:
python generator.py --cuda --cuda_num 3 \
--source ~/context/test.src \
--target ~/context/test.tgt \
--vocab_path ~/context/vocab.pkl \
--batch_size 64 \
--num_rounds 10 \
--max_length 60 \
--model_path ~/context/model.pth \
--save_path ~/context/output/
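The `--num_rounds` flag controls how many paraphrase rounds are generated: each round's output is fed back as the next round's input, so you get one paraphrase per round. A minimal sketch of that loop, with a hypothetical `paraphrase_fn` standing in for the model call that `generator.py` actually performs:

```python
def multi_round_paraphrase(paraphrase_fn, sentence, num_rounds):
    """Feed each round's paraphrase back as the next round's input,
    collecting every intermediate paraphrase along the way.
    paraphrase_fn is a placeholder for the model's single-round call."""
    outputs = []
    current = sentence
    for _ in range(num_rounds):
        current = paraphrase_fn(current)
        outputs.append(current)
    return outputs
```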
We also provide a pretrained model on the Releases page.
Result
<img src="https://github.com/L-Zhe/BTmPG/blob/main/img/result1.jpg?raw=true" width = "800" alt="overview" align=center /> <img src="https://github.com/L-Zhe/BTmPG/blob/main/img/result2.jpg?raw=true" width = "500" alt="overview" align=center />
Case Study
<img src="https://github.com/L-Zhe/BTmPG/blob/main/img/case_study.jpg?raw=true" width = "400" alt="overview" align=center />
Reference
If you use any content of this repo for your work, please cite the following bib entry:
@inproceedings{lin-wan-2021-pushing,
    title = "Pushing Paraphrase Away from Original Sentence: A Multi-Round Paraphrase Generation Approach",
    author = "Lin, Zhe and Wan, Xiaojun",
    booktitle = "Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021",
    month = aug,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-acl.135",
    doi = "10.18653/v1/2021.findings-acl.135",
    pages = "1548--1557",
}