PQG
Code for the COLING 2018 paper "Learning Semantic Sentence Embeddings using Pair-wise Discriminator".
Learning Semantic Sentence Embeddings using Pair-wise Discriminator [COLING-2018]
Torch code for Paraphrase Question Generation. For more information, please refer to the paper.

Requirements
This code is written in Lua and requires Torch. The preprocessing code is in Python, and you need to install NLTK if you want to use it to tokenize the questions.
- pip install nltk
You also need to install the following packages in order to successfully run the code.
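As an illustration of what the tokenization step does (this is not the repo's actual preprocessing, which uses NLTK's `word_tokenize`), a simple regex tokenizer behaves similarly on a Quora-style question:

```python
import re

def simple_tokenize(question):
    """Lowercase a question and split it into word and punctuation
    tokens. A rough stand-in for nltk.word_tokenize, for illustration."""
    return re.findall(r"[a-z0-9]+|[^\sa-z0-9]", question.lower())

print(simple_tokenize("How do I learn word embeddings?"))
# ['how', 'do', 'i', 'learn', 'word', 'embeddings', '?']
```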
Training
We have prepared everything for you ;)
Download Dataset
We referred to neuraltalk2 and Text-to-Image Synthesis to prepare our code base. The first thing you need to do is download the Quora Question Pairs dataset from the Quora Question Pairs website and put it in the data folder.
If you want to train from scratch, continue reading. If you just want to evaluate using a pretrained model, head over to the Data Files section, download the data files (put all of them in the data folder) and the pretrained model (put it in the pretrained folder), and run eval.lua.
Now we need to do some preprocessing, head over to the prepro folder and run
$ python quora_prepro.py
Note: The above command generates json files with 100K question pairs for the train set, 5K question pairs for validation and 30K question pairs for the test set.
If you instead want to use only 50K question pairs for training (keeping the rest the same), you need to make some minor changes in the above file. After this step, it will generate quora_raw_train.json, quora_raw_val.json and quora_raw_test.json under the data folder.
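The split logic above can be sketched as follows (a minimal illustration, not the actual quora_prepro.py code; the pair contents here are placeholders):

```python
def split_pairs(pairs, n_train=100_000, n_val=5_000, n_test=30_000):
    """Split a list of (question, paraphrase) pairs into the
    100K/5K/30K train/val/test splits described above."""
    train = pairs[:n_train]
    val = pairs[n_train:n_train + n_val]
    test = pairs[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

# Placeholder pairs standing in for the parsed Quora dataset.
pairs = [(f"q{i}", f"p{i}") for i in range(135_000)]
train, val, test = split_pairs(pairs)
print(len(train), len(val), len(test))  # 100000 5000 30000
```

Using 50K training pairs instead would amount to passing `n_train=50_000` in this sketch.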
Preprocess Paraphrase Question
$ python prepro_quora.py --input_train_json ../data/quora_raw_train.json --input_test_json ../data/quora_raw_test.json
This will generate two files in data/ folder, quora_data_prepro.h5 and quora_data_prepro.json.
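Conceptually, this step builds a vocabulary and encodes each tokenized question as a fixed-length sequence of integer ids for the h5 file. A minimal sketch of that idea (the real script's vocabulary rules, id assignment, and maximum length are assumptions here):

```python
def build_vocab(tokenized_questions, min_count=1):
    """Map each word to an integer id (0 is reserved for padding).
    Illustration only; the real prepro_quora.py format may differ."""
    counts = {}
    for q in tokenized_questions:
        for w in q:
            counts[w] = counts.get(w, 0) + 1
    words = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i + 1 for i, w in enumerate(words)}

def encode(question, vocab, max_len=26):
    """Encode a tokenized question as a fixed-length id sequence,
    dropping out-of-vocabulary words and padding with zeros."""
    ids = [vocab[w] for w in question if w in vocab]
    return (ids + [0] * max_len)[:max_len]

vocab = build_vocab([["what", "is", "ai"], ["what", "is", "ml"]])
print(encode(["what", "is", "ai"], vocab))
```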
Train the model
We have everything ready to train the question paraphrase model. Go back to the root directory and run
th train.lua -input_ques_h5 data/quora_data_prepro.h5 -input_json data/quora_data_prepro.json
Evaluate the model
In the root folder, run
th eval.lua -input_ques_h5 data/quora_data_prepro.h5 -input_json data/quora_data_prepro.json
This command will report BLEU, METEOR, ROUGE and CIDEr scores for paraphrase question generation.
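The actual scores come from the coco-caption evaluation toolkit, but the core idea behind BLEU-1, for example, is clipped unigram precision between the generated paraphrase and the reference. A self-contained sketch:

```python
from collections import Counter

def unigram_precision(candidate, reference):
    """Clipped unigram precision, the core of BLEU-1: the fraction of
    candidate words that also appear in the reference (counts clipped
    so a repeated word cannot be credited more times than it occurs
    in the reference). Illustration only."""
    cand = candidate.split()
    ref_counts = Counter(reference.split())
    cand_counts = Counter(cand)
    overlap = sum(min(c, ref_counts[w]) for w, c in cand_counts.items())
    return overlap / len(cand)

score = unigram_precision("how can i learn python", "how do i learn python")
print(round(score, 2))  # 0.8
```

Full BLEU also combines higher-order n-gram precisions with a brevity penalty, which the toolkit handles.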
Metrics
To evaluate question paraphrases, you need to download the evaluation tool. To evaluate question pairs, you can use the script myeval.py under the coco-caption/ folder. If you need to evaluate BLEU, METEOR, ROUGE and CIDEr scores, follow the instructions from the link here. You also need to download this zip file, create a data directory in the meteor folder of pycocoevalcap, and put the downloaded file there.
For calculating the TER score
This code is taken from the OpenNMT repo.
Step 1: Put the results checkpoint json file inside the folder check_point_json.
Step 2: Rename the checkpoint json file to resuts_json.
Step 3: Rename the ground truth json file to quora_prepro_test_updated_int_4k.
Step 4: Run the ./score.sh file.
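For intuition, TER is roughly the number of word-level edits needed to turn the hypothesis into the reference, divided by the reference length. A simplified sketch (it ignores the shift operation the full metric allows; the repo's ./score.sh uses the OpenNMT implementation instead):

```python
def simple_ter(hypothesis, reference):
    """Word-level Levenshtein distance divided by reference length:
    a simplified TER without the shift operation."""
    hyp, ref = hypothesis.split(), reference.split()
    # d[i][j] = edits to turn the first i hypothesis words
    # into the first j reference words.
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i
    for j in range(len(ref) + 1):
        d[0][j] = j
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,       # deletion
                          d[i][j - 1] + 1,       # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(hyp)][len(ref)] / len(ref)

print(simple_ter("how do i learn python", "how can i learn python"))  # 0.2
```

Lower TER is better, with 0.0 meaning the hypothesis matches the reference exactly.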
Data Files
Download all the data files from here.
- quora_data_prepro.h5
- quora_data_prepro.json
- quora_raw_train.json
- quora_raw_val.json
- quora_raw_test.json
Evaluate using Pre-trained Model
The pre-trained model can be downloaded here.
Reference
If you use this code as part of any published research, please cite the following paper:
@inproceedings{patro2018learning,
title={Learning Semantic Sentence Embeddings using Sequential Pair-wise Discriminator},
author={Patro, Badri Narayana and Kurmi, Vinod Kumar and Kumar, Sandeep and Namboodiri, Vinay},
booktitle={Proceedings of the 27th International Conference on Computational Linguistics},
pages={2715--2729},
year={2018}
}
Contributors
- Badri N. Patro (badri@iitk.ac.in)
- Vinod K. Kurmi (vinodkk@iitk.ac.in)
- Sandeep Kumar (sandepkr@iitk.ac.in)