InferDoc

Generate a SQuAD-style dataset from a raw text file and train a transformer-based question answering model. This repo has code from https://github.com/facebookresearch/UnsupervisedQA and https://github.com/deepset-ai/haystack


This repo contains code to generate synthetic question answering training data using https://github.com/facebookresearch/UnsupervisedQA and to train a model on that data using https://github.com/deepset-ai/haystack

SQuAD-style QA dataset generation

cd self_supervised_qa && python -m unsupervisedqa.generate_synthetic_qa_data example_input.txt example_output
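
As a rough illustration of what "SQuAD-style" means here, the sketch below walks the standard SQuAD v1.1 JSON layout and checks that each answer text really sits at its `answer_start` offset in the context. It assumes the generated file follows that layout; the helper names `iter_qa_pairs` and `misaligned_answers` and the sample record are hypothetical and not part of this repo.

```python
import json


def iter_qa_pairs(squad):
    """Yield (context, question, answer_text, answer_start) from a SQuAD-style dict."""
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield context, qa["question"], answer["text"], answer["answer_start"]


def misaligned_answers(squad):
    """Return the questions whose answer text is not found at answer_start."""
    bad = []
    for context, question, text, start in iter_qa_pairs(squad):
        if context[start:start + len(text)] != text:
            bad.append(question)
    return bad


# Hypothetical sample record, built by hand for illustration only.
CONTEXT = "InferDoc trains a question answering model."
ANSWER = "a question answering model"

example = {
    "version": "1.1",
    "data": [
        {
            "title": "example",
            "paragraphs": [
                {
                    "context": CONTEXT,
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What does InferDoc train?",
                            "answers": [
                                {"text": ANSWER, "answer_start": CONTEXT.index(ANSWER)}
                            ],
                        }
                    ],
                }
            ],
        }
    ],
}

# Round-trip through JSON to mimic reading a generated file from disk.
loaded = json.loads(json.dumps(example))
print(misaligned_answers(loaded))
```

Running a check like this on a generated file before training can catch offset errors early, since extractive QA training depends on the answer span positions being correct.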

Transformer QA model training, evaluation, and CLI testing

Usage:

qa_model.py train --data_dir=<data_dir> --train_file_name=<train_file_name> --dev_file_name=<dev_file_name> --save_dir=<save_dir>
qa_model.py test --data_dir=<data_dir> --eval_file_name=<eval_file_name> --save_dir=<save_dir>
qa_model.py cli --data_dir=<data_dir> --save_dir=<save_dir>

Options:

--data_dir=<data_dir>                  Directory in which to find the .txt SQuAD-formatted train or eval files
--train_file_name=<train_file_name>    Name of the train file in the data directory
--dev_file_name=<dev_file_name>        File to use as the development set, expected in SQuAD JSON format
--eval_file_name=<eval_file_name>      File to use as the evaluation set, expected in SQuAD JSON format
--save_dir=<save_dir>                  Directory to save the trained model to, or to load the model from
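
To make the three modes and their options concrete, here is a minimal sketch of the same command-line shape using the standard library's `argparse`. This is an assumption for illustration only: the actual `qa_model.py` may parse its arguments differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the train / test / cli usage lines above."""
    parser = argparse.ArgumentParser(prog="qa_model.py")
    sub = parser.add_subparsers(dest="command", required=True)

    train = sub.add_parser("train", help="train a QA model")
    train.add_argument("--train_file_name", required=True)
    train.add_argument("--dev_file_name", required=True)

    test = sub.add_parser("test", help="evaluate a saved model")
    test.add_argument("--eval_file_name", required=True)

    cli = sub.add_parser("cli", help="query a saved model interactively")

    # Every mode takes the data directory and the model directory.
    for mode in (train, test, cli):
        mode.add_argument("--data_dir", required=True)
        mode.add_argument("--save_dir", required=True)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch is self-contained.
args = build_parser().parse_args(
    ["train", "--data_dir=data", "--train_file_name=train.json",
     "--dev_file_name=dev.json", "--save_dir=saved_model"]
)
print(args.command, args.data_dir, args.save_dir)
```

The file and directory names in the example call are placeholders; substitute the paths of your own generated dataset and model directory.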

Todo

Add automatic dataset generation within https://github.com/cdqa-suite/cdQA-ui to enable human-in-the-loop semi-supervised training.
Make fine-tuning on domain-specific data more robust with https://github.com/deepset-ai/FARM/issues/141

karthik19967829/InferDoc
