InferDoc
Generate a SQuAD-style dataset from a raw text file and train a transformer-based question answering model. This repo includes code from https://github.com/facebookresearch/UnsupervisedQA and https://github.com/deepset-ai/haystack
This repo contains code to generate synthetic question answering training data using https://github.com/facebookresearch/UnsupervisedQA and to train a model on that data using https://github.com/deepset-ai/haystack
SQuAD-style QA dataset generation
cd self_supervised_qa && python -m unsupervisedqa.generate_synthetic_qa_data example_input.txt example_output
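The dev and eval files described in this README are expected in SQuAD JSON format. As a minimal sketch, assuming the standard SQuAD v1.1 layout (data, paragraphs, context, qas, answers, answer_start), the following builds a one-question example and checks that every answer offset points at the answer text. The helper names make_squad_example and check_squad are hypothetical and are not part of this repo, and the exact output layout of the generator is not confirmed here.

```python
def make_squad_example(context, question, answer_text):
    """Build a one-question dataset dict in the assumed SQuAD v1.1 layout."""
    start = context.find(answer_text)
    if start < 0:
        raise ValueError("answer text must occur in the context")
    return {
        "version": "1.1",
        "data": [{
            "title": "example",
            "paragraphs": [{
                "context": context,
                "qas": [{
                    "id": "q0",
                    "question": question,
                    "answers": [{"text": answer_text, "answer_start": start}],
                }],
            }],
        }],
    }


def check_squad(dataset):
    """Return the number of QA pairs after verifying every answer offset."""
    count = 0
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    start = answer["answer_start"]
                    end = start + len(answer["text"])
                    # The character span must reproduce the answer text exactly.
                    if context[start:end] != answer["text"]:
                        raise ValueError(f"bad offset in question {qa['id']}")
                count += 1
    return count


example = make_squad_example(
    "InferDoc generates question answering data from raw text.",
    "What does InferDoc generate?",
    "question answering data",
)
print(check_squad(example))
```

A check like this can be run on a generated file (after loading it with json.load) before training, since a wrong answer_start usually means the context text was altered after the answers were extracted.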
Transformer QA model training, evaluation, and CLI testing
Usage:
qa_model.py train --data_dir=<data_dir> --train_file_name=<train_file_name> --dev_file_name=<dev_file_name> --save_dir=<save_dir>
qa_model.py test --data_dir=<data_dir> --eval_file_name=<eval_file_name> --save_dir=<save_dir>
qa_model.py cli --data_dir=<data_dir> --save_dir=<save_dir>
Options:
--data_dir=<data_dir>                  The directory in which to find the SQuAD-formatted .txt train or eval files
--train_file_name=<train_file_name>    The name of the train file in the data dir
--dev_file_name=<dev_file_name>        The file to be used as a development set, expected in SQuAD JSON format
--eval_file_name=<eval_file_name>      The file to be used as an evaluation set, expected in SQuAD JSON format
--save_dir=<save_dir>                  The directory to save the trained model to or to load the model from
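The three modes above take different file options, which is easy to mix up when scripting a train, test, cli sequence. A minimal sketch of a helper that assembles the command line using only the options documented above is shown here. The function build_command and the directory and file names passed to it are hypothetical, and running the script through python from the directory that contains qa_model.py is an assumption.

```python
import shlex

# File options that each mode accepts, taken from the usage section above.
MODE_FILE_OPTIONS = {
    "train": ("train_file_name", "dev_file_name"),
    "test": ("eval_file_name",),
    "cli": (),
}


def build_command(mode, data_dir, save_dir, **files):
    """Return an argv list for qa_model.py built from the documented options."""
    if mode not in MODE_FILE_OPTIONS:
        raise ValueError(f"unknown mode: {mode}")
    expected = MODE_FILE_OPTIONS[mode]
    if set(files) != set(expected):
        raise ValueError(f"{mode} expects exactly these file options: {expected}")
    # Assumption: the script is launched with the current Python interpreter name.
    argv = ["python", "qa_model.py", mode, f"--data_dir={data_dir}"]
    argv += [f"--{name}={files[name]}" for name in expected]
    argv.append(f"--save_dir={save_dir}")
    return argv


# Hypothetical directory and file names, for illustration only.
train_cmd = build_command(
    "train", "data", "saved_model",
    train_file_name="train.json", dev_file_name="dev.json",
)
print(shlex.join(train_cmd))
```

The resulting list can be passed to subprocess.run, which avoids shell quoting problems when a path contains spaces.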
Todo
Add automatic dataset generation within https://github.com/cdqa-suite/cdQA-ui to enable human-in-the-loop semi-supervised training
Make fine-tuning on domain-specific data more robust with https://github.com/deepset-ai/FARM/issues/141
