TextSSL
[AAAI 2022] Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification
About data
We use the same benchmark datasets as Yao, Mao, and Luo 2019. For the MR, Ohsumed, and 20NG datasets, we follow the same train/test splits and data preprocessing as Kim 2014 and Yao, Mao, and Luo 2019. We thank the authors for their work.
The R8 and R52 datasets are only available in a preprocessed version that lacks punctuation and explicit sample names. Since we construct graphs from documents with sentence segmentation information, we re-extract the data from the original Reuters-21578 dataset.
You can download the dataset here:
- Re-extract the R8 and R52 datasets.
  python re-extract_data/mk_R8_R52.py --name R8
- Remove words.
  python remove_words.py --name R8
About path
To run the code, you should change Your_path=/data/project/yinhuapark/ssl/ to your own path.
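The README does not say which files contain this path. The sketch below is one way to find them: it scans every `.py` file under a directory for the default path string. The helper name `find_hardcoded_path` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Default path quoted in this README; replace it with your own after locating it.
DEFAULT_PATH = "/data/project/yinhuapark/ssl/"


def find_hardcoded_path(repo_root, needle=DEFAULT_PATH):
    """Return (file, line number, stripped line) for each .py line containing `needle`."""
    hits = []
    for py_file in sorted(Path(repo_root).rglob("*.py")):
        text = py_file.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(py_file), lineno, line.strip()))
    return hits

# Example usage from the repository root (assumed working directory):
#   for path, lineno, line in find_hardcoded_path("."):
#       print(f"{path}:{lineno}: {line}")
```

Each reported line can then be edited by hand so that it points to your own data directory.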
Make graph dataset
- Create co-occurrence pairs for each document.
  python ssl_make_graphs/create_cooc_document.py --name R8
- Construct graphs for each document as an InMemoryDataset.
  python ssl_make_graphs/PygDocsGraphDataset.py --name R8
Train
python ssl_graphmodels/pyg_models/train_docs.py --name R8
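The steps above can be chained for one dataset. The sketch below builds the command list from the script paths given in this README and runs them in order. It assumes you run it from the repository root and that every script accepts `--name`, as shown above. The helper names `pipeline_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
import sys

# Datasets that this README says are re-extracted from Reuters-21578.
REEXTRACTED = {"R8", "R52"}


def pipeline_commands(name):
    """Return the README commands, in order, for dataset `name`."""
    scripts = []
    if name in REEXTRACTED:
        scripts.append("re-extract_data/mk_R8_R52.py")
    scripts += [
        "remove_words.py",
        "ssl_make_graphs/create_cooc_document.py",
        "ssl_make_graphs/PygDocsGraphDataset.py",
        "ssl_graphmodels/pyg_models/train_docs.py",
    ]
    return [[sys.executable, script, "--name", name] for script in scripts]


def run_pipeline(name):
    """Run each step in sequence and stop at the first failure."""
    for cmd in pipeline_commands(name):
        subprocess.run(cmd, check=True)

# Example usage (assumes the data and paths are already set up):
#   run_pipeline("R8")
```

Using `check=True` stops the sequence as soon as a step fails, so later steps do not run on incomplete outputs.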
Reference
If you find our paper and repo useful, please cite our paper:
@inproceedings{piao2022sparse,
title={Sparse Structure Learning via Graph Neural Networks for Inductive Document Classification},
author={Piao, Yinhua and Lee, Sangseon and Lee, Dohoon and Kim, Sun},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={36},
number={10},
pages={11165--11173},
year={2022}
}
The readme is inspired by GSAT.
