IFACETSUM
Corpus exploration platform using advanced tools such as interactive summarization and multi-document coreference resolution
iFᴀᴄᴇᴛSᴜᴍ
iFᴀᴄᴇᴛSᴜᴍ is an interactive faceted summarization approach and system for navigating within a large document-set on a topic.
- Paper 📄 https://arxiv.org/pdf/2109.11621.pdf (Proceedings of EMNLP 2021, System Demonstrations)
- Demo 🤩 https://nlp.biu.ac.il/~hirsche5/ifacetsum/

Development
How to run
First, git clone the project.
Set up the server
- Run pip install -r requirements.txt
- Run python -m spacy download en_core_web_md
- From inside Python, run import nltk and then nltk.download('punkt')
- Run python WebApp/server/app.py
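Before starting the server, it can help to confirm that the packages installed in the steps above are importable. The following is a minimal sketch, not part of the project: the helper name missing_modules is hypothetical, and it relies only on the standard library and on the fact that spaCy models such as en_core_web_md are installed as regular Python packages.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names in `names` that cannot be found."""
    # find_spec returns None when a top-level module is not on the import path.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Packages taken from the setup steps; the check itself is an illustration.
    missing = missing_modules(["spacy", "nltk", "en_core_web_md"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages are importable.")
```

The check does not verify that the NLTK punkt resource has been downloaded, only that the packages themselves are installed.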
Set up the client (node)
- Run cd WebApp/client
- Run npm install
- Run npm start
- Open the url http://localhost:3000
How to work with DUC2006 data
Request access to DUC2006Clean from https://duc.nist.gov/ and place it inside the data/ directory.
How to add your own data
- Change Config.py to point to your data directory, including the text files and the cluster files (either json or conll format).
How to create your own clusters
To support reproducibility efforts and the addition of custom document-sets, all models used have been released and are available online.
CD Event Co-reference Alignment
- Create event mentions using the models and scripts in https://github.com/ariecattan/event_extractor.
- Create pairwise mention scores and clusters using CDLM https://github.com/aviclu/CDLM.
- Use agglomerative clustering to combine mentions into clusters.
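The last step above can be sketched in plain Python. This is an illustration of threshold-based, average-linkage agglomerative clustering over pairwise scores, not the code used by iFᴀᴄᴇᴛSᴜᴍ or CDLM. The function name, the score dictionary layout, and the 0.5 threshold are all assumptions made for the example.

```python
from itertools import combinations


def agglomerative_clusters(mentions, score, threshold=0.5):
    """Greedy average-linkage clustering over pairwise similarity scores.

    mentions:  list of hashable mention ids
    score:     dict mapping frozenset({a, b}) to a similarity value
    threshold: merging stops once the best average similarity drops below it
    """
    clusters = [[m] for m in mentions]

    def linkage(c1, c2):
        # Average similarity over all cross-cluster pairs; unseen pairs count as 0.
        total = sum(score.get(frozenset((a, b)), 0.0) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        (i, j), best = max(
            (((i, j), linkage(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # i < j always holds, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical pairwise scores for four mentions.
example_scores = {
    frozenset(("m1", "m2")): 0.9,
    frozenset(("m3", "m4")): 0.8,
    frozenset(("m2", "m3")): 0.1,
}
example_clusters = agglomerative_clusters(["m1", "m2", "m3", "m4"], example_scores)
```

For large mention sets, a library implementation would be a better fit than this quadratic loop; the sketch only shows how pairwise scores and a stopping threshold determine the final clusters.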
CD Entities Co-reference Alignment
An end-to-end iFᴀᴄᴇᴛSᴜᴍ entities script that follows these instructions is available at https://github.com/AlonEirew/wd-plus-srl-extraction#wec-cd-coreference
- Create entity mentions using SpanBERT, accessible from https://docs.allennlp.org/models/main/.
- Use the WEC model to score each mention pair.
- Use agglomerative clustering to combine WD and CD mentions into clusters.
Proposition Alignment
- Please refer to https://github.com/oriern/SuperPAL for instructions on extracting propositions using OIE and computing pairwise scores.
- iFᴀᴄᴇᴛSᴜᴍ's code takes care of converting the pairwise CSV from SuperPAL into clusters.
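To give a sense of what turning pairwise scores into clusters involves, here is a minimal sketch that groups spans into connected components with a union-find structure. It is not iFᴀᴄᴇᴛSᴜᴍ's conversion code, and the technique is swapped in for illustration. The column names span_a, span_b and score, as well as the 0.5 threshold, are hypothetical and may not match the CSV that SuperPAL produces.

```python
import csv
import io


def clusters_from_pairs(csv_file, threshold=0.5):
    """Group spans into clusters by linking pairs scored at or above threshold."""
    parent = {}

    def find(x):
        # Register unseen spans as their own root, then follow parent links.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for row in csv.DictReader(csv_file):
        root_a, root_b = find(row["span_a"]), find(row["span_b"])
        if float(row["score"]) >= threshold:
            parent[root_a] = root_b  # merge the two components

    groups = {}
    for span in parent:
        groups.setdefault(find(span), []).append(span)
    return list(groups.values())


# Hypothetical pairwise CSV with assumed column names.
example_csv = io.StringIO(
    "span_a,span_b,score\n"
    "A,B,0.9\n"
    "B,C,0.7\n"
    "C,D,0.2\n"
)
example_clusters = clusters_from_pairs(example_csv)
```

In this example, A, B and C end up in one cluster because each link scores above the threshold, while D stays on its own.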
Citation:
If you find our work useful, please cite the paper as:
@article{hirsch2021ifacetsum,
title={iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration},
author={Hirsch, Eran and Eirew, Alon and Shapira, Ori and Caciularu, Avi and Cattan, Arie and Ernst, Ori and Pasunuru, Ramakanth and Ronen, Hadar and Bansal, Mohit and Dagan, Ido},
journal={Proceedings of the Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
year={2021}
}
