iFᴀᴄᴇᴛSᴜᴍ

iFᴀᴄᴇᴛSᴜᴍ is an interactive faceted summarization approach and system for navigating within a large document-set on a topic.

Paper 📄 https://arxiv.org/pdf/2109.11621.pdf (Proceedings of EMNLP 2021, System Demonstrations)
Demo 🤩 https://nlp.biu.ac.il/~hirsche5/ifacetsum/


Development

How to run

First, clone the repository with git clone.

Set up the server

Run pip install -r requirements.txt
Run python -m spacy download en_core_web_md
From a Python shell, run import nltk and then nltk.download('punkt')
Run python WebApp/server/app.py
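The server setup steps above can also be scripted. The sketch below is a hypothetical helper, not part of the iFᴀᴄᴇᴛSᴜᴍ repository: the function names and the --run flag are illustrative. It assumes it is run from the repository root (so requirements.txt resolves) and invokes pip through the current interpreter (python -m pip) so the packages land in the same environment that will later run WebApp/server/app.py. By default it only prints the commands.

```python
import subprocess
import sys
from typing import List


def server_setup_commands(python: str = sys.executable) -> List[List[str]]:
    """Return the server setup steps from this README as argument lists, in order."""
    return [
        # `pip install -r requirements.txt`, run through the chosen interpreter
        [python, "-m", "pip", "install", "-r", "requirements.txt"],
        # `python -m spacy download en_core_web_md`
        [python, "-m", "spacy", "download", "en_core_web_md"],
        # Same effect as `import nltk; nltk.download('punkt')` in a Python shell
        [python, "-c", "import nltk; nltk.download('punkt')"],
    ]


def run_setup(dry_run: bool = True) -> List[str]:
    """Return each setup command as a string; execute them only if dry_run is False."""
    rendered = []
    for cmd in server_setup_commands():
        rendered.append(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return rendered


if __name__ == "__main__":
    # Hypothetical flag: pass --run to execute; otherwise only print the commands.
    for line in run_setup(dry_run="--run" not in sys.argv):
        print(line)
    print("Then start the server with: python WebApp/server/app.py")
```

Printing by default lets you review the exact commands before anything is installed.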

Set up the client (Node.js)

Run cd WebApp/client
Run npm install
Run npm start
Open the URL http://localhost:3000 in a browser

How to work with DUC2006 data

Request access to DUC2006Clean from https://duc.nist.gov/ and place it inside the data/ directory.

How to add your own data

Edit Config.py so that it points to your data directory, which should contain the text files and the cluster files (in either JSON or CoNLL format).
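If you prepare cluster files in CoNLL format, it can help to check that they parse into the clusters you expect before pointing Config.py at them. The sketch below assumes a CoNLL-2012-style layout, which may differ from what iFᴀᴄᴇᴛSᴜᴍ actually reads: tab-separated rows whose first column is a document id and whose last column is a coreference tag such as (3, 3), (3), (3)|(5) or -. The function name parse_conll_clusters is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Mention = Tuple[str, int, int]  # (doc_id, start_token, end_token), end inclusive


def parse_conll_clusters(lines: List[str]) -> Dict[str, List[Mention]]:
    """Group mentions by cluster id from the last column of CoNLL-style rows.

    Assumed layout (hypothetical): tab-separated rows, first column = document id,
    last column = coreference tag. Blank lines and '#' comment lines are skipped;
    token indices count rows per document.
    """
    clusters: Dict[str, List[Mention]] = defaultdict(list)
    # Stack of open span starts per (document, cluster), so nested spans work.
    open_spans: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    token_idx: Dict[str, int] = defaultdict(int)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        doc_id, tag = cols[0], cols[-1]
        idx = token_idx[doc_id]
        token_idx[doc_id] += 1
        if tag == "-":
            continue
        for part in tag.split("|"):
            cid = part.strip("()")
            if part.startswith("("):  # a mention of cluster `cid` starts here
                open_spans[(doc_id, cid)].append(idx)
            if part.endswith(")"):  # the most recently opened mention ends here
                start = open_spans[(doc_id, cid)].pop()
                clusters[cid].append((doc_id, start, idx))
    return dict(clusters)


if __name__ == "__main__":
    rows = ["doc1\tThe\t(1", "doc1\tquake\t1)", "doc2\tIt\t(1)"]
    print(parse_conll_clusters(rows))
```

Because cluster ids are shared across documents, mentions from different files with the same id end up in one cross-document cluster.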

How to create your own clusters

To support reproducibility and the addition of custom document sets, all models used in iFᴀᴄᴇᴛSᴜᴍ have been released and are available online.

Cross-Document (CD) Event Coreference Alignment

Create event mentions using the models and scripts in https://github.com/ariecattan/event_extractor.
Create pairwise mention scores and clusters using CDLM https://github.com/aviclu/CDLM.
Use agglomerative clustering to combine mentions into clusters.
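The last step, turning pairwise scores into clusters, can be illustrated with a small average-linkage agglomerative clustering. This is a minimal sketch, not the iFᴀᴄᴇᴛSᴜᴍ implementation: it assumes the CDLM scores have already been collected into a dictionary keyed by unordered mention pairs, that higher scores mean "more likely coreferent", and that a stopping threshold of 0.5 is reasonable. The linkage choice, threshold, and function name are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def agglomerative_clusters(mentions: List[str],
                           scores: Dict[FrozenSet[str], float],
                           threshold: float = 0.5) -> List[List[str]]:
    """Average-linkage agglomerative clustering over pairwise similarity scores.

    Starts from singletons and repeatedly merges the two clusters with the
    highest average pairwise score, stopping when no pair reaches `threshold`.
    Pairs missing from `scores` count as 0.0.
    """
    clusters = [[m] for m in mentions]

    def avg_link(a: List[str], b: List[str]) -> float:
        total = sum(scores.get(frozenset((x, y)), 0.0) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best_pair, best_score = None, threshold
        for i, j in combinations(range(len(clusters)), 2):
            s = avg_link(clusters[i], clusters[j])
            if s > best_score or (best_pair is None and s >= threshold):
                best_pair, best_score = (i, j), s
        if best_pair is None:  # no remaining pair is similar enough
            break
        i, j = best_pair  # i < j, so deleting j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    pair = lambda a, b: frozenset((a, b))
    s = {pair("a", "b"): 0.9, pair("c", "d"): 0.8, pair("a", "c"): 0.1}
    print(agglomerative_clusters(["a", "b", "c", "d"], s))
```

Average linkage is one common choice; single or complete linkage would merge more or less aggressively for the same threshold.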

Cross-Document (CD) Entity Coreference Alignment

For an end-to-end iFᴀᴄᴇᴛSᴜᴍ entity script that follows the steps listed here, refer to https://github.com/AlonEirew/wd-plus-srl-extraction#wec-cd-coreference

Create entity mentions using SpanBERT, available from https://docs.allennlp.org/models/main/.
Use the WEC model to score each mention pair.
Use agglomerative clustering to combine within-document (WD) and cross-document (CD) mentions into clusters.
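One way to combine within-document chains with cross-document scores is to treat each WD chain as already merged and then add CD links whose score clears a threshold. With single linkage and a fixed threshold, agglomerative clustering reduces to connected components, which a union-find computes directly. This is a hedged sketch, not the wd-plus-srl-extraction implementation: the input shapes (WD chains as lists of mention ids, WEC scores as a dict keyed by mention pairs), the 0.5 threshold, and all names are assumptions.

```python
from typing import Dict, List, Tuple


class UnionFind:
    """Disjoint-set structure over string mention ids."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # registers unseen mentions as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def merge_wd_and_cd(wd_chains: List[List[str]],
                    cd_scores: Dict[Tuple[str, str], float],
                    threshold: float = 0.5) -> List[List[str]]:
    """Single-linkage clustering seeded with within-document chains."""
    uf = UnionFind()
    for chain in wd_chains:
        for m in chain:
            uf.find(m)
        for a, b in zip(chain, chain[1:]):  # a WD chain is one cluster from the start
            uf.union(a, b)
    for (a, b), score in cd_scores.items():
        uf.find(a)
        uf.find(b)
        if score >= threshold:  # link across documents only above the threshold
            uf.union(a, b)
    groups: Dict[str, List[str]] = {}
    for m in list(uf.parent):
        groups.setdefault(uf.find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())
```

Seeding with WD chains means a single confident CD link is enough to join two whole chains, which is the usual behavior, and also the usual risk, of single linkage.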

Proposition Alignment

Please refer to https://github.com/oriern/SuperPAL for instructions on extracting propositions using Open Information Extraction (OIE) and computing pairwise scores.
iFᴀᴄᴇᴛSᴜᴍ's code takes care of converting the pairwise CSV from SuperPAL into clusters.
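For readers curious how a pairwise CSV can become clusters, the sketch below links every proposition pair whose score reaches a threshold and returns the connected components. It is an illustration only, not the conversion code in iFᴀᴄᴇᴛSᴜᴍ: the column names prop_a, prop_b and score are hypothetical placeholders for whatever headers the SuperPAL output actually uses, and the 0.5 threshold is an assumption.

```python
import csv
import io
from collections import deque
from typing import Dict, List, Set, TextIO


def csv_to_clusters(f: TextIO, threshold: float = 0.5,
                    col_a: str = "prop_a", col_b: str = "prop_b",
                    col_score: str = "score") -> List[List[str]]:
    """Connect propositions with score >= threshold; return connected components.

    Column names are hypothetical defaults; pass the real headers if they differ.
    """
    graph: Dict[str, Set[str]] = {}
    for row in csv.DictReader(f):
        a, b = row[col_a], row[col_b]
        graph.setdefault(a, set())  # keep low-scoring propositions as singletons
        graph.setdefault(b, set())
        if float(row[col_score]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)

    seen: Set[str] = set()
    clusters: List[List[str]] = []
    for start in graph:  # breadth-first search from each unvisited proposition
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        clusters.append(sorted(component))
    return clusters


if __name__ == "__main__":
    data = "prop_a,prop_b,score\np1,p2,0.9\np2,p3,0.7\np4,p5,0.1\n"
    print(csv_to_clusters(io.StringIO(data)))
```

Connected components are transitive: p1 and p3 end up together through p2 even though they were never scored directly.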

Citation:

If you find our work useful, please cite the paper as:

@article{hirsch2021ifacetsum,
  title={iFacetSum: Coreference-based Interactive Faceted Summarization for Multi-Document Exploration},
  author={Hirsch, Eran and Eirew, Alon and Shapira, Ori and Caciularu, Avi and Cattan, Arie and Ernst, Ori and Pasunuru, Ramakanth and Ronen, Hadar and Bansal, Mohit and Dagan, Ido},
  journal={Proceedings of the Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
  year={2021}
}
