Marseille
Mining Argument Structures with Expressive Inference (Linear and LSTM Engines)
What is it?
Marseille learns to predict argumentative proposition types and the support relations between them, framing the joint prediction as inference in an expressive factor graph.
Read more about it in our paper:
Vlad Niculae, Joonsuk Park, Claire Cardie. Argument Mining with Structured SVMs and RNNs. In: Proceedings of ACL, 2017.
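To make "inference in a factor graph" concrete, here is a toy sketch that is not taken from the Marseille code base. It jointly picks a type for each of two propositions and decides which support links are on, by maximizing a sum of factor scores. All scores, the type labels, and the function names are invented for illustration, and brute-force enumeration stands in for a dedicated solver such as the ad3 package listed in the requirements.

```python
from itertools import product

# Toy scores, invented for illustration (nothing here is learned by Marseille).
TYPES = ("fact", "value")
type_scores = [
    {"fact": 1.0, "value": 0.2},  # proposition 0
    {"fact": 0.1, "value": 0.8},  # proposition 1
]
# link_scores[(src, trg)]: score for turning on the link "src supports trg".
link_scores = {(0, 1): 0.5, (1, 0): -0.3}
# Extra score when an active link joins a (source type, target type) pair.
compat = {("fact", "value"): 0.4}


def score(types, links):
    """Sum the unary, link, and compatibility factors of one assignment."""
    total = sum(type_scores[i][t] for i, t in enumerate(types))
    for (src, trg), on in links.items():
        if on:
            total += link_scores[(src, trg)]
            total += compat.get((types[src], types[trg]), 0.0)
    return total


def map_assignment():
    """Enumerate every joint assignment and keep the highest-scoring one."""
    pairs = sorted(link_scores)
    best = None
    for types in product(TYPES, repeat=len(type_scores)):
        for bits in product((False, True), repeat=len(pairs)):
            links = dict(zip(pairs, bits))
            s = score(types, links)
            if best is None or s > best[0]:
                best = (s, types, links)
    return best


best_score, best_types, best_links = map_assignment()
print(best_types, best_links)
```

The compatibility factor is what makes the prediction joint: the type chosen for one proposition changes how attractive a link to another becomes. Enumeration only works at this toy size, since the number of assignments grows exponentially with the number of propositions and candidate links.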
If you find this project useful, you may cite us using:
@inproceedings{niculae17marseille,
author={Vlad Niculae and Joonsuk Park and Claire Cardie},
title={{Argument Mining with Structured SVMs and RNNs}},
booktitle={Proceedings of ACL},
year=2017
}
Requirements
- numpy
- scipy
- scikit-learn
- pystruct
- nltk
- dill
- docopt
- dynet v1.1
- lightning
- ad3 >= v2.1 (pip install ad3)
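A quick way to see which of these packages are missing from the current environment is sketched below. The mapping from package name to import name is an assumption (for instance, scikit-learn is imported as sklearn), and the helper `missing_packages` is hypothetical, not part of Marseille. The check does not verify versions, so the dynet v1.1 and ad3 >= v2.1 constraints still need to be checked by hand.

```python
from importlib.util import find_spec

# Package name (as listed above) -> import name.
# The import names are assumptions, not taken from the Marseille sources.
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "pystruct": "pystruct",
    "nltk": "nltk",
    "dill": "dill",
    "docopt": "docopt",
    "dynet": "dynet",
    "lightning": "lightning",
    "ad3": "ad3",
}


def missing_packages(required=REQUIRED):
    """Return the listed packages whose import name cannot be located."""
    return sorted(
        name for name, module in required.items() if find_spec(module) is None
    )


missing = missing_packages()
print("missing:", ", ".join(missing) if missing else "none")
```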
Usage
(replace $ds with cdcp or ukp)
- download the data from http://joonsuk.org/ and unzip it in the subdirectory data, i.e. the path ./data/process/erule/train/ is valid.
- extract relevant subset of GloVe embeddings:
python -m marseille.preprocess embeddings $ds --glove-file=/p/glove.840B.300d.txt
- extract features:
python -m marseille.features $ds
# (for cdcp only:)
python -m marseille.features cdcp-test
- generate vectorized train-test split (for baselines only)
mkdir data/process/.../
python -m marseille.vectorize split cdcp
- run chosen model, for example:
python -m experiments.exp_train_test $ds --method rnn-struct --model strict
(for dynet models, set --dynet-seed=42 for exact reproducibility)
- compare results:
python -m experiments.plot_test_results $ds
To reproduce cross-validation model selection, you would also need to run:
python -m marseille.vectorize folds $ds
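The main steps above can be collected into a small driver script, sketched here. The function `pipeline_commands` and its parameters are hypothetical; only the module names and flags that appear in the commands above are used. The baseline-only split step and the plotting step are left out, and the commands are only printed, not executed.

```python
import shlex


def pipeline_commands(ds, glove_file, method="rnn-struct", model="strict",
                      dynet_seed=None):
    """Build the preprocessing and training commands for dataset `ds`."""
    if ds not in ("cdcp", "ukp"):
        raise ValueError("ds must be 'cdcp' or 'ukp'")
    py = ["python", "-m"]
    cmds = [
        py + ["marseille.preprocess", "embeddings", ds,
              "--glove-file=" + glove_file],
        py + ["marseille.features", ds],
    ]
    if ds == "cdcp":
        # Only cdcp needs the extra test-set feature extraction.
        cmds.append(py + ["marseille.features", "cdcp-test"])
    train = py + ["experiments.exp_train_test", ds,
                  "--method", method, "--model", model]
    if dynet_seed is not None:
        # For dynet models, a fixed seed gives exact reproducibility.
        train.append("--dynet-seed={}".format(dynet_seed))
    cmds.append(train)
    return cmds


for cmd in pipeline_commands("cdcp", "/p/glove.840B.300d.txt", dynet_seed=42):
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run the steps, each list can be passed to `subprocess.run(cmd, check=True)` so that the script stops at the first failing command.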
Running a model on your own data:
If you have some documents (e.g., F.txt and G.txt) on which you would like to run a pretrained model, read on.
- download the required preprocessing toolkits: Stanford CoreNLP (tested with version 3.6.0) and the WING-NUS PDTB discourse parser (tested with this commit) and configure their paths:
export MARSEILLE_CORENLP_PATH=/home/vlad/corenlp # path to CoreNLP
export MARSEILLE_WINGNUS_PATH=/home/vlad/wingnus # path to WING-NUS parser
Note: If you already generated F.txt.json with CoreNLP and F.txt.pipe with the WING-NUS parser (e.g., on a different computer), you may skip this step and marseille will detect those files automatically.
Otherwise, these files are generated the first time that a UserDoc object is instantiated for a given document. In particular, the step below will do this automatically.
- extract the features:
python -m marseille.features user F G # raw input must be in F.txt & G.txt
This is needed for the RNN models too, because the feature files encode some metadata about the document structure.
- predict, e.g. using the model saved by the training step (exp_train_test) in Usage above:
python -m experiments.predict_pretrained --method=rnn-struct \
test_results/exact=True_cdcp_rnn-struct_strict F G
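Before running the steps above on your own documents, it can help to check which documents still need CoreNLP and WING-NUS preprocessing, and whether the two toolkit paths are configured. The sketch below assumes that the cached F.txt.json and F.txt.pipe files sit next to F.txt; the helper names are hypothetical and this is not how marseille itself performs the detection.

```python
import os

ENV_VARS = ("MARSEILLE_CORENLP_PATH", "MARSEILLE_WINGNUS_PATH")


def needs_preprocessing(names, directory="."):
    """Return document names lacking a cached .txt.json or .txt.pipe file.

    Assumption: cached files live in the same directory as the raw .txt.
    """
    todo = []
    for name in names:
        cached = [
            os.path.join(directory, name + ".txt" + ext)
            for ext in (".json", ".pipe")
        ]
        if not all(os.path.isfile(path) for path in cached):
            todo.append(name)
    return todo


def unset_toolkit_paths(environ=os.environ):
    """Return the toolkit variables that do not point to a directory."""
    return [var for var in ENV_VARS if not os.path.isdir(environ.get(var, ""))]


todo = needs_preprocessing(["F", "G"])
bad_vars = unset_toolkit_paths()
if todo and bad_vars:
    print("documents needing preprocessing:", ", ".join(todo))
    print("set these variables first:", ", ".join(bad_vars))
else:
    print("ready to extract features")
```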
