Marseille

Mining Argument Structures with Expressive Inference (Linear and LSTM Engines)

What is it?

Marseille learns to predict argumentative proposition types and the support relations between them, framing the task as inference in an expressive factor graph.

Requirements

numpy
scipy
scikit-learn
pystruct
nltk
dill
docopt
dynet v1.1
lightning
ad3 >= v2.1 (pip install ad3)
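
Before running anything, it can help to confirm that the packages above are importable. The following is a minimal sketch, not part of marseille itself: the import names in `REQUIRED_MODULES` are assumptions (for instance, scikit-learn is imported as `sklearn`), and the helper `missing_packages` is hypothetical. It does not check versions, so the `dynet v1.1` and `ad3 >= v2.1` constraints still need to be verified by hand.

```python
from importlib.util import find_spec

# Assumed import names for the packages listed above; adjust them if your
# installation exposes a different top-level module.
REQUIRED_MODULES = {
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "pystruct": "pystruct",
    "nltk": "nltk",
    "dill": "dill",
    "docopt": "docopt",
    "dynet": "dynet",
    "lightning": "lightning",
    "ad3": "ad3",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the package names whose top-level module cannot be located."""
    return sorted(pkg for pkg, mod in modules.items() if find_spec(mod) is None)


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only locates modules without importing them, so it is quick and has no side effects.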

Usage

(replace $ds with cdcp or ukp)

download the data from http://joonsuk.org/ and unzip it into the subdirectory data, so that the path ./data/process/erule/train/ is valid.

extract the relevant subset of GloVe embeddings:

    python -m marseille.preprocess embeddings $ds --glove-file=/p/glove.840B.300d.txt
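
To illustrate what extracting a subset of embeddings means, here is a rough sketch of filtering a GloVe-format text file down to a given vocabulary. It is not marseille's implementation: the function `filter_glove` and its parameters are hypothetical, and it assumes the usual GloVe text layout of one token followed by `dim` space-separated floats per line.

```python
def filter_glove(lines, vocab, dim=300):
    """Yield (token, vector) pairs for the tokens in `vocab`.

    Each line is expected to hold a token followed by `dim` floats.
    Splitting from the right keeps a token intact even if the token
    itself contains spaces.
    """
    for line in lines:
        parts = line.rstrip("\n").rsplit(" ", dim)
        if len(parts) != dim + 1:
            continue  # skip lines that do not have exactly `dim` values
        token = parts[0]
        if token in vocab:
            yield token, [float(x) for x in parts[1:]]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings" standing in for the real file.
    toy = ["claim 0.1 0.2 0.3\n", "banana 0.4 0.5 0.6\n", "because 0.7 0.8 0.9\n"]
    kept = dict(filter_glove(toy, {"claim", "because"}, dim=3))
    print(sorted(kept))
```

Streaming over the lines avoids loading the full embedding file into memory, which matters for a file the size of `glove.840B.300d.txt`.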

extract features:

    python -m marseille.features $ds

    # (for cdcp only:)
    python -m marseille.features cdcp-test

generate vectorized train-test split (for baselines only):

    mkdir data/process/.../
    python -m marseille.vectorize split cdcp

run chosen model, for example:

    python -m experiments.exp_train_test $ds --method rnn-struct --model strict

(for dynet models, set --dynet-seed=42 for exact reproducibility)

compare results:

    python -m experiments.plot_test_results $ds

To reproduce cross-validation model selection, you would also need to run:

    python -m marseille.vectorize folds $ds
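
The preprocessing and training commands above can be collected into a small driver. This is only a sketch under stated assumptions: `pipeline_commands` is a hypothetical helper, it covers the embedding, feature, and training steps only, and it appends `--dynet-seed` at the end of the training command, which is a guess about placement rather than documented behaviour. It builds the commands without running them.

```python
import shlex
import sys


def pipeline_commands(ds, glove_file, method="rnn-struct", model="strict",
                      dynet_seed=None):
    """Build argv lists for the embedding, feature, and training steps."""
    if ds not in ("cdcp", "ukp"):
        raise ValueError("ds must be 'cdcp' or 'ukp'")
    py = [sys.executable, "-m"]
    cmds = [
        py + ["marseille.preprocess", "embeddings", ds,
              "--glove-file=" + glove_file],
        py + ["marseille.features", ds],
    ]
    if ds == "cdcp":
        # The cdcp dataset has a separate test portion to featurize.
        cmds.append(py + ["marseille.features", "cdcp-test"])
    train = py + ["experiments.exp_train_test", ds,
                  "--method", method, "--model", model]
    if dynet_seed is not None:
        train.append("--dynet-seed=%d" % dynet_seed)
    cmds.append(train)
    return cmds


if __name__ == "__main__":
    # Print the commands instead of executing them.
    for cmd in pipeline_commands("cdcp", "/p/glove.840B.300d.txt", dynet_seed=42):
        print(shlex.join(cmd))
```

To actually execute the steps, each list can be passed to `subprocess.run(cmd, check=True)` from the repository root, so that a failing step stops the pipeline.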

Running a model on your own data:

If you have some documents, e.g. F.txt and G.txt, that you would like to run a pretrained model on, read on.

download the required preprocessing toolkits: Stanford CoreNLP (tested with version 3.6.0) and the WING-NUS PDTB discourse parser (tested with this commit) and configure their paths:

    export MARSEILLE_CORENLP_PATH=/home/vlad/corenlp  #  path to CoreNLP
    export MARSEILLE_WINGNUS_PATH=/home/vlad/wingnus  #  path to WING-NUS parser

Note: If you already generated F.txt.json with CoreNLP and F.txt.pipe with the WING-NUS parser (e.g., on a different computer), you may skip this step and marseille will detect those files automatically.

Otherwise, these files are generated the first time that a UserDoc object is instantiated for a given document. In particular, the step below will do this automatically.
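
To see in advance whether the external toolkits will be needed, one can check which of the preprocessed files already exist. The sketch below is illustrative only: `preprocessing_status` is a hypothetical helper, not marseille's own detection logic, and it assumes the `.txt.json` and `.txt.pipe` files sit next to the raw `.txt` files.

```python
import os
from pathlib import Path


def preprocessing_status(doc_ids, directory=".", environ=os.environ):
    """Report which preprocessed files exist and whether tool paths are set.

    Returns (status, needs_tools, tools_configured), where `status` maps each
    document id to the presence of its raw, CoreNLP, and WING-NUS files.
    """
    directory = Path(directory)
    status = {}
    for doc_id in doc_ids:
        status[doc_id] = {
            "raw": (directory / (doc_id + ".txt")).is_file(),
            "corenlp": (directory / (doc_id + ".txt.json")).is_file(),
            "wingnus": (directory / (doc_id + ".txt.pipe")).is_file(),
        }
    # The toolkits are only needed if some document lacks a preprocessed file.
    needs_tools = any(
        not (s["corenlp"] and s["wingnus"]) for s in status.values()
    )
    tools_configured = all(
        environ.get(var)
        for var in ("MARSEILLE_CORENLP_PATH", "MARSEILLE_WINGNUS_PATH")
    )
    return status, needs_tools, tools_configured


if __name__ == "__main__":
    status, needs_tools, tools_configured = preprocessing_status(["F", "G"])
    print(status)
    if needs_tools and not tools_configured:
        print("Set MARSEILLE_CORENLP_PATH and MARSEILLE_WINGNUS_PATH first.")
```

If `needs_tools` is true and `tools_configured` is false, feature extraction for those documents is likely to fail until both environment variables are set.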

extract the features:

    python -m marseille.features user F G  # raw input must be in F.txt & G.txt

This is needed for the RNN models too, because the feature files encode some metadata about the document structure.

predict, e.g. using the model saved by the training step above (experiments.exp_train_test):

    python -m experiments.predict_pretrained --method=rnn-struct \
    test_results/exact=True_cdcp_rnn-struct_strict F G
