MARIOH: Multiplicity-Aware Hypergraph Reconstruction
This repository provides the official implementation of MARIOH, a supervised method for reconstructing hyperedges in hypergraphs by leveraging edge multiplicity. MARIOH integrates several key components: a theoretically guaranteed filtering step to identify true size-2 hyperedges, a multiplicity-aware classifier for scoring hyperedge candidates, and a bidirectional search strategy that explores both high- and low-confidence cliques. These components work together to achieve accurate and efficient hypergraph reconstruction. For further details, please refer to our accompanying research paper.
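To make the problem setting concrete, the sketch below (an illustration only, not part of MARIOH's codebase) shows how a hypergraph projects onto a weighted pairwise graph, where each edge weight is the multiplicity MARIOH exploits; reconstruction is the inverse of this projection. The function name `project_hypergraph` is hypothetical.

```python
from itertools import combinations
from collections import Counter

def project_hypergraph(hyperedges):
    """Project a hypergraph onto a weighted pairwise graph.

    Every pair of nodes co-occurring in a hyperedge has its multiplicity
    (co-occurrence count) incremented. Hypergraph reconstruction is the
    inverse problem: recovering the hyperedges from this pairwise view.
    """
    multiplicity = Counter()
    for he in hyperedges:
        for u, v in combinations(sorted(set(he)), 2):
            multiplicity[(u, v)] += 1
    return dict(multiplicity)

# The pair (2, 3) lies in two hyperedges, so its multiplicity is 2.
edges = project_hypergraph([{1, 2, 3}, {2, 3}, {3, 4}])
```

A pair with multiplicity above the number of candidate cliques covering it signals overlapping hyperedges, which is the information a multiplicity-agnostic method discards.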
Contents
- main.py: The entry point script to run the hyperedge reconstruction pipeline.
- params.py: Parameter dictionaries for various datasets and modes (reduced or preserved).
- utils/: A directory containing modularized code for data processing, feature extraction, graph operations, model training, evaluation, and input/output utilities.
- data/: A directory that should contain the dataset-specific training and testing files.
Requirements
- Python version: 3.8+ recommended
- Dependencies: numpy, networkx, torch, joblib, argparse
- Additional Python dependencies can be installed via:

pip install -r requirements.txt

Adjust requirements.txt or the installation commands as needed for your environment.
Datasets
You must place your datasets into the data/ directory. Each dataset should have its own subdirectory, for example:
data/
|-- {dataset_name}/
|-- train.txt # Training data (reduced mode)
|-- test.txt # Testing data (reduced mode)
|-- train_dup.txt # Training data (preserved mode)
+-- test_dup.txt # Testing data (preserved mode)
- Reduced mode uses train.txt and test.txt.
- Preserved mode uses train_dup.txt and test_dup.txt.
Please refer to the related publication for details on dataset formats and preprocessing steps.
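The mode-to-filename mapping above can be captured in a small helper. This is a hypothetical convenience function (`dataset_paths` does not exist in the repository), shown only to make the layout explicit.

```python
def dataset_paths(data_dir, dataset_name, preserved=False):
    """Hypothetical helper: resolve train/test paths for a dataset and mode.

    Preserved mode reads the *_dup.txt files, which keep duplicate
    hyperedges; reduced mode reads the de-duplicated train.txt/test.txt.
    """
    suffix = "_dup" if preserved else ""
    return (f"{data_dir}/{dataset_name}/train{suffix}.txt",
            f"{data_dir}/{dataset_name}/test{suffix}.txt")
```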
Running the Code
To run the pipeline, navigate to the directory containing main.py and execute:
python main.py --data {dataset_name} --gpu 0 --seed 42 --output_dir output
Arguments
- --data {dataset_name}: Specify the dataset folder name located under data/.
- --gpu {int}: GPU device number. If no GPU is available or you wish to run on CPU, set --gpu to a non-existent GPU ID (e.g., --gpu 99), and it will default to CPU.
- --seed {int}: Random seed for reproducibility.
- --output_dir {path}: Directory to store the output hyperedge predictions and results.
- --preserved: Optional flag. If set, the pipeline will run in "preserved" mode using train_dup.txt and test_dup.txt. If omitted, the pipeline runs in "reduced" mode using train.txt and test.txt.
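For reference, a parser matching the documented flags might look like the sketch below. This is not the repository's actual main.py; the defaults chosen here are assumptions, and the CPU fallback for a non-existent GPU ID happens later in the pipeline, not in the parser.

```python
import argparse

def build_parser():
    """Sketch of an argparse parser matching the documented CLI flags."""
    p = argparse.ArgumentParser(description="MARIOH hyperedge reconstruction")
    p.add_argument("--data", required=True,
                   help="dataset folder name under data/")
    p.add_argument("--gpu", type=int, default=0,
                   help="GPU device number (falls back to CPU if absent)")
    p.add_argument("--seed", type=int, default=42,
                   help="random seed for reproducibility")
    p.add_argument("--output_dir", default="output",
                   help="directory for predictions and results")
    p.add_argument("--preserved", action="store_true",
                   help="use train_dup.txt/test_dup.txt instead of "
                        "train.txt/test.txt")
    return p

args = build_parser().parse_args(["--data", "hschool", "--preserved"])
```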
Examples
Reduced mode (default):
python main.py --data hschool --gpu 0 --seed 123 --output_dir output
Preserved mode:
python main.py --data hschool --gpu 0 --seed 123 --output_dir output --preserved
In these examples, the code will:
- Load and preprocess the graph data.
- Extract features and prepare a training dataset.
- Train a classifier network with the best parameters specified in params.py.
- Use the trained classifier to reconstruct hyperedges in the test graph.
- Save the reconstructed hyperedges to output/reconstructed_hyp_reduced/{dataset_name}_{seed}.txt (in reduced mode) or output/reconstructed_hyp_preserved/{dataset_name}_{seed}.txt (in preserved mode).
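The saved files use one comma-separated line of node IDs per hyperedge. A minimal reader/writer pair for that format, assuming integer node IDs (these helper names are not from the repository), could look like:

```python
def write_hyperedges(path, hyperedges):
    """Write one comma-separated line of node IDs per hyperedge."""
    with open(path, "w") as f:
        for he in hyperedges:
            f.write(",".join(str(v) for v in sorted(he)) + "\n")

def read_hyperedges(path):
    """Parse a reconstructed-hyperedge file back into frozensets of ints."""
    with open(path) as f:
        return [frozenset(int(v) for v in line.strip().split(","))
                for line in f if line.strip()]
```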
Interpreting the Results
- Output Files: The final reconstructed hyperedges are stored as comma-separated node IDs per line.
- Evaluation Metrics: The code prints evaluation metrics such as Jaccard similarity and multiset Jaccard similarity during execution. These metrics help assess the quality of hyperedge reconstruction relative to the ground truth.
- Performance & Reproducibility: By setting the random seed (--seed) and controlling hyperparameters through params.py, you can reproduce the experimental results reported in the associated research paper.
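The two reported metrics can be sketched as follows. These are standard definitions (set Jaccard over unique hyperedges, multiset Jaccard counting duplicates, which matters in preserved mode); the exact formulation used by the evaluation code may differ, so treat this as an illustration.

```python
from collections import Counter

def jaccard(pred, true):
    """Set-level Jaccard similarity between two collections of hyperedges.

    Duplicates are collapsed: each hyperedge counts once.
    """
    a = {frozenset(h) for h in pred}
    b = {frozenset(h) for h in true}
    return len(a & b) / len(a | b) if a | b else 1.0

def multiset_jaccard(pred, true):
    """Multiset Jaccard: duplicate hyperedges contribute to the score."""
    a = Counter(frozenset(h) for h in pred)
    b = Counter(frozenset(h) for h in true)
    union = sum((a | b).values())
    return sum((a & b).values()) / union if union else 1.0
```

The multiset variant rewards recovering the correct number of copies of a repeated hyperedge, which the set variant ignores.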
Extending and Customizing
- Modify params.py to add or change hyperparameters for different datasets.
- Adjust or add dataset loaders in utils/data_processing.py if your input format differs.
- Add new evaluation metrics in utils/evaluation.py.
