
Graph4nlp

Graph4nlp is the library for the easy use of Graph Neural Networks for NLP. Welcome to visit our DLG4NLP website (https://dlg4nlp.github.io/index.html) for various learning resources!

Install / Use

/learn @graph4ai/Graph4nlp
README

<p align="center"><a href="https://dlg4nlp.github.io/index.html"> <img src="./imgs/graph4nlp_logo.png" width="800" class="center" alt="logo"/> <br/> </a> </p>


Graph4NLP

Graph4NLP is an easy-to-use library for R&D at the intersection of Deep Learning on Graphs and Natural Language Processing (i.e., DLG4NLP). It provides full implementations of state-of-the-art models for data scientists, as well as flexible interfaces with whole-pipeline support for researchers and developers building customized models. Built upon highly optimized runtime libraries including DGL, Graph4NLP combines high running efficiency with great extensibility. The architecture of Graph4NLP is shown in the following figure, where boxes with dashed lines represent features under development. Graph4NLP consists of four layers: 1) Data Layer, 2) Module Layer, 3) Model Layer, and 4) Application Layer.

<p align="center"> <img src="docs/arch.png" alt="architecture" width="700" /> <br> <b>Figure</b>: Graph4NLP Overall Architecture </p>

<img src="docs/new.png" alt='new' width=30 /> Graph4NLP news

  • 01/20/2022: The v0.5.5 release. Try it out!
  • 09/26/2021: The v0.5.1 release. Try it out!
  • 09/01/2021: Welcome to visit our DLG4NLP website (https://dlg4nlp.github.io/index.html) for various learning resources!
  • 06/05/2021: The v0.4.1 release.

Major Releases

| Releases | Date | Features |
| -------- | ---------- | ------------------------------------------------------------ |
| v0.5.5 | 2022-01-20 | - Support the model.predict API by introducing wrapper functions. <br /> - Introduce three new inference_wrapper functions: classifier_inference_wrapper, generator_inference_wrapper, generator_inference_wrapper_for_tree. <br /> - Add inference and inference_advance examples in each application. <br /> - Separate the graph topology and graph embedding processes. <br /> - Renew all the graph construction functions. <br /> - Split the graph_embedding module into graph_embedding_initialization and graph_embedding_learning. <br /> - Unify the parameters in Dataset: the ambiguous parameter graph_type is removed in favor of graph_name, which indicates the graph construction method, and static_or_dynamic, which indicates the static or dynamic graph construction type. <br /> - New: the dataset can now automatically choose default methods (e.g., topology_builder) from the single parameter graph_name. |
| v0.5.1 | 2021-09-26 | - Lint the code <br /> - Support testing with users' own data <br /> - Fix: the word embedding size was hard-coded in 0.4.1; it now equals the "word_emb_size" parameter. <br /> - Fix: build_vocab() was called twice in 0.4.1. <br /> - Fix: the two main files of the knowledge graph completion example were missing the optional parameter "kg_graph" in ranking_and_hits() when resuming model training. <br /> - Fix: the preprocessing path error in the KGC README. <br /> - Fix: an embedding construction bug when setting emb_strategy to 'w2v'. |
| v0.4.1 | 2021-06-05 | - Support the whole Graph4NLP pipeline <br /> - GraphData and Dataset support |
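The inference_wrapper functions introduced in v0.5.5 follow a common pattern: wrap a trained model so that callers can pass raw text instead of preprocessed batches. The sketch below illustrates that pattern only; it is not the actual Graph4NLP API, and toy_model, the vocab dict, and the label mapping are all invented for illustration.

```python
# Hedged sketch of the inference-wrapper pattern from the v0.5.5 notes.
# A wrapper turns raw text into model inputs, runs the model, and decodes
# the output back to a human-readable label.

def toy_model(token_ids):
    # Stand-in "model": classifies by token-count parity (illustrative only).
    return 1 if len(token_ids) % 2 == 0 else 0

def classifier_inference_wrapper(model, vocab):
    # Return a predict(text) function that hides all preprocessing.
    def predict(text):
        token_ids = [vocab.get(tok, 0) for tok in text.split()]
        label_id = model(token_ids)
        return {0: "negative", 1: "positive"}[label_id]
    return predict

vocab = {"good": 1, "movie": 2, "bad": 3}
predict = classifier_inference_wrapper(toy_model, vocab)
print(predict("good movie"))  # -> positive (two tokens, even count)
```

The same shape applies to the generator wrappers: only the decoding step at the end changes (token sequence or tree instead of a class label).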

Quick tour

Graph4NLP aims to make it incredibly easy to use GNNs in NLP tasks (check out the Graph4NLP Documentation). Here is an example of how to use the Graph2seq model, which is widely used in machine translation, question answering, semantic parsing, and other NLP tasks that can be abstracted as graph-to-sequence problems, and which has shown superior performance.

<!-- If you want to further improve model performance, we also support pre-trained models including [BERT](https://arxiv.org/abs/1810.04805), etc. -->

We also offer other high-level model APIs, such as graph-to-tree models. If you are interested in DLG4NLP-related research problems, you are very welcome to use our library and refer to our graph4nlp survey.

```python
from graph4nlp.pytorch.datasets.jobs import JobsDataset
from graph4nlp.pytorch.modules.graph_construction.dependency_graph_construction import DependencyBasedGraphConstruction
from graph4nlp.pytorch.modules.config import get_basic_args
from graph4nlp.pytorch.models.graph2seq import Graph2Seq
from graph4nlp.pytorch.modules.utils.config_utils import update_values, get_yaml_config

# Build the dataset (Stanford CoreNLP must be running in the background).
jobs_dataset = JobsDataset(root_dir='graph4nlp/pytorch/test/dataset/jobs',
                           topology_builder=DependencyBasedGraphConstruction,
                           topology_subdir='DependencyGraph')
vocab_model = jobs_dataset.vocab_model

# Build the model: merge user-specified YAML settings over the basic defaults.
user_args = get_yaml_config("examples/pytorch/semantic_parsing/graph2seq/config/dependency_gcn_bi_sep_demo.yaml")
args = get_basic_args(graph_construction_name="node_emb", graph_embedding_name="gat", decoder_name="stdrnn")
update_values(to_args=args, from_args_list=[user_args])
graph2seq = Graph2Seq.from_args(args, vocab_model)

# Run a forward pass on a small batch.
batch_data = JobsDataset.collate_fn(jobs_dataset.train[0:12])
scores = graph2seq(batch_data["graph_data"], batch_data["tgt_seq"])  # [batch_size, seq_len, vocab_size]
```
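The configuration step above layers user-supplied YAML values over a set of defaults. A rough, dependency-free sketch of that merge behavior is a recursive dict update; note this is only an illustration of the idea, and the real update_values may differ in its details:

```python
# Hedged sketch of the defaults-plus-overrides merge that the quick tour
# performs with get_basic_args / update_values. A plain recursive dict
# merge, not the actual Graph4NLP implementation.

def deep_update(to_args, from_args):
    for key, value in from_args.items():
        if isinstance(value, dict) and isinstance(to_args.get(key), dict):
            deep_update(to_args[key], value)  # recurse into nested sections
        else:
            to_args[key] = value  # user value overrides the default
    return to_args

defaults = {
    "graph_construction_name": "node_emb",
    "decoder": {"name": "stdrnn", "hidden_size": 300},
}
user_args = {"decoder": {"hidden_size": 512}}  # e.g. loaded from a YAML file

merged = deep_update(defaults, user_args)
print(merged["decoder"])  # {'name': 'stdrnn', 'hidden_size': 512}
```

The key property is that a user config only needs to mention the values it changes; everything else falls through to the defaults.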

Overview

Our Graph4NLP computing flow is shown below.

<p align="center"> <img src="./imgs/graph4nlp_flow.png" width="1000" class="center" alt="logo"/> <br/> </p>

Graph4NLP Models and Applications

Graph4NLP models

  • Graph2Seq: a general end-to-end neural encoder-decoder model that maps an input graph to a sequence of tokens.
  • Graph2Tree: a general end-to-end neural encoder-decoder model that maps an input graph to a tree structure.
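To make the graph-to-sequence idea concrete, here is a toy, library-free illustration: a BFS traversal linearizes a graph into a token sequence, standing in for the mapping that Graph2Seq learns end to end. This is not how Graph2Seq works internally (the real model embeds nodes with a GNN and decodes tokens with a neural decoder); the graph and sentence are invented for illustration.

```python
from collections import deque

# Toy illustration of the graph -> token-sequence mapping that Graph2Seq
# learns. Here we fake it with a deterministic BFS linearization.

def graph_to_sequence(adjacency, start):
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order

# Dependency-style chain for the query "list jobs in boston"
graph = {"list": ["jobs"], "jobs": ["in"], "in": ["boston"]}
print(graph_to_sequence(graph, "list"))  # ['list', 'jobs', 'in', 'boston']
```

Graph2Tree follows the same encoder idea but decodes a tree (e.g., a logical form or an equation) instead of a flat token sequence.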

Graph4NLP applications

We provide a comprehensive collection of NLP applications, together with detailed examples as follows:

  • Text classification: to assign an appropriate label to a sentence or document.
  • Semantic parsing: to translate natural language into a machine-interpretable formal meaning representation.
  • Neural machine translation: to translate a sentence from a source language into a different target language.
  • Summarization: to generate a shorter version of the input text that preserves its major meaning.
  • KG completion: to predict missing relations between two existing entities in a knowledge graph.
  • Math word problem solving: to automatically solve mathematical exercises that describe a problem in easy-to-understand language.
  • Named entity recognition: to tag entities in input texts with their corresponding types.
  • Question generation: to generate a valid and fluent question based on a given passage and, optionally, a target answer.

Performance

Environment: torch 1.8, Ubuntu 16.04, 2080 Ti GPUs

| Task | Dataset | GNN Model | Graph construction | Evaluation | Performance |
|----------------------------|:--------------------------------:|:-------------------:|----------------------------------------------|--------------------|:-----------------------------:|
| Text classification | TRECT<br> CAirline<br> CNSST | GAT | Dependency<br> Constituency<br> Dependency | Accuracy | 0.948<br> 0.785<br> 0.538 |
| Semantic Parsing | JOBS | SAGE | Constituency | Execution accuracy | 0.936 |
| Question generation | SQuAD | GGNN | De |  |  |
