NeuralAmr
Sequence-to-sequence models for AMR parsing and generation
Neural AMR
Torch implementation of sequence-to-sequence models for AMR parsing and generation based on the Harvard NLP framework. We provide the code for pre-processing, anonymizing, de-anonymizing, training and predicting from and to AMR. We also include models pre-trained on 20M sentences from Gigaword and fine-tuned on the AMR LDC2015E86: DEFT Phase 2 AMR Annotation R1 Corpus. You can find all the details in the following paper:
- Neural AMR: Sequence-to-Sequence Models for Parsing and Generation. (Ioannis Konstas, Srinivasan Iyer, Mark Yatskar, Yejin Choi, Luke Zettlemoyer. ACL 2017)
Requirements
The pre-trained models only run on GPUs, so you will need to have the following installed:
- Latest NVIDIA driver
- CUDA 8 Toolkit
- cuDNN (The NVIDIA CUDA Deep Neural Network library)
- Torch
Installation
- Install the following packages for Torch using luarocks:
nn nngraph cutorch cunn cudnn
- Install the Deepmind version of torch-hdf5 from here.
(Only for training models)
- Install cudnn.torch from here.
- Install the following packages for Python 2.7:
pip install numpy h5py
(Only for downloading the pretrained models)
- Download and unzip the models from here.
- Export the cuDNN library path (you can add it to your .bashrc or .profile):
export CUDNN_PATH="path_to_cudnn/lib64/libcudnn.so"
- Or instead of the previous step you can copy the cuDNN library files into /usr/local/cuda/lib64/ or to the corresponding folders in the CUDA directory.
Usage
AMR Generation
You can generate text from AMR graphs using our pre-trained model on 20M sentences from Gigaword, in two different ways:
- By running an interactive tool that reads input from stdin:
./generate_amr_single.sh [stripped|full|anonymized]
- By running the prediction on a single file, which contains an AMR graph per line:
./generate_amr.sh input_file [stripped|full|anonymized]
You can optionally provide an argument that tells the system to accept either full AMR as described in the annotation guidelines, or a stripped version, which removes variables, senses, parentheses from leaves, and assumes a simpler markup for Named Entities, date mentions, and numbers. You can also provide the input in anonymized format, i.e., similar to stripped but with Named Entities, date mentions, and numbers anonymized.
An example using the full format:
(h / hold-04 :ARG0 (p2 / person :ARG0-of (h2 / have-org-role-91 :ARG1 (c2 / country :name (n3 / name :op1 "United" :op2 "States")) :ARG2 (o / official))) :ARG1 (m / meet-03 :ARG0 (p / person :ARG1-of (e / expert-01) :ARG2-of (g / group-01))) :time (d2 / date-entity :year 2002 :month 1) :location (c / city :name (n / name :op1 "New" :op2 "York")))
The same example using the stripped format:
hold :ARG0 ( person :ARG0-of ( have-org-role :ARG1 (country :name "United States") :ARG2 official)) :ARG1 (meet :ARG0 (person :ARG1-of expert :ARG2-of group)) :time (date-entity :year 2002 :month 1) :location (city :name "New York" )
The same example using the anonymized format:
hold :ARG0 ( person :ARG0-of ( have-org-role :ARG1 location_name_0 :ARG2 official ) ) :ARG1 ( meet :ARG0 ( person :ARG1-of expert :ARG2-of group ) ) :time ( date-entity year_date-entity_0 month_date-entity_0 ) :location location_name_1
For full details and more examples, see here.
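The file mode above expects one AMR graph per line, while AMR corpora are often pretty-printed over several lines. The following is a minimal sketch for flattening such a file. It assumes graphs are separated by blank lines and that lines starting with # are metadata to be dropped; the function name amr_blocks_to_lines is hypothetical and not part of this repository.

```python
import re


def amr_blocks_to_lines(text):
    """Collapse blank-line-separated, multi-line AMR graphs to one graph per line.

    Assumptions (not taken from this repository): graphs are separated by
    blank lines, and lines starting with '#' are metadata to be skipped.
    """
    graphs = []
    current = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue  # assumed metadata line, e.g. "# ::snt ..."
        if not line:
            # A blank line closes the graph collected so far.
            if current:
                graphs.append(" ".join(current))
                current = []
            continue
        current.append(line)
    if current:
        graphs.append(" ".join(current))
    # Squeeze any remaining runs of whitespace into single spaces.
    return [re.sub(r"\s+", " ", graph) for graph in graphs]
```

Writing the returned strings to a file, one per line, should give an input suitable for ./generate_amr.sh input_file full.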
AMR Parsing
You can also parse text to the corresponding AMR graph, using our pre-trained model on 20M sentences from Gigaword.
Similarly to AMR generation, you can parse text in two ways:
- By running an interactive tool that reads text from stdin:
./parse_amr_single.sh [text|textAnonymized]
- By running the prediction on a single file, which contains a sentence per line:
./parse_amr.sh input_file [text|textAnonymized]
You can optionally provide an argument that tells the scripts either to accept raw text and perform NE recognition and anonymization on it (text), or to bypass this process entirely (textAnonymized).
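Because the scripts error out on inputs longer than max_sent_l (default 507, see Script Options), it can help to check a file before running it. The sketch below counts whitespace-separated tokens per line. That tokenization is an assumption: the scripts' own pre-processing and anonymization may produce a different token count. The function name find_too_long is hypothetical.

```python
def find_too_long(lines, max_len=507):
    """Return (line_number, token_count) for every line exceeding max_len tokens.

    Tokens are counted by splitting on whitespace, which is only an
    approximation of what the scripts see after their own pre-processing.
    """
    offenders = []
    for number, line in enumerate(lines, start=1):
        count = len(line.split())
        if count > max_len:
            offenders.append((number, count))
    return offenders
```

For a file, something like find_too_long(open("input_file")) would list the lines worth shortening or removing.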
Script Options (generate_amr.sh, generate_amr_single.sh, parse_amr.sh, parse_amr_single.sh)
- interactive_mode [0,1]: Set 0 for generating from a file, or 1 to generate from stdin.
- model [str]: The path to the trained model.
- input_type [stripped|full] (AMR Generation only): Set full for standard AMR graph input, or stripped, which expects AMR graphs with no variables, senses, parentheses from leaves, and assumes a simpler markup for Named Entities (for more details and examples, see here).
- src_file [str]: The path to the input file that contains AMR graphs, one per line.
- gpuid [int]: The GPU id number.
- src_dict, targ_dict [str]: Path to source and target dictionaries. These are usually generated during preprocessing of the corpus. ==Note==: src_dict and targ_dict paths need to be reversed when generating text or parsing to AMR.
- beam [int]: The beam size of the decoder (default is 5).
- replace_unk [0,1]: Replace unknown words with either the input token that has the highest attention weight, or the word that maps to the input token as provided in srctarg_dict.
- srctarg_dict [str]: Path to source-target dictionary to replace unknown tokens. Each line should be a source token and its corresponding target token, separated by ||| (see resources/training-amr-nl-alignments.txt).
- max_sent_l [str]: Maximum sentence length (default is 507, i.e., the longest input AMR graph or sentence (depending on the task) in number of tokens from the dev set). If any of the sequences in src_file are longer than this it will error out.
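To make the srctarg_dict format concrete, here is a minimal sketch that reads source|||target lines into a dictionary and applies the fallback described for replace_unk. The function names are hypothetical. The lookup order (mapped word if the attended source token is in the dictionary, otherwise the source token itself) is an assumption about how the two alternatives combine, not a transcription of the Lua code.

```python
def load_srctarg_dict(lines):
    """Parse 'source|||target' lines into a dict, skipping blank lines."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        source, separator, target = line.partition("|||")
        if not separator:
            raise ValueError("missing '|||' separator: %r" % line)
        mapping[source.strip()] = target.strip()
    return mapping


def replace_unknown(attended_source_token, mapping):
    """Pick a replacement for an unknown output word.

    Assumed behaviour: use the dictionary entry for the source token with
    the highest attention weight if there is one, else copy that token.
    """
    return mapping.get(attended_source_token, attended_source_token)
```

In practice the lines would come from a file such as resources/training-amr-nl-alignments.txt.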
(De-)Anonymization Process
The source code for the whole anonymization/deanonymization pipeline is provided under the java/AmrUtils folder. You can rebuild the code by running the script:
./rebuild_AmrUtils.sh
This should create the executable lib/AmrUtils.jar.
The (de-)anonymization tools are generally controlled using the following shell script command (==Note== that it is automatically being called inside the lua code when parsing/generating, so generally you don't need to deal with it when running the scripts described above). The first argument denotes the specific (de-)anonymization to perform, the second argument specifies whether the input comes either from stdin or from a file, where each input is provided one per line:
./anonDeAnon_java.sh anonymizeAmrStripped|anonymizeAmrFull|deAnonymizeAmr|anonymizeText|deAnonymizeText input_isFile[true|false] input
- ==Note==: In order to anonymize text sentences, you need to run the Stanford NER server first (you can just execute it in the background):
./nerServer.sh &
Optionally you can provide a port number as an argument.
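If you want to call the (de-)anonymization script from Python rather than from the shell, a thin wrapper along the following lines may be enough. It is a sketch only: the helper names are hypothetical, it assumes the working directory is the repository root, and it assumes the script writes its result to standard output. Passing the arguments as a list avoids having to escape the quotes inside AMR strings for the shell.

```python
import subprocess

# Operation names as listed in the command synopsis above.
OPERATIONS = (
    "anonymizeAmrStripped",
    "anonymizeAmrFull",
    "deAnonymizeAmr",
    "anonymizeText",
    "deAnonymizeText",
)


def build_command(operation, input_value, is_file=False,
                  script="./anonDeAnon_java.sh"):
    """Build the argument list: script, operation, input_isFile, input."""
    if operation not in OPERATIONS:
        raise ValueError("unknown operation: %s" % operation)
    return [script, operation, "true" if is_file else "false", input_value]


def run_anon_tool(operation, input_value, is_file=False):
    """Run the script and return its stdout as stripped text.

    Assumes the current directory is the repository root and that
    lib/AmrUtils.jar has been built.
    """
    output = subprocess.check_output(
        build_command(operation, input_value, is_file))
    return output.decode("utf-8").strip()
```

For example, run_anon_tool("anonymizeText", "My name is John Doe") would invoke the same command as the shell form, provided the NER server is already running.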
There are four main operations you can perform with the tools, namely anonymization of AMR graphs, anonymization of text sentences, de-anonymization of (predicted) sentences, and de-anonymization of (predicted) AMR graphs:
- Anonymize an AMR graph (anonymizeAmrStripped, anonymizeAmrFull)
In this case, you provide an input representing a stripped or full AMR graph, and the script outputs the anonymized graph (in the case of full it also strips it down, i.e., removes variable names, instance-of relations, most brackets, and simplifies NEs/dates/number subgraphs of the input), the anonymization alignments (useful for de-anonymizing the corresponding predicted sentence later), and the nodes/edges of the graph in an un-ordered JSON format (useful for visualization tools such as vis.js). The three outputs are delimited using the special character #. For example:
./anonDeAnon_java.sh anonymizeAmrFull false "(h / hello :arg1 (p / person :name (n / name :op1 \"John\" :op2 \"Doe\")))"
should give the output:
hello :arg1 person_name_0#person_name_0|||name_John_Doe# "nodes":[{"id":1,"label":"hello"},{"id":2,"label":"person"},{"id":3,"label":"name"},{"id":4,"label":"\"John\""},{"id":5,"label":"\"Doe\""}], "edges":[{"from":1,"to":2,"label":"arg1"},{"from":2,"to":3,"label":"name"},{"from":3,"to":4,"label":"op1"},{"from":3,"to":5,"label":"op2"}]
Anonymization alignments have the format:
amr-anonymized-token|||type_concatenated-AMR-tokens
Finally, multiple anonymization alignments for the same sentence are tab-delimited.
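The three-part output described above can be taken apart with a few lines of Python. This is a sketch under assumptions: it assumes the character # does not occur inside the graph or the alignments, and it wraps the nodes/edges fragment in braces when they are missing (as in the sample output) so that it parses as a JSON object. The function name parse_anonymized_amr is hypothetical.

```python
import json


def parse_anonymized_amr(output):
    """Split 'graph#alignments#json' into (graph, alignments dict, graph data).

    Assumes '#' appears only as the delimiter between the three parts.
    """
    graph, alignments_raw, graph_json = output.split("#", 2)

    # Alignments are tab-delimited items of the form token|||value.
    alignments = {}
    for item in alignments_raw.split("\t"):
        if item.strip():
            token, _, value = item.strip().partition("|||")
            alignments[token] = value

    # The sample output shows the JSON body without enclosing braces.
    fragment = graph_json.strip()
    if not fragment.startswith("{"):
        fragment = "{" + fragment + "}"
    return graph.strip(), alignments, json.loads(fragment)
```

The returned dictionary has "nodes" and "edges" lists whose entries (id/label and from/to/label) are in the shape that vis.js network datasets use.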
- Anonymize a text sentence (anonymizeText)
Remember that you need to have the NER server running, as explained above. In this case, you simply provide the sentence as input. For example:
./anonDeAnon_java.sh anonymizeText false "My name is John Doe"
should give the output:
my name is person_name_0#person_name_0|||John Doe
Note that the anonymization alignments from text are slightly different from the ones from AMR graphs; the second part is a span of the text separated with space.
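To illustrate what the text alignments encode, the sketch below splits the output at # and substitutes each placeholder token with its recorded span. This is not the tool's own de-anonymization logic, only a reading of the format shown above. It assumes placeholders appear as whole whitespace-separated tokens and that # occurs only as the delimiter; restore_text is a hypothetical name.

```python
def restore_text(anonymized_output):
    """Replace placeholder tokens with the text spans from the alignments.

    Input has the form 'sentence#token|||span<TAB>token|||span...'.
    """
    sentence, _, alignments_raw = anonymized_output.partition("#")
    spans = {}
    for item in alignments_raw.split("\t"):
        if item.strip():
            token, _, span = item.strip().partition("|||")
            spans[token] = span
    # Tokens without an alignment are kept unchanged.
    return " ".join(spans.get(word, word) for word in sentence.split())
```

Applied to the sample output, this gives back the lowercased sentence with the name span restored.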
- De-anonymize an AMR graph (deAnonymizeAmr)
In this case, you provide an input representing a stripped AMR graph, as well as the corresponding anonymization alignments provided from a previous run of the script using the ==anonymizeText== option, delimited by #, and the script outputs the de-anonymized AMR graph, as well as the nodes/edges of the graph in an un-ordered JSON format (useful for vi
