Nativeatom/NaturalLanguageProcessing
Natural Language Processing
This repository collects basic concepts of Natural Language Processing (NLP), well-regarded textbooks and blogs, popular papers, and other resources.
It also serves as the NLP section of the Machine Learning Resources collection created by a group of contributors including jindongwang.
Contributions are welcome: let's work together to make it BETTER!
Resources: Textbooks and Lectures
Mathematical and Statistical Foundations
- Linear Algebra
- Matrix Analysis
- Convex Optimization
Machine Learning
- The Elements of Statistical Learning (ESL) - Hastie, Tibshirani, Friedman
- CS228 Probabilistic Graphical Models - Stanford
- 10-708 Probabilistic Graphical Models - CMU
Deep Learning
- Deep Learning - Ian Goodfellow, Yoshua Bengio, Aaron Courville
- CS231n Convolutional Neural Networks for Visual Recognition - Stanford
Natural Language Processing
- Foundations of Statistical Natural Language Processing - Chris Manning
- Speech and Language Processing - Daniel Jurafsky and James H. Martin
- 统计学习方法 (Statistical Learning Methods) - 李航 (Li Hang)
- Advanced Natural Language Processing - MIT
- CS 224n Natural Language Processing with Deep Learning - Stanford
- Deep Learning for NLP at Oxford with DeepMind - Oxford
- 11-747 Neural Networks for NLP (NN4NLP) - CMU
- 11-737 Multilingual NLP - CMU
- Some Knowledge about Machine Learning
- A list of datasets
Models and Applications
- Probabilistic Graphical Models
- Hidden Markov Model
- Conditional Random Fields
- Topic Models
- Latent Dirichlet Allocation (paper)
- Deep Learning Models
- Long Short-Term Memory (LSTM) - Sepp Hochreiter and Jürgen Schmidhuber, 1997
- Interpretation - Omer Levy, University of Washington, 2018
- Recurrent Neural Network (RNN) - Seq2Seq (TensorFlow tutorial) - Machine translation TensorFlow implementation
- Convolutional Neural Network (CNN)
- Attention Model
- Overview (in Chinese)
- Generative Adversarial Network (GAN)
- Transformer
- Training Tips
- Bidirectional Encoder Representations from Transformers (BERT) - Jacob Devlin, Google, 2018
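The Attention Model and Transformer entries above both build on scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. The following is a minimal pure-Python sketch of that formula for a single head, with no masking, batching, or learned projections; `softmax` and `scaled_dot_product_attention` are illustrative names, not taken from any of the linked resources.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major matrix as nested lists (illustrative choice)


def softmax(xs: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V for one head, one query row at a time.

    Hypothetical helper: no masking, batching, or learned projections.
    """
    d_k = len(k[0])
    out: Matrix = []
    for q_row in q:
        # Dot product of this query with every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in k]
        weights = softmax(scores)
        # Output row = attention-weighted sum of the value rows.
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v)) for j in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[10.0, 0.0], [0.0, 10.0]]
    # The query matches the first key best, so the first value row gets more weight.
    print(scaled_dot_product_attention(q, k, v))
```

Because the attention weights for each query sum to 1, every output row is a convex combination of the value rows.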
Blogs and Tutorials
- TensorFlow implementation of RNNs and undocumented features
- The Unreasonable Effectiveness of Recurrent Neural Networks
Topics and Tasks
The categories below are based on the tracks of ACL 2018, ACL 2020, and EMNLP 2020.
Summarization
- Task
- Summarization
- Opinion Summarization
- Evaluation
- Model
- Extractive
- Generative
- Hybrid
- Dataset
- XSum, EMNLP 2018 [paper]
- CNN/DailyMail
- NEWSROOM
- Multi-News
- Gigaword
- arXiv
- PubMed
- BIGPATENT
- WikiHow
- Reddit TIFU (long, short)
- AESLC
- BillSum
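As a point of reference for the Extractive model family, here is a minimal sketch of a word-frequency sentence scorer, a classic unsupervised baseline. The tiny stopword list, the regex sentence splitter, and the function names are all assumptions for illustration; the models usually evaluated on the datasets above are far stronger.

```python
import re
from collections import Counter
from typing import List

# Assumed, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "it", "of", "on", "the", "to"}


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence: str) -> List[str]:
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 1) -> List[str]:
    """Hypothetical frequency baseline: the k top-scoring sentences, in document order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average document frequency of the sentence's content words.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    # Restore original order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    doc = "NLP is fun. NLP models summarize text. Weather is nice."
    print(extractive_summary(doc, k=2))
```

Averaging (rather than summing) frequencies avoids simply favoring the longest sentences, at the cost of sometimes favoring very short ones.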
Embedding
- Model
- Word2Vec
- Pre-trained Embeddings
- GloVe
- word2vec
- fastText
- Contextual Word Embeddings
- ELMo
- GPT
- BERT
- XLNet
- BART
- T5
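Two small building blocks behind the Embedding entries can be sketched in plain Python: generating (center, context) pairs the way word2vec's skip-gram objective does, and comparing two vectors with cosine similarity, the usual way pre-trained or contextual embeddings are compared. This is a hedged sketch: real word2vec training also samples the window size, subsamples frequent words, and trains with negative sampling or hierarchical softmax, none of which is shown. `skipgram_pairs` and `cosine` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Hypothetical helper: (center, context) pairs within a fixed window, skip-gram style."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


if __name__ == "__main__":
    print(skipgram_pairs(["the", "cat", "sat"], window=1))
    print(cosine([1.0, 2.0], [2.0, 4.0]))
```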
Sentiment Analysis and Argument Mining
Named Entity Recognition
Tagging, Chunking
- Task
- Word Segmentation
- Syntactic Parsing
- Model
- Hidden Markov Model (HMM)
- Conditional Random Fields (CRFs)
- Fine-tuned Language Models
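For the Hidden Markov Model entry, tagging is usually done with Viterbi decoding, which finds the most probable hidden-state sequence by dynamic programming. The sketch below assumes the HMM parameters are given as plain dictionaries (a hypothetical layout) and a non-empty observation sequence, and it multiplies raw probabilities for readability; practical taggers work in log space to avoid underflow on long sentences.

```python
from typing import Dict, List, Tuple


def viterbi(
    obs: List[str],
    states: List[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[float, List[str]]:
    """Most probable hidden-state path for a non-empty `obs` (hypothetical dict-based HMM)."""
    # best[t][s] = (prob. of the best path ending in state s at step t, previous state)
    best: List[Dict[str, Tuple[float, str]]] = [
        {s: (start_p[s] * emit_p[s].get(obs[0], 0.0), "") for s in states}
    ]
    for t in range(1, len(obs)):
        layer: Dict[str, Tuple[float, str]] = {}
        for s in states:
            # Extend the best path into each predecessor p by the transition p -> s.
            layer[s] = max(
                (best[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(obs[t], 0.0), p)
                for p in states
            )
        best.append(layer)
    # Choose the best final state, then follow the back-pointers to the start.
    prob, last = max((best[-1][s][0], s) for s in states)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, best[t][path[0]][1])
    return prob, path


if __name__ == "__main__":
    # Toy two-tag model with made-up probabilities, for illustration only.
    states = ["NOUN", "VERB"]
    start = {"NOUN": 0.6, "VERB": 0.4}
    trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7}, "VERB": {"NOUN": 0.8, "VERB": 0.2}}
    emit = {"NOUN": {"dogs": 0.5, "bark": 0.1}, "VERB": {"dogs": 0.1, "bark": 0.6}}
    print(viterbi(["dogs", "bark"], states, start, trans, emit))
```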
Syntax, Parsing
- Task
- Constituency Parsing
- Dependency Parsing
- Visually Grounded Syntactic Acquisition
- Model
- Dataset
Document Analysis
Sentence-level Semantics
- Tasks
- Semantic Parsing
- AMR-to-text
- Text-to-AMR
- Table-to-text
- Code Generation
- Semantic Parsing
- Model
- Dataset
Semantics: Lexical
- Tasks
- Word Sense Disambiguation
Information Extraction and Text Mining
- Tasks
- Topic Extraction
- Sentiment Extraction
- Aspect Extraction
Machine Translation
- Task
- Machine Translation
- Non-autoregressive Machine Translation
- Word Alignment
- Model
- Dataset
- WMT
Text Generation
Text Classification
- Task
- Spam Classification
- Sentiment Analysis
- Model
- Dataset
Dialogue and Interactive Systems
Question Answering
- Task
- Dataset
- CNN/DailyMail
- SQuAD
- Benchmark: F1 86.967, BERT + Synthetic Self-Training (ensemble), Jan 10, 2019
- RACE
- Benchmark: RACE 83.2, RACE-M 86.5, RACE-H 81.3, RoBERTa, July 2019
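The SQuAD benchmark above is reported as F1, which for SQuAD is a token-overlap score between the predicted and gold answer strings after normalization. The sketch below follows the approach of the official SQuAD v1.1 evaluation script (lowercasing, stripping punctuation and the articles a/an/the, then taking the best F1 over all gold answers), but it is a re-implementation for illustration, not the official script; the function names are hypothetical.

```python
import re
import string
from collections import Counter
from typing import List

PUNCTUATION = set(string.punctuation)


def normalize_answer(s: str) -> str:
    # Lowercase, drop punctuation, drop articles, collapse whitespace.
    s = s.lower()
    s = "".join(ch for ch in s if ch not in PUNCTUATION)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def token_f1(prediction: str, gold: str) -> float:
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    # Multiset intersection: each shared token counts at most min(count_pred, count_gold) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, golds: List[str]) -> float:
    # Questions may have several acceptable answers; score against the closest one.
    return max(token_f1(prediction, g) for g in golds)


if __name__ == "__main__":
    print(best_f1("Paris, France", ["Paris", "the city of Paris"]))
```

The reported benchmark is this per-question score averaged over the evaluation set (usually shown as a percentage).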
Resources and Evaluation
Linguistic Theories and Cognitive Modeling
Multilinguality
- Task
- Code-Switching
- Multilingual Translation
- Model
- Dataset
Phonology, Morphology and Word Segmentation
Textual Inference
Vision, Robotics, Speech, Multimodal
Language Modeling
- Tasks
- Model
- N-gram
- ELMo, NAACL 2018
- GPT
- GPT-2, arXiv2019
- [GPT-3, NeurIPS2020](https
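The N-gram entry can be illustrated with the simplest non-trivial case: a bigram model with add-one (Laplace) smoothing. Both the bigram order and the smoothing method are choices made for this sketch, and the `BigramLM` class is hypothetical; stronger n-gram models typically use methods such as Kneser-Ney smoothing.

```python
from collections import Counter
from typing import List

BOS, EOS = "<s>", "</s>"  # sentence boundary markers (assumed convention)


class BigramLM:
    """Hypothetical bigram language model with add-one (Laplace) smoothing."""

    def __init__(self, sentences: List[List[str]]):
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        for sent in sentences:
            tokens = [BOS] + sent + [EOS]
            # Every token except the last serves as the context of a bigram.
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        # Possible next tokens: every training word plus EOS (BOS never follows anything).
        self.vocab = {w for sent in sentences for w in sent} | {EOS}

    def prob(self, prev: str, word: str) -> float:
        # P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)
        return (self.bigram_counts[(prev, word)] + 1) / (self.context_counts[prev] + len(self.vocab))

    def sentence_prob(self, sent: List[str]) -> float:
        tokens = [BOS] + sent + [EOS]
        p = 1.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            p *= self.prob(prev, word)
        return p


if __name__ == "__main__":
    lm = BigramLM([["i", "like", "nlp"], ["i", "like", "cats"]])
    print(lm.sentence_prob(["i", "like", "nlp"]), lm.sentence_prob(["nlp", "like", "i"]))
```

Add-one smoothing guarantees that unseen bigrams get non-zero probability, so word orders never observed in training (like the second example) are scored low but not impossible.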
