[![Contributors][contributors-shield]][contributors-url] [![Forks][forks-shield]][forks-url] [![Stargazers][stars-shield]][stars-url] [![Issues][issues-shield]][issues-url] [![MIT License][license-shield]][license-url] [![LinkedIn][linkedin-shield]][linkedin-url]
<!-- PROJECT LOGO --> <br /> <p align="center"> <a href="https://github.com/MachineLearningJournalClub/LearningNLP"> <img src="img/logos/logo_mljc.png" alt="Logo" width="120" height="120"> </a> <h1 align="center">Learning NLP</h1> <h3 align="center">Tutorials and in-depth analyses of Natural Language Processing (NLP) techniques and applied NLP</h3> <p align="center"> <br /> <a href="https://github.com/MachineLearningJournalClub/LearningNLP"><strong>Explore the docs »</strong></a> <br /> <br /> <a href="https://github.com/MachineLearningJournalClub/LearningNLP">View Demo</a> · <a href="https://github.com/MachineLearningJournalClub/LearningNLP/issues">Report Bug</a> · <a href="https://github.com/MachineLearningJournalClub/LearningNLP/pulls">Request Feature</a> </p> </p> <!-- TABLE OF CONTENTS --> <details open="open"> <summary><h2 style="display: inline-block">Table of Contents</h2></summary> <ol> <li> <a href="#about-the-project">About The Project</a> <ul> <li><a href="#built-with">Built With</a></li> </ul> </li> <li> <a href="#getting-started">Getting Started</a> <ul> <li><a href="#prerequisites">Prerequisites</a></li> <li><a href="#tutorial-1">Tutorial 1</a></li> <li><a href="#tutorial-2">Tutorial 2</a></li> <li><a href="#tutorial-3">Tutorial 3</a></li> <li><a href="#tutorial-4">Tutorial 4</a></li> <li><a href="#tutorial-5">Tutorial 5</a></li> <li><a href="#tutorial-6">Tutorial 6</a></li> </ul> </li> <li><a href="#roadmap">Roadmap</a></li> <li><a href="#contributing">Contributing</a></li> <li><a href="#license">License</a></li> <li><a href="#contact">Contact</a></li> <li><a href="#acknowledgements">Acknowledgements</a></li> </ol> </details> <!-- ABOUT THE PROJECT -->

## About The Project
This repository collects tutorials and in-depth analyses of Natural Language Processing (NLP) techniques, with particular attention to ethical questions such as bias and fairness in language data and models. It is developed and maintained by the Machine Learning Journal Club (MLJC).
### Built With
<!-- GETTING STARTED -->

## Getting Started
You can either get a local copy by downloading this repo, or use Google Colaboratory by copy-pasting the link of the notebook (.ipynb file) of your choice.
### Prerequisites (Local Version)

#### Install Miniconda

Please go to the Anaconda website, then download and install the latest Miniconda version for Python 3.8 for your operating system:

```sh
wget <link-to-miniconda-installer>   # copy the link from the Anaconda website
sh <miniconda-installer>.sh
```
#### Download This Repo

```sh
git clone https://github.com/MachineLearningJournalClub/LearningNLP
```
#### Setup Conda Environment

Change directory (cd) into the LearningNLP folder, then create and activate the conda environment with the required libraries:

```sh
cd LearningNLP
conda env create -f environment.yml
conda activate LNLP
```
### Tutorial 1

#### Topics
- Sentiment Analysis with Logistic Regression
- Sentiment Analysis with Naive Bayes
- Word Vectorizing (CountVectorizer in Scikit-learn)
- Some Explainability Methods
#### Notebook

- Dataset: ArXiv from Kaggle
- Binary classification: Scikit-learn's CountVectorizer + TfidfTransformer
- Explainability methods: LIME, SHAP

Useful references for explainability methods:

- LIME: "Why Should I Trust You?": Explaining the Predictions of Any Classifier
- SHAP: A Unified Approach to Interpreting Model Predictions
- Adversarial attacks (have you heard of them?), i.e. how to fool explanation methods: Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods
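The notebook's pipeline (CountVectorizer + TfidfTransformer + a linear classifier) can be sketched in a few lines. The toy titles and labels below are invented stand-ins for the ArXiv data:

```python
# Minimal sketch: token counts -> tf-idf weights -> Logistic Regression.
# Swap LogisticRegression() for MultinomialNB() to get the Naive Bayes variant.
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB  # alternative classifier
from sklearn.pipeline import Pipeline

# Toy stand-in for ArXiv titles (invented for illustration)
titles = [
    "deep learning for galaxy classification",
    "neural networks in astrophysics",
    "statistical methods for particle physics",
    "quantum field theory and particle interactions",
]
labels = [0, 0, 1, 1]  # 0 = astro-ph-like, 1 = hep-ph-like (toy binary labels)

clf = Pipeline([
    ("counts", CountVectorizer()),    # raw token counts
    ("tfidf", TfidfTransformer()),    # reweight by inverse document frequency
    ("model", LogisticRegression()),
])
clf.fit(titles, labels)

print(clf.predict(["particle physics experiments"]))  # → [1]
```

The same `Pipeline` object can be passed to LIME or SHAP as the black-box predictor to explain.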
Open Questions for you:
- How to deal with multiclass problems?
- Try to develop binary classification with abstracts instead of titles
- Try to develop the same pipeline with spaCy
### Tutorial 2

#### Topics
- Bias & Fairness in NLP (Ethics and Machine Learning)
- Gender Framing (in Political Tweets)
- Political Party Prediction
- Topic Modeling - Latent Dirichlet Allocation (LDA)
#### Slides

We'd like to introduce some ethical concerns in ML, and especially in NLP. The idea is to start a long-term project on Bias & Fairness in Machine Learning: intrinsic problems in our data can create inequalities in the real world (have you watched "Coded Bias" on Netflix?).
#### Notebook
- Dataset: we created a dataset by scraping tweets from some US politicians
- Preprocessing: pandas, nltk, gensim
- Binary classification: Scikit-learn's CountVectorizer + TfidfTransformer
- Topic modeling with Latent Dirichlet Allocation (LDA) + visualization. Some educational content on LDA: L. Serrano part 1 on LDA, L. Serrano part 2: How to train LDA
### Tutorial 3

In the following two notebooks we focus on a Kaggle competition: the CommonLit Readability Prize.
#### Tutorial 3.1

##### Topics
- Exploratory Data Analysis
#### Tutorial 3.2

You can run it directly on Kaggle.

##### Topics
- Pretrained Word2Vec model, feature extraction
- Dimensionality Reduction and visualization with UMAP
- Naive Word2Vec Augmentation
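The feature-extraction step boils down to mean-pooling word vectors into a sentence vector. A minimal sketch, where the 4-dimensional vectors are toy values standing in for a real pretrained Word2Vec model:

```python
# Represent a sentence as the mean of its word vectors ("naive" pooling).
import numpy as np

# Toy embedding table (invented values, not real Word2Vec weights)
word_vectors = {
    "the":   np.array([0.1, 0.0, 0.2, 0.1]),
    "quick": np.array([0.7, 0.3, 0.1, 0.0]),
    "fox":   np.array([0.2, 0.8, 0.4, 0.3]),
}

def sentence_vector(sentence, vectors, dim=4):
    """Mean-pool the vectors of in-vocabulary words; zeros if none match."""
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return np.zeros(dim)
    return np.mean([vectors[w] for w in words], axis=0)

feat = sentence_vector("The quick fox", word_vectors)
print(feat)  # element-wise mean of the three vectors
```

The resulting fixed-length features can then be projected to 2-D with UMAP for visualization.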
### Tutorial 4

#### Topics
- Global Vectors for word representations (GloVe), Stanford NLP
- fastText: skip-gram vs. CBOW
- Bias in Word Embeddings (Gender + Ethnic Stereotypes) with WEFE
- Bias in Word Embeddings: What causes it?
Possible Ideas:
- Understanding Bias in Word Embeddings, ICML paper + code
- Employing The Word Embedding Fairness Evaluation Framework (WEFE): WEAT, (RIPA?)
- Debiasing word embeddings: Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings, code
- Biasing a simple model: how can we deliberately bias a model by injecting biased information into its training data? What can we learn from this? How is it useful for debiasing purposes?
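As a rough illustration of what WEAT-style tests in WEFE measure, here is a toy differential-association score: how much more a target word's embedding aligns (by cosine similarity) with attribute set A than with attribute set B. The 2-D "embeddings" are invented; real tests use full embedding models and curated word sets:

```python
# WEAT-style association sketch: s(w, A, B) = mean cos(w, a) - mean cos(w, b).
import numpy as np

def cos(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association(w, A, B):
    """Positive: w leans toward attribute set A; negative: toward B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

# Toy embedding space (invented): set A clusters along one axis, set B along the other
A = [np.array([1.0, 0.1]), np.array([0.9, 0.0])]   # e.g. "male" attribute terms
B = [np.array([0.1, 1.0]), np.array([0.0, 0.9])]   # e.g. "female" attribute terms
career = np.array([0.8, 0.2])                      # target word near set A

print(association(career, A, B))  # positive in this toy space
```

WEFE packages this idea (plus permutation tests and effect sizes) behind a uniform API for comparing embedding models.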
### Tutorial 5

In the following notebooks we continue working on the CommonLit Readability Prize Kaggle competition.

#### Topics
- Data Augmentation
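As a generic illustration of text augmentation (the notebook's exact strategies may differ), here is an EDA-style sketch with random word deletion and random swap:

```python
# Simple text-augmentation sketch: random deletion and random position swap.
import random

def random_deletion(words, p=0.2, rng=None):
    """Drop each word with probability p, keeping at least one word."""
    rng = rng or random.Random()
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

def random_swap(words, n_swaps=1, rng=None):
    """Swap n_swaps random pairs of positions (a word-order perturbation)."""
    rng = rng or random.Random()
    words = list(words)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(words)), rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

rng = random.Random(0)  # seed for reproducible augmentations
sentence = "the passage is easy to read".split()
print(random_deletion(sentence, rng=rng))
print(random_swap(sentence, rng=rng))
```

Each augmented variant keeps the original label, cheaply enlarging the training set.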
### Tutorial 6

In the following notebooks (in this GitHub repo) we outline our solution to the CommonLit Readability Prize.

#### Topics

- Fine-tuning Sentence Transformers models (RoBERTa family) in PyTorch
- Possible strategies for data augmentation
<!-- ROADMAP -->

## Roadmap
See the open issues for a list of proposed features (and known issues).
