LearningNLP

Some tutorials and in-depth analyses of NLP algorithms, with an ethical flavour


<!-- PROJECT SHIELDS --> <!-- *** I'm using markdown "reference style" links for readability. *** Reference links are enclosed in brackets [ ] instead of parentheses ( ). *** See the bottom of this document for the declaration of the reference variables *** for contributors-url, forks-url, etc. This is an optional, concise syntax you may use. *** https://www.markdownguide.org/basic-syntax/#reference-style-links -->

[![Contributors][contributors-shield]][contributors-url] [![Forks][forks-shield]][forks-url] [![Stargazers][stars-shield]][stars-url] [![Issues][issues-shield]][issues-url] [![MIT License][license-shield]][license-url] [![LinkedIn][linkedin-shield]][linkedin-url]

<!-- PROJECT LOGO --> <br /> <p align="center"> <a href="https://github.com/MachineLearningJournalClub/LearningNLP"> <img src="img/logos/logo_mljc.png" alt="Logo" width="120" height="120"> </a> <h1 align="center">Learning NLP</h1> <h3 align="center">Some Tutorials and in depth analysis of Natural Language Processing (NLP) techniques and applied NLP</h3> <p align="center"> <br /> <a href="https://github.com/MachineLearningJournalClub/LearningNLP"><strong>Explore the docs »</strong></a> <br /> <br /> <a href="https://github.com/MachineLearningJournalClub/LearningNLP">View Demo</a> · <a href="https://github.com/MachineLearningJournalClub/LearningNLP/issues">Report Bug</a> · <a href="https://github.com/MachineLearningJournalClub/LearningNLP/pulls">Request Feature</a> </p> </p> <!-- TABLE OF CONTENTS --> <details open="open"> <summary><h2 style="display: inline-block">Table of Contents</h2></summary> <ol> <li> <a href="#about-the-project">About The Project</a> <ul> <li><a href="#built-with">Built With</a></li> </ul> </li> <li> <a href="#getting-started">Getting Started</a> <ul> <li><a href="#prerequisites">Prerequisites</a></li> <li><a href="#tutorial-1">Tutorial 1</a></li> <li><a href="#tutorial-2">Tutorial 2</a></li> <li><a href="#tutorial-3">Tutorial 3</a></li> <li><a href="#tutorial-4">Tutorial 4</a></li> <li><a href="#tutorial-5">Tutorial 5</a></li> </ul> </li> <li><a href="#roadmap">Roadmap</a></li> <li><a href="#contributing">Contributing</a></li> <li><a href="#license">License</a></li> <li><a href="#contact">Contact</a></li> <li><a href="#acknowledgements">Acknowledgements</a></li> </ol> </details> <!-- ABOUT THE PROJECT -->

About The Project

ADD PROJECT DESCRIPTION + TWO LINES ABOUT MLJC

Built With

  • Much Love :two_hearts:
<!-- GETTING STARTED -->

Getting Started

You can either get a local copy by downloading this repo or use Google Colaboratory by copy-pasting the link of the notebook (.ipynb file) of your choice.

Prerequisites (Local Version)

Install Miniconda

Please go to the Anaconda website. Download and install the latest Miniconda version for Python 3.8 for your operating system.

wget <http:// link to miniconda>
sh <miniconda*.sh>

Download This Repo

git clone https://github.com/MachineLearningJournalClub/LearningNLP

Setup Conda Environment

TODO: eventually set up a Conda environment and export the requirements (the needed libraries).

Change directory (cd) into the LearningNLP folder, then type:

# cd LearningNLP
conda env create -f environment.yml
conda activate LNLP

Tutorial 1

Topics

  • Sentiment Analysis with Logistic Regression
  • Sentiment Analysis with Naive Bayes
  • Word Vectorizing (CountVectorizer in Scikit-learn)
  • Some Explainability Methods
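
The topics above can be sketched in a few lines of scikit-learn. This is a minimal illustration, not the notebook's code: the toy corpus, the helper name `fit_sentiment_model`, and the choice of reading logistic regression coefficients as a simple explainability method are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, hand-written corpus (hypothetical; the notebook uses its own dataset).
texts = [
    "I loved this movie, it was great",
    "what a great and wonderful film",
    "great acting and a wonderful story",
    "I hated this movie, it was awful",
    "what an awful and boring film",
    "boring acting and a terrible story",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = positive, 0 = negative


def fit_sentiment_model(classifier):
    """Bag-of-words counts (CountVectorizer) followed by the given classifier."""
    model = make_pipeline(CountVectorizer(), classifier)
    model.fit(texts, labels)
    return model


logreg = fit_sentiment_model(LogisticRegression())
nb = fit_sentiment_model(MultinomialNB())

new_texts = ["a wonderful great story", "an awful boring movie"]
logreg_pred = logreg.predict(new_texts).tolist()
nb_pred = nb.predict(new_texts).tolist()

# One simple explainability check (an assumption, not necessarily the
# notebook's method): map each vocabulary word to its learned coefficient.
vectorizer = logreg.named_steps["countvectorizer"]
classifier = logreg.named_steps["logisticregression"]
weights = {
    word: classifier.coef_[0][idx]
    for word, idx in vectorizer.vocabulary_.items()
}

print("Logistic Regression:", logreg_pred)
print("Naive Bayes:", nb_pred)
print("Most positive words:", sorted(weights, key=weights.get, reverse=True)[:3])
```

Words with large positive weights push a prediction towards the positive class, and words with large negative weights push it towards the negative class.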

Notebook


Tutorial 2

Topics

  • Bias & Fairness in NLP (Ethics and Machine Learning)
  • Gender Framing (in Political Tweets)
  • Political Party Prediction
  • Topic Modeling - Latent Dirichlet Allocation (LDA)
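
As a rough illustration of the topic modeling item above, the sketch below fits a two-topic LDA model with scikit-learn. The four example sentences, the number of topics, and the helper `top_words` are assumptions made for the example; the notebook works on its own tweet data and may use a different library or settings.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical mini-corpus standing in for the tweets used in the notebook.
docs = [
    "the senate passed the tax bill after a long vote",
    "voters want lower tax and a new budget bill",
    "the team won the match with a late goal",
    "fans cheered as the team scored another goal",
]

# LDA works on raw word counts, so CountVectorizer is the usual front end.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(docs)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)  # one row per document, rows sum to 1

index_to_word = {idx: word for word, idx in vectorizer.vocabulary_.items()}


def top_words(topic_weights, n=3):
    """Return the n words with the largest weight in one topic."""
    ranked = sorted(
        range(len(topic_weights)), key=lambda i: topic_weights[i], reverse=True
    )
    return [index_to_word[i] for i in ranked[:n]]


topics = [top_words(row) for row in lda.components_]
for number, words in enumerate(topics):
    print(f"Topic {number}: {', '.join(words)}")
```

On a corpus this small the topics are not meaningful; the point is only to show the shape of the inputs and outputs.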

Slides

We'd like to introduce some ethical concerns in ML, and especially in NLP. The idea is to start a long-term project on Bias & Fairness in Machine Learning: intrinsic problems in our data can create inequalities in the real world. (Have you watched "Coded Bias" on Netflix?)

Notebook


Tutorial 3

In the following two notebooks we focus on a Kaggle competition, the CommonLit Readability Prize.

Tutorial 3.1

Topics

  • Exploratory Data Analysis

Tutorial 3.2

You can run it directly on Kaggle.

Topics

  • Pretrained Word2Vec model, feature extraction
  • Dimensionality Reduction and visualization with UMAP
  • Naive Word2Vec Augmentation
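
Two of the ideas above, feature extraction from word vectors and a naive embedding-based augmentation, can be sketched without any external library. Everything here is an assumption for illustration: the tiny 3-dimensional `EMBEDDINGS` table stands in for a real pretrained Word2Vec model, averaging word vectors is one common way to get a text feature, and nearest-neighbor replacement is only one plausible reading of "naive augmentation". The notebook may do either step differently, and UMAP is not shown.

```python
import math

# Hypothetical 3-dimensional "embeddings". A real pretrained Word2Vec model
# has hundreds of dimensions and a much larger vocabulary.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.0],
    "bad": [-0.9, 0.1, 0.0],
    "awful": [-0.8, 0.2, 0.0],
    "book": [0.0, 0.1, 0.9],
    "text": [0.0, 0.2, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sentence_vector(tokens):
    """Average the vectors of in-vocabulary tokens (zeros if none are known)."""
    known = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    dim = len(next(iter(EMBEDDINGS.values())))
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def nearest_neighbor(word):
    """Most similar other word in the embedding table."""
    candidates = [w for w in EMBEDDINGS if w != word]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[word], EMBEDDINGS[w]))


def augment(tokens):
    """Replace every in-vocabulary token with its nearest neighbor."""
    return [nearest_neighbor(t) if t in EMBEDDINGS else t for t in tokens]


tokens = ["a", "good", "book"]
features = sentence_vector(tokens)  # fixed-size feature vector for the text
augmented = augment(tokens)         # new training example with similar meaning

print(features)
print(augmented)
```

The averaged vector can be fed to any regressor or passed to a dimensionality reduction method for visualization, while the augmented token list gives an extra training example that keeps the original label.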

Tutorial 4

Topics

Possible Ideas:


Tutorial 5

In the two following notebooks we focus on a Kaggle competition, the CommonLit Readability Prize.

Topics

  • Data Augmentation

Tutorial 6

In the following notebooks (in this GitHub repo) we outline our solution for the CommonLit Readability Prize.

Topics

  • Fine-tuning Sentence Transformers models (RoBERTa family) in PyTorch
  • Possible strategies for data augmentation

<!-- ROADMAP -->

Roadmap

See the open issues for a list of proposed features (and known issues).
