Panlingo
Collection of language detection libraries for .NET: FastText, CLD2, CLD3, MediaPipe, Lingua, Whatlang
Overview
Welcome to the Panlingo repository! 🚀
This project presents a comprehensive collection of language identification libraries for .NET. Its primary purpose is to bring popular language identification models to the .NET ecosystem, allowing developers to seamlessly integrate language detection functionality into their applications.
Libraries
| Library | NuGet Package |
| :------ | :------------ |
| Panlingo.LanguageIdentification.CLD2 | |
| Panlingo.LanguageIdentification.CLD3 | |
| Panlingo.LanguageIdentification.FastText | |
| Panlingo.LanguageIdentification.Whatlang | |
| Panlingo.LanguageIdentification.MediaPipe | |
| Panlingo.LanguageIdentification.Lingua | |
| Panlingo.LanguageCode | |
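
As a minimal sketch of installation, assuming the names in the Library column are also the NuGet package IDs, the snippet below prints one `dotnet add package` command per library so you can review them before running anything. The helper name `print_install_commands` and the chosen package subset are illustrative, not part of the project.

```shell
#!/bin/sh
# Package names copied from the Library column (assumed to be the NuGet IDs).
# Pick only the detectors you actually need.
packages="Panlingo.LanguageIdentification.CLD2 Panlingo.LanguageIdentification.FastText"

# Hypothetical helper: print one `dotnet add package` command per argument (dry run).
print_install_commands() {
    for pkg in "$@"; do
        echo "dotnet add package $pkg"
    done
}

# Intentionally unquoted so the space-separated list splits into arguments.
print_install_commands $packages
```

Run the printed commands from the directory that contains your project file once the list looks right.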
Models
| Model | Authors | License | Original source code | Wrapper docs |
| :------------ | :------------------- | :--------- | :------------------- | :------------------- |
| CLD2 | Google, Inc. | Apache-2.0 | @CLD2Owners/cld2 | link |
| CLD3 | Google, Inc. | Apache-2.0 | @google/cld3 | link |
| FastText | Meta Platforms, Inc. | MIT | @facebookresearch/fastText | link |
| Whatlang | Serhii Potapov | MIT | @greyblake/whatlang-rs | link |
| MediaPipe | Google, Inc. | Apache-2.0 | @google-ai-edge/mediapipe | link |
| Lingua | Peter M. Stahl | Apache-2.0 | @pemistahl/lingua-rs | link |
Key concerns
- Zero-dependency development.
- The original code of the C++ libraries (CLD2, CLD3, FastText, MediaPipe) is used as submodules without significant modifications or improvements (except for some minor monkey-patching 😂). Third-party code is not included in this repository.
- Preserve the original library behavior without breaking changes.
Features
| Feature | CLD2 | CLD3 | FastText\* | Whatlang | MediaPipe\*\* | Lingua |
| :----------------------------- | :-------: | :------------: | :----------------: | :------: | :------------: | :------: |
| Single language prediction | Yes | Yes | Yes | Yes | Yes | Yes |
| Multi language prediction | Yes | Yes | Yes | No | Yes | Yes |
| Supported languages | 83 | 107 | 176 or 217 | 69 | 110 | 75 |
| Unknown language detection | Yes | Yes | No | No | Yes | No |
| Algorithm | quadgrams | neural network | neural network | trigrams | neural network | trigrams |
| Script detection | No | No | Yes (only lid218e) | Yes | No | No |
| Written in | C++ | C++ | C++ | Rust | C++ | Rust |

\* When using these models: lid176, lid218e

\*\* When using MediaPipe Language Detector
Platform support
| Model | Linux (x86_64) | Linux (arm64) | Windows (x86_64) | Windows (arm64) | macOS (x86_64) | macOS (arm64) |
| :------------ | :----------------: | :----------------: | :----------------: | :----------------: | :----------------: | :----------------: |
| CLD2 | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| CLD3 | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| FastText | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| Whatlang | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: | :white_check_mark: |
| MediaPipe | :white_check_mark: | :white_check_mark: | :white_check_mark: | :x: | :x: | :white_check_mark: |
| Lingua | :white_check_mark: | :x: | :white_check_mark: | :x: | :x: | :white_check_mark: |

:white_check_mark: — Full support | :x: — No support | :construction: — Under research
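
To find which column of the platform table applies to the machine you are on, a small sketch like the following may help. It maps `uname` output to the column names used in the table; the function name `platform_column` is hypothetical, and the Windows branch assumes a Git Bash, MSYS, or Cygwin shell (inside WSL, `uname` reports Linux).

```shell
#!/bin/sh
# Hypothetical helper: map `uname -s` / `uname -m` values to the
# column headings of the platform support table.
# Usage: platform_column OS ARCH   -> prints e.g. "Linux (x86_64)"
platform_column() {
    case "$1" in
        Linux)                os="Linux" ;;
        Darwin)               os="macOS" ;;
        MINGW*|MSYS*|CYGWIN*) os="Windows" ;;
        *)                    os="unknown" ;;
    esac
    case "$2" in
        x86_64|amd64)  arch="x86_64" ;;
        aarch64|arm64) arch="arm64" ;;
        *)             arch="unknown" ;;
    esac
    echo "$os ($arch)"
}

# Report the column for the current machine.
platform_column "$(uname -s)" "$(uname -m)"
```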
Contributing
We welcome contributions from developers of all skill levels. Whether you're fixing a bug, adding a new feature, or improving documentation, we appreciate your help in making this project better.
Getting Started
To get started with contributing, follow these simple steps:
- Clone the Repository

First, clone the repository to your local machine with the following command:

    git clone --recurse-submodules --remote-submodules https://github.com/gluschenko/panlingo.git

- Create a Branch

Before you start making changes, create a new branch to keep your work organized. Use a descriptive name for your branch to make it easy to understand its purpose:

    git checkout -b feature/your-feature-name

- Make Changes

Now, you can make changes to the codebase. Please ensure your code follows our project's coding standards and includes relevant tests if applicable.

- Commit Your Changes

Once you've made your changes, commit them with a clear and informative commit message:

    git add .
    git commit -m "Add description of your changes"

- Push Your Changes

Push your branch to the remote repository:

    git push origin feature/your-feature-name

- Build

Each library project in the solution has four configurations: `ReleaseLinuxOnly`, `DebugLinuxOnly`, `Release`, and `Debug`.

- The `ReleaseLinuxOnly` and `DebugLinuxOnly` configurations are for building on a local Linux or Windows machine (WSL is supported as well). They produce native binaries only for Linux.
- The `Release` and `Debug` configurations are intended for cross-platform builds, which are only supported in CI/CD environments like GitHub Actions.

Here's how you can build the projects on a local Linux machine.
Requirements:
- Windows 10 or higher.
- WSL2 set up for simulating a Linux environment.
- Docker Desktop for container management.
- 45GB+ of free disk space for storing Docker images.
- Modern CPU with AVX support for optimal performance.
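
The requirements above can be checked roughly before starting a build. The sketch below is an illustration only: the 45 GB threshold comes from the list, while the helper names, the use of `/proc/cpuinfo` (Linux or WSL only), and the choice to measure the current filesystem are assumptions. Docker may store its images on a different volume, so treat the disk figure as a hint.

```shell
#!/bin/sh
# Threshold taken from the requirements list; everything else is illustrative.
REQUIRED_GB=45

# Succeeds when the available space ($1, in GB) meets the requirement ($2, in GB).
has_enough_space() {
    [ "$1" -ge "$2" ]
}

# Succeeds when the given cpuinfo file lists the avx flag (Linux/WSL only).
cpu_has_avx() {
    grep -qw avx "${1:-/proc/cpuinfo}" 2>/dev/null
}

# Free space on the current filesystem, in whole GB.
free_gb=$(df -Pk . | awk 'NR==2 { print int($4 / 1024 / 1024) }')

if has_enough_space "$free_gb" "$REQUIRED_GB"; then
    echo "disk: ok (${free_gb} GB free)"
else
    echo "disk: ${free_gb} GB free, ${REQUIRED_GB} GB or more recommended"
fi

if cpu_has_avx; then
    echo "cpu: AVX available"
else
    echo "cpu: AVX not detected"
fi

# The build commands in this section rely on these tools being on PATH.
for tool in docker dotnet; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: not found"
    fi
done
```

The script only reports; it never aborts, so it is safe to run on a machine that does not meet every requirement.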
To build the entire solution:
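
    cd src
    dotnet build -c ReleaseLinuxOnly

To build a specific library:

    # Native project (run from the repository root)
    cd src/LanguageIdentification.FastText.Native
    dotnet build -c ReleaseLinuxOnly

    # .NET library project (path is relative to the directory entered above)
    cd ../LanguageIdentification.FastText
    dotnet build -c ReleaseLinuxOnly
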
- Test

To execute the test project on Linux or Windows, follow these instructions:

  - Linux:

    On Linux, go to the test project's directory and run:

        cd src/LanguageIdentification.Tests
        dotnet test -c ReleaseLinuxOnly

  - Windows:

    On Windows, you can use WSL to run the test project:

        cd src/LanguageIdentification.Tests
        wsl -d Ubuntu -e bash -c "dotnet test -c ReleaseLinuxOnly"

  - Docker:

    You can also run the test project inside a Docker container on any supported platform (see `run-tests.ps1` and `run-tests.sh`):

        cd src
        docker build --file test.Dockerfile -t panlingo-test-image .
        docker container create --name panlingo-test-runner -v "${PWD}:/src" -i panlingo-test-image
