MMOCR
OpenMMLab Text Detection, Recognition and Understanding Toolbox
<a href="https://console.tiyaro.ai/explore?q=mmocr&pub=mmocr"> <img src="https://tiyaro-public-docs.s3.us-west-2.amazonaws.com/assets/try_on_tiyaro_badge.svg"></a>
📘Documentation | 🛠️Installation | 👀Model Zoo | 🆕Update News | 🤔Reporting Issues
</div> <div align="center">English | 简体中文
</div>
Latest Updates
The default branch is now main, and the code on it has been upgraded to v1.0.0. The old main branch (v0.6.3) code now lives on the 0.x branch. If you have been using the main branch and encounter upgrade issues, please read the Migration Guide and the notes on Branches.
v1.0.0 was released on 2023-04-06. Major updates from 1.0.0rc6 include:
- Support for SCUT-CTW1500, SynthText, and MJSynth datasets in Dataset Preparer
- Updated FAQ and documentation
- Deprecation of file_client_args in favor of backend_args
- A new MMOCR tutorial notebook
To learn more about the updates in MMOCR 1.0, please refer to What's New in MMOCR 1.x, or read the Changelog for more details.
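The `file_client_args` deprecation mostly affects existing config files. As a rough sketch of the key rename only, the helper below walks a config-like structure and renames every `file_client_args` key to `backend_args`. The function name `rename_file_client_args` and the sample pipeline are illustrative assumptions, not part of MMOCR. The values inside the renamed dict are left untouched, so backend names may still need manual adjustment (see the Migration Guide).

```python
from typing import Any


def rename_file_client_args(cfg: Any) -> Any:
    """Return a copy of ``cfg`` with 'file_client_args' keys renamed.

    Hypothetical helper for illustration: it only renames the key and
    does not validate or translate the values stored under it.
    """
    if isinstance(cfg, dict):
        renamed = {}
        for key, value in cfg.items():
            new_key = 'backend_args' if key == 'file_client_args' else key
            renamed[new_key] = rename_file_client_args(value)
        return renamed
    if isinstance(cfg, (list, tuple)):
        # Rebuild the same container type with converted items.
        return type(cfg)(rename_file_client_args(item) for item in cfg)
    return cfg


# Illustrative pipeline fragment written in the old style (values are assumed).
old_pipeline = [
    dict(type='LoadImageFromFile', file_client_args=dict(backend='petrel')),
    dict(type='Resize', scale=(640, 640)),
]

new_pipeline = rename_file_client_args(old_pipeline)
print(new_pipeline[0])
```

The helper returns a new structure and leaves the input unchanged, so the old and new configs can be compared before committing the change.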
Introduction
MMOCR is an open-source toolbox based on PyTorch and MMDetection for text detection, text recognition, and the corresponding downstream tasks, including key information extraction. It is part of the OpenMMLab project.
The main branch works with PyTorch 1.6+.
<div align="center"> <img src="https://user-images.githubusercontent.com/24622904/187838618-1fdc61c0-2d46-49f9-8502-976ffdf01f28.png"/> </div>
Major Features
- Comprehensive Pipeline: The toolbox supports not only text detection and text recognition, but also their downstream tasks such as key information extraction.
- Multiple Models: The toolbox supports a wide variety of state-of-the-art models for text detection, text recognition and key information extraction.
- Modular Design: The modular design of MMOCR enables users to define their own optimizers, data preprocessors, and model components such as backbones, necks and heads as well as losses. Please refer to Overview for how to construct a customized model.
- Numerous Utilities: The toolbox provides a comprehensive set of utilities which can help users assess the performance of models. It includes visualizers which allow visualization of images, ground truths as well as predicted bounding boxes, and a validation tool for evaluating checkpoints during training. It also includes data converters to demonstrate how to convert your own data to the annotation files which the toolbox supports.
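The modular design mentioned above relies on a registry pattern: components are registered under a name and later built from a config dict whose `type` key selects the class. The sketch below is a toy, standalone stand-in written for illustration. It is not MMEngine's `Registry`, and `TinyBackbone` with its `depth` parameter is hypothetical.

```python
from typing import Any, Callable, Dict


class Registry:
    """Toy name-to-class registry (illustrative, not MMEngine's implementation)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._modules: Dict[str, type] = {}

    def register_module(self) -> Callable[[type], type]:
        """Return a decorator that stores a class under its own name."""
        def decorator(cls: type) -> type:
            if cls.__name__ in self._modules:
                raise KeyError(f'{cls.__name__} is already registered in {self.name}')
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg: Dict[str, Any]) -> Any:
        """Instantiate the class named by cfg['type'] with the remaining keys."""
        args = dict(cfg)  # copy so the caller's config is not modified
        cls = self._modules[args.pop('type')]
        return cls(**args)


MODELS = Registry('model')


@MODELS.register_module()
class TinyBackbone:
    """Hypothetical custom backbone used only to demonstrate registration."""

    def __init__(self, depth: int = 18) -> None:
        self.depth = depth


# A config dict selects the component by name and passes its arguments.
backbone = MODELS.build(dict(type='TinyBackbone', depth=34))
print(type(backbone).__name__, backbone.depth)
```

In a real project the custom class would be registered with MMOCR's own registries and referenced from a config file; the Overview document describes the actual procedure.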
Installation
MMOCR depends on PyTorch, MMEngine, MMCV and MMDetection. Below are quick steps for installation. Please refer to the Install Guide for more detailed instructions.
conda create -n open-mmlab python=3.8 pytorch=1.10 cudatoolkit=11.3 torchvision -c pytorch -y
conda activate open-mmlab
pip3 install openmim
git clone https://github.com/open-mmlab/mmocr.git
cd mmocr
mim install -e .
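After running the steps above, a quick sanity check can confirm that the dependencies are importable in the active environment. The script below is a minimal sketch and not part of MMOCR. It assumes the usual top-level module names (`torch`, `mmengine`, `mmcv`, `mmdet`, `mmocr`) and only checks that each one can be located, not that the versions are compatible.

```python
import importlib.util
from typing import Iterable, List

# Assumed top-level module names for the dependencies listed above.
REQUIRED = ('torch', 'mmengine', 'mmcv', 'mmdet', 'mmocr')


def find_missing(modules: Iterable[str] = REQUIRED) -> List[str]:
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing()
if missing:
    print('Missing packages:', ', '.join(missing))
else:
    print('All required packages were found.')
```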
Get Started
Please see Quick Run for the basic usage of MMOCR.
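For orientation, a minimal end-to-end call might look like the sketch below. It assumes MMOCR 1.x is installed and uses the `MMOCRInferencer` class from `mmocr.apis`. The wrapper function `run_ocr`, its defaults, and the image path are illustrative choices, and pretrained weights may be downloaded on first use. Quick Run remains the reference for the supported arguments and the output format.

```python
def run_ocr(image_path: str, det: str = 'DBNet', rec: str = 'CRNN'):
    """Run text detection and recognition on a single image.

    Illustrative wrapper: the import is deferred so that this file can
    be loaded even when MMOCR is not installed.
    """
    from mmocr.apis import MMOCRInferencer

    # Build an inferencer from a detection model and a recognition model.
    ocr = MMOCRInferencer(det=det, rec=rec)
    return ocr(image_path)


# Example call (the path is an assumption; replace it with your own image):
# result = run_ocr('demo/demo_text_ocr.jpg')
```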
Model Zoo
Supported algorithms:
<details open> <summary>BackBone</summary>

- [x] oCLIP (ECCV'2022)

</details>
<details open> <summary>Text Detection</summary>

- [x] DBNet (AAAI'2020) / DBNet++ (TPAMI'2022)
- [x] Mask R-CNN (ICCV'2017)
- [x] PANet (ICCV'2019)
- [x] PSENet (CVPR'2019)
- [x] TextSnake (ECCV'2018)
- [x] DRRG (CVPR'2020)
- [x] FCENet (CVPR'2021)

</details>
<details open> <summary>Text Recognition</summary>

- [x] ABINet (CVPR'2021)
- [x] ASTER (TPAMI'2018)
- [x] CRNN (TPAMI'2016)
- [x] MASTER (PR'2021)
- [x] NRTR (ICDAR'2019)
- [x] RobustScanner (ECCV'2020)
- [x] SAR (AAAI'2019)
- [x] SATRN (CVPR'2020 Workshop on Text and Documents in the Deep Learning Era)
- [x] SVTR (IJCAI'2022)

</details>
<details open> <summary>Key Information Extraction</summary>

- [x] SDMG-R (ArXiv'2021)

</details>
Please refer to model_zoo for more details.
Projects
Here are some implementations of SOTA models and solutions built on MMOCR, which are supported and maintained by community users. These projects demonstrate best practices for research and product development based on MMOCR. We welcome and appreciate all contributions to the OpenMMLab ecosystem.
Contributing
We appreciate all contributions to improve MMOCR. Please refer to CONTRIBUTING.md for the contributing guidelines.
Acknowledgement
MMOCR is an open-source project that is contributed by researchers and engineers from various colleges and companies. We appreciate all the contributors who implement their methods or add new features, as well as users who give valuable fe
