AliceMind
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
This repository provides pre-trained encoder-decoder models and related optimization techniques developed by Alibaba's MinD (Machine IntelligeNce of Damo) Lab.
The family of AliceMind:
- Pre-trained Models:
  - mPLUG-Owl2 (CVPR 2024): the first multimodal large language model for enhancing LLM and MLLM through modal collaboration
  - mPLUG-DocOwl (EMNLP 2023): the first OCR-free multimodal large language model for universal document understanding
  - Youku-mPLUG: the first and largest public Chinese video-language pre-training dataset and benchmarks, together with the Chinese video large language model mPLUG-video
  - mPLUG-Owl: a new training paradigm with a modularized design for large multi-modal language models
  - ChatPLUG: a large-scale Chinese open-domain dialogue system for digital humans
  - mPLUG-2 (ICML 2023): a modularized multi-modal foundation model across text, image, and video
  - mPLUG (EMNLP 2022): a large-scale vision-language understanding and generation model
  - PLUG: a large-scale Chinese understanding and generation model
  - SDCUP (Under Review): a pre-trained table model
  - LatticeBERT (NAACL 2021): a Chinese language understanding model with multi-granularity inputs
  - StructuralLM (ACL 2021): a structural language model
  - StructVBERT (CVPR 2020 VQA Challenge Runner-up): a cross-modal language model
  - VECO (ACL 2021): a cross-lingual language model
  - PALM (EMNLP 2020): a generative language model
  - StructBERT (ICLR 2020): a language understanding model
- Fine-tuning Methods:
  - PST (IJCAI 2022): a parameter-efficient sparsity method
  - ChildTuning (EMNLP 2021): an effective and generalizable fine-tuning method
- Model Compression:
  - ContrastivePruning (AAAI 2022): a language model compression method
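To make the fine-tuning methods above more concrete, the snippet below mimics the task-free variant of ChildTuning (ChildTuning-F), which at each step updates only a randomly drawn "child network" of parameters and rescales the kept gradients to keep the update unbiased. This is an illustrative pure-Python sketch; the function name and plain-list parameters are not the released implementation.

```python
import random

def child_tuning_step(params, grads, lr=0.1, p_keep=0.3, rng=random):
    """One ChildTuning-F-style SGD update (illustrative sketch, not the
    paper's code): keep each gradient coordinate with probability p_keep
    and rescale by 1/p_keep so the masked update stays unbiased."""
    new_params = []
    for w, g in zip(params, grads):
        if rng.random() < p_keep:  # coordinate belongs to the "child network"
            w = w - lr * g / p_keep
        new_params.append(w)
    return new_params
```

In the actual method, the task-driven variant (ChildTuning-D) replaces this random Bernoulli mask with one derived from Fisher information on the downstream task, so the most task-relevant parameters form the child network.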
News
- November 9, 2023: mPLUG-Owl2, the first multimodal large language model for enhancing LLM and MLLM through modal collaboration, was accepted by CVPR 2024.
- July 7, 2023: mPLUG-DocOwl, the first OCR-free multimodal large language model for universal document understanding, was accepted by EMNLP 2023.
- June 8, 2023: Youku-mPLUG, the first and largest public Chinese video-language pre-training dataset and benchmarks, released, together with the Chinese video large language model mPLUG-video.
- April 27, 2023: mPLUG-Owl, a new training paradigm with a modularized design for large multi-modal language models released.
- April 25, 2023: mPLUG-2 was accepted by ICML 2023.
- April 16, 2023: ChatPLUG, the Chinese open-domain dialogue system for digital human applications released.
- October, 2022: mPLUG was accepted by EMNLP 2022.
- May, 2022: PST was accepted by IJCAI 2022.
- April, 2022: The SOFA modeling toolkit released, providing standard code for AliceMind models and techniques and supporting their direct use in Transformers!
- December, 2021: ContrastivePruning was accepted by AAAI 2022.
- October, 2021: ChildTuning was accepted by EMNLP 2021.
- September, 2021: The first Chinese pre-training table model SDCUP released!
- May, 2021: VECO and StructuralLM were accepted by ACL 2021.
- March, 2021: AliceMind released!
Pre-trained Models
- mPLUG-Owl (April 27, 2023): a new training paradigm with a modularized design for large multi-modal language models. It learns visual knowledge while supporting multi-turn conversations consisting of different modalities, and exhibits abilities such as multi-image correlation, scene text understanding, and vision-based document comprehension. A visually-related instruction evaluation set, OwlEval, is also released. mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
- ChatPLUG (April 16, 2023): a Chinese open-domain dialogue system for digital human applications, instruction-finetuned on a wide range of dialogue tasks in a unified internet-augmented format. Unlike other open-domain dialogue models that focus on large-scale pre-training and scaling up model size or dialogue corpus, we aim to build a powerful and practical dialogue system for digital humans with diverse skills and good multi-task generalization through internet-augmented instruction tuning. ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human
- mPLUG (September 1, 2022): a large-scale pre-trained model for vision-language understanding and generation. mPLUG is pre-trained end-to-end on large-scale image-text pairs with both discriminative and generative objectives. It achieves state-of-the-art results on a wide range of vision-language downstream tasks, including image captioning, image-text retrieval, visual grounding, and visual question answering. "mPLUG: Effective Multi-Modal Learning by Cross-Modal Skip Connections" (EMNLP 2022)
- PLUG (September 1, 2022): a large-scale Chinese pre-trained model for language understanding and generation. PLUG (27B) is trained in two stages: the first stage is a 24-layer StructBERT encoder, and the second stage is a 24-6-layer PALM encoder-decoder.
- SDCUP (September 6, 2021): pre-trained models for table understanding. We design a schema dependency pre-training objective to impose the desired inductive bias into the learned representations for table pre-training. We further propose a schema-aware curriculum learning approach to alleviate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner. Experiment results on SQUALL and Spider demonstrate the effectiveness of our pre-training objective and curriculum in comparison to a variety of baselines. "SDCUP: Schema Dependency Enhanced Curriculum Pre-Training for Table Semantic Parsing" (Under Review)
- LatticeBERT (March 15, 2021): a novel pre-training paradigm for Chinese, Lattice-BERT, which explicitly incorporates word representations alongside character representations and can thus model a sentence at multiple granularities. "Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models" (NAACL 2021)
- StructuralLM (March 15, 2021): pre-trained models for document-image understanding. We propose a new pre-training approach, StructuralLM, to jointly leverage cell and layout information from scanned documents. The pre-trained StructuralLM achieves new state-of-the-art results on different types of downstream tasks. "StructuralLM: Structural Pre-training for Form Understanding" (ACL 2021)
- StructVBERT (March 15, 2021): pre-trained models for vision-language understanding. We propose a new single-stream visual-linguistic pre-training scheme that leverages multi-stage progressive pre-training and multi-task learning. StructVBERT won the 2020 VQA Challenge Runner-up award and achieved the SOTA result on the VQA 2020 public Test-standard benchmark (June 2020). "Talk Slides" (CVPR 2020 VQA Challenge Runner-up)
- VECO v0 (March 15, 2021): pre-trained models for cross-lingual natural language understanding (x-NLU) and generation (x-NLG). VECO (v0) achieves new SOTA results on various cross-lingual understanding tasks of the XTREME benchmark, covering text classification, sequence labeling, question answering, and sentence retrieval. On cross-lingual generation tasks, it also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on the WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1-2 BLEU. "VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation" (ACL 2021)
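To make LatticeBERT's multi-granularity input concrete, the sketch below builds the kind of word-character lattice such a model consumes: every character plus every lexicon word found in the sentence, each tagged with its character span. The helper name and toy lexicon are illustrative only, not part of the released code.

```python
def build_lattice(sentence, lexicon, max_word_len=4):
    """Collect all lattice tokens: every character of the sentence plus
    every lexicon word occurring in it, each as (token, start, end) with
    the end index exclusive. A toy sketch of LatticeBERT-style input."""
    tokens = [(ch, i, i + 1) for i, ch in enumerate(sentence)]
    for start in range(len(sentence)):
        for end in range(start + 2, min(start + max_word_len, len(sentence)) + 1):
            word = sentence[start:end]
            if word in lexicon:
                tokens.append((word, start, end))
    return tokens

# Toy lexicon; a real system would use a large word vocabulary.
lexicon = {"研究", "研究生", "生命", "起源"}
lattice = build_lattice("研究生命起源", lexicon)
```

Because overlapping words such as 研究 and 研究生 both enter the lattice with their spans, the model can attend to competing segmentations instead of committing to a single word segmentation up front.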
