AliceMind
ALIbaba's Collection of Encoder-decoders from MinD (Machine IntelligeNce of Damo) Lab
This repository provides pre-trained encoder-decoder models and related optimization techniques developed by Alibaba's MinD (Machine IntelligeNce of Damo) Lab.
The family of AliceMind:
- Pre-trained Models:
  - mPLUG-Owl2 (CVPR 2024): the first multimodal large language model for enhancing LLM and MLLM through modal collaboration
  - mPLUG-DocOwl (EMNLP 2023): the first OCR-free multimodal large language model for universal document understanding
  - Youku-mPLUG: the first and largest public Chinese video-language pre-training dataset and benchmarks, together with the Chinese video large language model mPLUG-video
  - mPLUG-Owl: a new training paradigm with a modularized design for large multi-modal language models
  - ChatPLUG: a large-scale Chinese open-domain dialogue system for digital humans
  - mPLUG-2 (ICML 2023): a modularized multi-modal foundation model across text, image, and video
  - mPLUG (EMNLP 2022): a large-scale vision-language understanding and generation model
  - PLUG: a large-scale Chinese understanding and generation model
  - SDCUP (Under Review): a pre-trained table model
  - LatticeBERT (NAACL 2021): a Chinese language understanding model with multi-granularity inputs
  - StructuralLM (ACL 2021): a structural language model
  - StructVBERT (CVPR 2020 VQA Challenge Runner-up): a cross-modal language model
  - VECO (ACL 2021): a cross-lingual language model
  - PALM (EMNLP 2020): a generative language model
  - StructBERT (ICLR 2020): a language understanding model
- Fine-tuning Methods:
  - PST (IJCAI 2022): a parameter-efficient sparsity method
  - ChildTuning (EMNLP 2021): an effective and generalizable fine-tuning method
- Model Compression:
  - ContrastivePruning (AAAI 2022): a language model compression method
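To make the fine-tuning methods above more concrete, the snippet below mimics the task-free variant of ChildTuning (ChildTuning-F), which at each step updates only a randomly drawn "child network" of parameters and rescales the kept gradients to keep the update unbiased. This is an illustrative pure-Python sketch; the function name and plain-list parameters are not the released implementation.

```python
import random

def child_tuning_step(params, grads, lr=0.1, p_keep=0.3, rng=random):
    """One ChildTuning-F-style SGD update (illustrative sketch, not the
    paper's code): keep each gradient coordinate with probability p_keep
    and rescale by 1/p_keep so the masked update stays unbiased."""
    new_params = []
    for w, g in zip(params, grads):
        if rng.random() < p_keep:  # coordinate belongs to the "child network"
            w = w - lr * g / p_keep
        new_params.append(w)
    return new_params
```

In the actual method, the task-driven variant (ChildTuning-D) replaces this random Bernoulli mask with one derived from Fisher information on the downstream task, so the most task-relevant parameters form the child network.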
News
- November 9, 2023: mPLUG-Owl2, the first multimodal large language model for enhancing LLM and MLLM through modal collaboration, was accepted by CVPR 2024.
- July 7, 2023: mPLUG-DocOwl, the first OCR-free multimodal large language model for universal document understanding, was accepted by EMNLP 2023.
- June 8, 2023: Youku-mPLUG, the first and largest public Chinese video-language pre-training dataset and benchmarks, released, together with the Chinese video large language model mPLUG-video.
- April 27, 2023: mPLUG-Owl, a new training paradigm with a modularized design for large multi-modal language models released.
- April 25, 2023: mPLUG-2 was accepted by ICML 2023.
- April 16, 2023: ChatPLUG, the Chinese open-domain dialogue system for digital human applications released.
- October, 2022: mPLUG was accepted by EMNLP 2022.
- May, 2022: PST was accepted by IJCAI 2022.
- April, 2022: The SOFA modeling toolkit released, providing standard code for AliceMind models and techniques and supporting their direct use in Transformers!
- December, 2021: ContrastivePruning was accepted by AAAI 2022.
- October, 2021: ChildTuning was accepted by EMNLP 2021.
- September, 2021: The first Chinese pre-training table model SDCUP released!
- May, 2021: VECO and StructuralLM were accepted by ACL 2021.
- March, 2021: AliceMind released!
Pre-trained Models
- mPLUG-Owl (April 27, 2023): a new training paradigm with a modularized design for large multi-modal language models. It learns visual knowledge while supporting multi-turn conversations consisting of different modalities, and exhibits abilities such as multi-image correlation, scene text understanding, and vision-based document comprehension. A visually-related instruction evaluation set, OwlEval, is also released. mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
- ChatPLUG (April 16, 2023): a Chinese open-domain dialogue system for digital human applications, instruction-finetuned on a wide range of dialogue tasks in a unified internet-augmented format. Unlike other open-domain dialogue models that focus on large-scale pre-training and scaling up model size or dialogue corpus, we aim to build a powerful and practical dialogue system for digital humans with diverse skills and good multi-task generalization through internet-augmented instruction tuning. ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human
- mPLUG (September 1, 2022): a large-scale pre-trained model for vision-language understanding and generation. mPLUG is pre-trained end-to-end on large-scale image-text pairs with both discriminative and generative objectives. It achieves state-of-the-art results on a wide range of vision-language downstream tasks, including image captioning, image-text retrieval, visual grounding, and visual question answering. "mPLUG: Effective Multi-Modal Learning by Cross-Modal Skip Connections" (EMNLP 2022)
- PLUG (September 1, 2022): a large-scale Chinese pre-trained model for language understanding and generation. PLUG (27B) is trained in two stages: the first stage is a 24-layer StructBERT encoder, and the second stage is a 24-6-layer PALM encoder-decoder.
- SDCUP (September 6, 2021): pre-trained models for table understanding. We design a schema dependency pre-training objective to impose the desired inductive bias into the learned representations for table pre-training. We further propose a schema-aware curriculum learning approach to alleviate the impact of noise and learn effectively from the pre-training data in an easy-to-hard manner. Experiment results on SQUALL and Spider demonstrate the effectiveness of our pre-training objective and curriculum in comparison to a variety of baselines. "SDCUP: Schema Dependency Enhanced Curriculum Pre-Training for Table Semantic Parsing" (Under Review)
- LatticeBERT (March 15, 2021): a novel pre-training paradigm for Chinese, Lattice-BERT, which explicitly incorporates word representations alongside character representations and can thus model a sentence at multiple granularities. "Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models" (NAACL 2021)
- StructuralLM (March 15, 2021): pre-trained models for document-image understanding. We propose a new pre-training approach, StructuralLM, to jointly leverage cell and layout information from scanned documents. The pre-trained StructuralLM achieves new state-of-the-art results on different types of downstream tasks. "StructuralLM: Structural Pre-training for Form Understanding" (ACL 2021)
- StructVBERT (March 15, 2021): pre-trained models for vision-language understanding. We propose a new single-stream visual-linguistic pre-training scheme that leverages multi-stage progressive pre-training and multi-task learning. StructVBERT won the 2020 VQA Challenge Runner-up award and achieved the SOTA result on the VQA 2020 public Test-standard benchmark (June 2020). "Talk Slides" (CVPR 2020 VQA Challenge Runner-up)
- VECO v0 (March 15, 2021): pre-trained models for cross-lingual natural language understanding (x-NLU) and generation (x-NLG). VECO (v0) achieves new SOTA results on various cross-lingual understanding tasks of the XTREME benchmark, covering text classification, sequence labeling, question answering, and sentence retrieval. On cross-lingual generation tasks, it also outperforms all existing cross-lingual models and state-of-the-art Transformer variants on the WMT14 English-to-German and English-to-French translation datasets, with gains of up to 1-2 BLEU. "VECO: Variable Encoder-decoder Pre-training for Cross-lingual Understanding and Generation" (ACL 2021)
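To make LatticeBERT's multi-granularity input concrete, the sketch below builds the kind of word-character lattice such a model consumes: every character plus every lexicon word found in the sentence, each tagged with its character span. The helper name and toy lexicon are illustrative only, not part of the released code.

```python
def build_lattice(sentence, lexicon, max_word_len=4):
    """Collect all lattice tokens: every character of the sentence plus
    every lexicon word occurring in it, each as (token, start, end) with
    the end index exclusive. A toy sketch of LatticeBERT-style input."""
    tokens = [(ch, i, i + 1) for i, ch in enumerate(sentence)]
    for start in range(len(sentence)):
        for end in range(start + 2, min(start + max_word_len, len(sentence)) + 1):
            word = sentence[start:end]
            if word in lexicon:
                tokens.append((word, start, end))
    return tokens

# Toy lexicon; a real system would use a large word vocabulary.
lexicon = {"研究", "研究生", "生命", "起源"}
lattice = build_lattice("研究生命起源", lexicon)
```

Because overlapping words such as 研究 and 研究生 both enter the lattice with their spans, the model can attend to competing segmentations instead of committing to a single word segmentation up front.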
