StableLM: Stability AI Language Models

*Image: “A Stochastic Parrot, flat design, vector art” — Stable Diffusion XL*

This repository contains Stability AI's ongoing development of the StableLM series of language models and will be continuously updated with new checkpoints. The following provides an overview of all currently available models. More coming soon.

News

September 29, 2023

  • Released StableLM-3B-4E1T model under CC BY-SA-4.0.

August 5, 2023

  • Released patched StableLM-Alpha v2 models with 3B and 7B parameters.

April 28, 2023

  • Released StableVicuna-13B, our RLHF fine-tune of Vicuna-13B v0, which itself is a fine-tune of LLaMA-13B. Delta weights over the original LLaMA model are released under CC BY-NC-SA-4.0.

April 20, 2023

  • Released initial set of StableLM-Alpha models, with 3B and 7B parameters. Base models are released under CC BY-SA-4.0.

  • Try to chat with our 7B model, StableLM-Tuned-Alpha-7B, on Hugging Face Spaces.

Models

StableLM-3B-4E1T

Technical Report: StableLM-3B-4E1T

StableLM-3B-4E1T is a 3 billion (3B) parameter language model pre-trained under the multi-epoch regime to study the impact of repeated tokens on downstream performance. Given prior success in this area (Tay et al., 2023 and Taylor et al., 2022), we train on 1 trillion (1T) tokens for 4 epochs following the observations of Muennighoff et al. (2023) in "Scaling Data-Constrained Language Models" in which they find "training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data." Further inspiration for the token count is taken from "Go smol or go home" (De Vries, 2023), which suggests a 2.96B model trained for 2.85 trillion tokens achieves a similar loss to a Chinchilla compute-optimal 9.87B language model ($k_n = 0.3$).
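To make the token accounting and the De Vries comparison concrete (all numbers taken from the paragraph above):

```latex
% Token budget: 1T unique tokens, repeated for 4 epochs
N_{\text{seen}} = 1\,\mathrm{T} \times 4\ \text{epochs} = 4\,\mathrm{T}\ \text{tokens}

% De Vries's k_n is the down-scaling factor relative to the Chinchilla
% compute-optimal model that reaches a similar loss:
k_n = \frac{2.96\,\mathrm{B}}{9.87\,\mathrm{B}} \approx 0.30
```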

| Size | StableLM-3B-4E1T | Training Tokens | Parameters    |
|------|------------------|-----------------|---------------|
| 3B   | checkpoint       | 4T              | 2,795,443,200 |

Model Architecture

The model is a decoder-only transformer similar to the LLaMA (Touvron et al., 2023) architecture, with the following configuration:

| Parameters    | Hidden Size | Layers | Heads | Sequence Length |
|---------------|-------------|--------|-------|-----------------|
| 2,795,443,200 | 2560        | 32     | 32    | 4096            |
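As a sanity check, the headline parameter count can be reproduced from this table. The sketch below assumes a LLaMA-style block (bias-free attention and SwiGLU MLP), LayerNorm with learned weight and bias, an untied output head, a SwiGLU intermediate size of 6912, and a vocabulary padded to 50,304; the intermediate and vocabulary sizes are assumptions not stated in this README.

```python
# Hedged sketch: reconstruct StableLM-3B-4E1T's parameter count from the
# architecture table. d_ff=6912 and vocab=50304 are assumptions, not
# values given in this README.
d, layers, vocab, d_ff = 2560, 32, 50_304, 6912

attn = 4 * d * d        # bias-free Q, K, V and output projections
mlp = 3 * d * d_ff      # SwiGLU: gate, up, and down projections
norms = 2 * 2 * d       # two LayerNorms per block, weight + bias each
per_layer = attn + mlp + norms

embeddings = vocab * d  # input token embeddings
lm_head = vocab * d     # untied output projection
final_norm = 2 * d      # final LayerNorm, weight + bias

total = layers * per_layer + embeddings + lm_head + final_norm
print(f"{total:,}")     # -> 2,795,443,200, matching the table
```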

Training Data

The dataset comprises a filtered mixture of open-source large-scale datasets available on the HuggingFace Hub: a Falcon RefinedWeb extract (Penedo et al., 2023), RedPajama-Data (Together Computer, 2023) and The Pile (Gao et al., 2020), both with the Books3 and certain other subsets removed, and StarCoder (Li et al., 2023).

Given the large amount of web data, we recommend fine-tuning the base StableLM-3B-4E1T for your downstream tasks.
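For reference, a minimal generation sketch with Hugging Face `transformers` (the model id is taken from the results table below; depending on your `transformers` version you may also need `trust_remote_code=True`):

```python
# Minimal usage sketch (an illustration, not part of this repo): load the
# base checkpoint and sample a continuation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablelm-3b-4e1t")
model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t",
    torch_dtype=torch.bfloat16,  # bf16 keeps the 3B model within ~6 GB of VRAM
    device_map="auto",
)

inputs = tokenizer("The weather is always wonderful", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```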

Training Details

Please refer to the provided YAML configuration file `stablelm-3b-4e1t.yml` for complete hyperparameter settings, and to the technical report for further details.

Downstream Results

The following zero-shot evaluations are performed with the lm-evaluation-harness, using the `lm-bench` branch of Stability AI's fork. Full `lm-eval` JSONs can be found in the `evals` directory.

| Pre-Trained Model | Average | ARC<br>Challenge | ARC<br>Easy | BoolQ | HellaSwag (✱) | LAMBADA<br>OpenAI | OpenBookQA | PIQA | SciQ | Winogrande |
| ----------------- |:-------:|:----------------:|:-----------:|:-----:|:-------------:|:-----------------:|:----------:|:-----:|:-----:|:----------:|
| meta-llama/Llama-2-13b-hf | 71.77 | 48.63 | 79.50 | 80.52 | 79.36 | 76.77 | 35.40 | 79.05 | 94.50 | 72.22 |
| huggyllama/llama-7b | 68.84 | 41.89 | 75.25 | 75.05 | 76.22 | 73.55 | 34.40 | 78.67 | 94.60 | 69.93 |
| meta-llama/Llama-2-7b-hf | 68.75 | 43.00 | 76.26 | 77.74 | 75.94 | 73.47 | 31.40 | 77.75 | 93.60 | 69.61 |
| Qwen/Qwen-7B | 67.91 | 45.39 | 67.38 | 74.56 | 88.85 (?) | 69.67 | 32.20 | 73.99 | 93.20 | 65.98 |
| tiiuae/falcon-7b | 67.83 | 40.27 | 74.41 | 73.55 | 76.35 | 74.56 | 30.60 | 79.49 | 94.00 | 67.25 |
| mosaicml/mpt-7b | 67.36 | 40.53 | 74.92 | 73.94 | 76.17 | 68.64 | 31.40 | 78.89 | 93.70 | 68.03 |
| stabilityai/stablelm-3b-4e1t | 66.93 | 37.80 | 72.47 | 75.63 | 73.90 | 70.64 | 31.40 | 79.22 | 94.80 | 66.54 |
| baichuan-inc/Baichuan2-7B-Base | 66.93 | 42.24 | 75.00 | 73.09 | 72.29 | 70.99 | 30.40 | 76.17 | 94.60 | 67.56 |
| stabilityai/stablelm-base-alpha-7b-v2 | 66.89 | 38.48 | 73.19 | 70.31 | 74.27 | 74.19 | 30.40 | 78.45 | 93.90 | 68.82 |
| openlm-research/open_llama_7b_v2 | 66.32 | 38.82 | 71.93 | 71.41 | 74.65 | 71.05 | 30.20 | 79.16 | 93.80 | 65.82 |
| microsoft/phi-1_5 | 65.57 | 44.45 | 76.14 | 74.53 | 62.62 | 52.75 | 37.60 | 76.33 | 93.20 | 72.53 |
| EleutherAI/gpt-neox-20B | 65.57 | 37.88 | 72.90 | 69.48 | 71.43 | 71.98 | 29.80 | 77.42 | 93.10 | 66.14 |
| togethercomputer/RedPajama-INCITE-7B-Base | 65.07 | 37.71 | 72.35 | 70.76 | 70.33 | 71.34 | 29.00 | 77.15 | 92.70 | 64.33 |
| cerebras/btlm-3b-8k-base (§) | 63.59 | 34.90 | 70.45 | 69.63 | 69.78 | 66.23 | 27.60 | 75.84 | 92.90 | 64.96 |
| EleutherAI/pythia-12b | 62.69 | 31.83 | 70.20 | 67.31 | 67.38 | 70.64 | 26.40 | 76.28 | 90.20 | 64.01 |
| openlm-research/open_llama_3b_v2 | 62.43 | 33.87 | 67.59 | 65.69 | 69.99 | 66.74 | 26.00 | 76.66 | 92.40 | 62.90 |
| EleutherAI/gpt-j-6B | 62.34 | 33.96 | 66.96 | 65.44 | 66.24 | 68.23 | 29.00 | 75.57 | 91.50 | 64.17 |
| stabilityai/stablelm-base-alpha-3b-v2 | 62.19 | 32.42 | 67.26 | 64.56 | 68.58 | 70.25 | 26.40 | 76.01 | 92.10 | 62.12 |
| facebook/opt-6.7b | 61.85 | 30.72 | 65.66 | 66.02 | 67.20 | 67.65 | 27.60 | 76.33 | 90 | |
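As a rough illustration of how a single row above is produced, here is a sketch using the harness's Python API. Entry points and model-type names differ across harness versions and branches, so this is not necessarily the exact `lm-bench` invocation behind the table.

```python
# Hedged sketch: zero-shot evaluation with EleutherAI's lm-evaluation-harness
# Python API (interface varies by harness version; illustrative only).
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=stabilityai/stablelm-3b-4e1t,dtype=bfloat16",
    tasks=["arc_challenge", "arc_easy", "boolq", "hellaswag",
           "lambada_openai", "openbookqa", "piqa", "sciq", "winogrande"],
    num_fewshot=0,
)
print(results["results"])  # per-task metrics, as serialized in the evals/ JSONs
```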
