CTransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.
Also see ChatDocs
Supported Models
| Models | Model Type | CUDA | Metal |
| :------------------ | ------------- | :--: | :---: |
| GPT-2 | gpt2 | | |
| GPT-J, GPT4All-J | gptj | | |
| GPT-NeoX, StableLM | gpt_neox | | |
| Falcon | falcon | ✅ | |
| LLaMA, LLaMA 2 | llama | ✅ | ✅ |
| MPT | mpt | ✅ | |
| StarCoder, StarChat | gpt_bigcode | ✅ | |
| Dolly V2 | dolly-v2 | | |
| Replit | replit | | |
Installation
```sh
pip install ctransformers
```
Usage
It provides a unified interface for all models:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2")

print(llm("AI is going to"))
```
To stream the output, set `stream=True`:

```python
for text in llm("AI is going to", stream=True):
    print(text, end="", flush=True)
```
You can load models from Hugging Face Hub directly:

```python
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml")
```
If a model repo has multiple model files (`.bin` or `.gguf` files), specify a model file using:

```python
llm = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", model_file="ggml-model.bin")
```
<a id="transformers"></a>
🤗 Transformers
Note: This is an experimental feature and may change in the future.
To use it with 🤗 Transformers, create the model and tokenizer using:

```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)
tokenizer = AutoTokenizer.from_pretrained(model)
```
You can use the 🤗 Transformers text generation pipeline:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("AI is going to", max_new_tokens=256))
```
You can use 🤗 Transformers generation parameters:

```python
pipe("AI is going to", max_new_tokens=256, do_sample=True, temperature=0.8, repetition_penalty=1.1)
```
You can use 🤗 Transformers tokenizers:

```python
from ctransformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("marella/gpt-2-ggml", hf=True)  # Load model from GGML model repo.
tokenizer = AutoTokenizer.from_pretrained("gpt2")  # Load tokenizer from original model repo.
```
LangChain
It is integrated into LangChain. See LangChain docs.
GPU
To run some of the model layers on GPU, set the `gpu_layers` parameter:

```python
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGML", gpu_layers=50)
```
CUDA
Install CUDA libraries using:

```sh
pip install ctransformers[cuda]
```
ROCm
To enable ROCm support, install the `ctransformers` package using:

```sh
CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
```
Metal
To enable Metal support, install the `ctransformers` package using:

```sh
CT_METAL=1 pip install ctransformers --no-binary ctransformers
```
GPTQ
Note: This is an experimental feature and only LLaMA models are supported using ExLlama.
Install additional dependencies using:

```sh
pip install ctransformers[gptq]
```
Load a GPTQ model using:

```python
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GPTQ")
```

If the model name or path doesn't contain the word `gptq`, specify `model_type="gptq"`.
It can also be used with LangChain. Low-level APIs are not fully supported.
Documentation
<!-- API_DOCS -->
Config
| Parameter | Type | Description | Default |
| :------------------- | :---------- | :-------------------------------------------------------------- | :------ |
| top_k | int | The top-k value to use for sampling. | 40 |
| top_p | float | The top-p value to use for sampling. | 0.95 |
| temperature | float | The temperature to use for sampling. | 0.8 |
| repetition_penalty | float | The repetition penalty to use for sampling. | 1.1 |
| last_n_tokens | int | The number of last tokens to use for repetition penalty. | 64 |
| seed | int | The seed value to use for sampling tokens. | -1 |
| max_new_tokens | int | The maximum number of new tokens to generate. | 256 |
| stop | List[str] | A list of sequences to stop generation when encountered. | None |
| stream | bool | Whether to stream the generated text. | False |
| reset | bool | Whether to reset the model state before generating text. | True |
| batch_size | int | The batch size to use for evaluating tokens in a single prompt. | 8 |
| threads | int | The number of threads to use for evaluating tokens. | -1 |
| context_length | int | The maximum context length to use. | -1 |
| gpu_layers | int | The number of layers to run on GPU. | 0 |
Note: Currently only LLaMA, MPT and Falcon models support the `context_length` parameter.
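To make the sampling parameters above concrete, here is an illustrative sketch in plain Python of how `top_k`, `top_p`, `temperature`, `repetition_penalty` and `seed` commonly interact when picking the next token. This is a standard sampling recipe under stated assumptions, not the library's actual C/C++ implementation:

```python
import math
import random

def sample_token(logits, last_tokens, *, top_k=40, top_p=0.95,
                 temperature=0.8, repetition_penalty=1.1, seed=-1):
    """Toy next-token sampler mirroring the Config parameters (illustrative only)."""
    logits = list(logits)
    # Penalize tokens seen in the recent window (cf. last_n_tokens).
    for t in set(last_tokens):
        logits[t] = (logits[t] / repetition_penalty if logits[t] > 0
                     else logits[t] * repetition_penalty)
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # top_k: keep only the k most likely tokens.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # top_p (nucleus): keep the smallest prefix whose mass reaches top_p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # seed=-1 means a fresh random source, matching the table's default.
    rng = random.Random(None if seed == -1 else seed)
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

For example, with `top_k=1` the sampler becomes greedy and always returns the highest-logit token, while a large `repetition_penalty` pushes recently generated tokens out of the candidate set.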
<kbd>class</kbd> AutoModelForCausalLM
<kbd>classmethod</kbd> AutoModelForCausalLM.from_pretrained
```python
from_pretrained(
    model_path_or_repo_id: str,
    model_type: Optional[str] = None,
    model_file: Optional[str] = None,
    config: Optional[ctransformers.hub.AutoConfig] = None,
    lib: Optional[str] = None,
    local_files_only: bool = False,
    revision: Optional[str] = None,
    hf: bool = False,
    **kwargs
) → LLM
```
Loads the language model from a local file or remote repo.
Args:
- <b>model_path_or_repo_id</b>: The path to a model file or directory or the name of a Hugging Face Hub model repo.
- <b>model_type</b>: The model type.
- <b>model_file</b>: The name of the model file in repo or directory.
- <b>config</b>: `AutoConfig` object.
- <b>lib</b>: The path to a shared library or one of `avx2`, `avx`, `basic`.
- <b>local_files_only</b>: Whether or not to only look at local files (i.e., do not try to download the model).
- <b>revision</b>: The specific model version to use. It can be a branch name, a tag name, or a commit id.
- <b>hf</b>: Whether to create a Hugging Face Transformers model.
Returns:
LLM object.
<kbd>class</kbd> LLM
<kbd>method</kbd> LLM.__init__
```python
__init__(
    model_path: str,
    model_type: Optional[str] = None,
    config: Optional[ctransformers.llm.Config] = None,
    lib: Optional[str] = None
)
```
Loads the language model from a local file.
Args:
- <b>model_path</b>: The path to a model file.
- <b>model_type</b>: The model type.
- <b>config</b>: `Config` object.
- <b>lib</b>: The path to a shared library or one of `avx2`, `avx`, `basic`.
<kbd>property</kbd> LLM.bos_token_id
The beginning-of-sequence token.
<kbd>property</kbd> LLM.config
The config object.
<kbd>property</kbd> LLM.context_length
The context length of model.
<kbd>property</kbd> LLM.embeddings
The input embeddings.
<kbd>property</kbd> LLM.eos_token_id
The end-of-sequence token.
<kbd>property</kbd> LLM.logits
The unnormalized log probabilities.
<kbd>property</kbd> LLM.model_path
The path to the model file.
<kbd>property</kbd> LLM.model_type
The model type.
<kbd>property</kbd> LLM.pad_token_id
The padding token.
<kbd>property</kbd> LLM.vocab_size
The number of tokens in vocabulary.
<kbd>method</kbd> LLM.detokenize
```python
detokenize(tokens: Sequence[int], decode: bool = True) → Union[str, bytes]
```
Converts a list of tokens to text.
Args:
- <b>tokens</b>: The list of tokens.
- <b>decode</b>: Whether to decode the text as a UTF-8 string.
Returns: The combined text of all tokens.
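The effect of the `decode` flag can be illustrated with a toy vocabulary (this is a hypothetical sketch, not the library's real tokenizer): token pieces are raw bytes, and `decode=True` turns the concatenated bytes into a UTF-8 string.

```python
from typing import Sequence, Union

# Hypothetical byte-level vocabulary for illustration only.
TOY_VOCAB = {0: b"Hello", 1: b",", 2: b" world", 3: b"!"}

def detokenize(tokens: Sequence[int], decode: bool = True) -> Union[str, bytes]:
    """Concatenate token byte pieces; optionally decode to a UTF-8 string."""
    data = b"".join(TOY_VOCAB[t] for t in tokens)
    return data.decode("utf-8", errors="ignore") if decode else data

print(detokenize([0, 1, 2, 3]))                # Hello, world!
print(detokenize([0, 1, 2, 3], decode=False))  # b'Hello, world!'
```

Returning raw bytes (`decode=False`) is useful when streaming, since a multi-byte UTF-8 character may be split across token boundaries.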
<kbd>method</kbd> LLM.embed
embed(
input: Union[str, Sequence[int]],
batc