Gollama.cpp

A high-performance Go binding for llama.cpp using purego and libffi for cross-platform compatibility without CGO.

Features

  • Pure Go: No CGO required, uses purego and libffi for C interop
  • Cross-Platform: Supports macOS (CPU/Metal), Linux (CPU/NVIDIA/AMD), Windows (CPU/NVIDIA/AMD)
  • Struct Support: Uses libffi for calling C functions with struct parameters/returns on all platforms
  • Performance: Direct bindings to llama.cpp shared libraries
  • Compatibility: Version-synchronized with llama.cpp releases
  • Easy Integration: Simple Go API for LLM inference
  • GPU Acceleration: Supports Metal, CUDA, HIP, Vulkan, OpenCL, SYCL, and other backends
  • Embedded Runtime Libraries: Optional go:embed bundle for all supported platforms
  • GGML Bindings: Low-level GGML tensor library bindings for advanced use cases

Supported Platforms

Gollama.cpp selects platform-specific code at compile time with Go build tags, so each operating system gets its own library-loading implementation behind a common API.

✅ Fully Supported Platforms

macOS

  • CPU: Intel x64, Apple Silicon (ARM64)
  • GPU: Metal (Apple Silicon)
  • Status: Full feature support with purego and libffi
  • Build Tags: Uses !windows build tag

Linux

  • CPU: x86_64, ARM64
  • GPU: NVIDIA (CUDA/Vulkan), AMD (HIP/ROCm/Vulkan), Intel (SYCL/Vulkan)
  • Status: Full feature support with purego and libffi
  • Build Tags: Uses !windows build tag

Windows

  • CPU: x86_64, ARM64
  • GPU: NVIDIA (CUDA/Vulkan), AMD (HIP/Vulkan), Intel (SYCL/Vulkan), Qualcomm Adreno (OpenCL)
  • Status: Full feature support with libffi
  • Build Tags: Uses windows build tag with syscall-based library loading
  • Current State:
    • ✅ Compiles without errors on Windows
    • ✅ Cross-compilation from other platforms works
    • ✅ Runtime functionality fully enabled via libffi and GetProcAddress
    • ✅ Full struct parameter/return support through function registration
    • 🚧 GPU acceleration being tested

Windows runtime notes

  • The loader now adds the DLL's directory to the Windows DLL search path and uses LoadLibraryExW with safe search flags to reliably resolve sibling dependencies (ggml, libomp, libcurl, etc.).
  • When a symbol isn't found in llama.dll, resolution automatically searches sibling DLLs from the same directory (e.g., ggml*.dll). This matches how upstream splits exports on Windows and fixes missing llama_backend_* on some builds.
  • If you see “The specified module could not be found.” while loading llama.dll, it often indicates a missing system runtime (e.g., Microsoft Visual C++ Redistributable 2015–2022). Installing the latest x64/x86 redistributable typically resolves it.
  • CI runners set PATH for later steps, but the downloader verifies loading immediately after download; the improved loader handles dependency resolution without relying on PATH.
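
The sibling-DLL fallback described above can be sketched as follows. This is a minimal model of the lookup order only, not the library's loader: the `exportTable` type, the `resolveSymbol` helper, and the example export lists are hypothetical, and in-memory maps stand in for real module handles and `GetProcAddress` calls.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
)

// exportTable is a stand-in for loaded DLLs: file name -> exported symbols.
// A real loader would hold module handles and call GetProcAddress instead.
type exportTable map[string]map[string]bool

// resolveSymbol checks the primary library first and then sibling libraries
// whose file name matches siblingPattern (for example "ggml*.dll").
// It returns the name of the library that exports the symbol.
func resolveSymbol(libs exportTable, primary, siblingPattern, symbol string) (string, error) {
	if libs[primary][symbol] {
		return primary, nil
	}
	// Sort the names so the sibling search order is deterministic.
	names := make([]string, 0, len(libs))
	for name := range libs {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if name == primary {
			continue
		}
		match, err := filepath.Match(siblingPattern, name)
		if err != nil {
			return "", err
		}
		if match && libs[name][symbol] {
			return name, nil
		}
	}
	return "", fmt.Errorf("symbol %q not found in %s or siblings matching %s",
		symbol, primary, siblingPattern)
}

func main() {
	// Hypothetical export tables; which DLL exports which symbol varies by build.
	libs := exportTable{
		"llama.dll":     {"llama_decode": true},
		"ggml-base.dll": {"ggml_type_size": true},
		"libcurl.dll":   {"curl_easy_init": true},
	}
	for _, sym := range []string{"llama_decode", "ggml_type_size", "curl_easy_init"} {
		lib, err := resolveSymbol(libs, "llama.dll", "ggml*.dll", sym)
		if err != nil {
			fmt.Println("unresolved:", err)
			continue
		}
		fmt.Printf("%s -> %s\n", sym, lib)
	}
}
```

Restricting the fallback to a file-name pattern keeps unrelated DLLs in the same directory out of the search, which is why `curl_easy_init` stays unresolved in this sketch.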

Platform-Specific Implementation Details

Our platform abstraction layer uses Go build tags to provide:

  • Unix-like systems (!windows): Uses purego for dynamic library loading
  • Windows (windows): Uses native Windows syscalls (LoadLibraryW, FreeLibrary, GetProcAddress)
  • All platforms: Uses libffi for calling C functions with struct parameters/returns
  • Cross-compilation: Supports building for any platform from any platform
  • Automatic detection: Runtime platform capability detection
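
As a rough illustration of the platform split, the sketch below picks a shared-library file name from `runtime.GOOS`. It is a simplification under stated assumptions: the project makes this choice at compile time with build tags in separate files, whereas this single file uses a runtime switch so it can run anywhere. The `sharedLibraryName` helper is hypothetical, and `libllama.so` / `libllama.dylib` are the names llama.cpp builds conventionally use, not names confirmed by this README (only `llama.dll` is mentioned above).

```go
package main

import (
	"fmt"
	"runtime"
)

// sharedLibraryName returns the conventional llama.cpp shared-library file
// name for a GOOS value. Hypothetical helper, not part of the gollama API.
func sharedLibraryName(goos string) (string, error) {
	switch goos {
	case "windows":
		return "llama.dll", nil
	case "darwin":
		return "libllama.dylib", nil
	case "linux":
		return "libllama.so", nil
	default:
		return "", fmt.Errorf("unsupported GOOS %q", goos)
	}
}

func main() {
	name, err := sharedLibraryName(runtime.GOOS)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s/%s -> %s\n", runtime.GOOS, runtime.GOARCH, name)
}
```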

Installation

go get github.com/dianlight/gollama.cpp

The Go module automatically downloads pre-built llama.cpp libraries from the official ggml-org/llama.cpp releases on first use. No manual compilation required!

Embedding Libraries

For reproducible builds you can embed the pre-built libraries directly into the Go module. A helper Makefile target downloads the configured llama.cpp build (LLAMA_CPP_BUILD) for every supported platform and synchronises the ./libs directory which is picked up by go:embed:

# Download all platform builds for the configured llama.cpp version and populate ./libs
make populate-libs

# Alternatively, use the CLI directly
go run ./cmd/gollama-download -download-all -version b6862 -copy-libs

Only a single llama.cpp version is stored in ./libs at a time. Running populate-libs removes outdated directories automatically. Subsequent go build invocations embed the freshly synchronised libraries and LoadLibraryWithVersion("") will prefer the embedded bundle.
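
The "single version in ./libs" rule can be pictured with a small sketch. Everything here is an assumption for illustration: the `staleDirs` helper is hypothetical, the `<version>_<os>_<arch>` directory naming is an invented layout, and `b6800` is a made-up older build tag. The real Makefile target may organise and prune the directory differently.

```go
package main

import (
	"fmt"
	"strings"
)

// staleDirs returns the directory names that do not belong to the version
// being kept. The "<version>_<os>_<arch>" naming is a hypothetical layout.
func staleDirs(dirs []string, keep string) []string {
	var stale []string
	for _, d := range dirs {
		if !strings.HasPrefix(d, keep+"_") {
			stale = append(stale, d)
		}
	}
	return stale
}

func main() {
	dirs := []string{
		"b6800_linux_amd64", // made-up older build, would be removed
		"b6862_linux_amd64",
		"b6862_windows_amd64",
	}
	for _, d := range staleDirs(dirs, "b6862") {
		fmt.Println("would remove:", d)
	}
}
```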

Cross-Platform Development

Build Compatibility Matrix

Our CI system tests compilation across all platforms:

| Target Platform | Build From Linux | Build From macOS | Build From Windows |
| --------------- | :--------------: | :--------------: | :----------------: |
| Linux (amd64)   |        ✅        |        ✅        |         ✅         |
| Linux (arm64)   |        ✅        |        ✅        |         ✅         |
| macOS (amd64)   |        ✅        |        ✅        |         ✅         |
| macOS (arm64)   |        ✅        |        ✅        |         ✅         |
| Windows (amd64) |        ✅        |        ✅        |         ✅         |
| Windows (arm64) |        ✅        |        ✅        |         ✅         |

Development Workflow

# Test cross-compilation for all platforms
make test-cross-compile

# Build for specific platform
GOOS=windows GOARCH=amd64 go build ./...
GOOS=linux GOARCH=arm64 go build ./...
GOOS=darwin GOARCH=arm64 go build ./...

# Run platform-specific tests
go test -v -run TestPlatformSpecific ./...
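
The same matrix can be driven from Go. The sketch below is a dry run under stated assumptions: the `target` type and `buildCmd` helper are hypothetical and are not what `make test-cross-compile` actually runs. It only prints the commands; calling `cmd.Run()` on each would execute them.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// target is one GOOS/GOARCH pair from the compatibility matrix.
type target struct{ GOOS, GOARCH string }

var targets = []target{
	{"linux", "amd64"}, {"linux", "arm64"},
	{"darwin", "amd64"}, {"darwin", "arm64"},
	{"windows", "amd64"}, {"windows", "arm64"},
}

// buildCmd prepares "go build ./..." with GOOS and GOARCH set for the target.
// Entries appended last override earlier duplicates in the environment.
func buildCmd(t target) *exec.Cmd {
	cmd := exec.Command("go", "build", "./...")
	cmd.Env = append(os.Environ(), "GOOS="+t.GOOS, "GOARCH="+t.GOARCH)
	return cmd
}

func main() {
	for _, t := range targets {
		cmd := buildCmd(t)
		// Dry run: print the command instead of executing it.
		fmt.Printf("GOOS=%s GOARCH=%s %s\n", t.GOOS, t.GOARCH, strings.Join(cmd.Args, " "))
	}
}
```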

Quick Start

package main

import (
    "fmt"
    "log"

    "github.com/dianlight/gollama.cpp"
)

func main() {
    // Initialize the library
    gollama.Backend_init()
    defer gollama.Backend_free()

    // Load model
    params := gollama.Model_default_params()
    model, err := gollama.Model_load_from_file("path/to/model.gguf", params)
    if err != nil {
        log.Fatal(err)
    }
    defer gollama.Model_free(model)

    // Create context
    ctxParams := gollama.Context_default_params()
    ctx, err := gollama.Init_from_model(model, ctxParams)
    if err != nil {
        log.Fatal(err)
    }
    defer gollama.Free(ctx)

    // Tokenize and generate
    prompt := "The future of AI is"
    tokens, err := gollama.Tokenize(model, prompt, true, false)
    if err != nil {
        log.Fatal(err)
    }

    // Create batch and decode
    batch := gollama.Batch_init(len(tokens), 0, 1)
    defer gollama.Batch_free(batch)

    for i, token := range tokens {
        gollama.Batch_add(batch, token, int32(i), []int32{0}, false)
    }

    if err := gollama.Decode(ctx, batch); err != nil {
        log.Fatal(err)
    }

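    // Sample next token with a greedy sampler (always picks the top candidate).
    // The logits are fetched but not used directly below; assigning to _
    // avoids Go's unused-variable compile error.
    _ = gollama.Get_logits_ith(ctx, -1)
    candidates := gollama.Token_data_array_init(model)

    sampler := gollama.Sampler_init_greedy()
    defer gollama.Sampler_free(sampler)

    newToken := gollama.Sampler_sample(sampler, ctx, candidates)
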
    // Convert token to text
    text := gollama.Token_to_piece(model, newToken, false)
    fmt.Printf("Generated: %s\n", text)
}
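
Greedy sampling, the strategy selected by Sampler_init_greedy in the example above, picks the token with the largest logit. The standalone sketch below shows that rule on a plain slice. `greedyPick` is a hypothetical helper written for illustration and does not call the gollama API.

```go
package main

import "fmt"

// greedyPick returns the index of the largest logit, which is the token ID
// greedy sampling would choose. Ties go to the lowest index, and an empty
// slice yields -1.
func greedyPick(logits []float32) int {
	best := -1
	for i, v := range logits {
		if best == -1 || v > logits[best] {
			best = i
		}
	}
	return best
}

func main() {
	// One logit per vocabulary entry; index 2 has the highest score.
	logits := []float32{-1.5, 0.25, 3.0, 2.75}
	fmt.Println("chosen token ID:", greedyPick(logits))
}
```

Because greedy sampling is deterministic, the same prompt always produces the same continuation, which is useful for tests but tends to give repetitive text in longer generations.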

Advanced Usage

GGML Low-Level API

For advanced use cases, gollama.cpp provides direct access to GGML (the tensor library powering llama.cpp):

// Check GGML type information
typeSize, err := gollama.Ggml_type_size(gollama.GGML_TYPE_F32)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("F32 type size: %d bytes\n", typeSize)

// Check if a type is quantized
isQuantized, err := gollama.Ggml_type_is_quantized(gollama.GGML_TYPE_Q4_0)
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Q4_0 is quantized: %v\n", isQuantized)

// Enumerate backend devices
devCount, err := gollama.Ggml_backend_dev_count()
if err == nil && devCount > 0 {
    for i := uint64(0); i < devCount; i++ {
        dev, _ := gollama.Ggml_backend_dev_get(i)
        name, _ := gollama.Ggml_backend_dev_name(dev)
        fmt.Printf("Device %d: %s\n", i, name)
    }
}

Supported GGML Features:

  • 31 tensor type definitions (F32, F16, Q4_0, Q8_0, BF16, etc.)
  • Type size and quantization utilities
  • Backend device enumeration and management
  • Buffer allocation and management
  • Type information queries

Note: GGML functions may not be exported in all llama.cpp builds. The library gracefully handles missing functions without errors.
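
To show what the type-size and quantization queries are useful for, the sketch below computes how many bytes a row of tensor elements occupies. It is a standalone illustration, not a call into the bindings: the `typeLayout` table and `rowBytes` helper are hypothetical, and the block sizes are hard-coded from GGML's standard layouts (Q4_0 and Q8_0 pack 32 elements per block with a 16-bit scale).

```go
package main

import "fmt"

// typeLayout describes how a GGML type packs elements into blocks.
type typeLayout struct {
	blockElems int64 // elements per block
	blockBytes int64 // bytes per block
}

// layouts hard-codes a few standard GGML layouts for illustration.
var layouts = map[string]typeLayout{
	"F32":  {1, 4},
	"F16":  {1, 2},
	"Q4_0": {32, 18}, // 2-byte scale + 32 four-bit values
	"Q8_0": {32, 34}, // 2-byte scale + 32 eight-bit values
}

// rowBytes returns the storage size of n elements of the given type.
// n must be a multiple of the type's block size.
func rowBytes(typ string, n int64) (int64, error) {
	l, ok := layouts[typ]
	if !ok {
		return 0, fmt.Errorf("unknown type %q", typ)
	}
	if n%l.blockElems != 0 {
		return 0, fmt.Errorf("%d elements is not a multiple of block size %d", n, l.blockElems)
	}
	return n / l.blockElems * l.blockBytes, nil
}

func main() {
	for _, typ := range []string{"F32", "F16", "Q8_0", "Q4_0"} {
		size, err := rowBytes(typ, 4096)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-5s 4096 elements = %d bytes\n", typ, size)
	}
}
```

For a 4096-element row, Q4_0 needs 2304 bytes against 16384 for F32, which is the roughly 7x saving that makes quantized models practical on consumer hardware.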

GPU Configuration

Gollama.cpp automatically downloads the appropriate pre-built binaries with GPU support and configures the optimal backend:

// Automatic GPU detection and configuration
params := gollama.Context_default_params()
params.n_gpu_layers = 32 // Offload layers to GPU (if available)

// Detect available GPU backend
backend := gollama.DetectGpuBackend()
fmt.Printf("Using GPU backend: %s\n", backend.String())

// Platform-specific optimizations:
// - macOS: Uses Metal when available  
// - Linux: Supports CUDA, HIP, Vulkan, and SYCL
// - Windows: Supports CUDA, HIP, Vulkan, OpenCL, and SYCL
params.split_mode = gollama.LLAMA_SPLIT_MODE_LAYER

GPU Support Matrix

| Platform | GPU Type        | Backend  | Status                  |
| -------- | --------------- | -------- | ----------------------- |
| macOS    | Apple Silicon   | Metal    | ✅ Supported            |
| macOS    | Intel/AMD       | CPU only | ✅ Supported            |
| Linux    | NVIDIA          | CUDA
