<p align="center"> <img width="400" alt="syncode" src="https://github.com/shubhamugare/syncode/assets/14147610/99c30a9d-b5f5-49ab-9295-33738fde1de2" /> </p>

SynCode: LLM Generation with Grammar Augmentation [![Test Status][test-img]][tests]

<p align="left"> ℹ️&nbsp;<a href="#-about">About</a> | 📚&nbsp;<a href="#-features">Features</a> | 📖&nbsp;<a href="#-more-about-syncode">More About SynCode</a> | 🚀&nbsp;<a href="#-quick-start">Quick Start</a> | 👀&nbsp;<a href="#-example-usage">Example Usage</a> | 🤔&nbsp;<a href="#-faq">FAQs</a> </p> <p> <a href="https://arxiv.org/abs/2403.01632"><img src="https://img.shields.io/badge/Paper-arXiv-blue"></a> </p>

ℹ️ About

  • SynCode is a novel framework for the grammar-guided generation of Large Language Models (LLMs) that is scalable to general-purpose programming languages and has soundness and completeness guarantees.
  • With SynCode, you can ensure that your Language Model is generating output that is syntactically valid with respect to the rules defined by a Context-Free Grammar (CFG).
  • For example, SynCode achieves 99% accuracy in JSON generation with Gemma-2b (check here) and is 10-20% faster than standard unconstrained generation.

Builtin Grammars

Python, Go, SQL, JSON, Logic

Check Grammars directory for supported grammars

Misc

Define your own grammar using simple EBNF syntax. Check the notebooks directory for examples, or try the quick-start notebook at   <img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />
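As an illustration, a custom grammar can be written as an EBNF string and passed via the grammar argument. The rules below are a hypothetical toy example for arithmetic expressions; see the notebooks directory for the exact grammar conventions SynCode expects:

```python
# A hypothetical EBNF grammar for simple arithmetic expressions.
# Rule names and regex-terminal syntax here follow Lark-style EBNF;
# consult the notebooks for SynCode's exact conventions.
arith_grammar = """
start: expr
expr: term (("+" | "-") term)*
term: NUMBER
NUMBER: /[0-9]+/
%ignore " "
"""

# The grammar string would then be passed when constructing Syncode, e.g.:
# from syncode import Syncode
# syn_llm = Syncode(model="microsoft/phi-2", grammar=arith_grammar)
```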

📚 Features

|                                                                                                                     |
|---------------------------------------------------------------------------------------------------------------------|
| 🔥 Fast grammar-guided generation (as little as 10% generation overhead with Python and Go!)                         |
| 🤖 Seamlessly work with any HuggingFace Language Model, including Code, Chat, and Instruct models                    |
| 🖍️ Pass in any CFG in the EBNF format (even large grammars for programming languages like Python and Go!)            |
| 📝 Built-in CFGs for Python, Go, Java, SQL, Math, JSON, and more!                                                    |
| 🎲 Sample with any existing decoding strategy (e.g. greedy, beam search, nucleus sampling)                           |

🚀 Quick Start

Python Installation and Usage Instructions

You can install SynCode via PyPI:

pip install syncode

Alternatively, you can install the latest development version directly from GitHub:

pip install git+https://github.com/structuredllm/syncode.git

Version Compatibility

SynCode depends on HuggingFace transformers:

| SynCode version  | Required transformers version | Python version |
| ---------------- | ----------------------------- | -------------- |
| v0.4.16 (latest) | v4.53.2                       | 3.6 - 3.12     |

Note: Python 3.13 is not currently supported due to dependency constraints.

Usage option 1:

SynCode can be used as a simple logits processor with the HuggingFace transformers library interface. See this notebook for an example.

Import the processor and initialize it with the appropriate grammar:

from syncode import SyncodeLogitsProcessor

The initialized processor can then be passed to the model's generate function. For example,

output = model.generate(
          inputs,
          max_length=100, 
          pad_token_id=tokenizer.eos_token_id, 
          logits_processor=[syncode_logits_processor]
        )
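Conceptually, a grammar-constrained logits processor works by masking out, at each decoding step, every token that would break the grammar. The toy function below illustrates that mechanism with plain Python lists; it is not SynCode's implementation, which additionally handles the mismatch between LLM tokens and grammar terminals:

```python
import math

def mask_logits(logits, allowed_ids):
    """Toy illustration of grammar-constrained decoding: set the logits
    of all tokens outside the grammar-allowed set to -inf, so argmax or
    sampling can only pick syntactically valid continuations.
    (Not SynCode's actual implementation.)"""
    return [
        score if i in allowed_ids else -math.inf
        for i, score in enumerate(logits)
    ]

# Example: suppose the grammar only allows tokens 0 and 2 at this step.
masked = mask_logits([1.5, 3.0, 0.2, -1.0], allowed_ids={0, 2})
best = max(range(len(masked)), key=lambda i: masked[i])
```

After masking, the highest-scoring token is guaranteed to be one the grammar permits (token 0 in this toy example), even though token 1 had the highest raw score.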

Usage option 2:

The other option is to use the SynCode object for inference (this comes with additional optimizations),

from syncode import Syncode

Refer to <a href="#syncode-arguments">SynCode Arguments</a> for the full list of arguments used to initialize the SynCode class. In Python, inference is performed with the infer() method of the SynCode class. infer() takes the following arguments:

  • prompt (str, optional): Prompt to the Language Model. Defaults to None.

  • task_id (int, optional): Problem task id for selecting a problem from a Dataset. Defaults to None.

If neither prompt nor task_id is specified, infer() reads user input via stdin.

The following example shows the benefit of SynCode:

In the example below, the unconstrained original Phi-2 model fails to generate a valid JSON object and instead generates Python code.

from syncode import Syncode

# Load the unconstrained original model
llm = Syncode(model="microsoft/phi-2", mode='original', max_new_tokens=50)

prompt = "Please return a JSON object to represent the country India with name, capital, and population?"
output = llm.infer(prompt)[0]
print(f"LLM output:\n{output}\n")

# LLM output:
#
# A:
#
# You can use the following code:
# import json
#
# def get_country_info(country_name):
#    country_info = {
#        'name': country_name,
#        'capital':

When guided with the JSON grammar with SynCode, the model can generate a syntactically valid JSON object.

from syncode import Syncode

# Load the Syncode augmented model
syn_llm = Syncode(model="microsoft/phi-2", grammar='json', parse_output_only=True, max_new_tokens=50)

prompt = "Please return a JSON object to represent the country India with name, capital, and population?"
output = syn_llm.infer(prompt)[0]
print(f"SynCode output:\n{output}")

# SynCode output:
# {
#     "name": "India",
#     "capital": "New Delhi",
#     "population": "1,366,417,754"
# }
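Because the constrained output conforms to the JSON grammar, it can be parsed directly with Python's standard json module. The snippet below sanity-checks the example output above (a quick illustration, not part of SynCode):

```python
import json

# The grammar-constrained output from the example above.
output = '''{
    "name": "India",
    "capital": "New Delhi",
    "population": "1,366,417,754"
}'''

# Unlike the unconstrained output, this parses without errors.
record = json.loads(output)
```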

Find more examples using the Python, Go, Java, and other grammars in <a href="#-example-usage">Notebooks</a>, and a quick example at   <img align="center" src="https://colab.research.google.com/assets/colab-badge.svg" />

Instruct-tuned Models

SynCode can also be used with instruct-tuned models. For example, the following code snippet passes a chat-style message list to SynCode (using the syn_llm object initialized in the JSON example above).

messages = [
    {"role": "system", "content": "You are a chatbot who always returns a JSON object."},
    {"role": "user", "content": "can you give me a JSON object describing University of Illinois at Urbana-Champaign?"},
]

out = syn_llm.infer(messages)

See the notebook for an example.

Environment Variables

Optionally, you can set the directories for cache by exporting the following environment variables. Add the following lines to your .bashrc or .zshrc file:

export HF_CACHE="path_to_hf_cache"
export SYNCODE_CACHE="path_to_syncode_cache"

If these environment variables are not set, the tool uses its default cache directories. To use gated models on HuggingFace, such as the Llama models, set the environment variable HF_ACCESS_TOKEN:

export HF_ACCESS_TOKEN="your_huggingface_api_key"
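The snippet below sketches how such a cache path is typically resolved in Python, falling back to a default when the variable is unset. This illustrates the common pattern, not SynCode's actual internals, and the default path shown is hypothetical:

```python
import os

def resolve_cache_dir(env_var, default):
    """Return a cache directory from the environment, falling back to a
    default when the variable is unset. Mirrors the usual pattern for
    variables like SYNCODE_CACHE; the default below is illustrative."""
    return os.environ.get(env_var, default)

# Hypothetical default location, used only when SYNCODE_CACHE is unset.
cache_dir = resolve_cache_dir("SYNCODE_CACHE", os.path.expanduser("~/.cache/syncode"))
```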

SynCode Arguments

<details> <summary>Click to expand the list of arguments for SynCode</summary>
  • mode (str, optional): Mode for inference. grammar_mask and grammar_strict enable grammar-constrained generation, with grammar_strict being the stricter of the two; original runs the LLM without any grammar constraints. Defaults to "grammar_strict".

  • model (str): Model ID for Hugging Face model hub or model name if stored locally.

  • quantize (bool, optional): Quantize the model to bfloat16. Defaults to True.

  • device (str, optional): The device on which the model is run. Defaults to cuda.

  • grammar (str, optional): Grammar in EBNF form (string or file path) or language for constrained generation. Defaults to None. You can use one of the python, go, sql, json, java, calc or pass in a custom grammar (check notebooks for examples) in EBNF format.

  • num_samples (int, optional): Number of samples. Defaults to 1.

  • dataset (str, optional): Dataset. Defaults to "input". "input" indicates that the user can provide input via CLI or by passing in a prompt as a string.

  • num_few_shot (int, optional): Number of examples for few-shot prompting. Defaults to 0.

  • dev_mode (bool, optional): Development mode where we do not fail silently with parser errors. Defaults to False.

  • log_level (int, optional): 0 for no logs, 1 for minimal logs, 2 for all logs including time. Defaults to 2.

  • new_mask_store (bool, optional): Forces to use a new mask store otherwise use a cached mask store if available. Defaults to False.

  • parser (str, optional): Choose between LR(1) and LALR(1) parsing. Defaults to 'lalr'.

  • task_id (int, optional): Problem task id for selecting a problem from a Dataset.

  • device_map (str, optional): Device map for the model. Defaults to None.

  • kwargs (optional): Currently supported kwargs are max_length, max_new_tokens, min_length, min_new_tokens, early_stopping, do_sample, num_beams, use_cache, temperature, top_k, top_p, num_return_sequences, pad_token_id, and eos_token_id. Refer to the [HuggingFace Text Generation Documentation](https://huggingface.co/docs/transformers/) for details.
</details>
