
MiniHF



[Logo: vector art of a brain patterned in the style of a pictorial history of Portuguese textiles, painted in the 1700s, with the logotext "MiniHF"]

MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user develop their prompts into full models. Normally when we prompt a language model we're forced to think in that model's latent space. MiniHF lets you go the other direction: imagine the ideal context in which your prompt could take place and then add it to the model. To make this possible, MiniHF provides several powerful features:

  • Lightweight web interface and inference server that lets you easily branch your session with the model into multiple completion chains and pick the best ones

  • Make your own feedback dataset by writing with local language models such as StableLM and NeoX 20b.

  • A Monte Carlo tree search (MCTS) based inference algorithm, Weave, which rejection samples from the model to improve output quality

  • The ability to finetune both the underlying generator LoRa and the evaluator reward LoRa used for the tree search on your own custom dataset

  • Easy bootstrapping of new document contexts and models using reinforcement learning from AI feedback (RLAIF)

  • Easy install with minimal dependencies
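The Weave idea in the list above, rejection sampling candidate completions against an evaluator, can be sketched as a short loop. This is a toy illustration, not the actual MiniHF implementation: `generate` and `score` here are hypothetical stand-ins for the generator and the evaluator reward model.

```python
import random

random.seed(0)  # make the toy run deterministic

def generate(prompt, n):
    """Hypothetical stand-in for the generator: produce n candidate continuations."""
    return [f"{prompt} candidate-{i}" for i in range(n)]

def score(text):
    """Hypothetical stand-in for the evaluator reward model."""
    return random.random()

def rejection_sample(prompt, n=8, threshold=0.5):
    """Sample n candidates, discard those the evaluator scores below the
    threshold, and return the best survivor; resample if none pass."""
    while True:
        scored = [(score(c), c) for c in generate(prompt, n)]
        survivors = [sc for sc in scored if sc[0] >= threshold]
        if survivors:
            return max(survivors)[1]

best = rejection_sample("Once upon a time")
```

The real Weave algorithm organizes this sampling into a tree search rather than a flat loop, so good partial branches get extended instead of thrown away.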

If you want to discuss MiniHF with other users, we have a Discord server.

Setup

DataCrunch

If you want to use MiniHF with a large amount of VRAM, https://datacrunch.io/ is a good option. Provision an A6000, A6000 Ada, etc. from their selection with the Ubuntu + CUDA + Docker image and more than the default amount of storage (I found it worked with 1 TB, but you can probably make do with less; the default 40 GB is not enough). Once you have it running, SSH in with a listener on port 5000:

ssh root@IP_ADDRESS -L 5000:localhost:5000

Once you're in, clone the repo and change directories into it:

git clone https://github.com/JD-P/minihf.git
cd minihf

Then run:

bash setup.sh

You should see the script update the server packages, install Python dependencies from pip, download the models, and then finally start the inference server. At this point you can start using MiniHF by visiting http://localhost:5000/ in your browser. You can change which models you're using in the minihf_infer.py file. Later on we'll add a configuration file to change these settings.

To start the server yourself on subsequent logins use the commands:

cd minihf
source env_minihf/bin/activate
flask --app minihf_infer run

Tuning Models

MiniHF lets you tune two model types, both of which are LoRa tunes on an underlying foundation model such as GPT-J, NeoX, OpenLlama, or falcon-40b:

  1. Generator LoRa - Generates the text that the user or Weave algorithm evaluates.

  2. Evaluator LoRa - Reward model that selects between branches in the Weave tree search.

Furthermore, each model supports two kinds of tuning: self-supervised finetuning (SFT) and reinforcement learning from AI feedback (RLAIF).

[Diagram: the training flow chart for the models and tuning scripts in MiniHF]

Preparing The Tuning Dataset

The tuning dataset should consist of a zip file containing one or more plaintext files or json conversations exported from MiniHF. Because the model might not be adapted to your style or document context yet, it might be more efficient to write out the first drafts of what you want in text files and then start using MiniHF after you've tuned the generator on them.
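As a sketch of the plaintext case, such a zip file can be assembled with the standard library. The file names and contents here are made up for illustration; the exact layout the tuning scripts expect may differ.

```python
import zipfile

# Hand-written first drafts of the document context you want the model to learn.
drafts = {
    "draft_01.txt": "First draft of the document context I want the model to learn.",
    "draft_02.txt": "Second draft, refining the style and format.",
}

# Pack the drafts into the dataset zip that gets passed to the tuning script.
with zipfile.ZipFile("data.zip", "w") as zf:
    for name, text in drafts.items():
        zf.writestr(name, text)
```

The resulting data.zip is what you pass to sft_generator.py below.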

Tuning The Generator

Once you have the tuning dataset it's easy to make a generator LoRa from it with the sft_generator.py script:

python3 sft_generator.py --user-dataset data.zip --model "EleutherAI/gpt-j-6b" --output example

Keep in mind that if your data is under 10 megabytes of tokens or so, other bulk pretraining data from the RedPajama dataset will be used to prevent overfitting.
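The mixing step described above can be illustrated as follows. This is a toy sketch: the actual threshold and mixing proportions used by sft_generator.py may differ, and the function name is hypothetical.

```python
def build_training_set(user_docs, bulk_docs, min_user_bytes=10_000_000):
    """Pad a small user dataset with bulk pretraining documents so the
    model doesn't overfit a tiny corpus (threshold is illustrative)."""
    user_bytes = sum(len(doc.encode("utf-8")) for doc in user_docs)
    if user_bytes >= min_user_bytes:
        # Enough user data: train on it alone.
        return list(user_docs)
    # Otherwise mix in bulk data (e.g. from RedPajama) to regularize.
    return list(user_docs) + list(bulk_docs)

mixed = build_training_set(["a small user document"], ["bulk doc A", "bulk doc B"])
```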

Tuning The Evaluator

MiniHF doesn't currently support using the user data to tune the evaluator, but it will soon. In the meantime you can make your own evaluator on bulk pretraining data with the sft_evaluator.py script:

python sft_evaluator.py --output-dir example

RLAIF Tuning The Generator

Once you've finetuned your generator LoRa to narrow the hypothesis space (this step is optional; you can RL tune a base model directly to bootstrap if you don't have any training data), you can use RLAIF tuning to distill goals from the base model's latent space into your generator using a value constitution.

python rlaif_generator.py --resume hermes --output-path hermes_rl --kl-weight 1.0 --constitution hermes/hermes_constitution.txt --prompts hermes/hermes_prompts.txt --length 256 --batch-size 2 --grad-accum-steps 8

Because both the generator and evaluator are separate LoRa models, it becomes possible for the base model to simultaneously hold two perspectives and update itself according to a value set without compromising those values. The evaluator is frozen while the generator is trained, preventing self updating from collapsing to a positive feedback loop.
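The --kl-weight flag above suggests the usual RL-from-feedback objective: the evaluator's reward minus a KL penalty that keeps the tuned generator close to the frozen base model. A per-sequence schematic, assuming standard RLHF-style reward shaping (the log-probabilities here are made-up numbers):

```python
def shaped_reward(evaluator_reward, logp_policy, logp_base, kl_weight=1.0):
    """Schematic RLHF-style shaping:
    total = reward - kl_weight * sum_t (log pi(x_t) - log p_base(x_t)).
    The KL term penalizes the generator for drifting from the base model."""
    kl = sum(p - b for p, b in zip(logp_policy, logp_base))
    return evaluator_reward - kl_weight * kl

# Two-token example: the policy is slightly more confident than the base
# model on both tokens, so the KL penalty subtracts from the reward.
r = shaped_reward(1.0, [-1.0, -2.0], [-1.5, -2.5], kl_weight=1.0)
```

With kl_weight set to 0 the penalty vanishes and the generator is free to chase the evaluator's reward, which is exactly the collapse the frozen-evaluator design tries to limit.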

Please note that RLAIF tuning is still not robust: if you tune for too long, the model converges to always answering 'Yes' under the current Yes/No zero-shot evaluator setup. This problem can hopefully be mitigated and then properly solved in future releases.
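The failure mode above follows from how a Yes/No zero-shot evaluator typically turns logits into a reward. A sketch, assuming the reward is a softmax over just the 'Yes' and 'No' token logits (MiniHF's exact prompt and token handling may differ):

```python
import math

def yes_probability(yes_logit, no_logit):
    """Softmax restricted to the 'Yes' and 'No' token logits;
    the resulting probability is used as the reward signal."""
    e_yes = math.exp(yes_logit)
    e_no = math.exp(no_logit)
    return e_yes / (e_yes + e_no)

reward = yes_probability(2.0, 0.0)
```

A generator tuned against this signal can push the 'Yes' logit arbitrarily high, saturating the reward near 1.0 regardless of text quality, which is the convergence-to-'yes' problem noted above.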

See the hermes directory for an example constitution and prompt set. In a future release you'll be able to use your MiniHF user dataset as a prompt database in addition to text files.

Philosophy

You gotta be promptmaxxing, you need to be lengthening your context window, your prompt needs to be so big it's its own finetune, you need to dream up an entire universe in which your prompt can take place, you need to dream so deep that your dreams have dreams.

— John David Pressman, Dec 10, 2022

MiniHF could be easily mistaken for a 'bag of tricks'. It incorporates features that have recently received a lot of attention, like tree search and zero-shot reward modeling. A user might be tempted to believe the design was chosen by throwing together whatever seems trendy until something good emerges. Nothing could be further from the truth. MiniHF was written to realize a simple idea: rather than just prompt language models for what can be inferred from existing documents, we should be inventing new kinds of documents for these models that make it easy to infer the information we want. Every design element is meant to support this goal. This section is meant to help you productively use and improve MiniHF by explaining how.

Literature Simulators

When ChatGPT came out at the end of 2022, its unexpected popularity brought language models to a mass audience. Suddenly thousands of people were discovering the rabbit hole of language model prompting, and the strange capabilities lurking underneath ChatGPT's surface.

How is this possible? Skeptics claim that at best ChatGPT is a kind of 'stochastic parrot' that rearranges words and phrases, that it's learned the mere statistical correlation between different words at such a scale it fools the user into thinking it has a mind. To anyone who has used ChatGPT in good faith for more than 10 minutes this is an absurd claim. Indeed many critiques along these lines echo the chauvinistic impulses of Helen Keller's detractors. The statistical correlation generalization strategy could not do the things that ChatGPT does, no matter how you scaled it, any more than a massive Markov Chain could.

How it really works is much more interesting. When the network first begins learning the dataset it probably does use the statistical correlation strategy. This is of course the obvious thing to learn, and it can be picked up in bits and pieces. But eventually it stops working. There exist nuances of text that it would be supremely difficult to guess from mere correlation. In fact at some point the correlation strategy would become costly enough for the network that it becomes cheaper to start learning semantics. This is the basic theory behind deep learning: Create an information bottleneck through which a deep neural network has to predict some piece of information. The bottleneck means that what the network is given is much less than the original, and the size of the datasets involved ensures that memorization is a futile strategy. Under these conditions the network must learn to predict what should be present from the limited information given, so that the output is something vaguely related to the original. When we prompt such networks with our own unseen information, they hallucinate what they expect to find in our nonexistent documents.

Prompting the model, then, is an imaginative exercise: we must turn our mind's eye toward the world and ask not just in what dusty tome or forgotten web page the information we want might exist, but what document with the potential to exist contains it.
