MiniHF
MiniHF is an inference, human preference data collection, and fine-tuning tool for local language models. It is intended to help the user develop their prompts into full models.
Install / Use

Normally when we prompt a language model we're forced to think in that model's latent space. MiniHF lets you go the other direction: imagine the ideal context in which your prompt could take place, then add it to the model. To make this possible MiniHF provides several powerful features:
- A lightweight web interface and inference server that lets you easily branch your session with the model into multiple completion chains and pick the best ones
- Make your own feedback dataset by writing with local language models such as StableLM and NeoX 20b
- A Monte Carlo tree search (MCTS) based inference algorithm, Weave, which rejection samples from the model to improve output quality
- The ability to finetune both the underlying generator LoRa and the evaluator reward LoRa used for the tree search on your own custom dataset
- Easy bootstrapping of new document contexts and models using reinforcement learning from AI feedback (RLAIF)
- Easy install with minimal dependencies
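The core idea behind Weave's rejection sampling can be shown in miniature: sample several candidate branches, score them with an evaluator, and keep only the best. The snippet below is a toy illustration with stand-in functions, not the actual Weave implementation or MiniHF's APIs:

```python
import random

def weave_step(prompt, generate, score, n_branches=4, keep=2):
    """Sample several candidate continuations and keep the best-scoring ones.

    `generate` and `score` are stand-ins for the generator and the evaluator
    reward model; a real tree search would apply this step recursively.
    """
    candidates = [generate(prompt) for _ in range(n_branches)]
    ranked = sorted(candidates, key=score, reverse=True)
    return ranked[:keep]

# Toy stand-ins: "generation" appends a random word, "reward" prefers longer text.
random.seed(0)
WORDS = ["alpha", "beta", "gamma", "delta"]
gen = lambda p: p + " " + random.choice(WORDS)
reward = lambda text: len(text)

best = weave_step("The story begins", gen, reward)
print(best)
```

In the real system the reward comes from the evaluator LoRa rather than a length heuristic, and surviving branches are extended and re-scored at each level of the tree.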
If you want to discuss MiniHF with other users, we have a Discord server.
Setup
DataCrunch
If you want to use MiniHF with a large amount of VRAM, https://datacrunch.io/ is
a good option. Provision an A6000, A6000 Ada, or similar from their selection with
the Ubuntu + CUDA + Docker image and more than the default amount of storage
(1 TB is known to work, and you can probably make do with less, but the default
40 GB is not enough). Once it's running, SSH in with a listener on port 5000:
ssh root@IP_ADDRESS -L 5000:localhost:5000
Once you're in, clone the repo and change directory into it:
git clone https://github.com/JD-P/minihf.git
cd minihf
Then run:
bash setup.sh
You should see the script update the server packages, install Python dependencies
from pip, download the models, and finally start the inference server. At that
point you can start using MiniHF by visiting http://localhost:5000/ in your
browser. You can change which models you're using in the minihf_infer.py file;
later on we'll add a configuration file to change these settings.
To start the server yourself on subsequent logins use the commands:
cd minihf
source env_minihf/bin/activate
flask --app minihf_infer run
Tuning Models
MiniHF lets you tune two model types, both of which are LoRa tunes on an underlying foundation model such as GPT-J, NeoX, OpenLlama, or falcon-40b:
- Generator LoRa - Generates the text that the user or Weave algorithm evaluates.
- Evaluator LoRa - A reward model that selects between branches in the Weave tree search.
Furthermore, each model has two kinds of tuning: self-supervised finetuning (SFT) and reinforcement learning from AI feedback (RLAIF).

Preparing The Tuning Dataset
The tuning dataset should be a zip file containing one or more plaintext files or JSON conversations exported from MiniHF. Because the model might not yet be adapted to your style or document context, it can be more efficient to write out the first drafts of what you want in text files and start using MiniHF after you've tuned the generator on them.
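Assembling the zip takes a few lines of Python; a minimal sketch (the directory and file names here are hypothetical, and the internal layout of the zip is up to you):

```python
import zipfile
from pathlib import Path

def build_tuning_dataset(text_dir, out_path="data.zip"):
    """Bundle every .txt and .json file in `text_dir` into a zip suitable
    for passing to sft_generator.py via --user-dataset."""
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(Path(text_dir).glob("*")):
            if path.suffix in (".txt", ".json"):
                zf.write(path, arcname=path.name)
    return out_path
```

For example, `build_tuning_dataset("drafts/")` would produce a `data.zip` containing your draft text files and any exported MiniHF conversations.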
Tuning The Generator
Once you have the tuning dataset it's easy to make a generator LoRa from it with
the sft_generator.py script:
python3 sft_generator.py --user-dataset data.zip --model "EleutherAI/gpt-j-6b" --output example
Keep in mind that if your data amounts to under 10 megabytes of tokens or so, bulk pretraining data from the RedPajama dataset will be mixed in to prevent overfitting.
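To get a rough sense of whether your corpus falls under that threshold, you can total the uncompressed size of the files in the zip. This is only a byte-level heuristic (the cutoff above is described in tokens), and the function name is our own:

```python
import zipfile

def dataset_size_bytes(zip_path):
    """Sum the uncompressed sizes of all files inside the tuning dataset zip."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())

# Rule of thumb from above: below roughly this size, expect RedPajama data
# to be mixed into your tuning run.
THRESHOLD_BYTES = 10 * 1024 * 1024
```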
Tuning The Evaluator
MiniHF doesn't currently support tuning the evaluator on user data, but it
will soon. In the meantime you can make your own evaluator from bulk pretraining data
with the sft_evaluator.py script:
python sft_evaluator.py --output-dir example
RLAIF Tuning The Generator
Once you've finetuned your generator LoRa to narrow the hypothesis space (this step is optional; you can RL tune a base model directly to bootstrap if you don't have any training data), you can use RLAIF tuning to distill goals from the base model's latent space into your generator using a value constitution:
python rlaif_generator.py --resume hermes --output-path hermes_rl --kl-weight 1.0 --constitution hermes/hermes_constitution.txt --prompts hermes/hermes_prompts.txt --length 256 --batch-size 2 --grad-accum-steps 8
Because both the generator and evaluator are separate LoRa models, it becomes possible for the base model to simultaneously hold two perspectives and update itself according to a value set without compromising those values. The evaluator is frozen while the generator is trained, preventing self updating from collapsing to a positive feedback loop.
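The shape of this setup, reward from the frozen evaluator minus a KL penalty that keeps the tuned generator close to the base model, can be written down as a one-line objective. This is a conceptual sketch, not the internals of rlaif_generator.py; the function name and numbers are illustrative:

```python
def rlaif_objective(reward, kl_divergence, kl_weight=1.0):
    """Score for a generator sample: the frozen evaluator's reward, minus a
    KL penalty (the --kl-weight flag above) that discourages the tuned
    generator from drifting far from the base model's distribution."""
    return reward - kl_weight * kl_divergence

# E.g. an evaluator "Yes" probability of 0.9 with a KL drift of 0.25:
score = rlaif_objective(reward=0.9, kl_divergence=0.25)
```

Raising the KL weight trades reward for staying closer to the base model, which is one lever against the reward hacking described below.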
Please note that RLAIF tuning is still not robust: with the current zero-shot Yes/No evaluator setup, if you tune for too long the model converges to simply saying 'yes'. This problem can hopefully be mitigated and then properly solved in future releases.
See the hermes directory for an example constitution and prompt set. In a future release you'll be able to use your MiniHF user dataset as a prompt database in addition to text files.
Philosophy
You gotta be promptmaxxing, you need to be lengthening your context window, your prompt needs to be so big it's its own finetune, you need to dream up an entire universe in which your prompt can take place, you need to dream so deep that your dreams have dreams.
— John David Pressman, Dec 10, 2022
MiniHF could be easily mistaken for a 'bag of tricks'. It incorporates features that have recently received a lot of attention like tree search and zero-shot reward modeling. A user might be tempted to believe the design was chosen by throwing together whatever seems trendy until something good emerges. Nothing could be further from the truth. MiniHF was written to realize a simple idea: rather than just prompt language models for what can be inferred from existing documents, we should be inventing new kinds of documents for these models that make it easy to infer the information we want. Every design element is meant to support this goal. This section is meant to help you productively use and improve MiniHF by explaining how.
Literature Simulators
When ChatGPT came out at the end of 2022 its unexpected popularity brought language models to a mass audience. Suddenly thousands of people were discovering the rabbit hole of language model prompting, and the strange capabilities lurking underneath ChatGPT's surface. ChatGPT could:
- Emulate a Unix terminal and utilities
- Rewrite Blake's Jerusalem in different poetic meters
- Write just about anything you want in Donald Trump's speaking style
- Write and debug computer code
- And much, much more
How is this possible? Skeptics claim that at best ChatGPT is a kind of 'stochastic parrot' that rearranges words and phrases, that it's learned the mere statistical correlation between different words at such a scale it fools the user into thinking it has a mind. To anyone who has used ChatGPT in good faith for more than 10 minutes this is an absurd claim. Indeed many critiques along these lines echo the chauvinistic impulses of Helen Keller's detractors. The statistical correlation generalization strategy could not do the things that ChatGPT does, no matter how you scaled it, any more than a massive Markov Chain could.
How it really works is much more interesting. When the network first begins learning the dataset it probably does use the statistical correlation strategy. This is of course the obvious thing to learn, and it can be picked up in bits and pieces. But eventually it stops working. There exist nuances of text that it would be supremely difficult to guess from mere correlation. In fact at some point the correlation strategy would become costly enough for the network that it becomes cheaper to start learning semantics. This is the basic theory behind deep learning: Create an information bottleneck through which a deep neural network has to predict some piece of information. The bottleneck means that what the network is given is much less than the original, and the size of the datasets involved ensures that memorization is a futile strategy. Under these conditions the network must learn to predict what should be present from the limited information given, so that the output is something vaguely related to the original. When we prompt such networks with our own unseen information, they hallucinate what they expect to find in our nonexistent documents.
Prompting the model then is an imaginative exercise: we must turn our mind's eye toward the world and ask not just in what dusty tome or forgotten web page the information we want might exist, but what document with the potential to exist contains it.