InstructGPT
For experiments involving InstructGPT. Currently used to document open research questions.
Install / Use
/learn @CarperAI/InstructGPT
BigModelName
This repository collects open questions about RLHF and InstructGPT as they pertain to BigModelName.
Open Questions
- What is the preference rate of PPO vs. PPO-ptx? Why was 27.8 chosen as the mixing factor between the pre-training gradients and the PPO gradients?
- What do the gradient norms and gradient noise scales look like for PPO grads vs pre-training grads?
- How important is SFT pretraining on human-written completions?
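
To make the first two questions concrete, here is a minimal sketch of how a mixing factor combines a PPO gradient with a pre-training gradient, and how the relative size of the two terms could be inspected. It is an illustration only, not code from this repository: the function names (`l2_norm`, `mix_gradients`, `ptx_share`) are hypothetical, gradients are plain Python lists rather than model tensors, and the default `gamma=27.8` is simply the value quoted in the question above.

```python
import math


def l2_norm(grad):
    """Euclidean norm of a gradient given as a flat list of floats."""
    return math.sqrt(sum(g * g for g in grad))


def mix_gradients(ppo_grad, ptx_grad, gamma=27.8):
    """Combine the two gradients as ppo_grad + gamma * ptx_grad.

    gamma is the mixing factor; 27.8 is the value quoted in the
    open question, used here only as a default for illustration.
    """
    if len(ppo_grad) != len(ptx_grad):
        raise ValueError("gradients must have the same length")
    return [p + gamma * q for p, q in zip(ppo_grad, ptx_grad)]


def ptx_share(ppo_grad, ptx_grad, gamma=27.8):
    """Share of the summed norms that comes from the scaled pre-training term."""
    ppo_norm = l2_norm(ppo_grad)
    ptx_norm = gamma * l2_norm(ptx_grad)
    total = ppo_norm + ptx_norm
    return ptx_norm / total if total > 0 else 0.0
```

Logging something like `ptx_share` over training steps would be one way to see whether the scaled pre-training term dominates the update at a given mixing factor. Measuring gradient noise scale would additionally require per-batch gradient statistics, which this sketch does not cover.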
