InstructGPT

For experiments involving InstructGPT. Currently used for documenting open research questions.

Install / Use

/learn @CarperAI/InstructGPT

About this skill

Quality Score

0/100

Category

Education & Research

Supported Platforms

Universal

README

BigModelName

This repository collects open questions about RLHF and InstructGPT as they pertain to BigModelName.

Open Questions

What is the preference rate of PPO vs. PPO-ptx?
Why was 27.8 chosen as the mixing factor between the pre-training gradients and the PPO gradients?
What do the gradient norms and gradient noise scales look like for PPO grads vs pre-training grads?
How important is the SFT stage (supervised fine-tuning on human-written completions) before PPO?
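
As a starting point for the gradient questions above, here is a minimal sketch of how the two gradient sources could be compared. It assumes the mixing is a plain weighted sum, `ppo_grad + gamma * ptx_grad`, with `gamma = 27.8`, and it uses the "simple" noise scale, tr(Σ) / |G|², estimated naively from per-example gradients (this estimate is biased for small batches). All function names and the toy vectors are hypothetical and only illustrate the measurements; real gradients would come from the training framework.

```python
import math

GAMMA = 27.8  # mixing factor quoted in the open question above


def l2_norm(vec):
    """L2 norm of a flat gradient vector."""
    return math.sqrt(sum(v * v for v in vec))


def mix_gradients(ppo_grad, ptx_grad, gamma=GAMMA):
    """Assumed mixing rule: PPO gradient plus gamma times the pre-training gradient."""
    return [p + gamma * q for p, q in zip(ppo_grad, ptx_grad)]


def simple_noise_scale(per_example_grads):
    """Naive estimate of tr(Sigma) / |G|^2 from a list of per-example gradients.

    Sigma is the per-example gradient covariance and G the mean gradient.
    Requires at least two examples and a non-zero mean gradient.
    """
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    mean = [sum(g[i] for g in per_example_grads) / n for i in range(dim)]
    # Trace of the covariance = sum of per-coordinate sample variances.
    trace_cov = sum(
        sum((g[i] - mean[i]) ** 2 for g in per_example_grads) / (n - 1)
        for i in range(dim)
    )
    return trace_cov / sum(m * m for m in mean)


# Toy vectors standing in for flattened parameter gradients (illustrative only).
ppo_grad = [3.0, 4.0]
ptx_grad = [0.0, 0.1]

mixed = mix_gradients(ppo_grad, ptx_grad)
ppo_norm = l2_norm(ppo_grad)
scaled_ptx_norm = GAMMA * l2_norm(ptx_grad)  # pre-training term after scaling
noise_scale = simple_noise_scale([[1.0, 0.0], [3.0, 0.0]])
```

Comparing `ppo_norm` with `scaled_ptx_norm` over the course of training would show which term dominates the update at a given coefficient, which is one way to probe why a particular value such as 27.8 might have been chosen.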

CarperAI

GitHub Stars: 71

Category: Education

Updated: 4mo ago

Forks: 4

CarperAI/InstructGPT

Security Score

92/100

Audited on Nov 19, 2025

No findings