Metamotivo
The first behavioral foundation model to control a virtual physics-based humanoid agent for a wide range of whole-body tasks.
Install / Use
/learn @facebookresearch/MetamotivoREADME
Meta Motivo
Overview
This repository provides a PyTorch implementation and pre-trained models for Meta Motivo. For details see the paper Zero-Shot Whole-Body Humanoid Control via Behavioral Foundation Models.
Features
- We provide 6 pretrained FB-CPR models for controlling the humanoid model defined in HumEnv.
- Fully reproducible scripts for evaluating the model in HumEnv.
- Fully reproducible FB-CPR training code in HumEnv for the full results in the paper, and FB training code in DMC for faster experimentation.
Installation
The project is pip installable in your environment.
pip install "metamotivo[huggingface,humenv] @ git+https://github.com/facebookresearch/metamotivo.git"
It requires Python 3.10+. Optional dependencies include humenv["bench"] and huggingface_hub for testing/training and loading models from HuggingFace.
Pretrained models
For reproducibility, we provide all the 5 models (metamotivo-S-X) we trained for producing the results in the paper, where each model is trained using a different random seed. We also provide our largest and most performant model (metamotivo-M-1), which can also be interactively tested in our demo.
| Model | # of params | Download | | :--- | :---: | :---: | | metamotivo-S-1 | 24.5M | link | | metamotivo-S-2 | 24.5M | link | | metamotivo-S-3 | 24.5M | link | | metamotivo-S-4 | 24.5M | link | | metamotivo-S-5 | 24.5M | link | | metamotivo-M-1 | 288M | link |
Quick start
Once the library is installed, you can easily create an FB-CPR agent and download a pre-trained model from the Hugging Face hub. Note that the model is an instance of torch.nn.Module and by default it is initialized in "inference" mode (no_grad and eval mode).
We provide some simple code snippets to demonstrate how to use the model below. For more detailed examples, see our tutorials on interacting with the model, running an evaluation, and training from scratch.
Download the pre-trained models
The following code snippet shows how to instantiate the model.
from metamotivo.fb_cpr.huggingface import FBcprModel
model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")
Download the buffers
For each model we provide:
- The training buffer (that can be used for inference or offline training)
- A small reward inference buffer (that contains the minimum amount of information for doing reward inference)
from huggingface_hub import hf_hub_download
import h5py
local_dir = "metamotivo-S-1-datasets"
dataset = "buffer_inference_500000.hdf5" # a smaller buffer that can be used for reward inference
# dataset = "buffer.hdf5" # the full training buffer of the model
buffer_path = hf_hub_download(
repo_id="facebook/metamotivo-S-1",
filename=f"data/{dataset}",
repo_type="model",
local_dir=local_dir,
)
hf = h5py.File(buffer_path, "r")
print(hf.keys())
# create a DictBuffer object that can be used for sampling
data = {k: v[:] for k, v in hf.items()}
buffer = DictBuffer(capacity=data["qpos"].shape[0], device="cpu")
buffer.extend(data)
The FB-CPR model
The FB-CPR model contains several networks:
- forward net
- backward net
- critic net
- discriminator net
- actor net
We provide functions for evaluating these networks
def backward_map(self, obs: torch.Tensor) -> torch.Tensor: ...
def forward_map(self, obs: torch.Tensor, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor: ...
def actor(self, obs: torch.Tensor, z: torch.Tensor, std: float) -> torch.Tensor: ...
def critic(self, obs: torch.Tensor, z: torch.Tensor, action: torch.Tensor) -> torch.Tensor: ...
def discriminator(self, obs: torch.Tensor, z: torch.Tensor) -> torch.Tensor: ...
We also provide simple functions for prompting the model and obtaining a context vector z representing the task to execute.
#reward prompt (standard and weighted regression)
def reward_inference(self, next_obs: torch.Tensor, reward: torch.Tensor, weight: torch.Tensor | None = None,) -> torch.Tensor: ...
def reward_wr_inference(self, next_obs: torch.Tensor, reward: torch.Tensor) -> torch.Tensor: ...
#goal prompt
def goal_inference(self, next_obs: torch.Tensor) -> torch.Tensor: ...
#tracking prompt
def tracking_inference(self, next_obs: torch.Tensor) -> torch.Tensor:
Once we have a context vector z we can call the actor to get actions. We provide a function for acting in the environment with a standard interface.
def act(self, obs: torch.Tensor, z: torch.Tensor, mean: bool = True) -> torch.Tensor:
Note that these functions do not allow gradient computation and use eval mode since they are expected to be used for inference (torch.no_grad() and model.eval()). For training, you should directly access the class attributes. For training we also define target networks for the forward, backward and critic networks.
Execute a policy
This is the minimal example on how to execute a random policy
from humenv import make_humenv
from gymnasium.wrappers import FlattenObservation, TransformObservation
import torch
from metamotivo.fb_cpr.huggingface import FBcprModel
device = "cpu"
env, _ = make_humenv(
num_envs=1,
wrappers=[
FlattenObservation,
lambda env: TransformObservation(
env, lambda obs: torch.tensor(obs.reshape(1, -1), dtype=torch.float32, device=device), env.observation_space # For gymnasium <1.0.0 remove the last argument: env.observation_space
),
],
state_init="Default",
)
model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")
model.to(device)
z = model.sample_z(1)
observation, _ = env.reset()
for i in range(10):
action = model.act(observation, z, mean=True)
observation, reward, terminated, truncated, info = env.step(action.cpu().numpy().ravel())
Evaluation in HumEnv
For reproducibility of the paper, we provide a way of evaluating the models using HumEnv. We provide wrappers that can be used to interface Meta Motivo with humenv.bench reward, goal and tracking evaluation.
Here is an example of how to use the wrappers for reward evaluation:
from metamotivo.fb_cpr.huggingface import FBcprModel
from metamotivo.wrappers.humenvbench import RewardWrapper
import humenv.bench
model = FBcprModel.from_pretrained("facebook/metamotivo-S-1")
# this enable reward relabeling and context inference
model = RewardWrapper(
model=model,
inference_dataset=buffer, # see above how to download and create a buffer
num_samples_per_inference=100_000,
inference_function="reward_wr_inference",
max_workers=80,
)
# create the evaluation from humenv
reward_eval = humenv.bench.RewardEvaluation(
tasks=["move-ego-0-0"],
env_kwargs={
"state_init": "Default",
},
num_contexts=1,
num_envs=50,
num_episodes=100
)
scores = reward_eval.run(model)
You can do the same for the other evaluations provided in humenv.bench. Please refer to tutorial_benchmark.ipynb for a full evaluation loop.
Rendering a reward-based or tracking policy
We show how to render an episode with a reward-based policy.
import os
os.environ["OMP_NUM_THREADS"] = "1"
from humenv import STANDARD_TASKS
import mediapy as media
task = STANDARD_TASKS[0]
model = FBcprModel.from_pretrained("facebook/metamotivo-S-1", device="cpu")
rew_model = RewardWrapper(
model=model,
inference_dataset=buffer, # see above how to download and create a buffer
num_samples_per_inference=100_000,
inference_function="reward_wr_inference",
max_workers=40,
process_executor=True,
process_context="forkserver"
)
z = rew_model.reward_inference(task)
env, _ = make_humenv(num_envs=1, task=task, state_init="DefaultAndFall", wrappers=[gymnasium.wrappers.FlattenObservation])
done = False
observation, info = env.reset()
frames = [env.render()]
while not done:
obs = torch.tensor(observation.reshape(1,-1), dtype=torch.float32, device=rew_model.device)
action = rew_model.act(obs=obs, z=z).ravel()
observation, reward, terminated, truncated, info = env.step(action)
frames.append(env.render())
done = bool(terminated or truncated)
media.show_video(frames, fps=30)
It is also easy to render a policy for tracking a motion.
import os
os.environ["OMP_NUM_THREADS"] = "1"
from metamotivo.wrappers.humenvbench import TrackingWrapper
from pathlib import Path
from humenv.misc.motionlib import MotionBuffer
model = FBcprModel.from_pretrained("facebook/metamotivo-S-1", device="cpu")
track_model = TrackingWrapper(model=model)
motion_buffer = MotionBuffer(files=ADD_THE_DESIRED_MOTION, base_path=ADD_YOUR_MOTION_ROOT, keys=["qpos", "qvel", "observation"])
ep_ = motion_buffer.get(motion_buffer.get_
Related Skills
node-connect
340.5kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
84.2kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
340.5kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
commit-push-pr
84.2kCommit, push, and open a PR
