
AnthropomorphicIntelligence

Advancing AI by embracing human-likeness for better AI understanding, human–AI collaboration, and social simulation, bridging technology and genuine human experience.

Install / Use

/learn @microsoft/AnthropomorphicIntelligence

Anthropomorphic Intelligence

Towards Human-Like AI: The Shaping of Artificial Mind


Background

Recent years have witnessed remarkable advancements in AI’s ability to perform objective tasks—ranging from mathematics and coding to various forms of logical reasoning. However, as we approach a society of human-AI symbiosis, crucial dimensions of human-like cognition, ideology, and consciousness—essential for rich, meaningful human-AI interactions—have remained underexplored.

Looking forward, AI agents that can understand, empathize with, and support users, much like close family and friends, will increasingly deliver value across business and societal domains. Companion-like AI systems that offer emotional support, proactive engagement, and personalized experiences have the potential to act as digital humans in diverse scenarios such as AI non-player characters (NPCs), social companions, VTubers, digital colleagues, personal coaching, and large-scale societal simulations.


Project Goals

Unlike efforts focused on super-intelligent AI, this project centers on evaluating and promoting anthropomorphic intelligence—AI agents with a human-like mindset and a degree of awareness, capable of acting proactively and autonomously. Our goals are to:

  • Equip AI agents with cognitive traits and social reasoning capacities inspired by humans.
  • Enable rich, sustained social interactions and collaboration between AI agents, humans, and other AI agents.
  • Provide personalized, human-preferred services in business and societal domains.

This repository gathers and develops various techniques that contribute toward these objectives.


Techniques & Sub-Projects

1. PCC: Embedding-based Context Compression

PCC is a technique inspired by human cognitive patterns, aiming to enable large language models (LLMs) to efficiently process long contexts by converting context signals into compact, dense representations. This decoupled compressor-LLM framework leverages embedding-based context compression to significantly reduce inference costs while maintaining essential contextual information and accuracy. Thorough pretraining and adaptive compression rates allow PCC to improve LLM efficiency across various tasks, models, and domains—making it well-suited for real-world applications, especially in resource-constrained environments.

Key Features:

  • Embedding-based condensed compression for efficiency
  • Decoupled compressor-LLM architecture that leaves the downstream LLM untouched
  • Adaptability to various LLMs and downstream tasks
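The decoupled interface can be pictured with a toy compressor. The sketch below, which assumes nothing about PCC's actual trained architecture and uses illustrative names only, mean-pools chunks of token embeddings into a shorter sequence of dense vectors for the downstream model to consume:

```python
import numpy as np

def compress_context(token_embeddings: np.ndarray, rate: int) -> np.ndarray:
    """Toy embedding-based compressor: mean-pool each run of `rate`
    consecutive token embeddings into one dense memory vector.
    PCC's real compressor is a trained model; this only illustrates
    the interface (long context in, short dense sequence out)."""
    n, d = token_embeddings.shape
    pad = (-n) % rate  # pad so the sequence length divides evenly
    if pad:
        token_embeddings = np.vstack([token_embeddings, np.zeros((pad, d))])
    return token_embeddings.reshape(-1, rate, d).mean(axis=1)

# A 1000-token context at compression rate 8 becomes 125 dense vectors,
# which the (untouched) downstream LLM would read as soft prompt tokens.
ctx = np.random.default_rng(0).normal(size=(1000, 64))
memory = compress_context(ctx, rate=8)
print(memory.shape)  # (125, 64)
```

Because the compressor is decoupled, swapping the downstream LLM only requires that it accept the compressed vectors as input, not retraining the whole stack.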

2. MotiveBench: Benchmarking Human-like Motivation

MotiveBench is a comprehensive benchmark designed to evaluate and advance the ability of AI agents to demonstrate human-like motivations and proactive behaviors. By presenting 200 rich contextual scenarios and 600 reasoning tasks across multiple motivational levels—including emotional, social, and practical drivers—MotiveBench rigorously tests whether LLMs can autonomously identify and pursue meaningful actions, not just respond reactively. Analysis across multiple popular model families reveals key challenges, such as reasoning about “love & belonging” motivations, and highlights the current gap between AI and true human-like motivational reasoning.

Key Features:

  • 200 rich contextual scenarios and 600 reasoning tasks
  • Three task types spanning multiple motivational levels: motivation reasoning, behavior reasoning, and behavior prediction
  • Cross-model benchmarking and insights
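A harness for a benchmark of this shape can be sketched as follows; the field and function names are hypothetical and do not reflect MotiveBench's actual schema, only the idea of scoring multiple-choice reasoning tasks per task type:

```python
from dataclasses import dataclass

@dataclass
class MotiveTask:
    scenario: str    # rich context describing the character's situation
    level: str       # e.g. "motivation", "behavior", "prediction"
    question: str
    choices: list
    answer: int      # index of the gold choice

def score(tasks, predict):
    """Hypothetical scorer: accuracy grouped by task level."""
    per_level = {}
    for t in tasks:
        hit = int(predict(t) == t.answer)
        n, k = per_level.get(t.level, (0, 0))
        per_level[t.level] = (n + 1, k + hit)
    return {lvl: k / n for lvl, (n, k) in per_level.items()}

tasks = [
    MotiveTask("Alice skipped lunch to finish a report.", "motivation",
               "Why did Alice skip lunch?", ["esteem", "hunger"], 0),
    MotiveTask("Bob calls his family every Sunday.", "behavior",
               "What does Bob do next Sunday?", ["calls", "forgets"], 0),
]
print(score(tasks, predict=lambda t: 0))
# {'motivation': 1.0, 'behavior': 1.0}
```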

3. SocialCC: Interactive Evaluation for Cultural Competence in Language Agents

SocialCC is a novel benchmark designed to evaluate cultural competence through multi-turn interactive intercultural scenarios. It comprises 3,060 human-written scenarios spanning 60 countries across six continents. Through extensive experiments on eight prominent LLMs, our findings reveal a significant gap between the cultural knowledge stored in these models and their ability to apply it effectively in cross-cultural communication.

Key Features:

  • 3,060 diverse intercultural scenarios spanning 60 countries across six continents
  • Three core evaluation dimensions: cultural awareness, cultural knowledge, and cultural behavior
  • Interactive multi-turn assessment that measures cultural competence in dynamic, context-rich social interactions
  • Comprehensive cross-model analysis identifying misinterpretation of implicit cultural cues and inconsistent handling of value conflicts
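The multi-turn protocol can be pictured with a minimal driver; everything here (the episode structure, the toy agent, the judge) is an illustrative assumption, not SocialCC's actual code:

```python
DIMENSIONS = ("awareness", "knowledge", "behavior")

def run_episode(agent, scenario, judge, max_turns=4):
    """Hypothetical multi-turn driver: the evaluated agent responds to a
    scripted intercultural scenario turn by turn, then a judge scores the
    full transcript on each of the three dimensions (0.0 to 1.0)."""
    transcript = []
    for turn in scenario["turns"][:max_turns]:
        reply = agent(turn, transcript)
        transcript.append((turn, reply))
    return {dim: judge(transcript, dim) for dim in DIMENSIONS}

# Toy agent and judge for illustration only.
scenario = {"country": "Japan",
            "turns": ["You arrive at a business dinner.",
                      "Your host offers you a drink with two hands."]}
polite = lambda turn, hist: "Responds with a culturally aware gesture."
judge = lambda tr, dim: 1.0 if all(reply for _, reply in tr) else 0.0
print(run_episode(polite, scenario, judge))
```

Scoring the whole transcript, rather than isolated answers, is what lets an interactive benchmark separate stored cultural knowledge from the ability to apply it mid-conversation.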

4. LearnArena: Benchmarking Learning Ability

LearnArena is a cognitively grounded benchmark for assessing how LLMs learn—not just solve static tasks—across three dimensions: Learning from Instructor (interactive feedback), Learning from Concept (rule summaries), and Learning from Experience (self-selected trajectory reuse). Built on a modified TextArena setup, it standardizes a two-player loop where the evaluated model plays 20 matches per environment, receives teacher feedback, conditions on concise rules, and leverages prior games as in-context examples.

Key Features:

  • Three learning dimensions: instructor feedback (LfI), concept summaries (LfC), experience trajectories (LfE)
  • Unified protocol: 8 environments, 20 matches per model, fixed teacher opponent, win-rate metric
  • Cross-model benchmarking and insights on scale limits, instructor quality, and few- vs. many-shot behavior
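The unified protocol (fixed opponent, repeated matches, history carried forward as in-context examples) can be sketched as a loop; `play_match` and its return shape are assumptions for illustration, not LearnArena's real API:

```python
def evaluate(play_match, n_matches=20):
    """Sketch of a LearnArena-style protocol: the evaluated model plays
    n_matches against a fixed teacher opponent; feedback and prior
    trajectories accumulate and are fed back into later matches.
    `play_match` is a hypothetical callable:
    history -> (won, trajectory, feedback)."""
    history, wins = [], 0
    for _ in range(n_matches):
        won, trajectory, feedback = play_match(history)
        wins += int(won)
        history.append({"trajectory": trajectory, "feedback": feedback})
    return wins / n_matches  # win-rate metric

# Toy learner that starts losing, then wins once it has 5 prior games
# to condition on, i.e. it visibly "learns from experience".
toy = lambda history: (len(history) >= 5, f"game-{len(history)}", "watch the corners")
print(evaluate(toy))  # 0.75
```

The win-rate of the toy learner rises with history length, which is exactly the signal the benchmark's three learning dimensions are designed to isolate.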

5. PersonaArena: Role-Play Simulation and Evaluation

PersonaArena is a dynamic simulation framework for evaluating persona-level role-playing in LLMs. It builds persona-grounded social scenes, runs multi-turn interactions among a narrator, a protagonist model, and NPCs, and records full action–dialogue trajectories. A multi-agent debating judge then evaluates persona fidelity, coherence, and adaptability, producing detailed and aggregated metrics that support rigorous comparison and improvement.

Key Features:

  • A persona-grounded social simulation framework that elicits behaviors via dynamic, multi-turn interactions
  • A multi-agent debating judge for holistic and unbiased evaluation of role-playing quality
  • Elicited data that can be used for targeted post-training to improve persona consistency and realism
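The multi-agent judging step can be sketched as score aggregation over a recorded trajectory; the judge interface and dimension names below are illustrative assumptions, not PersonaArena's actual implementation:

```python
def debate_judge(trajectory, judges,
                 dims=("fidelity", "coherence", "adaptability")):
    """Hypothetical multi-agent judge: each judge scores the full
    action-dialogue trajectory per dimension, and the scores are
    aggregated. A real debating judge would insert a round where
    judges see each other's votes and may revise before averaging."""
    votes = [{d: judge(trajectory, d) for d in dims} for judge in judges]
    # Debate/revision round would go here.
    return {d: sum(v[d] for v in votes) / len(votes) for d in dims}

trajectory = [("narrator", "A storm traps the party in an inn."),
              ("protagonist", "stays in character and calms the NPCs")]
strict = lambda tr, d: 0.6
lenient = lambda tr, d: 1.0
print(debate_judge(trajectory, [strict, lenient]))
# each dimension averages the two judges' scores
```

Averaging over several judges with different biases is what makes the aggregate score more robust than any single LLM-as-judge verdict.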

6. HumanLLM: Towards Personalized Understanding and Simulation of Human Nature

HumanLLM is a human-centric foundation model designed to enable large language models to understand and simulate individual human behaviors, cognition, and preferences. Built upon the Cognitive Genome Dataset, which aggregates millions of real-world user records from platforms such as Reddit, Twitter, Blogger, and Amazon, HumanLLM learns the relationship between a person’s identity, their environment, and resulting actions. Through a diverse set of training tasks—covering persona understanding, social reasoning, and personalized generation—the model is trained to predict user actions and inner thoughts, mimic user writing styles and preferences, and generate authentic user profiles. Extensive evaluations across in-domain tasks and out-of-domain social intelligence benchmarks demonstrate that HumanLLM significantly improves models’ ability to model human behavior and generate realistic, personalized responses.

Key Features:

  • A large-scale Cognitive Genome Dataset constructed from real-world user records across multiple platforms, supported by a rigorous multi-stage pipeline including data filtering, data synthesis, and automated quality control to produce high-quality behavior logs for training.

  • A model-agnostic multi-task training paradigm that enhances LLMs’ social intelligence through diverse tasks, including profile generation, scenario generation, social question answering, writing imitation, personalized commenting, and preference prediction.

  • HumanLLM achieves superior performance in predicting user actions and inner thoughts, more accurately mimics user writing styles and preferences, and generates more authentic user profiles compared to base models. Furthermore, it shows significant gains on out-of-domain social intelligence benchmarks such as MotiveBench and ToMBench, indicating enhanced generalization.
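The model-agnostic multi-task paradigm can be pictured as rendering each user record into one of several instruction-tuning tasks; the task names echo the list above, but the templates and data shapes are hypothetical:

```python
import random

# Hypothetical task mixture mirroring the multi-task training paradigm:
# each user record is rendered into one instruction-tuning prompt.
TASKS = {
    "profile_generation": lambda u: f"Write a profile for user {u['id']}.",
    "writing_imitation": lambda u: f"Continue this post in {u['id']}'s style.",
    "preference_prediction": lambda u: f"Will {u['id']} like this product?",
}

def build_batch(users, rng):
    """Sample one training task per user record. The output is plain
    (task, prompt) pairs, so any instruction-following LLM can be tuned
    on it, which is what makes the paradigm model-agnostic."""
    batch = []
    for u in users:
        task = rng.choice(sorted(TASKS))
        batch.append({"task": task, "prompt": TASKS[task](u)})
    return batch

users = [{"id": "u1"}, {"id": "u2"}]
batch = build_batch(users, random.Random(0))
print([b["task"] for b in batch])
```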


7. Proact-VL: A Proactive VideoLLM for Real-Time AI Companions

Proact-VL is a general framework that shapes multimodal language models into proactive, real-time interactive agents capable of human-like environment perception and interaction. It is built on multiple backbone models (Qwen2-VL, Qwe
