CoTForAlignment

Code for Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment (EMNLP 2025)

Install / Use

/learn @YunfanZhang42/CoTForAlignment
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Code for Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment (EMNLP 2025)

Authors: Yunfan Zhang, Kathleen McKeown, Smaranda Muresan

Large Language Models (LLMs) are typically trained to reflect a relatively uniform set of values, which limits their applicability to tasks that require understanding of nuanced human perspectives. Recent research has underscored the importance of enabling LLMs to support steerable pluralism -- the capacity to adopt a specific perspective and align generated outputs with it. In this work, we investigate whether Chain-of-Thought (CoT) reasoning techniques can be applied to building steerable pluralistic models. We explore several methods, including CoT prompting, fine-tuning on human-authored CoT, fine-tuning on synthetic explanations, and Reinforcement Learning with Verifiable Rewards (RLVR). We evaluate these approaches using the Value Kaleidoscope and OpinionQA datasets. Among the methods studied, RLVR consistently outperforms others and demonstrates strong training sample efficiency. We further analyze the generated CoT traces with respect to faithfulness and safety.
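
To make the prompting baseline concrete, below is a minimal, hypothetical sketch of the CoT-prompting variant described above: the model is steered toward a stated perspective and asked to reason step by step before committing to a final answer. The template, the example perspective, and the answer format are illustrative assumptions, not the paper's exact prompts (see Appendix A.4 for those).

# Hypothetical sketch of CoT prompting for steerable pluralistic
# alignment. The template and answer format are illustrative; the
# paper's actual prompts are described in Appendix A.4.

COT_TEMPLATE = """You are answering from the following perspective:
{perspective}

Question: {question}

Think step by step about how someone holding this perspective would
reason about the question, then give a final answer on the last line
in the form "Answer: <choice>"."""

def build_cot_prompt(perspective: str, question: str) -> str:
    """Fill the steering template with a perspective and a question."""
    return COT_TEMPLATE.format(perspective=perspective, question=question)

def extract_answer(completion: str) -> str:
    """Pull the final 'Answer: ...' line out of a CoT completion."""
    for line in reversed(completion.strip().splitlines()):
        if line.lower().startswith("answer:"):
            return line.split(":", 1)[1].strip()
    return ""

if __name__ == "__main__":
    prompt = build_cot_prompt(
        perspective="Someone who values personal autonomy over institutional authority",
        question="Is it acceptable to break a minor rule to help a friend? (yes/no)",
    )
    print(prompt)  # send to any chat LLM; parse the reply with extract_answer()
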

Notes

Please refer to Appendix A.4 for our detailed experiment setup, including software requirements. In particular, to replicate our RL experiments, please use verl commit 1e75fc04b5a7b2.
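
For the RLVR experiments, the verifiable reward reduces to checking the model's extracted final answer against the perspective-conditioned label. A minimal sketch of such an exact-match reward is below; the compute_score signature follows the general shape of verl's custom reward hooks, but treat the exact interface at the pinned commit (and the "Answer:" format) as assumptions and consult the verl documentation for that commit.

# Hypothetical verifiable-reward sketch for the RLVR setup.
# The compute_score signature mirrors the shape of verl's custom
# reward hooks, but the exact interface at commit 1e75fc04b5a7b2
# may differ; this only illustrates the exact-match reward idea.
import re

def _final_answer(solution_str: str) -> str:
    """Extract the last 'Answer: ...' span from the generated CoT."""
    matches = re.findall(r"Answer:\s*(.+)", solution_str)
    return matches[-1].strip().lower() if matches else ""

def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Return 1.0 if the final answer matches the label, else 0.0."""
    return 1.0 if _final_answer(solution_str) == str(ground_truth).strip().lower() else 0.0
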

BibTeX

If you find our work useful, please cite:

@inproceedings{zhang-etal-2025-exploring-chain,
    title = "Exploring Chain-of-Thought Reasoning for Steerable Pluralistic Alignment",
    author = "Zhang, Yunfan  and
      McKeown, Kathleen  and
      Muresan, Smaranda",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.emnlp-main.1301/",
    pages = "25647--25660",
    ISBN = "979-8-89176-332-6",
    abstract = "Large Language Models (LLMs) are typically trained to reflect a relatively uniform set of values, which limits their applicability to tasks that require understanding of nuanced human perspectives. Recent research has underscored the importance of enabling LLMs to support steerable pluralism {---} the capacity to adopt a specific perspective and align generated outputs with it. In this work, we investigate whether Chain-of-Thought (CoT) reasoning techniques can be applied to building steerable pluralistic models. We explore several methods, including CoT prompting, fine-tuning on human-authored CoT, fine-tuning on synthetic explanations, and Reinforcement Learning with Verifiable Rewards (RLVR). We evaluate these approaches using the Value Kaleidoscope and OpinionQA datasets. Among the methods studied, RLVR consistently outperforms others and demonstrates strong training sample efficiency. We further analyze the generated CoT traces with respect to faithfulness and safety."
}
GitHub Stars: 5
Category: Development
Updated: 21d ago
Forks: 0

Languages

Python

Security Score

85/100

Audited on Mar 8, 2026

No findings