SkillAgentSearch skills...

LIME

Official code for paper LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

Install / Use

/learn @tonywu95/LIME
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

This is the official code repository for LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning

There are two dirs under this file:

  • reason/generate_data.py generates all but one task introduced in LIME (see Appendix B): induct, deduct, abduct, induct_v2, induct_v3, induct_rewrite, rewrite.
  • rewrite_multi/generate_data.py generates two variants of rewrite_multi_step introduced in LIME, Appendix B.1. The version "rewrite_multistep_hard" is the one described in the paper, which is also recommended to use.

Arguments of generate_data.py:

  • To generate synthetic data, we recommend generating 5 to 10 M examples, specified by the arg num_train.

  • To generate a mixed of tasks, one simply put multiple task names after arg mode. For example: python reason/generate_data.py --mode induct deduct abduct generates examples that's a mixed of three tasks, which is called task "mixed" in the paper.

  • The arg vocab_size is the vocab size of the synthetic task. In LIME, we had a discussion about the effect of this in Appendix C.1. It seems matter somewhat. Empirically we observed that the closer to the downstream task's vocab size, the better. But for larger vocab size it becomes harder to train. So we recommend vocab_size 1000, and it seems to work well for most of our tasks. Usually, it takes 3-5M steps to converge for vocab_size 1000, with batch size 444096.

Related Skills

View on GitHub
GitHub Stars29
CategoryEducation
Updated11mo ago
Forks5

Languages

Python

Security Score

67/100

Audited on Apr 7, 2025

No findings