IMO Lean formalization conventions

This repository presents my (Joseph Myers's) suggestions for appropriate conventions for Lean formalizations of IMO problem statements, along with examples of problem statements formalized following those conventions.

It represents my personal views on appropriate conventions for such formalizations (along with a set of formalizations intended to be written uniformly, by a single person, following those conventions) and is not associated with any project to create such formalizations or to use such formalizations as a challenge for human or AI problem solvers or formalizers. In parts it reflects my dissatisfaction with the lack of public documented conventions for such issues, or with some particular choices that have been made in the past, in existing public collections of such problems or solutions.

People are welcome to use and adapt these conventions for such projects (including ones for non-Lean ITPs) if they so wish. Problem statements here may also be of use for projects such as Compfiles, though it is important to check statements carefully and report any mistakes, given the high risk of making mistakes in formal statements when those statements are not validated by writing a formal solution based on a human solution. There are many common pitfalls to consider when writing Lean.

The conventions here draw on my experience with formalizing some past problem statements and solutions, including both formalizations of IMO problems with solutions for the mathlib archive, and formalizations of statements of all the non-geometry IMO 2024 shortlist problems (with solutions in only a handful of cases) ahead of that IMO while chairing the Problem Selection Committee. However, those past formalizations do not necessarily follow these conventions, both because my views on appropriate conventions evolved based on the experience writing those formalizations, and because of cases where appropriate conventions in the context of a solution seem different from appropriate conventions for a problem stated on its own (for potential use as a challenge by human or AI solvers or formalizers) without a corresponding formal solution present.

These conventions and associated formalizations may change over time (especially as the conventions expand to cover more issues and the formalizations are adapted to follow the conventions more closely), and more formalizations may be added. Corrections to mistakes in the formalizations are particularly welcome, as is pointing out areas of inconsistency between the formalizations and the conventions, or areas not covered by the conventions that should be so covered. Formalizing solutions to the formal statements here is encouraged as a way of checking the correctness of the statements; however, such solutions are out of scope for this repository so should go in other public repositories instead.

Where existing public repositories such as the mathlib archive and Compfiles do not have a formal solution to a problem, those may be appropriate locations for such a solution; I strongly encourage first contributing any more generally relevant definitions and lemmas to mathlib proper since that is one of the main benefits of formalizing competition problem solutions. Where such repositories do have a formal solution, it may be a useful basis for a formal solution to the formal statement here, whether through adapting it to prove my version of the statement directly, or through adding a proof that one version implies the other.

The formalizations here are intended to be based directly on the English wording of problems as originally given to contestants at the IMO. Note that most pre-2006 papers on imo-official are retyped versions, sometimes paraphrased rather than reflecting the original wording; if you'd like me to consider pre-2006 papers for addition to the formalizations here, please contribute scans of papers actually given to contestants for my collection of such scans.

By releasing such conventions I hope to encourage other people writing such formalizations to consider (and document) what conventions they wish to use, especially where a fully literal translation into Lean might involve making choices that are less idiomatic for Lean code (and so more awkward to work with when using mathlib APIs). I especially commend the following points from these conventions to people establishing their own conventions:

(a) use appropriate separate definitions within the problem statement when appropriate (so requiring any AI solvers to be able to handle a context of multiple top-level definitions rather than having all non-library content within a single theorem for which a proof is to be filled in, with local definitions artificially converted into hypotheses of that theorem, as was done for AlphaProof at IMO 2024; being able to handle local definitions is also likely to be important for practical usability of an AI solver in many contexts);

(b) state problems in standard mode (any answer to be determined by a human in the original competition is also to be determined by AI solvers using the formal statement) rather than easy mode (formal statement tells a solver what answer is to be proved), and generally make the formal statements reflect the entirety of what humans are to prove (so also do not ask for only one direction of an if-and-only-if, as some benchmark problems have been known to do, for example);

(c) when any problem is most naturally stated using a mathlib-appropriate definition of a standard concept, contribute that definition and associated API to mathlib rather than having a local definition as part of the problem (or in the case of challenges intended to be based on future problems, contribute the definitions most likely to be relevant to mathlib in advance to reduce the risk of needing a local definition for a future problem statement).

Anyone setting a challenge based on formal problem statements should note that having detailed, public formalization conventions would assist entrants in creating additional examples using those conventions as training material. Compare, for example, the detailed documentation of both English and LaTeX conventions used for problems in the AI Mathematical Olympiad Progress Prize 2.

These conventions are written specifically for IMO problems and the concepts that appear in such problems. It is likely that additional conventions would be appropriate for different competitions covering other subject material (especially those at undergraduate level, for example). If a challenge is based on new problems not originally set for humans in informal language, it might be appropriate in some cases to go further in the direction of idiomatic Lean rather than trying to following any particular English version closely.

Although having such public conventions stated well in advance of any such challenge would help make it fair rather than potentially biased to entrants who happened to prefer statements more similar to those used in the final challenge (and allowing multiple entrants each to produce their own Lean statements would also result in potential bias, as well as being inconsistent with a single natural language normally only having a single version of the problems for human contestants at the IMO), there are of course also valid and important uses for formalizations that deliberately do not follow any kind of uniform conventions. In particular, any kind of practical AI research assistant that could be given problems to attempt by human mathematicians should not expect such statements (if given in formal form at all rather than informal form) to follow any consistent conventions, so should be robust to a wide range of styles in problem statements, which may mean training on a wide range of styles is also beneficial. Even in such cases, it is still important to consider questions such as the handling of junk values, to ensure that the formal statement actually has the correct mathematical content.

These conventions are mainly concerned with problem statements rather than with the details of how to validate the correctness of a claimed formal solution in any kind of automated way; there are various subtleties about such validation and, as of July 2025, such validation is not as convenient as it might be. A solution should have a type definitionally equal to that in the problem statement; for "determine" problems, this applies with the answer substituted, and the answer must satisfy human inspection as being legitimate rather than a restatement of the problem; care must be taken about ensuring that neither meta code nor instances in the claimed solution affect the interpretation of the problem or the claimed answer (possibly by comparing a proof exported from the claimed solution when executed in a sandbox environment with the type of the statement in its own file with the text of a human-validated answer substituted into that file). A solution should depend only on the standard three axioms (in particular, Lean.ofReduceBool, as generated by native_decide, is not typically considered legitimate in validating formalized mathematics, and prevents passing an exported solution though an external checker).

Conventions for problems versus solutions

The conventions here are intended for problems stated on their own rather than problem statements with accompanying solutions. There are a few main ways in which I think conventions for a problem statement with a sol

IMOLean

Install / Use

README

IMO Lean formalization conventions

Conventions for problems versus solutions