EliteFurretAI
Attempt to create a superhuman bot to play VGC
The goal of this project is to build a superhuman bot to play Pokemon VGC. It is not to further research, nor to build a theoretically sound approach -- the goal is to be the very best, like no one ever was. We will only contribute to research or take theoretically sound approaches if doing so helps us towards our ultimate goal.
Table of Contents:
- Goals & Priorities
- Summary of the VGC Problem Space
- Current Proposed Approach
- Multi-Headed Transformer Model
- Why the name?
- Resources
- Contributors & Acknowledgements

Goals & Priorities
This project is pretty big, and so there is a sequence of milestones we want to accomplish:
- Basic Foundations: We want to build simple utilities extending poke-env to make it easier for ourselves and other researchers to build a VGC RL or supervised learning bot off the shelf.
- Build a VGC Bot: We want to build a bot using the above utilities.
- Derive Teambuilding: Once our bot reaches superhuman play, we can use it with a sample of teams from the current meta to derive an optimal teambuilding strategy via brute force.
- Create Furret-based teams: With the above, we can constrain our bot to always bring Furret to its matchups, helping us derive the most optimal usage of this monster of a Pokemon. Imagine a world in which Furret dominates a VGC meta!
- Incorporate into games: With a strong bot, playing Pokemon will become intensely challenging and strategic.
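As a sketch of what those poke-env foundations might look like, here is a minimal greedy player. The commented `MaxDamagePlayer` wiring follows poke-env's documented `Player` interface, but treat it as illustrative; the pure scoring helpers (`score_move`, `pick_best`) are our own names, not part of poke-env:

```python
# Minimal sketch of a greedy move chooser. score_move/pick_best are our
# own illustrative helpers; the commented class below shows how they
# would wire into poke-env's Player interface (API assumed from docs).

def score_move(base_power: float, accuracy: float) -> float:
    """Expected-damage proxy: base power weighted by accuracy."""
    return base_power * accuracy

def pick_best(moves):
    """Return the index of the move with the highest expected damage.

    moves: list of (base_power, accuracy) tuples."""
    return max(range(len(moves)), key=lambda i: score_move(*moves[i]))

# Wiring into poke-env (not run here; requires a Showdown server):
#
# from poke_env.player import Player
#
# class MaxDamagePlayer(Player):
#     def choose_move(self, battle):
#         if battle.available_moves:
#             best = max(battle.available_moves,
#                        key=lambda m: m.base_power * (m.accuracy or 1))
#             return self.create_order(best)
#         return self.choose_random_move(battle)
```

A real VGC utility layer also needs doubles-specific pieces -- targeting, paired orders for two active Pokemon -- which poke-env models separately from the singles interface.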
Summary of the VGC Problem Space
- In the purest sense, a VGC battle is an imperfect information zero-sum two player game with a very large simultaneous action space, complex game mechanics, a high degree of stochasticity and an effectively continuous state space.
- VGC is an incredibly difficult AI problem. The fact that there is a large pool of top players (and they’re hard to sort) demonstrates the difficulty of this problem even for humans.
- After reading a wide array of literature, we suggest tackling VGC directly (instead of through Singles) because of its 40x action space, 3000x branching factor and the additional importance of game interactions. These factors necessitate that an agent understand game mechanics more deeply and be more computationally efficient.
- Given these properties of VGC and top existing bots, we will attempt to use a model-based search algorithm with depth-limited + heavily pruned search and a Nash Equilibrium-based value function that does not assume unobservable information. We plan to initialize our agent with human data and train using self-play.
- There is still quite a lot we need to understand about specifically how VGC behaves in order to make more informed algorithmic choices, and so this approach is very likely to change as we learn more.
- Industry’s dominance in making State of the Art agents demonstrates that with enough talent, capacity and infrastructure, virtually all problems of VGC’s nature can be solved. However, assessing the current state of resources available to us, the current bottlenecks for developing a successful agent are (in order):
- Talent – Very few agents have seen dedicated and organized support over a span of more than 12 months; having a dedicated and organized team is crucial.
- Engine – A faster Pokemon engine with the ability to simulate battles (where we can control the RNG). This is being worked on by pmarglia.
- Capacity – CPU for generating training data, GPU for inference
- Human Training Data – while not essential, this will accelerate training convergence by orders of magnitude, reduce capacity needs and accelerate our own internal learning speed tremendously. It will also help our bot transition to playing humans more easily.
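To make the "Nash Equilibrium-based value function" concrete: because both players move simultaneously, good play is often a mixed strategy over mind games like Protect vs. attack. Below is a toy sketch (the payoff values are invented purely for illustration) that recovers the equilibrium mix via fictitious play, a classic procedure that provably converges in zero-sum games:

```python
def fictitious_play(payoff, iters=50000):
    """Approximate a Nash equilibrium of a zero-sum simultaneous-move game.

    payoff[i][j] is the row player's payoff; row maximizes, column
    minimizes. Returns the two empirical mixed strategies."""
    n, m = len(payoff), len(payoff[0])
    row_counts, col_counts = [0] * n, [0] * m
    row_counts[0] = col_counts[0] = 1  # arbitrary starting actions
    for _ in range(iters):
        # Each side best-responds to the other's empirical mix so far.
        r = max(range(n), key=lambda i: sum(payoff[i][j] * col_counts[j]
                                            for j in range(m)))
        c = min(range(m), key=lambda j: sum(payoff[i][j] * row_counts[i]
                                            for i in range(n)))
        row_counts[r] += 1
        col_counts[c] += 1
    return ([x / sum(row_counts) for x in row_counts],
            [x / sum(col_counts) for x in col_counts])

# Toy "Protect mind game": rows = [Protect, Attack], cols = [Attack, Set up].
# Invented payoffs with matching-pennies structure -> 50/50 equilibrium mix.
MIND_GAME = [[1, -1],
             [-1, 1]]
```

In real VGC the matrices are enormous and the payoffs come from a learned value function, which is exactly why search and game abstractions matter.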
Current Proposed Approach
From our synthesis of available literature, we’ve gleaned:
- Model-free RL alone is unlikely to produce superhuman performance without capacity beyond what we have available
- Search is necessary for decision-time planning, and game abstractions are necessary to make search tractable
- The behavior of VGC from a game-theoretic perspective is still unknown, and theory might not help the practical purposes of making a superhuman bot.
Because of this last point, any approach we suggest pre-hoc is very likely to change as we learn more about what works in practice and how VGC behaves. That being said, we feel the best approach will likely be both of:
- Policy-based – based on Nash Equilibrium, using Deep Learning to create the best policy/value networks that generalize well to the game. This allows the most flexibility for decision-time planning. These networks will likely have to come from a combination of classic self-play RL and imitation learning.
- Search-based – during decision-time planning, we should explore MCTS guided by our policy and value networks. This lets us better handle nuances of game mechanics that RL might not fully grasp. We can use different types of game abstractions to speed up this process and make it more tractable. Search will unequivocally be critical given the game mechanic complexity and the high cost of mistakes in VGC; RL alone, with our current resources, is unlikely to be sufficient.
Ultimately, we think that Search-based will be the quickest way to get to peak human levels, and that policy-based (or methods that combine policies with search) will get to superhuman performance.
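As a concrete sketch of "MCTS guided by our Policy and Value networks": the standard AlphaZero-style selection rule scores each child node by its value estimate plus a policy-prior-weighted exploration bonus. This is the generic PUCT formula, not EliteFurretAI's actual implementation:

```python
import math

def puct_score(q, prior, parent_visits, child_visits, c_puct=1.5):
    """AlphaZero-style PUCT: value estimate plus a policy-prior-weighted
    exploration bonus that decays as the child accumulates visits."""
    return q + c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)

def select_child(children, c_puct=1.5):
    """Pick the child index maximizing the PUCT score.

    children: list of dicts with keys 'q', 'prior', 'visits'."""
    parent_visits = sum(ch["visits"] for ch in children) + 1
    return max(range(len(children)),
               key=lambda i: puct_score(children[i]["q"],
                                        children[i]["prior"],
                                        parent_visits,
                                        children[i]["visits"],
                                        c_puct))
```

Early in search the prior (policy network) dominates, so strong imitation-learned priors let us prune the huge VGC action space aggressively; as visits accumulate, the value estimates take over.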
There is quite a lot of complexity in the above, and we encourage you to check out the doc linked in the Resources section if you want to learn more about the sequencing of steps and models needed to build all this out.
Where am I right now?
Currently, I've built supervised deep learning models that predict a human's action with the following accuracy (~135M parameters each):
- Teampreview:
- Top-1: 99.9%
- Top-3: 99.9%
- Top-5: 99.9%
- Takeaway: The dataset has little strategic variation; analysis shows that for a given team composition, 88.6% of the time a player will make the same teampreview choice. This means our model is essentially a look-up table. The model is overfit, but that is the nature of this dataset.
- Turn Actions (Move Choice):
- Top-1: 28.9%
- Top-3: 43.5%
- Top-5: 53.8%
- Takeaway: Predicting the exact move a human makes is difficult due to playstyle variety and "rock-paper-scissors" scenarios. However, the Top-5 accuracy suggests the model consistently identifies the pool of reasonable moves.
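For reference, the Top-k numbers above are computed in the standard way: a prediction counts as correct if the human's actual action is among the model's k highest-scoring actions. A minimal sketch (the scores below are made-up values, not our model's outputs):

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of examples whose true label is among the k highest scores.

    scores: list of per-action score lists; labels: true action indices."""
    hits = 0
    for s, y in zip(scores, labels):
        top_k = sorted(range(len(s)), key=lambda i: s[i], reverse=True)[:k]
        hits += y in top_k
    return hits / len(labels)
```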
- Win Advantage:
- Correlation: 0.856
- Accuracy w/ Win Prediction: 68.7%
- Brier Score: 0.2026
- Takeaway: The win model is doing a decent job at predicting state advantage, better than the average state evaluation used by Foul Play.
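For readers unfamiliar with the Brier score: it is the mean squared error between the predicted win probability and the 0/1 outcome, so 0 is perfect and an uninformative always-predict-0.5 model scores 0.25 -- our 0.2026 beats that baseline. A sketch:

```python
def brier_score(win_probs, outcomes):
    """Mean squared error between predicted win probability and the
    actual 0/1 outcome. Lower is better; 0.25 = always predicting 0.5."""
    return sum((p - o) ** 2 for p, o in zip(win_probs, outcomes)) / len(win_probs)
```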
- Unified Model:
- Win Correlation: 0.726
- Top 1/3/5 Move Accuracy: 26%/40%/48%
- Top 1/3/5 Teampreview Accuracy: 79%/95%/99%
- Takeaway: This model is a single well-rounded model that balances all tasks. It doesn't overfit on teampreview and slightly underperforms against models that are trained on a single objective.
Now, I'm starting with RL -- for now, I'm going with the following approach:
- RNaD, based on the algorithm's recent superhuman performance in Stratego, a game very similar to Pokemon
- Arena training vs self, ghosts (past selves), exploiters and BV model -- to ensure our agent learns robust strategies with many teams (this will be key for learning how to play with Furret and future teambuilding applications)
- Portfolio-based regularization -- I'll be experimenting with regularizing against a portfolio of models, vs a single reference model.
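The core idea of RNaD can be sketched compactly: the published algorithm (used by DeepMind for Stratego) transforms the reward so that deviating from a regularization policy is penalized, then periodically refreshes that reference policy. Below is a hedged one-function sketch of the reward transform only; `eta` and the probabilities are illustrative, and the full algorithm is considerably more involved:

```python
import math

def rnad_reward(reward, pi_prob, pi_reg_prob, eta=0.2):
    """RNaD-style reward transform for the acting player: the raw reward
    minus a penalty for deviating from the regularization policy.

    A portfolio variant (what we plan to experiment with) could average
    this penalty over several reference policies instead of one."""
    return reward - eta * math.log(pi_prob / pi_reg_prob)
```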
Why the name EliteFurretAI?
As mentioned above, the penultimate goal of this work is to make Furret central to the VGC meta. Because Nintendo refuses to give Furret the Simple/Shadow Tag/Pure Power/Adaptability/Prankster buffs it desperately needs, only a superhuman AI will be able to build around this dark horse and unleash its latent potential. This bot is the first step to doing so; once it can accurately value starting positions, we can use it to start building teams with basic meta stats.
Eventually, we hope that this AI can be used to build and use a competitive team centered around Furret -- one that will be deserving of surpassing all Elite Fours, and even potentially replacing in-game AI. Hence the name "EliteFurret". We chose to stick with AI at the end of the name so players internalize they are being owned by a robot that profoundly understands the capabilities of this monster.

Resources
More details on this approach, thinking and understanding that led to everything in this README can be found here.
Contributors & Acknowledgements
It's definitely presumptuous to acknowledge people before EliteFurretAI amounts to anything, but I do have a couple of people I want to call out that have been instrumental to even getting this project off the ground.