README
Summary: What is DAAF?
<img width="4096" height="1296" alt="DAAF Logo" src="https://github.com/user-attachments/assets/616fae4e-2bd7-44aa-a52c-954d473dbb10" />
DAAF, the Data Analyst Augmentation Framework, is an open-source, extensible workflow for Claude Code that allows skilled researchers to rapidly scale their expertise and accelerate data analysis by as much as 5-10x -- without sacrificing the transparency, rigor, or reproducibility demanded by our core scientific principles. Install and begin using it in as little as 10 minutes on a fresh computer with a high-usage Anthropic account.
DAAF explicitly embraces the fact that LLM research assistants will never be perfect and can never be trusted as a matter of course. But by providing strict guardrails, enforcing best practices, and ensuring the highest levels of auditability possible, DAAF ensures that LLM research assistants can still be immensely valuable for critically-minded researchers capable of verifying and reviewing their work. In energetic and vocal opposition to the deeply misguided attempts to replace human researchers, DAAF is intended to be a force-multiplying "exo-skeleton" for human researchers (i.e., firmly keeping humans-in-the-loop).
The base framework comes ready to analyze any or all of the 40+ foundational public education datasets available via the Urban Institute Education Data Portal, and is readily extensible to new data domains and methodologies with a suite of built-in tools to ingest new data sources and craft new Skill files at will (see 10-minute tutorial here).
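For context, the Education Data Portal also exposes a public REST API that a data-ingestion skill could call. Here is a minimal sketch in Python of building a request URL for it; the URL pattern follows the portal's documented scheme, but treat the specific segment names (`schools`, `ccd`, `directory`) as assumptions to verify against the portal's own documentation:

```python
def education_data_url(level, source, topic, year):
    """Build a request URL for the Urban Institute Education Data Portal API.

    The portal's documented URL pattern is roughly:
      https://educationdata.urban.org/api/v1/{level}/{source}/{topic}/{year}/
    The exact segment names below are assumptions; check the portal docs.
    """
    base = "https://educationdata.urban.org/api/v1"
    return f"{base}/{level}/{source}/{topic}/{year}/"

# Example: Common Core of Data school directory records for 2020
url = education_data_url("schools", "ccd", "directory", 2020)
```

A DAAF data-source skill would wrap calls like this in a documented, rerunnable script so that every fetched dataset is traceable back to the exact query that produced it.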
With DAAF, you can go from a research question to a shockingly nuanced research report with sections for key findings, data/methodology, and limitations, as well as bespoke data visualizations, with only five minutes of active engagement time, plus the necessary time to fully review and audit the results. To that crucial end of facilitating expert human validation, all projects come complete with a fully reproducible, documented analytic code pipeline and consolidated analytic notebooks for exploration. Then: request revisions, rethink measures, conduct new subanalyses, run robustness checks, and even add additional deliverables like interactive dashboards, policymaker-focused briefs, and more -- all with just a quick ask to Claude. And all of this can be done in parallel with multiple projects simultaneously.
By open-sourcing DAAF as a forever-free and open and extensible framework (see more on Why open-source? What does it mean for DAAF? below), I hope to provide a foundational resource that the entire community of researchers and data scientists can use, benefit from, learn from, and extend via critical conversations and collaboration together. By pairing DAAF with an intensive array of educational materials, tutorials, blog deep-dives, and videos via project documentation and the DAAF Field Guide Substack (much, much more to come!!), I also hope to rapidly accelerate the readiness of the scientific community to genuinely and critically engage with AI disruption and transformation in our field writ large.
I don't want to oversell it: DAAF is far from perfect (much more on that below!). But it is already extremely useful, and my intention is that this is the worst that DAAF will ever be from now on given the rapid pace of AI progress and (hopefully) community contributions from here. What will tools like this look like by the end of next month? End of the year? In two years? Opus 4.6 and Codex 5.3 came out literally as I was writing this! The implications of this frontier, in my view, are equal parts existentially terrifying and potentially utopic. With that in mind – more than anything – I just hope all of this work can somehow be useful for my many peers and colleagues trying to "catch up" to this rapidly developing (and extremely scary) frontier.
Get the gist and just want to see it in action? Watch the 10-minute demo walking you through all the main functionalities of DAAF with the corresponding full sample project ready for your review (full archive here; main report file here). Never used Claude Code? No idea where you'd even start? My full installation guide walks you through every step -- but hopefully this video shows how quick a full DAAF installation can be from start to finish. Just 3 minutes!
Learn more about my vision for DAAF, what makes DAAF different from other attempts to create LLM research assistants, what DAAF currently can and cannot do as of today, how you can get involved, and how you can get started with DAAF yourself by diving in below.
User Documentation Table of Contents
- 00. README — [This document] Vision and purpose, project goals, what DAAF does and does not do, core design philosophy, acknowledgments
- 01. Installation & Quick Start — Get started! Installation prerequisites, step-by-step 5-minute setup, day-to-day usage, and troubleshooting
- 02. Understanding and Working with DAAF — Learn to work with DAAF for the first time: what to expect, how to use it, and how to test its strengths and limitations
- 03. Best Practices — Tips for working with Claude Code, writing effective prompts, ensuring quality and rigor with DAAF, reviewing outputs, and managing context
- 04. Extending DAAF — How to add new data source skills and analytical tools and methodologies, and create your own additional specialized agents
- 05. Contributing to DAAF — Get involved in developing DAAF! How to file issues via GitHub, support expanding the capabilities of the framework, contribute to educational tutorials and how-to's, and more!
- 06. FAQ: Philosophy — Grapples with the broader implications of this work, AI automation in general, model advancement pace, approaching the "exponential", environmental ethics, what this means for the next generation of researchers, and more
- 07. FAQ: Technical Support — Covers frequently asked questions about Docker, issues with Claude Code, usage limits, authentication errors, and other common errors
README Table of Contents
- Summary: What is DAAF?
- Vision & Purpose
- What DAAF can do today
- What DAAF can do with your help
- Why open-source? What does it mean for DAAF?
- Recommended Next Steps
- Acknowledgments
- About the Author
Vision & Purpose
This project attempts to answer a critical question facing all of scientific research and data science today: How can responsible researchers use modern AI/LLM agents to meaningfully accelerate complex quantitative data analysis tasks while maintaining the rigor and reproducibility that good science demands?
I firmly believe any real answer to this question in the current LLM-centric paradigm of AI will need to fulfill four core requirements to be useful:
- 1. Transparent: Because LLMs will always be susceptible to lying, hallucinating, and cutting corners, researchers need to be able to easily audit and inspect everything an LLM produces at every step
- 2. Scalable: Because most LLMs are trained as generalists susceptible to sycophancy and overconfidence, researchers need to be able to easily provide structured, targeted expertise and guidance to LLM agents at scale -- injecting the right information at the right time for any specific task the LLM engages with, using minimal effort
- 3. Rigorous: Because LLMs can work at speeds orders of magnitude faster than humans, researchers need to be assured that the general output of any LLM assistant is of high enough quality by default to be worth producing and reviewing -- minimizing (but not eliminating) the share of work produced that is AI slop
- 4. Reproducible: Because good science needs to be reproducible, researchers need to be able to reproduce everything LLMs do on their behalf through well-documented and executable code from start-to-finish
DAAF is a set of forever-free, completely open-source, and highly structured workflows to operationalize and enforce those four core requirements when using Claude Code, a popular LLM agent program built on Anthropic's Claude AI. Any researcher willing to pay for a high-usage Anthropic account can begin using this framework in as little as 10 minutes (a regrettably high barrier-to-entry, with prices starting at $100-200/mo given the resource intensity of the frontier models capable of this work). DAAF ultimately allows researchers to more confidently and productively leverage Claude Code for rigorous research work by addressing each core requirement in its fundamental design principles:
- 1. Transparent: DAAF forces Claude Code to operate using file-first principles: All data inspections, operations, and analyses it conducts are first drafted and then run as actual Python files, enforcing full transparency into everything it does with the data. Any and all "thinking" it does during the analytic process are
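To make the file-first idea concrete, here is a hedged sketch of what one such auditable analysis step might look like. This is an invented illustration, not DAAF's actual generated code -- the file names and structure are assumptions for the example:

```python
# Hypothetical illustration of DAAF's "file-first" principle: every data
# operation lives in a runnable, reviewable script rather than in an
# opaque chat turn. This is NOT DAAF's actual output; the names here
# are invented for illustration only.
import csv

def mean_of_column(path, column):
    """Compute the mean of a numeric CSV column, skipping blank cells."""
    with open(path, newline="") as f:
        values = [float(row[column]) for row in csv.DictReader(f) if row[column]]
    return sum(values) / len(values)

# In a real project, a step like this would read an explicit input file
# (e.g. data/enrollment.csv) and write its result to an output file, so
# a human reviewer can rerun the exact step and audit the number it
# produced -- nothing happens that isn't captured in a script on disk.
```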