Datatrees

Wrapper for dataclasses with auto field injection, binding, self-default and more

A wrapper to Python's dataclasses for simplifying class composition with automatic field injection, binding, self-defaults and more.

Datatrees is particularly useful for composing a class from other classes or functions in a hierarchical manner, where fields from classes deeper in the hierarchy need to be propagated as root class parameters. The boilerplate code for such a composition is often error-prone and difficult to maintain. The impetus for this library came from building hierarchical 3D models where each node in the hierarchy collected fields from nodes nested deeper in the hierarchy. Using datatrees, almost all the boilerplate management of model parameters was eliminated, resulting in clean and maintainable 3D model classes.
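Propagating nested fields to the root amounts to renaming each nested field onto the root class, optionally with a prefix or suffix, and forwarding the values back down whenever a nested object is built. The sketch below shows only that renaming-and-forwarding idea using the standard library's dataclasses. `injected_names` and `build_nested` are hypothetical helpers written for illustration; they are not part of the datatrees API, and this is not how datatrees itself is implemented.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class ConnectionConfig:
    host: str = "localhost"
    port: int = 5432


def injected_names(cls, prefix: str = "", suffix: str = "") -> dict:
    """Map each field of the dataclass `cls` to its name on the root class.

    Hypothetical helper for illustration only; not part of the datatrees API.
    """
    return {f.name: f"{prefix}{f.name}{suffix}" for f in dataclasses.fields(cls)}


def build_nested(cls, root_values: dict, prefix: str = "", suffix: str = ""):
    """Create a fresh `cls` from the root's values, undoing the renaming.

    Fields absent from root_values fall back to the nested class's defaults.
    """
    kwargs = {
        inner: root_values[outer]
        for inner, outer in injected_names(cls, prefix, suffix).items()
        if outer in root_values
    }
    return cls(**kwargs)


print(injected_names(ConnectionConfig, prefix="db_"))
# {'host': 'db_host', 'port': 'db_port'}
print(build_nested(ConnectionConfig, {"db_host": "db.example.com"}, prefix="db_"))
# ConnectionConfig(host='db.example.com', port=5432)
```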

Installation

pip install datatrees

Core Features

  • Field Injection and Binding: Automatically inject fields from other classes or functions
  • Field Mapping: Map fields between classes with custom naming
  • Self-Defaulting: Fields can default based on other fields
  • Field Documentation: Field documentation is preserved through the injection chain
  • Post-Init Chaining: Automatically chain inherited __post_init__ methods
  • Type Annotations: Typing support for static type checkers and shorthands for Node[T]
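For comparison, here is the manual pattern that post-init chaining is meant to replace. With the standard library's dataclasses, a subclass that defines its own `__post_init__` must call `super().__post_init__()` explicitly, or the base class's setup is skipped. This sketch uses plain dataclasses only and does not show datatrees' own chaining mechanism.

```python
from dataclasses import dataclass, field


@dataclass
class Base:
    log: list = field(default_factory=list)

    def __post_init__(self):
        self.log.append("Base")


@dataclass
class Derived(Base):
    extra: int = 0

    def __post_init__(self):
        # Plain dataclasses do not chain __post_init__: omit this call and
        # Base's initialisation is silently skipped.
        super().__post_init__()
        self.log.append("Derived")


print(Derived().log)  # ['Base', 'Derived']
```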

Exports:

  • datatree: The decorator for creating datatrees akin to dataclasses.dataclass(). It accepts all standard dataclasses.dataclass arguments (e.g., init, repr, eq, order, frozen, unsafe_hash, match_args, kw_only, slots, weakref_slot) in addition to datatrees-specific parameters like chain_post_init.
  • dtfield: The decorator for creating fields akin to dataclasses.field()
  • Node: The class for creating node factories. IMPORTANT: When accessed on an instance, a Node field is itself a callable factory that produces new instances of the target type on each invocation. For example, if obj.node_field is a Node[SomeClass], then obj.node_field() creates and returns a new SomeClass instance each time it's called.
  • field_docs: The function for getting the documentation for a datatree field
  • get_injected_fields: Produces documentation on how fields are injected and bound
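The "new instance on every call" behaviour of Node fields can be imitated with the standard library to make it concrete. In this sketch a property returning `functools.partial` stands in for the Node machinery; `Holder` and `SomeClass` are hypothetical classes, and datatrees' real implementation may differ.

```python
import functools
from dataclasses import dataclass


@dataclass
class SomeClass:
    value: int = 0


@dataclass
class Holder:
    # Hypothetical stand-in for a datatree whose Node[SomeClass] field
    # forwards the parent's `value` to SomeClass.
    value: int = 42

    @property
    def node_field(self):
        # Returns a callable that builds a *new* SomeClass on every call,
        # using the parent's current field value.
        return functools.partial(SomeClass, value=self.value)


holder = Holder(value=7)
first = holder.node_field()
second = holder.node_field()
print(first)            # SomeClass(value=7)
print(first is second)  # False: each call creates a fresh instance
```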

Datatrees as a Domain-Specific Composition API

The Challenge of Complex Object Composition

Traditional Python object composition often leads to verbose, error-prone boilerplate code. Consider a typical manual approach:

# Without datatrees - verbose and error-prone
class DatabaseConfig:
    def __init__(self, host="localhost", port=5432, database="mydb", 
                 username="user", password="pass", timeout_ms=5000):
        self.host = host
        self.port = port
        self.database = database
        self.username = username
        self.password = password
        self.timeout_ms = timeout_ms

class RetryPolicy:
    def __init__(self, max_attempts=3, delay_ms=1000, exponential_backoff=True):
        self.max_attempts = max_attempts
        self.delay_ms = delay_ms
        self.exponential_backoff = exponential_backoff

class ConnectionPool:
    def __init__(self, host="localhost", port=5432, database="mydb",
                 username="user", password="pass", timeout_ms=5000,
                 max_attempts=3, delay_ms=1000, exponential_backoff=True,
                 min_connections=5, max_connections=20):
        # Manually forwarding all parameters - error prone!
        self.config = DatabaseConfig(host, port, database, username, password, timeout_ms)
        self.retry = RetryPolicy(max_attempts, delay_ms, exponential_backoff)
        self.min_connections = min_connections
        self.max_connections = max_connections

This approach has several problems:

  • Parameter duplication across constructors
  • Manual parameter forwarding is error-prone
  • No clear visual hierarchy of composition
  • Difficult to maintain as requirements change

Datatrees: A Predictable Pattern for Composition

Datatrees provides a declarative, consistent pattern that acts as a Domain-Specific API (DSAPI) for object composition:

# With datatrees - declarative and maintainable
from datatrees import datatree, Node

@datatree
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "mydb"
    username: str = "user"
    password: str = "pass"
    timeout_ms: int = 5000

@datatree
class RetryPolicy:
    max_attempts: int = 3
    delay_ms: int = 1000
    exponential_backoff: bool = True

@datatree
class ConnectionPool:
    config: Node[DatabaseConfig] = Node(DatabaseConfig, prefix="db_")
    retry: Node[RetryPolicy]
    min_connections: int = 5
    max_connections: int = 20
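One way to read the datatrees declaration is as shorthand for a flat constructor. The plain-dataclasses sketch below is a rough, hand-written approximation: the `db_`-prefixed names follow the `prefix="db_"` option and the `db_host`/`db_port` usage shown elsewhere in this README, while `ConnectionPoolExpanded` is a hypothetical name and `config()`/`retry()` are ordinary methods standing in for Node factories. It is not the code datatrees actually generates.

```python
from dataclasses import dataclass


@dataclass
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "mydb"
    username: str = "user"
    password: str = "pass"
    timeout_ms: int = 5000


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    delay_ms: int = 1000
    exponential_backoff: bool = True


@dataclass
class ConnectionPoolExpanded:
    # Fields from DatabaseConfig, renamed with the "db_" prefix.
    db_host: str = "localhost"
    db_port: int = 5432
    db_database: str = "mydb"
    db_username: str = "user"
    db_password: str = "pass"
    db_timeout_ms: int = 5000
    # Fields from RetryPolicy, injected without a prefix.
    max_attempts: int = 3
    delay_ms: int = 1000
    exponential_backoff: bool = True
    # Fields declared directly on the pool.
    min_connections: int = 5
    max_connections: int = 20

    def config(self) -> DatabaseConfig:
        # Factory: strip the prefix and forward the current values.
        return DatabaseConfig(
            host=self.db_host, port=self.db_port, database=self.db_database,
            username=self.db_username, password=self.db_password,
            timeout_ms=self.db_timeout_ms,
        )

    def retry(self) -> RetryPolicy:
        # Factory: forward the unprefixed retry fields.
        return RetryPolicy(
            max_attempts=self.max_attempts, delay_ms=self.delay_ms,
            exponential_backoff=self.exponential_backoff,
        )


pool = ConnectionPoolExpanded(db_host="prod.example.com", max_attempts=5)
print(pool.config().host)         # prod.example.com
print(pool.retry().max_attempts)  # 5
```

Every parameter forwarded by hand here is one the datatrees version is described as handling automatically.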

Benefits for Developers

  1. Reduced Boilerplate: No manual __init__ methods or parameter forwarding
  2. Clear Composition Hierarchy: Visual structure shows how components relate
  3. Automatic Parameter Management: Fields are automatically injected and managed
  4. Consistent Patterns: Every datatree class follows the same predictable structure
  5. Self-Documenting: The structure itself documents the composition relationships

Benefits for AI and LLMs

The predictable patterns created by datatrees are particularly valuable for AI-assisted development:

  1. Pattern Recognition: LLMs excel at recognizing and replicating consistent patterns. Datatrees provides a clear, repeatable structure that LLMs can easily learn and apply.

  2. Reduced Ambiguity: The declarative syntax limits the "search space" for code generation, leading to more accurate outputs:

    # LLMs can reliably predict this pattern:
    pool = ConnectionPool(
        db_host="prod.example.com",     # Prefix clearly indicates origin
        db_port=5432,                   # Consistent naming pattern
        max_attempts=5,                 # Direct injection from RetryPolicy
        min_connections=10              # Local field
    )
    
  3. Inferrable Usage: The structure makes it clear how to interact with objects:

    # LLMs can infer this usage from the pattern:
    config = pool.config()          # Node fields are callable factories
    retry_policy = pool.retry()     # Consistent access pattern
    
  4. Fewer Hallucinations: The well-defined structure reduces the likelihood of LLMs generating incorrect boilerplate or inventing non-existent parameters.

  5. Composition Understanding: LLMs can easily understand and suggest appropriate compositions based on the Node field patterns.

By adopting datatrees, you're not just writing cleaner code for humans – you're creating a codebase that's inherently more understandable and predictable for AI tools, leading to better code suggestions, more accurate refactoring, and reduced errors in AI-generated code.

Basic Usage

The "Node[T]" annotation indicates that a field injects fields from a class or parameters from a function. Crucially, Node fields become callable factories after initialization: you must call them with parentheses () to get a new instance of the wrapped type. The default value (an instance of Node) holds the options that control how fields are injected, such as prefix and suffix. If no default value is specified, a Node object is created using T as the class or function to inject. The declarations below show the various ways to specify a Node[T]; B and C are equivalent:

Various ways to specify a Node[T]

from datatrees import datatree, Node, dtfield

@datatree
class A:
    a: int = 1

@datatree
class B:
    a_node: Node[A] = Node(A)

# The following shorthand declarations are available in datatrees v0.1.9 and later.
@datatree
class C:
    a_node: Node[A]  # Shorthand for Node(A)

@datatree
class D:
    a_node: Node[A] = Node('a')  # Shorthand for Node(A, 'a'): inject only field 'a'

@datatree
class E:
    a_node: Node[A] = dtfield(init=False)  # Shorthand for dtfield(Node(A), init=False)

Notably, in the shorthand declarations, the annotation's type argument (A in Node[A]) is used as the class to inject when one is not already specified. (This feature is only available in datatrees v0.1.9 and later.)
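The shorthand forms rely on reading the type argument back out of the Node[A] annotation. The standard typing helpers `get_type_hints`, `get_origin` and `get_args` are enough to do that. The sketch below uses a hypothetical stand-in `Node` class (not `datatrees.Node`) and hypothetical helpers `resolve_target` and `complete_node`; it is only an assumption about how such resolution can work.

```python
import typing
from typing import Generic, TypeVar

T = TypeVar("T")


class Node(Generic[T]):
    """Hypothetical stand-in for datatrees.Node, used only in this sketch."""

    def __init__(self, target=None, *names):
        self.target = target  # class/function to inject, a field name, or None
        self.names = names    # optional subset of field names to inject


class A:
    pass


class C:
    a_node: Node[A]


class D:
    a_node: Node[A] = Node("a")


def resolve_target(owner, attr):
    """Return the type argument of a Node[...] annotation on owner.attr."""
    hint = typing.get_type_hints(owner)[attr]
    if typing.get_origin(hint) is Node:
        (target,) = typing.get_args(hint)
        return target
    return None


def complete_node(owner, attr, node=None):
    """Fill in the injected class from the annotation when it was omitted."""
    if node is None:
        return Node(resolve_target(owner, attr))          # Node[A] -> Node(A)
    if node.target is None or isinstance(node.target, str):
        names = ((node.target,) if isinstance(node.target, str) else ()) + node.names
        return Node(resolve_target(owner, attr), *names)  # Node('a') -> Node(A, 'a')
    return node


print(complete_node(C, "a_node").target is A)  # True
d = complete_node(D, "a_node", D.a_node)
print(d.target is A, d.names)                  # True ('a',)
```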

Here's an example showing how datatrees can simplify configuration for a database connection pool:

from datatrees import datatree, Node, dtfield

@datatree
class RetryPolicy:
    max_attempts: int = 3
    delay_ms: int = 1000
    exponential_backoff: bool = True
    
    def get_delay(self, attempt: int) -> int:
        if self.exponential_backoff:
            return self.delay_ms * (2 ** (attempt - 1))
        return self.delay_ms

@datatree
class ConnectionConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "mydb"
    username: str = "user"
    password: str = "pass"
    timeout_ms: int = 5000
    
    def get_connection_string(self) -> str:
        return f"postgresql://{self.username}:{self.password}@{self.host}:{self.port}/{self.database}"

@datatree
class ConnectionPool:
    # Inject all fields from ConnectionConfig and RetryPolicy
    connection: Node[ConnectionConfig] = Node(ConnectionConfig, prefix="db_")  # All fields will be injected and prefixed with db_
    retry: Node[RetryPolicy] # default value is Node(RetryPolicy)
    
    min_connections: int = 5
    max_connections: int = 20
    
    # Self-defaulting field that depends on other fields
    connection_string: str = dtfield(
        self_default=lambda self: self.connection().get_connection_string(),
        init=False  # Won't appear in __init__ (default is False for self_default)
    )
    
    def __post_init__(self):
        print(f"Initializing pool: {self.connection_string}")
        print(f"Pool size: {self.min_connections}-{self.max_connections}")
        print(f"Retry policy: {self.max_attempts} attempts, starting at {self.delay_ms}ms")

# Usage
pool = ConnectionPool(
    db_host="db.example.com",      # Prefixed field from ConnectionConfig
    db_port=5432,                  # Prefixed field from ConnectionConfig  
    max_attempts=5,                # Field from RetryPolicy
    delay_ms=200,                  # Field from RetryPolicy
    min_connections=10,            # Local field
)

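To show what the self-defaulting connection_string and RetryPolicy.get_delay produce for arguments like the ones above, here is a standard-library sketch. It approximates self_default with `field(init=False)` plus `__post_init__` and flattens the pool into one hypothetical `PoolSketch` class; this is an assumption for illustration, not datatrees' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    delay_ms: int = 1000
    exponential_backoff: bool = True

    def get_delay(self, attempt: int) -> int:
        # attempt is 1-based: 1 -> delay_ms, 2 -> 2 * delay_ms, 3 -> 4 * delay_ms, ...
        if self.exponential_backoff:
            return self.delay_ms * (2 ** (attempt - 1))
        return self.delay_ms


@dataclass
class PoolSketch:
    # Hypothetical flattened stand-in for ConnectionPool (plain dataclasses only).
    db_host: str = "localhost"
    db_port: int = 5432
    db_database: str = "mydb"
    db_username: str = "user"
    db_password: str = "pass"
    max_attempts: int = 3
    delay_ms: int = 1000
    # Excluded from __init__ and derived from other fields, mimicking self_default.
    connection_string: str = field(init=False)

    def __post_init__(self):
        self.connection_string = (
            f"postgresql://{self.db_username}:{self.db_password}"
            f"@{self.db_host}:{self.db_port}/{self.db_database}"
        )

    def retry(self) -> RetryPolicy:
        return RetryPolicy(max_attempts=self.max_attempts, delay_ms=self.delay_ms)


pool = PoolSketch(db_host="db.example.com", max_attempts=5, delay_ms=200)
print(pool.connection_string)  # postgresql://user:pass@db.example.com:5432/mydb
policy = pool.retry()
print([policy.get_delay(n) for n in range(1, policy.max_attempts + 1)])
# [200, 400, 800, 1600, 3200]
```

With delay_ms=200 and exponential backoff, each retry doubles the wait, giving 200, 400, 800, 1600 and 3200 ms across five attempts.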