Datatrees
Wrapper for dataclasses with auto field injection, binding, self-default and more
A wrapper to Python's dataclasses for simplifying class composition with automatic field injection, binding, self-defaults and more.
Datatrees is particularly useful for composing a class from other classes or functions in a hierarchical manner, where fields from classes deeper in the hierarchy need to be propagated as root class parameters. The boilerplate code for such a composition is often error-prone and difficult to maintain. The impetus for this library came from building hierarchical 3D models where each node in the hierarchy collected fields from nodes nested deeper in the hierarchy. Using datatrees, almost all the boilerplate management of model parameters was eliminated, resulting in clean and maintainable 3D model classes.
Installation
pip install datatrees
Core Features
- Field Injection and Binding: Automatically inject fields from other classes or functions
- Field Mapping: Map fields between classes with custom naming
- Self-Defaulting: Fields can default based on other fields
- Field Documentation: Field documentation is preserved through the injection chain
- Post-Init Chaining: Automatically chain inherited post_init functions
- Type Annotations: Typing support for static type checkers and shorthands for Node[T]
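Post-init chaining is easiest to see against plain dataclasses. There, an overriding `__post_init__` silently replaces the inherited one unless it calls `super()`. The sketch below uses only the standard library, not datatrees. It shows the manual pattern that the Post-Init Chaining feature is described as automating. The class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Base:
    log: list = field(default_factory=list)

    def __post_init__(self) -> None:
        self.log.append("Base")


@dataclass
class ForgetsParent(Base):
    def __post_init__(self) -> None:
        # Overrides Base.__post_init__ without calling it: Base's step is lost.
        self.log.append("ForgetsParent")


@dataclass
class ChainsParent(Base):
    def __post_init__(self) -> None:
        # Manual chaining: run the inherited hook first, then this class's own.
        super().__post_init__()
        self.log.append("ChainsParent")


print(ForgetsParent().log)  # ['ForgetsParent']
print(ChainsParent().log)   # ['Base', 'ChainsParent']
```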
Exports:
- datatree: The decorator for creating datatrees, akin to dataclasses.dataclass(). It accepts all standard dataclasses.dataclass arguments (e.g., init, repr, eq, order, frozen, unsafe_hash, match_args, kw_only, slots, weakref_slot) in addition to datatrees-specific parameters like chain_post_init.
- dtfield: The function for creating fields, akin to dataclasses.field()
- Node: The class for creating node factories. IMPORTANT: When accessed on an instance, a Node field is itself a callable factory that produces new instances of the target type on each invocation. For example, if obj.node_field is a Node[SomeClass], then obj.node_field() creates and returns a new SomeClass instance each time it's called.
- field_docs: The function for getting the documentation for a datatree field
- get_injected_fields: Produces documentation on how fields are injected and bound
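The factory behaviour of Node fields can be made concrete without relying on datatrees internals. The minimal stdlib sketch below assumes that a bound Node acts like a zero-argument callable that reads the parent's injected fields. `Pool` and `make_retry` are hypothetical names, and this is not how datatrees implements Node.

```python
from dataclasses import dataclass


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    delay_ms: int = 1000


@dataclass
class Pool:
    # Flattened fields, standing in for the fields a Node[RetryPolicy] would inject.
    max_attempts: int = 3
    delay_ms: int = 1000

    def __post_init__(self) -> None:
        # Hand-written stand-in for a bound Node field: a zero-argument
        # factory that builds a new RetryPolicy from this object's values.
        def make_retry() -> RetryPolicy:
            return RetryPolicy(max_attempts=self.max_attempts, delay_ms=self.delay_ms)

        self.retry = make_retry


pool = Pool(max_attempts=5)
first, second = pool.retry(), pool.retry()
print(first)            # RetryPolicy(max_attempts=5, delay_ms=1000)
print(first == second)  # True: same field values
print(first is second)  # False: each call builds a new instance
```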
Datatrees as a Domain-Specific Composition API
The Challenge of Complex Object Composition
Traditional Python object composition often leads to verbose, error-prone boilerplate code. Consider a typical manual approach:
# Without datatrees - verbose and error-prone
class DatabaseConfig:
    def __init__(self, host="localhost", port=5432, database="mydb",
                 username="user", password="pass", timeout_ms=5000):
        self.host = host
        self.port = port
        self.database = database
        self.username = username
        self.password = password
        self.timeout_ms = timeout_ms

class RetryPolicy:
    def __init__(self, max_attempts=3, delay_ms=1000, exponential_backoff=True):
        self.max_attempts = max_attempts
        self.delay_ms = delay_ms
        self.exponential_backoff = exponential_backoff

class ConnectionPool:
    def __init__(self, host="localhost", port=5432, database="mydb",
                 username="user", password="pass", timeout_ms=5000,
                 max_attempts=3, delay_ms=1000, exponential_backoff=True,
                 min_connections=5, max_connections=20):
        # Manually forwarding all parameters - error prone!
        self.config = DatabaseConfig(host, port, database, username, password, timeout_ms)
        self.retry = RetryPolicy(max_attempts, delay_ms, exponential_backoff)
        self.min_connections = min_connections
        self.max_connections = max_connections
This approach has several problems:
- Parameter duplication across constructors
- Manual parameter forwarding is error-prone
- No clear visual hierarchy of composition
- Difficult to maintain as requirements change
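The forwarding problem is easy to reproduce. In this trimmed, hypothetical variant of the classes above, `DatabaseConfig`'s parameters get reordered and nobody notices, because `ConnectionPool` forwards them positionally. Nothing fails. The values simply land in the wrong fields.

```python
class DatabaseConfig:
    # Suppose a later refactor swaps the order of password and username.
    def __init__(self, host="localhost", port=5432, password="pass", username="user"):
        self.host = host
        self.port = port
        self.password = password
        self.username = username


class ConnectionPool:
    def __init__(self, host="localhost", port=5432, username="user", password="pass"):
        # Positional forwarding written against the *old* DatabaseConfig signature.
        self.config = DatabaseConfig(host, port, username, password)


pool = ConnectionPool(username="alice", password="s3cret")
print(pool.config.username)  # prints s3cret: silently wrong, no error raised
```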
Datatrees: A Predictable Pattern for Composition
Datatrees provides a declarative, consistent pattern that acts as a Domain-Specific API (DSAPI) for object composition:
# With datatrees - declarative and maintainable
from datatrees import datatree, Node

@datatree
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "mydb"
    username: str = "user"
    password: str = "pass"
    timeout_ms: int = 5000

@datatree
class RetryPolicy:
    max_attempts: int = 3
    delay_ms: int = 1000
    exponential_backoff: bool = True

@datatree
class ConnectionPool:
    config: Node[DatabaseConfig] = Node(DatabaseConfig, prefix="db_")
    retry: Node[RetryPolicy]
    min_connections: int = 5
    max_connections: int = 20
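The prefix option determines which constructor parameters `ConnectionPool` ends up with. The sketch below is a rough mental model only. It is a toy, stdlib-only re-implementation, not datatrees' actual mechanism. It flattens each component's fields under that component's prefix, then rebuilds one component from the flat names. `flatten` and `build` are hypothetical helpers.

```python
from dataclasses import dataclass, fields


@dataclass
class DatabaseConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "mydb"


@dataclass
class RetryPolicy:
    max_attempts: int = 3
    delay_ms: int = 1000
    exponential_backoff: bool = True


def flatten(**components):
    """Map each flat parameter name to (component attr, field name, default).

    components maps an attribute name to a (dataclass type, prefix) pair.
    """
    flat = {}
    for attr, (cls, prefix) in components.items():
        for f in fields(cls):
            name = prefix + f.name
            if name in flat:
                raise ValueError(f"name clash on injected field {name!r}")
            flat[name] = (attr, f.name, f.default)
    return flat


def build(cls, attr, flat, **overrides):
    """Construct one component, using flat-named overrides or the field defaults."""
    kwargs = {
        field_name: overrides.get(flat_name, default)
        for flat_name, (owner, field_name, default) in flat.items()
        if owner == attr
    }
    return cls(**kwargs)


params = flatten(config=(DatabaseConfig, "db_"), retry=(RetryPolicy, ""))
print(list(params))
# ['db_host', 'db_port', 'db_database', 'max_attempts', 'delay_ms', 'exponential_backoff']
config = build(DatabaseConfig, "config", params, db_host="prod.example.com")
print(config)
# DatabaseConfig(host='prod.example.com', port=5432, database='mydb')
```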
Benefits for Developers
- Reduced Boilerplate: No manual __init__ methods or parameter forwarding
- Clear Composition Hierarchy: Visual structure shows how components relate
- Automatic Parameter Management: Fields are automatically injected and managed
- Consistent Patterns: Every datatree class follows the same predictable structure
- Self-Documenting: The structure itself documents the composition relationships
Benefits for AI and LLMs
The predictable patterns created by datatrees are particularly valuable for AI-assisted development:
- Pattern Recognition: LLMs excel at recognizing and replicating consistent patterns. Datatrees provides a clear, repeatable structure that LLMs can easily learn and apply.
- Reduced Ambiguity: The declarative syntax limits the "search space" for code generation, leading to more accurate outputs:

      # LLMs can reliably predict this pattern:
      pool = ConnectionPool(
          db_host="prod.example.com",  # Prefix clearly indicates origin
          db_port=5432,                # Consistent naming pattern
          max_attempts=5,              # Direct injection from RetryPolicy
          min_connections=10,          # Local field
      )

- Inferrable Usage: The structure makes it clear how to interact with objects:

      # LLMs can infer this usage from the pattern:
      config = pool.config()        # Node fields are callable factories
      retry_policy = pool.retry()   # Consistent access pattern

- Fewer Hallucinations: The well-defined structure reduces the likelihood of LLMs generating incorrect boilerplate or inventing non-existent parameters.
- Composition Understanding: LLMs can easily understand and suggest appropriate compositions based on the Node field patterns.
By adopting datatrees, you're not just writing cleaner code for humans – you're creating a codebase that's inherently more understandable and predictable for AI tools, leading to better code suggestions, more accurate refactoring, and reduced errors in AI-generated code.
Basic Usage
The "Node[T]" annotation indicates that the field is used to inject fields from a class or parameters from a function. Crucially, Node fields become callable factories after initialization: you must call them with parentheses () to get a new instance of the wrapped type. The default value (an instance of Node) contains options on how the fields are injected, such as prefix and suffix. If no default value is specified, a Node object is created with T as the class or function to inject. The following shows several ways to declare a Node[T] field; for example, the shorthand in class C is equivalent to the explicit Node(A) in class B:
Various ways to specify a Node[T]
class A:
    a: int = 1

class B:
    a: Node[A] = Node(A)

# The following shorthand declarations are available in datatrees v0.1.9 and later.
class C:
    a: Node[A]  # Shorthand for Node(A)

class D:
    a: Node[A] = Node('a')  # Shorthand for Node(A, 'a')

class E:
    a: Node[A] = dtfield(init=False)  # Shorthand for dtfield(Node(A), init=False)
Notably, in the shorthand declarations, the annotation's type argument is used as the class to inject if one is not already specified. (This feature is only available in datatrees v0.1.9 and later.)
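The shorthand is possible because the class to inject can be read back from the `Node[T]` annotation itself. The stdlib sketch below illustrates that idea with a hypothetical `ToyNode` and a hypothetical `resolve_target` helper. It reflects an assumption about the general approach, not datatrees' code.

```python
from typing import Generic, Optional, TypeVar, get_args, get_type_hints

T = TypeVar("T")


class ToyNode(Generic[T]):
    """Hypothetical stand-in for Node: records an optional target class."""

    def __init__(self, target: Optional[type] = None) -> None:
        self.target = target


class A:
    a: int = 1


class B:
    a: ToyNode[A] = ToyNode(A)  # target given explicitly


class C:
    a: ToyNode[A]  # shorthand: target comes from the annotation


def resolve_target(owner: type, attr: str) -> type:
    """Return the class a ToyNode field should inject."""
    default = owner.__dict__.get(attr)
    if isinstance(default, ToyNode) and default.target is not None:
        return default.target
    # Fall back to the annotation's type argument, e.g. A in ToyNode[A].
    (target,) = get_args(get_type_hints(owner)[attr])
    return target


print(resolve_target(B, "a").__name__)  # A
print(resolve_target(C, "a").__name__)  # A
```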
Here's an example showing how datatrees can simplify configuration for a database connection pool:
from datatrees import datatree, Node, dtfield

@datatree
class RetryPolicy:
    max_attempts: int = 3
    delay_ms: int = 1000
    exponential_backoff: bool = True

    def get_delay(self, attempt: int) -> int:
        if self.exponential_backoff:
            return self.delay_ms * (2 ** (attempt - 1))
        return self.delay_ms

@datatree
class ConnectionConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "mydb"
    username: str = "user"
    password: str = "pass"
    timeout_ms: int = 5000

    def get_connection_string(self) -> str:
        return f"postgresql://{self.username}:{self.password}@{self.host}:{self.port}/{self.database}"

@datatree
class ConnectionPool:
    # Inject all fields from ConnectionConfig and RetryPolicy
    connection: Node[ConnectionConfig] = Node(ConnectionConfig, prefix="db_")  # All fields will be injected and prefixed with db_
    retry: Node[RetryPolicy]  # default value is Node(RetryPolicy)
    min_connections: int = 5
    max_connections: int = 20

    # Self-defaulting field that depends on other fields
    connection_string: str = dtfield(
        self_default=lambda self: self.connection().get_connection_string(),
        init=False  # Won't appear in __init__ (default is False for self_default)
    )

    def __post_init__(self):
        print(f"Initializing pool: {self.connection_string}")
        print(f"Pool size: {self.min_connections}-{self.max_connections}")
        print(f"Retry policy: {self.max_attempts} attempts, starting at {self.delay_ms}ms")

# Usage
pool = ConnectionPool(
    db_host="db.example.com",  # Prefixed field from ConnectionConfig
    db_port=5432,              # Prefixed field from ConnectionConfig
    max_attempts=5,            # Field from RetryPolicy
    delay_ms=200,              # Field from RetryPolicy
    min_connections=10,        # Local field
)
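Readers who know plain dataclasses can think of the `connection_string` field above as roughly an `init=False` field computed in `__post_init__`. The sketch below is a stdlib approximation under that assumption. The flattened `Pool` class is hypothetical and trimmed to two injected fields. Datatrees' `self_default` is intended to save writing that `__post_init__` by hand.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionConfig:
    host: str = "localhost"
    port: int = 5432
    database: str = "mydb"
    username: str = "user"
    password: str = "pass"

    def get_connection_string(self) -> str:
        return f"postgresql://{self.username}:{self.password}@{self.host}:{self.port}/{self.database}"


@dataclass
class Pool:
    db_host: str = "localhost"
    db_port: int = 5432
    # Excluded from __init__ and filled in once the other fields are set.
    connection_string: str = field(init=False)

    def __post_init__(self) -> None:
        config = ConnectionConfig(host=self.db_host, port=self.db_port)
        self.connection_string = config.get_connection_string()


pool = Pool(db_host="db.example.com")
print(pool.connection_string)
# postgresql://user:pass@db.example.com:5432/mydb
```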