Demae
A framework to build a machine learning batch
demae
demae is a framework for building batch programs that use machine learning. It makes it easier to deploy your ML model into production.
Main features:
- handles data sources and destinations for you
- supports parallel execution
- prints execution time statistics
This example fetches input from S3, transforms it, and pushes the output to S3:
S3 -> transform -> S3
import re

from demae import Base
from demae.source import S3Source
from demae.dest import S3Dest

"""
`source`, `dest` and `transform` must be implemented.
"""
class Batch(Base):
    """
    Set the data source.
    This reads input from files matching the prefix in the bucket given by `bucket`.
    Input files must be in TSV format.
    """
    source = S3Source(
        bucket='bucket',
        prefix='{env}/example_input/{date}/example_input.tsv',
        columns=['id', 'text'],
    )

    """
    Specify the output destination in S3.
    key_map: a function (input key -> output key)
    This example maps input:
        from: development/example_input/2017-12-24/example_input.0000_part_00.gz
        to: development/example_output/2017-12-24/example_output.0000_part_00.gz
    """
    dest = S3Dest(
        key_map=lambda key: re.sub('_input', '_output', key)
    )

    """
    Write your inference code here.
    data: pandas DataFrame
        Columns are set automatically from source.columns.
    Must return an array-like object (DataFrame, numpy array or list).
    """
    def transform(self, data):
        # `predict` is a placeholder for your own inference function;
        # it is not defined in this example.
        output = predict(data.loc[:, 'text'])
        return output
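The `transform` method above calls a `predict` function that the example leaves undefined. As a minimal sketch of the contract it describes (a pandas DataFrame in, an array-like object out), the snippet below uses a hypothetical `predict` that labels each text by its length. Both `predict` and the output column name `label` are assumptions made for illustration, not part of demae, and the function is written standalone so it can be tried without S3 access.

```python
import pandas as pd


def predict(texts):
    """Hypothetical stand-in for a model: label each text by its length."""
    return ["long" if len(t) > 10 else "short" for t in texts]


def transform(data):
    # Select the 'text' column by label and run the stand-in model on it.
    labels = predict(data["text"])
    # Return a DataFrame, one of the array-like types the README accepts.
    return pd.DataFrame({"id": data["id"], "label": labels})


# Input shaped like the example source: columns 'id' and 'text'.
data = pd.DataFrame({"id": [1, 2], "text": ["hello", "a much longer text"]})
output = transform(data)
print(output)
```

In a real batch, the body of this function would go into `Batch.transform`, with `predict` replaced by a call into your trained model.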
To run the batch:
batch = Batch(
    env='development',
    date='2017-02-13'
)
batch.run()
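To see what the `key_map` function from the example does to an S3 key, it can be tried on its own, without demae or S3. The sketch below also fills in the `{env}` and `{date}` placeholders of the source prefix with `str.format`. That the framework substitutes them this way, from the arguments passed to `Batch`, is an assumption made for illustration.

```python
import re


def key_map(key):
    # Same substitution as the lambda passed to S3Dest in the example.
    return re.sub('_input', '_output', key)


# Assumption: the prefix placeholders are filled as if by str.format,
# using the env and date given to Batch(...).
prefix = '{env}/example_input/{date}/example_input.tsv'.format(
    env='development', date='2017-02-13')

input_key = 'development/example_input/2017-12-24/example_input.0000_part_00.gz'
output_key = key_map(input_key)

print(prefix)
print(output_key)
```

Note that `re.sub` replaces every occurrence of `_input`, so both the directory name and the file name change in the output key.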
Parallel execution
Parallel execution is enabled by setting the environment variables named in parallel_env.
Each batch process then handles only its own share of the input.
source = S3Source(
    bucket='bucket',
    prefix='development/foo/foo.tsv',
    columns=['id', 'text'],
    parallel_env={'index': 'PARALLEL_INDEX', 'size': 'PARALLEL_SIZE'},
)
For example, with these input files:
input.tsv.part0 input.tsv.part1 input.tsv.part2
a batch started with PARALLEL_INDEX=1 and PARALLEL_SIZE=3 handles only input.tsv.part1.
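One way to picture how input files are divided among parallel batches is the sketch below. It assumes the keys are sorted and dealt out round-robin, so a worker takes every key whose position modulo the size equals its index. This rule is consistent with the example above, but it is an assumption, not necessarily how demae partitions input. The helper `select_keys` is hypothetical.

```python
import os


def select_keys(keys, index, size):
    """Return the keys assigned to worker `index` out of `size` workers."""
    if not 0 <= index < size:
        raise ValueError('index must satisfy 0 <= index < size')
    # Assumption: keys are sorted, then dealt out round-robin.
    return [key for i, key in enumerate(sorted(keys)) if i % size == index]


keys = ['input.tsv.part0', 'input.tsv.part1', 'input.tsv.part2']

# Read the variables named in parallel_env, falling back to the
# values used in the example when they are not set.
index = int(os.environ.get('PARALLEL_INDEX', '1'))
size = int(os.environ.get('PARALLEL_SIZE', '3'))

print(select_keys(keys, index, size))
```

Under this rule every key is handled by exactly one worker, provided each index from 0 to size - 1 is launched once.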
License
MIT
This software was developed while working for Cookpad Inc.
