SkillAgentSearch skills...

Ttv

A command line tool for splitting files into test, train, and validation sets.

Install / Use

/learn @sd2k/Ttv
About this skill

Quality Score

0/100

Supported Platforms

Universal

README

Dependabot Status

ttv - create train, test, validation sets

ttv is a command line tool for splitting large files up into chunks suitable for train/test/validation splits for machine learning. It arose from the need to split files that were too large to fit into memory to split, and the desire to do it in a clean way.

ttv requires Rust 2021.

Installation

Build using cargo build --release to get a binary at ./target/release/ttv. Copy this into your path to use it.

Usage

Run ttv --help to get help, or infer what you can from one of these examples:

# Split CSV file into two sets of a fixed number of rows
$ ttv split data.csv --rows=train=9000 --rows=test=1000

# Accepts gzipped data (no flag required). Shorthand argument version. As many splits as you like!
$ ttv split data.csv.gz --rows=train=65000,validation=15000,test=15000 -d

# Alternatively, specify proportion-based splits.
$ ttv split data.csv --prop=train=0.8,test=0.2

# When using proportions, include the total rows to get a progress bar
$ ttv split data.csv --prop=train=0.8,test=0.2 --total-rows=1234

# Accepts data from stdin, compressed or not (must give a filename)
$ cat data.csv | ttv split --rows=test=10000,train=90000 --output-prefix data -u
$ cat data.csv.gz | ttv split --rows=test=10000,train=90000 --output-prefix data -d

# Using pigz for faster decompression
$ pigz -dc data.csv.gz | ttv split --prop=test=0.1,train=0.9 --chunk-size 5000 --output-prefix data

# Split outputs into chunks for faster writing/reading later
$ ttv split data.csv.gz --rows=test=100000,train=900000 --chunk-size 5000 -d

# Write outputs uncompressed
$ ttv split data.csv.gz --prop=test=0.5,train=0.5

# Reproducible splits using seed
$ ttv split data.csv.gz --prop=test=0.5,train=0.5 --chunk-size 1000 --seed 5330 -d

Development

You'll need a recent version of the Rust nightly toolchain and Cargo. Then just hack away as normal.

Related Skills

View on GitHub
GitHub Stars40
CategoryDevelopment
Updated10mo ago
Forks2

Languages

Rust

Security Score

72/100

Audited on Jun 9, 2025

No findings