
Datagen

Big Data Generator for testing



Run locally

Set up the Python environment


direnv allow
# or
pipenv install
# or
pip3 install -r requirements.txt
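The direnv route assumes an `.envrc` file in the repository root. A minimal hypothetical one using direnv's stdlib `layout` function might look like:

```shell
# .envrc (hypothetical): create and activate a project-local virtualenv
layout python3
```

After `direnv allow`, the virtualenv activates automatically on `cd`, and dependencies can then be installed with `pip3 install -r requirements.txt`.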

Run Kafka producer

# Produce fake data
python3 src/produce_fake.py \
--kafka-bootstrap-servers BOOTSTRAP_SERVER --kafka-security-protocol SASL_PLAINTEXT --kafka-sasl-username USERNAME --kafka-sasl-password PASSWORD --kafka-topic test-kafka-topic --kafka-report-interval 1 \
--nz-schema-file samples/fake.schema.csv --nz-schema-file-type csv \
--output-type json
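The fake producer presumably reads field definitions from the schema CSV and synthesizes one random value per field before sending to Kafka. A rough stdlib-only sketch of that idea (the schema columns, type names, and record shape below are illustrative assumptions, not the real `produce_fake.py`; the Kafka send itself is omitted):

```python
import csv
import io
import json
import random
import string

# Hypothetical schema CSV: one row per field, "name,type" columns.
SCHEMA_CSV = """name,type
device_id,string
temperature,float
ok,bool
"""

def fake_value(ftype: str):
    # Generate a random value for the given (illustrative) field type.
    if ftype == "string":
        return "".join(random.choices(string.ascii_lowercase, k=8))
    if ftype == "float":
        return round(random.uniform(0.0, 100.0), 3)
    if ftype == "bool":
        return random.choice([True, False])
    raise ValueError(f"unknown field type: {ftype}")

def fake_record(schema_csv: str) -> dict:
    # Build one record with a random value for every schema field.
    reader = csv.DictReader(io.StringIO(schema_csv))
    return {row["name"]: fake_value(row["type"]) for row in reader}

record = fake_record(SCHEMA_CSV)
print(json.dumps(record))  # a JSON payload of this shape would go to the Kafka topic
```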

# Post fake data to pandas http
python3 src/pandas_http_fake.py \
--host PANDAS_PROXY_HOST --port PANDAS_PROXY_PORT --kafka-sasl-username USERNAME --kafka-sasl-password PASSWORD --ssl --kafka-topic test-kafka-topic \
--nz-schema-file samples/fake.schema.csv --nz-schema-file-type csv \
--output-type json
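The pandas-http scripts presumably wrap each record in an HTTP POST to the proxy. A stdlib sketch of building such a request (the endpoint path, port, and basic-auth scheme are assumptions; nothing is actually sent here):

```python
import base64
import json
import urllib.request

record = {"temperature": 21.5, "ok": True}

# Hypothetical endpoint and credentials; the real proxy's path and
# auth scheme may differ.
url = "https://PANDAS_PROXY_HOST:443/topics/test-kafka-topic"
token = base64.b64encode(b"USERNAME:PASSWORD").decode()

req = urllib.request.Request(
    url,
    data=json.dumps(record).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    },
    method="POST",
)
# urllib.request.urlopen(req) would perform the POST; omitted here.
print(req.get_method(), req.get_full_url())
```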

# Produce a file
python3 src/produce_file.py \
--kafka-bootstrap-servers BOOTSTRAP_SERVER --kafka-security-protocol SASL_PLAINTEXT --kafka-sasl-username USERNAME --kafka-sasl-password PASSWORD \
--kafka-topic fake_test \
--input-filepath samples/fake.jsonl --input-type jsonl --output-type json \
--kafka-report-interval 1 \
--loglevel DEBUG

# Post a file to pandas http
python3 src/pandas_http_file.py \
--host PANDAS_PROXY_HOST --port PANDAS_PROXY_PORT --kafka-sasl-username USERNAME --kafka-sasl-password PASSWORD --ssl --kafka-topic test-kafka-topic \
--nz-schema-file samples/fake.schema.csv --nz-schema-file-type csv \
--input-filepath samples/fake.json --input-type json --output-type json

Consume Kafka data

python3 src/consumer_loop.py \
--kafka-bootstrap-servers BOOTSTRAP_SERVER --kafka-security-protocol SASL_PLAINTEXT --kafka-sasl-username USERNAME --kafka-sasl-password PASSWORD \
--kafka-topic fake_test --input-type json \
--loglevel DEBUG
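At its core, a consumer loop like `consumer_loop.py` presumably just iterates polled messages and decodes each value according to `--input-type`. A stdlib sketch of that decode step over an in-memory batch (the actual Kafka polling via a client library is omitted; all names and message contents are illustrative):

```python
import json

# Stand-in for message values polled from Kafka: raw bytes.
raw_messages = [
    b'{"device_id": "abc", "temperature": 21.5}',
    b'{"device_id": "def", "temperature": 19.0}',
]

def decode(value: bytes, input_type: str = "json") -> dict:
    # --input-type json: each message value is a UTF-8 JSON object.
    if input_type == "json":
        return json.loads(value.decode("utf-8"))
    raise ValueError(f"unsupported input type: {input_type}")

records = [decode(m) for m in raw_messages]
for r in records:
    print(r["device_id"], r["temperature"])
```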

Run MQTT publisher

# Publish fake data
python3 src/publish_fake.py --mqtt-host MQTT_HOST --mqtt-port MQTT_PORT --mqtt-username MQTT_USERNAME --mqtt-password MQTT_PASSWORD --mqtt-kafka-topic MQTT_TOPIC --mqtt-tls --mqtt-tls-insecure \
--nz-schema-file samples/fake.schema.csv --nz-schema-file-type csv \
--output-type json

# Publish a file
python3 src/publish_file.py --mqtt-host MQTT_HOST --mqtt-port MQTT_PORT --mqtt-username MQTT_USERNAME --mqtt-password MQTT_PASSWORD --mqtt-kafka-topic MQTT_TOPIC --mqtt-tls --mqtt-tls-insecure \
--nz-schema-file samples/fake.schema.csv --nz-schema-file-type csv \
--input-filepath samples/fake.json --input-type json --output-type json
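`--mqtt-tls --mqtt-tls-insecure` presumably enables TLS but skips certificate verification, like paho-mqtt's `tls_insecure_set(True)`. With Python's stdlib, that combination corresponds roughly to:

```python
import ssl

# TLS enabled, but hostname check and certificate verification disabled --
# the usual meaning of an "insecure" TLS flag. Only for test environments.
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
print(ctx.verify_mode)
```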

Create and delete a Nazare pipeline

# Create pipeline
python3 src/nazare_pipeline_create.py \
--nz-api-url STORE_API_URL --nz-api-username STORE_API_USERNAME --nz-api-password STORE_API_PASSWORD \
--nz-pipeline-name PIPELINE_NAME --nz-pipeline-type PIPELINE_TYPE --no-pipeline-deltasync --pipeline-retention '60,d' \
--nz-schema-file SCHEMA_FILE --nz-schema-file-type SCHEMA_FILE_TYPE
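The `--pipeline-retention '60,d'` value looks like an amount plus a unit suffix. A hedged sketch of how such a spec might be parsed into a `timedelta` (the unit letters and their meanings are assumptions, not taken from the real script):

```python
from datetime import timedelta

# Hypothetical unit letters; the actual script may accept different ones.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_retention(spec: str) -> timedelta:
    # "60,d" -> 60 days of retention
    amount, unit = spec.split(",")
    return timedelta(**{UNITS[unit]: int(amount)})

print(parse_retention("60,d"))
```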

# Delete pipeline
python3 src/nazare_pipeline_delete.py \
--nz-api-url STORE_API_URL --nz-api-username STORE_API_USERNAME --nz-api-password STORE_API_PASSWORD \
--nz-pipeline-name PIPELINE_NAME

Run on Docker

# Produce fake data
docker run --rm -it ingkle/datagen python3 produce_fake.py --kafka-bootstrap-servers BOOTSTRAP_SERVER --kafka-security-protocol SASL_PLAINTEXT --kafka-sasl-username USERNAME --kafka-sasl-password PASSWORD --kafka-topic test-kafka-topic --rate 1  --kafka-report-interval 1

Run on K8s

Build

Create a buildx docker-container driver for multi-platform builds

docker buildx create --name multi-builder --driver docker-container --bootstrap

Build and load

docker buildx build -t ingkle/datagen:test --platform linux/arm64 --load .

Build and push

docker buildx build -t ingkle/datagen:test --platform linux/amd64,linux/arm64 --push .
