avro2parquet

Hadoop MapReduce program to convert Avro data files to Parquet format.

Installation

git clone https://github.com/laserson/avro2parquet.git
cd avro2parquet
mvn clean package

This will generate the jar files in the target/ directory.

Usage

This tool works on Avro container files (as far as I know, this is simply the standard Avro data file format). Each input record is read with the Avro GenericRecord object as the key and a NullWritable as the value.

The output is currently hardcoded to Snappy-compressed Parquet. The tool is a plain MapReduce job that uses Hadoop's Tool interface, so it accepts generic options (such as -D property=value) ahead of its own arguments.

The general form of the command is:

hadoop jar <avro2parquet jar file> \
com.cloudera.science.avro2parquet.Avro2Parquet \
<generic Hadoop options, e.g. -D property=value> \
hdfs:///path/to/avro/schema.avsc \
hdfs:///path/to/avro/data \
hdfs:///output/path

For example:

hadoop jar avro2parquet-0.1.0-jar-with-dependencies.jar \
com.cloudera.science.avro2parquet.Avro2Parquet \
-D mapred.child.java.opts=-Xmx1024M \
hdfs:///user/laserson/schemas/data.avsc \
hdfs:///user/laserson/data \
hdfs:///output/path/../../user/laserson/output
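
To make the argument order above concrete, here is a minimal sketch of how a command line like this separates into generic -D properties and the three positional paths (schema, input, output). It uses only the Java standard library and is an illustration, not the tool's code and not Hadoop's actual option parser: the class ArgSplitSketch and its split method are hypothetical names, and only the "-D key=value" and "-Dkey=value" forms are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; it is not part of avro2parquet
// and only approximates how generic options are separated from tool arguments.
class ArgSplitSketch {
    // Generic -D properties, kept in the order they were given.
    final Map<String, String> properties = new LinkedHashMap<>();
    // Everything else: expected to be schema path, input path, output path.
    final List<String> positional = new ArrayList<>();

    static ArgSplitSketch split(String[] args) {
        ArgSplitSketch result = new ArgSplitSketch();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String keyValue = null;
            if (arg.equals("-D")) {
                // Form: -D key=value (two separate tokens)
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-D must be followed by key=value");
                }
                keyValue = args[++i];
            } else if (arg.startsWith("-D")) {
                // Form: -Dkey=value (one token)
                keyValue = arg.substring(2);
            }
            if (keyValue == null) {
                result.positional.add(arg);
                continue;
            }
            // Split on the first '=' only, so values such as -Xmx1024M stay intact.
            int eq = keyValue.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value, got: " + keyValue);
            }
            result.properties.put(keyValue.substring(0, eq), keyValue.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] example = {
            "-D", "mapred.child.java.opts=-Xmx1024M",
            "hdfs:///path/to/avro/schema.avsc",
            "hdfs:///path/to/avro/data",
            "hdfs:///output/path"
        };
        ArgSplitSketch parsed = split(example);
        System.out.println("properties: " + parsed.properties);
        System.out.println("schema:     " + parsed.positional.get(0));
        System.out.println("input:      " + parsed.positional.get(1));
        System.out.println("output:     " + parsed.positional.get(2));
    }
}
```

The point of the sketch is the ordering: the -D properties come first and are stripped out, so the tool itself only sees the schema path, the input path, and the output path, in that order.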
