Wrangler

Wrangler Transform: A DMD system for transforming Big Data

Data Prep

A collection of libraries, a pipeline plugin, and a CDAP service for performing data cleansing, transformation, and filtering using a set of data manipulation instructions (directives). These instructions are either generated using an interactive visual tool or created manually.
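A recipe is an ordered list of directives, each applied in turn to every record. The minimal Java sketch below shows that idea. The `RecipeSketch` class, the record-as-`Map` model, and the three directives it understands (`uppercase`, `lowercase`, and `drop`, each taking a `:column` argument) are assumptions made for illustration. Wrangler's real directive grammar, record type, and executor are much richer and are not reproduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical mini-interpreter illustrating how a recipe of directives is
 * applied to records. This is NOT the Wrangler API; the names and the
 * "directive :column" syntax are assumptions for illustration only.
 */
class RecipeSketch {

  // Applies each directive line, in order, to every record and returns the result.
  static List<Map<String, String>> run(List<String> recipe, List<Map<String, String>> records) {
    List<Map<String, String>> current = new ArrayList<>();
    for (Map<String, String> r : records) {
      current.add(new LinkedHashMap<>(r)); // copy so the input records are left untouched
    }
    for (String line : recipe) {
      String[] parts = line.trim().split("\\s+");
      String name = parts[0];
      // Column arguments are written with a leading ':' (assumed syntax).
      String column = parts.length > 1 && parts[1].startsWith(":") ? parts[1].substring(1) : null;
      if (column == null) {
        throw new IllegalArgumentException("Missing column argument in: " + line);
      }
      for (Map<String, String> record : current) {
        apply(name, column, record);
      }
    }
    return current;
  }

  // Executes a single directive against a single record, in place.
  private static void apply(String name, String column, Map<String, String> record) {
    switch (name) {
      case "uppercase":
        record.computeIfPresent(column, (k, v) -> v.toUpperCase(Locale.ROOT));
        break;
      case "lowercase":
        record.computeIfPresent(column, (k, v) -> v.toLowerCase(Locale.ROOT));
        break;
      case "drop":
        record.remove(column);
        break;
      default:
        throw new IllegalArgumentException("Unknown directive: " + name);
    }
  }

  public static void main(String[] args) {
    Map<String, String> row = new LinkedHashMap<>();
    row.put("name", "Ada Lovelace");
    row.put("city", "London");
    row.put("tmp", "x");
    List<Map<String, String>> out =
        run(List.of("uppercase :name", "lowercase :city", "drop :tmp"), List.of(row));
    System.out.println(out); // [{name=ADA LOVELACE, city=london}]
  }
}
```

Keeping each directive as a small, independent step is what lets a recipe be built up interactively and then replayed unchanged over a full dataset.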

New Features

More information on upcoming features is available here.

  • User Defined Directives (UDDs) let you create custom functions to transform records within CDAP Data Prep (also known as Wrangler). CDAP ships with a comprehensive library of functions, but there are some omissions, and some specific cases for which UDDs are the solution. More information on building your own custom directives is available here.

    • Migrating directives from version 1.0 to version 2.0 here
    • Information about Grammar here
    • TokenTypes supported by the system here
    • Custom Directive Implementation Internals here
  • A new capability that allows CDAP administrators to restrict which directives are available to their users. More information on configuring this can be found here.
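The two capabilities above fit together naturally: a user-defined directive is looked up the same way as a built-in one, and an administrator's exclusion list can be checked before any directive is created. The Java below is a hypothetical sketch of that flow. `SketchDirective`, `TextReverse`, `DirectiveRegistry`, the `text-reverse` directive, and the `invoke-http` exclusion are all invented for illustration. A real UDD is built against Wrangler's own directive API, which this self-contained example does not attempt to reproduce.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical stand-in for a directive. Only the lifecycle idea
// (initialize with arguments, then execute over a batch of rows) is modeled.
interface SketchDirective {
  void initialize(List<String> args);

  List<Map<String, String>> execute(List<Map<String, String>> rows);
}

// Hypothetical user-defined directive: "text-reverse :column" reverses a string column.
class TextReverse implements SketchDirective {
  private String column;

  @Override
  public void initialize(List<String> args) {
    if (args.size() != 1 || !args.get(0).startsWith(":")) {
      throw new IllegalArgumentException("usage: text-reverse :column");
    }
    column = args.get(0).substring(1);
  }

  @Override
  public List<Map<String, String>> execute(List<Map<String, String>> rows) {
    List<Map<String, String>> out = new ArrayList<>();
    for (Map<String, String> row : rows) {
      Map<String, String> copy = new LinkedHashMap<>(row);
      copy.computeIfPresent(column, (k, v) -> new StringBuilder(v).reverse().toString());
      out.add(copy);
    }
    return out;
  }
}

// Hypothetical registry that honors an administrator-configured exclusion list.
class DirectiveRegistry {
  private final Map<String, Supplier<SketchDirective>> factories = new HashMap<>();
  private final Set<String> excluded;

  DirectiveRegistry(Set<String> excluded) {
    this.excluded = new HashSet<>(excluded);
  }

  void register(String name, Supplier<SketchDirective> factory) {
    factories.put(name, factory);
  }

  // Creates and initializes a directive by name, refusing any excluded name.
  SketchDirective create(String name, List<String> args) {
    if (excluded.contains(name)) {
      throw new SecurityException("Directive '" + name + "' is disabled by the administrator");
    }
    Supplier<SketchDirective> factory = factories.get(name);
    if (factory == null) {
      throw new IllegalArgumentException("Unknown directive: " + name);
    }
    SketchDirective directive = factory.get();
    directive.initialize(args);
    return directive;
  }

  public static void main(String[] args) {
    DirectiveRegistry registry = new DirectiveRegistry(Set.of("invoke-http"));
    registry.register("text-reverse", TextReverse::new);
    Map<String, String> row = new LinkedHashMap<>();
    row.put("word", "wrangler");
    System.out.println(registry.create("text-reverse", List.of(":word")).execute(List.of(row)));
    // prints [{word=relgnarw}]
  }
}
```

Checking the exclusion list at creation time, rather than at execution time, means a restricted directive is rejected before any data is touched.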

Demo Videos and Recipes

Videos and screencasts are the best way to learn, so we have compiled simple, short screencasts that show some of the features of Data Prep. Additional videos can be found here.

Videos

Recipes

Available Directives

These directives are currently available:

| Directive | Description |
| --- | --- |
| **Parsers** | |
| JSON Path | Uses a DSL (a JSON path expression) for parsing JSON records |
| Parse as AVRO | Parses an AVRO-encoded message, either as binary or JSON |
| Parse as AVRO File | Parses an AVRO data file |
| Parse as CSV | Parses an input record as comma-separated values |
| Parse as Date | Parses dates using natural language processing |
| Parse as Excel | Parses an Excel file |
| Parse as Fixed Length | Parses a fixed-length record with the specified widths |
| Parse as HL7 | Parses Health Level 7 Version 2 (HL7 V2) messages |
| Parse as JSON | Parses a JSON object |
| Parse as Log | Parses access log files from Apache HTTPD and nginx servers |
| Parse as Protobuf | Parses a Protobuf-encoded in-memory message using a descriptor |
| Parse as Simple Date | Parses date strings |
| Parse XML To JSON | Parses an XML document into a JSON structure |
| Parse as Currency | Parses a string representation of currency into a number |
| Parse as Datetime | Parses strings with datetime values into the CDAP datetime type |
| **Output Formatters** | |
| Write as CSV | Converts a record into CSV format |
| Write as JSON | Converts the record into a JSON map |
| Write JSON Object | Composes a JSON object based on the specified fields |
| Format as Currency | Formats a number as currency for the specified locale |
| **Transformations** | |
| Changing Case | Changes the case of column values |
| Cut Character | Selects parts of a string value |
| Set Column | Sets the column value to the result of evaluating an expression |
| Find and Replace | Transforms string column values using a "sed"-like expression |
| Index Split | (Deprecated) |
| Invoke HTTP | Invokes an HTTP service (experimental, potentially slow) |
| Quantization | Quantizes a column based on specified ranges |
| Regex Group Extractor | Extracts the data from a regex group into its own column |
| Setting Character Set | Sets the encoding and then converts the data to a UTF-8 string |
| Setting Record Delimiter | Sets the record delimiter |
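The table only names each directive. To illustrate what two of the transformations do, here is a self-contained Java sketch of range quantization and sed-style find-and-replace. The `low:high=label` range format, the `s/pattern/replacement/flags` handling, and the `DirectiveExamples` class and its method names are assumptions for illustration, not Wrangler's actual directive syntax or implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical illustrations of two transformations from the table:
 * range-based quantization and a sed-like find-and-replace.
 * The syntax handled here is assumed, not Wrangler's directive grammar.
 */
class DirectiveExamples {

  // Returns the label of the first comma-separated range "low:high=label"
  // (inclusive bounds) containing the value, or null if none matches.
  static String quantize(double value, String spec) {
    for (String entry : spec.split(",")) {
      String[] rangeAndLabel = entry.trim().split("=", 2);
      if (rangeAndLabel.length != 2) {
        throw new IllegalArgumentException("Expected low:high=label but got: " + entry);
      }
      String[] bounds = rangeAndLabel[0].split(":", 2);
      if (bounds.length != 2) {
        throw new IllegalArgumentException("Expected low:high but got: " + rangeAndLabel[0]);
      }
      double low = Double.parseDouble(bounds[0].trim());
      double high = Double.parseDouble(bounds[1].trim());
      if (value >= low && value <= high) {
        return rangeAndLabel[1].trim();
      }
    }
    return null;
  }

  // Applies an expression of the form s/pattern/replacement/flags to value.
  // Only the "g" flag is supported; a backslash before a slash yields a literal
  // slash. Group references use Java's $1 syntax rather than sed's \1.
  static String sed(String expression, String value) {
    if (!expression.startsWith("s/")) {
      throw new IllegalArgumentException("Expression must start with s/: " + expression);
    }
    List<String> parts = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    for (int i = 2; i < expression.length(); i++) {
      char c = expression.charAt(i);
      if (c == '\\' && i + 1 < expression.length() && expression.charAt(i + 1) == '/') {
        current.append('/'); // escaped delimiter becomes a literal slash
        i++;
      } else if (c == '/') {
        parts.add(current.toString());
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    parts.add(current.toString()); // text after the last slash is the flag set
    if (parts.size() != 3) {
      throw new IllegalArgumentException("Expected s/pattern/replacement/flags: " + expression);
    }
    Matcher matcher = Pattern.compile(parts.get(0)).matcher(value);
    switch (parts.get(2)) {
      case "g":
        return matcher.replaceAll(parts.get(1));
      case "":
        return matcher.replaceFirst(parts.get(1));
      default:
        throw new IllegalArgumentException("Unsupported flags: " + parts.get(2));
    }
  }

  public static void main(String[] args) {
    System.out.println(quantize(42, "0:17=minor,18:64=adult,65:120=senior")); // adult
    System.out.println(sed("s/(\\d{3})-(\\d{4})/$1$2/g", "555-1234 and 555-9876")); // 5551234 and 5559876
  }
}
```

Quantization returns `null` for values outside every range so that the caller can decide whether to leave the target column empty or flag the record as an error.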
