Wrangler
Wrangler Transform: A DMD system for transforming Big Data
Data Prep
A collection of libraries, a pipeline plugin, and a CDAP service for performing data cleansing, transformation, and filtering using a set of data manipulation instructions (directives). These instructions are either generated using an interactive visual tool or created manually.
- Data Prep defines a few concepts that are useful to know when you are just getting started with it. Learn about them here.
- The Data Prep Transform is separately documented.
- Data Prep Cheatsheet
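To make the idea of directives concrete, here is a minimal Python sketch of a recipe: an ordered list of instructions applied to each record, where a directive may transform a record or filter it out. This is an illustration only, not Wrangler's implementation or API. The names `uppercase`, `drop`, `filter_if`, and `run_recipe` are hypothetical and chosen for this example.

```python
from typing import Callable, Dict, Iterable, List, Optional

Record = Dict[str, object]
# In this sketch a directive maps a record to a new record,
# or returns None to filter the record out.
Directive = Callable[[Record], Optional[Record]]


def uppercase(column: str) -> Directive:
    """Hypothetical directive: upper-case a string column."""
    def apply(record: Record) -> Optional[Record]:
        out = dict(record)  # copy so the input record is left untouched
        value = out.get(column)
        if isinstance(value, str):
            out[column] = value.upper()
        return out
    return apply


def drop(column: str) -> Directive:
    """Hypothetical directive: remove a column if it is present."""
    def apply(record: Record) -> Optional[Record]:
        out = dict(record)
        out.pop(column, None)
        return out
    return apply


def filter_if(predicate: Callable[[Record], bool]) -> Directive:
    """Hypothetical directive: filter out records matching the predicate."""
    def apply(record: Record) -> Optional[Record]:
        return None if predicate(record) else record
    return apply


def run_recipe(recipe: List[Directive], records: Iterable[Record]) -> List[Record]:
    """Apply every directive in order; stop early when a record is filtered."""
    results: List[Record] = []
    for record in records:
        current: Optional[Record] = record
        for directive in recipe:
            current = directive(current)
            if current is None:
                break
        if current is not None:
            results.append(current)
    return results


recipe = [
    filter_if(lambda r: not r.get("name")),  # filtering: skip empty names
    uppercase("name"),                       # transformation
    drop("tmp"),                             # cleansing: remove a scratch column
]
rows = [{"name": "ada", "tmp": 1}, {"name": "", "tmp": 2}]
cleaned = run_recipe(recipe, rows)
```

The order of the list matters: each directive sees the output of the one before it, which is why the filter runs first in this example.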
New Features
More here on upcoming features.
- User Defined Directives, also known as UDDs, allow you to create custom functions to transform records within CDAP Data Prep (a.k.a. Wrangler). CDAP comes with a comprehensive library of functions. There are, however, some omissions and some specific cases for which UDDs are the solution. Additional information on how to build your custom directives is available here.
- A new capability that allows CDAP administrators to restrict the directives that are accessible to their users. More information on configuring it can be found here.
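As a rough illustration of what restricting directives means in practice, the sketch below checks a recipe against a set of restricted directive names before it would be run. It is a sketch under stated assumptions, not how CDAP configures or enforces restrictions: the helper names `directive_name` and `find_restricted` are hypothetical, it assumes each recipe line begins with the directive name, and the directive spellings and `<arguments>` placeholders are illustrative.

```python
from typing import Iterable, List, Set


def directive_name(line: str) -> str:
    """Return the first whitespace-separated token of a recipe line."""
    parts = line.split()
    return parts[0] if parts else ""


def find_restricted(recipe: Iterable[str], restricted: Set[str]) -> List[str]:
    """List the restricted directive names used in a recipe, in first-use order."""
    found: List[str] = []
    for line in recipe:
        name = directive_name(line)
        if name in restricted and name not in found:
            found.append(name)
    return found


# Assumed, illustrative restriction list and recipe lines.
restricted = {"invoke-http"}
recipe_lines = [
    "parse-as-csv <arguments>",
    "invoke-http <arguments>",
    "send-to-error <arguments>",
]
violations = find_restricted(recipe_lines, restricted)
```

A caller could reject the recipe whenever `violations` is non-empty and report the offending names to the user.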
Demo Videos and Recipes
Videos and screencasts are the best way to learn, so we have compiled simple, short screencasts that show some of the features of Data Prep. Additional videos can be found here.
Videos
- [SCREENCAST] Creating Lookup Dataset and Joining
- [SCREENCAST] Restricted Directives
- [SCREENCAST] Parse Excel files in CDAP
- [SCREENCAST] Parse File As AVRO File
- [SCREENCAST] Parsing Binary Coded AVRO Messages
- [SCREENCAST] Parsing Binary Coded AVRO Messages & Protobuf messages using schema registry
- [SCREENCAST] Quantize a column - Digitize
- [SCREENCAST] Data Cleansing capability with send-to-error directive
- [SCREENCAST] Building Data Prep from the GitHub source
- [VOICE-OVER] End-to-End Demo Video
- [SCREENCAST] Ingesting into Kudu
- [SCREENCAST] Realtime HL7 CCDA XML from Kafka into Time Partitioned Parquet
- [SCREENCAST] Parsing JSON file
- [SCREENCAST] Flattening arrays
- [SCREENCAST] Data cleansing with send-to-error directive
- [SCREENCAST] Publishing to Kafka
- [SCREENCAST] Fixed length to JSON
Recipes
Available Directives
These directives are currently available:
| Directive | Description |
| --- | --- |
| **Parsers** | |
| JSON Path | Uses a DSL (a JSON path expression) for parsing JSON records |
| Parse as AVRO | Parses an AVRO encoded message, either as binary or JSON |
| Parse as AVRO File | Parses an AVRO data file |
| Parse as CSV | Parses an input record as comma-separated values |
| Parse as Date | Parses dates using natural language processing |
| Parse as Excel | Parses an Excel file |
| Parse as Fixed Length | Parses as a fixed length record with specified widths |
| Parse as HL7 | Parses Health Level 7 Version 2 (HL7 V2) messages |
| Parse as JSON | Parses a JSON object |
| Parse as Log | Parses access log files such as those from Apache HTTPD and nginx servers |
| Parse as Protobuf | Parses a Protobuf encoded in-memory message using a descriptor |
| Parse as Simple Date | Parses date strings |
| Parse XML To JSON | Parses an XML document into a JSON structure |
| Parse as Currency | Parses a string representation of currency into a number |
| Parse as Datetime | Parses strings with datetime values to CDAP datetime type |
| **Output Formatters** | |
| Write as CSV | Converts a record into CSV format |
| Write as JSON | Converts the record into a JSON map |
| Write JSON Object | Composes a JSON object based on the fields specified |
| Format as Currency | Formats a number as currency as specified by locale |
| **Transformations** | |
| Changing Case | Changes the case of column values |
| Cut Character | Selects parts of a string value |
| Set Column | Sets the column value to the result of an expression execution |
| Find and Replace | Transforms string column values using a "sed"-like expression |
| Index Split | (Deprecated) |
| Invoke HTTP | Invokes an HTTP Service (Experimental, potentially slow) |
| Quantization | Quantizes a column based on specified ranges |
| Regex Group Extractor | Extracts the data from a regex group into its own column |
| Setting Character Set | Sets the encoding and then converts the data to a UTF-8 String |
| Setting Record Delimiter | Sets the record delimiter |
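Two of the table entries describe small algorithms that are easy to picture in code: parsing a fixed length record with specified widths, and quantizing a value based on specified ranges. The Python sketch below shows one plausible reading of each. It is not the Wrangler implementation. The function names are hypothetical, and treating each range as half-open (`low <= value < high`) is an assumption made for this example.

```python
from typing import List, Optional, Sequence, Tuple


def parse_fixed_length(line: str, widths: Sequence[int]) -> List[str]:
    """Split a line into consecutive fields of the given widths."""
    if any(width <= 0 for width in widths):
        raise ValueError("widths must be positive")
    if len(line) < sum(widths):
        raise ValueError("line is shorter than the sum of the widths")
    fields: List[str] = []
    start = 0
    for width in widths:
        fields.append(line[start:start + width])
        start += width
    return fields


def quantize(value: float,
             ranges: Sequence[Tuple[float, float, str]]) -> Optional[str]:
    """Return the label of the first range containing value, else None.

    Assumption for this sketch: ranges are half-open, low <= value < high.
    """
    for low, high, label in ranges:
        if low <= value < high:
            return label
    return None


# Illustrative inputs, invented for this example.
fields = parse_fixed_length("AB123XYZ", [2, 3, 3])
buckets = [(0.0, 1.0, "low"), (1.0, 5.0, "high")]
label = quantize(0.5, buckets)
```

Returning `None` for values outside every range keeps the sketch explicit about unmatched input; a real pipeline might instead route such records to an error path.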
