Corb2

MarkLogic tool for bulk loading, processing, and reporting on content.

What is CoRB?

CoRB is a Java tool designed for bulk content-reprocessing of documents stored in MarkLogic. CoRB stands for Content Reprocessing in Bulk and is a multi-threaded workhorse tool at your disposal. In a nutshell, CoRB works off a list of documents in a database and performs operations against those documents. CoRB operations can include generating a report across all documents, manipulating the individual documents, or a combination thereof.

User Guide

This document and the wiki provide a comprehensive overview of CoRB and the options available to customize the execution of a CoRB job, as well as the ModuleExecutor Tool, which can be used to execute a single (XQuery or JavaScript) module in MarkLogic.

For additional information, refer to the CoRB Wiki.

Downloads

Download the latest release directly from https://github.com/marklogic-community/corb2/releases or resolve dependencies through Maven Central.

Compatibility

Note: marklogic-xcc 8 is backwards compatible to MarkLogic 5 and runs on Java 1.6 or later.

Getting Help

To get help with CoRB

Running CoRB

The entry point is the main method in the com.marklogic.developer.corb.Manager class. CoRB requires the MarkLogic XCC JAR in the classpath, preferably the version that corresponds to the MarkLogic server version, which can be downloaded from https://developer.marklogic.com/products/xcc. Use Java 1.8 or later.
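As a rough illustration of the launch described above, the sketch below assembles a `java` command line that puts the XCC and CoRB JARs on the classpath and names the `Manager` class as the entry point. The JAR file names (`marklogic-xcc.jar`, `marklogic-corb.jar`) and the helper class `CorbLaunchCommand` are hypothetical placeholders, not names taken from the CoRB distribution; substitute the actual file names of the versions you downloaded.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: assemble a JVM command line that starts CoRB's Manager class.
// The JAR names used in main() are hypothetical placeholders.
final class CorbLaunchCommand {

    static final String MAIN_CLASS = "com.marklogic.developer.corb.Manager";

    static List<String> build(List<String> classpathJars, Map<String, String> systemProperties) {
        List<String> command = new ArrayList<>();
        command.add("java");
        // Each option is passed as a -DNAME=value Java system property.
        for (Map.Entry<String, String> option : systemProperties.entrySet()) {
            command.add("-D" + option.getKey() + "=" + option.getValue());
        }
        command.add("-cp");
        // Join the JARs with the platform's classpath separator (":" or ";").
        command.add(String.join(File.pathSeparator, classpathJars));
        command.add(MAIN_CLASS);
        return command;
    }

    public static void main(String[] args) {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("XCC-CONNECTION-URI", "xcc://user:password@localhost:8202");
        options.put("OPTIONS-FILE", "myjob.properties");
        List<String> command = build(
                Arrays.asList("marklogic-xcc.jar", "marklogic-corb.jar"), options);
        System.out.println(String.join(" ", command));
    }
}
```

The resulting list could be handed to `java.lang.ProcessBuilder`, or simply printed and pasted into a shell.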

CoRB needs options specified through one or more of the following mechanisms:

  1. Command-line parameters.
  2. Java system properties, for example: -DXCC-CONNECTION-URI=xcc://user:password@localhost:8202
  3. A properties file on the classpath, specified using -DOPTIONS-FILE=myjob.properties. Relative and full file system paths are also supported.

If an option is specified in more than one place, a command-line parameter takes precedence over a Java system property, which takes precedence over a property from the OPTIONS-FILE properties file.
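The precedence rule can be pictured with a small lookup function. This is only a sketch of the documented ordering, not CoRB's actual implementation; the class `OptionPrecedence` and its `resolve` method are hypothetical names, and the sketch assumes a missing value is represented by `null`.

```java
import java.util.Properties;

// Sketch of the documented precedence:
// command-line parameter > Java system property > OPTIONS-FILE property.
final class OptionPrecedence {

    static String resolve(String name, String commandLineValue,
                          Properties systemProps, Properties optionsFile) {
        if (commandLineValue != null) {
            return commandLineValue;          // highest precedence
        }
        String systemValue = systemProps.getProperty(name);
        if (systemValue != null) {
            return systemValue;               // next: -DNAME=value
        }
        return optionsFile.getProperty(name); // last: properties file (may be null)
    }

    public static void main(String[] args) {
        Properties optionsFile = new Properties();
        optionsFile.setProperty("THREAD-COUNT", "4");
        optionsFile.setProperty("URIS-FILE", "uris.txt");

        Properties systemProps = new Properties();
        systemProps.setProperty("THREAD-COUNT", "8");

        System.out.println(resolve("THREAD-COUNT", null, systemProps, optionsFile)); // prints 8
        System.out.println(resolve("THREAD-COUNT", "16", systemProps, optionsFile)); // prints 16
        System.out.println(resolve("URIS-FILE", null, systemProps, optionsFile));    // prints uris.txt
    }
}
```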

Note: Any or all of the properties can be specified as Java system properties or as key-value pairs in a properties file.

Note: CoRB exit codes are: 0 - successful; 0 - nothing to process (ref: EXIT-CODE-NO-URIS); 1 - initialization or connection error; 2 - execution error.
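A wrapper script or scheduler that launches CoRB may want to translate these exit codes into messages. The following is a minimal sketch that mirrors the codes listed in the note above; the class `CorbExitCodes` is a hypothetical helper, and it assumes the "nothing to process" code has not been changed from the 0 shown above.

```java
// Sketch: map the exit codes listed in the note above to readable messages.
final class CorbExitCodes {

    static String describe(int exitCode) {
        switch (exitCode) {
            case 0:
                // Both outcomes are listed with exit code 0 in the note above.
                return "successful, or nothing to process";
            case 1:
                return "initialization or connection error";
            case 2:
                return "execution error";
            default:
                return "unrecognized exit code " + exitCode;
        }
    }

    public static void main(String[] args) {
        for (int code = 0; code <= 3; code++) {
            System.out.println(code + ": " + describe(code));
        }
    }
}
```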

Note: CoRB supports logging job metrics back to the MarkLogic database log and/or as a document in the database.

Options

Option | Description
---|---
<a name="INIT-MODULE"></a>INIT-MODULE | An XQuery or JavaScript module which, if specified, will be invoked prior to URIS-MODULE. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively.
<a name="INIT-TASK"></a>INIT-TASK | Java Task which, if specified, will be called prior to URIS-MODULE. This can be used in addition to INIT-MODULE for custom implementations.
<a name="OPTIONS-FILE"></a>OPTIONS-FILE | A properties file containing any of the CoRB options. Relative and full file system paths are supported.
<a name="OPTIONS-FILE-ENCODING"></a>OPTIONS-FILE-ENCODING | Specifies the character encoding of the OPTIONS-FILE. If not specified, the system default will be used.
<a name="PROCESS-MODULE"></a>PROCESS-MODULE | XQuery or JavaScript to be executed in a batch for each URI from the URIS-MODULE or URIS-FILE. The module is expected to have at least one external or global variable named URI. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively. If returning multiple values from a JavaScript module, the values must be returned as a Sequence.
<a name="PROCESS-TASK"></a>PROCESS-TASK | <div>Java Class that implements com.marklogic.developer.corb.Task or extends com.marklogic.developer.corb.AbstractTask. Typically, it can talk to PROCESS-MODULE and then do additional processing locally, such as saving a returned value. <ul><li> com.marklogic.developer.corb.ExportBatchToFileTask Generates a single file, typically used for reports. Writes the data returned by the PROCESS-MODULE to a single file specified by EXPORT-FILE-NAME. All returned values from the entire CoRB job will be streamed into the single file. If EXPORT-FILE-NAME is not specified, CoRB uses URIS_BATCH_REF returned by URIS-MODULE as the file name.</li><li> com.marklogic.developer.corb.ExportToFileTask Generates multiple files. Saves the documents returned by each invocation of PROCESS-MODULE to a separate local file within EXPORT-FILE-DIR, where the file name for each document will be based on the URI.</li></ul></div>
<a name="PRE-BATCH-MODULE"></a>PRE-BATCH-MODULE | An XQuery or JavaScript module which, if specified, will be run before batch processing starts. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively.
<a name="PRE-BATCH-TASK"></a>PRE-BATCH-TASK | Java Class that implements com.marklogic.developer.corb.Task or extends com.marklogic.developer.corb.AbstractTask. If PRE-BATCH-MODULE is also specified, the implementation is expected to invoke the XQuery and process the result, if any. It can also be specified without PRE-BATCH-MODULE; an example of this is adding a static header to a report. <ul><li> com.marklogic.developer.corb.PreBatchUpdateFileTask included - Writes the data returned by the PRE-BATCH-MODULE to EXPORT-FILE-NAME, which is particularly useful for writing dynamic headers for CSV output. Also, if EXPORT-FILE-TOP-CONTENT is specified, this task will write this value to the EXPORT-FILE-NAME; this option is especially useful for writing fixed headers to reports. If EXPORT-FILE-NAME is not specified, CoRB uses URIS_BATCH_REF returned by URIS-MODULE as the file name.</li></ul>
<a name="POST-BATCH-MODULE"></a>POST-BATCH-MODULE | An XQuery or JavaScript module which, if specified, will be run after batch processing is completed. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively.
<a name="POST-BATCH-TASK"></a>POST-BATCH-TASK | Java Class that implements com.marklogic.developer.corb.Task or extends com.marklogic.developer.corb.AbstractTask. If POST-BATCH-MODULE is also specified, the implementation is expected to invoke the XQuery and process the result, if any. It can also be specified without POST-BATCH-MODULE; an example of this is adding static content to the bottom of the report. <ul><li> com.marklogic.developer.corb.PostBatchUpdateFileTask included - Writes the data returned by the POST-BATCH-MODULE to EXPORT-FILE-NAME. Also, if EXPORT-FILE-BOTTOM-CONTENT is specified, this task will write this value to the EXPORT-FILE-NAME. If EXPORT-FILE-NAME is not specified, CoRB uses URIS_BATCH_REF returned by URIS-MODULE as the file name.</li></ul>
<a name="THREAD-COUNT"></a>THREAD-COUNT | The number of worker threads. Default is 1.
<a name="URIS-MODULE"></a>URIS-MODULE | URI selector module written in XQuery or JavaScript. Expected to return a sequence containing the URI count, followed by all the URIs. Optionally, it can also return an arbitrary string as the first item in this sequence - refer to the URIS_BATCH_REF section below. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively. JavaScript modules must return a Sequence.
<a name="URIS-FILE"></a>URIS-FILE | If defined instead of URIS-MODULE, URIs will be loaded from the file located on the client. There should be only one URI per line. This path may be relative or absolute. For example, a file containing a list of document identifiers can be used as a URIS-FILE, and the PROCESS-MODULE can query for the document based on this document identifier.
<a name="XCC-CONNECTION-URI"></a>XCC-CONNECTION-URI | Connection string to the MarkLogic XDBC Server. Multiple connection strings can be specified with a comma as the separator.
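To make the URIS-MODULE return format from the table above concrete, the sketch below interprets a returned sequence as an optional leading string, then the URI count, then the URIs. It is an illustration of the described layout only, not CoRB's own parsing code; the class `UrisModuleResult` is hypothetical, and treating a non-numeric first item as the optional string is an assumption made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: interpret a URIS-MODULE result laid out as
// [optional arbitrary string], URI count, URI 1, URI 2, ...
final class UrisModuleResult {
    final String batchRef;   // null when the optional first item is absent
    final int count;
    final List<String> uris;

    private UrisModuleResult(String batchRef, int count, List<String> uris) {
        this.batchRef = batchRef;
        this.count = count;
        this.uris = uris;
    }

    static UrisModuleResult parse(List<String> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("empty sequence");
        }
        int index = 0;
        String batchRef = null;
        // Assumption: a non-numeric first item is the optional string.
        if (!isInteger(items.get(0))) {
            batchRef = items.get(0);
            index = 1;
        }
        if (index >= items.size() || !isInteger(items.get(index))) {
            throw new IllegalArgumentException("missing URI count");
        }
        int count = Integer.parseInt(items.get(index).trim());
        List<String> uris = items.subList(index + 1, items.size());
        return new UrisModuleResult(batchRef, count, uris);
    }

    private static boolean isInteger(String value) {
        try {
            Integer.parseInt(value.trim());
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        UrisModuleResult result = parse(Arrays.asList("report-batch", "2", "/a.xml", "/b.xml"));
        System.out.println(result.batchRef); // prints report-batch
        System.out.println(result.count);    // prints 2
        System.out.println(result.uris);     // prints [/a.xml, /b.xml]
    }
}
```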

Additional options

Option | Description
---|---
<a name="BATCH-SIZE"></a>BATCH-SIZE | The number of URIs to be executed in a single transform. Default is 1. If more than 1, PROCESS-MODULE will receive a delimited string as the $URI variable, which needs to be tokenized to get individual URIs. The default delimi
