Corb2
MarkLogic tool for bulk loading, processing, and reporting on content.
What is CoRB?
CoRB is a Java tool designed for bulk content-reprocessing of documents stored in MarkLogic. CoRB stands for Content Reprocessing in Bulk and is a multi-threaded workhorse tool at your disposal. In a nutshell, CoRB works off a list of documents in a database and performs operations against those documents. CoRB operations can include generating a report across all documents, manipulating the individual documents, or a combination thereof.
User Guide
This document and the wiki provide a comprehensive overview of CoRB and the options available to customize the execution of a CoRB job, as well as the ModuleExecutor Tool, which can be used to execute a single (XQuery or JavaScript) module in MarkLogic.
For additional information, refer to the CoRB Wiki.
Downloads
Download the latest release directly from https://github.com/marklogic-community/corb2/releases or resolve dependencies through Maven Central.
Compatibility
- CoRB v2.4.0 (or later) requires Java 8 (or later) to run.
- CoRB v2.3.2 is the last release compatible with Java 7 and 6.
- CoRB v2.2.0 (or later) requires marklogic-xcc 8.0.* (or later) to run.
Note: marklogic-xcc 8 is backwards compatible to MarkLogic 5 and runs on Java 1.6 or later.
Getting Help
To get help with CoRB:
- Post a question to Stack Overflow with the <code>marklogic</code> and <code>corb</code> tags.
- Submit issues or feature requests at https://github.com/marklogic-community/corb2/issues
Running CoRB
The entry point is the main method in the com.marklogic.developer.corb.Manager class. CoRB requires the MarkLogic XCC JAR in the classpath,
preferably the version that corresponds to the MarkLogic server version, which can be downloaded from https://developer.marklogic.com/products/xcc.
Use Java 1.8 or later.
CoRB needs options specified through one or more of the following mechanisms:
- command-line parameters
- Java system properties, for example: -DXCC-CONNECTION-URI=xcc://user:password@localhost:8202
- a properties file in the class path, specified using -DOPTIONS-FILE=myjob.properties. Relative and full file system paths are also supported.
If an option is specified in more than one place, a command-line parameter takes precedence over a Java system property, which takes precedence over a property from the OPTIONS-FILE properties file.
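The precedence rule above can be illustrated with a minimal sketch. It is not CoRB's own implementation: the class name OptionPrecedenceSketch, the resolve helper, and the representation of command-line parameters as a name-to-value map are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical illustration of the documented precedence order;
// this is not code from CoRB itself.
public class OptionPrecedenceSketch {

    /**
     * Resolve one option by name: command-line value first, then Java system
     * property, then the OPTIONS-FILE properties. Returns null if unset everywhere.
     */
    static String resolve(String name,
                          Map<String, String> commandLine,
                          Properties systemProps,
                          Properties optionsFile) {
        String value = commandLine.get(name);
        if (value == null) {
            value = systemProps.getProperty(name);
        }
        if (value == null) {
            value = optionsFile.getProperty(name);
        }
        return value;
    }

    public static void main(String[] args) {
        // Assumed stand-in for values parsed from the command line.
        Map<String, String> commandLine = new LinkedHashMap<>();
        commandLine.put("THREAD-COUNT", "8");

        // Assumed stand-in for -D system properties.
        Properties systemProps = new Properties();
        systemProps.setProperty("THREAD-COUNT", "4");
        systemProps.setProperty("URIS-MODULE", "uris.xqy");

        // Assumed stand-in for the contents of the OPTIONS-FILE.
        Properties optionsFile = new Properties();
        optionsFile.setProperty("THREAD-COUNT", "2");
        optionsFile.setProperty("URIS-MODULE", "other-uris.xqy");
        optionsFile.setProperty("PROCESS-MODULE", "transform.xqy");

        System.out.println(resolve("THREAD-COUNT", commandLine, systemProps, optionsFile));   // prints 8
        System.out.println(resolve("URIS-MODULE", commandLine, systemProps, optionsFile));    // prints uris.xqy
        System.out.println(resolve("PROCESS-MODULE", commandLine, systemProps, optionsFile)); // prints transform.xqy
        System.out.println(resolve("BATCH-SIZE", commandLine, systemProps, optionsFile));     // prints null
    }
}
```

The module file names used here (uris.xqy, transform.xqy) are placeholders, not files shipped with CoRB.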
Note: Any or all of the properties can be specified as Java system properties or as key-value pairs in a properties file.
Note: CoRB exit codes: 0 - successful, 0 - nothing to process (ref: EXIT-CODE-NO-URIS), 1 - initialization or connection error, and 2 - execution error.
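A wrapper that launches CoRB may want to translate these exit codes into messages. The following is a minimal sketch under that assumption; the class ExitCodeSketch and its describe helper are hypothetical and not part of CoRB.

```java
// Hypothetical helper that maps the documented exit codes to messages;
// this is not part of CoRB.
public class ExitCodeSketch {

    static String describe(int exitCode) {
        switch (exitCode) {
            case 0:
                // Success and "nothing to process" share 0 as listed above
                // (see EXIT-CODE-NO-URIS).
                return "successful, or nothing to process";
            case 1:
                return "initialization or connection error";
            case 2:
                return "execution error";
            default:
                return "unrecognized exit code " + exitCode;
        }
    }

    public static void main(String[] args) {
        for (int code = 0; code <= 3; code++) {
            System.out.println(code + ": " + describe(code));
        }
    }
}
```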
Note: CoRB now supports logging job metrics back to the MarkLogic database log and/or as a document in the database.
Options
Option | Description
---|---
<a name="INIT-MODULE"></a>INIT-MODULE | An XQuery or JavaScript module which, if specified, will be invoked prior to URIS-MODULE. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively.
<a name="INIT-TASK"></a>INIT-TASK | Java Task which, if specified, will be called prior to URIS-MODULE. This can be used in addition to INIT-MODULE for custom implementations.
<a name="OPTIONS-FILE"></a>OPTIONS-FILE | A properties file containing any of the CoRB options. Relative and full file system paths are supported.
<a name="OPTIONS-FILE-ENCODING"></a>OPTIONS-FILE-ENCODING | Specifies the character encoding of the OPTIONS-FILE. Otherwise, the system default will be used.
<a name="PROCESS-MODULE"></a>PROCESS-MODULE | XQuery or JavaScript to be executed in a batch for each URI from the URIS-MODULE or URIS-FILE. Module is expected to have at least one external or global variable with name URI. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively. If returning multiple values from a JavaScript module, values must be returned as Sequence.
<a name="PROCESS-TASK"></a>PROCESS-TASK | <div>Java Class that implements com.marklogic.developer.corb.Task or extends com.marklogic.developer.corb.AbstractTask. Typically, it can talk to PROCESS-MODULE and then do additional processing locally, such as saving a returned value. <ul><li> com.marklogic.developer.corb.ExportBatchToFileTask Generates a single file, typically used for reports. Writes the data returned by the PROCESS-MODULE to a single file specified by EXPORT-FILE-NAME. All returned values from the entire CoRB job will be streamed into the single file. If EXPORT-FILE-NAME is not specified, CoRB uses URIS_BATCH_REF returned by URIS-MODULE as the file name.</li><li> com.marklogic.developer.corb.ExportToFileTask Generates multiple files. Saves the documents returned by each invocation of PROCESS-MODULE to a separate local file within EXPORT-FILE-DIR, where the file name for each document will be based on the URI.</li></ul></div>
<a name="PRE-BATCH-MODULE"></a>PRE-BATCH-MODULE | An XQuery or JavaScript module which, if specified, will be run before batch processing starts. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively.
<a name="PRE-BATCH-TASK"></a>PRE-BATCH-TASK | Java Class that implements com.marklogic.developer.corb.Task or extends com.marklogic.developer.corb.AbstractTask. If PRE-BATCH-MODULE is also specified, the implementation is expected to invoke the XQuery and process the result, if any. It can also be specified without PRE-BATCH-MODULE; an example of this is adding a static header to a report. <ul><li> com.marklogic.developer.corb.PreBatchUpdateFileTask included - Writes the data returned by the PRE-BATCH-MODULE to EXPORT-FILE-NAME, which is particularly useful for writing dynamic headers for CSV output. Also, if EXPORT-FILE-TOP-CONTENT is specified, this task will write this value to the EXPORT-FILE-NAME - this option is especially useful for writing fixed headers to reports. If EXPORT-FILE-NAME is not specified, CoRB uses URIS_BATCH_REF returned by URIS-MODULE as the file name.</li></ul>
<a name="POST-BATCH-MODULE"></a>POST-BATCH-MODULE | An XQuery or JavaScript module which, if specified, will be run after batch processing is completed. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively.
<a name="POST-BATCH-TASK"></a>POST-BATCH-TASK | Java Class that implements com.marklogic.developer.corb.Task or extends com.marklogic.developer.corb.AbstractTask. If POST-BATCH-MODULE is also specified, the implementation is expected to invoke the XQuery and process the result if any. It can also be specified without POST-BATCH-MODULE and an example of this is to add static content to the bottom of the report. <ul><li> com.marklogic.developer.corb.PostBatchUpdateFileTask included - Writes the data returned by the POST-BATCH-MODULE to EXPORT-FILE-NAME. Also, if EXPORT-FILE-BOTTOM-CONTENT is specified, this task will write this value to the EXPORT-FILE-NAME. If EXPORT-FILE-NAME is not specified, CoRB uses URIS_BATCH_REF returned by URIS-MODULE as the file name.</li></ul>
<a name="THREAD-COUNT"></a>THREAD-COUNT | The number of worker threads. Default is 1.
<a name="URIS-MODULE"></a>URIS-MODULE | URI selector module written in XQuery or JavaScript. Expected to return a sequence containing the uris count, followed by all the uris. Optionally, it can also return an arbitrary string as a first item in this sequence - refer to URIS_BATCH_REF section below. XQuery and JavaScript modules need to have .xqy and .sjs extensions respectively. JavaScript modules must return a Sequence.
<a name="URIS-FILE"></a>URIS-FILE | If defined instead of URIS-MODULE, URIs will be loaded from the file located on the client. There should only be one URI per line. This path may be relative or absolute. For example, a file containing a list of document identifiers can be used as a URIS-FILE and the PROCESS-MODULE can query for the document based on this document identifier.
<a name="XCC-CONNECTION-URI"></a>XCC-CONNECTION-URI | Connection string to MarkLogic XDBC Server. Multiple connection strings can be specified with comma as a separator.
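
The URIS-MODULE return shape described in the table (the URI count followed by the URIs, optionally preceded by an arbitrary string) can be sketched as follows. This is an illustration only, not CoRB's parsing code: the UrisResultSketch class, its parse helper, and the rule that a non-numeric first item is the optional leading string are assumptions. A leading string made up solely of digits would be misread as the count by this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for the sequence shape described for URIS-MODULE;
// this is not CoRB's own parsing logic.
public class UrisResultSketch {

    static final class UrisResult {
        final String batchRef;   // optional leading string, null when absent
        final int count;         // the URI count reported by the module
        final List<String> uris; // the URIs that follow the count

        UrisResult(String batchRef, int count, List<String> uris) {
            this.batchRef = batchRef;
            this.count = count;
            this.uris = uris;
        }
    }

    static UrisResult parse(List<String> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("expected at least a count");
        }
        int index = 0;
        String batchRef = null;
        // Assumption: a first item that is not all digits is the optional string.
        if (!items.get(0).matches("\\d+")) {
            batchRef = items.get(0);
            index = 1;
        }
        if (index >= items.size()) {
            throw new IllegalArgumentException("missing count after leading string");
        }
        // Throws NumberFormatException if the count position is not numeric.
        int count = Integer.parseInt(items.get(index));
        List<String> uris = new ArrayList<>(items.subList(index + 1, items.size()));
        return new UrisResult(batchRef, count, uris);
    }

    public static void main(String[] args) {
        UrisResult plain = parse(Arrays.asList("2", "/a.xml", "/b.xml"));
        System.out.println(plain.batchRef + " " + plain.count + " " + plain.uris);

        UrisResult withRef = parse(Arrays.asList("report-name", "1", "/c.xml"));
        System.out.println(withRef.batchRef + " " + withRef.count + " " + withRef.uris);
    }
}
```

The sample values (/a.xml, report-name, and so on) are placeholders chosen for the example.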
Additional options
Option | Description
---|---
<a name="BATCH-SIZE"></a>BATCH-SIZE | The number of URIs to be executed in single transform. Default is 1. If more than 1, PROCESS-MODULE will receive a delimited string as the $URI variable, which needs to be tokenized to get individual URIs. The default delimi
