LightGBM4j: a Java wrapper for LightGBM
LightGBM4j is a zero-dependency Java wrapper for the LightGBM project. Its main goal is to provide a 1-1 mapping for all LightGBM API methods in a Java-friendly flavor.
Purpose
LightGBM itself has a SWIG-generated JNI interface, which can be used directly from Java. The problem with SWIG wrappers is that they are extremely low-level. For example, to pass a Java array through SWIG, you need to do something horrible:
SWIGTYPE_p_float dataBuffer = new_floatArray(input.length);
for (int i = 0; i < input.length; i++) {
    floatArray_setitem(dataBuffer, i, input[i]);
}
int result = <...>
if (result < 0) {
    delete_floatArray(dataBuffer);
    throw new Exception(LGBM_GetLastError());
} else {
    delete_floatArray(dataBuffer);
    <...>
}
This wrapper does all the dirty work for you:
- exposes native Java types for all supported API methods (so `float[]` instead of `SWIGTYPE_p_float`)
- handles memory management internally (so you don't need to care about JNI memory leaks)
- supports both `float[]` and `double[]` API flavours
- reduces the amount of boilerplate for basic tasks
The library is at an early development stage and does not yet cover 100% of the LightGBM API, but the eventual goal is to merge with the upstream LightGBM and become the official Java binding for the project.
Installation
To install, use the following maven coordinates:
<dependency>
<groupId>io.github.metarank</groupId>
<artifactId>lightgbm4j</artifactId>
<version>4.6.0-2</version>
</dependency>
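For Gradle builds, the equivalent dependency declaration (the standard coordinate mapping, not taken from the upstream README) would be:

```groovy
implementation 'io.github.metarank:lightgbm4j:4.6.0-2'
```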
The versioning schema attempts to match the upstream, with an extra `-N` suffix when lightgbm4j-specific changes were released on top of an upstream version.
MacOS & Linux native library dependencies installation
The LightGBM native library depends on libomp for OpenMP support, which is missing on many systems by default.
For MacOS:
brew install libomp
For Debian Linux:
apt install libgomp1
GPU support
It is possible to enable GPU support for training:
- rebuild LightGBM with GPU support: use the `-DUSE_CUDA=1 -DUSE_SWIG=ON` CMake options. You should also match the native/JNI versions precisely.
- LightGBM4j loads native libraries from the bundled resources by default. This can be overridden by setting the `LIGHTGBM_NATIVE_LIB_PATH` environment variable. It should point to a directory with `lib_lightgbm.so` and `lib_lightgbm_swig.so` files (or with `dll`/`dylib` extensions on Windows/macOS).
If the native override successfully loaded the custom library you've built, you'll see the following lines in the logs:
LIGHTGBM_NATIVE_LIB_PATH is set: loading /home/user/code/LightGBM/lib_lightgbm.so
LIGHTGBM_NATIVE_LIB_PATH is set: loading /home/user/code/LightGBM/lib_lightgbm_swig.so
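For example, to point LightGBM4j at a custom build (the path below is illustrative):

```shell
# directory must contain both lib_lightgbm and lib_lightgbm_swig libraries
export LIGHTGBM_NATIVE_LIB_PATH=/home/user/code/LightGBM
```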
Usage
There are two main classes available:
- `LGBMDataset` to manage input training and validation data.
- `LGBMBooster` to do training and inference.
All the public API methods in these classes should map to the LightGBM C API methods directly.
Note that both `LGBMBooster` and `LGBMDataset` hold handles to native LightGBM data structures, so you need to explicitly call `.close()` when they are no longer used. Otherwise, you may end up with a native-code memory leak.
To load an existing model and run it:
LGBMBooster loaded = LGBMBooster.loadModelFromString(model);
float[] input = new float[] {1.0f, 1.0f, 1.0f, 1.0f};
double[] pred = loaded.predictForMat(input, 2, 2, true);
To load a dataset from a java matrix:
float[] matrix = new float[] {1.0f, 1.0f, 1.0f, 1.0f};
LGBMDataset ds = LGBMDataset.createFromMat(matrix, 2, 2, true, "", null);
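The `createFromMat` call above takes a flattened 1-D array plus the row and column counts. With `isRowMajor=true`, element `(row, col)` of the logical matrix lives at index `row * ncol + col`. A small self-contained sketch of that layout (the `get` helper is illustrative, not part of the library):

```java
public class RowMajor {
    // Returns element (row, col) of a matrix with ncol columns, flattened
    // row-major -- the layout expected by createFromMat(..., isRowMajor=true, ...).
    static float get(float[] flat, int ncol, int row, int col) {
        return flat[row * ncol + col];
    }

    public static void main(String[] args) {
        // 2x2 matrix {{1, 2}, {3, 4}} flattened row-major
        float[] flat = {1.0f, 2.0f, 3.0f, 4.0f};
        System.out.println(get(flat, 2, 1, 0)); // second row, first column -> 3.0
    }
}
```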
There are some rough edges in the LightGBM API when loading a dataset from matrices:

- `createFromMat` parameters cannot set the label or weight column, so if you pass `parameters = "label=some_column_name"`, it will be ignored by LightGBM.
- label/weight/group columns are magical and should NOT be included in the input matrix for `createFromMat`.
- to set these magical columns, you need to explicitly call the `LGBMDataset.setField()` method:
  - `label` and `weight` columns must be `float[]`
  - `group` and `position` columns must be `int[]`
A full example of loading dataset from a matrix for a cancer dataset:
String[] columns = new String[] {
"Age","BMI","Glucose","Insulin","HOMA","Leptin","Adiponectin","Resistin","MCP.1"
};
double[] values = new double[] {
71,30.3,102,8.34,2.098344,56.502,8.13,4.2989,200.976,
66,27.7,90,6.042,1.341324,24.846,7.652055,6.7052,225.88,
75,25.7,94,8.079,1.8732508,65.926,3.74122,4.49685,206.802,
78,25.3,60,3.508,0.519184,6.633,10.567295,4.6638,209.749,
69,29.4,89,10.704,2.3498848,45.272,8.2863,4.53,215.769,
85,26.6,96,4.462,1.0566016,7.85,7.9317,9.6135,232.006,
76,27.1,110,26.211,7.111918,21.778,4.935635,8.49395,45.843,
77,25.9,85,4.58,0.960273333,13.74,9.75326,11.774,488.829,
45,21.30394858,102,13.852,3.4851632,7.6476,21.056625,23.03408,552.444,
45,20.82999519,74,4.56,0.832352,7.7529,8.237405,28.0323,382.955,
49,20.9566075,94,12.305,2.853119333,11.2406,8.412175,23.1177,573.63,
34,24.24242424,92,21.699,4.9242264,16.7353,21.823745,12.06534,481.949,
42,21.35991456,93,2.999,0.6879706,19.0826,8.462915,17.37615,321.919,
68,21.08281329,102,6.2,1.55992,9.6994,8.574655,13.74244,448.799,
51,19.13265306,93,4.364,1.0011016,11.0816,5.80762,5.57055,90.6,
62,22.65625,92,3.482,0.790181867,9.8648,11.236235,10.69548,703.973
};
float[] labels = new float[] {
0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1
};
LGBMDataset dataset = LGBMDataset.createFromMat(values, 16, columns.length, true, "", null);
dataset.setFeatureNames(columns);
dataset.setField("label", labels);
return dataset;
Also, see a working example of different ways to deal with input datasets in the LightGBM4j tests.
Example
// cancer dataset from https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra
// with labels altered to fit the [0,1] range
LGBMDataset train = LGBMDataset.createFromFile("cancer.csv", "header=true label=name:Classification", null);
LGBMDataset test = LGBMDataset.createFromFile("cancer-test.csv", "header=true label=name:Classification", train);
LGBMBooster booster = LGBMBooster.create(train, "objective=binary label=name:Classification");
booster.addValidData(test);
for (int i=0; i<10; i++) {
booster.updateOneIter();
double[] evalTrain = booster.getEval(0);
double[] evalTest = booster.getEval(1);
System.out.println("train: " + evalTrain[0] + " test: " + evalTest[0]);
}
booster.close();
train.close();
test.close();
Categorical features
LightGBM supports defining features as categorical. To make this work with LightGBM4j, you need to do the following:
- Set their names with `setFeatureNames` so you can reference them later in options.
- Mark them as `categorical_feature` in the booster options.
Given the dataset file in the LibSVM format, where categories are index-encoded:
1 0:7 1:2 2:3 3:20 4:15 5:38 6:29 7:201
0 0:5 1:15 2:2 3:1859 4:1 5:156 6:164 7:2475
0 0:2 1:12 2:6 3:648 4:13 5:29 6:38 7:201
1 0:10 1:26 2:5 3:1235 4:14 5:82 6:205 7:931
0 0:6 1:18 2:1 3:737 4:12 5:224 6:162 7:2176
0 0:4 1:12 3:1845 4:18 5:83 6:49 7:1491
0 0:3 2:3 3:1652 4:20 5:2 6:180 7:332
0 0:3 1:21 2:3 3:2010 4:16 5:216 6:69 7:911
0 0:3 1:3 3:1555 4:1 5:84 6:81 7:1192
0 0:8 1:2 2:6 3:1008 4:16 5:216 6:228 7:130
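Each row above is `<label> <index>:<value> ...`, and indices absent from a row (as in several of the lines) are implicitly zero. A minimal hand-rolled sketch of how one such row maps to a dense feature vector (`LibsvmRow` is an illustrative helper; LightGBM's `createFromFile` handles the format natively):

```java
public class LibsvmRow {
    // Parses one index-encoded LibSVM line into a dense row:
    // row[0] is the label, row[1 + i] holds feature i.
    // Missing indices stay at 0, matching LightGBM's sparse semantics.
    static float[] parse(String line, int numFeatures) {
        String[] parts = line.trim().split("\\s+");
        float[] row = new float[numFeatures + 1];
        row[0] = Float.parseFloat(parts[0]);
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split(":");
            row[1 + Integer.parseInt(kv[0])] = Float.parseFloat(kv[1]);
        }
        return row;
    }

    public static void main(String[] args) {
        // feature 2 is absent from this line, so its slot stays 0.0
        float[] r = parse("0 0:4 1:12 3:1845 4:18 5:83 6:49 7:1491", 8);
        System.out.println(java.util.Arrays.toString(r));
    }
}
```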
You can load and use them in the following way:
LGBMDataset ds = LGBMDataset.createFromFile("./src/test/resources/categorical.data", "", null);
ds.setFeatureNames(new String[]{"f0", "f1", "f2", "f3", "f4", "f5", "f6", "f7"});
String params = "objective=binary label=name:Classification categorical_feature=f0,f1,f2,f3,f4,f5,f6,f7";
LGBMBooster booster = LGBMBooster.create(ds, params);
for (int i=0; i<10; i++) {
booster.updateOneIter();
double[] eval1 = booster.getEval(0);
System.out.println("train " + eval1[0]);
}
booster.close();
ds.close();
Position bias removal
LightGBM 4.1+ can perform a position-bias aware LTR/LambdaMART training. To perform it with lightgbm4j you need to explicitly define the position field as described in the upstream LightGBM docs:
float[] matrix = new float[] {
// query group 1
1.0f, 2.0f, // doc1
3.0f, 4.0f, // doc2
// query group 2
1.0f, 2.0f, // doc1
3.0f, 4.0f}; // doc2
LGBMDataset ds = LGBMDataset.createFromMat(matrix, 4, 2, true, "", null);
ds.setField("label", new float[] {1.0f, 0.0f, 1.0f, 0.0f});
ds.setField("group", new int[] {2, 2});          // two query groups of two docs each
ds.setField("position", new int[] {0, 1, 0, 1}); // per-document position within each group