Inferkt
A llama.cpp binding for Kotlin multiplatform API for common use on (Android, iOS).
Install / Use
/learn @Dilivva/InferktREADME
InferKt
InferKt is a llama.cpp binding for Kotlin multiplatform API for common use on (Android, iOS).
How to use
This library is experimental and api may be subject to change.
Add the dependency
- Add the dependency to your module's
build.gradle.ktsfile:
kotlin {
sourceSets {
commonMain.dependencies{
implementation("com.dilivva:inferkt:${version}")
}
}
}
On iOS: Add Accelerate.framework and Metal.framework to your project on Xcode.
- Explore API in common code:
//Create an instance:
private val inference = createInference()
//Load a model:
val modelSettings = ModelSettings(
val modelPath: String, //model absolute path
val numberOfGpuLayers: Int = 0, //number of GPU layers to use for computation. Defaults to 0 (CPU only).
val useMmap: Boolean = true, //whether to use memory mapping for model loading. Defaults to true.
val useMlock: Boolean = true, //whether to lock the model in memory. Defaults to half of the total threads.
val numberOfThreads: Int = -1, //number of threads to use for inference. Defaults to -1 (half of the total threads).
val context: Int = 512, //context window size for the model. Defaults to 512. Setting higher context sizes may result in out-of-memory errors.
val batchSize: Int = 512
)
inference.preloadModel(modelSettings = modelSettings, progressCallback: { progress: Float -> true })
//Set sampling settings you can tune the model before each inference:
val samplingSettings = SamplingSettings(..)
inference.setSamplingParams(samplingSettings)
//Completion
inference.completion(prompt: "I am a nice cat:", maxTokens: 100, onGenerate: { event: GenerationEvent -> })
//Chat
inference.chat(prompt: "Tell me a joke about Kotlin", maxTokens: 100, onGenerate: { event: GenerationEvent -> })
//Observe events:
when(event){
is GenerationEvent.Error -> println("Error: ${it.error}")
GenerationEvent.Generated -> // Done generating
is GenerationEvent.Generating -> Streaming generated text
GenerationEvent.Loading -> // Evaluating prompt
}
Acknowledgements
llama.cpp: for their awesome work on local inference.
llama.rn: inspired the build process implemented in this project.
Kotlin Multiplatform: awesome cross-platform framework.
Contributing
We welcome contributions to InferKt!
License
InferKt is licensed under the MIT License. See the LICENSE file for details.
Related Skills
node-connect
344.4kDiagnose OpenClaw node connection and pairing failures for Android, iOS, and macOS companion apps
frontend-design
99.2kCreate distinctive, production-grade frontend interfaces with high design quality. Use this skill when the user asks to build web components, pages, or applications. Generates creative, polished code that avoids generic AI aesthetics.
openai-whisper-api
344.4kTranscribe audio via OpenAI Audio Transcriptions API (Whisper).
qqbot-media
344.4kQQBot 富媒体收发能力。使用 <qqmedia> 标签,系统根据文件扩展名自动识别类型(图片/语音/视频/文件)。
