ImageCaptioningAndroid
Image captioning on Android
This project has 2 parts:
- Convert a pretrained .ckpt model to a .tflite model, which can run on Android devices.
- An Android application that uses the .tflite file to perform image captioning on the device.
Steps:
- For Show and Tell, get the base code from the TensorFlow models repository: https://github.com/tensorflow/models/tree/master/research/im2txt
This code does not include pretrained weights.
- Get pretrained weights from https://github.com/KranthiGV/Pretrained-Show-and-Tell-model and download the model trained for 2M iterations. Use TensorFlow 1.0 with Python 2 to confirm that the downloaded weights load and that image captioning works on a laptop.
- Freeze the model to convert the weights loaded from the .ckpt file into a .pb file. The required changes are in this commit: https://github.com/fringedroid/ImageCaptioningAndroid/commit/4c3444cb95045e6500c42bbb940567b8f174863c
Once saving to .pb succeeds, fix the shapes of the input and output tensors so that no dimension is None.
- Convert the .pb file to .tflite. This is done in model_generation/im2txt/im2txt/convert_to_tflite.py. Then test whether the .tflite model performs as expected by running it with the TFLite interpreter in Python. This is done in model_generation/im2txt/im2txt/cherry_pick.py
- After getting a working .tflite model, use it in the Android app. In the Android app, Captioner.java performs image captioning using the .tflite model.
The final app borrows components from 2 apps:
a) Google's TFLite demo app: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/java/demo
b) Deepsemantic image captioning app: https://github.com/deepsemantic/Captioner
The MainActivity from (b) is used in (a). Captioner.java contains the logic for image captioning.
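
The freeze and convert steps above require fixed tensor shapes before conversion. The following is a minimal sketch of that check and of the conversion call, not the contents of convert_to_tflite.py. It assumes a TensorFlow version that exposes `tf.compat.v1.lite.TFLiteConverter`; the helper names, file paths, tensor names, and shapes are hypothetical and must be replaced with the ones from your frozen graph.

```python
from typing import Dict, List, Optional, Sequence

Shape = Sequence[Optional[int]]


def find_dynamic_inputs(input_shapes: Dict[str, Shape]) -> List[str]:
    """Return the names of inputs whose shape still contains None."""
    return [name for name, shape in input_shapes.items()
            if any(dim is None for dim in shape)]


def convert_frozen_graph(pb_path: str, tflite_path: str,
                         input_shapes: Dict[str, Shape],
                         output_names: Sequence[str]) -> None:
    """Convert a frozen .pb graph to .tflite, refusing dynamic input shapes."""
    dynamic = find_dynamic_inputs(input_shapes)
    if dynamic:
        raise ValueError("inputs with unknown dimensions: %s" % ", ".join(dynamic))

    # Imported lazily so the shape check above works without TensorFlow installed.
    import tensorflow as tf

    converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
        graph_def_file=pb_path,
        input_arrays=list(input_shapes),
        output_arrays=list(output_names),
        input_shapes={name: list(shape) for name, shape in input_shapes.items()},
    )
    with open(tflite_path, "wb") as f:
        f.write(converter.convert())


# Hypothetical tensor names and shapes, for illustration only.
example_shapes = {"image_input": [1, 299, 299, 3], "state_input": [None, 1024]}
print(find_dynamic_inputs(example_shapes))  # → ['state_input']
```

Running the shape check before invoking the converter gives a readable error that names the offending tensor, instead of a failure from inside the conversion.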
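
Testing the .tflite model in Python, as described in the conversion step, amounts to running the interpreter on a known input and comparing the result with what the original model produces. The sketch below is one way to do that and is not taken from cherry_pick.py. The helpers `run_tflite` and `outputs_match` and the tolerance value are assumptions for illustration; only the `tf.lite.Interpreter` calls come from TensorFlow itself.

```python
from typing import Dict, List, Sequence


def outputs_match(reference: Sequence[float], candidate: Sequence[float],
                  tolerance: float = 1e-4) -> bool:
    """True if both flat vectors have equal length and agree element-wise."""
    if len(reference) != len(candidate):
        return False
    return all(abs(r - c) <= tolerance for r, c in zip(reference, candidate))


def run_tflite(tflite_path: str, inputs: Dict[str, object]) -> Dict[str, List[float]]:
    """Run one inference; `inputs` maps input tensor names to array-like values.

    Each value must already have the fixed shape the model expects.
    """
    # Imported lazily so outputs_match can be used without these packages.
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    for detail in interpreter.get_input_details():
        value = np.asarray(inputs[detail["name"]], dtype=detail["dtype"])
        interpreter.set_tensor(detail["index"], value)
    interpreter.invoke()
    # Flatten every output so it can be compared with a reference vector.
    return {detail["name"]: interpreter.get_tensor(detail["index"]).ravel().tolist()
            for detail in interpreter.get_output_details()}


# Identical vectors trivially match.
print(outputs_match([0.1, 0.9], [0.1, 0.9]))  # → True
```

A small absolute tolerance is used because converted models rarely reproduce the original outputs bit for bit; the right threshold depends on the model and should be chosen by inspection.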
