ImageCaptioningAndroid

Image captioning on Android

This project has 2 parts:

Converting a pretrained .ckpt model to a .tflite model that can run on Android devices.
An Android application that uses the .tflite file to perform image captioning on the device.

Steps:

For Show and Tell, get the base code from the TensorFlow models repository: https://github.com/tensorflow/models/tree/master/research/im2txt

That repository does not include pretrained weights.

Get pretrained weights from https://github.com/KranthiGV/Pretrained-Show-and-Tell-model and download the model trained for 2M iterations. Use TensorFlow 1.0 with Python 2 to confirm that the downloaded weights load and that image captioning works on a laptop.
Freeze the model, which converts the weights loaded from the .ckpt file into a .pb file. The required changes are in this commit: https://github.com/fringedroid/ImageCaptioningAndroid/commit/4c3444cb95045e6500c42bbb940567b8f174863c
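The freezing step can be sketched roughly as follows. This is a generic illustration, not the code from the commit above. It assumes a TensorFlow release that exposes tf.compat.v1 (on older 1.x releases the same calls live directly under tf) and a .meta file next to the checkpoint. The function name and its parameters are hypothetical.

```python
def freeze_checkpoint(checkpoint_prefix, output_node_names, pb_path):
    """Restore a checkpoint and write a frozen GraphDef (.pb) with weights baked in.

    All three arguments are illustrative; the real node names must be read
    from the graph that im2txt builds.
    """
    # Imported lazily so this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    tf1 = tf.compat.v1
    graph = tf.Graph()
    with graph.as_default():
        # Assumption: "<checkpoint_prefix>.meta" exists and describes the graph.
        saver = tf1.train.import_meta_graph(checkpoint_prefix + ".meta")
        with tf1.Session(graph=graph) as sess:
            saver.restore(sess, checkpoint_prefix)
            # Replace every variable with a constant holding its restored value.
            frozen = tf1.graph_util.convert_variables_to_constants(
                sess, graph.as_graph_def(), list(output_node_names))

    with open(pb_path, "wb") as f:
        f.write(frozen.SerializeToString())
    return pb_path
```

A call might look like freeze_checkpoint("model.ckpt-2000000", ["softmax"], "frozen.pb"), where the checkpoint name and the output node name are placeholders to be checked against the actual graph.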

Once saving to .pb succeeds, fix the shapes of the input and output tensors so that no dimension is None.
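A small helper can make the "no None in shapes" requirement checkable before conversion. The sketch below works on plain Python lists and is independent of TensorFlow. The function names and the example shape are illustrative assumptions, not taken from the project.

```python
def dynamic_dims(shape):
    """Return the indices of dimensions that are not fixed (None or -1)."""
    return [i for i, d in enumerate(shape) if d is None or d == -1]


def fix_shape(shape, defaults):
    """Replace each dynamic dimension with a fixed size.

    `defaults` maps a dimension index to the size to use for it.
    """
    fixed = []
    for i, d in enumerate(shape):
        if d is None or d == -1:
            if i not in defaults:
                raise ValueError("no fixed size given for dimension %d" % i)
            fixed.append(defaults[i])
        else:
            fixed.append(d)
    return fixed


# Hypothetical image input with an unknown batch dimension.
image_shape = [None, 299, 299, 3]
print(dynamic_dims(image_shape))        # indices that still need a fixed size
print(fix_shape(image_shape, {0: 1}))   # same shape with batch size pinned to 1
```

Pinning the batch dimension to 1 is a natural choice for on-device inference, where one image is captioned at a time.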

Convert the .pb file to .tflite. This is done in model_generation/im2txt/im2txt/convert_to_tflite.py
Test whether the .tflite model performs as expected by using the TFLite interpreter in Python. This is done in model_generation/im2txt/im2txt/cherry_pick.py
After getting a working .tflite model, use it in the Android app. In the Android app, Captioner.java performs image captioning using the .tflite model.
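The conversion and the Python-side check can be sketched as below. This is not the content of convert_to_tflite.py or cherry_pick.py. It is a minimal illustration that assumes a TensorFlow release providing tf.compat.v1.lite.TFLiteConverter and tf.lite.Interpreter. The function names, tensor names, and paths are hypothetical.

```python
def convert_frozen_graph(pb_path, input_arrays, output_arrays, input_shapes, tflite_path):
    """Convert a frozen .pb graph to a .tflite file.

    input_arrays / output_arrays are lists of tensor names, and input_shapes
    maps each input name to a fully fixed shape (no None entries).
    """
    # Imported lazily so this module can be loaded without TensorFlow installed.
    import tensorflow as tf

    converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(
        pb_path, input_arrays, output_arrays, input_shapes=input_shapes)
    tflite_model = converter.convert()  # serialized flatbuffer as bytes
    with open(tflite_path, "wb") as f:
        f.write(tflite_model)
    return tflite_path


def run_with_zero_inputs(tflite_path):
    """Smoke test: feed all-zero inputs and return the outputs keyed by tensor name."""
    import numpy as np
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=tflite_path)
    interpreter.allocate_tensors()
    for detail in interpreter.get_input_details():
        zeros = np.zeros(detail["shape"], dtype=detail["dtype"])
        interpreter.set_tensor(detail["index"], zeros)
    interpreter.invoke()
    return {
        detail["name"]: interpreter.get_tensor(detail["index"])
        for detail in interpreter.get_output_details()
    }
```

A zero-input run only shows that the model loads and executes. Checking caption quality still requires feeding a real preprocessed image and comparing the result with the output of the original checkpoint.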

The final app borrows components from 2 apps:
a) Google's TFLite demo app: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/java/demo
b) Deepsemantic image captioning app: https://github.com/deepsemantic/Captioner

The MainActivity from (b) is used in (a). Captioner.java contains the logic for image captioning.
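For readers unfamiliar with how a Show and Tell style model produces a caption, the loop below sketches greedy decoding in Python. It is not a port of Captioner.java, and whether the app uses greedy or beam search is not stated here. step_fn, the token ids, and max_len are hypothetical stand-ins for one decoder step of the model.

```python
def argmax(values):
    """Index of the largest element in a sequence of numbers."""
    return max(range(len(values)), key=lambda i: values[i])


def greedy_caption(initial_state, step_fn, start_id, end_id, max_len=20):
    """Generate word ids one at a time, always taking the most likely next word.

    step_fn(word_id, state) must return (probabilities, new_state). In a real
    app it would wrap one invocation of the decoder model.
    """
    word, state = start_id, initial_state
    caption = []
    for _ in range(max_len):
        probs, state = step_fn(word, state)
        word = argmax(probs)
        if word == end_id:
            break  # the model signalled the end of the sentence
        caption.append(word)
    return caption
```

The returned ids would then be mapped back to words through the vocabulary file that ships with the model.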
