ImageCaptioning

Install / Use

/learn @ChSatyaSavith/ImageCaptioning

README

DIFFMOD

DIFFMOD is an image captioning model. Most image captioning models today are owned by large companies such as Instagram, Facebook, and Google, and the models that are available are either monetised or do not work at all.

DIFFMOD is different: we want our model to be publicly available. Inspired by open source projects like Stable Diffusion and Auto-GPT, we want DIFFMOD to be an open source library that revolutionises the image captioning community.

Executable file: app.py

Tech Stack

Python: Keras, TensorFlow, Flask, OpenCV

Model: EfficientNet

Demo:

Future Plans

The model was trained on Flickr8k. We now plan to scale it up and train it on MS COCO, which has over 330,000 images. We also plan to deploy it online on one of our subdomains.
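For readers who want to prepare Flickr8k captions themselves, here is a minimal sketch of one way to group captions by image. It is not taken from this repository. It assumes the commonly distributed Flickr8k.token.txt layout, where each line reads "<image>.jpg#<n>", a tab, then the caption. The function name group_captions and the <start>/<end> markers are illustrative choices, not DIFFMOD's actual preprocessing.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_captions(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group captions by image file name.

    Assumption: each line looks like "<image>.jpg#<n>\t<caption>",
    as in the commonly distributed Flickr8k.token.txt file.
    """
    captions: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        image_id, _, caption = line.partition("\t")
        if not caption:
            continue  # skip lines without a tab-separated caption
        # Drop the "#<n>" suffix so all captions share one key per image.
        image_name = image_id.split("#", 1)[0]
        # Lowercase and wrap in start/end markers (an illustrative convention).
        captions[image_name].append("<start> " + caption.lower().strip() + " <end>")
    return dict(captions)


# Tiny in-memory sample standing in for the real caption file.
sample = [
    "dog.jpg#0\tA dog runs on grass .",
    "dog.jpg#1\tA brown dog is running .",
    "cat.jpg#0\tA cat sits on a sofa .",
]
grouped = group_captions(sample)
print(grouped["dog.jpg"])
```

To use this on a real file, pass an open file handle instead of the sample list, since the function accepts any iterable of lines. MS COCO ships its captions as JSON annotations, so it would need a different loader.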

Authors

Avdhan
Satya
Mansi
