BabyDoctor
The AI Radiologist You Can Chat With
<p align="center"> <img src="images/babydoctor_cute.png" width="300" height="300" alt="BabyDoctor"> </p><a href='https://huggingface.co/photonmz/llava-roco-8bit'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a> <a href='https://huggingface.co/datasets/photonmz/roco-instruct-65k'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Dataset-blue'></a> <a href='https://pitch.com/public/40a91030-5b7d-4973-9519-f2d1c06ab935'><img src='https://img.shields.io/badge/%F0%9F%8E%A4%20Pitch-4Catalyzer-blue'></a> <a href='https://github.com/photomz/BabyDoctor/tree/main/MODEL_CARD.md'><img src='https://img.shields.io/badge/%F0%9F%93%9C%20Model%20Card-gray'></a>
Welcome to BabyDoctor, your personal "Ultrasound Radiologist in a Box"! Let's face it, most of us try to avoid seeing the doctor as much as we can, especially when it involves cryptic ultrasound scans. BabyDoctor is here to bridge the gap and demystify medical jargon for you.
BabyDoctor uses LLaVA (Large Language and Vision Assistant) to generate ultrasound analyses. LLaVA combines Meta's cutting-edge Llama 2 text generator with OpenAI's CLIP image encoder. The model was fine-tuned for ultrasound scans on a dataset of 65,000 text-image pairs, using a 4-bit quantised LoRA on a Lambda Labs A10 GPU for 8 hours.
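To give a rough sense of why a 4-bit quantised LoRA can be trained on a single GPU, the sketch below does the back-of-the-envelope arithmetic. The 7-billion-parameter base model, hidden size of 4096, 32 layers, LoRA rank of 64, and the choice of four adapted attention projections per layer are all assumptions for illustration; this README does not state them.

```python
# Back-of-the-envelope arithmetic for 4-bit quantised LoRA fine-tuning.
# Every model dimension below is ASSUMED for illustration; none of them
# are taken from the BabyDoctor training configuration.

def quantised_weight_gib(n_params: int, bits: int = 4) -> float:
    """Approximate size in GiB of the frozen base weights at a given bit width."""
    return n_params * bits / 8 / 2**30


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA adds two low-rank matrices per adapted weight:
    A with shape (rank, d_in) and B with shape (d_out, rank)."""
    return rank * (d_in + d_out)


BASE_PARAMS = 7_000_000_000  # assumed base model size
HIDDEN = 4096                # assumed hidden dimension
LAYERS = 32                  # assumed number of transformer layers
ADAPTED_PER_LAYER = 4        # assumed: query, key, value, output projections
RANK = 64                    # assumed LoRA rank

trainable = LAYERS * ADAPTED_PER_LAYER * lora_params(HIDDEN, HIDDEN, RANK)

print(f"frozen 4-bit weights: {quantised_weight_gib(BASE_PARAMS):.2f} GiB")
print(f"trainable LoRA parameters: {trainable:,}")
print(f"trainable fraction: {trainable / BASE_PARAMS:.2%}")
```

Under these assumptions the adapters are about 1% of the base model's parameter count, so gradients and optimiser state are only needed for that small slice. Activations and the vision encoder add to the real footprint and are not modelled here.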
<div align="center"> <a href="https://www.loom.com/share/54c1f5ed36f74914b689695dae9e8e20"> <p>Demo</p> </a> <a href="https://www.loom.com/share/54c1f5ed36f74914b689695dae9e8e20"> <img style="max-width:400px;" src="https://cdn.loom.com/sessions/thumbnails/54c1f5ed36f74914b689695dae9e8e20-with-play.gif"> </a> </div>
🚼 Reproduce
To reproduce the results with BabyDoctor, follow these steps on a system with at least 16 vCPUs, 32 GB of RAM, and an NVIDIA GPU with more than 12 GB of VRAM:
- Clone the repository: `git clone https://github.com/<username>/babydoctor.git`
- Install CUDA following the official NVIDIA setup instructions.
- Install Conda.
- Run `mkdir -p ~/git; cd ~/git`.
- Clone this repository into `~/git/BabyDoctor`.
- Run `conda env create -f BabyDoctor/llmenv.yaml`. This will take a while.
- Run `conda activate llmforbio`. From this step onward, execute all commands under this environment.
- Run `MAX_JOBS=8 python3 -m pip install flash-attn`.
- Download the dataset: `git clone https://github.com/razorx89/roco-dataset; pushd roco-dataset; python3 scripts/fetch.py; popd`.
- Prepare training data: `python3 BabyDoctor/scripts/massage_data.py`.
- Start fine-tuning: `mv BabyDoctor/finetune.sh .; bash finetune.sh`. This took 8 hours on the A10.
- Modify and run `./BabyDoctor/scripts/inference.sh` to prompt it!
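Before working through the steps above, it may help to confirm that the machine meets the stated requirements. The following is a minimal preflight sketch, not part of the BabyDoctor repository: the function names and thresholds are hypothetical, derived only from the requirements listed in this section, and the RAM check assumes a Linux system where `os.sysconf` exposes `SC_PHYS_PAGES`.

```python
# Hypothetical preflight check; NOT shipped with BabyDoctor.
# Thresholds mirror the requirements stated in this README.
import os
import shutil
import subprocess
from typing import List

MIN_CPUS = 16
MIN_RAM_GB = 32
MIN_VRAM_MIB = 12 * 1024  # the GPU must have MORE than this


def parse_vram_mib(nvidia_smi_output: str) -> List[int]:
    """Parse one integer MiB value per line, as printed by the query below."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def gpu_vram_mib() -> List[int]:
    """Total memory of each visible NVIDIA GPU in MiB, or [] if none is found."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True,
    )
    return parse_vram_mib(result.stdout) if result.returncode == 0 else []


def total_ram_gb() -> float:
    """Physical RAM in GB via POSIX sysconf (assumes Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 1e9


def check(cpus: int, ram_gb: float, vram_mib: List[int]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    if cpus < MIN_CPUS:
        problems.append(f"found {cpus} CPUs, want at least {MIN_CPUS}")
    if ram_gb < MIN_RAM_GB:
        problems.append(f"found {ram_gb:.0f} GB of RAM, want at least {MIN_RAM_GB}")
    if not vram_mib:
        problems.append("no NVIDIA GPU detected via nvidia-smi")
    elif max(vram_mib) <= MIN_VRAM_MIB:
        problems.append(f"largest GPU has {max(vram_mib)} MiB, want more than {MIN_VRAM_MIB}")
    return problems


def main() -> None:
    problems = check(os.cpu_count() or 0, total_ram_gb(), gpu_vram_mib())
    for problem in problems:
        print("WARNING:", problem)
    if not problems:
        print("All hardware checks passed.")


if __name__ == "__main__":
    main()
```

The script uses only the standard library, so it can be run with `python3` before the Conda environment exists. It only prints warnings and never blocks the later steps.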
A Web UI is available following instructions from the BabyDoctor repository.
<p align="center"> <img src="images/chat.png" width="500" height="450" alt="BabyDoctor"> </p>
🧪 Curious?
Try running BabyDoctor on your own ultrasound scans or experiment with different prompts. Let's see how well BabyDoctor can bridge the language gap between medical jargon and everyday English for you. You might be surprised!
And, of course, contributions to improve BabyDoctor are always welcome.
Check out these links for more details:
🤝 Contributing
We welcome contributions to BabyDoctor! If you have a feature request, bug report, or proposal, please submit an issue. If you wish to contribute code, please fork this repository and submit a pull request.
📜 License
BabyDoctor is subject to the licenses of Meta's Llama 2, OpenAI's CLIP, OpenAI's GPT-4 User License Agreement, and LLaVA. Our data, code, and checkpoints are intended and licensed for research use only.
Attribution is appreciated but not necessary:
@misc{photomz2023,
author = {Markus Zhang and Vir Chau},
title = {BabyDoctor},
year = {2023},
howpublished = {\url{https://github.com/photomz/BabyDoctor}},
note = {GitHub}
}
