# HealHub: Voice-Based Healthcare Q&A Application

A voice-based search and Q&A app for healthcare, aimed at delivering accurate, culturally relevant health information in India with strong safety and ethical standards.
## Abstract
A voice-based search and Q&A system for disseminating reliable healthcare information in Indian languages, leveraging Sarvam-M through advanced prompt engineering for accuracy and safety.
## Overview
This project, developed for DA 225o Deep Learning (Summer 2025), aims to provide accessible healthcare information to diverse Indian users through a voice interface, ensuring cultural relevance and safety.
## Project Motivation
Addresses the gap in reliable, language-accessible healthcare information in India, improving health literacy and awareness. Key challenges in the Indian healthcare landscape include a skewed doctor-patient ratio, limited access to qualified medical advice in remote and rural areas, and significant linguistic diversity. This project specifically aims to mitigate these by:
- Providing initial healthcare information and guidance in multiple Indian languages, reducing language as a barrier.
- Offering a first point of contact for common health queries, potentially reducing the load on overwhelmed healthcare professionals for non-critical issues.
- Improving health literacy by making information more understandable and accessible, particularly for users who might rely more on oral communication or have difficulty with text-based resources.
## Key Features
- Multi-language voice input/output (10 Indian languages).
- AI-driven responses for general healthcare questions via prompt-engineered Sarvam-M.
- Safety guardrails with disclaimers and emergency redirection.
- Interactive Symptom Checker with Preliminary Triage Advice.
## Technologies Used

- Sarvam AI Platform: Utilized for its comprehensive suite of AI services for Indian languages, including:
  - Speech-to-Text (STT): Converts the user's voice input in various Indian languages into text (leveraging Sarvam AI's Saarika v2 STT or similar models).
  - Text-to-Speech (TTS): Synthesizes voice output from the generated text responses in a natural-sounding Indian voice.
  - Large Language Model (Sarvam-M): Performs Natural Language Understanding (NLU) to interpret user queries and generates responses through prompt engineering. Sarvam-M's strengths in handling Indian languages and producing contextually relevant, conversational text underpin the application's core logic, including symptom-assessment summaries and general health information.
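As a rough illustration of how the application might assemble a call to a hosted STT service, the sketch below builds a request dictionary. The endpoint path, header name, and payload fields here are assumptions for illustration only, not the official Sarvam API specification:

```python
import os

# Hypothetical base URL and request shape -- check the Sarvam AI docs for
# the actual endpoint, header, and payload field names.
SARVAM_BASE_URL = "https://api.sarvam.ai"

def build_stt_request(audio_b64: str, language_code: str = "hi-IN") -> dict:
    """Assemble a speech-to-text request for an Indian-language utterance.

    `audio_b64` is the base64-encoded audio; `language_code` follows the
    BCP-47 style codes commonly used for Indian languages (e.g. "hi-IN").
    """
    return {
        "url": f"{SARVAM_BASE_URL}/speech-to-text",
        "headers": {"api-subscription-key": os.environ.get("SARVAM_API_KEY", "")},
        "json": {"audio": audio_b64, "language_code": language_code},
    }
```

The returned dictionary can be passed to an HTTP client (e.g. `requests.post(**req)`) once the real endpoint details are confirmed.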
## Application Flow

The following diagram illustrates the workflow of the application:

```mermaid
graph TD
    A[User Voice Input] --> B[STT Engine]
    B --> C[Text Query]
    C --> D[Sarvam-M: NLU]
    subgraph Symptom Checker Flow
        direction LR
        D -->|Intent: SYMPTOM_QUERY| SC1[Initialize SymptomChecker]
        SC1 --> SC2{Has Follow-up Questions?}
        SC2 -- Yes --> SC3[Ask Follow-up Question]
        SC3 --> SC4[User Voice Answer]
        SC4 --> SC5[STT for Answer]
        SC5 --> SC6[Record Answer in SymptomChecker]
        SC6 --> SC2
        SC2 -- No --> SC7["Generate Preliminary Assessment (Sarvam-M + KB Triage Points)"]
        SC7 --> AssessmentText[Assessment Text]
    end
    subgraph Standard Query Flow
        direction LR
        D -->|Other Intents| F[Sarvam-M: Answer Generation via Prompt Engineering]
        F --> StandardText[Standard Answer Text]
    end
    AssessmentText --> G[Safety Layer]
    StandardText --> G
    G -->|Validate/Redirect| H[TTS Engine]
    H --> I[Voice Output with Disclaimer]
```
## System Architecture Overview

The application integrates several key components to deliver a voice-based healthcare Q&A experience:

- Voice Interface (STT/TTS): The user interacts via voice. Sarvam AI services handle speech-to-text conversion of the user's query and text-to-speech for delivering the system's response.
- NLU Processor (`nlu_processor.py`): The transcribed text query is processed by Sarvam-M to identify the user's intent (e.g., asking about a disease, describing symptoms) and extract relevant medical entities (symptoms, diseases, etc.).
- Core Logic Orchestration (`main.py`): This script orchestrates the overall flow. Based on the NLU output, it decides whether to invoke the Symptom Checker or the standard prompt-based Q&A flow.
- Symptom Checker (`symptom_checker.py`):
  - If activated, this module manages an interactive dialogue to gather more details about the user's symptoms, using predefined questions from `symptom_knowledge_base.json`.
  - It then compiles this information and uses Sarvam-M to generate a preliminary assessment, which is further augmented by rule-based triage points from the local knowledge base.
- Response Generation (Standard Queries, `response_generator.py`):
  - For non-symptom-related health queries, `response_generator.py` constructs a detailed prompt from the user's query and the NLU output.
  - This prompt is sent to Sarvam-M, which generates an informed answer based on its general knowledge and the guidance in the system prompt (see `src/prompts.py`). This process relies on effective prompt engineering rather than external knowledge-base retrieval for general queries.
- Safety Layer: All generated responses (from the Symptom Checker or the standard query flow) pass through a safety layer, which includes hardcoded checks for emergencies or diagnosis requests and ensures appropriate disclaimers are appended.
- Knowledge Bases:
  - `symptom_knowledge_base.json`: A structured JSON file defining symptoms, keywords, follow-up questions, and basic triage points for the Symptom Checker.
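To make the knowledge-base role concrete, here is a sketch of what an entry and a keyword lookup might look like. The field names and entry shape below are assumptions for illustration; the project's actual `symptom_knowledge_base.json` schema may differ:

```python
# Hypothetical entry shape for symptom_knowledge_base.json -- the real
# file's schema may differ; field names here are assumptions.
SAMPLE_KB = {
    "fever": {
        "keywords": ["fever", "bukhar", "temperature"],
        "follow_up_questions": [
            "How many days have you had the fever?",
            "Is the fever accompanied by chills?",
        ],
        "basic_triage_points": [
            "Seek medical care if the fever lasts more than 3 days.",
        ],
    }
}


def match_symptoms(text: str, kb: dict) -> list:
    """Return names of KB symptoms whose keywords appear in the user's text."""
    lowered = text.lower()
    return [name for name, entry in kb.items()
            if any(kw in lowered for kw in entry["keywords"])]
```

Note that a simple substring match like this also catches Hinglish keywords ("bukhar") without any extra machinery, which is one reason a keyword-per-symptom layout is convenient for this kind of checker.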
## Symptom Checker and Triage

The application includes an interactive symptom checker to help users understand potential implications of their symptoms and receive general guidance.

How it works:

1. Activation: If the NLU module identifies a user's query as relating to symptoms (e.g., "I have a fever and a cough"), the Symptom Checker is activated.
2. Interactive Q&A: The checker may ask a series of follow-up questions based on the initially reported symptoms. These questions are drawn from the `symptom_knowledge_base.json` file. This step is interactive, requiring further voice input from the user for each question.
3. Preliminary Assessment: Once sufficient information is gathered, the Symptom Checker generates a preliminary assessment by:
   - Sending the collected symptom details (initial query plus answers to follow-ups) to the Sarvam-M model for a summarized interpretation and suggested next steps.
   - Augmenting this with relevant `basic_triage_points` from `symptom_knowledge_base.json`.
4. Output: The user receives this assessment, which includes a summary, suggested severity, recommended general next steps, potential warnings, and relevant triage points from the knowledge base.

Important Disclaimer: The information provided by the symptom checker is for general guidance only and is not a medical diagnosis. Users are always advised to consult a qualified healthcare professional for any health concerns or before making any decisions related to their health. This disclaimer is consistently provided with any assessment.
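A safety layer like the one described above might be sketched as follows. The keyword list, disclaimer wording, and function name are illustrative assumptions, not the project's actual rules:

```python
# Illustrative emergency keywords and disclaimer text -- the real safety
# layer's checks and wording are defined by the project, not shown here.
EMERGENCY_KEYWORDS = ["chest pain", "unconscious", "severe bleeding", "can't breathe"]

DISCLAIMER = ("This is general guidance only, not a medical diagnosis. "
              "Please consult a qualified healthcare professional.")


def apply_safety_layer(response_text: str, user_query: str) -> str:
    """Redirect likely emergencies and append the standing disclaimer."""
    if any(kw in user_query.lower() for kw in EMERGENCY_KEYWORDS):
        return ("This may be a medical emergency. Please contact your local "
                "emergency number (e.g. 112 in India) immediately. " + DISCLAIMER)
    return response_text + " " + DISCLAIMER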
## Project Structure

- `main.py`: Main application script to run the voice-based Q&A.
- `src/`: Contains the core application logic.
  - `nlu_processor.py`: Handles Natural Language Understanding using Sarvam-M.
  - `nlu_config.json`: Configuration for intent detection and entity extraction.
  - `hinglish_symptoms.json`: Hinglish symptom mappings for hybrid language support.
  - `common_misspellings.json`: Common-misspellings dictionary for text normalization.
  - `prompts.py`: Defines the system prompt used by Sarvam-M for response generation.
  - `response_generator.py`: Generates responses for standard queries using prompt engineering with Sarvam-M, guided by NLU output.
  - `symptom_checker.py`: Module for interactive symptom analysis and assessment generation.
  - `symptom_knowledge_base.json`: Configuration file for symptoms, keywords, and follow-up questions.
  - `audio_capture.py`: (Placeholder/Actual) For audio input and STT integration.
  - `tts_service.py`: (Placeholder/Actual) For Text-to-Speech integration.
  - `utils.py`: Utility/helper functions used across modules.
- `tests/`: Unit and evaluation tests for various components.
  - `test_nlu_corrections.py`: Tests for NLU correction logic and normalization.
  - `test_nlu_hinglish.py`: Tests Hinglish input parsing and understanding.
  - `test_evaluation.py`: Evaluates overall system outputs against expected responses.
  - `evaluation_results_metrics.json`: JSON log of evaluation metrics and results.
- `.env`: Stores API keys and other environment variables (not tracked by Git).
- `requirements.txt`: Lists project dependencies.
- `README.md`: This file.
## Setup and Usage

### Prerequisites

1. Ensure Python 3.10+ is installed.
2. Clone the repository:
   ```shell
   git clone <repository-url>
   ```
3. Navigate to the project directory:
   ```shell
   cd <repository-name>
   ```
4. Create a Python virtual environment (recommended):
   ```shell
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```
5. Install dependencies:
   ```shell
   pip install -r requirements.txt
   ```
6. Create a `.env` file in the project root. You can copy from `.env.example` if provided, or create it manually. Add your Sarvam API key (obtainable from the Sarvam AI dashboard):
   ```
   SARVAM_API_KEY="your_actual_api_key_here"
   ```
7. Create a `.streamlit/secrets.toml` file in the project root by copying the contents of your Firebase service account JSON (downloaded from the Firebase Console) under a `[FIREBASE]` section.
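Once the `.env` file exists, the key can be read at runtime. Projects commonly use the python-dotenv package for this; the stdlib-only sketch below shows the idea (function name and parsing rules are illustrative, and real `.env` parsers handle more edge cases):

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Minimal .env loader: reads KEY=value or KEY="value" lines into
    os.environ, skipping blanks and # comments. Stdlib-only sketch --
    python-dotenv is the usual choice in practice."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ[key.strip()] = value.strip().strip('"')
```

After calling `load_env()`, the application can pick up the key with `os.environ["SARVAM_API_KEY"]`.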
## Running the Application (Streamlit UI)

The primary way to interact with the application is through the Streamlit UI.