
HRGenie

An agentic HR automation system that generates personalized offer letters based on employee metadata, salary structure, and HR policy documents. Combines document parsing, intelligent chunking, vector-based retrieval, and LLM-powered generation via a simple chat interface.

Install / Use

/learn @MadsDoodle/HRGenie

🧠 Offer Letter Generator - Agentic System

This repository contains an agentic system that generates offer letters for candidates from their salary breakup and the HR policies applicable to their function, team, and salary band. It is built from modular components: document parsing, embedding storage, a RAG pipeline, LLM-based generation, a Jinja2 fallback, and a Streamlit frontend.


🎯 Project Overview

Objective:
Automate creation of formal offer letters by combining company policies and candidate metadata through a retrieval-augmented generation (RAG) pipeline, with fallback templating for reliability.

System Architecture Video Drive Link


✨ Key Features

  • 📄 Document Parsing
    Intelligent chunking of HR PDFs and company policies.

  • 🔍 Vector Embeddings
    Embedding generation and storage using Qdrant vector database.

  • 🤖 Contextual Retriever
    Retrieves relevant policy context based on candidate metadata.

  • 🧠 LLM-based RAG Generation
    Uses GPT to generate personalized offer letters from the retrieved policy context.

  • 🔧 Jinja2 Templating Fallback
    Provides a reliable fallback when LLM generation fails or is toggled off.

  • FastAPI Backend
    Serves generation endpoints as modular APIs.

  • 🌐 Streamlit Frontend
    Includes a user-friendly interface for HR users to generate letters.

  • 📋 PDF Export
    Generates offer letters in downloadable PDF format with Unicode font support.

  • 🔄 Toggle Modes
    Allows switching between GPT-based and template-based generation dynamically.


📁 Tech Stack Summary

  • Backend: Python, FastAPI, Jinja2
  • Frontend: Streamlit, HTML/CSS
  • LLM Integration: OpenAI GPT
  • Vector DB: Qdrant
  • PDF Generation: WeasyPrint / ReportLab (Unicode support)
  • Document Parsing: PyMuPDF, custom chunking logic

🗂️ Directory Structure

project-root/
├── data/                         # 📦 All input/output data
│   ├── raw_pdfs/                 # HR policies and sample letters
│   │   ├── HR Leave Policy.pdf
│   │   ├── HR Travel Policy.pdf
│   │   └── HR Offer Letter.pdf
│   ├── docs_chunks/             # Chunked JSONs from PDFs
│   ├── embeddings/              # Embeddings (raw)
│   ├── qdrant_ready_embeddings/ # Qdrant-compatible embeddings
│   ├── employee_list.csv        # Source employee metadata
│   ├── employee_list.json       # Converted JSON
│   ├── wfo_policy.json          # Mapping of team to WFO policy
│   ├── generated_letters/       # Markdown/Plaintext outputs
│   └── offer_letters/           # Final offer letter PDFs
│
├── backend/                     # 🧠 Core logic + model pipeline
│   ├── ingest/                  # Chunking, embedding, upload
│   │   ├── chunk_and_embed.py
│   │   └── upload_qdrant.py
│   ├── retriever.py             # Qdrant-based retriever
│   ├── generate_offer_letter.py # RAG-based generation (LLM + retriever)
│   ├── generate_offer_withoutrag.py # LLM-only generator (no retrieval)
│   └── generate_offer_letter_nollm.py # Jinja2 fallback generator
│
├── utils/                       # 🔧 Shared helpers/utilities
│   ├── load_employee_metadata.py
│   └── save_offer_letter_pdf.py
│
├── templates/                   # 📄 Jinja2 fallback templates
│   └── offer_template.txt
│
├── frontend/                    # 🎛️ UI layer
│   ├── app.py                   # Streamlit UI
│   └── static_ui/               # (Optional) HTML-based UI
│       └── index.html
│
├── api/                         # 🌐 REST API
│   └── api_server.py            # FastAPI backend server
│
├── logs/
│   └── chunking.log             # Chunking log
│
├── requirements.txt             # Python dependencies
└── README.md                    # Project documentation


Entire Workflow

flowchart LR
    %% Ingestion
    subgraph Ingestion_Indexing["📥 Ingestion & Indexing"]
        A["📝 Raw PDFs (HR Policies & Templates)"] --> B["🔪 Chunking <br>('unstructured' + heuristics)"]
        B --> C["📦 Chunks JSON <br>(docs_chunks/)"]
        C --> D["🧠 Embedding Generation <br>(text-embedding-3-small)"]
        D --> E["🧩 Embedding Vectors <br>(qdrant_ready_embeddings/)"]
        E --> F["📤 Qdrant Upload <br>('policy_chunks' collection)"]
    end

    %% Retrieval + Metadata
    subgraph Retrieval_Metadata["🔍 Retrieval + Metadata"]
        G["👤 User Query: 'Generate for X'"] --> H["📚 Retriever <br>(retrieve_relevant_chunks)"]
        H --> I["🏆 Top-k Chunks"]
        J["🗂️ employee_list.json"] --> K["📥 load_employee_metadata"]
        K --> L["👨‍💼 Employee Metadata Dict"]
    end

    %% Generation
    subgraph Generation_Pipeline["🧾 Offer Letter Generation"]
        subgraph RAG["🤖 RAG (GPT-4o)"]
            I --> M["📝 generate_offer_letter.py <br>(LLM + Context)"]
            L --> M
            M --> N{"✅ Success?"}
        end
        subgraph Fallback["📄 Fallback (Jinja2)"]
            L --> O["📄 generate_offer_letter_nollm.py <br>(generate_offer_letter_jinja)"]
            I --> O
        end
        N -- "Yes" --> P["📃 Offer Letter Text"]
        N -- "No" --> O
        O --> P
    end

    %% API Layer
    subgraph API_Layer["🔌 API Interface"]
        P --> Q["🌐 FastAPI Endpoint <br>/generate-offer-letter"]
    end

    %% Frontend
    subgraph Frontends["🖥️ Frontend Interfaces"]
        Q --> R1["💬 Static Chat UI <br>(index.html)"]
        Q --> R2["🎛️ Streamlit App <br>(app.py)"]
    end

    %% Output
    subgraph Output["📤 Output"]
        P --> S["🖨️ PDF Export <br>(save_offer_letter_pdf)"]
        S --> T["📎 Downloadable PDF"]
        R2 --> U["🕘 Session History"]
    end

    %% Styling blocks for readability
    style Ingestion_Indexing fill:#f9f,stroke:#444,stroke-width:2px
    style Retrieval_Metadata fill:#bbf,stroke:#444,stroke-width:2px
    style Generation_Pipeline fill:#bfb,stroke:#444,stroke-width:2px
    style API_Layer fill:#ffd,stroke:#444,stroke-width:2px
    style Frontends fill:#fbf,stroke:#444,stroke-width:2px
    style Output fill:#ff9,stroke:#444,stroke-width:2px

🏗️ System Architecture

3.1 Document Ingestion & Chunking

  • Tool: unstructured.partition.pdf with custom heuristics

  • Custom Logic:

    • Detect Title and Table elements
    • Skip short captions before tables
    • Flush and group non-table elements into sections
    • Label orphan text as "Untitled Section"
  • Output:

    • JSON chunks with section_title, type (text/table), and raw_text
    • Saved under docs_chunks/
    • Logged in chunking.log
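
The grouping heuristics above can be sketched as follows. This is a minimal illustration, not the repository's code: it works on plain `(category, text)` pairs whose category names mirror those used by `unstructured` ("Title", "Table", "NarrativeText"), and the function name `group_elements` and the `MIN_CAPTION_WORDS` threshold are assumptions.

```python
from typing import Dict, List, Tuple

MIN_CAPTION_WORDS = 4  # assumed threshold for what counts as a "short caption"


def group_elements(elements: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Group (category, text) pairs into text-section and table chunks."""
    chunks: List[Dict[str, str]] = []
    title = "Untitled Section"  # label for text that appears before any title
    buffer: List[str] = []

    def flush() -> None:
        # Emit buffered non-table text as one chunk under the current title.
        if buffer:
            chunks.append({"section_title": title, "type": "text",
                           "raw_text": "\n".join(buffer)})
            buffer.clear()

    for i, (category, text) in enumerate(elements):
        next_is_table = i + 1 < len(elements) and elements[i + 1][0] == "Table"
        if category == "Title":
            flush()          # close the previous section before starting a new one
            title = text
        elif category == "Table":
            flush()
            chunks.append({"section_title": title, "type": "table",
                           "raw_text": text})
        elif next_is_table and len(text.split()) < MIN_CAPTION_WORDS:
            continue         # skip a short caption directly before a table
        else:
            buffer.append(text)
    flush()
    return chunks
```

Each returned dict carries the three fields named above (`section_title`, `type`, `raw_text`), so the list can be written straight to a JSON file under docs_chunks/.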

3.2 Embedding & Vector Store

  • Embeddings Model: text-embedding-3-small (1536-dim)

  • Embedding Script: Located in backend/ingest/ (chunk_and_embed.py)

    Process:

    • Reads JSON from docs_chunks/
    • Generates embeddings using OpenAI API
    • Formats payload with metadata
    • Writes formatted data to qdrant_ready_embeddings/
  • Qdrant Upload:

    • Connects to local Docker-hosted Qdrant instance
    • Ensures collection policy_chunks exists (COSINE distance)
    • Uploads as PointStruct objects with:
      • UUID5-based id
      • 1536-dim vector
      • Associated payload
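
A UUID5-based id makes uploads idempotent: the same chunk always maps to the same point id, so re-running the upload overwrites points instead of duplicating them. The sketch below shows one way to build such points as plain dicts with the same three fields (id, vector, payload). The helper name `make_point` and the key scheme (source file name plus chunk index) are assumptions, not the repository's actual scheme.

```python
import uuid
from typing import Any, Dict, List

VECTOR_DIM = 1536  # dimension of text-embedding-3-small vectors


def make_point(source: str, index: int, vector: List[float],
               payload: Dict[str, Any]) -> Dict[str, Any]:
    """Build a Qdrant-style point dict with a deterministic UUID5 id."""
    if len(vector) != VECTOR_DIM:
        raise ValueError(f"expected {VECTOR_DIM} dims, got {len(vector)}")
    # Hypothetical key scheme: the same (source, index) pair always
    # produces the same UUID, so repeated uploads stay idempotent.
    point_id = str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source}#{index}"))
    return {"id": point_id, "vector": vector, "payload": payload}
```

With the Qdrant Python client, these fields would be passed to `PointStruct(id=..., vector=..., payload=...)` before upserting into the policy_chunks collection.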

3.3 Retriever

  • Function: retrieve_relevant_chunks(query, top_k)

    Process:

    • Computes embedding of input query
    • Searches Qdrant with hnsw_ef=128
    • Returns top-k most relevant text chunks for RAG-based generation
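
To make the ranking step concrete, here is a brute-force stand-in for what the vector search returns: score every stored vector by cosine similarity against the query vector and keep the best `top_k`. Qdrant itself uses an approximate HNSW index rather than this exhaustive scan (`hnsw_ef` controls how widely that index is searched), and the function names here are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec: Sequence[float],
                 indexed: List[Tuple[str, Sequence[float]]],
                 top_k: int = 3) -> List[str]:
    """Return the texts of the top_k chunks most similar to query_vec."""
    ranked = sorted(indexed,
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

In the real retriever the query vector would come from the same embedding model used at ingestion time, since similarity scores are only meaningful within one embedding space.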

3.4 Employee Metadata Loader

  • Utility: load_employee_metadata(name)

    Features:

    • Reads from employee_list.json
    • Normalizes fields to keys:
      • name, team, band, base_salary, etc.
    • Raises descriptive errors if:
      • File is missing
      • Employee is not found
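
A loader with this behavior might look like the sketch below. The source column names in `FIELD_MAP`, the default path, and the case-insensitive name match are assumptions made for illustration. The original only specifies the normalized keys and the two error cases.

```python
import json
from pathlib import Path
from typing import Any, Dict

# Hypothetical mapping from source column names to normalized keys.
FIELD_MAP = {
    "Employee Name": "name",
    "Team": "team",
    "Band": "band",
    "Base Salary": "base_salary",
}


def load_employee_metadata(name: str,
                           path: str = "data/employee_list.json") -> Dict[str, Any]:
    """Return the normalized record for `name`, or raise a descriptive error."""
    file = Path(path)
    if not file.exists():
        raise FileNotFoundError(f"Employee list not found at {file}")
    records = json.loads(file.read_text(encoding="utf-8"))
    for record in records:
        # Rename known columns; unknown columns pass through unchanged.
        normalized = {FIELD_MAP.get(key, key): value for key, value in record.items()}
        if str(normalized.get("name", "")).strip().lower() == name.strip().lower():
            return normalized
    raise ValueError(f"Employee '{name}' not found in {file}")
```

Raising distinct exception types for the two failure modes lets the API layer report a missing data file differently from an unknown employee.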

3.5 Offer Letter Generation

✅ Primary Generator: generate_offer_letter.py

  • Loads employee metadata and retrieved policy chunks
  • Constructs strict system and user prompts
  • Calls gpt-4o-mini with low temperature for deterministic output
  • Verifies that the output contains the candidate name and checks the letter length
  • Falls back to Jinja2 templating if generation fails
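
The validate-then-fall-back control flow can be sketched independently of any LLM client by passing the two generators in as callables. The `MIN_LETTER_CHARS` threshold and the function names below are assumptions. The original states only that the candidate name and the letter length are checked.

```python
from typing import Any, Callable, Dict, Tuple

MIN_LETTER_CHARS = 200  # assumed minimum length for a plausible letter


def is_valid_letter(text: str, candidate_name: str) -> bool:
    """Check that the letter mentions the candidate and is not truncated."""
    return candidate_name in text and len(text) >= MIN_LETTER_CHARS


def generate_with_fallback(
    emp: Dict[str, Any],
    llm_generate: Callable[[Dict[str, Any]], str],
    template_generate: Callable[[Dict[str, Any]], str],
) -> Tuple[str, str]:
    """Return (letter_text, source), where source is "GPT" or "Jinja2"."""
    try:
        letter = llm_generate(emp)
        if is_valid_letter(letter, emp["name"]):
            return letter, "GPT"
    except Exception:
        pass  # any LLM or API failure routes to the template path
    return template_generate(emp), "Jinja2"
```

Treating a failed validation the same way as a raised exception keeps the caller's contract simple: it always receives a letter plus a label saying which path produced it.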

🔄 No-RAG Variant: generate_offer_withoutrag.py

  • Skips retrieval stage
  • Generates letter using metadata-only prompt
  • Enforces strict formatting in GPT prompt

🧰 Jinja2 Fallback: generate_offer_letter_jinja(emp, chunks?)

  • Uses offer_template.txt from templates/
  • Injects:
    • Candidate metadata
    • TITLE_BY_TEAM and WFO_POLICY_BY_TEAM lookups
  • Applies comma filter for formatting salary values
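
As an illustration of the pieces listed above, the sketch below defines a `comma` filter as a plain function and builds the render context from the two lookups. The lookup contents, the default values, and the `build_context` helper are hypothetical. Only the names TITLE_BY_TEAM, WFO_POLICY_BY_TEAM, and the comma filter come from the description.

```python
from typing import Any, Dict

# Hypothetical lookup contents, for illustration only.
TITLE_BY_TEAM = {"Engineering": "Software Engineer", "Sales": "Account Executive"}
WFO_POLICY_BY_TEAM = {"Engineering": "3 days per week in office",
                      "Sales": "5 days per week in office"}


def comma(value: float) -> str:
    """Format a number with thousands separators and no decimals."""
    return f"{value:,.0f}"


def build_context(emp: Dict[str, Any]) -> Dict[str, Any]:
    """Merge candidate metadata with team-based lookups for the template."""
    team = emp.get("team", "")
    return {
        **emp,
        "title": TITLE_BY_TEAM.get(team, "Associate"),                       # assumed default
        "wfo_policy": WFO_POLICY_BY_TEAM.get(team, "As per company policy"),  # assumed default
        "base_salary_fmt": comma(emp["base_salary"]),
    }
```

In Jinja2, a function like this is registered with `env.filters["comma"] = comma`, after which a template can write `{{ base_salary | comma }}` instead of pre-formatting the value in Python.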

3.6 API Layer

  • Framework: FastAPI (api_server.py)

    Endpoints:

    • GET / – Health check
    • POST /generate-offer-letter
      • Accepts: employee_name, use_jinja flag
      • Routes to GPT or Jinja2 generator
      • Returns JSON with:
        • Generation status
        • source (GPT or Jinja2)
        • Generated letter_text
  • CORS enabled for frontend access
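
The routing behind POST /generate-offer-letter can be sketched as a framework-free handler. The `status` key and its values, the `detail` field, and the function name `handle_generate` are assumptions. `source` and `letter_text` follow the response fields listed above.

```python
from typing import Callable, Dict


def handle_generate(
    employee_name: str,
    use_jinja: bool,
    gpt_generate: Callable[[str], str],
    jinja_generate: Callable[[str], str],
) -> Dict[str, str]:
    """Route the request to one generator and wrap the result as a JSON-ready dict."""
    source = "Jinja2" if use_jinja else "GPT"
    generator = jinja_generate if use_jinja else gpt_generate
    try:
        letter = generator(employee_name)
    except Exception as exc:
        # Report the failure in the body instead of letting it propagate.
        return {"status": "error", "source": source,
                "letter_text": "", "detail": str(exc)}
    return {"status": "success", "source": source, "letter_text": letter}
```

Keeping this logic separate from the web framework means the FastAPI route only has to parse the request body and return the dict, which also makes the routing easy to unit test.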


3.7 Frontend Interfaces

🌐 Static HTML Chatbot (index.html)

  • Simple HTML + JS
  • Sends generation requests to FastAPI (directly or via ngrok)

📺 Streamlit App (app.py)

  • Toggle: GPT-based or template-based generation
  • Input: Dropdown or text box for employee selection
  • Preview: Compensation table preview
  • Output: Multi-line display of generated letter
  • Export: PDF download via save_offer_letter_pdf
  • Tracks session-based generation history

🚀 Deployment

  • Backend:

    • Hosted on Render.com
    • Uses Docker + Uvicorn for FastAPI server
  • Frontend:

    • Static site hosted on Vercel
    • Streamlit app hosted on Hugging Face Spaces

🔄 Complete Workflow


1. Preprocessing & Chunking

graph LR
    A[📄 PDF Files in data/raw_pdfs/] --> B[🧱 Chunking with unstructured]
    B --> C[📦 JSON chunks in docs_chunks/]
    C --> D[🔢 OpenAI Embeddings]
    D --> E[📋 Qdrant-ready format]

Steps:

  • 📄 Input PDFs →
