HRGenie
An agentic HR automation system that generates personalized offer letters based on employee metadata, salary structure, and HR policy documents. Combines document parsing, intelligent chunking, vector-based retrieval, and LLM-powered generation via a simple chat interface.
🧠 Offer Letter Generator - Agentic System
This repository contains an Agentic system to generate offer letters for candidates based on salary breakup and HR policies applicable to their function, team, and salary band. Built with modular components, it integrates document parsing, embedding storage, RAG pipelines, LLM-based generation, Jinja2 fallbacks, and a Streamlit frontend.
🎯 Project Overview
Objective: Automate the creation of formal offer letters by combining company policies and candidate metadata through a retrieval-augmented generation (RAG) pipeline, with fallback templating for reliability.
✨ Key Features
- 📄 Document Parsing: intelligent chunking of HR PDFs and company policies.
- 🔍 Vector Embeddings: embedding generation and storage in the Qdrant vector database.
- 🤖 Contextual Retriever: retrieves relevant policy context based on candidate metadata.
- 🧠 LLM-based RAG Generation: uses GPT to generate personalized offer letters from the retrieved context.
- 🔧 Jinja2 Templating Fallback: provides a reliable fallback when LLM generation fails or is toggled off.
- ⚡ FastAPI Backend: serves the generation endpoints as modular APIs.
- 🌐 Streamlit Frontend: a user-friendly interface for HR users to generate letters.
- 📋 PDF Export: renders offer letters as downloadable PDFs with Unicode font support.
- 🔄 Toggle Modes: switches dynamically between GPT-based and template-based generation.
📁 Tech Stack Summary
- Backend: Python, FastAPI, Jinja2
- Frontend: Streamlit, HTML/CSS
- LLM Integration: OpenAI GPT
- Vector DB: Qdrant
- PDF Generation: WeasyPrint / ReportLab (Unicode support)
- Document Parsing: PyMuPDF, custom chunking logic
🗂️ Directory Structure
```text
project-root/
├── data/                              # 📦 All input/output data
│   ├── raw_pdfs/                      # HR policies and sample letters
│   │   ├── HR Leave Policy.pdf
│   │   ├── HR Travel Policy.pdf
│   │   └── HR Offer Letter.pdf
│   ├── docs_chunks/                   # Chunked JSONs from PDFs
│   ├── embeddings/                    # Embeddings (raw)
│   ├── qdrant_ready_embeddings/       # Qdrant-compatible embeddings
│   ├── employee_list.csv              # Source employee metadata
│   ├── employee_list.json             # Converted JSON
│   ├── wfo_policy.json                # Mapping of team to WFO policy
│   ├── generated_letters/             # Markdown/plaintext outputs
│   └── offer_letters/                 # Final offer letter PDFs
│
├── backend/                           # 🧠 Core logic + model pipeline
│   ├── ingest/                        # Chunking, embedding, upload
│   │   ├── chunk_and_embed.py
│   │   └── upload_qdrant.py
│   ├── retriever.py                   # Qdrant-based retriever
│   ├── generate_offer_letter.py       # RAG-based generation (LLM + retriever)
│   ├── generate_offer_withoutrag.py   # LLM-only generator (no retrieval)
│   └── generate_offer_letter_nollm.py # Jinja2 fallback generator
│
├── utils/                             # 🔧 Shared helpers/utilities
│   ├── load_employee_metadata.py
│   └── save_offer_letter_pdf.py
│
├── templates/                         # 📄 Jinja2 fallback templates
│   └── offer_template.txt
│
├── frontend/                          # 🎛️ UI layer
│   ├── app.py                         # Streamlit UI
│   └── static_ui/                     # (Optional) HTML-based UI
│       └── index.html
│
├── api/                               # 🌐 REST API
│   └── api_server.py                  # FastAPI backend server
│
├── logs/
│   └── chunking.log                   # Chunking log
│
├── requirements.txt                   # Python dependencies
└── README.md                          # Project documentation
```
Entire Workflow
```mermaid
flowchart LR
    %% Ingestion
    subgraph Ingestion_Indexing["📥 Ingestion & Indexing"]
        A["📝 Raw PDFs (HR Policies & Templates)"] --> B["🔪 Chunking <br>('unstructured' + heuristics)"]
        B --> C["📦 Chunks JSON <br>(docs_chunks/)"]
        C --> D["🧠 Embedding Generation <br>(text-embedding-3-small)"]
        D --> E["🧩 Embedding Vectors <br>(qdrant_ready_embeddings/)"]
        E --> F["📤 Qdrant Upload <br>('policy_chunks' collection)"]
    end

    %% Retrieval + Metadata
    subgraph Retrieval_Metadata["🔍 Retrieval + Metadata"]
        G["👤 User Query: 'Generate for X'"] --> H["📚 Retriever <br>(retrieve_relevant_chunks)"]
        H --> I["🏆 Top-k Chunks"]
        J["🗂️ employee_list.json"] --> K["📥 load_employee_metadata"]
        K --> L["👨‍💼 Employee Metadata Dict"]
    end

    %% Generation
    subgraph Generation_Pipeline["🧾 Offer Letter Generation"]
        subgraph RAG["🤖 RAG (GPT-4o)"]
            I --> M["📝 generate_offer_letter.py <br>(LLM + Context)"]
            L --> M
            M --> N{"✅ Success?"}
        end
        subgraph Fallback["📄 Fallback (Jinja2)"]
            L --> O["📄 generate_offer_letter_jinja.py"]
            I --> O
        end
        N -- "Yes" --> P["📃 Offer Letter Text"]
        N -- "No" --> O
        O --> P
    end

    %% API Layer
    subgraph API_Layer["🔌 API Interface"]
        P --> Q["🌐 FastAPI Endpoint <br>/generate-offer-letter"]
    end

    %% Frontend
    subgraph Frontends["🖥️ Frontend Interfaces"]
        Q --> R1["💬 Static Chat UI <br>(index.html)"]
        Q --> R2["🎛️ Streamlit App <br>(app.py)"]
    end

    %% Output
    subgraph Output["📤 Output"]
        P --> S["🖨️ PDF Export <br>(save_offer_letter_pdf)"]
        S --> T["📎 Downloadable PDF"]
        R2 --> U["🕘 Session History"]
    end

    %% Styling blocks for readability
    style Ingestion_Indexing fill:#f9f,stroke:#444,stroke-width:2px
    style Retrieval_Metadata fill:#bbf,stroke:#444,stroke-width:2px
    style Generation_Pipeline fill:#bfb,stroke:#444,stroke-width:2px
    style API_Layer fill:#ffd,stroke:#444,stroke-width:2px
    style Frontends fill:#fbf,stroke:#444,stroke-width:2px
    style Output fill:#ff9,stroke:#444,stroke-width:2px
```
🏗️ System Architecture
3.1 Document Ingestion & Chunking
- Tool: `unstructured.partition.pdf` with custom heuristics
- Custom Logic:
  - Detects Title and Table elements
  - Skips short captions before tables
  - Flushes and groups non-table elements into sections
  - Labels orphan text as "Untitled Section"
- Output:
  - JSON chunks with `section_title`, `type` (text/table), and `raw_text`, saved under `docs_chunks/`
  - Logged in `chunking.log`
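The grouping heuristics above can be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual `chunk_and_embed.py`: the `group_elements` helper and its caption-skip threshold are assumptions, while the element categories ("Title", "Table") follow `unstructured`'s element taxonomy.

```python
def group_elements(elements):
    """Group (category, text) pairs into titled sections; tables become their own chunks."""
    chunks, title, buffer = [], "Untitled Section", []

    def flush():
        # Emit the buffered text under the current section title.
        if buffer:
            chunks.append({"section_title": title, "type": "text",
                           "raw_text": "\n".join(buffer)})
            buffer.clear()

    for category, text in elements:
        text = (text or "").strip()
        if category == "Title":
            flush()                       # close the previous section
            title = text or "Untitled Section"
        elif category == "Table":
            flush()                       # a table is its own chunk
            chunks.append({"section_title": title, "type": "table", "raw_text": text})
        elif len(text) > 3:               # skip very short captions before tables
            buffer.append(text)
    flush()
    return chunks

def chunk_pdf(pdf_path):
    """Partition a PDF with `unstructured` and group its elements into chunks."""
    from unstructured.partition.pdf import partition_pdf  # imported lazily
    return group_elements((el.category, el.text or "") for el in partition_pdf(pdf_path))
```

`group_elements` is kept separate from the PDF parsing so the grouping logic can be unit-tested without any PDF files.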
3.2 Embedding & Vector Store
- Embedding Model: `text-embedding-3-small` (1536-dim)
- Embedding Script: located in `src/ingest`
- Process:
  - Reads JSON from `docs_chunks/`
  - Generates embeddings via the OpenAI API
  - Formats the payload with metadata
  - Writes the formatted data to `qdrant_ready_embeddings/`
- Qdrant Upload:
  - Connects to a local Docker-hosted Qdrant instance
  - Ensures the `policy_chunks` collection exists (COSINE distance)
  - Uploads `PointStruct` objects, each with a UUID5-based `id`, a 1536-dim `vector`, and an associated `payload`
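A condensed sketch of the embed-and-upload step, assuming the `openai` and `qdrant-client` Python packages. The model name, collection name, and 1536-dim COSINE configuration come from this README; the `chunk_point_id` helper and the local Qdrant URL are illustrative assumptions.

```python
import json
import pathlib
import uuid

def chunk_point_id(source_file: str, chunk_index: int) -> str:
    """Deterministic UUID5 id so re-uploading the same chunk overwrites, not duplicates."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{source_file}#{chunk_index}"))

def embed_and_upload(chunks_dir: str = "data/docs_chunks",
                     collection: str = "policy_chunks") -> None:
    from openai import OpenAI
    from qdrant_client import QdrantClient
    from qdrant_client.models import Distance, PointStruct, VectorParams

    oai = OpenAI()
    qdrant = QdrantClient(url="http://localhost:6333")  # Docker-hosted instance

    # Ensure the collection exists with the 1536-dim COSINE configuration.
    if not qdrant.collection_exists(collection):
        qdrant.create_collection(
            collection,
            vectors_config=VectorParams(size=1536, distance=Distance.COSINE))

    points = []
    for path in pathlib.Path(chunks_dir).glob("*.json"):
        for i, chunk in enumerate(json.loads(path.read_text())):
            vec = oai.embeddings.create(
                model="text-embedding-3-small",
                input=chunk["raw_text"]).data[0].embedding
            points.append(PointStruct(id=chunk_point_id(path.name, i),
                                      vector=vec, payload=chunk))
    qdrant.upsert(collection_name=collection, points=points)
```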
3.3 Retriever
- Function: `retrieve_relevant_chunks(query, top_k)`
- Process:
  - Computes the embedding of the input query
  - Searches Qdrant with `hnsw_ef=128`
  - Returns the top-k most relevant text chunks for RAG-based generation
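A minimal sketch of the retriever, assuming the same `openai` and `qdrant-client` packages as above. The `hnsw_ef=128` parameter, model, and collection name come from this README; `format_context` is an illustrative helper for assembling retrieved chunks into a prompt context.

```python
def retrieve_relevant_chunks(query: str, top_k: int = 5) -> list:
    """Embed the query and return the raw_text of the top-k matching policy chunks."""
    from openai import OpenAI
    from qdrant_client import QdrantClient
    from qdrant_client.models import SearchParams

    vec = OpenAI().embeddings.create(
        model="text-embedding-3-small", input=query).data[0].embedding
    hits = QdrantClient(url="http://localhost:6333").search(
        collection_name="policy_chunks",
        query_vector=vec,
        limit=top_k,
        search_params=SearchParams(hnsw_ef=128),  # search-time HNSW beam width
    )
    return [h.payload["raw_text"] for h in hits]

def format_context(chunks: list) -> str:
    """Join retrieved chunks into a single context block for the LLM prompt."""
    return "\n\n---\n\n".join(chunks)
```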
3.4 Employee Metadata Loader
- Utility: `load_employee_metadata(name)`
- Features:
  - Reads from `employee_list.json`
  - Normalizes fields to the keys `name`, `team`, `band`, `base_salary`, etc.
  - Raises descriptive errors if the file is missing or the employee is not found
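A sketch of the loader, assuming `employee_list.json` holds a list of records. The source column names in `FIELD_MAP` are illustrative stand-ins, since the real CSV headers are not shown here.

```python
import json
import pathlib

# Hypothetical source-column -> normalized-key mapping.
FIELD_MAP = {"Name": "name", "Team": "team", "Band": "band",
             "Base Salary": "base_salary"}

def load_employee_metadata(name: str, path: str = "data/employee_list.json") -> dict:
    """Look up one employee by (case-insensitive) name and normalize the field names."""
    file = pathlib.Path(path)
    if not file.exists():
        raise FileNotFoundError(f"Employee file not found: {path}")
    for rec in json.loads(file.read_text()):
        if rec.get("Name", "").strip().lower() == name.strip().lower():
            return {FIELD_MAP.get(k, k): v for k, v in rec.items()}
    raise KeyError(f"Employee not found: {name}")
```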
3.5 Offer Letter Generation
✅ Primary Generator: `generate_offer_letter.py`
- Loads employee metadata and retrieved policy chunks
- Constructs strict `system` and `user` prompts
- Calls `gpt-4o-mini` with a low temperature for deterministic output
- Verifies the presence of the candidate name and the letter length
- Falls back to Jinja2 templating if generation fails

🔄 No-RAG Variant: `generate_offer_withoutrag.py`
- Skips the retrieval stage
- Generates the letter from a metadata-only prompt
- Enforces strict formatting in the GPT prompt

🧰 Jinja2 Fallback: `generate_offer_letter_jinja(emp, chunks?)`
- Uses `offer_template.txt` from `templates/`
- Injects candidate metadata plus `TITLE_BY_TEAM` and `WFO_POLICY_BY_TEAM` lookups
- Applies a `comma` filter for formatting salary values
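The fallback path can be sketched with an inline template standing in for `offer_template.txt`. The lookup contents and template wording are illustrative, but the custom `comma` filter matches the salary formatting described above.

```python
from jinja2 import Environment

TITLE_BY_TEAM = {"Platform": "Software Engineer"}      # illustrative lookup
WFO_POLICY_BY_TEAM = {"Platform": "3 days in office"}  # illustrative lookup

env = Environment()
env.filters["comma"] = lambda n: f"{int(n):,}"  # 1200000 -> "1,200,000"

# Inline stand-in for templates/offer_template.txt.
TEMPLATE = env.from_string(
    "Dear {{ name }},\n"
    "We are pleased to offer you the role of {{ title }} ({{ team }} team).\n"
    "Base salary: INR {{ base_salary | comma }} per annum.\n"
    "Work-from-office policy: {{ wfo }}."
)

def render_offer(emp: dict) -> str:
    """Render the fallback offer letter from normalized employee metadata."""
    return TEMPLATE.render(
        name=emp["name"],
        team=emp["team"],
        title=TITLE_BY_TEAM.get(emp["team"], "Team Member"),
        wfo=WFO_POLICY_BY_TEAM.get(emp["team"], "Refer to HR policy"),
        base_salary=emp["base_salary"],
    )
```

Because this path never calls an LLM, it is deterministic and cheap, which is what makes it a safe fallback when GPT generation fails or is toggled off.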
3.6 API Layer
- Framework: FastAPI (`api_server.py`)
- Endpoints:
  - `GET /` – Health check
  - `POST /generate-offer-letter`
    - Accepts: `employee_name` and a `use_jinja` flag
    - Routes to the GPT or Jinja2 generator
    - Returns JSON with the generation `status`, the `source` (GPT or Jinja2), and the generated `letter_text`
- CORS enabled for frontend access
3.7 Frontend Interfaces
🌐 Static HTML Chatbot (`index.html`)
- Simple HTML + JS
- Sends generation requests to FastAPI (directly or via `ngrok`)

📺 Streamlit App (`app.py`)
- Toggle: GPT-based or template-based generation
- Input: dropdown or text box for employee selection
- Preview: compensation table preview
- Output: multi-line display of the generated letter
- Export: PDF download via `save_offer_letter_pdf`
- History: tracks session-based generation history
🚀 Deployment
- Backend:
  - Hosted on Render.com
  - Uses Docker + Uvicorn for the FastAPI server
- Frontend:
  - Static site hosted on Vercel
  - Streamlit app hosted on Hugging Face Spaces
🔄 Complete Workflow
1. Preprocessing & Chunking
```mermaid
graph LR
    A[📄 PDF Files in data/] --> B[🧱 Chunking with unstructured]
    B --> C[📦 JSON chunks in docs_chunks/]
    C --> D[🔢 OpenAI Embeddings]
    D --> E[📋 Qdrant-ready format]
```

Steps:
- 📄 Input PDFs →