Fojin

Buddhist Digital Text Platform: 10,500+ texts, 503 sources, 9 UI languages, AI Q&A (RAG), knowledge graph, full-text search

<div align="center">

FoJin 佛津

The World's Encyclopedic Buddhist Digital Text Platform

503 sources. 30 languages. 30 countries. 23,500+ full-text volumes. One search.

FoJin aggregates the world's Buddhist digital heritage: 10,500+ texts, with 23,500+ volumes of full content in Pali, Classical Chinese, Tibetan, and Sanskrit, drawn from 503 data sources. On top of that corpus it offers CBETA-style reading, AI-powered Q&A (RAG, reranking, citations, and data source recommendations), a knowledge graph with 31K+ entities and 28K+ relations (including 23K teacher-student lineage chains), 31 dictionaries with 679K entries across 6 languages, timeline visualization, collections, citation export, annotations, bookmarks, and multi-language parallel reading.

Live Demo  ·  API Docs  ·  Chinese Docs (中文文档)  ·  Discussions  ·  Discord  ·  Report Bug

</div>

Why FoJin?

Buddhist texts are scattered across hundreds of databases worldwide — CBETA, SuttaCentral, BDRC, SAT, 84000, GRETIL, and many more. Each has different interfaces, languages, and data formats. Researchers spend more time finding texts than reading them.

FoJin solves this. It aggregates 503 sources into a single, searchable platform with features no other tool provides:

| What you need | How FoJin helps |
|---|---|
| Find a sutra across databases | Multi-dimensional search across 10,500+ texts from 503 sources |
| Read the full text online | 8,900+ texts with 23,500+ volumes of full content, CBETA-style layout |
| Compare translations | Parallel reading in 30 languages side by side |
| Look up Buddhist terms | 31 dictionaries, 679K entries (Chinese/Sanskrit/Pali/Tibetan/English) |
| Explore relationships | Knowledge graph with 31K+ entities and 28K+ relations (23K lineage chains) |
| Discover similar texts | Semantic similarity powered by 678K+ embedding vectors (pgvector + HNSW) |
| View original manuscripts | IIIF manuscript viewer connected to BDRC and more |
| Ask questions about texts | AI Q&A ("XiaoJin") with RAG, reranking, clickable citations, and follow-up suggestions |
| Explore history visually | Timeline & Dashboard: dynasty charts, translation trends, category analytics |
| Save and organize | Collections, bookmarks, annotations for personal study |
| Cite in research | Citation export (BibTeX, RIS, APA) for academic use |

Quick Start

```bash
git clone https://github.com/xr843/fojin.git
cd fojin
cp .env.example .env        # edit POSTGRES_PASSWORD before starting
docker compose up -d         # database migrations run automatically
```

Then visit: http://localhost:3000

API docs at http://localhost:8000/docs

After first startup, the platform has the database schema and source metadata but no text content. To import texts from public data sources:

```bash
# Import CBETA catalog (auto-scans local xml-p5 directory or fetches from remote)
docker exec fojin-backend python scripts/import_catalog.py

# Import CBETA full text content (requires xml-p5 repository)
docker exec fojin-backend python scripts/import_content.py --all --xml-dir /data/xml-p5

# Generate embeddings for AI Q&A (supports incremental processing)
docker exec fojin-backend python -m scripts.generate_embeddings --source cbeta

# Import SuttaCentral Early Buddhist Texts
docker exec fojin-backend python scripts/import_suttacentral.py

# See all available importers
ls backend/scripts/import_*.py
```

Each importer downloads data directly from the original source (CBETA, SuttaCentral, etc.) — no data is bundled in this repository.

Features

Multi-Dimensional Search

Search across Buddhist canons by title, translator, catalog number, or full-text keyword. Powered by Elasticsearch with ICU tokenizer for multi-language support.
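
As a rough illustration of how an ICU-tokenized index and a multi-field query could fit together, here is a minimal sketch. It assumes the Elasticsearch `analysis-icu` plugin is installed; the analyzer name `icu_text`, the field names, and the boost value are hypothetical and are not taken from FoJin's actual schema.

```python
import json

# Hypothetical index body: field names and the analyzer name are illustrative only.
INDEX_BODY = {
    "settings": {
        "analysis": {
            "analyzer": {
                "icu_text": {
                    "type": "custom",
                    "tokenizer": "icu_tokenizer",  # provided by the analysis-icu plugin
                    "filter": ["icu_folding"],     # Unicode-aware case/diacritic folding
                }
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "icu_text"},
            "translator": {"type": "text", "analyzer": "icu_text"},
            "catalog_number": {"type": "keyword"},  # exact match, not tokenized
            "content": {"type": "text", "analyzer": "icu_text"},
        }
    },
}


def build_search_query(keyword: str) -> dict:
    """Match the keyword against text fields, or exactly against the catalog number."""
    return {
        "query": {
            "bool": {
                "should": [
                    {
                        "multi_match": {
                            "query": keyword,
                            "fields": ["title^3", "translator", "content"],
                        }
                    },
                    {"term": {"catalog_number": keyword}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_search_query("華嚴經"), ensure_ascii=False, indent=2))
```

Keeping the catalog number as a `keyword` field means an identifier is matched exactly rather than being split by the tokenizer, while titles and body text go through ICU segmentation.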

<p align="center"><img src="./docs/screenshots/search.png" alt="Search results for Avatamsaka Sutra" width="800"></p>

Full-Text Reading

Read 8,900+ Buddhist texts with 23,500+ volumes of full content online. CBETA-style typography with intelligent verse/prose detection, paragraph reflow, and adjustable font size. Navigate by volume, scroll through content, and jump between related texts.

Parallel Reading (30 Languages)

Compare translations side by side: Classical Chinese, Sanskrit, Pali, Tibetan, English, Japanese, Korean, Gandhari, and 22 more languages.

Dictionary Lookup

31 authoritative dictionaries with 679,000+ entries across Chinese, Pali, Sanskrit, Tibetan, and English:

Chinese Buddhist Dictionaries (14)

  • NTI Reader (佛学辞典) — 161K entries, Chinese↔English
  • Suihan Lu (新集藏經音義隨函錄) — 72K entries, Tang dynasty phonetic glossary
  • Fo Guang (佛光大辭典) — 32K entries
  • Ding Fubao (丁福保佛学大辞典) — 31K entries
  • Yiqiejing Yinyi (一切經音義, 慧琳音義) — 23K entries, Buddhist scriptural phonetics
  • Faxiang Dictionary (法相辭典, 朱芾煌) — 15K entries, Yogācāra terminology
  • Zhonghua Encyclopedia (中華佛教百科全書) — 6K entries
  • Common Buddhist Terms (佛學常見詞彙, 陳義孝) — 6K entries
  • Agama Dictionary (阿含辭典, 莊春江) — 5K entries
  • Fanfanyu (翻梵語) — 4K entries, Sanskrit-Chinese translation glossary
  • Xu Yinyi (續一切經音義, 希麟) — 2K entries
  • Yogācāra Glossary (唯識名詞白話新解) — 2K entries
  • Sanzang Fashu (三藏法數) — 1K entries
  • Buddhist Origins of Idioms (俗語佛源) — 567 entries

Pali Dictionaries (5)

  • Digital Pali Dictionary (DPD) — 89K entries, grammar + etymology + examples
  • NCPED (New Concise Pali-English Dictionary) — 21K entries
  • PTS PED (Pali Text Society) — 16K entries
  • Buddhadatta (巴利語辭典, 達摩比丘中譯) — 11K entries, Pali→Chinese
  • SuttaCentral Glossary — 6K entries

Sanskrit Dictionaries (3)

  • Monier-Williams (Sanskrit-English Dictionary) — 32K entries
  • Edgerton BHS (Buddhist Hybrid Sanskrit Dictionary) — 18K entries
  • Fanyi Mingyi Ji (翻譯名義集) — 1K entries

Tibetan Dictionaries (2)

  • Rangjung Yeshe (Tibetan-English Dictionary) — 74K entries
  • Hopkins (Tibetan-Sanskrit-English Dictionary) — 18K entries

Multilingual Reference (4)

  • Soothill-Hodous (Chinese Buddhist Terms, Chinese↔English) — 17K entries
  • Mahāvyutpatti (翻譯名義大集, Sanskrit↔Tibetan↔Chinese) — 9K entries
  • Nanshan Vinaya (南山律学辞典) — 3K entries
  • Pentaglot (五體清文鑑, Manchu-Mongolian-Tibetan-Chinese-Sanskrit) — 1K entries

Specialized (3)

  • Abhidharma Dictionary (阿毗達磨辭典) — 1K entries
  • Tiantai Dictionary (天台教學辭典) — 1K entries
  • DDB (Digital Dictionary of Buddhism) — CJK Buddhist terminology

Knowledge Graph

31,000+ entities (persons, monasteries, texts, schools, concepts) and 28,000+ relationships — including 23,000 teacher-student lineage chains from the DILA Authority Database — visualized as an interactive force-directed graph. Click any node to explore connections.
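
To make the idea of a lineage chain concrete, the sketch below walks teacher-student edges breadth-first and lists every chain that starts at a given person. The edge list uses placeholder names, and the functions `build_graph` and `lineage_paths` are hypothetical helpers, not part of FoJin's codebase or the DILA data model.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Turn (teacher, student) pairs into an adjacency list."""
    graph = defaultdict(list)
    for teacher, student in edges:
        graph[teacher].append(student)
    return graph


def lineage_paths(graph, root, max_depth=3):
    """Return every teacher-to-student chain starting at root, up to max_depth edges."""
    paths = []
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        if len(path) > 1:
            paths.append(path)
        if len(path) - 1 >= max_depth:
            continue  # do not expand beyond the depth limit
        for student in graph.get(path[-1], []):
            if student not in path:  # guard against cycles in the data
                queue.append(path + [student])
    return paths


# Placeholder data for illustration only.
EDGES = [
    ("teacher_a", "student_b"),
    ("student_b", "student_c"),
    ("teacher_a", "student_d"),
]

if __name__ == "__main__":
    for chain in lineage_paths(build_graph(EDGES), "teacher_a"):
        print(" -> ".join(chain))
```

A depth limit keeps the result small enough to render when a node has many descendants, which is the same concern an interactive graph view has when expanding a clicked node.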

AI Q&A — "XiaoJin"

Ask questions in natural language. XiaoJin answers based on canonical Buddhist texts using RAG (Retrieval-Augmented Generation) with 678K+ embedding vectors and HNSW index for fast semantic search. Features include:

  • Multi-turn conversation with context awareness
  • Keyword + optional API cross-encoder reranking for higher answer quality
  • Clickable citations in 【《经名》第N卷】 format — click to jump to the text reader
  • Progressive follow-up suggestions (concept → related texts → practice)
  • Smart data source recommendations — when users ask about finding databases, AI automatically recommends relevant sources from 503 data sources via semantic similarity
  • "Ask XiaoJin" button on the reader page — select text to ask about it
  • Tab key cycles through suggested questions in the input box
  • BYOK (Bring Your Own Key) support for multiple LLM providers
<p align="center"><img src="./docs/screenshots/ai-chat-answer.png" alt="AI Q&A answering about Xuanzang's disciples" width="800"></p>
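
The citation markers described above follow a fixed pattern, so a client could pull them out of an answer with a regular expression. This is a minimal sketch under the assumption that the volume number N is written in Arabic digits; the `Citation` type and `extract_citations` function are hypothetical names, not FoJin's API.

```python
import re
from typing import NamedTuple


class Citation(NamedTuple):
    title: str
    volume: int


# Matches markers of the form 【《title》第N卷】, assuming N is Arabic digits.
CITATION_RE = re.compile(r"【《([^》]+)》第(\d+)卷】")


def extract_citations(answer: str) -> list[Citation]:
    """Return every (title, volume) citation found in an answer, in order."""
    return [
        Citation(match.group(1), int(match.group(2)))
        for match in CITATION_RE.finditer(answer)
    ]


if __name__ == "__main__":
    sample = "色即是空【《心經》第1卷】，另見【《大智度論》第36卷】。"
    for citation in extract_citations(sample):
        print(citation.title, citation.volume)
```

Each extracted pair carries enough information to build a link into the text reader, which is what makes the citations clickable.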

Similar Passages Discovery

When reading any text, the sidebar automatically finds semantically similar passages from other texts using pgvector cosine similarity. Discover cross-textual parallels, related commentaries, and thematic connections across the entire canon.
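
The following sketch shows what a cosine-similarity lookup with pgvector might look like, alongside a brute-force Python equivalent for small in-memory data. The `hnsw` index type, the `vector_cosine_ops` operator class, and the `<=>` cosine-distance operator are pgvector features; the table name `passages`, its columns, and the parameter names are assumptions made for illustration.

```python
import math

# Hypothetical table and column names; only the pgvector syntax is real.
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS passages_embedding_hnsw
ON passages USING hnsw (embedding vector_cosine_ops);
"""

# <=> returns cosine distance, so similarity is 1 - distance.
SIMILAR_SQL = """
SELECT id, text_id, 1 - (embedding <=> %(query_vec)s::vector) AS similarity
FROM passages
WHERE text_id <> %(current_text_id)s
ORDER BY embedding <=> %(query_vec)s::vector
LIMIT %(k)s;
"""


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def top_k(query, candidates, k=3):
    """Brute-force version of the SQL query: score every candidate, keep the best k."""
    scored = [(cid, cosine_similarity(query, vec)) for cid, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    toy = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k([1.0, 0.0], toy, k=2))
```

The brute-force function scans every vector, which is fine for a handful of rows. An HNSW index trades exactness for speed by searching an approximate nearest-neighbor graph, which matters at the scale of hundreds of thousands of embeddings.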

Timeline & Statistics Dashboard

Visualize Buddhist textual history with interactive D3 charts — dynasty distribution, translation trends, language breakdown, category treemap, and top translators. Toggle between scholarly and popular presentation modes.

Collections, Bookmarks & Annotations

Save texts to personal collections, bookmark specific passages, and add annotations for study and research.

Citation Export

Export citations in BibTeX, RIS, and APA formats for academic papers and reference managers.
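
For a sense of what the exported records look like, here is a minimal sketch that renders one text as a BibTeX `@misc` entry and as a RIS record. The function names, the choice of `@misc` and `GEN` entry types, and the use of a note field for the translator are assumptions; FoJin's exporter may map fields differently. All values shown are placeholders.

```python
def to_bibtex(key, title, translator=None, url=None):
    """Render a BibTeX @misc entry; optional fields are omitted when empty."""
    fields = [("title", title)]
    if translator:
        fields.append(("note", f"Translated by {translator}"))
    if url:
        fields.append(("url", url))
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@misc{{{key},\n{body}\n}}"


def to_ris(title, translator=None, url=None):
    """Render a RIS record of generic type (TY  - GEN)."""
    lines = ["TY  - GEN", f"TI  - {title}"]
    if translator:
        lines.append(f"N1  - Translated by {translator}")
    if url:
        lines.append(f"UR  - {url}")
    lines.append("ER  - ")  # end-of-record tag
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(to_bibtex("example_text", "Example Sutra Title",
                    translator="Example Translator",
                    url="https://example.org/texts/1"))
    print(to_ris("Example Sutra Title", url="https://example.org/texts/1"))
```

Both formats are plain text, so the output can be saved to a `.bib` or `.ris` file and imported into a reference manager.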

Manuscript Viewer

Browse digitized manuscripts and rare editions from BDRC and other institutions via IIIF protocol.
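
IIIF image requests follow a fixed URL template, `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, defined by the IIIF Image API. The sketch below builds such a URL; the base URL and identifier are placeholders rather than real BDRC endpoints, and the default `size="max"` assumes Image API 3.0 (version 2.x servers may expect `full` instead).

```python
from urllib.parse import quote


def iiif_image_url(base, identifier, region="full", size="max",
                   rotation="0", quality="default", fmt="jpg"):
    """Build an IIIF Image API request URL from its path components."""
    # The identifier is percent-encoded so that any slashes inside it
    # are not mistaken for path separators.
    encoded_id = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{encoded_id}/{region}/{size}/{rotation}/{quality}.{fmt}"


if __name__ == "__main__":
    # Placeholder server and identifier for illustration only.
    print(iiif_image_url("https://iiif.example.org/image", "page-001"))
    # A thumbnail scaled to fit within 800x800 pixels.
    print(iiif_image_url("https://iiif.example.org/image", "page-001", size="!800,800"))
```

Because every parameter lives in the URL path, a viewer can request tiles, thumbnails, or full-resolution pages from any compliant server without server-specific code.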

Multi-Language UI

Available in 9 languages: Simplified Chinese, Traditional Chinese, English, Japanese, Korean, Thai, Vietnamese, Sinhala, and Burmese.

Data Sources

<p align="center"><img src="./docs/screenshots/sources.png" alt="503 data sources from 30 countries" width="800"></p>

FoJin aggregates data from major Buddhist digital projects worldwide. Sources are categorized by research field (Han, Theravada, Tibetan, Sanskrit, Dunhuang, Art,
