peraturan.go.id Knowledge Base System

AI-powered Indonesian legal search system: 5,817 regulations (2001-2025), 541K+ segments for professionals

An AI platform for navigating Indonesian laws and regulations. It processes 5,817 legal documents (2001-2025) into 541,445 semantically searchable text segments. Using OpenAI text-embedding-3-large embeddings and Claude AI responses, the system gives instant access to Indonesia's complex body of regulation, with contextual understanding in the Indonesian language.

📊 System Overview

Current Status

  • Database: 541,445 text chunks (1.1GB SQLite + 6.1GB FAISS index)
  • Coverage: Legal documents from 2001-2025 (100% embedded)
  • Users: Legal professionals, SMEs, government officials across Indonesia
  • Language: Indonesian with multilingual stopword support

Production Statistics

  • Documents: 5,817 legal texts (perban, permen, perda, uu, pp, perpres, perppu)
  • Text Chunks: 541,445 searchable segments (300-500 tokens each)
  • Embeddings: 100% complete (541,445/541,445)
  • Storage: about 7.2GB total (1.1GB SQLite + 6.1GB FAISS index)
  • Time Range: 2001-2025 legal regulations
  • Largest Document: perda_2024_5.md (9,826 chunks)

🎯 Purpose & Problem Solved

Primary Problem

Indonesia's regulatory landscape contains thousands of overlapping regulations from multiple government bodies, creating significant barriers for:

  • Legal professionals seeking specific regulations
  • Businesses ensuring regulatory compliance
  • Government officials drafting consistent policies
  • Citizens understanding their legal obligations

Solution Provided

The system transforms static legal documents into an intelligent, searchable knowledge base that:

  • Understands context: Uses AI embeddings to find relevant regulations even when exact terms don't match
  • Speaks Indonesian: Optimized for Indonesian language queries and legal terminology
  • Provides comprehensive answers: Combines multiple relevant regulations in responses
  • Stays current: Includes regulations from 2001 to 2025

🏗️ System Architecture

Data Flow Pipeline

MySQL/Files → export_for_rag → embed_data.text/ → customkb database → SQLite (1.1GB)
                                                 → customkb embed → FAISS Index (6.1GB)

Key Components

  1. Data Source: 5,817 Indonesian legal documents in markdown format
  2. Processing Pipeline: Python-based customkb tool with external dependencies
  3. Storage Layer:
    • SQLite database (541,445 text chunks)
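    • FAISS vector index (1536-dimensional embeddings as configured; text-embedding-3-large returns 3072 dimensions by default, so 1536 implies the API's dimensions parameter is used to shorten them)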
  4. AI Integration:
    • OpenAI text-embedding-3-large for embeddings
    • Claude claude-3-7-sonnet-latest for query responses
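
The storage figures above can be sanity-checked with simple arithmetic. The sketch below assumes an uncompressed float32 FAISS index (the index type is not stated in this README) and compares the 1536 dimensions listed here against the 3072 dimensions that text-embedding-3-large produces by default. The function name is hypothetical.

```python
def flat_index_bytes(num_vectors: int, dim: int, bytes_per_float: int = 4) -> int:
    """Raw vector storage for an uncompressed float32 index (assumed index type)."""
    return num_vectors * dim * bytes_per_float


NUM_CHUNKS = 541_445  # chunk count reported in this README

for dim in (1536, 3072):
    gib = flat_index_bytes(NUM_CHUNKS, dim) / 2**30
    print(f"{dim} dims: {gib:.2f} GiB of raw vectors")
```

Under these assumptions the 3072-dimension estimate lands much closer to the reported 6.1GB index than the 1536-dimension one, so the dimension setting may be worth confirming in peraturan.go.id.cfg.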

Document Structure

Documents in embed_data.text/ follow this structure:

# PERATURAN [TYPE] NOMOR [NUMBER] TAHUN [YEAR]
## TENTANG
[Subject/Title]
## JENIS
[Document Type: perban/permen/perda]
## DOKUMEN
[PDF path]
## KONTEN
[Full legal text]
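
A document in this layout can be split back into its fields with a few lines of Python. This is a minimal sketch, not the export script's actual logic: `parse_document` is a hypothetical helper, the sample text is invented for illustration, and it assumes no `## ` headings occur inside the KONTEN body.

```python
import re


def parse_document(text: str) -> dict:
    """Split one embed_data.text/ document into its titled sections."""
    doc = {"title": "", "sections": {}}
    current = None
    for line in text.strip().splitlines():
        if line.startswith("## "):
            # A second-level heading opens a new section (TENTANG, JENIS, ...).
            current = line[3:].strip().lower()
            doc["sections"][current] = []
        elif line.startswith("# ") and not doc["title"]:
            doc["title"] = line[2:].strip()
        elif current is not None:
            doc["sections"][current].append(line)
    doc["sections"] = {k: "\n".join(v).strip() for k, v in doc["sections"].items()}
    # Pull number and year out of the heading when it matches the template.
    match = re.search(r"NOMOR\s+(\S+)\s+TAHUN\s+(\d{4})", doc["title"])
    doc["number"] = match.group(1) if match else None
    doc["year"] = int(match.group(2)) if match else None
    return doc


# Invented sample that follows the template above.
sample = """# PERATURAN DAERAH NOMOR 5 TAHUN 2024
## TENTANG
Contoh Judul
## JENIS
perda
## DOKUMEN
docs/example.pdf
## KONTEN
Pasal 1 ...
"""
parsed = parse_document(sample)
print(parsed["year"], parsed["sections"]["jenis"])
```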

🚀 Getting Started

Prerequisites

  • Python 3.x with specific modules for embeddings
  • SQLite for document storage
  • FAISS library for vector indexing
  • OpenAI API access for embeddings
  • Claude API access for query responses
  • Linux environment (currently on Ubuntu)

Build and Deploy

# Full rebuild - exports data and generates embeddings
./0_build.sh

# Update embeddings only (with checkpoint support)
./embed_with_checkpoints.sh

# Query the knowledge base
/ai/scripts/customkb/customkb query peraturan.go.id.cfg "pertanyaan hukum dalam bahasa Indonesia"

Database Operations

# Check database integrity (should return 541445)
sqlite3 peraturan.go.id.db "SELECT COUNT(*) FROM docs;"

# Check embedding status (should show 541445 embedded, 0 pending)
sqlite3 peraturan.go.id.db "SELECT SUM(embedded) as embedded_docs, COUNT(*) - SUM(embedded) as pending_docs, COUNT(*) as total_docs FROM docs;"

# View database and FAISS index size (1.1GB + 6.1GB)
ls -lh peraturan.go.id.db peraturan.go.id.faiss

# Backup database (create the backups directory first if it does not exist)
mkdir -p backups
cp peraturan.go.id.db backups/peraturan.go.id.db.$(date +%Y%m%d)

System Health Checks

# Verify system integrity
sqlite3 peraturan.go.id.db "SELECT COUNT(*) FROM docs;" # Should return 541445
ls -lh *.db *.faiss # Check file sizes (1.1GB + 6.1GB)
find embed_data.text -name "*.md" | wc -l # Should return 5817

# Test basic query functionality
/ai/scripts/customkb/customkb query peraturan.go.id.cfg "test sistem"
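
The same checks can be scripted with Python's built-in sqlite3 module, which makes them easier to run from cron or a test suite. This is a sketch under assumptions: the `docs` table and its `embedded` column come from the queries above, while the function names and the in-memory demo schema are hypothetical.

```python
import sqlite3


def embedding_status(conn: sqlite3.Connection) -> dict:
    """Return total, embedded, and pending row counts from the docs table."""
    total, embedded = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(embedded), 0) FROM docs"
    ).fetchone()
    return {"total": total, "embedded": embedded, "pending": total - embedded}


def is_healthy(status: dict, expected_total: int) -> bool:
    """Healthy means the expected row count is present and nothing is pending."""
    return status["total"] == expected_total and status["pending"] == 0


# Demo against an in-memory database with a hypothetical minimal schema.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, embedded INTEGER)")
demo.executemany("INSERT INTO docs (embedded) VALUES (?)", [(1,), (1,), (0,)])
status = embedding_status(demo)
print(status, is_healthy(status, expected_total=3))
```

Against the real database, one would connect to peraturan.go.id.db and pass `expected_total=541445`.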

⚙️ Configuration

Technical Specifications (peraturan.go.id.cfg)

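  • Vector Model: text-embedding-3-large (1536 dimensions as configured; the model's default output is 3072, which implies shortening via the API's dimensions parameter)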
  • Query Model: claude-3-7-sonnet-latest
  • Performance: 562 embeddings per batch, 24 concurrent API calls
  • Language: Indonesian with multilingual support
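
To get a feel for what the batch settings mean at this corpus size, the sketch below works out how many batches a full embedding run needs. The `batches` helper is hypothetical, and treating concurrency as fixed "rounds" of 24 calls is a simplifying assumption; the actual scheduling in customkb is not described here.

```python
import math
from typing import Iterator, Sequence


def batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


NUM_CHUNKS = 541_445   # chunks reported in this README
BATCH_SIZE = 562       # embeddings per batch, from the configuration
CONCURRENCY = 24       # concurrent API calls, from the configuration

num_batches = math.ceil(NUM_CHUNKS / BATCH_SIZE)
num_rounds = math.ceil(num_batches / CONCURRENCY)
print(f"{num_batches} batches, about {num_rounds} rounds of {CONCURRENCY} calls")
```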

AI Assistant Configuration

The query role in the configuration casts the assistant as an Indonesian digital legal consultant that:

  • Draws on 5,817 legal regulations (2001-2025) split into 541,445 text segments
  • Serves users ranging from Top 100 law firms to Indonesia's 66 million SMEs
  • Provides legal analysis with practical implementation guidance
  • Adapts its communication to the user's expertise level (legal practitioners, SMEs, or government officials)

Response Framework (8 Categories)

  1. Comprehensive Regulation Identification - Full legal citations with current status
  2. Adaptive Communication - Language adjusted to user expertise level
  3. Practical Implementation Guidance - Reporting obligations, deadlines, sanctions
  4. Regulatory Change Analysis - Transition impacts and adaptation recommendations
  5. SME/Startup Focus - PBBR (risk-based business licensing) compliance, OSS navigation, capital requirements
  6. Fintech Sector - Latest OJK regulations, sandbox requirements, AML compliance
  7. Data Protection - UU PDP implementation post-October 2024
  8. Cross-sectoral Issues - Norm conflicts identification and harmonization solutions

📈 Core Functionality

1. Document Processing

  • Input: Legal documents in structured markdown format
  • Processing: Chunks documents into 300-500 token segments with 150-token overlap
  • Output: Searchable database with vector embeddings
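
Overlapping chunking of this kind can be sketched as a sliding window. The example assumes whitespace-separated words as a stand-in for real tokenizer output, and picks 400 as an illustrative size inside the 300-500 range; `chunk_tokens` is a hypothetical name, not the customkb implementation.

```python
from typing import List


def chunk_tokens(tokens: List[str], size: int = 400, overlap: int = 150) -> List[List[str]]:
    """Slide a window of `size` tokens, stepping by `size - overlap` each time."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Synthetic 1,000-token document for illustration.
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
print(len(chunks), [len(c) for c in chunks])
```

Each chunk repeats the last 150 tokens of its predecessor, so a sentence that straddles a boundary still appears whole in at least one chunk.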

2. Semantic Search

  • Vector Search: Uses FAISS index for similarity matching
  • Hybrid Search: Optional BM25 + vector combination (disabled by default)
  • Language Support: Indonesian with multilingual stopwords
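
The two search modes can be illustrated without any external libraries. This is a sketch under assumptions: brute-force cosine similarity stands in for the FAISS index, and the hybrid blend (min-max normalization with a weight `alpha`) is one common way to combine BM25 and vector scores, not necessarily the one customkb uses. All names and sample values are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_top_k(query: Sequence[float], vectors: Dict[str, Sequence[float]],
                 k: int) -> List[Tuple[str, float]]:
    """Brute-force nearest neighbours (a stand-in for a FAISS lookup)."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max scale scores into [0, 1]; identical scores all map to 1.0."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    return {key: (val - low) / span if span else 1.0 for key, val in scores.items()}


def hybrid_scores(vector_scores: Dict[str, float], bm25_scores: Dict[str, float],
                  alpha: float = 0.7) -> Dict[str, float]:
    """Blend the two rankings; alpha weights the vector side (assumed value)."""
    vec, bm25 = normalize(vector_scores), normalize(bm25_scores)
    return {cid: alpha * vec.get(cid, 0.0) + (1 - alpha) * bm25.get(cid, 0.0)
            for cid in set(vec) | set(bm25)}


# Toy 2-dimensional vectors for illustration only.
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
top = vector_top_k([1.0, 0.0], vectors, k=2)
combined = hybrid_scores({"a": 1.0, "c": 0.5}, {"c": 2.0, "b": 1.0})
print(top, combined)
```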

3. Natural Language Querying

Query Types Supported:

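  • Specific regulation searches (40%): "Peraturan OJK No. 3/2024" (OJK Regulation No. 3/2024)
  • Topic-based queries (35%): "persyaratan izin usaha retail" (retail business licence requirements)
  • Compliance questions (15%): "kewajiban pelaporan SPT tahunan" (annual tax return filing obligations)
  • Comparative searches (10%): "perbedaan peraturan lama dan baru" (differences between the old and new regulations)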

4. AI-Powered Responses

  • Contextual Understanding: Retrieves top 30 relevant chunks
  • Comprehensive Answers: Combines multiple sources in responses
  • Legal Expertise: Configured as Indonesian legal assistant with specific persona
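
The step between retrieval and the model call amounts to packing the best chunks into a prompt. The sketch below shows one plausible shape for that step; the tuple layout, the numbering format, and both function names are assumptions, and the real prompt template used by customkb is not shown in this README.

```python
from typing import List, Tuple

Chunk = Tuple[str, str, float]  # (source file, chunk text, similarity score)


def build_context(ranked_chunks: List[Chunk], top_k: int = 30) -> str:
    """Format the best-scoring chunks as a numbered, source-labelled block."""
    best = sorted(ranked_chunks, key=lambda chunk: chunk[2], reverse=True)[:top_k]
    parts = [f"[{i}] ({source})\n{text}"
             for i, (source, text, _score) in enumerate(best, start=1)]
    return "\n\n".join(parts)


def build_prompt(question: str, context: str) -> str:
    """Combine retrieved context and the user question into one prompt string."""
    return f"Context:\n{context}\n\nQuestion: {question}"


# Fifty invented chunks; only the 30 highest-scoring ones should survive.
candidates = [(f"doc_{i}.md", f"text {i}", i / 100) for i in range(50)]
context = build_context(candidates)
prompt = build_prompt("persyaratan izin usaha retail", context)
print(prompt[:80])
```

Numbering the chunks and labelling each with its source file makes it possible for the answer to cite which regulation a statement came from.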

👥 User Demographics

Primary Users (by usage volume)

  • Legal Professionals (30%): Specific regulation searches, Top 100 law firms
  • Business Owners (35%): Compliance and licensing queries, 66 million SMEs
  • Government Officials (25%): Policy research and consistency checks
  • Academic/Others (10%): Research and comparative analysis

Geographic Distribution

  • Java Island (65%): Jakarta (35%), Surabaya (10%), Bandung (8%)
  • Sumatra (15%): Medan, Palembang, Batam
  • Other Islands (15%): Bali, Kalimantan, Sulawesi
  • International (5%): Indonesian businesses abroad

Usage Patterns

  • Daily Users (30%): Law firms, government officials, compliance officers
  • Weekly Users (40%): Business consultants, corporate legal teams
  • Monthly Users (20%): SME owners, researchers
  • Occasional Users (10%): Students, individual citizens

🌐 2024 Regulatory Context

Current Compliance Environment

The system addresses Indonesia's regulatory complexity, which intensified in 2024 with:

  • UU Perlindungan Data Pribadi: Full implementation as of October 17, 2024
  • Global Minimum Tax: New compliance requirements affecting multiple industries
  • Enhanced Fintech Regulations: OJK Regulation No. 3/2024 refining regulatory sandbox framework
  • PBBR Complexity: Risk-based business licensing navigation for 66 million SMEs

Industry-Specific Challenges

  • Capital Markets: 558 legal obligations for public companies post-IPO
  • Data Protection: GDPR-aligned requirements with enforcement penalties
  • Financial Technology: Enhanced sandbox, AML programs, consumer protection
  • E-commerce: Multi-compliance areas including taxation, cybersecurity, advertising ethics

⚠️ System Dependencies & Limitations

Critical Context

IMPORTANT: The actual Python source code for the customkb tool and embedding modules is NOT present in this repository. The system depends on external code located at:

  • /ai/scripts/customkb/customkb - Main knowledge base tool
  • /ai/datasets/peraturan.go.id/export_for_rag - Data export script
  • Python modules under embedding.embed_manager_improved

Known Issues and Limitations

  1. No version control (resolved: Git repository initialized)
  2. Hardcoded paths in scripts - system expects specific directory structure
  3. No error recovery in build scripts - failures leave inconsistent state
  4. Missing Python dependencies in requirements.txt (file exists but incomplete)
  5. No authentication/authorization mechanisms
  6. No automated testing or validation framework
  7. No monitoring/health check systems

💪 Strengths & Capabilities

  1. Comprehensive Coverage: 24 years of Indonesian regulations (2001-2025)
  2. Semantic Understanding: AI-powered contextual search beyond keyword matching
  3. Production Ready: Handles 541K+ document chunks efficiently
  4. **La
