DSG - Data Semantic Governance

Language / 语言: English | 中文

A comprehensive enterprise-grade Data Semantic Governance (DSG) platform that provides unified data catalog management, data view management, data exploration, authentication and authorization, task orchestration, and semantic search capabilities.

Overview

DSG (Data Semantic Governance) is a microservices-based platform designed for enterprise data governance and semantic management. It provides a complete solution for data cataloging, data view management, data exploration, access control, workflow orchestration, and semantic search across multiple data sources.

The platform follows a microservices architecture pattern, with each service handling specific domain responsibilities. All services are built with Go, using clean architecture principles, and can be deployed independently or together using Docker Compose.

System Architecture

DSG consists of the following components:

Core Services

1. Data Catalog Service (Port: 8153)

Data resource catalog management
Information catalog and system management
Data push workflows
Data comprehension and assessment
Category and tree management
Statistics and analytics

2. Data View Service (Port: 8123)

Metadata view (Form View) management
Logic view management
Data lineage analysis
Data classification and masking
Data exploration and dataset management
Graph model management

3. Data Exploration Service (Port: 8281)

Data exploration task management
Exploration report generation
Data quality assessment
Exploration rule configuration
Task scheduling and execution

4. Basic Search Service (Port: 8163)

Full-text search across data catalogs
Information catalog search
Interface service search
Data view search
Electronic license search
Indicator and information system search
Unified cross-domain search

5. Configuration Center Service (Port: 8133)

System configuration management
User and role management
Permission and access control
Menu and dictionary management
Data source configuration
Workflow configuration
Code generation rules

6. Auth Service (Port: 8155)

Policy-based access control (PBAC)
Permission enforcement
Indicator dimensional rules
Data warehouse authorization requests
Resource access management
Workflow integration

7. Data Subject Service (Port: 8134)

Data subject management
Subject lifecycle tracking
Subject relationship management

8. Task Center Service (Port: 8080)

Project and task management
Work order system
Data processing pipelines
Object storage management
Notification and communication
Analytics and reporting

9. Session Service (Port: 8000)

User session management
Session authentication
Token management

10. Data Application Service (Port: 8156)

API interface management
Data service publishing
API lifecycle management
Workflow integration for API approval
Change Data Capture (CDC) for real-time synchronization
Service statistics and monitoring

11. Data Application Gateway (Port: 8157)

Unified API gateway for data services
Request routing and forwarding
API execution and invocation
Service discovery
Request validation and transformation
Rate limiting and access control

Infrastructure Services

OpenSearch (Port: 9200, 9600): Full-text search engine
Kafka (Port: 9092): Message queue with SASL/PLAIN authentication
Zookeeper (Port: 2181): Kafka coordination service
Hydra (Port: 4444, 4445): OAuth2 authentication server
Redis (Port: 6379): Caching and session storage
MariaDB (Port: 3306): Primary database for all services

Frontend Application

React-based web application
Micro-frontend architecture
Multiple business modules and plugins

Key Features

Data Governance

Data Cataloging: Comprehensive data resource and information catalog management
Data Classification: Automated data classification and categorization
Data Quality: Data quality assessment and monitoring
Data Lineage: End-to-end data lineage tracking and visualization
Data Masking: Sensitive data masking and privacy protection

Semantic Management

Semantic Search: Advanced full-text search with configurable tokenizers
Metadata Management: Rich metadata management and organization
Data Views: Unified data view management (metadata and logic views)
Data Exploration: Interactive data exploration and analysis

Access Control

Policy-Based Access Control (PBAC): Fine-grained access control policies
Role-Based Access Control (RBAC): Role and permission management
Resource Authorization: Resource-level access control
Workflow Integration: Authorization workflow support

Workflow & Orchestration

Task Management: Comprehensive task and project management
Work Order System: Work order creation and tracking
Data Processing Pipelines: Data aggregation and processing workflows
Audit Workflows: Audit process management and tracking

Data Application & API Management

API Management: Comprehensive API interface creation and lifecycle management
Data Service Publishing: Publish data views and catalogs as RESTful APIs
API Gateway: Unified entry point for API execution with routing and load balancing
Service Discovery: Dynamic discovery of published data services
Request Processing: Request validation, transformation, and response formatting
API Monitoring: Service call statistics and performance monitoring

System Management

Configuration Management: Centralized system configuration
User Management: User, role, and permission management
Menu Management: Dynamic menu and navigation management
Dictionary Management: System dictionary and metadata management

Technology Stack

Backend Services

Language: Go 1.24+
Web Framework: Gin
ORM: GORM with MySQL/MariaDB driver
Message Queue: Kafka (with SASL/PLAIN), NSQ
Cache: Redis
Search Engine: OpenSearch/Elasticsearch
Dependency Injection: Google Wire
API Documentation: Swagger/OpenAPI
Observability: OpenTelemetry
Logging: Zap
Configuration: Viper
CLI Framework: Cobra

Frontend

Framework: React
Build Tools: Webpack, Vite
Micro-frontend: Plugin-based architecture

Infrastructure

Containerization: Docker, Docker Compose
Database: MariaDB/MySQL
Message Queue: Kafka, NSQ
Cache: Redis
Search: OpenSearch
Authentication: OAuth2 (Hydra)

Project Structure

dsg/
├── services/              # Backend microservices
│   └── apps/             # Application services
│       ├── auth-service/           # Authentication and authorization
│       ├── basic-search/           # Search service
│       ├── configuration-center/  # Configuration management
│       ├── data-catalog/          # Data catalog management
│       ├── data-exploration-service/ # Data exploration
│       ├── data-subject/          # Data subject management
│       ├── data-view/             # Data view management
│       ├── session/              # Session management
│       ├── task_center/           # Task and workflow management
│       ├── data-application-service/  # Data application and API management
│       └── data-application-gateway/  # API gateway for data services
├── frontend/             # Frontend web application
│   ├── src/             # Source code
│   ├── public/          # Static assets
│   └── config/          # Build configuration
├── deploy/              # Deployment configurations
│   ├── docker/         # Docker configurations
│   │   ├── kafka/      # Kafka configuration
│   │   └── opensearch/ # OpenSearch configuration
│   ├── docker-compose.yml      # Multi-service orchestration
│   └── docker-compose.dev.yml  # Development environment
├── script/              # Utility scripts
│   ├── start-go-services.sh   # Service startup script
│   └── verify-go-work.sh      # Go workspace verification
├── local_patches/       # Local dependency patches
├── go.work             # Go workspace configuration
└── LICENSE             # License file

Prerequisites

Go: 1.24.0 or higher
Docker: 20.10+ and Docker Compose 2.0+
Node.js: 16+ (for frontend development)
Make: For build automation

Infrastructure Requirements

Memory: Minimum 8GB RAM (16GB recommended)
Disk: At least 20GB free space
CPU: 4+ cores recommended

Quick Start

1. Clone the Repository

git clone <repository-url>
cd dsg

2. Start All Services with Docker Compose

The easiest way to start all services is using the provided script:

# Start all services
./script/start-go-services.sh

# Or start only core services
./script/start-go-services.sh core

# Or start only Go services
./script/start-go-services.sh go

3. Manual Startup

Alternatively, you can start services manually:

cd deploy
docker-compose up -d

4. Verify Services

Check service status:

cd deploy
docker-compose ps

Access service endpoints:

OpenSearch: http://localhost:9200
Kafka UI: http://localhost:8080 (if enabled)
Hydra Admin: http://localhost:4445
Basic Search: http://localhost:8163
Configuration Center: http://localhost:8133
Data Catalog: http://localhost:8153
Data View: http://localhost:8123
Auth Service: http://localhost:8155
Task Center: http://localhost:8080
Data Application Service: http://localhost:8156
**Data

Dsg

Install / Use

README