SciKGs

A Survey on Knowledge Graphs in AI for Science

Awesome Scientific Knowledge Graphs

Bridging Data and Discovery: A Survey on Knowledge Graphs in AI for Science

🧬 Research Scopes

An overview of the scope of this survey, covering four fundamental scientific tasks in biology, chemistry, and materials science: (a) drug development and optimization, (b) omics interpretation and analysis, (c) chemical reaction and synthesis, and (d) materials design and discovery.

📚 Structure of Survey

Structure of the survey. Our review is structured around the lifecycle of SciKGs: from their conceptual foundation and construction methodologies, to their applications and synergistic integration with LLMs for discovery, culminating in challenges, opportunities, and future directions that envision SciKGs as engines for autonomous scientific discovery.

🔗 Evolution of SciKGs

The co-evolution of knowledge graph technologies and their scientific practices. The technological evolution of KGs (top) has continually enabled new paradigms in SciKG applications (bottom). This progression has moved from static cataloguing and manual integration to machine learning-driven inference, culminating in the current era of bidirectional synergy between LLMs and KGs. This synergy, leveraging tools such as RAG and AI agents, transforms SciKGs from static repositories into dynamic engines for generative scientific discovery.

Abbreviations: SQL, Structured Query Language; RDF, Resource Description Framework; OWL, Web Ontology Language; SPARQL, SPARQL Protocol and RDF Query Language; GNN, graph neural network; KGE, knowledge graph embedding; RAG, retrieval-augmented generation.

🏗️ Construction and Maintenance of SciKGs

Construction and maintenance of SciKGs. (a) The foundation of SciKG construction involves integrating diverse data sources, including structured databases, unstructured text, and multimodal data. (b) Two main approaches for extracting entities and relations from the acquired data are illustrated: rule/dictionary-based extraction, which relies on predefined lexicons and rules, and LLM-based extraction, which involves fine-tuning on scientific datasets and prompt engineering. (c) Ontology alignment integrates diverse representations of the same entity (e.g., aspirin), followed by graph embedding into a continuous vector space. (d) Dynamic updating through incremental learning and LLM-driven error correction ensures SciKGs remain accurate and up to date. (e-h) Sub-figures illustrate representative examples of specialized knowledge graphs for drugs, omics, chemicals, and materials, respectively.
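To make steps (b) and (c) concrete, here is a minimal sketch of rule/dictionary-based extraction combined with a simple form of alignment, in which several surface forms of one entity resolve to a single canonical ID. The lexicon, the entity IDs, the trigger phrases, and the function names are all hypothetical and chosen for illustration; they are not taken from any pipeline described in the survey.

```python
import re

# Hypothetical synonym lexicon: surface form -> canonical entity ID.
# Mapping several names to one ID is a toy stand-in for ontology alignment.
LEXICON = {
    "aspirin": "drug:aspirin",
    "acetylsalicylic acid": "drug:aspirin",
    "asa": "drug:aspirin",
    "cox-1": "protein:COX1",
    "cyclooxygenase-1": "protein:COX1",
}

# Hypothetical trigger phrases mapped to relation labels.
RELATION_RULES = {"inhibits": "INHIBITS", "binds": "BINDS"}


def find_entities(sentence):
    """Return sorted (position, canonical_id) pairs, matching longer terms first."""
    lowered = sentence.lower()
    hits, taken = [], set()
    for term in sorted(LEXICON, key=len, reverse=True):
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        for match in re.finditer(pattern, lowered):
            span = set(range(match.start(), match.end()))
            if span & taken:
                continue  # skip terms nested inside an already matched term
            taken |= span
            hits.append((match.start(), LEXICON[term]))
    return sorted(hits)


def extract_triples(sentence):
    """Link the nearest entity before and after each trigger phrase."""
    lowered = sentence.lower()
    entities = find_entities(sentence)
    triples = []
    for trigger, relation in RELATION_RULES.items():
        for match in re.finditer(r"\b" + re.escape(trigger) + r"\b", lowered):
            left = [eid for pos, eid in entities if pos < match.start()]
            right = [eid for pos, eid in entities if pos > match.start()]
            if left and right:
                triples.append((left[-1], relation, right[0]))
    return triples
```

Under these assumptions, `extract_triples("Acetylsalicylic acid inhibits cyclooxygenase-1.")` and `extract_triples("Aspirin inhibits COX-1")` produce the same triple, because both names resolve to the same canonical IDs. Real pipelines would replace the hand-written lexicon with curated ontologies and the trigger rules with a trained or prompted extractor.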

🌐 Core Functions of SciKGs

Summary of core functions of SciKGs in diverse scientific tasks. SciKGs serve as a foundational infrastructure that: (1) organizes heterogeneous scientific data into structured knowledge; (2) enhances representation learning via graph embedding; (3) enables causal and relational inference for hypothesis generation; and (4) improves AI model interpretability by grounding predictions in traceable, evidence-based knowledge paths.
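As an illustration of function (4), the sketch below recovers a traceable path of triples connecting two entities with a breadth-first search. The triples, entity IDs, and function names are invented for this example, and plain BFS is a swapped-in stand-in for whatever path-ranking or reasoning method a given SciKG system actually uses.

```python
from collections import deque

# Toy triples invented for illustration; not drawn from any SciKG in the survey.
TRIPLES = [
    ("drug:A", "INHIBITS", "protein:P1"),
    ("protein:P1", "PARTICIPATES_IN", "pathway:W1"),
    ("pathway:W1", "ASSOCIATED_WITH", "disease:D1"),
    ("drug:A", "SIMILAR_TO", "drug:B"),
]


def build_index(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    index = {}
    for head, relation, tail in triples:
        index.setdefault(head, []).append((relation, tail))
    return index


def shortest_evidence_path(triples, source, target):
    """Return the shortest directed chain of triples from source to target, or None."""
    index = build_index(triples)
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for relation, neighbor in index.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [(node, relation, neighbor)]))
    return None
```

Calling `shortest_evidence_path(TRIPLES, "drug:A", "disease:D1")` returns the three-hop chain through `protein:P1` and `pathway:W1`, which is the kind of explicit path a prediction could cite as supporting evidence.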

🤝 SciKG–LLM Integration for Scientific Discovery

Synergistic integration of SciKGs and LLMs for knowledge-driven scientific discovery. (a) SciKGs serve as the foundational knowledge infrastructure by ensuring factual grounding and verification, defining reasonable scientific boundaries, and enabling unified representation of heterogeneous data. (b) LLMs act as dynamic semantic engines through five core functions: semantic interface for knowledge access, analytical reasoner for inference, generative engine for hypothesis design, constructor for knowledge curation, and orchestrator for workflow automation. (c) The SciKG-LLM integration empowers four key scientific discovery tasks: multi-source data interpretation, complex system mechanism analysis, system performance optimization, and innovative solution design.
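The factual-grounding role in (a) can be sketched as a retrieval step that pulls matching triples from the graph and places them in the prompt before any model is called. The example below is a minimal sketch under that assumption: the triples, the prompt wording, and the function names are hypothetical, and no LLM API is invoked.

```python
# Toy facts invented for illustration; a real system would query an actual SciKG.
TOY_TRIPLES = [
    ("drug:A", "INHIBITS", "protein:P1"),
    ("protein:P1", "PARTICIPATES_IN", "pathway:W1"),
    ("drug:B", "BINDS", "protein:P2"),
]


def retrieve_facts(triples, entity, limit=5):
    """Return up to `limit` triples in which `entity` is the head or the tail."""
    matches = [t for t in triples if entity in (t[0], t[2])]
    return matches[:limit]


def build_grounded_prompt(question, facts):
    """Assemble a prompt that restricts the answer to the retrieved facts."""
    if facts:
        context = "\n".join(f"- {h} {r} {t}" for h, r, t in facts)
    else:
        context = "- (no matching facts)"
    return (
        "Answer using only the facts below; reply 'unknown' if they are insufficient.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )
```

For example, `build_grounded_prompt("What does protein:P1 do?", retrieve_facts(TOY_TRIPLES, "protein:P1"))` yields a prompt listing only the two triples that mention `protein:P1`. Exact-match lookup keeps the sketch short; practical systems typically add entity linking and embedding-based retrieval.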

🧠 Discovery Flywheel

The autonomous scientific discovery flywheel driven by LLM agents and SciKGs.

⚖️ Challenges and Opportunities in SciKGs

Challenges and opportunities in SciKGs. This figure illustrates the major challenges (C1-C4) facing SciKGs, including data quality and completeness, interoperability and integration, dynamic and temporal knowledge, and trustworthy and explainable reasoning. Each challenge is paired with corresponding opportunities (O1-O4) for advancement, such as building standards and benchmarks, integrating multimodal foundation models, autonomous updating via agents, and developing community-driven platforms. The green sections depict workflows (W1-W4) that enable these opportunities, highlighting a path towards more auditable, unified, dynamic, and community-governed SciKGs.

Collection of SciKGs and Their Applications

Drug Development and Optimization

| Year | Title | KG Name | KG Type | Domain | Construction Method | Venue | Paper | Code |
| ---- | ----- | ------- | ------- | ------ | ------------------- | ----- | ----- | ---- |
| 2025 | TarIKGC: A Target Identification Tool Using Semantics-Enhanced Knowledge Graph Completion with Application to CDK2 Inhibitor Discovery | biological activity KG | public KG | DTI prediction | Semi-automated | Journal of Medicinal Chemistry | Link | Link |
| 2025 | A comprehensive large-scale biomedical knowledge graph for AI-powered data-driven biomedical research | iKraph | Multi-source KG | Drug repurposing and Hypothesis Generation | Semi-automated | Nature Machine Intelligence | Link | Link |
| 2025 | VITAGRAPH: Building a Knowledge Graph for Biologically Re
