SMSD

SMSD — exact substructure & MCS search for chemical graphs.

<p align="center"> <a href="https://github.com/asad/SMSD" aria-label="SMSD Pro"> <img src="icons/icon.svg" alt="SMSD Pro" width="180"/> </a> </p> <h1 align="center">SMSD Pro</h1> <p align="center"><strong>Substructure &amp; MCS Search for Chemical Graphs</strong></p> <p align="center"> <a href="https://central.sonatype.com/artifact/com.bioinceptionlabs/smsd"><img src="https://img.shields.io/maven-central/v/com.bioinceptionlabs/smsd" alt="Maven Central"/></a> <a href="https://pypi.org/project/smsd/"><img src="https://img.shields.io/pypi/v/smsd" alt="PyPI"/></a> <a href="https://pypi.org/project/smsd/"><img src="https://img.shields.io/pypi/dm/smsd" alt="Downloads"/></a> <a href="LICENSE"><img src="https://img.shields.io/badge/License-Apache%202.0-blue.svg" alt="License"/></a> <a href="https://github.com/asad/SMSD/releases"><img src="https://img.shields.io/github/v/release/asad/SMSD" alt="Release"/></a> </p>

SMSD Pro provides exact substructure search and maximum common substructure (MCS) search for chemical graphs. It is available for Java, C++ (header-only), and Python. Optional GPU paths are available for CUDA and Apple Metal builds.

Version 6.11.0 delivers a cache-optimal core engine (20-40% faster MCS), publication-quality SVG depiction (ACS 1996 standard), an 8-phase 2D layout engine with 40+ scaffold templates, distance-geometry 3D coordinate generation, and 35+ new Python bindings.

Guides and References

| Document | Description |
|----------|-------------|
| Examples, How-To, and Cautions | Worked examples for every feature with cautions and performance tips |
| Python API Guide | Full Python API reference with code examples |
| Java Guide | Java API and CLI usage |
| C++ Guide | Header-only C++ integration |
| Release Notes 6.11.0 | What's new in this release |
| Whitepaper | Algorithm design (11-level MCS, VF2++, ring perception) |
| How to Install | Build from source on all platforms |
| Changelog | Full versioned change history |

Molfile Support

Molfile support covers V2000 and V3000 core graph round-trip, names and comments, SDF properties, charges, isotopes, atom classes and maps, R# atoms with M RGP lines, and basic stereo flags.

Copyright (c) 2018-2026 Syed Asad Rahman — BioInception PVT LTD


Install

Java (Maven)

<dependency>
  <groupId>com.bioinceptionlabs</groupId>
  <artifactId>smsd</artifactId>
  <version>6.11.0</version>
</dependency>

Java (Download JAR)

curl -LO https://github.com/asad/SMSD/releases/download/v6.11.0/smsd-6.11.0-jar-with-dependencies.jar

java -jar smsd-6.11.0-jar-with-dependencies.jar \
  --Q SMI --q "c1ccccc1" --T SMI --t "c1ccc(O)cc1" --json -
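
For scripting, the CLI call above can be wrapped from Python. The following is a minimal sketch that assumes only the flags shown above (`--Q`, `--q`, `--T`, `--t`, `--json -`) and that `--json -` writes JSON to standard output. `build_smsd_command` and `run_smsd` are hypothetical helper names, not part of SMSD, and nothing is assumed about the JSON schema.

```python
import json
import subprocess

# Hypothetical helpers, not part of SMSD: they only assemble and run
# the CLI invocation shown in the README.
JAR = "smsd-6.11.0-jar-with-dependencies.jar"

def build_smsd_command(query_smiles, target_smiles, jar=JAR):
    """Return the argv list for a SMILES-vs-SMILES run with JSON output."""
    return [
        "java", "-jar", jar,
        "--Q", "SMI", "--q", query_smiles,
        "--T", "SMI", "--t", target_smiles,
        "--json", "-",
    ]

def run_smsd(query_smiles, target_smiles, jar=JAR):
    """Run the CLI and parse stdout as JSON (assumes `--json -` prints to stdout)."""
    cmd = build_smsd_command(query_smiles, target_smiles, jar)
    completed = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)

cmd = build_smsd_command("c1ccccc1", "c1ccc(O)cc1")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with SMILES characters such as parentheses.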

Python (pip)

pip install smsd

Supported CPython versions: 3.9 through the latest stable release series. Current default test target: Python 3.12. CPU execution is the default path. CUDA and Metal acceleration are optional. RDKit and Open Babel are optional interop layers.

import smsd

result = smsd.substructure_search("c1ccccc1", "c1ccc(O)cc1")
mcs    = smsd.mcs("c1ccccc1", "c1ccc2ccccc2c1")

# Tautomer-aware MCS
mcs    = smsd.mcs("CC(=O)C", "CC(O)=C", tautomer_aware=True)

# Prefer rare heteroatoms (S, P, Se) for reaction mapping
mcs    = smsd.mcs("C[S+](C)CCC(N)C(=O)O", "SCCC(N)C(=O)O",
                   prefer_rare_heteroatoms=True)

# Reaction-aware atom mapping
aam    = smsd.map_reaction_aware("CC(=O)O", "CCO")

# Similarity upper bound (fast pre-filter)
sim    = smsd.similarity("c1ccccc1", "c1ccc(O)cc1")

fp     = smsd.fingerprint("c1ccccc1", kind="mcs")

# Circular fingerprint (ECFP4 equivalent, tautomer-aware)
ecfp4 = smsd.circular_fingerprint("c1ccccc1", radius=2, fp_size=2048)
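
To illustrate why an upper bound works as a pre-filter: no common substructure can contain more atoms of an element than the smaller of the two per-element counts. The sketch below is a generic plain-Python illustration of that idea, not a description of how `smsd.similarity` is implemented. `mcs_size_upper_bound` and `tanimoto_upper_bound` are hypothetical names, and each molecule is assumed to be given as a list of heavy-atom element symbols.

```python
from collections import Counter

def mcs_size_upper_bound(atoms_a, atoms_b):
    """Sum over elements of min(count in A, count in B)."""
    ca, cb = Counter(atoms_a), Counter(atoms_b)
    return sum((ca & cb).values())  # Counter '&' keeps the minimum counts

def tanimoto_upper_bound(atoms_a, atoms_b):
    """Upper bound on atom-count Tanimoto: ub / (|A| + |B| - ub)."""
    ub = mcs_size_upper_bound(atoms_a, atoms_b)
    denom = len(atoms_a) + len(atoms_b) - ub
    return ub / denom if denom else 1.0

benzene = ["C"] * 6          # c1ccccc1
phenol = ["C"] * 6 + ["O"]   # c1ccc(O)cc1
bound = tanimoto_upper_bound(benzene, phenol)
print(round(bound, 3))  # 0.857
```

A pair whose bound already falls below the screening threshold can be skipped without running the exact search.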

Java API

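import java.util.Map;
import com.bioinception.smsd.core.*;

// mol1, mol2, g, g1, g2, queries, and targets are molecules prepared by the caller.
SMSD smsd = new SMSD(mol1, mol2, new ChemOptions());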
boolean isSub = smsd.isSubstructure();
var mcs = smsd.findMCS();

// Reaction-aware with bond-change scoring
SearchEngine.McsOptions opts = new SearchEngine.McsOptions();
opts.reactionAware = true;
opts.bondChangeAware = true;  // penalise implausible bond transformations
var rxnMcs = SearchEngine.reactionAwareMCS(g1, g2, new ChemOptions(), opts);

// CIP stereo assignment (Rules 1-5, including pseudoasymmetric r/s)
Map<Integer, Character> stereo = CIPAssigner.assignRS(g);
Map<Long, Character> ez = CIPAssigner.assignEZ(g);

// Batch MCS with non-overlap constraints
var mappings = SearchEngine.batchMcsConstrained(queries, targets, new ChemOptions(), 10_000);

Python — Advanced Features

import smsd

# --- Reaction-Aware MCS ---
# Prefer heteroatom-containing mappings for reaction center identification
mapping = smsd.map_reaction_aware(
    "C[S+](CCC(N)C(=O)O)CC1OC(n2cnc3c(N)ncnc32)C(O)C1O",  # SAM
    "SCCC(N)C(=O)OCC1OC(n2cnc3c(N)ncnc32)C(O)C1O"           # SAH
)

# --- Structured MCS Result ---
result = smsd.mcs_result("c1ccccc1", "c1ccc(O)cc1")
print(result.size)          # 6
print(result.tanimoto)      # 0.857
print(result.mcs_smiles)    # "c1ccccc1"
print(result.mapping)       # {0: 0, 1: 1, ...}

# --- Works with any input type ---
# SMILES strings
mcs = smsd.mcs("c1ccccc1", "c1ccc(O)cc1")

# MolGraph objects (pre-parsed, fastest for batch)
g1 = smsd.parse_smiles("c1ccccc1")
g2 = smsd.parse_smiles("c1ccc(O)cc1")
mcs = smsd.mcs(g1, g2)

# Native Mol objects (auto-detected, indices returned in native ordering)
# from rdkit import Chem
# mcs = smsd.mcs(Chem.MolFromSmiles("c1ccccc1"), Chem.MolFromSmiles("c1ccc(O)cc1"))

# --- Fingerprints ---
ecfp4  = smsd.circular_fingerprint("c1ccccc1", radius=2, fp_size=2048)
fcfp4  = smsd.circular_fingerprint("c1ccccc1", radius=2, fp_size=2048, mode="fcfp")
counts = smsd.ecfp_counts("c1ccccc1", radius=2, fp_size=2048)
torsion = smsd.topological_torsion("c1ccccc1", fp_size=2048)
tan    = smsd.tanimoto(ecfp4, ecfp4)

# --- 2D Layout ---
g = smsd.parse_smiles("c1ccc2c(c1)cc1ccccc1c2")  # phenanthrene
coords = smsd.force_directed_layout(g, max_iter=500, target_bond_length=1.5)
coords = smsd.stress_majorisation(g, max_iter=300)
crossings = smsd.reduce_crossings(g, coords, max_iter=2000)
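
What a crossing-reduction step tries to minimise can be made concrete by counting pairs of bonds whose line segments properly intersect. The sketch below is a plain-Python illustration, assuming coordinates are `(x, y)` tuples indexed by atom and bonds are pairs of atom indices. `count_crossings` is a hypothetical helper, and nothing is assumed about the return types of the SMSD layout functions above.

```python
def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p): +1, 0, or -1."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)

def _properly_cross(a, b, c, d):
    """True if segments ab and cd intersect at a single interior point."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)

def count_crossings(coords, bonds):
    """Count crossing bond pairs; bonds that share an atom are skipped."""
    total = 0
    for i in range(len(bonds)):
        for j in range(i + 1, len(bonds)):
            (a, b), (c, d) = bonds[i], bonds[j]
            if len({a, b, c, d}) < 4:
                continue
            if _properly_cross(coords[a], coords[b], coords[c], coords[d]):
                total += 1
    return total

# A 4-ring drawn as a "bow tie" has one crossing; drawn as a square it has none.
bonds = [(0, 1), (1, 2), (2, 3), (3, 0)]
bow_tie = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(count_crossings(bow_tie, bonds), count_crossings(square, bonds))  # 1 0
```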

Python — MCS Variants & Batch Operations

import smsd

# --- All MCS variants ---
mcs = smsd.mcs("c1ccccc1", "c1ccc(O)cc1")                     # Connected MCS (default)
mcs = smsd.mcs("c1ccccc1", "c1ccc(O)cc1", connected_only=False) # Disconnected MCS
mcs = smsd.mcs("c1ccccc1", "c1ccc(O)cc1", induced=True)         # Induced MCS
mcs = smsd.mcs("c1ccccc1", "c1ccc(O)cc1", maximize_bonds=True)  # Edge MCS (MCES)

# Find top-N distinct MCS solutions
all_mcs = smsd.find_all_mcs("c1ccccc1", "c1ccc(O)cc1", max_results=5)

# SMARTS-based MCS
mcs = smsd.find_mcs_smarts("[#6]~[#7]", "c1ccc(N)cc1")

# Scaffold MCS (Murcko framework)
scaffold = smsd.find_scaffold_mcs("CC(=O)Oc1ccccc1C(=O)O", "Oc1ccccc1C(=O)O")

# R-group decomposition
rgroups = smsd.decompose_r_groups("c1ccccc1", ["c1ccc(O)cc1", "c1ccc(N)cc1"])

# --- Substructure Search ---
hit = smsd.substructure_search("c1ccccc1", "c1ccc(O)cc1")
all_matches = smsd.find_all_substructures("c1ccccc1", "c1ccc(O)cc1", max_matches=10)

# SMARTS pattern matching
matches = smsd.smarts_search("[OH]", "c1ccc(O)cc1")

# --- Similarity & Screening ---
sim = smsd.tanimoto(
    smsd.circular_fingerprint("CCO", radius=2),
    smsd.circular_fingerprint("CCCO", radius=2)
)
dice = smsd.dice_similarity(
    smsd.ecfp_counts("CCO", radius=2),
    smsd.ecfp_counts("CCCO", radius=2)
)

# --- Chemistry Options ---
# Tautomer-aware with solvent and pH
mcs = smsd.mcs("CC(=O)C", "CC(O)=C",
               tautomer_aware=True, solvent="DMSO", pH=5.0)

# Loose bond matching (FMCS-style)
mcs = smsd.mcs("c1ccccc1", "C1CCCCC1", bond_order_mode="loose")

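# --- Canonical SMILES ---
smi = smsd.canonical_smiles("OC(=O)c1ccccc1")   # deterministic canonical form
g1 = smsd.parse_smiles("c1ccccc1")
g2 = smsd.parse_smiles("c1ccc(O)cc1")
mapping = smsd.find_mcs(g1, g2)                  # atom mapping, as in the depiction example
mcs_smi = smsd.mcs_to_smiles(g1, mapping)        # extract MCS as SMILES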

# --- CIP Stereo Assignment ---
g = smsd.parse_smiles("N[C@@H](C)C(=O)O")  # L-alanine
stereo = smsd.assign_rs(g)                   # {1: 'S'}
ez = smsd.assign_ez(smsd.parse_smiles("C/C=C/C"))  # E-2-butene

# --- Native MolGraph I/O ---
g = smsd.parse_smiles("c1ccccc1")
g = smsd.read_molfile("molecule.mol")
mol_block = smsd.write_mol_block(g)
v3000 = smsd.write_mol_block_v3000(g)
smsd.write_molfile(g, "molecule_out.mol", v3000=True)
smsd.export_sdf([g1, g2], "output.sdf")
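
For reference, the Tanimoto and Dice coefficients used in the similarity calls above have simple definitions. The sketch below is a plain-Python illustration, assuming a binary fingerprint is represented as a set of on-bit indices and a count fingerprint as a dict from feature to count. These representations and the names `tanimoto_bits` and `dice_counts` are illustrative only and may differ from what SMSD returns.

```python
def tanimoto_bits(a, b):
    """Shared on-bits divided by on-bits in either fingerprint."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def dice_counts(a, b):
    """2 * sum of per-feature minimum counts / (total counts in A and B)."""
    shared = sum(min(n, b.get(k, 0)) for k, n in a.items())
    total = sum(a.values()) + sum(b.values())
    return 2 * shared / total if total else 1.0

fp_a, fp_b = {1, 5, 9, 12}, {1, 5, 20}
print(tanimoto_bits(fp_a, fp_b))   # 2 shared / 5 in union = 0.4

cnt_a, cnt_b = {1: 2, 5: 1}, {1: 1, 5: 1, 20: 3}
print(dice_counts(cnt_a, cnt_b))   # 2 * 2 / (3 + 5) = 0.5
```

Dice on counts rewards repeated features, whereas Tanimoto on bits only records whether a feature is present.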

Publication-Quality Depiction (ACS 1996 Standard)

Zero-dependency SVG renderer that follows the ACS 1996 drawing standard, the same specification used by Nature, Science, JACS, and Springer journals. See Examples for the full usage guide.

import smsd

# Render any molecule as publication-quality SVG
svg = smsd.depict_svg("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
smsd.save_svg(svg, "aspirin.svg")

# MCS comparison — side-by-side with highlighted matching atoms
mol1 = smsd.parse_smiles("c1ccccc1")
mol2 = smsd.parse_smiles("c1ccc(O)cc1")
mapping = smsd.find_mcs(mol1, mol2)
svg = smsd.depict_pair(mol1, mol2, mapping)
smsd.save_svg(svg, "mcs_comparison.svg")

# Substructure highlighting
svg = smsd.depict_mapping(mol2, mapping)

# Custom styling (all ACS proportions auto-scale from bond_length)
svg = smsd.depict_svg("Cn1cnc2c1c(=O)n(c(=O)n2C)C",  # caffeine
    bond_length=50, width=600, height=400)

# Export to SDF file
mols = [smsd.parse_smiles(s) for s in ["CCO", "c1ccccc1", "CC(=O)O"]]
smsd.export_sdf(mols, "output.sdf")

Features: skeletal formula, Jmol/CPK element colors, asymmetric double bonds, wedge/dash stereo, H-count subscripts, charge superscripts, bond-to-label clipping, aromatic inner circles, atom map numbers.

C++ (Header-Only)

git clone https://github.com/asad/SMSD.git
# Add SMSD/cpp/include to your include path — no other dependencies needed
#include "smsd/smsd.hpp"

auto mol1 = smsd::parseSMILES(