Safe
A single model for all your molecular design tasks
Overview of SAFE
SAFE is a molecular representation designed for deep learning. It exploits a peculiarity of how SMILES strings are decoded (ring-closure digits can bond atoms that sit in different dot-separated components) to represent a molecule as a contiguous sequence of connected fragments. SAFE strings are valid SMILES strings and therefore preserve the same amount of information. Representing a molecule as an ordered sequence of connected fragments greatly simplifies the following tasks that often arise in molecular design:
- de novo design
- superstructure generation
- scaffold decoration
- motif extension
- linker generation
- scaffold morphing
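The minimal sketch below makes the fragment structure concrete. It splits a SAFE string on "." and pairs up ring-closure labels in order, as a SMILES parser does, and reports every label that joins two different fragments. fragment_links is a hypothetical helper written for illustration (it is not part of the safe package); it only recognizes bare digits and %nn labels and skips digits inside bracket atoms such as [13C]. The input is the ibuprofen SAFE string used in the Examples section.

```python
import re

# Bracket atoms ([13C], [nH], ...) are single tokens so their digits are ignored;
# "%nn" and bare digits are ring-closure labels; anything else is one character.
TOKEN = re.compile(r"\[[^\]]*\]|%\d{2}|.")


def fragment_links(safe_str: str) -> list[tuple[int, int, str]]:
    """Hypothetical helper: return (fragment_a, fragment_b, label) for every
    ring-closure bond joining two different '.'-separated fragments."""
    links = []
    open_labels = {}  # label -> index of the fragment where it was opened
    for frag_idx, fragment in enumerate(safe_str.split(".")):
        for tok in TOKEN.findall(fragment):
            if not (tok.isdigit() or tok.startswith("%")):
                continue  # atoms, bonds, branches: not ring-closure labels
            if tok in open_labels:
                # Second occurrence closes the bond (SMILES pairs labels in order).
                start = open_labels.pop(tok)
                if start != frag_idx:
                    links.append((start, frag_idx, tok))
            else:
                open_labels[tok] = frag_idx
    return links


safe_ibuprofen = "c12ccc3cc1.C3(C)C(=O)O.CC(C)C2"
print(fragment_links(safe_ibuprofen))  # [(0, 1, '3'), (0, 2, '2')]
```

For this string, fragment 0 (the benzene ring) is bonded to fragment 1 through label 3 and to fragment 2 through label 2, while label 1 only closes the ring inside fragment 0.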
Constructing a SAFE string requires a molecular fragmentation algorithm. By default, we use BRICS, but any other fragmentation algorithm can be used. The image below illustrates how a SAFE string is built. The resulting string is a valid SMILES string that can be read by datamol or RDKit.
<div align="center"> <img src="docs/assets/safe-construction.svg" width="100%"> </div>
News 🚀
💥 2024/01/15 💥
- @IanAWatson has a C++ implementation of SAFE in LillyMol that is quite fast and uses a custom fragmentation algorithm. Follow the installation instructions in the repository and check out the CLI documentation here: docs/Molecule_Tools/SAFE.md
Installation
You can install safe using pip:
pip install safe-mol
You can also use conda/mamba:
mamba install -c conda-forge safe-mol
NOTE (2024/11/22): Installation can cause issues such as GPUs not being detected (which can be checked with torch.cuda.is_available()) or segmentation faults caused by a mismatch between the installed CUDA version and the one supported by the NVIDIA driver. In that case, follow these steps:
Create a new environment using conda:
conda create -n env_safe python=3.12
conda activate env_safe
Check the NVIDIA driver version, and the highest CUDA version it supports, by running nvidia-smi (nvcc --version reports the version of the installed CUDA toolkit instead)
Install PyTorch built for a compatible CUDA version (see https://pytorch.org/get-started/locally/), then install safe-mol:
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -c conda-forge safe-mol
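To confirm whether the steps above worked, the sketch below prints what the installed PyTorch build sees. cuda_report is a hypothetical helper, not part of safe; it relies only on torch.__version__, torch.version.cuda, torch.cuda.is_available() and torch.cuda.device_count(), and it still runs (reporting torch_installed as False) when PyTorch is missing.

```python
import importlib.util


def cuda_report() -> dict:
    """Hypothetical helper: summarize the GPU setup seen by PyTorch, if installed."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}
    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version this PyTorch build was compiled against (None for CPU-only builds).
        "built_with_cuda": torch.version.cuda,
        # False on a GPU machine usually means a CPU-only build or a driver/CUDA mismatch.
        "cuda_available": torch.cuda.is_available(),
        "device_count": torch.cuda.device_count(),
    }


print(cuda_report())
```

If cuda_available is False on a GPU machine, compare built_with_cuda with the CUDA version shown by nvidia-smi; as a rule of thumb, the driver should support at least the CUDA version PyTorch was built against.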
Datasets and Models
| Type | Name | Info | Size | Comment |
| ---------------------- | --------------------- | ---------- | ------ | -------------------- |
| Model | datamol-io/safe-gpt | 87M params | 350 MB | Default model |
| Training Dataset | datamol-io/safe-gpt | 1.1B rows | 250 GB | Training dataset |
| Drug Benchmark Dataset | datamol-io/safe-drugs | 26 rows | 20 kB | Benchmarking dataset |
Usage
Please refer to the documentation, which contains tutorials for getting started with safe, detailed descriptions of the functions it provides, and an example of how to get started with SAFE-GPT.
API
We summarize some key functions provided by the safe package below.
| Function | Description |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| safe.encode | Translates a SMILES string into its corresponding SAFE string. |
| safe.decode | Translates a SAFE string into its corresponding SMILES string. The SAFE decoder simply augments RDKit's Chem.MolFromSmiles with an optional correction argument that takes care of missing hydrogen bonds. |
| safe.split | Tokenizes a SAFE string, e.g., to train a generative model. |
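The table does not spell out how safe.split tokenizes a string. As an illustration only, the sketch below applies a widely used atom-level SMILES regular expression to a SAFE string; safe.split's actual rules may differ, and tokenize is a hypothetical helper. Because SAFE strings are SMILES strings, the same tokenizer handles both, with "." kept as the token that separates fragments.

```python
import re

# Commonly used atom-level SMILES regex: bracket atoms, two-letter halogens,
# organic-subset atoms, bonds, branches, ring labels (%nn or one digit), and '.'.
SMILES_REGEX = re.compile(
    r"(\[[^\]]+]|Br?|Cl?|N|O|S|P|F|I|b|c|n|o|s|p|\(|\)|\.|=|#|-|\+|\\|/|:|~|@|\?|>|\*|\$|%[0-9]{2}|[0-9])"
)


def tokenize(safe_str: str) -> list[str]:
    """Hypothetical helper: split a SAFE (or SMILES) string into atom-level tokens."""
    tokens = SMILES_REGEX.findall(safe_str)
    # Round-trip check: every character must be covered by some token.
    if "".join(tokens) != safe_str:
        raise ValueError(f"untokenizable characters in {safe_str!r}")
    return tokens


print(tokenize("c12ccc3cc1.C3(C)C(=O)O.CC(C)C2"))
```

Joining the tokens reproduces the input exactly, which is what lets a generative model turn a sequence of generated tokens back into a SAFE string.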
Examples
Translation between SAFE and SMILES representations
```python
import safe

ibuprofen = "CC(Cc1ccc(cc1)C(C(=O)O)C)C"

# SMILES -> SAFE -> SMILES translation
try:
    ibuprofen_sf = safe.encode(ibuprofen)  # c12ccc3cc1.C3(C)C(=O)O.CC(C)C2
    ibuprofen_smi = safe.decode(ibuprofen_sf, canonical=True)  # CC(C)Cc1ccc(C(C)C(=O)O)cc1
    # Tokenize inside the try block so ibuprofen_sf is guaranteed to exist
    ibuprofen_tokens = list(safe.split(ibuprofen_sf))
except safe.EncoderError:
    pass
except safe.DecoderError:
    pass
```
Training/Finetuning a (new) model
A command-line interface is available for training a new model; run safe-train --help for the full list of options. You can also provide an existing checkpoint to continue training or to finetune on your own dataset.
For example:
safe-train --config <path to config> \
--model-path <path to model> \
--tokenizer <path to tokenizer> \
--dataset <path to dataset> \
--num_labels 9 \
--torch_compile True \
--optim "adamw_torch" \
--learning_rate 1e-5 \
--prop_loss_coeff 1e-3 \
--gradient_accumulation_steps 1 \
--output_dir "<path to outputdir>" \
--max_steps 5
References
If you use this repository, please cite the following related paper:
@misc{noutahi2023gotta,
title={Gotta be SAFE: A New Framework for Molecular Design},
author={Emmanuel Noutahi and Cristian Gabellini and Michael Craig and Jonathan S. C Lim and Prudencio Tossou},
year={2023},
eprint={2310.10773},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
License
The training dataset is licensed under CC BY 4.0. See DATA_LICENSE for details. This code base is licensed under the Apache-2.0 license. See LICENSE for details.
Note that the model weights of SAFE-GPT are exclusively licensed for research purposes (CC BY-NC 4.0).
Development lifecycle
Setup dev environment
mamba env create -n safe -f env.yml
mamba activate safe
pip install --no-deps -e .
Tests
You can run tests locally with:
pytest
