DeepDrugDomain
DeepDrugDomain: A versatile Python toolkit for streamlined preprocessing and accurate prediction of drug-target interactions and binding affinities, leveraging deep learning for advancing computational drug discovery.
DeepDrugDomain is a comprehensive Python toolkit aimed at simplifying and accelerating the process of drug-target interaction (DTI) and drug-target affinity (DTA) prediction using deep learning. With a flexible preprocessing pipeline and modular design, DeepDrugDomain supports innovative research and development in computational drug discovery.
Features
DeepDrugDomain provides the following core capabilities for researchers in computational drug discovery:
Extensive Preprocessing Capabilities
- Comprehensive Preparation Tools: Streamline your data preparation with our extensive suite of preprocessing tools.
- Support for Diverse Data: Cater to a wide array of data formats prevalent in drug discovery, ensuring compatibility and ease of integration.
Modular Design for Flexibility
- Customizable Components: Adapt the toolkit to meet your research needs with highly customizable components.
- Simplified Model Creation: Our modular design principle makes model creation and experimentation a straightforward process, saving time and reducing complexity.
Stateful Evaluation Metrics
- Consistent Performance Tracking: Integrated metrics provide a consistent framework for tracking the performance of models.
- Reproducibility and Accuracy: These metrics are integral in ensuring the reproducibility of results and the accuracy of predictions.
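What "stateful" means here can be illustrated with a minimal sketch: the metric accumulates predictions and labels batch by batch and computes the score only when asked. The class `RunningAccuracy` and its method names are hypothetical and chosen for illustration; they are not the `ddd.metrics` API.

```python
from typing import List, Sequence


class RunningAccuracy:
    """Hypothetical stateful metric: stores batch results, computes on demand."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold
        self._preds: List[int] = []
        self._labels: List[int] = []

    def update(self, scores: Sequence[float], labels: Sequence[int]) -> None:
        # Binarize the scores with the threshold and keep them for later.
        self._preds.extend(int(s >= self.threshold) for s in scores)
        self._labels.extend(int(y) for y in labels)

    def compute(self) -> float:
        # Accuracy over everything seen since the last reset.
        if not self._labels:
            raise ValueError("no samples accumulated")
        correct = sum(p == y for p, y in zip(self._preds, self._labels))
        return correct / len(self._labels)

    def reset(self) -> None:
        self._preds.clear()
        self._labels.clear()


metric = RunningAccuracy(threshold=0.5)
metric.update([0.9, 0.2], [1, 0])  # first batch: both predictions correct
metric.update([0.6, 0.4], [0, 0])  # second batch: one of two correct
epoch_accuracy = metric.compute()  # 3 correct out of 4 samples
```

Keeping the state inside the metric means the same object can be reset and reused at every epoch, so training and evaluation loops report scores in the same way.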
Custom Activation Functions
- Integration of Novel Functions: Introduce and integrate custom activation functions with ease to enhance your models.
- Boost to Model Adaptability: This feature allows models to be more adaptable and effective in handling complex drug discovery tasks.
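One common way to make custom activation functions pluggable is a name-based registry, sketched below. The names `ACTIVATIONS`, `register_activation`, and `get_activation` are assumptions made for this example, and the functions operate on plain floats to keep it self-contained; the toolkit's own mechanism may differ and would operate on tensors.

```python
import math
from typing import Callable, Dict

# Hypothetical registry that maps a name to an activation callable.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {}


def register_activation(name: str):
    """Decorator that stores a function in the registry under `name`."""
    def decorator(fn: Callable[[float], float]) -> Callable[[float], float]:
        if name in ACTIVATIONS:
            raise KeyError(f"activation '{name}' is already registered")
        ACTIVATIONS[name] = fn
        return fn
    return decorator


@register_activation("relu")
def relu(x: float) -> float:
    return max(0.0, x)


@register_activation("swish")
def swish(x: float) -> float:
    # swish(x) = x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def get_activation(name: str) -> Callable[[float], float]:
    """Look up an activation by the name it was registered under."""
    return ACTIVATIONS[name]
```

With a registry like this, a model configuration can refer to an activation by a string such as `"swish"`, and a new function becomes available as soon as it is decorated.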
Comprehensive Task Support
- Support for Core Tasks: DeepDrugDomain comes with built-in support for key tasks such as drug-target interaction (DTI) and drug-target affinity (DTA).
- Tailored for Drug Discovery: The toolkit is crafted to meet the unique challenges faced in drug discovery, providing tailored support that drives innovation and progress.
Facilitation of Model Augmentation
- Decorator Design: Augment models with new inputs through decorators, extending the range of data a model can use.
- Accuracy Improvement: Adding an input takes a single line of code, which makes it quick to test whether extra information improves the accuracy of an existing model.
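The decorator idea can be sketched in plain Python as follows. Everything in this block is hypothetical (`with_extra_input`, `BaseModel`, and the `assay_ph` input are invented for illustration), and feature vectors are plain lists rather than tensors; it shows the pattern, not the toolkit's actual decorator.

```python
from typing import Callable, List


def with_extra_input(name: str, encoder: Callable[[float], List[float]]):
    """Hypothetical class decorator that adds one keyword input to `forward`."""
    def decorate(cls):
        original_forward = cls.forward

        def forward(self, *inputs, **kwargs):
            extra = kwargs.pop(name)  # the newly added input
            features = original_forward(self, *inputs, **kwargs)
            # Append the encoded extra input to the model's own features.
            return features + encoder(extra)

        cls.forward = forward
        return cls
    return decorate


class BaseModel:
    def forward(self, drug: List[float], target: List[float]) -> List[float]:
        # Stand-in for a real model: concatenate drug and target features.
        return drug + target


# A single decorator line adds the new input; BaseModel itself is unchanged.
@with_extra_input("assay_ph", lambda ph: [ph / 14.0])
class AugmentedModel(BaseModel):
    pass


augmented_features = AugmentedModel().forward([1.0], [2.0], assay_ph=7.0)
```

Because the decorator patches only the subclass, the original model stays available for side-by-side comparison.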
Benchmarking
- Built-in Benchmarks: Leverage the pre-implemented benchmark models to gauge performance and validate outcomes.
- Customizability: Tailor the architecture of the implemented models to meet specific research requirements.
Expandability
- Continuous Development: Designed with the future in mind, DeepDrugDomain encourages and facilitates continuous expansion and incorporation of new features.
- Custom Instantiation: Choose to instantiate components in their default configuration or customize them for a more tailored experience.
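The choice between default and customized instantiation can be illustrated with a small configuration sketch. `EncoderConfig`, its fields, and `make_config` are hypothetical names used only for this example; they do not come from the toolkit.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical component configuration with default values."""
    hidden_dim: int = 128
    num_layers: int = 3
    dropout: float = 0.1


def make_config(**overrides) -> EncoderConfig:
    """Start from the defaults and replace only the fields that are given."""
    # dataclasses.replace raises TypeError for a field name that does not exist.
    return replace(EncoderConfig(), **overrides)


default_cfg = make_config()            # every field keeps its default
custom_cfg = make_config(dropout=0.3)  # one field overridden, the rest default
```

Starting from the defaults and overriding selectively keeps custom setups short and makes it clear which settings differ from the baseline.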
Ease of Use
- Simplified Drug Discovery: Remove the complexity from drug discovery tasks. DeepDrugDomain comes with a comprehensive suite of tools for easy preprocessing of any generic data.
- User-Friendly Model Training: Whether you're defining new models or utilizing pre-implemented ones, the process is straightforward and user-friendly, requiring minimal setup.
Together, these features are meant to cover current drug discovery workflows while leaving room for new tasks and models.
Installation
For now, you can set up the following environment for both usage and development:
conda create --name deepdrugdomain python=3.11
conda activate deepdrugdomain
pip install dgl -f https://data.dgl.ai/wheels/repo.html
conda install -c conda-forge rdkit
pip install git+https://github.com/yazdanimehdi/deepdrugdomain.git
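After running the commands above, a short script like the following can confirm that the main packages are visible to the interpreter. It is a sketch that uses only the standard library; the package list is an assumption (`torch` is included because the Quick Start uses it), and the helper name `check_installed` is invented for this example.

```python
from importlib.util import find_spec
from typing import Dict, Iterable

# Assumed list of top-level packages the toolkit relies on.
REQUIRED = ("torch", "dgl", "rdkit", "deepdrugdomain")


def check_installed(packages: Iterable[str] = REQUIRED) -> Dict[str, bool]:
    """Report which top-level packages can be found, without importing them."""
    return {name: find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    for name, ok in check_installed().items():
        print(f"{name:15s} {'ok' if ok else 'MISSING'}")
```

The check only confirms that a package can be located; it does not verify versions or GPU support.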
Quick Start
import torch
from torch.utils.data import DataLoader

import deepdrugdomain as ddd
# The two import paths below are assumed from the repository layout; adjust them if they differ.
from deepdrugdomain.models.factory import ModelFactory
from deepdrugdomain.optimizers.factory import OptimizerFactory

seed = 42  # any fixed integer; makes the random split reproducible

# Use the GPU if available, otherwise the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Create a pre-implemented model and its default preprocessing pipeline
model = ModelFactory.create("attentionsitedti")
preprocesses = ddd.data.PreprocessingList(model.default_preprocess(
    "SMILES", "pdb_id", "Label"))

# Load the dataset and split it into train / validation / test subsets
dataset = ddd.data.DatasetFactory.create(
    "human", file_paths="data/human/", preprocesses=preprocesses)
datasets = dataset(split_method="random_split",
                   frac=[0.8, 0.1, 0.1], seed=seed, sample=0.1)

# The model provides its own collate function for batching
collate_fn = model.collate
data_loader_train = DataLoader(
    datasets[0], batch_size=64, shuffle=True, num_workers=0, pin_memory=True, drop_last=True, collate_fn=collate_fn)
data_loader_val = DataLoader(datasets[1], drop_last=False, batch_size=32,
                             num_workers=4, pin_memory=False, collate_fn=collate_fn)
data_loader_test = DataLoader(datasets[2], drop_last=False, batch_size=32,
                              num_workers=4, pin_memory=False, collate_fn=collate_fn)

criterion = torch.nn.BCELoss()
optimizer = OptimizerFactory.create(
    "adam", model.parameters(), lr=1e-3, weight_decay=0.0)
scheduler = None
model.to(device)

train_evaluator = ddd.metrics.Evaluator(["accuracy_score"], threshold=0.5)
test_evaluator = ddd.metrics.Evaluator(
    ["accuracy_score", "f1_score", "auc", "precision_score", "recall_score"], threshold=0.5)

epochs = 3000
accum_iter = 1

# Baseline validation metrics before any training
print(model.evaluate(data_loader_val, device,
                     criterion, evaluator=test_evaluator))

for epoch in range(epochs):
    print(f"Epoch {epoch}:")
    model.train_one_epoch(data_loader_train, device, criterion,
                          optimizer, num_epochs=200, scheduler=scheduler, evaluator=train_evaluator, grad_accum_steps=accum_iter)
    # Validate after every epoch
    print(model.evaluate(data_loader_val, device,
                         criterion, evaluator=test_evaluator))

# Final evaluation on the held-out test set
print(model.evaluate(data_loader_test, device,
                     criterion, evaluator=test_evaluator))
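To make the `split_method="random_split"` call with `frac=[0.8, 0.1, 0.1]` and a fixed `seed` more concrete, here is a minimal sketch of a seeded random split over sample indices. It is an illustration of the general technique under stated assumptions, not the toolkit's implementation; the function name `random_split_indices` is hypothetical.

```python
import random
from typing import List, Sequence


def random_split_indices(n: int, frac: Sequence[float], seed: int) -> List[List[int]]:
    """Shuffle range(n) with a fixed seed and cut it into len(frac) parts."""
    if abs(sum(frac) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # local RNG: global state is untouched
    splits, start = [], 0
    for i, f in enumerate(frac):
        # The last split takes the remainder so that no sample is dropped.
        end = n if i == len(frac) - 1 else min(n, start + int(round(f * n)))
        splits.append(indices[start:end])
        start = end
    return splits


train_idx, val_idx, test_idx = random_split_indices(1000, [0.8, 0.1, 0.1], seed=42)
```

Using the same seed reproduces the same three subsets, which is what makes results comparable across runs.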
Examples
The example folder contains a collection of scripts and notebooks demonstrating various capabilities of DeepDrugDomain. Below is an overview of what each example covers:
Training Different Models
- attentionsitedti.ipynb: A Jupyter notebook that briefly explains how to train AttentionSiteDTI with custom configurations and how to modify the model.
Other Functionalities
Supported Preprocessings
The following table lists the preprocessing methods supported by the package, detailing the data conversion, settings options, and the models that use them:
Ligand Preprocessing Methods
| Method | Converts From | Converts To | Settings Options | Used in Models |
|------------------------|-------------------|--------------------|------------------|----------------|
| smiles_to_encoding | SMILES | Encoding Tensor | one_hot: bool, embedding_dim: Optional[int], max_sequence_length: Optional[int], replacement_dict: Dict[str, str], token_regex: Optional[str], from_set: Optional[Dict[str, int]] | DrugVQA, AttentionDTA |
| smile_to_graph | SMILES | Graph | node_featurizer: Callable, edge_featurizer: Optional[Callable], consider_hydrogen: bool, fragment: bool, hops: int | AMMVF, AttentionSiteDTI, FragXsiteDTI, CSDTI |
| smile_to_fingerprint | SMILES | Fingerprint | method: str, Refer to Supported Fingerprinting Methods table for detailed settings. | AMMVF |
For detailed information on fingerprinting methods, please see the Supported Fingerprinting Methods section.
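As an illustration of what a SMILES-to-encoding step with `replacement_dict` and `max_sequence_length` might do, the sketch below maps a SMILES string to a fixed-length list of integer ids. It assumes character-level tokenization, and the function `encode_smiles` and its `vocab` argument are hypothetical; the package's `smiles_to_encoding` may tokenize and encode differently.

```python
from typing import Dict, List, Optional


def encode_smiles(
    smiles: str,
    vocab: Dict[str, int],
    replacement_dict: Optional[Dict[str, str]] = None,
    max_sequence_length: Optional[int] = None,
) -> List[int]:
    """Hypothetical character-level encoder; id 0 means padding or unknown."""
    # Collapse multi-character atoms (e.g. "Cl") into single symbols first.
    for old, new in (replacement_dict or {}).items():
        smiles = smiles.replace(old, new)
    ids = [vocab.get(ch, 0) for ch in smiles]
    if max_sequence_length is not None:
        ids = ids[:max_sequence_length]                  # truncate long inputs
        ids += [0] * (max_sequence_length - len(ids))    # pad short inputs
    return ids


vocab = {"C": 1, "O": 2, "L": 3, "(": 4, ")": 5, "=": 6}
# Acetyl chloride, with "Cl" replaced by the single symbol "L"
encoded = encode_smiles("CC(=O)Cl", vocab, {"Cl": "L"}, max_sequence_length=10)
```

Padding every sequence to the same length lets the encoded molecules be stacked into a single batch tensor.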
Supported Fingerprinting Methods
| Method Name | Description | Settings Options |
|-----------------|-------------|------------------|
| RDKit | Converts SMILES to RDKit fingerprints, capturing molecular structure information. | radius: Optional[int], nBits: Optional[int] |
| Morgan | Generates circular fingerprints, representing the environment of each atom in a molecule. | radius: Optional[int], nBits: Optional[int] |
| Daylight | Traditional method to encode molecular features, focusing on specific substructure patterns. | nBits: Optional[int] |
| ErG | Extended reduced graph-based approach, emphasizing molecular topology. | nBits: Optional[int], atom_dict: Optional[AtomDictType], bond_dict: Optional[BondDictType] |
| RDKit2D | Two-dimensional variant of RDKit, detailing planar molecular structures. | nBits: Optional[int], atom_dict: Optional[AtomDictType], bond_dict: Optional[BondDictType] |
