[Competition] Multi-Omics-Based Drug Sensitivity Estimation
This project was completed as Team MOUM, together with my colleagues, for the 6th YAICON.
💊 Multi-Omics-Based Drug Sensitivity Estimation
6th YAICON — Spring 2025 · Second Prize Code Link
📌 Overview
Accurately predicting how a cancer cell line responds to a drug (IC-50) remains an open challenge: the outcome depends not only on the drug’s chemistry but also on the cell’s intricate molecular profile.
We present an end-to-end deep-learning pipeline that fuses three omics layers (GEP, MUT, CNV) with advanced drug-embedding models (ChemBERTa and a molecular-graph GNN) and a bi-directional cross-attention mechanism. Our approach improves upon the 2025 paper “Anticancer drug response prediction integrating multi-omics pathway-based difference features and multiple deep-learning techniques.”
🌱 Why we built this
| Baseline limitation | Our upgrade |
|---|---|
| Drug representation lacks structural cues (only a SMILES RNN) | Two interchangeable drug encoders: ChemBERTa (language-style SMILES embedding) and BGD (graph transformer on molecular graphs) |
| Shallow “context attention” cannot model the complex drug-omics interplay | Deep, bi-directional cross-attention (drug ↔ each omics layer), giving 6 interaction maps |
🔬 Data
| Source | Entities | Notes |
|---|---|---|
| CCLE | 688 cell lines | GEP (log₂(TPM + 1)), MUT (0/1/2), CNV (log₂ discrete) |
| GDSC2 | 233 drugs | Matched IC-50 ground truth |
| MSigDB | 619 KEGG pathways | Used to derive pathway-difference statistics (Mann-Whitney U / χ²-G) |
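
To make the pathway-difference idea concrete, here is a minimal sketch of the Mann-Whitney U variant for the continuous GEP layer. It assumes the expression matrix is a pandas DataFrame (cell lines × genes) and that the 619 KEGG gene sets have already been parsed from the MSigDB GMT file; names like `gep_df` and `pathways` are illustrative placeholders, not the project’s actual code.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

def pathway_difference_features(gep_df: pd.DataFrame,
                                pathways: dict) -> pd.DataFrame:
    """Score, for each cell line x pathway, how strongly in-pathway gene
    expression separates from out-of-pathway expression (Mann-Whitney U).

    gep_df   : cell lines x genes matrix of log2(TPM + 1) values
    pathways : pathway name -> list of member gene symbols
    Returns a cell lines x pathways matrix (one 1 x 619 row per cell line
    when the 619 KEGG pathways are used).
    """
    all_genes = set(gep_df.columns)
    features = pd.DataFrame(0.0, index=gep_df.index, columns=list(pathways))

    for name, members in pathways.items():
        in_genes = sorted(all_genes & set(members))
        out_genes = sorted(all_genes - set(members))
        if not in_genes:
            continue
        for cell_line in gep_df.index:
            in_vals = gep_df.loc[cell_line, in_genes].to_numpy()
            out_vals = gep_df.loc[cell_line, out_genes].to_numpy()
            u, _ = mannwhitneyu(in_vals, out_vals, alternative="two-sided")
            # Rescale U to a signed effect size in [-1, 1] (rank-biserial r).
            features.loc[cell_line, name] = 2.0 * u / (len(in_vals) * len(out_vals)) - 1.0
    return features
```

For the discrete MUT (0/1/2) and CNV layers, the χ²/G-type statistic mentioned in the table would replace the rank test, since a rank statistic carries little information on two- or three-valued calls.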
🛠 Methodology
1. Omics pathway features
   For every cell line × pathway, compute the statistical separation between “in-pathway” and “out-of-pathway” genes → 3 feature matrices of size 1 × 619 (GEP, MUT, CNV); see the sketch after the Data table above.

2. Drug embeddings
   Choose one encoder at training time:

   | Encoder | Key idea | Output shape |
   |---|---|---|
   | ChemBERTa | Tokenise SMILES, pad to 256, take the final hidden CLS state | 1 × 384 |
   | BGD | Graph transformer over atoms/bonds + DeepChem node features | 1 × 256 |

3. Cross-attention block
   - Drug (Q) ↔ Omics (K, V) for each omics type, in two directions (drug→omics, omics→drug) → 6 attention layers in total.
   - Concatenate the pooled outputs → stacked MLP → IC-50 regression (a PyTorch sketch follows the diagram caption below).
Implementation diagram: original (left) vs. modified cross-attention (right).
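
As a rough illustration of step 3, the sketch below wires up the bi-directional drug ↔ omics cross-attention in plain PyTorch. The dimensions (a 384-d drug vector, three 619-d pathway vectors), the shared hidden size, and the single-token pooling are assumptions made for the example; the actual model in the repository may differ.

```python
import torch
import torch.nn as nn

class BiDirectionalCrossAttention(nn.Module):
    """Drug <-> omics cross-attention: 3 omics layers x 2 directions = 6 maps."""

    def __init__(self, drug_dim=384, omics_dim=619, hidden=256, heads=4, n_omics=3):
        super().__init__()
        self.drug_proj = nn.Linear(drug_dim, hidden)
        self.omics_proj = nn.ModuleList(
            [nn.Linear(omics_dim, hidden) for _ in range(n_omics)])
        # One attention module per (omics type, direction) pair -> 6 in total.
        self.drug_to_omics = nn.ModuleList(
            [nn.MultiheadAttention(hidden, heads, batch_first=True) for _ in range(n_omics)])
        self.omics_to_drug = nn.ModuleList(
            [nn.MultiheadAttention(hidden, heads, batch_first=True) for _ in range(n_omics)])
        self.regressor = nn.Sequential(
            nn.Linear(hidden * 2 * n_omics, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, 1))  # scalar IC-50 prediction

    def forward(self, drug_emb, omics_list):
        # drug_emb   : (batch, drug_dim) one ChemBERTa/BGD vector per drug
        # omics_list : list of n_omics tensors, each (batch, omics_dim)
        drug = self.drug_proj(drug_emb).unsqueeze(1)        # (batch, 1, hidden)
        pooled = []
        for i, omics in enumerate(omics_list):
            om = self.omics_proj[i](omics).unsqueeze(1)     # (batch, 1, hidden)
            d2o, _ = self.drug_to_omics[i](drug, om, om)    # drug attends to omics
            o2d, _ = self.omics_to_drug[i](om, drug, drug)  # omics attends to drug
            pooled += [d2o.squeeze(1), o2d.squeeze(1)]
        return self.regressor(torch.cat(pooled, dim=-1))    # (batch, 1)

# Shape check with random inputs: 8 drug-cell-line pairs.
model = BiDirectionalCrossAttention()
ic50 = model(torch.randn(8, 384), [torch.randn(8, 619) for _ in range(3)])  # (8, 1)
```

With a single token on each side the attention reduces to a learned gating of one vector by another; the benefit of cross-attention grows when either side is treated as a sequence (e.g. per-pathway or per-atom tokens) rather than a single pooled vector.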
📂 Code & Repos
| Repository | Description |
|---|---|
| Drug-Sensitivity-Prediction-Pipeline | Main training pipeline, model zoo, experiment scripts |
| DGL-Life-sci | Custom extensions for graph-based drug encoders |
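
For a feel of what the graph-based drug encoder consumes, the sketch below builds a molecular graph from a SMILES string with DGL-LifeSci’s stock featurizer and runs a tiny GCN readout. It is a stand-in under assumed settings (canonical 74-d atom features, a plain two-layer GCN instead of the BGD graph transformer), not code from the DGL-Life-sci fork.

```python
import dgl
import torch
import torch.nn as nn
from dgl.nn import GraphConv
from dgllife.utils import smiles_to_bigraph, CanonicalAtomFeaturizer

class TinyGraphDrugEncoder(nn.Module):
    """Toy stand-in for the graph drug encoder: GCN + mean readout -> 1 x 256."""

    def __init__(self, in_feats=74, hidden=128, out_dim=256):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden, activation=torch.relu)
        self.conv2 = GraphConv(hidden, out_dim)

    def forward(self, g):
        h = g.ndata["h"]                    # atom features from the featurizer
        h = self.conv1(g, h)
        h = self.conv2(g, h)
        g.ndata["h_out"] = h
        return dgl.mean_nodes(g, "h_out")   # (1, out_dim) graph-level embedding

# Example: embed one drug from its SMILES string.
atom_featurizer = CanonicalAtomFeaturizer(atom_data_field="h")  # 74-d atom features
graph = smiles_to_bigraph("CC(=O)Oc1ccccc1C(=O)O",              # aspirin
                          node_featurizer=atom_featurizer)
graph = dgl.add_self_loop(graph)            # GraphConv needs non-zero in-degrees
drug_vec = TinyGraphDrugEncoder()(graph)    # torch.Size([1, 256])
```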
Model zoo snapshots
1. ChemBERTa Drug Embedding
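
A minimal sketch of the ChemBERTa-style SMILES embedding with Hugging Face `transformers`. The checkpoint name `DeepChem/ChemBERTa-77M-MLM` (a 384-dimensional model) is an assumption made for the example, not necessarily the weights used in the pipeline.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; any ChemBERTa-style masked-LM over SMILES works the same way.
CHECKPOINT = "DeepChem/ChemBERTa-77M-MLM"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT)
model.eval()

def embed_smiles(smiles: str) -> torch.Tensor:
    """Tokenise a SMILES string (padded/truncated to 256 tokens) and return
    the final-layer CLS embedding as the drug vector."""
    tokens = tokenizer(smiles, padding="max_length", truncation=True,
                       max_length=256, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**tokens).last_hidden_state   # (1, 256, hidden_size)
    return hidden[:, 0, :]                           # CLS token -> (1, 384) here

drug_vec = embed_smiles("CC(=O)Oc1ccccc1C(=O)O")     # aspirin
```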
📊 Results
Figure 1. Drug embedding comparison (Original vs. Modified attention)
Figure 2. Cross-attention variant performance
Figure 3. Pearson r on cell-blinded split (scatter)
Key takeaway: improvement over the baseline when switching to the ChemBERTa/BGD drug encoders + cross-attention. Full metrics are in `/results/`.
👥 Contributors & Acknowledgments
With gratitude to @yumin-c, I’m distilling this work into a focused, application-driven research project in collaboration with several colleagues, including my original MOUM team members: @bgduck33 and @whdsbwn.
You can find the full names of all MOUM team members on GitHub: Code Link.