[Competition] Multi-Omics-Based Drug Sensitivity Estimation


This project was completed as Team MOUM, in collaboration with my colleagues, for the 6th YAICON.

💊 Multi-Omics-Based Drug Sensitivity Estimation

6th YAICON — Spring 2025 · Second Prize · Code Link


📌 Overview

Accurately predicting how a cancer cell line responds to a drug (IC-50) remains an open challenge: the outcome depends not only on the drug’s chemistry but also on the cell’s intricate molecular profile.
We present an end-to-end deep-learning pipeline that fuses three omics layers (GEP, MUT, CNV) with advanced drug-embedding models (ChemBERTa & graph-based GNN) and a bi-directional cross-attention mechanism. Our approach improves upon the 2025 paper “Anticancer drug response prediction integrating multi-omics pathway-based difference features and multiple deep-learning techniques.”


🌱 Why we built this

| Baseline limitation | Our upgrade |
| --- | --- |
| Drug representation lacks structural cues (SMILES RNN only) | Two interchangeable drug encoders: ChemBERTa (language-style SMILES embedding) and BGD (graph transformer on molecular graphs) |
| Shallow “context attention” can’t model the complex drug–omics interplay | Deep, bi-directional cross-attention (drug ↔ each omics), giving 6 interaction maps |

🔬 Data

| Source | Entities | Notes |
| --- | --- | --- |
| CCLE | 688 cell lines | GEP (log₂(TPM + 1)), MUT (0/1/2), CNV (log₂, discretised) |
| GDSC2 | 233 drugs | Matched IC-50 ground truth |
| MSigDB | 619 KEGG pathways | Used to derive pathway-difference statistics (Mann-Whitney U / χ²-G) |

🛠 Methodology

  1. Omics pathway features
    For every cell line × pathway, compute the statistical separation between “in-pathway” and “out-pathway” genes → three 1 × 619 feature vectors per cell line (GEP, MUT, CNV); a minimal sketch follows this list.

  2. Drug embeddings
    Choose one encoder at training time (an embedding sketch also follows this list):

    | Encoder | Key idea | Output shape |
    | --- | --- | --- |
    | ChemBERTa | Tokenise SMILES, pad to 256 tokens, take the final-layer CLS hidden state | 1 × 384 |
    | BGD | Graph transformer over atoms/bonds + DeepChem node features | 1 × 256 |
  3. Cross-attention block
    • Drug (Q) ↔ omics (K, V) for each omics type, in two directions (drug→omics, omics→drug) → 6 attention layers in total.
    • Concatenate the pooled outputs → stacked MLP → IC-50 regression (see the sketch after the diagram below).
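
Below is a minimal sketch of the step 1 pathway-difference features for a continuous omics layer such as GEP, assuming per-gene values indexed by gene symbol and the 619 KEGG gene sets from MSigDB; the `-log10(p)` transform and the function name are illustrative, not the exact statistic used in the pipeline.

```python
# Sketch: pathway-difference features for one cell line (continuous omics, e.g. GEP).
# `expression` maps gene symbol -> log2(TPM + 1); `pathways` maps pathway name -> gene set.
import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu

def pathway_difference_features(expression: pd.Series, pathways: dict) -> np.ndarray:
    """Score each pathway by how strongly its member genes differ from the
    remaining genes (Mann-Whitney U test, returned here as -log10 p-value)."""
    scores = []
    for genes in pathways.values():
        mask = expression.index.isin(genes)
        in_path, out_path = expression[mask], expression[~mask]
        if len(in_path) < 2 or len(out_path) < 2:
            scores.append(0.0)          # pathway not represented in this gene panel
            continue
        _, p = mannwhitneyu(in_path, out_path, alternative="two-sided")
        scores.append(-np.log10(max(p, 1e-300)))
    return np.asarray(scores)           # (619,) with the KEGG collection -> 1 × 619
```

For the discrete MUT/CNV layers, the same loop would swap the U test for a χ²/G-test on in-pathway vs. out-pathway category counts.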
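
And a minimal sketch of the step 2 ChemBERTa path, using the public DeepChem/ChemBERTa-77M-MLM checkpoint as an assumed stand-in (any ChemBERTa variant with a 384-dimensional hidden state matches the 1 × 384 shape in the table); padding length and CLS pooling follow the table above.

```python
# Sketch: SMILES -> 1 × 384 drug embedding via a ChemBERTa-style encoder.
import torch
from transformers import AutoModel, AutoTokenizer

CHECKPOINT = "DeepChem/ChemBERTa-77M-MLM"   # assumed checkpoint; hidden size 384
tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
encoder = AutoModel.from_pretrained(CHECKPOINT).eval()

def embed_smiles(smiles: str) -> torch.Tensor:
    """Tokenise a SMILES string (padded/truncated to 256 tokens) and
    return the final-layer CLS hidden state."""
    tokens = tokenizer(smiles, padding="max_length", truncation=True,
                       max_length=256, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**tokens).last_hidden_state   # (1, 256, 384)
    return hidden[:, 0, :]                             # CLS token -> (1, 384)

print(embed_smiles("CC(=O)OC1=CC=CC=C1C(=O)O").shape)  # aspirin -> torch.Size([1, 384])
```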

Implementation diagram: original (left) vs. modified cross-attention (right).
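
A minimal PyTorch sketch of the step 3 block, assuming the drug tokens and the three pathway-feature sequences have already been projected to a shared dimension `d`; the head sizes, mean pooling, and the hypothetical class name `BiCrossAttention` are illustrative, not the exact configuration in the repo.

```python
# Sketch: bi-directional cross-attention (drug ↔ each omics) -> 6 attention maps -> MLP -> IC-50.
import torch
import torch.nn as nn

class BiCrossAttention(nn.Module):
    def __init__(self, d: int = 256, heads: int = 4, n_omics: int = 3):
        super().__init__()
        # Two directions per omics layer (drug→omics, omics→drug) -> 6 attention modules.
        self.drug_to_omics = nn.ModuleList(
            [nn.MultiheadAttention(d, heads, batch_first=True) for _ in range(n_omics)])
        self.omics_to_drug = nn.ModuleList(
            [nn.MultiheadAttention(d, heads, batch_first=True) for _ in range(n_omics)])
        self.head = nn.Sequential(                      # stacked MLP -> scalar IC-50
            nn.Linear(2 * n_omics * d, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, drug, omics):
        # drug:  (B, L_drug, d) token-level drug embedding (ChemBERTa or BGD, projected to d)
        # omics: list of 3 tensors, each (B, L_path, d) pathway-level features (GEP, MUT, CNV)
        pooled = []
        for i, om in enumerate(omics):
            d2o, _ = self.drug_to_omics[i](query=drug, key=om, value=om)    # drug attends to omics
            o2d, _ = self.omics_to_drug[i](query=om, key=drug, value=drug)  # omics attends to drug
            pooled += [d2o.mean(dim=1), o2d.mean(dim=1)]                    # pool over sequence
        return self.head(torch.cat(pooled, dim=-1))                         # (B, 1) predicted IC-50

# Dummy shapes: batch of 8, 16 drug tokens, 619 pathway tokens per omics layer
model = BiCrossAttention()
out = model(torch.randn(8, 16, 256), [torch.randn(8, 619, 256) for _ in range(3)])
print(out.shape)  # torch.Size([8, 1])
```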


📂 Code & Repos

| Repository | Description |
| --- | --- |
| Drug-Sensitivity-Prediction-Pipeline | Main training pipeline, model zoo, experiment scripts |
| DGL-Life-sci | Custom extensions for graph-based drug encoders |

Model zoo snapshots:

  1. ChemBERTa Drug Embedding
  2. Graph-Transformer Drug Embedding


📊 Results

Figure 1. Drug embedding comparison (original vs. modified attention)

Figure 2. Cross-attention variant performance

Figure 3. Pearson r on cell-blinded split (scatter)

Key takeaway: switching to the ChemBERTa/BGD drug encoders with bi-directional cross-attention improves on the baseline. Full metrics are in /results/.


👥 Contributors & Acknowledgments

With gratitude to @yumin-c, I’m distilling this work into a focused, application-driven research project in collaboration with several colleagues, including my original MOUM team members: @bgduck33 and @whdsbwn.

You can find the full names of all MOUM team members on GitHub: Code Link.

