[Competition] Multi-Omics-Based Drug Sensitivity Estimation
This project was completed as Team MOUM, together with my colleagues, for the 6th YAICON.
💊 Multi-Omics-Based Drug Sensitivity Estimation
6th YAICON — Spring 2025 · Second Prize Code Link
📌 Overview
Accurately predicting how a cancer cell line responds to a drug (IC-50) remains an open challenge: the outcome depends not only on the drug’s chemistry but also on the cell’s intricate molecular profile.
We present an end-to-end deep-learning pipeline that fuses three omics layers (GEP, MUT, CNV) with advanced drug-embedding models (ChemBERTa and a molecular-graph GNN) and a bi-directional cross-attention mechanism. Our approach improves upon the 2025 paper “Anticancer drug response prediction integrating multi-omics pathway-based difference features and multiple deep-learning techniques.”
🌱 Why we built this
| Baseline limitation | Our upgrade |
|---|---|
| Drug representation lacks structural cues (only a SMILES RNN) | Two interchangeable drug encoders: ChemBERTa (language-style SMILES embedding) and BGD (graph transformer on molecular graphs) |
| Shallow “context attention” cannot model the complex drug-omics interplay | Deep, bi-directional cross-attention (drug ↔ each omics layer), giving 6 interaction maps |
🔬 Data
| Source | Entities | Notes |
|---|---|---|
| CCLE | 688 cell lines | GEP (log₂(TPM + 1)), MUT (0/1/2), CNV (log₂ discrete) |
| GDSC2 | 233 drugs | Matched IC-50 ground truth |
| MSigDB | 619 KEGG pathways | Used to derive pathway-difference statistics (Mann-Whitney U / χ²-G) |
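
To make the pathway-difference idea concrete, here is a minimal sketch of the Mann-Whitney U variant for the continuous GEP layer. It assumes the expression matrix is a pandas DataFrame (cell lines × genes) and that the 619 KEGG gene sets have already been parsed from the MSigDB GMT file; names like `gep_df` and `pathways` are illustrative placeholders, not the project’s actual code.

```python
import pandas as pd
from scipy.stats import mannwhitneyu

def pathway_difference_features(gep_df: pd.DataFrame,
                                pathways: dict) -> pd.DataFrame:
    """Score, for each cell line x pathway, how strongly in-pathway gene
    expression separates from out-of-pathway expression (Mann-Whitney U).

    gep_df   : cell lines x genes matrix of log2(TPM + 1) values
    pathways : pathway name -> list of member gene symbols
    Returns a cell lines x pathways matrix (one 1 x 619 row per cell line
    when the 619 KEGG pathways are used).
    """
    all_genes = set(gep_df.columns)
    features = pd.DataFrame(0.0, index=gep_df.index, columns=list(pathways))

    for name, members in pathways.items():
        in_genes = sorted(all_genes & set(members))
        out_genes = sorted(all_genes - set(members))
        if not in_genes:
            continue
        for cell_line in gep_df.index:
            in_vals = gep_df.loc[cell_line, in_genes].to_numpy()
            out_vals = gep_df.loc[cell_line, out_genes].to_numpy()
            u, _ = mannwhitneyu(in_vals, out_vals, alternative="two-sided")
            # Rescale U to a signed effect size in [-1, 1] (rank-biserial r).
            features.loc[cell_line, name] = 2.0 * u / (len(in_vals) * len(out_vals)) - 1.0
    return features
```

For the discrete MUT (0/1/2) and CNV layers, the χ²/G-type statistic mentioned in the table would replace the rank test, since a rank statistic carries little information on two- or three-valued calls.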
🛠 Methodology
1. Omics pathway features
   For every cell line × pathway, compute the statistical separation between “in-pathway” and “out-of-pathway” genes → 3 feature matrices of size 1 × 619 (GEP, MUT, CNV); see the sketch after the Data table above.

2. Drug embeddings
   Choose one encoder at training time:

   | Encoder | Key idea | Output shape |
   |---|---|---|
   | ChemBERTa | Tokenise SMILES, pad to 256, take the final hidden CLS state | 1 × 384 |
   | BGD | Graph transformer over atoms/bonds + DeepChem node features | 1 × 256 |

3. Cross-attention block
   - Drug (Q) ↔ Omics (K, V) for each omics type, in two directions (drug→omics, omics→drug) → 6 attention layers in total.
   - Concatenate the pooled outputs → stacked MLP → IC-50 regression (a PyTorch sketch follows the diagram caption below).
Implementation diagram: original (left) vs. modified cross-attention (right).
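
As a rough illustration of step 3, the sketch below wires up the bi-directional drug ↔ omics cross-attention in plain PyTorch. The dimensions (a 384-d drug vector, three 619-d pathway vectors), the shared hidden size, and the single-token pooling are assumptions made for the example; the actual model in the repository may differ.

```python
import torch
import torch.nn as nn

class BiDirectionalCrossAttention(nn.Module):
    """Drug <-> omics cross-attention: 3 omics layers x 2 directions = 6 maps."""

    def __init__(self, drug_dim=384, omics_dim=619, hidden=256, heads=4, n_omics=3):
        super().__init__()
        self.drug_proj = nn.Linear(drug_dim, hidden)
        self.omics_proj = nn.ModuleList(
            [nn.Linear(omics_dim, hidden) for _ in range(n_omics)])
        # One attention module per (omics type, direction) pair -> 6 in total.
        self.drug_to_omics = nn.ModuleList(
            [nn.MultiheadAttention(hidden, heads, batch_first=True) for _ in range(n_omics)])
        self.omics_to_drug = nn.ModuleList(
            [nn.MultiheadAttention(hidden, heads, batch_first=True) for _ in range(n_omics)])
        self.regressor = nn.Sequential(
            nn.Linear(hidden * 2 * n_omics, 512), nn.ReLU(),
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, 1))  # scalar IC-50 prediction

    def forward(self, drug_emb, omics_list):
        # drug_emb   : (batch, drug_dim) one ChemBERTa/BGD vector per drug
        # omics_list : list of n_omics tensors, each (batch, omics_dim)
        drug = self.drug_proj(drug_emb).unsqueeze(1)        # (batch, 1, hidden)
        pooled = []
        for i, omics in enumerate(omics_list):
            om = self.omics_proj[i](omics).unsqueeze(1)     # (batch, 1, hidden)
            d2o, _ = self.drug_to_omics[i](drug, om, om)    # drug attends to omics
            o2d, _ = self.omics_to_drug[i](om, drug, drug)  # omics attends to drug
            pooled += [d2o.squeeze(1), o2d.squeeze(1)]
        return self.regressor(torch.cat(pooled, dim=-1))    # (batch, 1)

# Shape check with random inputs: 8 drug-cell-line pairs.
model = BiDirectionalCrossAttention()
ic50 = model(torch.randn(8, 384), [torch.randn(8, 619) for _ in range(3)])  # (8, 1)
```

With a single token on each side the attention reduces to a learned gating of one vector by another; the benefit of cross-attention grows when either side is treated as a sequence (e.g. per-pathway or per-atom tokens) rather than a single pooled vector.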
📂 Code & Repos
| Repository | Description |
|---|---|
| Drug-Sensitivity-Prediction-Pipeline | Main training pipeline, model zoo, experiment scripts |
| DGL-Life-sci | Custom extensions for graph-based drug encoders |
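
For a feel of what the graph-based drug encoder consumes, the sketch below builds a molecular graph from a SMILES string with DGL-LifeSci’s stock featurizer and runs a tiny GCN readout. It is a stand-in under assumed settings (canonical 74-d atom features, a plain two-layer GCN instead of the BGD graph transformer), not code from the DGL-Life-sci fork.

```python
import dgl
import torch
import torch.nn as nn
from dgl.nn import GraphConv
from dgllife.utils import smiles_to_bigraph, CanonicalAtomFeaturizer

class TinyGraphDrugEncoder(nn.Module):
    """Toy stand-in for the graph drug encoder: GCN + mean readout -> 1 x 256."""

    def __init__(self, in_feats=74, hidden=128, out_dim=256):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden, activation=torch.relu)
        self.conv2 = GraphConv(hidden, out_dim)

    def forward(self, g):
        h = g.ndata["h"]                    # atom features from the featurizer
        h = self.conv1(g, h)
        h = self.conv2(g, h)
        g.ndata["h_out"] = h
        return dgl.mean_nodes(g, "h_out")   # (1, out_dim) graph-level embedding

# Example: embed one drug from its SMILES string.
atom_featurizer = CanonicalAtomFeaturizer(atom_data_field="h")  # 74-d atom features
graph = smiles_to_bigraph("CC(=O)Oc1ccccc1C(=O)O",              # aspirin
                          node_featurizer=atom_featurizer)
graph = dgl.add_self_loop(graph)            # GraphConv needs non-zero in-degrees
drug_vec = TinyGraphDrugEncoder()(graph)    # torch.Size([1, 256])
```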
Model zoo snapshots
1. ChemBERTa Drug Embedding
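
A minimal sketch of the ChemBERTa-style SMILES embedding with Hugging Face `transformers`. The checkpoint name `DeepChem/ChemBERTa-77M-MLM` (a 384-dimensional model) is an assumption made for the example, not necessarily the weights used in the pipeline.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed checkpoint; any ChemBERTa-style masked-LM over SMILES works the same way.
CHECKPOINT = "DeepChem/ChemBERTa-77M-MLM"

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = AutoModel.from_pretrained(CHECKPOINT)
model.eval()

def embed_smiles(smiles: str) -> torch.Tensor:
    """Tokenise a SMILES string (padded/truncated to 256 tokens) and return
    the final-layer CLS embedding as the drug vector."""
    tokens = tokenizer(smiles, padding="max_length", truncation=True,
                       max_length=256, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**tokens).last_hidden_state   # (1, 256, hidden_size)
    return hidden[:, 0, :]                           # CLS token -> (1, 384) here

drug_vec = embed_smiles("CC(=O)Oc1ccccc1C(=O)O")     # aspirin
```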
📊 Results
Figure 1. Drug embedding comparison (Original vs. Modified attention)
Figure 2. Cross-attention variant performance
Figure 3. Pearson r on cell-blinded split (scatter)
Key takeaway: improvement over the baseline when switching to the ChemBERTa/BGD drug encoders + cross-attention. Full metrics are in `/results/`.
👥 Contributors & Acknowledgments
With gratitude to @yumin-c, I’m distilling this work into a focused, application-driven research project in collaboration with several colleagues, including my original MOUM team members: @bgduck33 and @whdsbwn.
You can find the full names of all MOUM team members on GitHub: Code Link.