GEMCONT: Genetics-based Multimodal Contrastive Learning Enhances Phenotypic embeddings and Boosts Genetic Discovery

Sens D, Shilova L, Dalca AV, Schnabel JA, Casale FP (2026)


Publication Type: Conference contribution

Publication year: 2026

Publisher: ML Research Press

Book Volume: 315

Pages Range: 3443-3463

Conference Proceedings Title: Proceedings of Machine Learning Research

Event location: Chientan, TWN

Abstract

Genetic variation provides stable, time-invariant markers of disease risk and can therefore reveal upstream mechanisms underlying complex traits. Genome-wide association studies (GWAS) have identified thousands of loci associated with disease, yet most remain difficult to interpret because the intermediate phenotypes linking genotype to disease are unknown. Here, we ask whether disease-associated genetic loci can be used directly to extract such risk-related features from quantitative phenotypes, including functional tests and medical imaging. We introduce GEMCONT (GEnetics-based Multimodal CONTrastive Learning), a multimodal contrastive learning framework that aligns genotype and phenotype representations in a shared latent space. Unlike task-agnostic multimodal pretraining, GEMCONT is disease-conditioned: GWAS-informed variant panels act as targeted supervision to learn risk-relevant phenotypic embeddings. To reflect the weak, additive nature of genetic effects, it employs a linear genetic encoder alongside a deep phenotypic encoder. We validate GEMCONT in controlled simulations and apply it to two real-world settings: spirometry curves for asthma and retinal fundus images for glaucoma. In both, GEMCONT improves disease risk prediction and enhances recovery of genetic associations compared with standard unsupervised or polygenic risk–based models. Altogether, our results demonstrate that incorporating stable genetic supervision into multimodal representation learning enables the extraction of genetically informed risk traits, refining disease phenotypes and improving the interpretability of association studies.
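
The pairing of a linear genetic encoder with a deep phenotypic encoder, aligned in a shared latent space, can be illustrated with a minimal sketch. The abstract does not state the loss function, so the symmetric InfoNCE objective used here is an assumption, as are the temperature, the layer sizes, the random data, and every function name; this is not the authors' implementation, and it shows only a forward pass without training.

```python
import math
import random


def linear_encoder(x, W):
    """Linear map with no nonlinearity: one row of W per output dimension."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp_encoder(x, W1, W2):
    """Two-layer ReLU network, a small stand-in for a deep phenotypic encoder."""
    hidden = [max(0.0, h) for h in linear_encoder(x, W1)]
    return linear_encoder(hidden, W2)


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / (norm + eps) for a in v]


def symmetric_info_nce(z_g, z_p, temperature=0.1):
    """Symmetric InfoNCE (assumed loss): row i of z_g and row i of z_p are a positive pair."""
    n = len(z_g)
    logits = [
        [sum(a * b for a, b in zip(z_g[i], z_p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class of row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(v - m) for v in row))
            total += log_z - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the genotype-to-phenotype and phenotype-to-genotype directions.
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


def rand_matrix(rows, cols):
    """Random Gaussian weights; purely illustrative, untrained."""
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
# Hypothetical sizes: 4 individuals, 5 variants, 8 phenotype features.
n, n_variants, n_features, n_hidden, n_latent = 4, 5, 8, 6, 3
genotypes = [[random.choice([0, 1, 2]) for _ in range(n_variants)] for _ in range(n)]
phenotypes = [[random.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n)]

W_g = rand_matrix(n_latent, n_variants)   # linear genetic encoder weights
W1 = rand_matrix(n_hidden, n_features)    # phenotypic encoder, layer 1
W2 = rand_matrix(n_latent, n_hidden)      # phenotypic encoder, layer 2

z_g = [l2_normalize(linear_encoder(g, W_g)) for g in genotypes]
z_p = [l2_normalize(mlp_encoder(p, W1, W2)) for p in phenotypes]
loss = symmetric_info_nce(z_g, z_p)
print(f"contrastive loss on random data: {loss:.4f}")
```

In this sketch the genotype vectors are allele dosages (0, 1, or 2) for a small variant panel, which is how a GWAS-informed panel could enter the linear encoder; training would minimize the loss with respect to all three weight matrices.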


How to cite

APA:

Sens, D., Shilova, L., Dalca, A.V., Schnabel, J.A., & Casale, F.P. (2026). GEMCONT: Genetics-based Multimodal Contrastive Learning Enhances Phenotypic embeddings and Boosts Genetic Discovery. In Yuankai Huo, Mingchen Gao, Chang-Fu Kuo, Yueming Jin, Ruining Deng (Eds.), Proceedings of Machine Learning Research (pp. 3443-3463). Chientan, TWN: ML Research Press.

MLA:

Sens, Daniel, et al. "GEMCONT: Genetics-based Multimodal Contrastive Learning Enhances Phenotypic embeddings and Boosts Genetic Discovery." Proceedings of the 9th International Conference on Medical Imaging with Deep Learning, MIDL 2026, Chientan, TWN. Ed. Yuankai Huo, Mingchen Gao, Chang-Fu Kuo, Yueming Jin, Ruining Deng, ML Research Press, 2026. 3443-3463.
