Prediction of Driver Mutation Heterogeneity in Renal Cancer from Histopathology Slides using Deep Learning

Posted on 2022-06-07 - 05:41 authored by Satwik Rajaram

This collection contains all previously unreleased slide-image datasets and the deep learning models used in the paper "Intratumoral resolution of driver gene mutation heterogeneity in renal cancer using deep learning" by Acosta et al., published in Cancer Research. This work demonstrates that deep learning (DL) models can predict intratumor heterogeneity in driver mutation status purely from hematoxylin and eosin (H&E) stained slides.

Specifically, the data in this collection was used to train and validate DL models that predict the status of three of the most frequently mutated driver genes (BAP1, PBRM1, and SETD2) in clear cell renal cell carcinoma. The DL models were trained on a large cohort of whole slide images (N=1282, referred to as WSI cohort in the paper/code) and tested on several independent cohorts including the TCGA KIRC (N=363 patients), two human tissue microarray (TMA) cohorts (referred to as TMA1 with 118 patients and TMA2 with 365 patients respectively) and a patient-derived xenograft TMA (referred to as PDX1).

The H&E stained whole slide images for the WSI cohort are deposited in the Training Cohort dataset contained in this collection. The whole slide images for the TMA cohorts (TMA1, TMA2 and PDX1) are deposited in the Testing Cohorts dataset. Finally, all deep learning models used in the paper are deposited in the Deep Learning Models dataset.

Additionally, all code used to perform the analysis can be found at:

The code repository also contains relevant metadata associated with images (e.g., driver mutation status) here:

We note that the TCGA KIRC data set is already publicly available, and the code repository provides a manifest to download it. The patient-derived xenograft (PDX) TMA afforded analyses of homo- and heterotopic interactions of tumor and stroma. The WSI_Raw_Slide_Images dataset in this collection contains the large cohort (N=1292) of clinical whole slide images. The TMA_Cohorts_Raw_Slide_Images dataset contains all three tissue microarray cohorts (TMA1, TMA2, PDX1). Additional information and details on the datasets and models can be found in the paper.




NIH (P50 CA196516), CPRIT (RP180192), CPRIT (RP180191), NIH (R01CA244579, R01CA154475, and R01DK115986), DOD (W81XWH1910710), CPRIT (RP200233). Lyda Hill Department of Bioinformatics

University of Texas Southwestern Medical Center SPORE in Kidney Cancer

National Cancer Institute

Dissecting the interplay between BAP1 and PBRM1 in renal cancer

Cancer Prevention and Research Institute of Texas

Understanding TFE3-mediated Tumorigenesis through Analysis of a Novel, Clinically-Relevant Mouse Model of Translocation Renal Cell Carcinoma

Cancer Prevention and Research Institute of Texas

Vascular image-guided optimization of response (VIGOR) to therapy in kidney cancer

National Cancer Institute


National Cancer Institute

Glomerular Filtration of Sub-nm Gold Nanoparticles

National Institute of Diabetes and Digestive and Kidney Diseases

Advance CT And Fluorescence Imaging Of Kidney Cancers With Glutathione-mediated Contrast Enhancements

Cancer Prevention and Research Institute of Texas

Research Institution(s)

UT Southwestern Medical Center, Mayo Clinic






Paul Acosta
Vandana Panwar
Vipul Jarmale
Alana Christie
Jay Jasti
Vitaly Margulis
Dinesh Rakheja
John Cheville
Bradley C. Leibovich
Alexander Parker
James Brugarolas
Payal Kapur