123 files

REAL-colon dataset

Version 2 2024-02-29, 22:39
Version 1 2023-03-11, 00:01
posted on 2024-02-29, 22:39 authored by Carlo Biffi, Giulio Antonelli, Sebastian Bernhofer, Cesare Hassan, Daizen Hirata, Mineo Iwatate, Andreas Maieron, Pietro Salvagnini, Andrea Cherubini

The REAL-colon (Real-world multi-center Endoscopy Annotated video Library) dataset comprises 60 recordings of real-world colonoscopies. These recordings come from four different clinical studies (001 to 004), each contributing 15 videos. Compressed folders named SSS-VVV_frames contain the video frames, where SSS identifies the clinical study (001 to 004) and VVV identifies the video within that study (001 to 015).
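Because the folder names encode the study and video identifiers, they can be parsed mechanically. The following is a minimal sketch, assuming the names follow the `SSS-VVV_frames` / `SSS-VVV_annotations` pattern exactly as described here and that any archive extension has already been stripped; `parse_folder_name` and `FolderId` are hypothetical names, not part of the official tooling.

```python
import re
from typing import NamedTuple

# Hypothetical helper (not official tooling). Assumes names look like
# "001-007_frames" or "001-007_annotations", with any archive extension removed.
FOLDER_RE = re.compile(r"^(\d{3})-(\d{3})_(frames|annotations)$")


class FolderId(NamedTuple):
    study: int  # SSS: clinical study, 1 to 4
    video: int  # VVV: video within the study, 1 to 15
    kind: str   # "frames" or "annotations"


def parse_folder_name(name: str) -> FolderId:
    match = FOLDER_RE.match(name)
    if match is None:
        raise ValueError(f"not a REAL-colon folder name: {name!r}")
    study, video, kind = int(match.group(1)), int(match.group(2)), match.group(3)
    # Range checks follow the description: studies 001-004, videos 001-015.
    if not (1 <= study <= 4 and 1 <= video <= 15):
        raise ValueError(f"study/video out of range in {name!r}")
    return FolderId(study, video, kind)


print(parse_folder_name("003-012_frames"))
# FolderId(study=3, video=12, kind='frames')
```

The range checks reject names outside the 4 × 15 grid, which helps catch stray files in a download folder.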

For each patient/video, several clinical variables have been collected, including the endoscope brand, the bowel cleanliness score (Boston Bowel Preparation Scale, BBPS), the number of surgically removed colon lesions, and more. These data are stored in the video_info.csv file. Each removed lesion has been annotated with a bounding box in every video frame in which it appears, by trained image annotation specialists supervised by expert gastroenterologists. These annotations are available in 60 compressed folders named SSS-VVV_annotations, each containing the annotations for the corresponding video. Polyp information, including histology, size, and anatomical site, is recorded in the lesion_info.csv file.
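Since lesion_info.csv holds the metadata for each lesion, a common first step is counting lesions per video. This is a sketch under stated assumptions: it assumes one row per removed lesion and some column identifying the video, but the actual column names are not given here, so `video_column` is a placeholder to be checked against the real header. `lesions_per_video` is a hypothetical helper built only on the Python standard library.

```python
import csv
from collections import Counter
from pathlib import Path


def lesions_per_video(lesion_csv: Path, video_column: str) -> Counter:
    """Count rows of lesion_info.csv per video (hypothetical helper).

    Assumes one row per removed lesion; `video_column` is a placeholder
    for whichever column identifies the video in the real file.
    """
    # "utf-8-sig" tolerates a byte-order mark at the start of the file.
    with open(lesion_csv, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or video_column not in reader.fieldnames:
            raise KeyError(f"column {video_column!r} not found in {lesion_csv}")
        return Counter(row[video_column] for row in reader)
```

If the one-row-per-lesion assumption holds, the counts summed over all videos should equal the 132 removed polyps reported in the key stats.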

For full details on the dataset and to cite this work, please refer to:

Biffi, C., Antonelli, G., Bernhofer, S. et al. REAL-Colon: A dataset for developing real-world AI applications in colonoscopy. Sci Data 11, 539 (2024). Available at:

A GitHub repository containing Python code to facilitate downloading and exploring the dataset is available at

Key stats:

- 60 recordings, 15 from each of the 4 clinical studies

- 2,757,723 total frames

- 132 removed colorectal polyps

- 351,264 bounding box annotations
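The key stats above imply a few simple averages that help size storage and training pipelines. The snippet below only rearranges the published numbers; the results are means, and individual videos will differ. The last value is an upper bound, since each bounding box lies in exactly one frame, so the number of frames containing at least one box cannot exceed the number of boxes.

```python
# Key statistics as listed above.
recordings = 60
studies, videos_per_study = 4, 15
total_frames = 2_757_723
removed_polyps = 132
bbox_annotations = 351_264

# The recording count is consistent with 4 studies x 15 videos.
assert studies * videos_per_study == recordings

# Derived averages (means only; individual videos vary).
frames_per_recording = total_frames / recordings
boxes_per_polyp = bbox_annotations / removed_polyps
polyps_per_recording = removed_polyps / recordings
# Each box belongs to one frame, so this bounds the share of annotated frames.
annotated_frame_share_upper_bound = bbox_annotations / total_frames

print(f"{frames_per_recording:,.0f} frames per recording on average")       # 45,962
print(f"{boxes_per_polyp:,.0f} bounding boxes per polyp on average")         # 2,661
print(f"{polyps_per_recording:.1f} removed polyps per recording on average") # 2.2
print(f"at most {annotated_frame_share_upper_bound:.1%} of frames contain a box")  # 12.7%
```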

The dataset is composed of the following files:

- 60 compressed folders named `{SSS}-{VVV}_frames` with the frames from each recording

- 60 compressed folders named `{SSS}-{VVV}_annotations` with the annotations from each recording

- `video_info.csv`, a file with the metadata for each video

- `lesion_info.csv`, a file with the metadata for each lesion

- a readme file with information about the dataset
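After downloading, it can be useful to confirm that nothing is missing. The following is a minimal sketch, assuming all files sit in one flat directory and that the compressed folders carry an archive extension such as `.tar.gz` (the exact extension is an assumption, so everything after the first dot is ignored). The readme is left out because its file name is not given here. `expected_entries` and `missing_entries` are hypothetical helpers; the official GitHub repository may already provide an equivalent check.

```python
from pathlib import Path

STUDIES = range(1, 5)  # SSS: 001 to 004
VIDEOS = range(1, 16)  # VVV: 001 to 015


def expected_entries() -> set[str]:
    """Names of the 120 per-recording folders plus the two CSV files."""
    names = {"video_info.csv", "lesion_info.csv"}
    for s in STUDIES:
        for v in VIDEOS:
            for kind in ("frames", "annotations"):
                names.add(f"{s:03d}-{v:03d}_{kind}")
    return names


def missing_entries(download_dir: Path) -> set[str]:
    """Return the expected names not found in download_dir.

    Archive extensions are stripped before comparison, e.g.
    "001-001_frames.tar.gz" -> "001-001_frames" (extension is an assumption).
    """
    present = set()
    for path in download_dir.iterdir():
        name = path.name
        if not name.endswith(".csv"):
            name = name.split(".", 1)[0]
        present.add(name)
    return expected_entries() - present
```

For example, `sorted(missing_entries(Path("real_colon")))` would list whatever is still absent from a hypothetical `real_colon` download folder; an empty list means all 122 non-readme entries are present.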


Research Institution(s)

Cosmo Intelligent Medical Devices

I confirm there is no human personally identifiable information in the files or description shared

  • Yes

I confirm the files and description shared may be publicly distributed under the license selected

  • Yes

Competing Interest Statement

C.B., P.S., and A.C. are affiliated with Cosmo Intelligent Medical Devices, the developer of the GI Genius medical device. C.H. is a consultant for Medtronic and Fujifilm.