Carlo Biffi, Giulio Antonelli, Sebastian Bernhofer, Cesare Hassan, Daizen Hirata, Mineo Iwatate, Andreas Maieron, Pietro Salvagnini, Andrea Cherubini

The REAL-Colon (Real-world multi-center Endoscopy Annotated video Library) dataset comprises 60 recordings of real-world colonoscopies. These recordings come from four different clinical studies (001 to 004), with each study contributing 15 videos. Compressed folders titled SSS-VVV_frames contain the video frames, where SSS indicates the clinical study (001 to 004) and VVV represents the video name (001 to 015).
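The naming convention above can be parsed mechanically. The following is a minimal sketch, assuming folder names follow the SSS-VVV pattern exactly as described; `parse_folder_name` is a hypothetical helper written for illustration and is not part of the dataset's own tooling.

```python
import re

# Pattern for names such as "002-013_frames": three-digit study, three-digit video.
# The suffix group accepts "frames", "annotation" or "annotations".
FOLDER_RE = re.compile(r"^(?P<study>\d{3})-(?P<video>\d{3})_(?P<kind>frames|annotations?)")

def parse_folder_name(name):
    """Return (study, video, kind) for a folder name, or None if it does not match."""
    match = FOLDER_RE.match(name)
    if match is None:
        return None
    study, video = int(match["study"]), int(match["video"])
    # The description states studies 001 to 004 and videos 001 to 015.
    if not (1 <= study <= 4 and 1 <= video <= 15):
        return None
    return study, video, match["kind"]

print(parse_folder_name("002-013_frames"))  # (2, 13, 'frames')
```

Because the pattern is anchored only at the start, a name carrying an archive extension after the suffix would still be recognized.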

For each patient/video, several clinical variables have been collected, including the endoscope brand, bowel cleanliness score (BBPS), number of surgically removed colon lesions, and more. These data are stored in the video_info.csv file. Each removed lesion has been annotated with a bounding box in every video frame in which it appears, by trained image annotation specialists supervised by expert gastroenterologists. These annotations are available in 60 compressed folders titled SSS-VVV_annotations, each containing the annotations for its respective video. Polyp information, including histology, size, and anatomical site, is recorded in the lesion_info.csv file.
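Since the column names of the CSV metadata files are not listed here, a first step is simply to inspect them. The sketch below assumes the files are comma-delimited, UTF-8 encoded, and begin with a header row; `summarize_csv` is a hypothetical helper, and it deliberately makes no assumption about which columns exist.

```python
import csv

def summarize_csv(path):
    """Return (column_names, row_count) for a CSV file with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        rows = list(reader)  # reading the rows also populates reader.fieldnames
        return list(reader.fieldnames or []), len(rows)

# Example usage once the file has been downloaded locally:
# columns, n_rows = summarize_csv("lesion_info.csv")
# print(columns, n_rows)
```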

For full details on the dataset and to cite this work, please refer to:

Biffi, C., Antonelli, G., Bernhofer, S. et al. REAL-Colon: A dataset for developing real-world AI applications in colonoscopy. Sci Data 11, 539 (2024). Available at:

A GitHub repository containing Python code to facilitate downloading and exploring the dataset is available at

Key stats:

- 60 recordings, 15 from each of the 4 clinical studies

- 2757723 total frames

- 132 removed colorectal polyps

- 351264 bounding box annotations
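A few derived averages help put these totals in perspective. The short calculation below uses only the four figures listed above; the variable names are illustrative.

```python
# Figures copied from the key stats above.
recordings = 60
total_frames = 2_757_723
polyps = 132
boxes = 351_264

frames_per_recording = total_frames / recordings
boxes_per_polyp = boxes / polyps
boxes_per_frame = boxes / total_frames

print(f"mean frames per recording: {frames_per_recording:,.0f}")  # 45,962
print(f"mean boxes per polyp:      {boxes_per_polyp:,.0f}")       # 2,661
print(f"boxes per frame:           {boxes_per_frame:.3f}")        # 0.127
```

The last ratio is only an upper bound on the fraction of frames that show a lesion, because a single frame may contain more than one bounding box.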

The dataset is composed of the following files:

- 60 compressed folders named `{SSS}-{VVV}_frames` with the frames from each recording

- 60 compressed folders named `{SSS}-{VVV}_annotations` with the annotations from each recording

- video_info.csv, a file with the metadata for each video

- lesion_info.csv, a file with the metadata for each lesion

- a readme file with information about the dataset
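After downloading, the file list above can be used to check that a local copy is complete. This is a minimal sketch under stated assumptions: all files sit in one directory, and the compressed folders are matched by name prefix because the archive extension is not specified here. `expected_prefixes` and `missing_items` are hypothetical helpers, and the readme is not checked since its filename is not given.

```python
from pathlib import Path

STUDIES = range(1, 5)   # clinical studies 001 to 004
VIDEOS = range(1, 16)   # videos 001 to 015 within each study

def expected_prefixes():
    """Name prefixes for the 120 compressed folders, without any archive extension."""
    prefixes = []
    for study in STUDIES:
        for video in VIDEOS:
            stem = f"{study:03d}-{video:03d}"
            prefixes.append(f"{stem}_frames")
            # The prefix "_annotation" also matches names ending in "_annotations".
            prefixes.append(f"{stem}_annotation")
    return prefixes

def missing_items(root):
    """List expected folders and CSV files that are absent from directory `root`."""
    names = [p.name for p in Path(root).iterdir()]
    missing = [pre for pre in expected_prefixes()
               if not any(n.startswith(pre) for n in names)]
    missing += [f for f in ("video_info.csv", "lesion_info.csv") if f not in names]
    return missing
```

Calling `missing_items("path/to/download")` returns an empty list when every expected item is present.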


Research Institution(s)

Cosmo Intelligent Medical Devices

Competing Interest Statement

C.B., P.S., and A.C. are affiliated with Cosmo Intelligent Medical Devices, the developer of the GI Genius medical device. C.H. is a consultant for Medtronic and Fujifilm.