
Dataset for the Development of Statistical Methods to Assess AI Model Performance: Images and Pathologist Annotations from the HTT Pilot Study

Catalog of Regulatory Science Tools to Help Assess New Medical Devices

This regulatory science tool (RST) is a dataset of whole slide images and pathologist annotations for use in the development of new statistical methods.

Technical Description

This dataset of images and pathologist annotations can be used to develop new statistical methods for assessing the performance of AI models that estimate quantitative biomarkers when the reference standard comes from multiple experts. The dataset is the result of the pilot study of the High Throughput Truthing (HTT) project, which aims to create a validation dataset for machine learning algorithms that perform the stromal tumor-infiltrating lymphocytes (sTILs) assessment. The data consist of sTILs annotations in regions of interest (ROIs) from whole slide images (WSIs) of hematoxylin and eosin (H&E)-stained breast cancer tissue. The pathologist annotations were collected in batches and are intended for use in multi-reader multi-case agreement analyses. In addition to the dataset, the associated R package contains four utility functions that can assist with statistical analysis of this or similar data.

The annotations are publicly available in an R package (https://github.com/DIDSR/HTT). The images are available from a server hosted by Emory University (https://wolf.cci.emory.edu/camic/htt/) after creating an account and requesting access to the images by email (https://didsr.github.io/HTT.home/assets/pages/team). The availability of the images and viewer is subject to institutional resources.

The dataset consists of 7,898 sTILs density estimates acquired from 640 unique ROIs in 64 WSIs. Slides were scanned on a Hamamatsu Nanozoomer 2.0-RS C10730 series scanner at 40x equivalent magnification (0.23 µm/pixel). For each ROI, the annotator provided three data elements: an ROI label, an evaluable status, and a sTILs density. The ROI label is a qualitative variable that describes the tissue within the ROI as either appropriate (“Intra-Tumoral Stroma”, “Invasive Margin”) or not appropriate (“Tumor with No Intervening Stroma”, “Other Regions”) for the sTILs assessment. This information is also captured in the summary variable “evaluable”, which indicates whether the ROI is evaluable given the International TILs Working Group recommendations [reference below]. The sTILs density is defined as the percentage of the tumor-associated stroma area that is occupied by tumor-infiltrating lymphocytes. Additional data elements in this dataset include further annotation information and reader (annotator) information.
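As an illustration of the data elements described above, the label-to-evaluable mapping and the density definition can be sketched as follows. This is a minimal sketch in Python: the label strings follow the text, but the variable and function names are assumptions, not the HTT R package's actual API or schema.

```python
# Illustrative sketch only; label strings follow the catalog text, but the
# mapping and function names are assumptions, not the HTT package's API.
ROI_LABEL_IS_EVALUABLE = {
    "Intra-Tumoral Stroma": True,               # appropriate for sTILs assessment
    "Invasive Margin": True,                    # appropriate for sTILs assessment
    "Tumor with No Intervening Stroma": False,  # not appropriate
    "Other Regions": False,                     # not appropriate
}

def stils_density(til_area: float, stroma_area: float) -> float:
    """Percentage of the tumor-associated stroma area occupied by TILs."""
    return 100.0 * til_area / stroma_area
```

For example, `stils_density(12.0, 60.0)` returns 20.0, i.e., TILs covering a fifth of the tumor-associated stroma.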

More details regarding all elements can be found in the user manual in the “pilotHTT_RST” section (https://github.com/DIDSR/HTT/blob/main/inst/manual/HTT_2.0.1.pdf).

  • R. Salgado et al., “The evaluation of tumor-infiltrating lymphocytes (TILs) in breast cancer: recommendations by an International TILs Working Group 2014,” Ann. Oncol., vol. 26, no. 2, pp. 259–271, Feb. 2015, doi: 10.1093/annonc/mdu450.

Intended Purpose

The purpose of this regulatory science tool is to provide data for the development of new statistical analysis methods that may be used to assess the performance of AI models that estimate quantitative pathology biomarkers when multiple experts provide the reference standard or comparable biomarker measurements.
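To make the multi-reader multi-case setting concrete, one simple per-ROI disagreement summary, the mean absolute pairwise difference between readers, can be sketched as below. This is a hedged illustration using synthetic toy values, not data from the HTT dataset, and it is not one of the HTT package's utility functions.

```python
from collections import defaultdict
from itertools import combinations

# Synthetic toy annotations (roi_id, reader_id, sTILs density %); these are
# made-up values, not drawn from the HTT dataset.
annotations = [
    ("roi1", "A", 10.0), ("roi1", "B", 20.0), ("roi1", "C", 15.0),
    ("roi2", "A", 40.0), ("roi2", "B", 50.0),
]

def mean_pairwise_abs_diff(values):
    """Average absolute difference over all reader pairs for one ROI."""
    pairs = list(combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

# Group densities by ROI, then summarize inter-reader disagreement per ROI.
by_roi = defaultdict(list)
for roi, _reader, density in annotations:
    by_roi[roi].append(density)

disagreement = {roi: mean_pairwise_abs_diff(vals) for roi, vals in by_roi.items()}
```

For "roi1" the three pairwise differences are 10, 5, and 5, giving a mean of 20/3; richer agreement analyses (e.g., intraclass correlation) follow the same group-by-ROI structure.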

Testing

A description of the annotation data, including the data-collection study design and methods, can be found in this peer-reviewed manuscript:

  • K. Elfer et al., “Reproducible Reporting of the Collection and Evaluation of Annotations for Artificial Intelligence Models,” Mod Pathol, vol. 37, p. 100439, Jan. 2024, doi: 10.1016/j.modpat.2024.100439.

Additional manuscripts based on the dataset can help users characterize the data:

  1. Dudgeon SN, Wen S, Hanna MG, Gupta R, Amgad M, Sheth M, Marble H, Huang R, Herrmann MD, Szu CH, Tong D, Werness B, Szu E, Larsimont D, Madabhushi A, Hytopoulos E, Chen W, Singh R, Hart SN, Sharma A, Saltz J, Salgado R, Gallas BD. A pathologist-annotated dataset for validating artificial intelligence: A project description and pilot study. J Pathol Inform 2021;12:45-45. doi: 10.4103/jpi.jpi_83_20. PMID: 34881099; PMCID: PMC8609287.
    1. Describes source of data, study design, and data collection methodology.
  2. Garcia V, Elfer K, Peeters DJE, Ehinger A, Werness B, Ly A, Li X, Hanna MG, Blenman KRM, Salgado R, Gallas BD. Development of Training Materials for Pathologists to Provide Machine Learning Validation Data of Tumor-Infiltrating Lymphocytes in Breast Cancer. Cancers (Basel). 2022 May 17;14(10):2467. doi: 10.3390/cancers14102467. PMID: 35626070; PMCID: PMC9139395.
    1. Analyzes the variability of the dataset’s annotations, with a focus on developing pathologist training materials to reduce pathologist variability in the project’s future pivotal study.
  3. Elfer K, Dudgeon S, Garcia V, Blenman K, Hytopoulos E, Wen S, Li X, Ly A, Werness B, Sheth MS, Amgad M, Gupta R, Saltz J, Hanna MG, Ehinger A, Peeters D, Salgado R, Gallas BD. Pilot study to evaluate tools to collect pathologist annotations for validating machine learning algorithms. J Med Imaging (Bellingham). 2022 Jul;9(4):047501. doi: 10.1117/1.JMI.9.4.047501. Epub 2022 Jul 27. PMID: 35911208; PMCID: PMC9326105.
    1. Provides an alternate analysis of the variability of the dataset’s annotations, with a focus on comparisons across data-collection platforms.

Limitations

This tool was generated through the collection of pathologist annotations during a pilot study. As a result, there is high variability among the pathologist annotations. Nevertheless, the tool provides a complete dataset that can be used for agreement method development efforts.

Supporting Documentation

Tool Website

User Manual

Related FDA Product Codes (This is not a comprehensive list)

  • QPN: Software algorithm device to assist users in digital pathology.
  • POK: Computer-assisted diagnostic software for lesions suspicious for cancer.
  • QIH: Automated radiological image processing software.
  • LLZ: System, image processing, radiological.

Current Research

  • The goal of the High-Throughput Truthing (HTT) project is to produce a validation dataset (images plus pathologist annotations) for artificial intelligence algorithms that analyze digital scans of pathology slides. We will share this final validation dataset as another regulatory science tool so that it becomes a high-value public resource that can be used in AI/ML algorithm submissions and can guide others in developing quality validation datasets.

Contact

Tool Reference