Jun 3, 2025

HISTAI Dataset: A Landmark Open-Source Resource for Computational Pathology

HistAI is proud to announce the public release of a landmark open-source dataset designed to accelerate innovation in digital and computational pathology. This unprecedented collection includes 112,000 Whole Slide Images (WSIs) derived from 46,000 unique pathology cases, encompassing a broad spectrum of organ systems, disease states, and tissue types.

The dataset is accompanied by fully digital pathology reports, creating a rich, structured resource for researchers, clinicians, and developers of artificial intelligence (AI) tools.

Unparalleled Scale and Diversity

The HISTAI dataset sets a new benchmark for scale and comprehensiveness in the field:

  • 112,000 WSIs at various magnifications

  • Representing 46,000 cases, including routine, chronic, and oncological pathologies

  • Includes 20,000 Immunohistochemistry (IHC) slides

  • Wide range of specimen types: oncological, chronic-disease, and healthy/normal tissue

  • Fully digitized diagnostic reports and structured metadata

This release addresses a critical bottleneck in AI development for pathology: the lack of large-scale, diverse, and expertly annotated image data. By pairing high-resolution WSIs with detailed clinical context, the dataset gives researchers the material to build and validate generalizable machine learning models.

A Transformative Resource for the Community

“This dataset is a major step forward for computational pathology,” said Dmitry Nechaev, Chief AI-Scientist at HistAI. “It not only provides scale but also diversity across tissue types and diagnostic categories. Most importantly, it’s open to the entire research community.”

Whether you're developing diagnostic algorithms, exploring disease biomarkers, or building foundation models for multi-modal healthcare AI, the HISTAI dataset provides a robust and reproducible foundation.

Access and Collaboration

The full dataset is now available for download and research use under an open-source license. HistAI encourages collaboration with academic, clinical, and industry partners to further expand and utilize this resource.

Download the HISTAI dataset from Hugging Face
📩 Contact us for collaborations: models@hist.ai
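
For readers who prefer a scripted download, the following is a minimal sketch, assuming the dataset is published as a Hugging Face dataset repository and that the `huggingface_hub` Python package is installed. The repository ID, the file patterns, and the helper names `build_request` and `download` are hypothetical placeholders, not details from this announcement; substitute the real values from the HISTAI page on Hugging Face.

```python
from pathlib import Path

# Hypothetical placeholder: replace with the real repository ID
# listed on the HISTAI Hugging Face page.
REPO_ID = "your-org/your-histai-repo"


def build_request(repo_id, patterns=("*.json", "*.md"), local_dir="histai_data"):
    """Assemble keyword arguments for huggingface_hub.snapshot_download.

    The default patterns are an assumption: they restrict the download to
    small metadata-style files so that a first run does not pull every
    whole slide image.
    """
    return {
        "repo_id": repo_id,
        "repo_type": "dataset",          # dataset repository, not a model
        "allow_patterns": list(patterns),
        "local_dir": str(Path(local_dir)),
    }


def download(repo_id=REPO_ID, **overrides):
    """Download the matching files and return the local folder path."""
    # Imported here so the helper above can be used without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(**build_request(repo_id, **overrides))
```

Calling `download()` with a real repository ID returns the path of the local folder. Widening `patterns` to include slide files will fetch far more data, so it is worth checking the repository's file listing first.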

*HistAI CELLDX and the models available on the platform are not for primary diagnosis. For research use only.
