Overview

Each task involves predicting the expression levels of the 50 most variable genes from 112×112 μm H&E-stained image patches centered on each spatial transcriptomics spot. The tasks are formulated as multivariate regression problems.
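To make the multivariate regression framing concrete, the sketch below fits one linear model that maps patch embeddings to all 50 gene targets at once, then scores each gene separately. It is illustrative only: the data are synthetic, and the closed-form ridge regressor, the `alpha` value, the 16-dimensional embeddings, and the per-gene Pearson correlation are assumptions chosen for the example, not a description of how HEST-Benchmark is implemented.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression from embeddings X (n, d) to targets Y (n, g)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    # Solve (Xc^T Xc + alpha * I) W = Xc^T Yc for all g targets at once.
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def pearson_per_gene(Y_true, Y_pred):
    """Pearson correlation computed independently for each gene (column)."""
    yt = Y_true - Y_true.mean(axis=0)
    yp = Y_pred - Y_pred.mean(axis=0)
    denom = np.sqrt((yt ** 2).sum(axis=0) * (yp ** 2).sum(axis=0))
    return (yt * yp).sum(axis=0) / denom


# Synthetic stand-ins: 200 spots, 16-dim embeddings, 50 gene targets.
rng = np.random.default_rng(0)
n_spots, emb_dim, n_genes = 200, 16, 50
X = rng.normal(size=(n_spots, emb_dim))
true_W = rng.normal(size=(emb_dim, n_genes))
Y = X @ true_W + 0.1 * rng.normal(size=(n_spots, n_genes))

# Simple hold-out split: first 150 spots for fitting, last 50 for evaluation.
X_train, X_test = X[:150], X[150:]
Y_train, Y_test = Y[:150], Y[150:]

W, b = fit_ridge(X_train, Y_train, alpha=1.0)
Y_pred = X_test @ W + b
scores = pearson_per_gene(Y_test, Y_pred)
print(Y_pred.shape, scores.shape)
```

Each spot yields a 50-dimensional prediction, and the per-gene scores can be averaged into a single number per task.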

| Task ID | Oncotree Code | Number of Samples | Technology | Sample IDs |
|---------|---------------|-------------------|------------|------------|
| Task 1 | IDC | 4 | Xenium | TENX95, TENX99, NCBI783, NCBI785 |
| Task 2 | PRAD | 23 | Visium | MEND139~MEND162 |
| Task 3 | PAAD | 3 | Xenium | TENX116, TENX126, TENX140 |
| Task 4 | SKCM | 2 | Xenium | TENX115, TENX117 |
| Task 5 | COAD | 4 | Xenium | TENX111, TENX147, TENX148, TENX149 |
| Task 6 | READ | 4 | Visium | ZEN36, ZEN40, ZEN48, ZEN49 |
| Task 7 | ccRCC | 24 | Visium | INT1~INT24 |
| Task 8 | LUAD | 2 | Xenium | TENX118, TENX141 |
| Task 9 | IDC-LymphNode | 4 | Visium | NCBI681, NCBI682, NCBI683, NCBI684 |

Reproducing HEST-Benchmark results

  • Ensure that HEST has been properly installed (see README, Installation)

  • Install benchmark dependencies via pip install -e ".[benchmark]"

  • Benchmark data (patches, h5ad, splits, genes) are downloaded automatically

  • Patch encoders are loaded through TRIDENT

Important (model access): Many foundation models used by TRIDENT are hosted on Hugging Face and may be gated. You may need to request access for each model you want to benchmark and authenticate locally.

Typical setup:

  1. Request access on each model page (for example: UNI, CONCH/CONCHv1.5, GigaPath, Virchow, H-Optimus).

  2. Log in with your Hugging Face token in the benchmark environment:

huggingface-cli login

Once authenticated, TRIDENT handles model loading directly from the encoder name in the benchmark config.
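In non-interactive environments (a batch job or a remote notebook), the interactive prompt may be inconvenient. One possible alternative is to log in from Python with `huggingface_hub.login`, reading the token from an environment variable. The helper name `hf_login_from_env` and the variable name `HF_TOKEN` are choices made for this sketch, not part of HEST or TRIDENT.

```python
import os


def hf_login_from_env(var_name="HF_TOKEN"):
    """Log in to Hugging Face with a token read from the environment.

    Returns False without contacting the Hub if the variable is unset or empty.
    """
    token = os.environ.get(var_name)
    if not token:
        return False
    # Imported lazily so the helper can be defined even where
    # huggingface_hub is not installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Call `hf_login_from_env()` once before running the benchmark; a return value of `False` means no token was found and gated models will likely fail to download.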

Launching HEST-bench via CLI

%%bash
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libffi.so.7
python ../src/hest/bench/benchmark.py --config ../bench_config/bench_config.yaml
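The same launch can be scripted from Python, which may be convenient outside a notebook. The sketch below rebuilds the command shown above and sets `LD_PRELOAD` only when the library file exists on the machine, since that path is system-specific. The helper names (`build_benchmark_command`, `build_env`, `run_benchmark`) are hypothetical and not part of HEST.

```python
import os
import subprocess
import sys

# Paths copied from the shell command above; adjust them to your checkout.
DEFAULT_SCRIPT = "../src/hest/bench/benchmark.py"
DEFAULT_PRELOAD = "/usr/lib/x86_64-linux-gnu/libffi.so.7"


def build_benchmark_command(config_path, script_path=DEFAULT_SCRIPT):
    """Return the argument list equivalent to the CLI call above."""
    return [sys.executable, script_path, "--config", config_path]


def build_env(preload=DEFAULT_PRELOAD):
    """Copy the current environment, adding LD_PRELOAD only if the file exists."""
    env = dict(os.environ)
    if os.path.exists(preload):
        env["LD_PRELOAD"] = preload
    return env


def run_benchmark(config_path):
    """Run the benchmark script and raise CalledProcessError if it fails."""
    subprocess.run(build_benchmark_command(config_path), env=build_env(), check=True)
```

For example, `run_benchmark("../bench_config/bench_config.yaml")` mirrors the shell invocation.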

Benchmarking your own model with HEST-Benchmark

Below is a complete, minimal example using a Hugging Face Vision Transformer as a custom encoder.

We wrap the model with:

  • a forward() method returning a patch embedding (CLS token),

  • an eval_transforms callable,

  • a precision attribute.

This aligns with TRIDENT-style encoder behavior and can be passed directly to benchmark(...).

from hest.bench import benchmark
import torch
from torchvision import transforms
from transformers import AutoImageProcessor, AutoModel


class HFViTCustomEncoder(torch.nn.Module):
    """Minimal TRIDENT-style wrapper around a Hugging Face vision model."""

    def __init__(self, hf_id="google/vit-base-patch16-224-in21k", precision=torch.float32):
        super().__init__()
        self.hf_id = hf_id
        self.processor = AutoImageProcessor.from_pretrained(hf_id)
        self.model = AutoModel.from_pretrained(hf_id)
        self.precision = precision

        # Reuse model-recommended preprocessing parameters when available.
        mean = self.processor.image_mean if self.processor.image_mean is not None else (0.5, 0.5, 0.5)
        std = self.processor.image_std if self.processor.image_std is not None else (0.5, 0.5, 0.5)
        size = self.processor.size
        # `size` is typically a dict keyed by "shortest_edge" or by "height"/"width"; fall back to 224.
        target = size.get("shortest_edge", size.get("height", 224)) if isinstance(size, dict) else 224

        self.eval_transforms = transforms.Compose([
            transforms.Resize(target),
            transforms.CenterCrop(target),
            transforms.ToTensor(),
            transforms.Normalize(mean=mean, std=std),
        ])

    def forward(self, x):
        outputs = self.model(pixel_values=x)
        # Return CLS token as patch embedding.
        return outputs.last_hidden_state[:, 0, :]


PATH_TO_CONFIG = "../bench_config/bench_config.yaml"
custom_encoder = HFViTCustomEncoder(hf_id="google/vit-base-patch16-224-in21k", precision=torch.float32)

# The wrapper exposes `eval_transforms` and `precision` itself, so the two
# positional arguments that follow the encoder are left as None.
benchmark(
    custom_encoder,
    None,
    None,
    config=PATH_TO_CONFIG,
    exp_code="custom_hf_vit",
)
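Before starting a long benchmark run, it can help to confirm that a custom wrapper exposes the three members listed above. The check below is a minimal sketch: `check_encoder_interface` and `DummyEncoder` are hypothetical names, and the check assumes those three members are what matters; it does not validate output shapes or anything else `benchmark(...)` may require.

```python
def check_encoder_interface(encoder):
    """Return a list of problems; an empty list means the basic checks pass."""
    problems = []
    if not callable(getattr(encoder, "forward", None)):
        problems.append("missing callable forward()")
    if not callable(getattr(encoder, "eval_transforms", None)):
        problems.append("missing callable eval_transforms")
    if not hasattr(encoder, "precision"):
        problems.append("missing precision attribute")
    return problems


class DummyEncoder:
    """Stand-in object used only to demonstrate the check."""

    precision = "float32"

    def eval_transforms(self, image):
        return image

    def forward(self, x):
        return x


print(check_encoder_interface(DummyEncoder()))
```

Running `check_encoder_interface(custom_encoder)` on the wrapper defined above should likewise return an empty list.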

Reproducing the gene selection for HEST-Benchmark

import scanpy as sc
from hest import get_k_genes

# TODO: replace with the full paths to the samples of interest.
sample_paths = ['TENX118.h5ad', 'TENX141.h5ad']

# Load each sample as an AnnData object.
ad_list = [sc.read_h5ad(sample_path) for sample_path in sample_paths]
# Select 50 genes across the samples using the variance criterion.
genes = get_k_genes(ad_list, k=50, criteria="var", min_cells_pct=0.1)
print(len(genes))
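For intuition about what a "top-k most variable genes" selection can look like, here is a small NumPy sketch on toy count matrices. It is not the implementation of `get_k_genes`: the function name `top_k_variable_genes`, the detection filter applied per sample, the `log1p` transform, and the pooling of spots across samples are all assumptions made for illustration.

```python
import numpy as np


def top_k_variable_genes(matrices, gene_names, k=50, min_cells_pct=0.1):
    """Rank genes by pooled variance after a simple detection filter.

    matrices: list of (n_spots, n_genes) count arrays sharing one gene order.
    Illustrative only; hest.get_k_genes may behave differently.
    """
    gene_names = np.asarray(gene_names)
    # Keep genes detected (count > 0) in at least min_cells_pct of the
    # spots of every sample.
    keep = np.ones(len(gene_names), dtype=bool)
    for mat in matrices:
        keep &= (mat > 0).mean(axis=0) >= min_cells_pct
    # Pool all spots, log-transform (an assumption), and compute variance.
    pooled = np.log1p(np.vstack(matrices))
    variances = pooled.var(axis=0)
    variances[~keep] = -np.inf  # filtered genes sort last
    order = np.argsort(variances)[::-1]
    n_selected = min(k, int(keep.sum()))
    return gene_names[order[:n_selected]].tolist()


# Toy data: G0 is never detected, G2 is constant, G1 and G3 vary.
names = ["G0", "G1", "G2", "G3"]
sample_a = np.array(
    [[0, 5, 1, 0], [0, 0, 1, 9], [0, 7, 1, 0], [0, 1, 1, 8]], dtype=float
)
sample_b = sample_a[::-1].copy()

selected = top_k_variable_genes([sample_a, sample_b], names, k=2)
print(selected)
```

In this toy example, the undetected gene is dropped by the filter and the constant gene ranks below the two varying ones.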