hest.HESTData.HESTData

class hest.HESTData.HESTData(adata: sc.AnnData, img: np.ndarray | openslide.OpenSlide | CuImage | str, pixel_size: float, meta: Dict = {}, tissue_contours: gpd.GeoDataFrame = None, shapes: List[LazyShapes] = [])

Object representing a (pooled) Spatial Transcriptomics sample along with a full resolution H&E image and associated metadata

Attributes table

shapes

List of LazyShapes, i.e. cells, nuclei.

tissue_contours

GeoDataFrame of tissue contour polygons; also contains a tissue_id column

Methods table

dump_patches(patch_save_dir[, name, ...])

Dump H&E patches centered around ST spots to a .h5 file.

ensembl_id_to_gene()

Converts Ensembl gene IDs using BioMart annotations and filters out genes with no matching Ensembl ID for the current object

from_paths(adata_path, img, metrics_path[, ...])

Read a HEST sample from disk

get_shapes(name, coordinate_system)

get_tissue_vis()

load_wsi()

Load the full WSI in memory

save(path[, save_img, pyramidal, bigtiff, ...])

Save a HESTData object to path as follows:

save_spatial_plot(save_path[, name, key, ...])

Save the spatial plot from that STObject

save_tissue_contours(save_dir, name)

save_tissue_seg_pkl(save_dir, name)

Backward-compatible alias kept for old tutorials.

save_tissue_vis(save_dir, name)

Save a visualization of the tissue segmentation on top of the downscaled H&E

segment_tissue([fast_mode, target_pxl_size, ...])

Computes the tissue mask and stores it in the current HESTData object

to_spatial_data([fullres])

Convert a HESTData sample to a scverse SpatialData object.

Attributes

HESTData.shapes: List[LazyShapes] = []

List of LazyShapes, i.e. cells, nuclei

HESTData.tissue_contours

GeoDataFrame of tissue contour polygons; also contains a tissue_id column

Methods

HESTData.dump_patches(patch_save_dir: str, name: str = 'patches', target_patch_size: int = 224, target_pixel_size: float = 0.5, verbose=0, dump_visualization=True, use_mask=True, threshold=0.15, coords_only=False, qc=False, nb_qc_patches=20)

Dump H&E patches centered around ST spots to a .h5 file.

Patches are computed such that:
  • each patch is rescaled to target_pixel_size um/px

  • a crop of target_patch_size x target_patch_size pixels is taken around each ST (pseudo) spot, whose coordinates are read from adata.obsm[‘spatial’]

Parameters:
  • patch_save_dir (str) – directory where the .h5 patch file will be saved

  • name (str, optional) – file will be saved as {name}.h5. Defaults to ‘patches’.

  • target_patch_size (int, optional) – target patch size in pixels (after scaling to match target_pixel_size). Defaults to 224.

  • target_pixel_size (float, optional) – target patch pixel size in um/px. Defaults to 0.5.

  • verbose (int, optional) – verbosity level. Defaults to 0.

  • dump_visualization (bool, optional) – whether to dump a visualization of the patches on top of the downscaled WSI. Defaults to True.

  • use_mask (bool, optional) – whether to take the tissue mask into account. Defaults to True.

  • threshold (float, optional) – tissue intersection threshold for a patch to be kept. Defaults to 0.15.

  • coords_only (bool, optional) – if True, only the patch coordinates are saved; if False, the patches are also saved under the img key of the .h5 file. Defaults to False.

  • qc (bool, optional) – if True, saves nb_qc_patches random patches as patch_save_dir/qc/dump_patches/patch_vis_qc_{i}_{x}_{y}.jpg, which is useful for quickly checking patch quality. Defaults to False.

  • nb_qc_patches (int, optional) – number of patches saved if qc is True. Defaults to 20.
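
The relationship between target_patch_size, target_pixel_size and the slide's own pixel size can be sketched with a little arithmetic. The helpers below are hypothetical and not part of hest. They assume that one patch side covers target_patch_size * target_pixel_size micrometres, so the crop read from the full resolution image is that length divided by the slide pixel size; the library's own rounding and centering may differ.

```python
def source_crop_size(target_patch_size: int = 224,
                     target_pixel_size: float = 0.5,
                     slide_pixel_size: float = 0.25) -> int:
    """Side length, in full resolution pixels, of the crop that becomes one patch.

    Hypothetical helper for illustration only; not part of hest.
    """
    # Physical extent of one patch side in micrometres.
    patch_size_um = target_patch_size * target_pixel_size
    # Number of native pixels covering that extent, rounded to the nearest pixel.
    return round(patch_size_um / slide_pixel_size)


def crop_box(x: float, y: float, crop_size: int) -> tuple:
    """(left, top, right, bottom) of a square crop centred on spot (x, y)."""
    half = crop_size // 2
    left, top = int(x) - half, int(y) - half
    return left, top, left + crop_size, top + crop_size
```

With the default arguments and a slide scanned at 0.25 um/px, each crop is 448 native pixels per side and is then rescaled to 224 x 224 pixels.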

HESTData.ensembl_id_to_gene() → None

Converts Ensembl gene IDs using BioMart annotations and filters out genes with no matching Ensembl ID for the current object

Parameters:

filter_na (bool) – whether to filter out genes that are not valid Ensembl IDs. Defaults to False.
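
The convert-then-filter behaviour described here can be illustrated on plain Python data. This is a minimal sketch, not the library's implementation: the function names are hypothetical and the two-entry dictionary is a toy stand-in for a BioMart annotation table.

```python
from typing import Dict, List, Optional, Tuple


def convert_ensembl_ids(ids: List[str],
                        annotations: Dict[str, str]) -> List[Optional[str]]:
    """Map each Ensembl ID to a gene name, or None when no annotation matches."""
    return [annotations.get(ensembl_id) for ensembl_id in ids]


def drop_unmatched(ids: List[str],
                   annotations: Dict[str, str]) -> List[Tuple[str, str]]:
    """Keep only (ensembl_id, gene_name) pairs that have an annotation."""
    return [(i, annotations[i]) for i in ids if i in annotations]


# Toy annotation table standing in for a BioMart export (assumption).
toy_annotations = {"ENSG00000141510": "TP53", "ENSG00000012048": "BRCA1"}
var_names = ["ENSG00000141510", "not_an_ensembl_id", "ENSG00000012048"]

names = convert_ensembl_ids(var_names, toy_annotations)
kept = drop_unmatched(var_names, toy_annotations)
```

Here the second entry has no annotation, so it maps to None in names and is absent from kept.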

static HESTData.from_paths(adata_path: str, img: str | np.ndarray | openslide.OpenSlide | CuImage, metrics_path: str, cellvit_path: str = None, tissue_contours_path: str = None) → HESTData

Read a HEST sample from disk

Parameters:
  • adata_path (str) – path to the .h5ad adata file containing the ST data. The adata object must contain a downscaled image in [‘spatial’][‘ST’][‘images’][‘downscaled_fullres’].

  • img (Union[str, np.ndarray, openslide.OpenSlide, CuImage]) – path to the full resolution image (if passed as str) or the full resolution image itself, corresponding to the ST data. OpenSlide and CuImage objects are lazily loaded; use CuImage for GPU-accelerated computation.

  • metrics_path (str) – path to the metadata dictionary containing information such as the pixel size or QC metrics attached to the sample

  • cellvit_path (str) – path to a cell segmentation file in .geojson or .parquet. Defaults to None.

  • tissue_contours_path (str) – path to a .geojson tissue contours file. Defaults to None.

Returns:

HESTData object

Return type:

HESTData
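
As a usage sketch, the arguments of from_paths can be assembled from a directory that was written by HESTData.save, using the file names listed in the description of save. The two helper functions are hypothetical, and the assumption that a saved directory can be read back this way has not been checked against the library.

```python
from pathlib import Path


def saved_sample_paths(sample_dir: str) -> dict:
    """Build from_paths keyword arguments for a directory written by save().

    Hypothetical helper; file names follow the layout documented for save().
    """
    root = Path(sample_dir)
    kwargs = {
        "adata_path": str(root / "aligned_adata.h5ad"),
        "img": str(root / "aligned_fullres_HE.tif"),
        "metrics_path": str(root / "metrics.json"),
    }
    # Tissue contours are optional, so only pass them when the file exists.
    contours = root / "tissue_contours.geojson"
    if contours.exists():
        kwargs["tissue_contours_path"] = str(contours)
    return kwargs


def load_saved_sample(sample_dir: str):
    """Load a sample with HESTData.from_paths (requires hest to be installed)."""
    # Imported lazily so that saved_sample_paths works without hest.
    from hest.HESTData import HESTData
    return HESTData.from_paths(**saved_sample_paths(sample_dir))
```

Passing img as a path string leaves it to the library to open the image.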

HESTData.get_shapes(name, coordinate_system)

HESTData.get_tissue_vis()

HESTData.load_wsi() → None

Load the full WSI in memory

HESTData.save(path: str, save_img=True, pyramidal=True, bigtiff=False, plot_pxl_size=False, save_adata=True, **kwargs)

Save a HESTData object to path as follows:
  • aligned_adata.h5ad (contains the expression of each spot, the spot locations on the fullres image, and a downscaled version of the fullres image)

  • metrics.json (contains useful metrics)

  • downscaled_fullres.jpeg (a downscaled version of the fullres image)

  • aligned_fullres_HE.tif (the full resolution image)

  • cells.geojson (cell segmentation if it exists)

  • Optional: tissue_contours.geojson (contours of the tissue segmentation if it exists)

  • Optional: tissue_seg_vis.jpg (visualization of tissue contour and holes on downscaled H&E if it exists)

Parameters:
  • path (str) – save location

  • save_img (bool) – whether to save the image at all (setting this to False can save a lot of time). Defaults to True.

  • pyramidal (bool, optional) – whether to save the full resolution image as pyramidal (this can be slow, but it is sometimes necessary for loading large images in QuPath). Defaults to True.

  • bigtiff (bool, optional) – whether to save the image as BigTIFF, which is needed when the image is larger than 4.1GB. Defaults to False.

  • save_adata (bool, optional) – whether to save the genomic data. Defaults to True.
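
One way to choose a value for bigtiff is to estimate the image size from its dimensions before saving. The sketch below is a rough heuristic and not something hest provides: it assumes uncompressed 8-bit RGB data, treats 1 GB as 1e9 bytes, and approximates a factor-2 pyramid as one third extra on top of the base level. Compression will usually make the real file smaller.

```python
def estimated_image_bytes(height: int, width: int,
                          channels: int = 3, pyramidal: bool = True) -> int:
    """Upper-bound size estimate for an uncompressed 8-bit image (assumption)."""
    base = height * width * channels
    # A factor-2 pyramid adds about 1/4 + 1/16 + ... = 1/3 of the base level.
    return int(base * 4 / 3) if pyramidal else base


def needs_bigtiff(height: int, width: int, channels: int = 3,
                  pyramidal: bool = True, limit_gb: float = 4.1) -> bool:
    """True when the estimated size exceeds the limit quoted for bigtiff."""
    return estimated_image_bytes(height, width, channels, pyramidal) > limit_gb * 1e9


# Dimensions of the full resolution image shown in the to_spatial_data example.
small = needs_bigtiff(38232, 20690)
# A hypothetical, much larger slide.
large = needs_bigtiff(80000, 60000)
```

The result can then be passed on, for example as st.save(path, bigtiff=needs_bigtiff(height, width)).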

HESTData.save_spatial_plot(save_path: str, name: str = '', key='total_counts', pl_kwargs={})

Save the spatial plot from that STObject

Parameters:
  • save_path (str) – path to a directory where the spatial plot will be saved

  • name (str) – save plot as {name}spatial_plots.png

  • key (str) – feature to plot. Default: ‘total_counts’

  • pl_kwargs (Dict) – arguments for sc.pl.spatial

HESTData.save_tissue_contours(save_dir: str, name: str) → None

HESTData.save_tissue_seg_pkl(save_dir: str, name: str) → None

Backward-compatible alias kept for old tutorials.

HESTData.save_tissue_vis(save_dir: str, name: str) → None

Save a visualization of the tissue segmentation on top of the downscaled H&E

Parameters:
  • save_dir (str) – directory where the visualization will be saved

  • name (str) – file is saved as {save_dir}/{name}_vis.jpg
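
The file naming conventions documented for save_spatial_plot and save_tissue_vis differ slightly, which the following sketch makes explicit. Both functions are hypothetical helpers that only reproduce the documented patterns; they do not call hest.

```python
import os


def spatial_plot_path(save_path: str, name: str = "") -> str:
    """Expected output of save_spatial_plot: {name}spatial_plots.png."""
    # name is a raw prefix: no separator is inserted before 'spatial_plots.png'.
    return os.path.join(save_path, f"{name}spatial_plots.png")


def tissue_vis_path(save_dir: str, name: str) -> str:
    """Expected output of save_tissue_vis: {save_dir}/{name}_vis.jpg."""
    # Here the underscore is part of the documented pattern.
    return os.path.join(save_dir, f"{name}_vis.jpg")
```

Because save_spatial_plot uses name as a bare prefix, a trailing underscore (for example name='TENX68_') has to be supplied by the caller if one is wanted.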

HESTData.segment_tissue(fast_mode=False, target_pxl_size=1, patch_size_um=512, model_name='deeplabv3_seg_v4.ckpt', batch_size=8, auto_download=True, num_workers=8, thumbnail_width=2000, method: str = 'deep', weights_dir=None, holes_are_tissue=True, verbose=True) → None | ndarray

Computes the tissue mask and stores it in the current HESTData object

Parameters:
  • fast_mode (bool, optional) – in fast mode, inference is done at 2 um/px instead of 1 um/px. Note that the inference pixel size is overridden by the target_pxl_size argument if it is not 1. Defaults to False.

  • target_pxl_size (int, optional) – patches are scaled to this pixel size in um/px for inference. Defaults to 1.

  • patch_size_um (int, optional) – patch size in um. Defaults to 512.

  • model_name (str, optional) – model name in HEST/models dir. Defaults to ‘deeplabv3_seg_v4.ckpt’.

  • batch_size (int, optional) – batch size for inference. Defaults to 8.

  • auto_download (bool, optional) – whether to download the model weights automatically if not found. Defaults to True.

  • num_workers (int, optional) – number of workers for the dataloader during inference. Defaults to 8.

  • thumbnail_width (int, optional) – width of the thumbnail on which otsu segmentation is performed; ignored if method is ‘deep’. Defaults to 2000.

  • method (str, optional) – perform deep learning based segmentation (‘deep’) or otsu based (‘otsu’). Deep-learning based segmentation will be more accurate but a GPU is recommended, ‘otsu’ is faster but less accurate. Defaults to ‘deep’.

  • weights_dir (str, optional) – directory containing the models; if None, ../models relative to the src package of hestcore is used. Defaults to None.

  • holes_are_tissue (bool, optional) – Whether to treat holes in the mask as tissue (only if method is ‘deep’). Defaults to True.

  • verbose (bool, optional) – whether to enable verbose output. Defaults to True.

Returns:

a GeoDataFrame of the tissue contours, with a tissue_id column indicating which tissue each contour belongs to.

Return type:

gpd.GeoDataFrame
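
The interaction between fast_mode and target_pxl_size stated in the parameter list can be written out as a small rule. The functions below are hypothetical and only encode that documented rule, plus the patch size in pixels that follows from patch_size_um; they are not taken from the hest source.

```python
def inference_pixel_size(fast_mode: bool = False,
                         target_pxl_size: float = 1) -> float:
    """Pixel size (um/px) used for inference, per the documented rule."""
    if target_pxl_size != 1:
        # An explicit target_pxl_size overrides fast_mode.
        return target_pxl_size
    return 2 if fast_mode else 1


def patch_size_px(patch_size_um: int = 512, pxl_size: float = 1) -> int:
    """Side length of an inference patch in pixels at the given pixel size."""
    return round(patch_size_um / pxl_size)
```

With the defaults, a patch is 512 pixels per side at 1 um/px; with fast_mode=True alone it is 256 pixels per side at 2 um/px.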

HESTData.to_spatial_data(fullres: bool = False) → SpatialData

Convert a HESTData sample to a scverse SpatialData object. Note that a large part of this function is based on spatialdata-io’s [from_legacy_anndata](https://spatialdata.scverse.org/projects/io/en/latest/generated/spatialdata_io.experimental.from_legacy_anndata.html) function with some adjustments for HESTData.

Parameters:

fullres (bool, optional) – whether to include the pyramidal full resolution whole slide image as a DataTree object, for those dimensions compatible with Image2DModel’s downsampling. Defaults to False.

Returns:

scverse SpatialData object containing the hires and lowres downsampled versions of the image and their respective coordinate systems.

Return type:

SpatialData

Example

>>> from hest import load_hest
>>> hest_data = load_hest('../hest_data', id_list=['TENX68'])
>>> st = hest_data[0]
>>> st.to_spatial_data(fullres=True)
SpatialData object
├── Images
│     ├── 'ST_downscaled_hires_image': SpatialImage[cyx] (3, 4779, 2586)
│     ├── 'ST_downscaled_lowres_image': SpatialImage[cyx] (3, 1000, 541)
│     └── 'ST_fullres_image': DataTree[cyx] (3, 38232, 20690), (3, 19116, 10345)
├── Shapes
│     └── 'locations': GeoDataFrame shape: (1657, 2) (2D shapes)
└── Tables
    └── 'table': AnnData (1657, 18085)
with coordinate systems:
    ▸ 'ST_downscaled_hires', with elements:
        ST_downscaled_hires_image (Images), locations (Shapes)
    ▸ 'ST_downscaled_lowres', with elements:
        ST_downscaled_lowres_image (Images), locations (Shapes)
    ▸ 'ST_fullres', with elements:
        ST_fullres_image (Images), locations (Shapes)
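
Once converted, the elements printed above can be reached through the images, shapes and tables mappings of the SpatialData object. The function below is a minimal sketch with a hypothetical name; it only assumes that those three attributes behave like dictionaries keyed by element name.

```python
def list_elements(sdata) -> dict:
    """Return the element names of a SpatialData-like object, grouped by kind."""
    return {
        "images": sorted(sdata.images.keys()),
        "shapes": sorted(sdata.shapes.keys()),
        "tables": sorted(sdata.tables.keys()),
    }
```

For the sample above, list_elements(st.to_spatial_data(fullres=True)) would be expected to list the three image names, 'locations' and 'table'.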