hest.HESTData.HESTData
- class hest.HESTData.HESTData(adata: sc.AnnData, img: np.ndarray | openslide.OpenSlide | CuImage | str, pixel_size: float, meta: Dict = {}, tissue_contours: gpd.GeoDataFrame = None, shapes: List[LazyShapes] = [])
Object representing a (pooled) Spatial Transcriptomics sample along with a full resolution H&E image and associated metadata
Attributes table
shapes | List of LazyShapes, i.e. cells, nuclei. |
tissue_contours | GeoDataFrame of tissue contour polygons; also contains a tissue_id column. |
Methods table
dump_patches | Dump H&E patches centered around ST spots to a .h5 file. |
ensembl_id_to_gene | Converts Ensembl gene IDs using BioMart annotations and filters out genes with no matching Ensembl ID for the current object. |
from_paths | Read a HEST sample from disk. |
load_wsi | Load the full WSI in memory. |
save | Save a HESTData object to path. |
save_spatial_plot | Save the spatial plot from that STObject. |
save_tissue_seg_pkl | Backward-compatible alias kept for old tutorials. |
save_tissue_vis | Save a visualization of the tissue segmentation on top of the downscaled H&E. |
segment_tissue | Computes the tissue mask and stores it in the current HESTData object. |
to_spatial_data | Convert a HESTData sample to a scverse SpatialData object. |
Attributes
- HESTData.shapes: List[LazyShapes] = []
List of LazyShapes, i.e. cells, nuclei
- HESTData.tissue_contours
GeoDataFrame of tissue contour polygons; also contains a tissue_id column
Methods
- HESTData.dump_patches(patch_save_dir: str, name: str = 'patches', target_patch_size: int = 224, target_pixel_size: float = 0.5, verbose=0, dump_visualization=True, use_mask=True, threshold=0.15, coords_only=False, qc=False, nb_qc_patches=20)
Dump H&E patches centered around ST spots to a .h5 file.
- Patches are computed such that:
each patch is rescaled to target_pixel_size um/px
a crop of target_patch_size x target_patch_size pixels is taken around each ST (pseudo) spot, whose coordinates are read from adata.obsm['spatial']
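The two rules above imply a fixed relation between the saved patch and the region it covers on the full resolution image. The sketch below only works through that arithmetic and is not the library's implementation. The helper name source_crop_size and the argument src_pixel_size (the source image pixel size in um/px, i.e. the pixel_size given to HESTData) are assumptions made for illustration.

```python
def source_crop_size(target_patch_size: int = 224,
                     target_pixel_size: float = 0.5,
                     src_pixel_size: float = 0.25) -> int:
    """Side length, in source pixels, of the region that becomes one patch.

    Hypothetical helper: it illustrates the size relation only.
    """
    # Physical extent covered by one patch, in micrometres.
    extent_um = target_patch_size * target_pixel_size
    # The same extent expressed in pixels of the full resolution image.
    return round(extent_um / src_pixel_size)


# A 224 px patch at 0.5 um/px spans 112 um, i.e. 448 px on a 0.25 um/px scan.
print(source_crop_size())  # 448
```

With the default arguments, a source image scanned at 0.5 um/px needs no rescaling, since the crop is already 224 px wide.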
- Parameters:
patch_save_dir (str) – directory where the .h5 patch file will be saved
name (str, optional) – file will be saved as {name}.h5. Defaults to ‘patches’.
target_patch_size (int, optional) – target patch size in pixels (after scaling to match target_pixel_size). Defaults to 224.
target_pixel_size (float, optional) – target patch pixel size in um/px. Defaults to 0.5.
verbose (int, optional) – verbosity level. Defaults to 0.
dump_visualization (bool, optional) – whether to dump a visualization of the patches on top of the downscaled WSI. Defaults to True.
use_mask (bool, optional) – whether to take the tissue mask into account. Defaults to True.
threshold (float, optional) – tissue intersection threshold for a patch to be kept. Defaults to 0.15.
coords_only (bool, optional) – if True, save only the patch coordinates; if False, also save the patches under the img key of the .h5 file. Defaults to False.
qc (bool, optional) – if True, save nb_qc_patches random patches as patch_save_dir/qc/dump_patches/patch_vis_qc_{i}_{x}_{y}.jpg (useful to quickly check the quality of the patches). Defaults to False.
nb_qc_patches (int, optional) – number of patches to save if qc is True. Defaults to 20.
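To make the use_mask and threshold parameters concrete, here is a minimal sketch of a tissue intersection filter. It is an illustration under stated assumptions, not the library's code: the library stores tissue as contour polygons, whereas this sketch uses a small binary grid to stay dependency-free. The helpers tissue_fraction and keep_patch are hypothetical, and whether the real comparison is inclusive (>=) is an assumption.

```python
from typing import List


def tissue_fraction(mask: List[List[int]], x: int, y: int, size: int) -> float:
    """Fraction of the size x size window at (x, y) that lies on tissue.

    mask is a binary grid (1 = tissue) indexed as mask[row][col].
    """
    window = [row[x:x + size] for row in mask[y:y + size]]
    n_pixels = sum(len(row) for row in window)
    return sum(map(sum, window)) / n_pixels if n_pixels else 0.0


def keep_patch(mask: List[List[int]], x: int, y: int, size: int,
               threshold: float = 0.15) -> bool:
    # Assumption: a patch is kept when its tissue fraction reaches the threshold.
    return tissue_fraction(mask, x, y, size) >= threshold


mask = [
    [0, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
print(keep_patch(mask, 0, 0, 2))  # False: the window contains no tissue
print(keep_patch(mask, 2, 2, 2))  # True: the window is entirely tissue
```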
- HESTData.ensembl_id_to_gene() None
Converts Ensembl gene IDs using BioMart annotations and filters out genes with no matching Ensembl ID for the current object
- Parameters:
filter_na (bool) – whether to filter out genes that are not valid Ensembl IDs. Defaults to False.
- static HESTData.from_paths(adata_path: str, img: str | np.ndarray | openslide.OpenSlide | CuImage, metrics_path: str, cellvit_path: str = None, tissue_contours_path: str = None) HESTData
Read a HEST sample from disk
- Parameters:
adata_path (str) – path to the .h5ad adata file containing the ST data. The adata object must contain a downscaled image in ['spatial']['ST']['images']['downscaled_fullres']
img (Union[str, np.ndarray, openslide.OpenSlide, CuImage]) – path to a full resolution image (if passed as str) or the full resolution image corresponding to the ST data. OpenSlide and CuImage objects are lazily loaded; use CuImage for GPU-accelerated computation
metrics_path (str) – path to the metadata dictionary containing information such as the pixel size or QC metrics attached to that sample
cellvit_path (str) – path to a cell segmentation file in .geojson or .parquet. Defaults to None.
tissue_contours_path (str) – path to a .geojson tissue contours file. Defaults to None.
- Returns:
HESTData object
- Return type:
HESTData
- HESTData.get_shapes(name, coordinate_system)
- HESTData.get_tissue_vis()
- HESTData.load_wsi() None
Load the full WSI in memory
- HESTData.save(path: str, save_img=True, pyramidal=True, bigtiff=False, plot_pxl_size=False, save_adata=True, **kwargs)
- Save a HESTData object to path as follows:
aligned_adata.h5ad (contains the expression for each spot + its location on the fullres image + a downscaled version of the fullres image)
metrics.json (contains useful metrics)
downscaled_fullres.jpeg (a downscaled version of the fullres image)
aligned_fullres_HE.tif (the full resolution image)
cells.geojson (cell segmentation if it exists)
Optional: tissue_contours.geojson (contours of the tissue segmentation if it exists)
Optional: tissue_seg_vis.jpg (visualization of tissue contour and holes on downscaled H&E if it exists)
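As a quick way to inspect a directory produced by save, the following sketch reports which of the files listed above are present. The helper name list_saved_files and the example path are hypothetical; only the file names come from the list above. Files may legitimately be missing, for example when save_img or save_adata is False or when no segmentation exists.

```python
from pathlib import Path
from typing import Dict

# File names taken from the list above.
SAVED_FILES = (
    "aligned_adata.h5ad",
    "metrics.json",
    "downscaled_fullres.jpeg",
    "aligned_fullres_HE.tif",
    "cells.geojson",
    "tissue_contours.geojson",
    "tissue_seg_vis.jpg",
)


def list_saved_files(path: str) -> Dict[str, bool]:
    """Map each expected file name to whether it exists under path."""
    root = Path(path)
    return {name: (root / name).is_file() for name in SAVED_FILES}


# Hypothetical save location; a missing directory simply reports all files as absent.
for name, present in list_saved_files("./TENX68_saved").items():
    print(f"{name}: {'found' if present else 'missing'}")
```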
- Parameters:
path (str) – save location
save_img (bool) – whether to save the image at all (setting it to False can save a lot of time). Defaults to True.
pyramidal (bool, optional) – whether to save the full resolution image as pyramidal (can be slow to save, but is sometimes necessary for loading large images in QuPath). Defaults to True.
bigtiff (bool, optional) – whether to save as BigTIFF, which is needed when the image is more than 4.1GB. Defaults to False.
save_adata (bool, optional) – whether to save the genomic data. Defaults to True.
- HESTData.save_spatial_plot(save_path: str, name: str = '', key='total_counts', pl_kwargs={})
Save the spatial plot from that STObject
- Parameters:
save_path (str) – path to a directory where the spatial plot will be saved
name (str) – save plot as {name}spatial_plots.png
key (str) – feature to plot. Default: ‘total_counts’
pl_kwargs (Dict) – arguments for sc.pl.spatial
- HESTData.save_tissue_contours(save_dir: str, name: str) None
- HESTData.save_tissue_seg_pkl(save_dir: str, name: str) None
Backward-compatible alias kept for old tutorials.
- HESTData.save_tissue_vis(save_dir: str, name: str) None
Save a visualization of the tissue segmentation on top of the downscaled H&E
- Parameters:
save_dir (str) – directory where the visualization will be saved
name (str) – file is saved as {save_dir}/{name}_vis.jpg
- HESTData.segment_tissue(fast_mode=False, target_pxl_size=1, patch_size_um=512, model_name='deeplabv3_seg_v4.ckpt', batch_size=8, auto_download=True, num_workers=8, thumbnail_width=2000, method: str = 'deep', weights_dir=None, holes_are_tissue=True, verbose=True) None | ndarray
Computes the tissue mask and stores it in the current HESTData object
- Parameters:
fast_mode (bool, optional) – in fast mode, inference is done at 2 um/px instead of 1 um/px. Note that the inference pixel size is overridden by the target_pxl_size argument if it is != 1. Defaults to False.
target_pxl_size (int, optional) – patches are scaled to this pixel size in um/px for inference. Defaults to 1.
patch_size_um (int, optional) – patch size in um. Defaults to 512.
model_name (str, optional) – model name in HEST/models dir. Defaults to ‘deeplabv3_seg_v4.ckpt’.
batch_size (int, optional) – batch size for inference. Defaults to 8.
auto_download (bool, optional) – whether to download the model weights automatically if they are not found. Defaults to True.
num_workers (int, optional) – number of workers for the dataloader during inference. Defaults to 8.
thumbnail_width (int, optional) – size at which Otsu segmentation is performed, ignored if method is ‘deep’. Defaults to 2000.
method (str, optional) – perform deep-learning-based segmentation (‘deep’) or Otsu-based segmentation (‘otsu’). Deep-learning-based segmentation is more accurate but a GPU is recommended; ‘otsu’ is faster but less accurate. Defaults to ‘deep’.
weights_dir (str, optional) – directory containing the models; if None, ../models relative to the src package of hestcore is used. Defaults to None.
holes_are_tissue (bool, optional) – Whether to treat holes in the mask as tissue (only if method is ‘deep’). Defaults to True.
verbose (bool, optional) – verbose level. Defaults to True.
- Returns:
a GeoDataFrame of the tissue contours, with a tissue_id column indicating which tissue each contour belongs to.
- Return type:
gpd.GeoDataFrame
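The interaction between fast_mode and target_pxl_size described above can be sketched as follows. This only restates the documented rule and is not taken from the library's source; the helper names inference_pixel_size and patch_size_px are hypothetical. The second helper is plain unit conversion from patch_size_um to pixels.

```python
def inference_pixel_size(fast_mode: bool = False, target_pxl_size: float = 1) -> float:
    """Pixel size (um/px) used for inference, following the documented rule."""
    # An explicit target_pxl_size other than 1 overrides fast_mode.
    if target_pxl_size != 1:
        return target_pxl_size
    # Otherwise fast mode runs at 2 um/px and the default mode at 1 um/px.
    return 2 if fast_mode else 1


def patch_size_px(patch_size_um: float = 512, pxl_size: float = 1) -> int:
    """Patch side length in pixels for a given physical size and pixel size."""
    return round(patch_size_um / pxl_size)


print(inference_pixel_size())                                   # 1
print(inference_pixel_size(fast_mode=True))                     # 2
print(inference_pixel_size(fast_mode=True, target_pxl_size=4))  # 4
print(patch_size_px(512, inference_pixel_size(fast_mode=True))) # 256
```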
- HESTData.to_spatial_data(fullres: bool = False) SpatialData
Convert a HESTData sample to a scverse SpatialData object. Note that a large part of this function is based on spatialdata-io’s [from_legacy_anndata](https://spatialdata.scverse.org/projects/io/en/latest/generated/spatialdata_io.experimental.from_legacy_anndata.html) function, with some adjustments for HESTData.
- Parameters:
fullres (bool, optional) – whether to include the pyramidal full resolution whole slide image as a DataTree object for those dimensions compatible with Image2DModel’s downsampling. Defaults to False.
- Returns:
scverse SpatialData object containing the hires and lowres downsampled versions of the image and their respective coordinate systems.
- Return type:
SpatialData
Example
>>> from hest import load_hest
>>> hest_data = load_hest('../hest_data', id_list=['TENX68'])
>>> st = hest_data[0]
>>> st.to_spatial_data(fullres=True)
SpatialData object
├── Images
│     ├── 'ST_downscaled_hires_image': SpatialImage[cyx] (3, 4779, 2586)
│     ├── 'ST_downscaled_lowres_image': SpatialImage[cyx] (3, 1000, 541)
│     └── 'ST_fullres_image': DataTree[cyx] (3, 38232, 20690), (3, 19116, 10345)
├── Shapes
│     └── 'locations': GeoDataFrame shape: (1657, 2) (2D shapes)
└── Tables
      └── 'table': AnnData (1657, 18085)
with coordinate systems:
    ▸ 'ST_downscaled_hires', with elements:
        ST_downscaled_hires_image (Images), locations (Shapes)
    ▸ 'ST_downscaled_lowres', with elements:
        ST_downscaled_lowres_image (Images), locations (Shapes)
    ▸ 'ST_fullres', with elements:
        ST_fullres_image (Images), locations (Shapes)