{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Step-by-step instructions to interact HEST-1k \n", "\n", "This tutorial will guide you to:\n", "\n", "- Read HEST data\n", "- Visualized the spots over a downscaled version of the WSI\n", "- Saving HESTData into Pyramidal Tif and anndata\n", "\n", "\n", "This tutorial assumes that the user has already downloaded HEST-1k (in its entirety or partially). " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Read HEST" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hest import iter_hest\n", "\n", "# Iterate through a subset of hest\n", "for st in iter_hest('../hest_data', id_list=['TENX105']):\n", "\n", " # ST (adata):\n", " adata = st.adata\n", " print('\\n* Scanpy adata:')\n", " print(adata)\n", "\n", " # WSI:\n", " wsi = st.wsi\n", " print('\\n* WSI:')\n", " print(wsi)\n", " \n", " # Shapes:\n", " shapes = st.shapes\n", " print('\\n* Shapes:')\n", " print(shapes)\n", " \n", " # Tissue segmentation\n", " tissue_contours = st.tissue_contours\n", " print('\\n* Tissue contours:')\n", " print(tissue_contours)\n", " \n", " # Conversion to SpatialData\n", " sdata = st.to_spatial_data()\n", " print('\\n* SpatialData conversion:')\n", " print(sdata)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`st.adata` is a spatial scanpy object containing the following:\n", "#### Observations (st.adata.obs)\n", "- `in_tissue`: Indicator if the observation is within the tissue (`in_tissue` comes from the initial Visium/Xenium run and might not be accurate, prefer the segmentation obtained by st.segment_tissue() instead).\n", "- `pxl_col_in_fullres`: Pixel column position of the patch/spot centroid in the full resolution image.\n", "- `pxl_row_in_fullres`: Pixel row position of the patch/spot centroid in the full resolution image.\n", "- `array_col`: Patch/spot column position in the array.\n", "- `array_row`: Patch/spot row position in the array.\n", "- `n_counts`: Number of counts for each observation.\n", "- `n_genes_by_counts`: Number of genes detected by counts in each observation.\n", "- `log1p_n_genes_by_counts`: Log-transformed number of genes detected by counts.\n", "- `total_counts`: Total counts per observation.\n", "- `log1p_total_counts`: Log-transformed total counts.\n", "- `pct_counts_in_top_50_genes`: Percentage of counts in the top 50 genes.\n", "- `pct_counts_in_top_100_genes`: Percentage of counts in the top 100 genes.\n", "- `pct_counts_in_top_200_genes`: Percentage of counts in the top 200 genes.\n", "- `pct_counts_in_top_500_genes`: Percentage of counts in the top 500 genes.\n", "- `total_counts_mito`: Total mitochondrial counts per observation. (note that this field might not be accurate)\n", "- `log1p_total_counts_mito`: Log-transformed total mitochondrial counts. (note that this field might not be accurate)\n", "- `pct_counts_mito`: Percentage of counts that are mitochondrial. (note that this field might not be accurate)\n", "\n", "#### Variables (st.adata.var)\n", "- `n_cells_by_counts`: Number of cells detected by counts for each variable.\n", "- `mean_counts`: Mean counts per variable.\n", "- `log1p_mean_counts`: Log-transformed mean counts.\n", "- `pct_dropout_by_counts`: Percentage of dropout events by counts.\n", "- `total_counts`: Total counts per variable.\n", "- `log1p_total_counts`: Log-transformed total counts.\n", "- `mito`: Indicator if the gene is mitochondrial. (note that this field might not be accurate)\n", "\n", "#### Unstructured (st.adata.uns)\n", "- `spatial`: Contains a downscaled version of the full resolution image in `st.adata.uns['spatial']['ST']['images']['downscaled_fullres']`\n", "\n", "#### Observation-wise Multidimensional (st.adata.obsm)\n", "- `spatial`: Pixel coordinates of spots/patches centroids on the full resolution image. (first column is x axis, second column is y axis)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualizing the spots over a downscaled version of the WSI" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# visualize the spots over a downscaled version of the full resolution image\n", "save_dir = '.'\n", "st.save_spatial_plot(save_dir)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Saving to pyramidal tiff and h5\n", "Save `HESTData` object to `.tiff` + expression `.h5ad` and a metadata file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Warning saving a large image to pyramidal tiff (>1GB) can be slow on a hard drive.\n", "st.save(save_dir, pyramidal=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tissue segmentation\n", "\n", "We integrated 2 tissue segmentation methods:\n", "\n", "- Image processing-based using Otsu thresholding \n", "- Deep learning-based using a fine-tuned DeepLabV3 ResNet50\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "save_dir = '.'\n", "\n", "name = 'tissue_seg_otsu'\n", "st.segment_tissue(method='otsu') \n", "st.save_tissue_seg_pkl(save_dir, name)\n", "\n", "name = 'tissue_seg_deep'\n", "st.segment_tissue(method='deep') \n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing the Patching/pooling size\n", "\n", "#### Patching\n", "You can change the size of patches around the spots with `dump_patches`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# directory where the patch .h5 will be saved\n", "patch_save_dir = './processed'\n", "new_patch_size = 224\n", "\n", "\n", "st.dump_patches(\n", " patch_save_dir,\n", " name='demo',\n", " target_patch_size=new_patch_size,\n", " target_pixel_size=0.5 # pixel size of the patches in um/px after rescaling\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing the Pooling size\n", "\n", "You can change the pooling size of Xenium/Visium HD samples stored in `HESTData.adata` with the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hest.readers import pool_transcripts_xenium\n", "from hest import iter_hest\n", "\n", "new_spot_size = 200\n", "new_patch_size = 224\n", "patch_save_dir = './processed'\n", "\n", "# Iterate through a subset of hest\n", "for st in iter_hest('../hest_data', id_list=['TENX105'], load_transcripts=True,):\n", " print(st.transcript_df)\n", "\n", " st.adata = pool_transcripts_xenium(\n", " st.transcript_df, # Feel free to convert st.transcript_df to a Dask DataFrame if you are working with limited RAM.\n", " st.pixel_size, \n", " key_x='he_x',\n", " key_y='he_y',\n", " spot_size_um=new_spot_size\n", " )\n", "\n", "\n", " # We change the spots, so we need to re-extract the patches\n", " st.dump_patches(\n", " patch_save_dir,\n", " name='demo',\n", " target_patch_size=new_patch_size, # target patch size in 224\n", " target_pixel_size=0.5 # pixel size of the patches in um/px after rescaling\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Batch effect visualization" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from hest.batch_effect import filter_hest_stromal_housekeeping, get_silhouette_score, plot_umap\n", "\n", "meta_df = pd.read_csv('../assets/HEST_v1_2_0.csv')\n", "\n", "\n", "meta_df = meta_df[\n", " (meta_df['st_technology'] == 'Visium') & \n", " (meta_df['disease_state'] == 'Cancer') & \n", " (meta_df['oncotree_code'] == 'IDC') & \n", " (meta_df['species'] == 'Homo sapiens')\n", "]\n", "\n", "# We filter spots, such that:\n", "# - we only keep the most stable housekeeping genes (based on https://housekeeping.unicamp.br/?download)\n", "# - we only keep the spots under the stroma (based on CellViT segmentation)\n", "adata_list = filter_hest_stromal_housekeeping(meta_df, hest_dir='../hest_data')\n", "\n", "labels = meta_df['id'].values\n", "score = get_silhouette_score(adata_list, labels=labels)\n", "# (-1: strong overlap between clusters, can be indicative of a small batch effect, 1: poor overlap between clusters, can be indicative of a strong batch effect)\n", "print(f'Silhouette score is {score}')\n", "\n", "\n", "plot_umap(adata_list, labels, 'batch_whole_tissue.png', umap_kwargs={'min_dist': 0.6})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading transcripts (for Xenium only)\n", "The transcript dataframe contains the following columns:\n", "- `cell_id` (default Xenium Explorer flag): id of cell containing this transcript\n", "- `he_x`, `he_y`: x, y coordinate or transcript on the HE WSI in pixels (aligned with a non-rigid transformation)\n", "- `feature_name`: name of transcript\n", "- `qv` (default Xenium Explorer flag): Phred-scaled quality value (Q-Score) estimating the probability of incorrect call" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from hest import iter_hest\n", "\n", "# Iterate through a subset of hest\n", "for st in iter_hest('../hest_data', id_list=['TENX105'], load_transcripts=True):\n", " print(st.transcript_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Warning:** The segmentation contained in 'xenium_seg/' (accessed with st.get_shapes('xenium_cell', 'he') or st.get_shapes('xenium_nucleus', 'he')) is different than the CellViT segmentation. It was obtained by the Xenium instrument directly by segmenting the DAPI stained image, the DAPI segmented cells were then aligned to H&E with a non rigid transform." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Linking transcripts with cells\n", "Each transcript is associated with a cell id (derived by the Xenium analyzer directly). See column `cell_id` in the transcript dataframe." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "transcripts = st.transcript_df\n", "shapes = st.get_shapes('xenium_cell', 'he').shapes\n", "\n", "cell_id = 1\n", "\n", "# Print the transcripts associated with a particular cell\n", "print(transcripts[transcripts['cell_id'] == cell_id])\n", "print(shapes[shapes.index == cell_id])\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "HEST", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 2 }