{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step-by-step instructions to assemble HEST data \n",
    "\n",
    "\n",
    "### I. Visium reader\n",
    "This tutorial will guide you to convert a legacy Visium sample into a HEST-compatible object. \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download Visium sample from NCBI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "# As an example, download the files from the following NCBI study:\n",
    "# https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSM6215674)\n",
    "\n",
    "mkdir downloads\n",
    "cd downloads\n",
    "wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6215nnn/GSM6215674/suppl/GSM6215674%5FS13.png.gz\n",
    "gunzip GSM6215674_S13.png.gz\n",
    "wget https://ftp.ncbi.nlm.nih.gov/geo/samples/GSM6215nnn/GSM6215674/suppl/GSM6215674%5FS13%5Ffiltered%5Ffeature%5Fbc%5Fmatrix.h5"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create HESTData object from the image and count matrix \n",
    "\n",
    "The library performs:\n",
    "\n",
    "- Creation of AnnData object\n",
    "- Creation of OpenSlide object \n",
    "- Automatic fiducial detection for spot alignment \n",
    "\n",
    "**Troubleshooting:**\n",
    "\n",
    "If you encounter: `SystemError: ffi_prep_closure(): bad user_data (it seems that the version of the libffi library`. Attempt: `pip install --force-reinstall --no-binary :all: cffi`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hest import VisiumReader\n",
    "\n",
    "fullres_img_path = 'downloads/GSM6215674_S13.png'\n",
    "bc_matrix_path = 'downloads/GSM6215674_S13_filtered_feature_bc_matrix.h5'\n",
    "\n",
    "st = VisiumReader().read(\n",
    "    fullres_img_path, # path to a full res image\n",
    "    bc_matrix_path, # path to filtered_feature_bc_matrix.h5\n",
    "    save_autoalign=True # pass this argument to visualize the fiducial autodetection\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st.save(path='processed')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can also visualize an overlay of the aligned spots on the downscaled WSI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st.save_spatial_plot(save_path='processed')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### When should I provide an alignment file and when should I use autoalignment?\n",
    "\n",
    "#### Step 1: check if a tissue_positions.csv/tissue_position_list.csv already provides a correct alignment\n",
    "\n",
    "In most cases, if a `spatial/` folder containing a `tissue_positions.csv` or `tissue_position_list.csv` is available you don't need any autoalignment/alignment file.\n",
    "\n",
    "Try the following:\n",
    "\n",
    "`st = VisiumReader().read(fullres_img_path, bc_matric_path, spatial_coord_path=spatial_path)`, where `spatial_path` is contains `tissue_positions.csv` or `tissue_position_list.csv`. You can manually inspect the alignment by saving a visualization plot that takes the full resolution image, downscale it and overlays it with the spots (using `st.save_spatial_plot(save_dir)`). If the alignment is incorrect, try step 2.\n",
    "\n",
    "#### Step 2: check if a .json alignment file is provided\n",
    "\n",
    "If a `.json` alignment file is available, try: `VisiumReader().read(fullres_img_path, bc_matric_path, spatial_coord_path=spatial_path, alignment_file_path=align_path)`. You can also omit the `spatial_coord_path` as `VisiumReader().read(fullres_img_path, bc_matric_path, alignment_file_path=align_path)`\n",
    "\n",
    "#### Step 3: attempt auto-alignment\n",
    "\n",
    "If at least 3 corner fiducials are not cropped out and are reasonably visible, you can attempt an autoalignment with `VisiumReader().read(fullres_img_path, bc_matric_path`. (if no spatial folder and no alignment_file_path is provided, it will attempt autoalignment by default, you can also force auto-alignment by passing `autoalign='always'`). \n",
    "\n",
    "### Examples:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hest import VisiumReader\n",
    "\n",
    "fullres_img_path = 'my_path/image.tif'\n",
    "bc_matrix_path = 'my_path/filtered_bc_matrix.h5'\n",
    "spatial_coord_path = 'my_path/spatial'\n",
    "alignment_file_path = 'my_path/alignment.txt'\n",
    "\n",
    "st = VisiumReader().read(\n",
    "    fullres_img_path, # path to a full res image\n",
    "    bc_matrix_path, # path to filtered_feature_bc_matrix.h5\n",
    "    spatial_coord_path=spatial_coord_path # path to a space ranger spatial/ folder containing either a tissue_positions.csv or tissue_position_list.csv\n",
    ")\n",
    "\n",
    "# if no spatial folder is provided, but you have an alignment file\n",
    "st = VisiumReader().read(\n",
    "    fullres_img_path, # path to a full res image\n",
    "    bc_matrix_path, # path to filtered_feature_bc_matrix.h5\n",
    "    alignment_file_path=alignment_file_path # path to a .json alignment file\n",
    ")\n",
    "\n",
    "# if both the alignment file and the spatial folder are missing, attempt auto-alignment\n",
    "st = VisiumReader().read(\n",
    "    fullres_img_path, # path to a full res image\n",
    "    bc_matrix_path, # path to filtered_feature_bc_matrix.h5\n",
    ")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Auto read\n",
    "Given that `visium_dir` contains a full resolution image and all the necessary Visium files such as the `filtered_bc_matrix.h5` and the `spatial` folder, `VisiumReader.auto_read(path)` should be able to automatically read the sample. Prefer `read` for a more fine grain control.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hest import VisiumReader\n",
    "\n",
    "visium_dir = ...\n",
    "\n",
    "# attempt autoread\n",
    "st = VisiumReader().auto_read(visium_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### II. Xenium reader"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Download Xenium sample from 10x genomics website\n",
    "\n",
    "Download the following xenium files and place them in the same directory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "https://www.10xgenomics.com/datasets/human-skin-data-xenium-human-multi-tissue-and-cancer-panel-1-standard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "mkdir downloads\n",
    "cd downloads\n",
    "wget https://cf.10xgenomics.com/samples/xenium/1.9.0/Xenium_V1_hSkin_nondiseased_section_1_FFPE/Xenium_V1_hSkin_nondiseased_section_1_FFPE_outs.zip\n",
    "unzip Xenium_V1_hSkin_nondiseased_section_1_FFPE_outs.zip -d Xenium_V1_hSkin_nondiseased_section_1_FFPE_outs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "cd downloads/Xenium_V1_hSkin_nondiseased_section_1_FFPE_outs\n",
    "wget https://cf.10xgenomics.com/samples/xenium/1.9.0/Xenium_V1_hSkin_nondiseased_section_1_FFPE/Xenium_V1_hSkin_nondiseased_section_1_FFPE_he_image.ome.tif\n",
    "wget https://cf.10xgenomics.com/samples/xenium/1.9.0/Xenium_V1_hSkin_nondiseased_section_1_FFPE/Xenium_V1_hSkin_nondiseased_section_1_FFPE_he_imagealignment.csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hest import XeniumReader\n",
    "\n",
    "xenium_folder_path = 'downloads/Xenium_V1_hSkin_nondiseased_section_1_FFPE_outs'\n",
    "\n",
    "st = XeniumReader().auto_read(xenium_folder_path)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Working with larger than RAM Xenium samples (Xenium 5k)\n",
    "We support larger than RAM transcripts pooling powered by dask. Dask will automatically chunk the data such that it never has to hold the entire transcript dataframe in memory.\n",
    "\n",
    "Dask will attempt to process one partition per thread. To avoid loading large partitions on systems having a low amount of RAM, we advise using a high number of partitions (>100), as well as a single worker and a low number of threads (<4 depending on RAM available).\n",
    "\n",
    "> Note: feel free to open the dask dashboard to visualize workers, partitions and resources (usually on http://localhost:8787)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dask.distributed import LocalCluster, Client\n",
    "\n",
    "cluster = LocalCluster(\n",
    "    \"127.0.0.1:8786\",\n",
    "    n_workers=1, # increase depending on RAM available\n",
    "    memory_limit=\"20GB\", # dask will kill the worker if this is exceeded\n",
    "    threads_per_worker=1, # increase depending on RAM available\n",
    ")\n",
    "client = Client(cluster)\n",
    "print('dashboard is available at: ', client.dashboard_link)\n",
    "\n",
    "st = XeniumReader().auto_read(\n",
    "    xenium_folder_path, \n",
    "    use_dask=True, \n",
    "    nb_partitions=100\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st.save_spatial_plot('./')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st.segment_tissue()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We then compute patches centered around pseudo-visium transcript bins.\n",
    "\n",
    "> Warning: note that patches might be larger than transcript bins. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st.dump_patches('patches', target_patch_size=224, target_pixel_size=0.5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "st.save('save', save_img=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding new samples to HEST\n",
    "This section explains how to format new samples for HuggingFace\n",
    "\n",
    "### I. Sample preparation\n",
    "\n",
    "#### 1. Download raw datasets in structured folders\n",
    "\n",
    "Create the following folder structure:\n",
    "```python\n",
    "my_data/\n",
    "    xenium/\n",
    "        dataset_name_1/\n",
    "            subseries1/\n",
    "                sample_1/\n",
    "                    ...\n",
    "                sample_2/\n",
    "                    ...\n",
    "                ...\n",
    "        dataset_name_2/\n",
    "            sample_1/\n",
    "                ...\n",
    "    visium-hd/\n",
    "        ...\n",
    "    visium/\n",
    "        ...\n",
    "```\n",
    "\n",
    "Then, download corresponding files for xenium, visium, visium-hd... See reader examples above or refer to the doc for the list of required files.\n",
    "\n",
    "#### 2. Add metadata rows to CSV\n",
    "Fill columns in the sample CSV, specifically: dataset_title, subseries should match with the folder structure.\n",
    "\n",
    "### II. Sample processing\n",
    "\n",
    "#### 1. Process raw samples\n",
    "\n",
    "Set the path to your sample CSV in `tutorials/scripts/1_process_raw_samples.py`, also modify the memory_limit, we recommend setting dask memory limit to at least 20GB for Xenium 5k, eventhough 30GB is safer.\n",
    "\n",
    "Process and save raw samples by launching `python tutorials/scripts/1_process_raw_samples.py`.\n",
    "\n",
    "Processing Xenium 5k will take time, you can monitor progress on the dask dashboard (usually at `localhost:8787` after launch).\n",
    "\n",
    "\n",
    "#### 2. Check Xenium DAPI/HE alignment and realign if necessary\n",
    "\n",
    "By default, the Xenium platform uses a single affine transform for the whole WSI. For some samples, the resulting alignment might be unsatisfactory.\n",
    "In order to visualize the affine alignment, either:\n",
    "- open the resulting `.geojson` files (in `processed/`) in QuPath (might not work with QuPath >=0.6) \\\n",
    "\\\n",
    "or\n",
    "<br/> \n",
    "<br/> \n",
    "- launch `tutorials/scripts/2_check_xenium_alignment.py`. \n",
    "\n",
    "#### 3. Micro-align DAPI to HE with Valis\n",
    "\n",
    "If the alignment from steps (1-2) is unsatisfactory, we highly recommend using Valis non-rigid micro-alignment for sub-cellular alignment precision. This is crucial for correctly mapping transcripts and cells to H&E.\n",
    "\n",
    "\n",
    "##### a. Register DAPI to HE with Valis\n",
    "\n",
    "We provide a modified Valis version for simplified pythonic use and improved precision, please check-out the original repository [here](https://github.com/MathOnco/valis).\n",
    "\n",
    "Open [3a_microalign_xenium.py](./scripts/3a_microalign_xenium.py), in this example, we use `morphology_focus_0000.ome.tif` as the DAPI slide, feel free to use `morphology_focus.ome.tif`.\n",
    "\n",
    "Monitor alignment quality at `results/{sample_name}/{date}/overlaps/`, check both `_rigid_overlap.png` and `micro_reg.png`.\n",
    "\n",
    "\n",
    "##### b. Warp transcripts, cells and nuclei using Valis and Dask\n",
    "\n",
    "See [3b_microalign_xenium.py](./scripts/3b_microalign_xenium.py) to warp transcripts, nuclei and cells from DAPI to H&E.\n",
    "\n",
    "Then re-run step 3.b, in order to compare quality.\n",
    "\n",
    "#### 4. Segment with CellViT\n",
    "\n",
    "See [4_segment_cellvit.py](./scripts/4_segment_cellvit.py) to segment with CellViT.\n",
    "\n",
    "\n",
    "#### 5. Copy processed files to HEST_results/\n",
    "\n",
    "See [5_copy_processed.py](./scripts/5_copy_processed.py) to copy files to the structure expected by huggingface.\n",
    "\n",
    "\n",
    "#### 6. Generate a new HEST_vX_Y_Z.csv sheet\n",
    "\n",
    "See [6_generate_new_meta.py](./scripts/6_generate_new_meta.py) to create a new HEST_vX_Y_Z.csv. Once created, copy it to the `HEST_results/` folder.\n",
    "\n",
    "\n",
    "#### 7. Upload to HuggingFace\n",
    "\n",
    "See [7_upload_huggingface.py](./scripts/7_upload_huggingface.py) to upload to HuggingFace via a PR.\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "hest",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.16"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}