{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!-- ## Step-by-step instructions to run HEST-Benchmark\n",
    "\n",
    "This tutorial will guide you to:\n",
    "\n",
    "- **Reproduce** HEST-Benchmark results provided in the paper (Random Forest regression and Ridge regression models)\n",
    "- Benchmark your **own** model -->\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "Each task involves predicting the expression levels of the 50 most variable genes from 112×112 μm H&E-stained image patches centered on each spatial transcriptomics spot. The tasks are formulated as multivariate regression problems.\n",
    "\n",
    "| **Task ID** | **Oncotree** | **Number of Samples** | **Technology** | **Sample ID** |\n",
    "|-------------|--------------|-----------------------|----------------|---------------|\n",
    "| Task 1      | IDC          | 4                     | Xenium         | TENX95, TENX99, NCBI783, NCBI785      |\n",
    "| Task 2      | PRAD         | 23                    | Visium         | MEND139~MEND162      |\n",
    "| Task 3      | PAAD         | 3                     | Xenium         | TENX116, TENX126, TENX140      |\n",
    "| Task 4      | SKCM         | 2                     | Xenium         | TENX115, TENX117      |\n",
    "| Task 5      | COAD         | 4                     | Xenium         | TENX111, TENX147, TENX148, TENX149      |\n",
    "| Task 6      | READ         | 4                     | Visium         | ZEN36, ZEN40, ZEN48, ZEN49      |\n",
    "| Task 7      | ccRCC        | 24                    | Visium         | INT1~INT24      |\n",
    "| Task 8      | LUAD         | 2                     | Xenium         | TENX118, TENX141      |\n",
    "| Task 9     | IDC-LymphNode | 4                    | Visium         | NCBI681, NCBI682, NCBI683, NCBI684     |\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reproducing HEST-Benchmark results\n",
    "\n",
    "- Ensure that HEST is properly installed (see the Installation section of the README).\n",
    "- Preprocessed patches, h5ad files, and gene targets are downloaded automatically.\n",
    "- Publicly available patch encoders are downloaded automatically.\n",
    "\n",
    "**Note:** Not all public foundation models can be shared due to licensing issues. We provide model-specific instructions that users can follow to access weights:\n",
    "\n",
    "#### CONCH installation (model + weights request)\n",
    "\n",
    "1. Request access to the model weights from the Hugging Face model page [here](https://huggingface.co/MahmoodLab/CONCH).\n",
    "\n",
    "2. Install the CONCH PyTorch model:\n",
    "\n",
    "```\n",
    "pip install git+https://github.com/Mahmoodlab/CONCH.git\n",
    "```\n",
    "\n",
    "#### UNI weights request\n",
    "\n",
    "Request access to the model weights from the Hugging Face model page [here](https://huggingface.co/MahmoodLab/UNI).\n",
    "\n",
    "#### GigaPath weights request\n",
    "\n",
    "Request access to the model weights from the Hugging Face model page [here](https://huggingface.co/prov-gigapath/prov-gigapath).\n",
    "\n",
    "#### Remedis (weights only)\n",
    "\n",
    "1. Request access to the model weights from the PhysioNet project page [here](https://physionet.org/content/medical-ai-research-foundation/1.0.0/).\n",
    "2. Download the model weights (`path-152x2-remedis-m_torch.pth`) and place them at `{weights_root}/fm_v1/remedis/path-152x2-remedis-m_torch.pth`, where `weights_root` is specified in the config. Alternatively, edit the Remedis path directly in `{PATH_TO_HEST}/src/hest/bench/local_ckpts.json`.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Launching HEST-Benchmark via the CLI"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
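    "# Preload libffi; this path is system-specific, so adjust or remove it for your machine\n",
    "export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libffi.so.7\n",
    "# Run the benchmark with the provided config file\n",
    "python ../src/hest/bench/benchmark.py --config ../bench_config/bench_config.yaml"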
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Benchmarking your own model with HEST-Benchmark\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hest.bench import benchmark\n",
    "import torch\n",
    "\n",
    "# Replace each `...` placeholder below before running this cell\n",
    "PATH_TO_CONFIG = ...  # path to `bench_config.yaml`\n",
    "model = ...  # PyTorch model (torch.nn.Module)\n",
    "model_transforms = ...  # transforms to apply during inference (torchvision.transforms.Compose)\n",
    "precision = torch.float32\n",
    "\n",
    "benchmark(\n",
    "    model,\n",
    "    model_transforms,\n",
    "    precision,\n",
    "    config=PATH_TO_CONFIG,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reproducing: finding genes of interest for HEST-Benchmark"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import scanpy as sc\n",
    "from hest import get_k_genes\n",
    "\n",
    "# TODO add full paths to samples of interest here\n",
    "sample_paths = ['TENX118.h5ad', 'TENX141.h5ad']\n",
    "\n",
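    "# Load each sample's AnnData object\n",
    "ad_list = [sc.read_h5ad(sample_path) for sample_path in sample_paths]\n",
    "# Select the 50 most variable genes (criteria=\"var\") across the loaded samples\n",
    "genes = get_k_genes(ad_list, k=50, criteria=\"var\", min_cells_pct=0.1)\n",
    "print(len(genes))"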
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "hest",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}