hest.get_k_genes
- hest.get_k_genes(adata_list: List[sc.AnnData], k: int, criteria: str, save_dir: str = None, min_cells_pct=0.1) → List[str]
Get the top-k genes, ranked by the given criterion, among the genes common to multiple samples. This function was used to derive the genes of interest for the HEST benchmark.
- Parameters:
adata_list (List[sc.AnnData]) – list of scanpy AnnData objects whose gene expression is available through adata.to_df()
k (int) – number of genes to return
criteria (str) – criterion used to rank the genes:
  - ‘mean’: return the k genes with the largest mean expression across samples
  - ‘var’: return the k genes with the largest expression variance across samples
save_dir (str, optional) – if not None, the genes are saved as a JSON array to this path. Defaults to None.
min_cells_pct (float) – filter out genes that are expressed in fewer than min_cells_pct% of the spots in each slide. Defaults to 0.1.
- Returns:
top-k genes according to the criteria
- Return type:
List[str]
Examples
>>> # Find genes of interest for the HEST benchmark
>>> import scanpy as sc
>>> from hest import get_k_genes
>>> ad1 = sc.read_h5ad("TENX118.h5ad")
>>> ad2 = sc.read_h5ad("TENX141.h5ad")
>>> genes = get_k_genes([ad1, ad2], k=50, criteria="var")
>>> print(len(genes))
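To make the selection logic described under Parameters more concrete, the following is a minimal pandas-only sketch of ranking common genes by mean or variance. It is not the HEST implementation: the helper name top_k_common_genes is hypothetical, the input DataFrames stand in for adata.to_df() (spots as rows, genes as columns), and two behaviours are assumptions rather than documented facts, namely that min_cells_pct is treated as a fraction of spots (0.1 meaning 10%) and that the statistics are computed over the spots of all samples pooled together.

```python
from typing import List

import pandas as pd


def top_k_common_genes(dfs: List[pd.DataFrame], k: int, criteria: str,
                       min_cells_pct: float = 0.1) -> List[str]:
    """Illustrative only: rank genes shared by all samples by mean or variance."""
    if criteria not in ("mean", "var"):
        raise ValueError(f"unknown criteria: {criteria!r}")

    kept = []
    for df in dfs:
        # Assumption: min_cells_pct is a fraction of spots (0.1 = 10%).
        frac_expressed = (df > 0).mean(axis=0)
        kept.append(df.loc[:, frac_expressed >= min_cells_pct])

    # Keep only genes that survive the filter in every sample.
    common = set(kept[0].columns)
    for df in kept[1:]:
        common &= set(df.columns)
    common = sorted(common)

    # Assumption: statistics are taken over all spots of all samples pooled.
    pooled = pd.concat([df[common] for df in kept], axis=0)
    scores = pooled.mean(axis=0) if criteria == "mean" else pooled.var(axis=0)
    return scores.sort_values(ascending=False).head(k).index.tolist()


# Toy data: rows are spots, columns are genes (hypothetical gene names).
s1 = pd.DataFrame({"A": [5, 6, 7, 8], "B": [0, 0, 0, 10],
                   "C": [1, 1, 1, 1], "D": [0, 0, 0, 0]})
s2 = pd.DataFrame({"A": [4, 6, 5, 9], "B": [0, 20, 0, 0],
                   "C": [2, 2, 2, 2], "E": [3, 3, 3, 3]})

print(top_k_common_genes([s1, s2], k=2, criteria="mean"))  # ['A', 'B']
print(top_k_common_genes([s1, s2], k=2, criteria="var"))   # ['B', 'A']
```

In this toy data, genes D and E are each missing or unexpressed in one sample, so they never reach the ranking step; raising min_cells_pct to 0.5 would also drop gene B, which is expressed in only a quarter of the spots of each sample.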