API Reference#
Pipeline#
Run the complete flashscenic pipeline. |
- flashscenic.pipeline.run_flashscenic(exp_matrix: numpy.ndarray, gene_names: List[str], species: str = 'human', *, datasource: str = 'scenic', version: str = 'v10', cache_dir: Optional[str] = None, tf_list_path: Optional[str] = None, ranking_db_paths: Optional[List[str]] = None, motif_annotation_path: Optional[str] = None, grn_n_steps: int = 1000, grn_sparsity_threshold: float = 1.5, module_k: int = 50, module_percentile_thresholds: tuple = (75,), module_top_n_per_target: tuple = (5, 10, 50), module_min_targets: int = 20, module_min_fraction: float = None, module_include_tf: bool = True, pruning_rank_threshold: int = 5000, pruning_auc_threshold: float = 0.05, pruning_nes_threshold: float = 3.0, pruning_min_genes: int = 0, pruning_merge_strategy: str = 'union', annotation_motif_similarity_fdr: float = 0.001, annotation_orthologous_identity: float = 0.0, aucell_k: Optional[int] = None, aucell_auc_threshold: float = 0.05, aucell_batch_size: int = 32, device: str = 'cuda', seed: Optional[int] = None, verbose: bool = True) Dict[source]#
Run the complete flashscenic pipeline.
Performs GRN inference (RegDiffusion), module filtering, cisTarget pruning, and AUCell scoring. Stops before dimensionality reduction / visualization.
Parameters
exp_matrix : np.ndarray
    Expression matrix of shape (n_cells, n_genes). Should be log-transformed and (optionally) subset to highly variable genes.
gene_names : list of str
    Gene names corresponding to columns of exp_matrix. Length must equal exp_matrix.shape[1].
species : str, default='human'
    Species for TF list and ranking databases. One of 'human', 'mouse', 'drosophila'.
datasource : str, default='scenic'
    Data source for resource downloads.
version : str, default='v10'
    Motif collection version.
cache_dir : str or None
    Cache directory for downloaded resources. Defaults to ./flashscenic_data/.
tf_list_path : str or None
    Path to a custom TF list file. Overrides downloaded TF list.
ranking_db_paths : list of str or None
    Paths to custom ranking database .feather files. Overrides downloaded databases.
motif_annotation_path : str or None
    Path to a custom motif annotation .tbl file. Overrides downloaded annotation.
grn_n_steps : int, default=1000
    Number of training steps for RegDiffusion.
grn_sparsity_threshold : float, default=1.5
    Edges below this weight are zeroed. Higher = sparser network.
module_k : int, default=50
    Top target genes per TF for module creation.
module_percentile_thresholds : tuple of int, default=(75,)
    Percentile thresholds for percentile-based modules. Each value creates a module type keeping targets above that global weight percentile. Empty tuple to skip.
module_top_n_per_target : tuple of int, default=(5, 10, 50)
    N values for top-N-per-target modules. For each N, finds each target gene's top N strongest regulators, then regroups by TF. Empty tuple to skip.
module_min_targets : int, default=20
    Minimum target genes for a TF module to be retained.
module_min_fraction : float or None, default=None
    Minimum fraction of targets required. A value of 0.8 matches pySCENIC's 80% rule.
module_include_tf : bool, default=True
    Include TF itself in its own module.
pruning_rank_threshold : int, default=5000
    Maximum rank for cisTarget recovery curve.
pruning_auc_threshold : float, default=0.05
    Fraction of genome for cisTarget AUC.
pruning_nes_threshold : float, default=3.0
    NES threshold for motif enrichment.
pruning_min_genes : int, default=0
    Minimum genes per regulon after pruning.
pruning_merge_strategy : str, default='union'
    How to merge regulons from multiple databases ('union' or 'best').
annotation_motif_similarity_fdr : float, default=0.001
    Maximum FDR for motif similarity filtering.
annotation_orthologous_identity : float, default=0.0
    Minimum orthologous identity threshold.
aucell_k : int or None
    Top k targets for AUCell scoring. Defaults to module_k if None.
aucell_auc_threshold : float, default=0.05
    Fraction of genome for AUCell AUC calculation.
aucell_batch_size : int, default=32
    Batch size for AUCell computation.
device : str, default='cuda'
    PyTorch device ('cuda' or 'cpu').
seed : int or None
    Random seed for reproducibility.
verbose : bool, default=True
    Print progress messages.
Returns
dict
    - 'auc_scores': np.ndarray of shape (n_cells, n_regulons)
    - 'regulon_names': list of regulon name strings
    - 'regulons': list of regulon dicts from cisTarget
    - 'regulon_adj': np.ndarray of shape (n_regulons, n_genes)
    - 'parameters': dict of all parameters used
Raises
ImportError
    If regdiffusion is not installed.
ValueError
    If no TFs survive filtering or no regulons survive pruning.
Examples
import flashscenic as fs
result = fs.run_flashscenic(exp_matrix, gene_names, species='human')
auc_scores = result['auc_scores']  # (n_cells, n_regulons)
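The input requirements above (log-transformed matrix, one gene name per column) can be checked before calling the pipeline. The following is a minimal sketch, not part of flashscenic: `prepare_inputs` is a hypothetical helper, and the use of `log1p` as the log transform is an assumption.

```python
import numpy as np

def prepare_inputs(counts, gene_names):
    """Hypothetical helper: log-transform counts and validate the gene axis."""
    # Assumption: log1p is an acceptable log transform for the pipeline input.
    exp_matrix = np.log1p(np.asarray(counts, dtype=np.float32))
    if exp_matrix.ndim != 2:
        raise ValueError("expected a 2-D (n_cells, n_genes) matrix")
    if len(gene_names) != exp_matrix.shape[1]:
        raise ValueError("gene_names must have one entry per column")
    return exp_matrix, list(gene_names)

# Toy data: 2 cells, 3 genes.
counts = np.array([[0, 3, 10], [5, 0, 1]])
exp_matrix, genes = prepare_inputs(counts, ["GATA1", "SPI1", "TAL1"])
```

The returned pair can then be passed as `exp_matrix` and `gene_names` to `run_flashscenic`.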
Data Download#
Download cistarget resource files required for the flashscenic pipeline. |
|
List available resource sets, optionally filtered. |
|
Paths to downloaded resource files. |
- flashscenic.data.download_data(species: str = 'human', version: str = 'v10', datasource: str = 'scenic', cache_dir: Optional[str] = None, force: bool = False) flashscenic.data.DownloadedResources[source]#
Download cistarget resource files required for the flashscenic pipeline.
Parameters
species : str, default='human'
    Species to download resources for. One of 'human', 'mouse', 'drosophila'.
version : str, default='v10'
    Motif collection version. One of 'v10' (recommended), 'v9'.
datasource : str, default='scenic'
    Data source identifier. Currently only 'scenic' (Aertslab) is supported. The architecture supports adding alternative sources.
cache_dir : str or None, default=None
    Local directory to store downloaded files. If None, defaults to ./flashscenic_data/.
force : bool, default=False
    If True, re-download files even if they already exist locally.
Returns
DownloadedResources
    Dataclass with Path objects pointing to each downloaded file.
Raises
ValueError
    If the species/version/datasource combination is not recognized.
ConnectionError
    If download fails after retries.
- flashscenic.data.list_available_resources(datasource: Optional[str] = None, species: Optional[str] = None, version: Optional[str] = None) List[flashscenic.data.ResourceSet][source]#
List available resource sets, optionally filtered.
Parameters
datasource : str or None
    Filter by data source (e.g., 'scenic').
species : str or None
    Filter by species (e.g., 'human', 'mouse', 'drosophila').
version : str or None
    Filter by version (e.g., 'v10', 'v9').
Returns
list of ResourceSet Matching resource sets.
AUCell#
- flashscenic.aucell.get_aucell(exp_array, adj_array, k=50, auc_threshold=0.05, device='cuda', batch_size=32, seed=None)[source]#
Fully vectorized pySCENIC-equivalent AUCell calculation.
Uses the actual number of target genes per TF/regulon (not fixed k) to match pySCENIC behavior. When adj_array has weighted entries, uses weights for AUC.
Args:
    exp_array (np.ndarray): Expression matrix (n_cells x n_genes)
    adj_array (np.ndarray): Adjacency matrix (n_tfs x n_genes), can be binary or weighted
    k (int): Max target genes per TF (pads shorter regulons). Default is 50.
    auc_threshold (float): Fraction of genome for AUC calculation. Default is 0.05.
    device (str): Device, 'cpu' or 'cuda'. Default is 'cuda'.
    batch_size (int): Batch size for processing cells. Default is 32.
    seed (int): Random seed for tie-breaking. Default is None.
Returns:
    np.ndarray: AUCell scores matrix of shape (n_cells, n_TFs)
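To illustrate the idea behind AUCell scoring, here is a simplified, unweighted NumPy sketch. It is not the flashscenic implementation: `simple_aucell` is a hypothetical name, ties are broken by gene order rather than randomly, and the normalization (`cutoff * n_targets`) is a simplifying assumption that keeps scores in [0, 1].

```python
import numpy as np

def simple_aucell(exp, adj, auc_threshold=0.05):
    """Simplified AUCell: area under the recovery curve in the top-ranked genes."""
    exp = np.asarray(exp, dtype=float)
    adj = np.asarray(adj)
    n_cells, n_genes = exp.shape
    cutoff = max(1, int(round(auc_threshold * n_genes)))

    # Rank genes within each cell; rank 0 is the most highly expressed gene.
    order = np.argsort(-exp, axis=1, kind="stable")
    ranks = np.empty_like(order)
    rows = np.arange(n_cells)[:, None]
    ranks[rows, order] = np.arange(n_genes)[None, :]

    scores = np.zeros((n_cells, adj.shape[0]))
    for r, row in enumerate(adj):
        targets = np.flatnonzero(row)
        if targets.size == 0:
            continue
        # A target at rank t < cutoff adds (cutoff - t) to the area.
        area = np.clip(cutoff - ranks[:, targets], 0, None).sum(axis=1)
        scores[:, r] = area / (cutoff * targets.size)
    return scores

# One cell, 20 genes; gene 19 is the most expressed, gene 0 the least.
exp = np.arange(20, dtype=float)[None, :]
adj = np.zeros((2, 20))
adj[0, 19] = 1  # regulon 0 targets the top gene
adj[1, 0] = 1   # regulon 1 targets the bottom gene
scores = simple_aucell(exp, adj, auc_threshold=0.25)
```

A regulon whose targets sit at the top of a cell's ranking scores close to 1, while one whose targets fall outside the top fraction scores 0.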
cisTarget#
GPU-accelerated cisTarget pruning with support for single or multiple databases. |
|
Compute recovery curves and AUCs for all motifs given module genes. |
|
Compute Normalized Enrichment Scores (NES) from AUC values. |
|
Perform cisTarget pruning for a single module. |
|
Lightweight motif annotation storage without pandas. |
|
Filter enriched motifs by annotations (CPU implementation). |
- class CisTargetPruner(rank_threshold: int = 5000, auc_threshold: float = 0.05, nes_threshold: float = 3.0, device: str = 'cuda', min_genes_per_regulon: int = 0, merge_strategy: str = 'union')#
GPU-accelerated cisTarget pruning with support for single or multiple databases.
Example (single database):
```python
pruner = CisTargetPruner(device='cuda')
pruner.load_database('rankings.feather')
pruner.load_annotations('motifs.tbl', filter_for_annotation=True)

# Prune with tensor input
result = pruner.prune(module_gene_indices)
```
Example (multiple databases):
```python
pruner = CisTargetPruner(device='cuda')
pruner.load_database(['db_500bp.feather', 'db_10kb.feather'])
pruner.load_annotations('motifs.tbl')

# Prune modules across all databases
regulon_info = pruner.prune_modules(modules, tf_names, gene_names)
```
Initialization
- load_database(paths: Union[str, List[str]], database_names: Optional[Union[str, List[str]]] = None)#
Load ranking database(s) from feather file(s).
Args:
    paths: Path to .feather ranking database, or list of paths for multiple databases
    database_names: Optional name(s) for database(s) (defaults to filename(s))
- load_annotations(annotation_file: str, filter_for_annotation: bool = True, motif_similarity_fdr: float = 0.001, orthologous_identity_threshold: float = 0.0)#
Load motif annotations and enable filtering.
Args:
    annotation_file: Path to motif annotation TSV file
    filter_for_annotation: If True, filter enriched motifs to keep only those with annotations
    motif_similarity_fdr: Maximum FDR threshold (default: 0.001)
    orthologous_identity_threshold: Minimum orthologous identity (default: 0.0)
- load_from_tensor(rankings: flashscenic.cistarget.ArrayLike, motif_names: Optional[List[str]] = None, gene_names: Optional[List[str]] = None)#
Load database from tensor/array directly.
Args:
    rankings: (n_motifs, n_genes) ranking matrix
    motif_names: Optional list of motif names
    gene_names: Optional list of gene names
- genes_to_indices(genes: List[str]) torch.Tensor#
Convert gene names to indices tensor.
- prune(module_gene_indices: flashscenic.cistarget.ArrayLike, weights: Optional[flashscenic.cistarget.ArrayLike] = None, tf_name: Optional[str] = None) Dict[str, torch.Tensor]#
Prune a single module (single database mode only).
Args:
    module_gene_indices: (n_module_genes,) indices of genes in module
    weights: Optional (n_module_genes,) gene weights
    tf_name: TF name for TF-specific annotation filtering (matching pySCENIC behavior). If None, keeps motifs with any annotation.
Returns: Dict with pruning results (all tensors)
- prune_batch(modules: List[torch.Tensor], weights_list: Optional[List[torch.Tensor]] = None) List[Dict[str, torch.Tensor]]#
Prune multiple modules.
Args:
    modules: List of (n_genes_i,) tensors with gene indices
    weights_list: Optional list of weight tensors
Returns: List of pruning result dicts
- get_enriched_motif_names(result: Dict[str, torch.Tensor]) List[str]#
Get names of enriched motifs from pruning result.
- get_leading_edge_genes(result: Dict[str, torch.Tensor], module_gene_indices: torch.Tensor) List[List[str]]#
Get leading edge gene names for each enriched motif.
Args:
    result: Pruning result dict
    module_gene_indices: Original module gene indices
Returns: List of gene name lists, one per enriched motif
- prune_modules(modules: List[torch.Tensor], tf_names: List[str], gene_names: List[str], weights_list: Optional[List[torch.Tensor]] = None) List[Dict]#
Prune modules across all databases and merge results (multi-database mode only).
Args:
    modules: List of (n_genes_i,) tensors with gene indices for each TF module
    tf_names: List of TF names corresponding to modules
    gene_names: List of all gene names
    weights_list: Optional list of weight tensors for each module
Returns: List of regulon dictionaries with keys: name, tf, motif, n_genes, genes, context, nes, auc
- _merge_regulons(regulons: List[Dict]) List[Dict]#
Merge regulons from multiple databases.
For the same TF+motif combination:
- If merge_strategy='union': keep all (they may have different genes from different DBs)
- If merge_strategy='best': keep the one with highest NES
Note: pyscenic uses the union strategy; it merges genes from all databases for the same TF+motif combination.
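The two merge strategies described above can be sketched in plain Python. This is an illustration only: `merge_regulons_sketch` is a hypothetical stand-in for the private method, and it assumes each regulon dict carries the `tf`, `motif`, and `nes` keys listed for `prune_modules`.

```python
def merge_regulons_sketch(regulons, strategy="union"):
    """Illustrative merge of regulon dicts coming from several databases."""
    if strategy == "union":
        # Keep every entry; the same TF+motif may contribute different genes per DB.
        return list(regulons)
    if strategy == "best":
        best = {}
        for reg in regulons:
            key = (reg["tf"], reg["motif"])
            # Keep only the highest-NES entry for each TF+motif pair.
            if key not in best or reg["nes"] > best[key]["nes"]:
                best[key] = reg
        return list(best.values())
    raise ValueError(f"unknown merge strategy: {strategy!r}")

# Toy regulons: the same TF+motif found in two databases, plus one other TF.
regs = [
    {"tf": "GATA1", "motif": "m1", "nes": 3.2, "genes": ["A", "B"]},
    {"tf": "GATA1", "motif": "m1", "nes": 4.1, "genes": ["B", "C"]},
    {"tf": "SPI1", "motif": "m2", "nes": 3.5, "genes": ["D"]},
]
merged_union = merge_regulons_sketch(regs, "union")
merged_best = merge_regulons_sketch(regs, "best")
```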
- _merge_regulons_by_tf(regulons: List[Dict]) List[Dict]#
Merge regulons by TF, matching pyscenic’s df2regulons behavior.
pyscenic groups by (TF, Type) and uses Regulon.union to merge all motifs for each TF into a single regulon. This function implements the same logic.
Args: regulons: List of regulon dictionaries
Returns: Merged regulons (one per TF)
- clear_gpu_memory()#
Release GPU memory.
- flashscenic.cistarget.compute_recovery_aucs(rankings: torch.Tensor, module_gene_indices: torch.Tensor, rank_threshold: int, auc_threshold: float, weights: Optional[torch.Tensor] = None) Tuple[torch.Tensor, torch.Tensor][source]#
Compute recovery curves and AUCs for all motifs given module genes.
Vectorized implementation - processes all motifs in parallel.
Args:
    rankings: (n_motifs, n_genes) - rank of each gene for each motif (0-indexed)
    module_gene_indices: (n_module_genes,) - indices of genes in the module
    rank_threshold: Maximum rank to consider for recovery curve
    auc_threshold: Fraction of genome for AUC calculation
    weights: (n_module_genes,) - optional weights for weighted recovery
Returns:
    rccs: (n_motifs, rank_threshold) - recovery curves
    aucs: (n_motifs,) - AUC values
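As a rough illustration of what a recovery curve and its AUC represent, the sketch below computes unweighted curves with NumPy. It is a loop-based simplification under stated assumptions, not the vectorized flashscenic code: `recovery_curves_sketch` is a hypothetical name, and normalizing the AUC by `cutoff * n_module_genes` is an assumption made for readability.

```python
import numpy as np

def recovery_curves_sketch(rankings, module_gene_indices, rank_threshold, auc_threshold):
    """Unweighted recovery curves and AUCs for every motif (illustrative)."""
    rankings = np.asarray(rankings)
    n_motifs, n_genes = rankings.shape
    module_ranks = rankings[:, module_gene_indices]  # (n_motifs, n_module_genes)

    rccs = np.zeros((n_motifs, rank_threshold))
    for m in range(n_motifs):
        # Count module genes at each rank below the threshold, then accumulate.
        hits = module_ranks[m][module_ranks[m] < rank_threshold]
        rccs[m] = np.cumsum(np.bincount(hits, minlength=rank_threshold))

    # The AUC only covers the top fraction of the genome.
    cutoff = min(rank_threshold, max(1, int(round(auc_threshold * n_genes))))
    aucs = rccs[:, :cutoff].sum(axis=1) / (cutoff * len(module_gene_indices))
    return rccs, aucs

# Two motifs over 10 genes: motif 0 ranks gene i at rank i, motif 1 reverses that.
rankings = np.stack([np.arange(10), np.arange(10)[::-1]])
rccs, aucs = recovery_curves_sketch(rankings, [0, 1], rank_threshold=5, auc_threshold=0.5)
```

Motif 0 ranks both module genes at the very top, so its curve rises immediately and its AUC is high; motif 1 ranks them last and recovers nothing within the threshold.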
- flashscenic.cistarget.compute_nes(aucs: torch.Tensor) torch.Tensor[source]#
Compute Normalized Enrichment Scores (NES) from AUC values.
NES = (AUC - mean(AUC)) / std(AUC)
Uses population std (ddof=0) to match pySCENIC.
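The NES formula can be reproduced in a few lines of NumPy. This is a sketch of the stated formula, not the library function; `nes_from_aucs` is a hypothetical name, and comparing with `>=` against the threshold is an assumption.

```python
import numpy as np

def nes_from_aucs(aucs):
    """Z-score the AUCs across motifs using the population standard deviation."""
    aucs = np.asarray(aucs, dtype=float)
    # ddof=0 is NumPy's default; torch.std defaults to the sample std instead,
    # so a torch version would need to request the uncorrected estimate.
    return (aucs - aucs.mean()) / aucs.std(ddof=0)

aucs = np.array([0.01, 0.02, 0.03, 0.04, 0.10])
nes = nes_from_aucs(aucs)
enriched = nes >= 3.0  # assumption: threshold comparison is inclusive
```

With only five motifs no z-score can reach 3.0, which is why real ranking databases with thousands of motifs are needed for the default `nes_threshold` to be meaningful.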
- flashscenic.cistarget.prune_single_module(rankings: torch.Tensor, module_gene_indices: torch.Tensor, rank_threshold: int = 5000, auc_threshold: float = 0.05, nes_threshold: float = 3.0, weights: Optional[torch.Tensor] = None) Dict[str, torch.Tensor][source]#
Perform cisTarget pruning for a single module.
All inputs and outputs are tensors on the same device.
Args:
    rankings: (n_motifs, n_genes) - ranking database tensor
    module_gene_indices: (n_module_genes,) - gene indices for this module
    rank_threshold: Maximum rank for recovery curve
    auc_threshold: Fraction of genome for AUC
    nes_threshold: NES threshold for enrichment
    weights: Optional (n_module_genes,) gene weights
Returns:
    Dict with keys:
    - enriched_mask: (n_motifs,) bool - which motifs are enriched
    - nes: (n_motifs,) - NES scores
    - aucs: (n_motifs,) - AUC scores
    - rccs: (n_motifs, rank_threshold) - recovery curves
    - leading_edge_masks: (n_enriched, n_module_genes) - leading edge for each enriched motif
    - rank_at_max: (n_enriched,) - rank at max for each enriched motif
- class MotifAnnotation#
Lightweight motif annotation storage without pandas.
Stores motif annotations in a dictionary for fast lookup. Matches pyscenic’s annotation filtering behavior.
Initialization
- classmethod load_from_file(fname: str, motif_similarity_fdr: float = 0.001, orthologous_identity_threshold: float = 0.0, column_names: Optional[Tuple[str, ...]] = None) flashscenic.cistarget.MotifAnnotation#
Load motif annotations from a motif2TF snapshot file.
Args:
    fname: Path to TSV annotation file
    motif_similarity_fdr: Maximum FDR threshold (default: 0.001)
    orthologous_identity_threshold: Minimum orthologous identity (default: 0.0)
    column_names: Optional tuple of column names to use. If None, reads from header.
Returns: MotifAnnotation instance
- has_annotation(motif_id: str, tf_name: Optional[str] = None) bool#
Check if a motif has annotation.
Args:
    motif_id: Motif ID
    tf_name: Optional TF name (if provided, checks (TF, motif) pair)
Returns: True if annotation exists
- get_annotation(motif_id: str, tf_name: Optional[str] = None) Optional[Dict]#
Get annotation for a motif.
Args:
    motif_id: Motif ID
    tf_name: Optional TF name
Returns: Annotation dict or None
- flashscenic.cistarget.filter_by_annotations(result: Dict[str, torch.Tensor], motif_names: List[str], motif_annotations: Optional[flashscenic.cistarget.MotifAnnotation], filter_for_annotation: bool = True, tf_name: Optional[str] = None) Dict[str, torch.Tensor][source]#
Filter enriched motifs by annotations (CPU implementation).
Matches pyscenic behavior: filters enriched motifs to keep only those annotated for the specific TF of the module being pruned.
Args:
    result: Pruning result dict with 'enriched_mask', 'nes', 'aucs', etc.
    motif_names: List of motif names (from database)
    motif_annotations: MotifAnnotation object (None = no filtering)
    filter_for_annotation: If True, only keep motifs with annotations
    tf_name: TF name to filter for. If provided, only keep motifs annotated for this specific TF (matching pySCENIC behavior). If None, keep motifs with any annotation.
Returns: Filtered result dict (all tensors remain on original device)
Module Utilities#
Select top-k targets per TF from adjacency matrix. |
|
Select targets above threshold from adjacency matrix. |
|
Select top-N regulators per target gene, then regroup by TF. |
|
Filter out TFs with fewer than min_targets or below min_fraction of targets. |
|
Filter out TFs where less than min_fraction of targets can be mapped to a reference. |
|
Select targets using Gaussian mixture model to separate signal from noise. |
|
Select targets using knee/elbow detection on the sorted edge weight curve. |
|
Get indices of non-zero targets for each TF. |
|
Convert weighted adjacency matrix to binary (0/1). |
|
Convert tensor to numpy array. |
- flashscenic.modules.select_topk_targets(adj: flashscenic.modules.ArrayLike, k: int = 50, include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') torch.Tensor[source]#
Select top-k targets per TF from adjacency matrix.
Parameters
adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights
k : int, default=50
    Number of top targets to select per TF
include_tf : bool, default=True
    Include TF itself in its module (sets diagonal to 1 if tf_indices provided)
tf_indices : array-like, optional
    Index of each TF in the gene list. Required if include_tf=True and TFs are part of the gene set.
device : str, default='cuda'
    Device for computation
Returns
torch.Tensor
    Filtered adjacency matrix with only top-k targets per TF. Shape: (n_tfs, n_genes)
Example
adj = torch.rand(100, 5000)  # 100 TFs, 5000 genes
filtered = select_topk_targets(adj, k=50)
# Each row now has at most 50 non-zero values
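The row-wise top-k selection can be illustrated on the CPU with NumPy. This is a sketch of the idea only (`topk_per_row` is a hypothetical name); it ignores the `include_tf` handling and does not use torch.

```python
import numpy as np

def topk_per_row(adj, k):
    """Keep the k largest weights in each row (TF) and zero the rest."""
    adj = np.asarray(adj, dtype=float)
    k = min(k, adj.shape[1])
    # argpartition on the negated matrix puts the k largest entries first.
    idx = np.argpartition(-adj, k - 1, axis=1)[:, :k]
    out = np.zeros_like(adj)
    rows = np.arange(adj.shape[0])[:, None]
    out[rows, idx] = adj[rows, idx]
    return out

adj = np.array([[0.1, 0.9, 0.5, 0.3],
                [0.7, 0.2, 0.8, 0.4]])
top2 = topk_per_row(adj, k=2)
```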
- flashscenic.modules.select_threshold_targets(adj: flashscenic.modules.ArrayLike, threshold: float = 0.0, percentile: Optional[float] = None, include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') torch.Tensor[source]#
Select targets above threshold from adjacency matrix.
Parameters
adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights
threshold : float, default=0.0
    Absolute threshold (edges below this become 0)
percentile : float, optional
    If provided, use this percentile of non-zero weights as threshold (overrides threshold parameter). Value between 0-100.
include_tf : bool, default=True
    Include TF itself in its module
tf_indices : array-like, optional
    Index of each TF in the gene list
device : str, default='cuda'
    Device for computation
Returns
torch.Tensor
    Filtered adjacency matrix
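The percentile option derives the cutoff from the non-zero weights only. A minimal NumPy sketch of that behavior follows; `threshold_by_percentile` is a hypothetical name, and keeping edges exactly equal to the cutoff is an assumption.

```python
import numpy as np

def threshold_by_percentile(adj, percentile):
    """Zero every edge below the given percentile of the non-zero weights."""
    adj = np.asarray(adj, dtype=float)
    nonzero = adj[adj != 0]
    if nonzero.size == 0:
        return np.zeros_like(adj), 0.0
    cutoff = float(np.percentile(nonzero, percentile))
    # Edges below the cutoff become 0; edges at or above it are kept.
    return np.where(adj >= cutoff, adj, 0.0), cutoff

adj = np.array([[0.0, 1.0, 2.0],
                [3.0, 4.0, 0.0]])
filtered, cutoff = threshold_by_percentile(adj, 50)
```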
- flashscenic.modules.select_top_n_per_target(adj: flashscenic.modules.ArrayLike, n: int = 5, include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') torch.Tensor[source]#
Select top-N regulators per target gene, then regroup by TF.
For each target gene, finds its N strongest regulators (TFs). The result is regrouped back into the standard (n_tfs, n_genes) layout. This is the inverted view used by pySCENIC’s “top N per target” module type.
Parameters
adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights
n : int, default=5
    Number of top regulators to select per target gene
include_tf : bool, default=True
    Include TF itself in its module (sets diagonal to 1 if tf_indices provided)
tf_indices : array-like, optional
    Index of each TF in the gene list. Required if include_tf=True and TFs are part of the gene set.
device : str, default='cuda'
    Device for computation
Returns
torch.Tensor
    Filtered adjacency matrix with only top-N-per-target entries. Shape: (n_tfs, n_genes)
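The "top N per target" view works column-wise rather than row-wise: each target gene keeps its strongest regulators, and the result stays in the (n_tfs, n_genes) layout. The NumPy sketch below shows this under the same caveats as before (`top_n_per_target` is a hypothetical name, and `include_tf` is not handled).

```python
import numpy as np

def top_n_per_target(adj, n):
    """Keep, for every target gene (column), its n strongest regulators (rows)."""
    adj = np.asarray(adj, dtype=float)
    n = min(n, adj.shape[0])
    # Row indices of the n largest weights in each column.
    top_rows = np.argpartition(-adj, n - 1, axis=0)[:n, :]
    out = np.zeros_like(adj)
    cols = np.arange(adj.shape[1])[None, :]
    out[top_rows, cols] = adj[top_rows, cols]
    return out

adj = np.array([[0.9, 0.1, 0.5],
                [0.2, 0.8, 0.6],
                [0.4, 0.3, 0.7]])
top1 = top_n_per_target(adj, n=1)
top2 = top_n_per_target(adj, n=2)
```

Reading the output by rows gives each TF's module, which may be empty for a TF that is never among the top regulators of any gene.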
- flashscenic.modules.filter_by_min_targets(adj: flashscenic.modules.ArrayLike, min_targets: int = 20, min_fraction: Optional[float] = None, device: str = 'cuda') Tuple[torch.Tensor, torch.Tensor][source]#
Filter out TFs with fewer than min_targets or below min_fraction of targets.
This function supports both absolute count filtering (like pySCENIC’s min_genes=20) and percentage-based filtering (like pySCENIC’s 80% mapping requirement).
Parameters
adj : array-like
    Adjacency matrix (n_tfs x n_genes)
min_targets : int, default=20
    Minimum number of non-zero targets required. Set to 0 to disable absolute count filtering.
min_fraction : float or None, default=None
    Minimum fraction of total genes that must be non-zero targets. Value between 0.0 and 1.0. A value of 0.8 matches pySCENIC's behavior of skipping modules where less than 80% of genes can be mapped. Set to None to disable fraction-based filtering.
device : str, default='cuda'
    Device for computation
Returns
Tuple[torch.Tensor, torch.Tensor]
    - Filtered adjacency matrix (n_valid_tfs x n_genes)
    - Boolean mask indicating which TFs were kept
Examples
adj = torch.rand(100, 5000) > 0.5  # Random binary adjacency
# Filter by absolute count (default pySCENIC behavior)
filtered, mask = filter_by_min_targets(adj, min_targets=20)
# Filter by percentage (pySCENIC's 80% rule)
filtered, mask = filter_by_min_targets(adj, min_targets=0, min_fraction=0.8)
# Combine both filters
filtered, mask = filter_by_min_targets(adj, min_targets=20, min_fraction=0.8)
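How the two criteria combine into a single keep-mask can be sketched with NumPy. `filter_min_targets_sketch` is a hypothetical name, and treating both comparisons as inclusive is an assumption.

```python
import numpy as np

def filter_min_targets_sketch(adj, min_targets=20, min_fraction=None):
    """Drop TF rows with too few non-zero targets (count and optional fraction)."""
    adj = np.asarray(adj)
    n_targets = np.count_nonzero(adj, axis=1)
    mask = n_targets >= min_targets
    if min_fraction is not None:
        # Fraction is measured against the total number of genes (columns).
        mask &= (n_targets / adj.shape[1]) >= min_fraction
    return adj[mask], mask

adj = np.array([[1, 1, 1, 1, 1],   # 5 targets
                [1, 1, 0, 0, 0],   # 2 targets
                [0, 0, 0, 0, 0]])  # no targets
kept_count, mask_count = filter_min_targets_sketch(adj, min_targets=2)
kept_both, mask_both = filter_min_targets_sketch(adj, min_targets=2, min_fraction=0.8)
```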
- flashscenic.modules.filter_by_mapped_fraction(adj: flashscenic.modules.ArrayLike, reference_indices: Optional[flashscenic.modules.ArrayLike] = None, min_fraction: float = 0.8, device: str = 'cuda') Tuple[torch.Tensor, torch.Tensor][source]#
Filter out TFs where less than min_fraction of targets can be mapped to a reference.
This mimics pySCENIC’s behavior of skipping modules where “less than 80% of the genes could be mapped to the ranking database” (transform.py:298-307).
Parameters
adj : array-like
    Adjacency matrix (n_tfs x n_genes)
reference_indices : array-like, optional
    Indices of genes that exist in the reference database (e.g., ranking DB). If None, uses all genes (no filtering based on mapping).
min_fraction : float, default=0.8
    Minimum fraction of targets that must be mappable to the reference. Default 0.8 matches pySCENIC's 80% threshold.
device : str, default='cuda'
    Device for computation
Returns
Tuple[torch.Tensor, torch.Tensor]
    - Filtered adjacency matrix (n_valid_tfs x n_genes)
    - Boolean mask indicating which TFs were kept
Notes
pySCENIC's logic (from transform.py):
    n_missing = len(module) - len(genes)  # genes not in ranking DB
    frac_missing = float(n_missing) / len(module)
    if frac_missing >= 0.20:  # i.e., less than 80% mapped
        skip this module
Examples
adj = torch.rand(100, 5000) > 0.5  # 100 TFs, 5000 genes
# Assume only genes 0-4000 are in the ranking database
db_gene_indices = torch.arange(4000)
filtered, mask = filter_by_mapped_fraction(adj, db_gene_indices, min_fraction=0.8)
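The mapped-fraction rule can be written out in NumPy as follows. This is a sketch with a hypothetical name (`filter_mapped_fraction_sketch`); dropping TFs with no targets and keeping TFs that sit exactly at `min_fraction` are both assumptions, and the boundary case may be handled differently by pySCENIC or flashscenic.

```python
import numpy as np

def filter_mapped_fraction_sketch(adj, reference_indices, min_fraction=0.8):
    """Keep TFs whose targets are mostly present in the reference database."""
    adj = np.asarray(adj)
    in_ref = np.zeros(adj.shape[1], dtype=bool)
    in_ref[np.asarray(reference_indices, dtype=int)] = True

    is_target = adj != 0
    n_targets = is_target.sum(axis=1)
    n_mapped = (is_target & in_ref).sum(axis=1)
    # Avoid dividing by zero for TFs without any targets.
    frac = np.divide(n_mapped, n_targets,
                     out=np.zeros(len(n_targets)), where=n_targets > 0)
    mask = (n_targets > 0) & (frac >= min_fraction)
    return adj[mask], mask

adj = np.array([[1, 1, 1, 1, 0, 0],   # all 4 targets are in the reference
                [0, 0, 1, 1, 1, 1],   # only 2 of 4 targets are in the reference
                [0, 0, 0, 0, 0, 0]])  # no targets
kept, mask = filter_mapped_fraction_sketch(adj, reference_indices=[0, 1, 2, 3])
```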
- flashscenic.modules.select_mixture_model_targets(adj: flashscenic.modules.ArrayLike, n_components: int = 2, method: str = 'intersection', include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') Tuple[torch.Tensor, dict][source]#
Select targets using Gaussian mixture model to separate signal from noise.
Fits a GMM to the non-zero edge weights across the full adjacency matrix. The threshold is derived from the fitted components, providing a data-driven alternative to fixed thresholds like adj[adj < 1.0] = 0.
Parameters
adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights.
n_components : int, default=2
    Number of Gaussian components (2 = noise + signal).
method : str, default='intersection'
    How to derive the threshold from the fitted GMM:
    - 'intersection': weight where the posterior probabilities of the two components cross (between the two means).
    - 'posterior': keep edges with P(signal | weight) > 0.5.
    - 'noise_quantile': noise_mean + 2 * noise_std.
include_tf : bool, default=True
    Include TF itself in its module.
tf_indices : array-like, optional
    Index of each TF in the gene list.
device : str, default='cuda'
    Device for output tensor.
Returns
Tuple[torch.Tensor, dict]
    - Filtered adjacency matrix (n_tfs, n_genes) on device.
    - Info dict with keys: threshold, means, stds, weights, converged, method.
- flashscenic.modules.select_knee_targets(adj: flashscenic.modules.ArrayLike, sensitivity: float = 1.0, per_tf: bool = False, include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') Tuple[torch.Tensor, dict][source]#
Select targets using knee/elbow detection on the sorted edge weight curve.
Sorts non-zero edge weights in descending order and finds the “knee” point where the rate of decrease changes most sharply. This provides a data-driven threshold without assuming a parametric distribution.
Parameters
adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights.
sensitivity : float, default=1.0
    Controls how aggressively the knee is detected. Higher values detect the knee earlier (more aggressive pruning). Values in [0.5, 3.0] are typical.
per_tf : bool, default=False
    If True, find a separate knee per TF row. If False, find a single global knee across all weights.
include_tf : bool, default=True
    Include TF itself in its module.
tf_indices : array-like, optional
    Index of each TF in the gene list.
device : str, default='cuda'
    Device for output tensor.
Returns
Tuple[torch.Tensor, dict]
    - Filtered adjacency matrix (n_tfs, n_genes) on device.
    - Info dict with keys:
        - threshold: float (global) or list of float (per-TF)
        - knee_index: int or list of int
        - per_tf: bool
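The documentation does not state which knee-detection algorithm is used, so the sketch below substitutes a common heuristic: the point on the normalized, descending weight curve that lies farthest below the straight line joining its endpoints. It is only meant to convey what a "knee" threshold is; `knee_threshold_sketch` is a hypothetical name and there is no `sensitivity` parameter here.

```python
import numpy as np

def knee_threshold_sketch(weights):
    """Find a knee on the descending weight curve via the max-gap-to-chord heuristic."""
    w = np.sort(np.asarray(weights, dtype=float))[::-1]
    if w.size < 3 or w[0] == w[-1]:
        raise ValueError("need at least 3 weights that are not all equal")
    # Normalize both axes to [0, 1]; the chord then runs from (0, 1) to (1, 0).
    x = np.linspace(0.0, 1.0, w.size)
    y = (w - w[-1]) / (w[0] - w[-1])
    gap = 1.0 - x - y  # how far the curve sits below the chord
    knee_index = int(np.argmax(gap))
    return float(w[knee_index]), knee_index

weights = np.array([0.5, 10.0, 1.0, 9.0, 0.0, 8.0])
threshold, knee_index = knee_threshold_sketch(weights)
kept = weights[weights > threshold]  # edges strictly above the knee weight
```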
- flashscenic.modules.get_target_indices(adj: flashscenic.modules.ArrayLike, device: str = 'cuda') Tuple[torch.Tensor, torch.Tensor][source]#
Get indices of non-zero targets for each TF.
Useful for cisTarget pruning which needs gene indices.
Parameters
adj : array-like
    Adjacency matrix (n_tfs x n_genes)
device : str, default='cuda'
    Device for computation
Returns
Tuple[torch.Tensor, torch.Tensor]
    - Flat tensor of gene indices
    - Tensor of (start, end) positions for each TF's targets
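The flat-indices-plus-offsets layout is similar to a CSR sparse row pointer. A NumPy sketch of how such a pair could be built is shown below; `target_indices_sketch` is a hypothetical name, and the half-open `(start, end)` convention is an assumption.

```python
import numpy as np

def target_indices_sketch(adj):
    """Return flat gene indices and per-TF (start, end) slices into them."""
    adj = np.asarray(adj)
    rows, cols = np.nonzero(adj)  # entries come back in row-major order
    counts = np.bincount(rows, minlength=adj.shape[0])
    ends = np.cumsum(counts)
    starts = ends - counts
    return cols, np.stack([starts, ends], axis=1)

adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [0, 0, 0]])
flat, offsets = target_indices_sketch(adj)
# Targets of TF 0 are flat[offsets[0, 0]:offsets[0, 1]].
tf0_targets = flat[offsets[0, 0]:offsets[0, 1]]
```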
Analysis#
- flashscenic.rss.regulon_specificity_scores(auc_matrix, cell_type_labels, regulon_names=None)[source]#
Compute Regulon Specificity Scores (RSS) based on Jensen-Shannon divergence.
RSS quantifies how specific each regulon’s activity is to each cell type. A score close to 1 means the regulon is exclusively active in that cell type.
Reference: Suo et al. 2018 (doi: 10.1016/j.celrep.2018.10.045)
Parameters
auc_matrix : np.ndarray
    AUCell scores of shape (n_cells, n_regulons).
cell_type_labels : array-like
    Cell type label per cell (length n_cells). Can be a list, numpy array, or pandas Series.
regulon_names : list of str, optional
    Names for each regulon column. If None, integer indices are used.
Returns
dict
    'rss' : np.ndarray of shape (n_cell_types, n_regulons)
        RSS values. Higher means more specific.
    'cell_types' : list of str
        Sorted unique cell type labels (row labels of rss).
    'regulon_names' : list of str
        Regulon names (column labels of rss).
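To make the Jensen-Shannon construction concrete, here is a small NumPy sketch that computes RSS as 1 minus the square root of the JS divergence between a regulon's normalized activity and a cell type's normalized indicator vector. It is illustrative only: `rss_sketch` is a hypothetical name, and the base-2 logarithm is an assumption (a different log base would rescale the scores).

```python
import numpy as np

def _js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = 0.5 * (p + q)

    def kl(a, b):
        nz = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[nz] * np.log2(a[nz] / b[nz])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def rss_sketch(auc_matrix, cell_type_labels, regulon_names=None):
    auc = np.asarray(auc_matrix, dtype=float)
    labels = np.asarray(cell_type_labels)
    cell_types = sorted(set(labels.tolist()))
    if regulon_names is None:
        regulon_names = list(range(auc.shape[1]))
    rss = np.zeros((len(cell_types), auc.shape[1]))
    for j in range(auc.shape[1]):
        total = auc[:, j].sum()
        if total <= 0:
            continue  # a regulon with no activity gets a score of 0
        p = auc[:, j] / total
        for i, ct in enumerate(cell_types):
            q = (labels == ct).astype(float)
            q /= q.sum()
            rss[i, j] = 1.0 - np.sqrt(max(_js_divergence(p, q), 0.0))
    return {"rss": rss, "cell_types": cell_types, "regulon_names": list(regulon_names)}

# Regulon 0 is active only in type A; regulon 1 is equally active everywhere.
auc = np.array([[1.0, 0.5], [1.0, 0.5], [0.0, 0.5], [0.0, 0.5]])
out = rss_sketch(auc, ["A", "A", "B", "B"], regulon_names=["R0", "R1"])
```

In this toy case the exclusive regulon scores 1 for type A and 0 for type B, while the uniformly active regulon gets the same intermediate score for both types.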
Helpers#
- flashscenic.regulons_to_adjacency(regulons: list[dict], gene_names: list[str]) numpy.ndarray[source]#
Convert list of regulon dicts to adjacency matrix for AUCell.
Parameters
regulons : list of dict
    Output from CisTargetPruner.prune_modules(), each dict has 'genes' key
gene_names : list of str
    List of gene names matching columns of expression matrix
Returns
np.ndarray
    Binary adjacency matrix of shape (n_regulons, n_genes)
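The conversion amounts to a name-to-column lookup. The sketch below shows one way it could work; `regulons_to_adjacency_sketch` is a hypothetical stand-in, and silently skipping genes that are missing from `gene_names` is an assumption about the real helper.

```python
import numpy as np

def regulons_to_adjacency_sketch(regulons, gene_names):
    """Build a binary (n_regulons, n_genes) matrix from regulon gene lists."""
    column = {gene: i for i, gene in enumerate(gene_names)}
    adj = np.zeros((len(regulons), len(gene_names)), dtype=np.float32)
    for row, regulon in enumerate(regulons):
        # Assumption: genes absent from gene_names are ignored.
        cols = [column[g] for g in regulon["genes"] if g in column]
        if cols:
            adj[row, cols] = 1.0
    return adj

gene_names = ["GATA1", "KLF1", "SPI1", "TAL1"]
regulons = [
    {"name": "GATA1(+)", "genes": ["KLF1", "TAL1"]},
    {"name": "SPI1(+)", "genes": ["SPI1", "UNKNOWN"]},
]
regulon_adj = regulons_to_adjacency_sketch(regulons, gene_names)
```

The resulting matrix has the layout expected for `adj_array` in `get_aucell`, with one row per regulon.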