API Reference


Pipeline

`flashscenic.pipeline.run_flashscenic`	Run the complete flashscenic pipeline.

flashscenic.pipeline.run_flashscenic(exp_matrix: numpy.ndarray, gene_names: List[str], species: str = 'human', *, datasource: str = 'scenic', version: str = 'v10', cache_dir: Optional[str] = None, tf_list_path: Optional[str] = None, ranking_db_paths: Optional[List[str]] = None, motif_annotation_path: Optional[str] = None, grn_n_steps: int = 1000, grn_sparsity_threshold: float = 1.5, module_k: int = 50, module_percentile_thresholds: tuple = (75,), module_top_n_per_target: tuple = (5, 10, 50), module_min_targets: int = 20, module_min_fraction: float = None, module_include_tf: bool = True, pruning_rank_threshold: int = 5000, pruning_auc_threshold: float = 0.05, pruning_nes_threshold: float = 3.0, pruning_min_genes: int = 0, pruning_merge_strategy: str = 'union', annotation_motif_similarity_fdr: float = 0.001, annotation_orthologous_identity: float = 0.0, aucell_k: Optional[int] = None, aucell_auc_threshold: float = 0.05, aucell_batch_size: int = 32, device: str = 'cuda', seed: Optional[int] = None, verbose: bool = True) → Dict

Run the complete flashscenic pipeline.

Performs GRN inference (RegDiffusion), module filtering, cisTarget pruning, and AUCell scoring. The pipeline stops before dimensionality reduction and visualization.

Parameters

exp_matrix : np.ndarray
    Expression matrix of shape (n_cells, n_genes). Should be log-transformed and, optionally, subset to highly variable genes.
gene_names : list of str
    Gene names corresponding to the columns of exp_matrix. Length must equal exp_matrix.shape[1].
species : str, default='human'
    Species for the TF list and ranking databases. One of 'human', 'mouse', 'drosophila'.

datasource : str, default='scenic'
    Data source for resource downloads.
version : str, default='v10'
    Motif collection version.
cache_dir : str or None, default=None
    Cache directory for downloaded resources. Defaults to ./flashscenic_data/.
tf_list_path : str or None, default=None
    Path to a custom TF list file. Overrides the downloaded TF list.
ranking_db_paths : list of str or None, default=None
    Paths to custom ranking database .feather files. Overrides the downloaded databases.
motif_annotation_path : str or None, default=None
    Path to a custom motif annotation .tbl file. Overrides the downloaded annotation.

grn_n_steps : int, default=1000
    Number of training steps for RegDiffusion.
grn_sparsity_threshold : float, default=1.5
    Edges with weight below this value are zeroed. Higher values give a sparser network.

module_k : int, default=50
    Number of top target genes per TF used for module creation.
module_percentile_thresholds : tuple of int, default=(75,)
    Percentile thresholds for percentile-based modules. Each value creates a module type that keeps targets above that global weight percentile. Pass an empty tuple to skip.
module_top_n_per_target : tuple of int, default=(5, 10, 50)
    N values for top-N-per-target modules. For each N, finds the N strongest regulators of each target gene, then regroups the edges by TF. Pass an empty tuple to skip.
module_min_targets : int, default=20
    Minimum number of target genes for a TF module to be retained.
module_min_fraction : float or None, default=None
    Minimum fraction of targets required. Set to 0.8 to match pySCENIC's 80% rule; None disables the fraction-based check.
module_include_tf : bool, default=True
    Include the TF itself in its own module.

pruning_rank_threshold : int, default=5000
    Maximum rank for the cisTarget recovery curve.
pruning_auc_threshold : float, default=0.05
    Fraction of the genome used for the cisTarget AUC.
pruning_nes_threshold : float, default=3.0
    NES threshold for motif enrichment.
pruning_min_genes : int, default=0
    Minimum number of genes per regulon after pruning.
pruning_merge_strategy : str, default='union'
    How to merge regulons from multiple databases ('union' or 'best').

annotation_motif_similarity_fdr : float, default=0.001
    Maximum FDR for motif similarity filtering.
annotation_orthologous_identity : float, default=0.0
    Minimum orthologous identity threshold.

aucell_k : int or None, default=None
    Top k targets for AUCell scoring. If None, defaults to module_k.
aucell_auc_threshold : float, default=0.05
    Fraction of the genome used for the AUCell AUC calculation.
aucell_batch_size : int, default=32
    Batch size for AUCell computation.

device : str, default='cuda'
    PyTorch device ('cuda' or 'cpu').
seed : int or None, default=None
    Random seed for reproducibility.
verbose : bool, default=True
    Print progress messages.

Returns

dict
    - 'auc_scores': np.ndarray of shape (n_cells, n_regulons)
    - 'regulon_names': list of regulon name strings
    - 'regulons': list of regulon dicts from cisTarget
    - 'regulon_adj': np.ndarray of shape (n_regulons, n_genes)
    - 'parameters': dict of all parameters used

Raises

ImportError
    If regdiffusion is not installed.
ValueError
    If no TFs survive filtering or no regulons survive pruning.

Examples

>>> import flashscenic as fs
>>> result = fs.run_flashscenic(exp_matrix, gene_names, species='human')
>>> auc_scores = result['auc_scores']  # (n_cells, n_regulons)

Data Download

`flashscenic.data.download_data`	Download cistarget resource files required for the flashscenic pipeline.
`flashscenic.data.list_available_resources`	List available resource sets, optionally filtered.
`flashscenic.data.DownloadedResources`	Paths to downloaded resource files.

flashscenic.data.download_data(species: str = 'human', version: str = 'v10', datasource: str = 'scenic', cache_dir: Optional[str] = None, force: bool = False) → flashscenic.data.DownloadedResources

Download cistarget resource files required for the flashscenic pipeline.

Parameters

species : str, default='human'
    Species to download resources for. One of 'human', 'mouse', 'drosophila'.
version : str, default='v10'
    Motif collection version. One of 'v10' (recommended) or 'v9'.
datasource : str, default='scenic'
    Data source identifier. Currently only 'scenic' (Aertslab) is supported; the architecture allows alternative sources to be added.
cache_dir : str or None, default=None
    Local directory in which to store downloaded files. If None, defaults to ./flashscenic_data/.
force : bool, default=False
    If True, re-download files even if they already exist locally.

Returns

DownloadedResources
    Dataclass with Path objects pointing to each downloaded file.

Raises

ValueError
    If the species/version/datasource combination is not recognized.
ConnectionError
    If the download fails after retries.

flashscenic.data.list_available_resources(datasource: Optional[str] = None, species: Optional[str] = None, version: Optional[str] = None) → List[flashscenic.data.ResourceSet]

List available resource sets, optionally filtered.

Parameters

datasource : str or None
    Filter by data source (e.g., 'scenic').
species : str or None
    Filter by species (e.g., 'human', 'mouse', 'drosophila').
version : str or None
    Filter by version (e.g., 'v10', 'v9').

Returns

list of ResourceSet
    Matching resource sets.

class DownloadedResources

Paths to downloaded resource files.

tf_list: Optional[pathlib.Path] = None

ranking_dbs: List[pathlib.Path] = field(…)

motif_annotation: Optional[pathlib.Path] = None

cache_dir: pathlib.Path = field(…)

__repr__() → str

AUCell

flashscenic.aucell.get_aucell(exp_array, adj_array, k=50, auc_threshold=0.05, device='cuda', batch_size=32, seed=None)

Fully vectorized pySCENIC-equivalent AUCell calculation.

Uses the actual number of target genes per TF/regulon (not a fixed k) to match pySCENIC behavior. When adj_array has weighted entries, the weights are used in the AUC.

Args:
    exp_array (np.ndarray): Expression matrix (n_cells x n_genes).
    adj_array (np.ndarray): Adjacency matrix (n_tfs x n_genes); can be binary or weighted.
    k (int): Maximum number of target genes per TF (shorter regulons are padded). Default is 50.
    auc_threshold (float): Fraction of the genome used for the AUC calculation. Default is 0.05.
    device (str): Device, 'cpu' or 'cuda'. Default is 'cuda'.
    batch_size (int): Batch size for processing cells. Default is 32.
    seed (int): Random seed for tie-breaking. Default is None.

Returns:
    np.ndarray: AUCell scores matrix of shape (n_cells, n_TFs).
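The ranking-and-recovery idea behind AUCell can be illustrated with a small NumPy sketch. It assumes binary regulon membership, breaks expression ties by gene order rather than with the seeded random tie-breaking described above, and normalizes each score by cutoff x regulon size. It is not flashscenic's batched GPU implementation and is not guaranteed to reproduce pySCENIC's values; the name aucell_sketch is hypothetical.

```python
import numpy as np


def aucell_sketch(exp, adj, auc_threshold=0.05):
    """Simplified AUCell scores for binary regulons (illustrative only).

    exp: (n_cells, n_genes) expression; adj: (n_regulons, n_genes) membership.
    Returns an (n_cells, n_regulons) array of scores in [0, 1].
    """
    exp = np.asarray(exp, dtype=float)
    members_all = np.asarray(adj) != 0
    n_cells, n_genes = exp.shape
    # Only the top `cutoff` ranked genes of each cell contribute to the AUC.
    cutoff = max(1, int(round(auc_threshold * n_genes)))
    # Rank genes within each cell: rank 0 = highest expression (stable tie-break).
    order = np.argsort(-exp, axis=1, kind="stable")
    ranks = np.empty_like(order)
    ranks[np.arange(n_cells)[:, None], order] = np.arange(n_genes)[None, :]
    scores = np.zeros((n_cells, members_all.shape[0]))
    for r, members in enumerate(members_all):
        n_members = int(members.sum())
        if n_members == 0:
            continue  # empty regulon: score stays 0
        member_ranks = ranks[:, members]  # (n_cells, n_members)
        # Summing the recovery curve over ranks 0..cutoff-1 equals
        # summing max(0, cutoff - rank) over the regulon's genes.
        hits = np.clip(cutoff - member_ranks, 0, None)
        scores[:, r] = hits.sum(axis=1) / (cutoff * n_members)
    return scores


exp = np.array([[10.0, 9.0, 1.0, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.5, 0.1, 9.0, 10.0, 0.0, 0.0, 0.0]])
adj = np.array([[1, 1, 0, 0, 0, 0, 0, 0, 0, 0]])  # one regulon: genes 0 and 1
scores = aucell_sketch(exp, adj, auc_threshold=0.3)  # cell 0 scores 5/6, cell 1 scores 0.0
```

The regulon genes are the top two genes of cell 0 and fall outside the top-3 window of cell 1, so only cell 0 receives a non-zero score.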

cisTarget

`flashscenic.cistarget.CisTargetPruner`	GPU-accelerated cisTarget pruning with support for single or multiple databases.
`flashscenic.cistarget.compute_recovery_aucs`	Compute recovery curves and AUCs for all motifs given module genes.
`flashscenic.cistarget.compute_nes`	Compute Normalized Enrichment Scores (NES) from AUC values.
`flashscenic.cistarget.prune_single_module`	Perform cisTarget pruning for a single module.
`flashscenic.cistarget.MotifAnnotation`	Lightweight motif annotation storage without pandas.
`flashscenic.cistarget.filter_by_annotations`	Filter enriched motifs by annotations (CPU implementation).

class CisTargetPruner(rank_threshold: int = 5000, auc_threshold: float = 0.05, nes_threshold: float = 3.0, device: str = 'cuda', min_genes_per_regulon: int = 0, merge_strategy: str = 'union')

GPU-accelerated cisTarget pruning with support for single or multiple databases.

Example (single database):

```python
from flashscenic.cistarget import CisTargetPruner

pruner = CisTargetPruner(device='cuda')
pruner.load_database('rankings.feather')
pruner.load_annotations('motifs.tbl', filter_for_annotation=True)

# Prune with tensor input (module_gene_indices: 1-D tensor of gene indices)
result = pruner.prune(module_gene_indices)
```

Example (multiple databases):

```python
from flashscenic.cistarget import CisTargetPruner

pruner = CisTargetPruner(device='cuda')
pruner.load_database(['db_500bp.feather', 'db_10kb.feather'])
pruner.load_annotations('motifs.tbl')

# Prune modules across all databases
regulon_info = pruner.prune_modules(modules, tf_names, gene_names)
```

Initialization

load_database(paths: Union[str, List[str]], database_names: Optional[Union[str, List[str]]] = None)

Load ranking database(s) from feather file(s).

Args:
    paths: Path to a .feather ranking database, or a list of paths for multiple databases.
    database_names: Optional name(s) for the database(s) (defaults to the filename(s)).

load_annotations(annotation_file: str, filter_for_annotation: bool = True, motif_similarity_fdr: float = 0.001, orthologous_identity_threshold: float = 0.0)

Load motif annotations and enable filtering.

Args:
    annotation_file: Path to the motif annotation TSV file.
    filter_for_annotation: If True, filter enriched motifs to keep only those with annotations.
    motif_similarity_fdr: Maximum FDR threshold (default: 0.001).
    orthologous_identity_threshold: Minimum orthologous identity (default: 0.0).

load_from_tensor(rankings: flashscenic.cistarget.ArrayLike, motif_names: Optional[List[str]] = None, gene_names: Optional[List[str]] = None)

Load a database directly from a tensor or array.

Args:
    rankings: (n_motifs, n_genes) ranking matrix.
    motif_names: Optional list of motif names.
    gene_names: Optional list of gene names.

genes_to_indices(genes: List[str]) → torch.Tensor

Convert gene names to a tensor of indices.

prune(module_gene_indices: flashscenic.cistarget.ArrayLike, weights: Optional[flashscenic.cistarget.ArrayLike] = None, tf_name: Optional[str] = None) → Dict[str, torch.Tensor]

Prune a single module (single-database mode only).

Args:
    module_gene_indices: (n_module_genes,) indices of the genes in the module.
    weights: Optional (n_module_genes,) gene weights.
    tf_name: TF name for TF-specific annotation filtering (matching pySCENIC behavior). If None, keeps motifs with any annotation.

Returns:
    Dict with pruning results (all tensors).

prune_batch(modules: List[torch.Tensor], weights_list: Optional[List[torch.Tensor]] = None) → List[Dict[str, torch.Tensor]]

Prune multiple modules.

Args:
    modules: List of (n_genes_i,) tensors with gene indices.
    weights_list: Optional list of weight tensors.

Returns:
    List of pruning result dicts.

get_enriched_motif_names(result: Dict[str, torch.Tensor]) → List[str]

Get the names of enriched motifs from a pruning result.

get_leading_edge_genes(result: Dict[str, torch.Tensor], module_gene_indices: torch.Tensor) → List[List[str]]

Get the leading-edge gene names for each enriched motif.

Args:
    result: Pruning result dict.
    module_gene_indices: Original module gene indices.

Returns:
    List of gene name lists, one per enriched motif.

prune_modules(modules: List[torch.Tensor], tf_names: List[str], gene_names: List[str], weights_list: Optional[List[torch.Tensor]] = None) → List[Dict]

Prune modules across all databases and merge the results (multi-database mode only).

Args:
    modules: List of (n_genes_i,) tensors with gene indices for each TF module.
    tf_names: List of TF names corresponding to the modules.
    gene_names: List of all gene names.
    weights_list: Optional list of weight tensors, one per module.

Returns:
    List of regulon dictionaries with keys: name, tf, motif, n_genes, genes, context, nes, auc.

_merge_regulons(regulons: List[Dict]) → List[Dict]

Merge regulons from multiple databases.

For the same TF+motif combination:

- If merge_strategy='union': keep all (they may have different genes from different databases).
- If merge_strategy='best': keep the one with the highest NES.

Note: pySCENIC uses the union strategy; it merges genes from all databases for the same TF+motif combination.

_merge_regulons_by_tf(regulons: List[Dict]) → List[Dict]

Merge regulons by TF, matching pySCENIC's df2regulons behavior.

pySCENIC groups by (TF, Type) and uses Regulon.union to merge all motifs for each TF into a single regulon. This function implements the same logic.

Args:
    regulons: List of regulon dictionaries.

Returns:
    Merged regulons (one per TF).
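The two merge behaviors described above can be sketched in plain Python. This assumes regulon dicts carrying the tf, motif, genes and nes keys listed for prune_modules(); the helper names and the reduced output dict (tf, genes, n_genes) are hypothetical, and the real methods may also carry over name, context and scores.

```python
from collections import defaultdict


def merge_best_by_tf_motif(regulons):
    """'best' strategy: one regulon per (tf, motif), the highest NES wins."""
    best = {}
    for reg in regulons:
        key = (reg["tf"], reg["motif"])
        if key not in best or reg["nes"] > best[key]["nes"]:
            best[key] = reg
    return list(best.values())


def merge_by_tf_union(regulons):
    """One regulon per TF whose genes are the union over all motifs and databases."""
    genes_by_tf = defaultdict(set)
    for reg in regulons:
        genes_by_tf[reg["tf"]].update(reg["genes"])
    return [
        {"tf": tf, "genes": sorted(genes), "n_genes": len(genes)}
        for tf, genes in sorted(genes_by_tf.items())
    ]


regulons = [
    {"tf": "TF1", "motif": "m1", "genes": ["A", "B"], "nes": 3.5},  # database 1
    {"tf": "TF1", "motif": "m1", "genes": ["B", "C"], "nes": 4.0},  # database 2
    {"tf": "TF1", "motif": "m2", "genes": ["D"], "nes": 3.1},
    {"tf": "TF2", "motif": "m3", "genes": ["E"], "nes": 5.0},
]
best = merge_best_by_tf_motif(regulons)
merged = merge_by_tf_union(regulons)
```

Grouping by TF alone, as in the second helper, is what collapses several motifs into a single regulon per TF.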

clear_gpu_memory()

Release GPU memory.

flashscenic.cistarget.compute_recovery_aucs(rankings: torch.Tensor, module_gene_indices: torch.Tensor, rank_threshold: int, auc_threshold: float, weights: Optional[torch.Tensor] = None) → Tuple[torch.Tensor, torch.Tensor]

Compute recovery curves and AUCs for all motifs given module genes.

Vectorized implementation: processes all motifs in parallel.

Args:
    rankings: (n_motifs, n_genes) rank of each gene for each motif (0-indexed).
    module_gene_indices: (n_module_genes,) indices of the genes in the module.
    rank_threshold: Maximum rank to consider for the recovery curve.
    auc_threshold: Fraction of the genome used for the AUC calculation.
    weights: (n_module_genes,) optional weights for weighted recovery.

Returns:
    rccs: (n_motifs, rank_threshold) recovery curves.
    aucs: (n_motifs,) AUC values.
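A NumPy sketch of unweighted recovery curves computed for all motifs at once. It assumes 0-indexed ranks, sends ranks at or beyond rank_threshold to an overflow bin that never enters the curve, and normalizes the AUC by cutoff x module size. The weights argument is not modelled, and recovery_aucs_sketch is a hypothetical name rather than the library function.

```python
import numpy as np


def recovery_aucs_sketch(rankings, module_gene_indices, rank_threshold, auc_threshold):
    """Unweighted recovery curves and AUCs for every motif at once (sketch).

    rankings: (n_motifs, n_genes) 0-indexed rank of each gene for each motif.
    module_gene_indices: (n_module_genes,) column indices of the module genes.
    """
    rankings = np.asarray(rankings)
    n_motifs, n_genes = rankings.shape
    module_ranks = rankings[:, module_gene_indices]  # (n_motifs, n_module_genes)
    # Histogram of module-gene ranks per motif; ranks >= rank_threshold land in
    # an overflow bin (index rank_threshold) that is dropped from the curve.
    clipped = np.minimum(module_ranks, rank_threshold)
    counts = np.zeros((n_motifs, rank_threshold + 1), dtype=np.int64)
    rows = np.repeat(np.arange(n_motifs), module_ranks.shape[1])
    np.add.at(counts, (rows, clipped.ravel()), 1)
    # rccs[m, t] = number of module genes with rank <= t for motif m.
    rccs = np.cumsum(counts[:, :rank_threshold], axis=1)
    # The AUC integrates only the first `cutoff` ranks, normalized to at most 1.
    cutoff = min(rank_threshold, max(1, int(round(auc_threshold * n_genes))))
    aucs = rccs[:, :cutoff].sum(axis=1) / (cutoff * len(module_gene_indices))
    return rccs, aucs


rankings = np.array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],   # motif 0 ranks genes 0, 1 first
                     [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]])  # motif 1 ranks them last
rccs, aucs = recovery_aucs_sketch(rankings, np.array([0, 1]), rank_threshold=5, auc_threshold=0.3)
```

Building a rank histogram and taking a cumulative sum avoids materializing an (n_motifs, n_module_genes, rank_threshold) comparison tensor.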

flashscenic.cistarget.compute_nes(aucs: torch.Tensor) → torch.Tensor

Compute Normalized Enrichment Scores (NES) from AUC values.

NES = (AUC - mean(AUC)) / std(AUC)

Uses the population standard deviation (ddof=0) to match pySCENIC.
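The NES formula in NumPy, followed by the enrichment cut that prune_single_module applies with nes_threshold. Whether the library compares with >= or > is an assumption here, and nes_sketch is a hypothetical name.

```python
import numpy as np


def nes_sketch(aucs):
    """NES = (AUC - mean(AUC)) / std(AUC) with the population std (ddof=0).

    Undefined (division by zero) when every AUC is identical.
    """
    aucs = np.asarray(aucs, dtype=float)
    return (aucs - aucs.mean()) / aucs.std(ddof=0)


# 99 background motifs with equal AUC and one clear outlier.
aucs = np.append(np.full(99, 0.01), 0.09)
nes = nes_sketch(aucs)
enriched_mask = nes >= 3.0  # assumed comparison against nes_threshold=3.0
```

With one outlier among n otherwise equal AUCs, the outlier's NES is sqrt(n - 1), here about 9.95, so only the last motif passes the default threshold.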

flashscenic.cistarget.prune_single_module(rankings: torch.Tensor, module_gene_indices: torch.Tensor, rank_threshold: int = 5000, auc_threshold: float = 0.05, nes_threshold: float = 3.0, weights: Optional[torch.Tensor] = None) → Dict[str, torch.Tensor]

Perform cisTarget pruning for a single module.

All inputs and outputs are tensors on the same device.

Args:
    rankings: (n_motifs, n_genes) ranking database tensor.
    module_gene_indices: (n_module_genes,) gene indices for this module.
    rank_threshold: Maximum rank for the recovery curve.
    auc_threshold: Fraction of the genome used for the AUC.
    nes_threshold: NES threshold for enrichment.
    weights: Optional (n_module_genes,) gene weights.

Returns:
    Dict with keys:
    - enriched_mask: (n_motifs,) bool, which motifs are enriched
    - nes: (n_motifs,) NES scores
    - aucs: (n_motifs,) AUC scores
    - rccs: (n_motifs, rank_threshold) recovery curves
    - leading_edge_masks: (n_enriched, n_module_genes) leading edge for each enriched motif
    - rank_at_max: (n_enriched,) rank at max for each enriched motif

class MotifAnnotation

Lightweight motif annotation storage without pandas.

Stores motif annotations in a dictionary for fast lookup. Matches pySCENIC's annotation filtering behavior.

Initialization

classmethod load_from_file(fname: str, motif_similarity_fdr: float = 0.001, orthologous_identity_threshold: float = 0.0, column_names: Optional[Tuple[str, ...]] = None) → flashscenic.cistarget.MotifAnnotation

Load motif annotations from a motif2TF snapshot file.

Args:
    fname: Path to the TSV annotation file.
    motif_similarity_fdr: Maximum FDR threshold (default: 0.001).
    orthologous_identity_threshold: Minimum orthologous identity (default: 0.0).
    column_names: Optional tuple of column names to use. If None, they are read from the header.

Returns:
    MotifAnnotation instance.

has_annotation(motif_id: str, tf_name: Optional[str] = None) → bool

Check whether a motif has an annotation.

Args:
    motif_id: Motif ID.
    tf_name: Optional TF name (if provided, checks the (TF, motif) pair).

Returns:
    True if an annotation exists.

get_annotation(motif_id: str, tf_name: Optional[str] = None) → Optional[Dict]

Get the annotation for a motif.

Args:
    motif_id: Motif ID.
    tf_name: Optional TF name.

Returns:
    Annotation dict, or None.
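The FDR and orthologous-identity filters, together with the TF-specific lookup, can be sketched with the standard library. The column names (motif_id, gene_name, motif_similarity_qvalue, orthologous_identity) and the inclusive comparisons are assumptions, not the documented file layout, and load_annotation_pairs and is_annotated are hypothetical helpers.

```python
import csv
import io


def load_annotation_pairs(tsv_text, motif_similarity_fdr=0.001, orthologous_identity_threshold=0.0):
    """Collect (tf, motif_id) pairs whose annotation passes both filters (sketch).

    Assumed column names: motif_id, gene_name, motif_similarity_qvalue and
    orthologous_identity. Real motif2TF snapshot headers may differ.
    """
    pairs = set()
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if float(row["motif_similarity_qvalue"]) > motif_similarity_fdr:
            continue  # motif similarity not significant enough
        if float(row["orthologous_identity"]) < orthologous_identity_threshold:
            continue  # orthology too weak
        pairs.add((row["gene_name"], row["motif_id"]))
    return pairs


def is_annotated(pairs, motif_id, tf_name=None):
    """With tf_name, check the (TF, motif) pair; otherwise accept any TF."""
    if tf_name is not None:
        return (tf_name, motif_id) in pairs
    return any(m == motif_id for _, m in pairs)


snapshot = (
    "motif_id\tgene_name\tmotif_similarity_qvalue\torthologous_identity\n"
    "m1\tTF1\t0.0\t1.0\n"
    "m2\tTF2\t0.01\t1.0\n"      # fails the default FDR of 0.001
    "m3\tTF3\t0.0005\t0.5\n"
)
pairs = load_annotation_pairs(snapshot)
```

Storing (TF, motif) pairs makes the TF-specific check used during pruning a single set lookup.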

flashscenic.cistarget.filter_by_annotations(result: Dict[str, torch.Tensor], motif_names: List[str], motif_annotations: Optional[flashscenic.cistarget.MotifAnnotation], filter_for_annotation: bool = True, tf_name: Optional[str] = None) → Dict[str, torch.Tensor]

Filter enriched motifs by annotations (CPU implementation).

Matches pySCENIC behavior: filters enriched motifs to keep only those annotated for the specific TF of the module being pruned.

Args:
    result: Pruning result dict with 'enriched_mask', 'nes', 'aucs', etc.
    motif_names: List of motif names (from the database).
    motif_annotations: MotifAnnotation object (None = no filtering).
    filter_for_annotation: If True, only keep motifs with annotations.
    tf_name: TF name to filter for. If provided, only keep motifs annotated for this specific TF (matching pySCENIC behavior). If None, keep motifs with any annotation.

Returns:
    Filtered result dict (all tensors remain on the original device).

Module Utilities

`flashscenic.modules.select_topk_targets`	Select top-k targets per TF from adjacency matrix.
`flashscenic.modules.select_threshold_targets`	Select targets above threshold from adjacency matrix.
`flashscenic.modules.select_top_n_per_target`	Select top-N regulators per target gene, then regroup by TF.
`flashscenic.modules.filter_by_min_targets`	Filter out TFs with fewer than min_targets or below min_fraction of targets.
`flashscenic.modules.filter_by_mapped_fraction`	Filter out TFs where less than min_fraction of targets can be mapped to a reference.
`flashscenic.modules.select_mixture_model_targets`	Select targets using Gaussian mixture model to separate signal from noise.
`flashscenic.modules.select_knee_targets`	Select targets using knee/elbow detection on the sorted edge weight curve.
`flashscenic.modules.get_target_indices`	Get indices of non-zero targets for each TF.
`flashscenic.modules.binarize`	Convert weighted adjacency matrix to binary (0/1).
`flashscenic.modules.to_numpy`	Convert tensor to numpy array.

flashscenic.modules.select_topk_targets(adj: flashscenic.modules.ArrayLike, k: int = 50, include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') → torch.Tensor

Select top-k targets per TF from adjacency matrix.

Parameters

adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights.
k : int, default=50
    Number of top targets to select per TF.
include_tf : bool, default=True
    Include the TF itself in its module (sets the diagonal to 1 if tf_indices is provided).
tf_indices : array-like, optional
    Index of each TF in the gene list. Required if include_tf=True and the TFs are part of the gene set.
device : str, default='cuda'
    Device for computation.

Returns

torch.Tensor
    Filtered adjacency matrix with only the top-k targets per TF. Shape: (n_tfs, n_genes).

Example

>>> import torch
>>> from flashscenic.modules import select_topk_targets
>>> adj = torch.rand(100, 5000)  # 100 TFs, 5000 genes
>>> filtered = select_topk_targets(adj, k=50)
>>> # Each row now has at most 50 non-zero values
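The row-wise top-k selection can be sketched with NumPy's argpartition, which finds the k largest entries per row without a full sort. This sketch ignores include_tf, tf_indices and device, and topk_targets_sketch is a hypothetical name.

```python
import numpy as np


def topk_targets_sketch(adj, k=50):
    """Keep the k largest weights in each TF row and zero everything else."""
    adj = np.asarray(adj, dtype=float)
    n_genes = adj.shape[1]
    k = min(k, n_genes)
    if k <= 0:
        return np.zeros_like(adj)
    # After argpartition, the indices of the k largest values occupy the last k slots.
    top = np.argpartition(adj, n_genes - k, axis=1)[:, -k:]
    rows = np.arange(adj.shape[0])[:, None]
    out = np.zeros_like(adj)
    out[rows, top] = adj[rows, top]
    return out


rng = np.random.default_rng(0)
adj = rng.random((4, 20))  # 4 TFs, 20 genes
filtered = topk_targets_sketch(adj, k=5)
```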

flashscenic.modules.select_threshold_targets(adj: flashscenic.modules.ArrayLike, threshold: float = 0.0, percentile: Optional[float] = None, include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') → torch.Tensor

Select targets above threshold from adjacency matrix.

Parameters

adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights.
threshold : float, default=0.0
    Absolute threshold (edges below this become 0).
percentile : float, optional
    If provided, use this percentile of the non-zero weights as the threshold (overrides threshold). Value between 0 and 100.
include_tf : bool, default=True
    Include the TF itself in its module.
tf_indices : array-like, optional
    Index of each TF in the gene list.
device : str, default='cuda'
    Device for computation.

Returns

torch.Tensor
    Filtered adjacency matrix.
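The percentile mode can be sketched in NumPy: compute the percentile over the non-zero weights only, then zero everything below it. Keeping ties at the threshold and using NumPy's default linear interpolation are assumptions, and percentile_targets_sketch is a hypothetical name.

```python
import numpy as np


def percentile_targets_sketch(adj, percentile=75.0):
    """Zero every edge below the given percentile of the non-zero weights."""
    adj = np.asarray(adj, dtype=float)
    nonzero = adj[adj != 0]
    if nonzero.size == 0:
        return adj.copy()
    threshold = np.percentile(nonzero, percentile)
    # Edges strictly below the threshold are dropped; ties at the threshold are kept.
    return np.where(adj >= threshold, adj, 0.0)


adj = np.array([[0, 1, 2, 3],
                [4, 5, 6, 7],
                [8, 0, 0, 0]], dtype=float)
filtered = percentile_targets_sketch(adj, percentile=75)  # threshold = 6.25
```

Because the zeros are excluded before taking the percentile, a sparse matrix does not drag the threshold down toward zero.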

flashscenic.modules.select_top_n_per_target(adj: flashscenic.modules.ArrayLike, n: int = 5, include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') → torch.Tensor

Select top-N regulators per target gene, then regroup by TF.

For each target gene, finds its N strongest regulators (TFs). The result is regrouped back into the standard (n_tfs, n_genes) layout. This is the inverted view used by pySCENIC’s “top N per target” module type.

Parameters

adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights.
n : int, default=5
    Number of top regulators to select per target gene.
include_tf : bool, default=True
    Include the TF itself in its module (sets the diagonal to 1 if tf_indices is provided).
tf_indices : array-like, optional
    Index of each TF in the gene list. Required if include_tf=True and the TFs are part of the gene set.
device : str, default='cuda'
    Device for computation.

Returns

torch.Tensor
    Filtered adjacency matrix with only the top-N-per-target entries. Shape: (n_tfs, n_genes).
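The column-wise counterpart of top-k selection, sketched in NumPy: partition each target column to find its n strongest regulators. Because rows stay indexed by TF, the result is already in the regrouped (n_tfs, n_genes) layout. include_tf and tf_indices are not modelled, and the function name is hypothetical.

```python
import numpy as np


def top_n_per_target_sketch(adj, n=5):
    """For each target (column) keep its n strongest regulators (rows)."""
    adj = np.asarray(adj, dtype=float)
    n_tfs = adj.shape[0]
    n = min(n, n_tfs)
    if n <= 0:
        return np.zeros_like(adj)
    # Row indices of the n largest weights in every column.
    top = np.argpartition(adj, n_tfs - n, axis=0)[-n:, :]
    cols = np.arange(adj.shape[1])[None, :]
    out = np.zeros_like(adj)
    out[top, cols] = adj[top, cols]
    return out


adj = np.array([[0.9, 0.1, 0.5],
                [0.2, 0.8, 0.4],
                [0.3, 0.7, 0.6]])  # 3 TFs x 3 targets
top1 = top_n_per_target_sketch(adj, n=1)
top2 = top_n_per_target_sketch(adj, n=2)
```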

flashscenic.modules.filter_by_min_targets(adj: flashscenic.modules.ArrayLike, min_targets: int = 20, min_fraction: Optional[float] = None, device: str = 'cuda') → Tuple[torch.Tensor, torch.Tensor]

Filter out TFs with fewer than min_targets or below min_fraction of targets.

This function supports both absolute count filtering (like pySCENIC’s min_genes=20) and percentage-based filtering (like pySCENIC’s 80% mapping requirement).

Parameters

adj : array-like
    Adjacency matrix (n_tfs x n_genes).
min_targets : int, default=20
    Minimum number of non-zero targets required. Set to 0 to disable absolute count filtering.
min_fraction : float or None, default=None
    Minimum fraction of total genes that must be non-zero targets, as a value between 0.0 and 1.0. A value of 0.8 is intended to mirror pySCENIC's behavior of skipping modules where less than 80% of genes can be mapped. None disables fraction-based filtering.
device : str, default='cuda'
    Device for computation.

Returns

Tuple[torch.Tensor, torch.Tensor]
    - Filtered adjacency matrix (n_valid_tfs x n_genes)
    - Boolean mask indicating which TFs were kept

Examples

>>> import torch
>>> from flashscenic.modules import filter_by_min_targets
>>> adj = torch.rand(100, 5000) > 0.5  # Random binary adjacency
>>> # Filter by absolute count (default pySCENIC behavior)
>>> filtered, mask = filter_by_min_targets(adj, min_targets=20)
>>> # Filter by percentage (pySCENIC's 80% rule)
>>> filtered, mask = filter_by_min_targets(adj, min_targets=0, min_fraction=0.8)
>>> # Combine both filters
>>> filtered, mask = filter_by_min_targets(adj, min_targets=20, min_fraction=0.8)
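The absolute-count rule reduces to a row-wise non-zero count. This NumPy sketch covers only min_targets and leaves out the fraction rule; filter_by_min_targets_sketch is a hypothetical name.

```python
import numpy as np


def filter_by_min_targets_sketch(adj, min_targets=20):
    """Drop TF rows with fewer than min_targets non-zero entries (count rule only)."""
    adj = np.asarray(adj)
    keep = np.count_nonzero(adj, axis=1) >= min_targets
    return adj[keep], keep


adj = np.array([[1, 1, 1, 0],   # 3 targets
                [1, 0, 0, 0],   # 1 target
                [1, 1, 0, 0]])  # 2 targets
filtered, mask = filter_by_min_targets_sketch(adj, min_targets=2)
```

Returning the mask alongside the filtered matrix lets callers subset TF names with the same boolean index.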

flashscenic.modules.filter_by_mapped_fraction(adj: flashscenic.modules.ArrayLike, reference_indices: Optional[flashscenic.modules.ArrayLike] = None, min_fraction: float = 0.8, device: str = 'cuda') → Tuple[torch.Tensor, torch.Tensor]

Filter out TFs where less than min_fraction of targets can be mapped to a reference.

This mimics pySCENIC’s behavior of skipping modules where “less than 80% of the genes could be mapped to the ranking database” (transform.py:298-307).

Parameters

adj : array-like
    Adjacency matrix (n_tfs x n_genes).
reference_indices : array-like, optional
    Indices of genes that exist in the reference database (e.g., the ranking DB). If None, all genes are treated as mapped (no mapping-based filtering).
min_fraction : float, default=0.8
    Minimum fraction of targets that must be mappable to the reference. The default of 0.8 matches pySCENIC's 80% threshold.
device : str, default='cuda'
    Device for computation.

Returns

Tuple[torch.Tensor, torch.Tensor]
    - Filtered adjacency matrix (n_valid_tfs x n_genes)
    - Boolean mask indicating which TFs were kept

Notes

pySCENIC's logic (from transform.py):

    n_missing = len(module) - len(genes)  # genes not in ranking DB
    frac_missing = float(n_missing) / len(module)
    if frac_missing >= 0.20:  # i.e., less than 80% mapped
        ...  # skip this module

Examples

>>> import torch
>>> from flashscenic.modules import filter_by_mapped_fraction
>>> adj = torch.rand(100, 5000) > 0.5  # 100 TFs, 5000 genes
>>> # Assume only genes 0-3999 are in the ranking database
>>> db_gene_indices = torch.arange(4000)
>>> filtered, mask = filter_by_mapped_fraction(adj, db_gene_indices, min_fraction=0.8)
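The pySCENIC check quoted in Notes can be vectorized over all TF rows in NumPy. This sketch follows that quoted logic literally, so a module with exactly 20% of its targets missing is dropped; whether flashscenic treats that boundary the same way is not stated above. Dropping TFs with no targets at all is an additional assumption, and the function name is hypothetical.

```python
import numpy as np


def filter_by_mapped_fraction_sketch(adj, reference_indices, min_fraction=0.8):
    """Keep TF rows whose targets are mostly present in the reference gene set.

    Follows the pySCENIC check quoted in Notes: a module is skipped when
    frac_missing >= 1 - min_fraction. TFs without any target are dropped too.
    """
    adj = np.asarray(adj)
    in_ref = np.zeros(adj.shape[1], dtype=bool)
    in_ref[np.asarray(reference_indices)] = True
    is_target = adj != 0
    n_targets = is_target.sum(axis=1)
    n_missing = (is_target & ~in_ref[None, :]).sum(axis=1)
    frac_missing = n_missing / np.maximum(n_targets, 1)
    keep = (n_targets > 0) & (frac_missing < 1.0 - min_fraction)
    return adj[keep], keep


adj = np.array([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0],   # all 4 targets mapped
                [1, 0, 0, 0, 1, 1, 1, 0, 0, 0],   # 3 of 4 targets missing
                [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],   # exactly 20% missing
                [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]])  # no targets
filtered, mask = filter_by_mapped_fraction_sketch(adj, reference_indices=[0, 1, 2, 3])
```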

flashscenic.modules.select_mixture_model_targets(adj: flashscenic.modules.ArrayLike, n_components: int = 2, method: str = 'intersection', include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') → Tuple[torch.Tensor, dict]

Select targets using Gaussian mixture model to separate signal from noise.

Fits a GMM to the non-zero edge weights across the full adjacency matrix. The threshold is derived from the fitted components, providing a data-driven alternative to fixed thresholds like adj[adj < 1.0] = 0.

Parameters

adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights.
n_components : int, default=2
    Number of Gaussian components (2 = noise + signal).
method : str, default='intersection'
    How to derive the threshold from the fitted GMM:

    - ``'intersection'``: weight where the posterior probabilities of the
      two components cross (between the two means).
    - ``'posterior'``: keep edges with P(signal | weight) > 0.5.
    - ``'noise_quantile'``: noise_mean + 2 * noise_std.

include_tf : bool, default=True
    Include the TF itself in its module.
tf_indices : array-like, optional
    Index of each TF in the gene list.
device : str, default='cuda'
    Device for the output tensor.

Returns

Tuple[torch.Tensor, dict]
    - Filtered adjacency matrix (n_tfs, n_genes) on device.
    - Info dict with keys: threshold, means, stds, weights, converged, method.
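The 'intersection' rule can be illustrated with a hand-written expectation-maximization (EM) fit in NumPy. This is an assumption-laden sketch: two components only, percentile-based initialization, and a grid search for the crossing point. flashscenic may fit the mixture differently (for example with a library GMM), and every name here is hypothetical. The crossing is where the weighted component densities are equal, which is also where both posterior probabilities equal 0.5.

```python
import numpy as np


def _normal_pdf(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))


def fit_two_component_gmm(x, n_iter=500, tol=1e-10):
    """Plain EM for a 1-D, two-component Gaussian mixture (noise vs. signal)."""
    x = np.asarray(x, dtype=float)
    means = np.percentile(x, [5, 95])  # spread the initial means apart
    stds = np.full(2, x.std() + 1e-6)
    weights = np.array([0.5, 0.5])
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted density of each point under each component, shape (n, 2).
        dens = np.stack([w * _normal_pdf(x, m, s) for w, m, s in zip(weights, means, stds)], axis=1)
        total = dens.sum(axis=1) + 1e-300
        resp = dens / total[:, None]
        # M-step: re-estimate mixing weights, means and standard deviations.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk) + 1e-6
        ll = np.log(total).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    order = np.argsort(means)  # component 0 = noise (lower mean), 1 = signal
    return means[order], stds[order], weights[order]


def intersection_threshold(means, stds, weights, n_grid=10_000):
    """First weight between the two means where the signal component dominates."""
    grid = np.linspace(means[0], means[1], n_grid)
    noise = weights[0] * _normal_pdf(grid, means[0], stds[0])
    signal = weights[1] * _normal_pdf(grid, means[1], stds[1])
    # np.argmax returns 0 (the noise mean) if signal never dominates.
    return float(grid[np.argmax(signal > noise)])


def gmm_targets_sketch(adj):
    adj = np.asarray(adj, dtype=float)
    means, stds, weights = fit_two_component_gmm(adj[adj != 0])
    threshold = intersection_threshold(means, stds, weights)
    return np.where(adj >= threshold, adj, 0.0), threshold


rng = np.random.default_rng(0)
noise = rng.normal(0.5, 0.2, 2000)   # many weak edges
signal = rng.normal(3.0, 0.5, 500)   # fewer strong edges
means, stds, weights = fit_two_component_gmm(np.concatenate([noise, signal]))
filtered, threshold = gmm_targets_sketch(np.concatenate([noise, signal]).reshape(50, 50))
```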

flashscenic.modules.select_knee_targets(adj: flashscenic.modules.ArrayLike, sensitivity: float = 1.0, per_tf: bool = False, include_tf: bool = True, tf_indices: Optional[flashscenic.modules.ArrayLike] = None, device: str = 'cuda') → Tuple[torch.Tensor, dict]

Select targets using knee/elbow detection on the sorted edge weight curve.

Sorts non-zero edge weights in descending order and finds the “knee” point where the rate of decrease changes most sharply. This provides a data-driven threshold without assuming a parametric distribution.

Parameters

adj : array-like
    Adjacency matrix (n_tfs x n_genes) with edge weights.
sensitivity : float, default=1.0
    Controls how aggressively the knee is detected. Higher values detect the knee earlier (more aggressive pruning). Values in [0.5, 3.0] are typical.
per_tf : bool, default=False
    If True, find a separate knee for each TF row. If False, find a single global knee across all weights.
include_tf : bool, default=True
    Include the TF itself in its module.
tf_indices : array-like, optional
    Index of each TF in the gene list.
device : str, default='cuda'
    Device for the output tensor.

Returns

Tuple[torch.Tensor, dict]
    - Filtered adjacency matrix (n_tfs, n_genes) on device.
    - Info dict with keys:

      - ``threshold``: float (global) or list of float (per-TF)
      - ``knee_index``: int or list of int
      - ``per_tf``: bool
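One common way to locate a knee is the maximum distance below the chord joining the first and last points of the normalized, descending curve. This NumPy sketch uses that heuristic for the global (per_tf=False) case only and ignores sensitivity; it is not necessarily the detector flashscenic uses. Treating the knee point as the first dropped weight is also an assumption, and all names are hypothetical.

```python
import numpy as np


def knee_threshold(weights):
    """Max-distance-to-chord knee on the descending weight curve (sketch).

    Returns (threshold, knee_index). Edges strictly above the threshold are
    kept, i.e. the knee point itself is the first dropped weight.
    """
    w = np.sort(np.asarray(weights, dtype=float).ravel())[::-1]
    n = w.size
    if n < 3 or w[0] == w[-1]:
        return -np.inf, n  # no usable curve: keep every edge
    x = np.arange(n) / (n - 1)
    y = (w - w[-1]) / (w[0] - w[-1])  # rescaled to run from 1 down to 0
    # The chord from (0, 1) to (1, 0) is y = 1 - x; the knee lies furthest below it.
    knee_index = int(np.argmax((1.0 - x) - y))
    return float(w[knee_index]), knee_index


def knee_targets_sketch(adj):
    adj = np.asarray(adj, dtype=float)
    threshold, knee_index = knee_threshold(adj[adj != 0])
    filtered = np.where(adj > threshold, adj, 0.0)
    return filtered, {"threshold": threshold, "knee_index": knee_index, "per_tf": False}


# 20 strong edges followed by a long tail of 200 weak ones.
weights = np.concatenate([np.linspace(10.0, 8.0, 20), np.linspace(1.0, 0.5, 200)])
threshold, knee_index = knee_threshold(weights)
filtered, info = knee_targets_sketch(weights.reshape(11, 20))
```

Normalizing both axes to [0, 1] makes the knee position independent of the absolute weight scale.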

flashscenic.modules.get_target_indices(adj: flashscenic.modules.ArrayLike, device: str = 'cuda') → Tuple[torch.Tensor, torch.Tensor]

Get indices of non-zero targets for each TF.

Useful for cisTarget pruning which needs gene indices.

Parameters

adj : array-like
    Adjacency matrix (n_tfs x n_genes).
device : str, default='cuda'
    Device for computation.

Returns

Tuple[torch.Tensor, torch.Tensor]
    - Flat tensor of gene indices
    - Tensor of (start, end) positions for each TF's targets
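The flat-indices-plus-(start, end) return value reads like a CSR-style encoding. This NumPy sketch builds that layout under the assumption that end is exclusive, so each TF's targets are flat[start:end]; target_indices_sketch is a hypothetical name.

```python
import numpy as np


def target_indices_sketch(adj):
    """CSR-style layout: flat target indices plus a (start, end) slice per TF row."""
    adj = np.asarray(adj)
    rows, cols = np.nonzero(adj)  # row-major order, so targets are grouped by TF
    counts = np.bincount(rows, minlength=adj.shape[0])
    ends = np.cumsum(counts)
    starts = ends - counts
    return cols, np.stack([starts, ends], axis=1)


adj = np.array([[0, 1, 1, 0],
                [0, 0, 0, 0],   # a TF with no targets gets an empty slice
                [1, 0, 0, 1]])
flat, positions = target_indices_sketch(adj)
start, end = positions[2]
```

A single flat array avoids ragged per-TF lists, which suits batched GPU lookups.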

flashscenic.modules.binarize(adj: flashscenic.modules.ArrayLike, device: str = 'cuda') → torch.Tensor

Convert weighted adjacency matrix to binary (0/1).

Parameters

adj : array-like
    Adjacency matrix (n_tfs x n_genes).
device : str, default='cuda'
    Device for computation.

Returns

torch.Tensor
    Binary adjacency matrix.

flashscenic.modules.to_numpy(tensor: torch.Tensor) → numpy.ndarray

Convert a tensor to a numpy array.

Analysis

flashscenic.rss.regulon_specificity_scores(auc_matrix, cell_type_labels, regulon_names=None)

Compute Regulon Specificity Scores (RSS) based on Jensen-Shannon divergence.

RSS quantifies how specific each regulon’s activity is to each cell type. A score close to 1 means the regulon is exclusively active in that cell type.

Reference: Suo et al. 2018 (doi: 10.1016/j.celrep.2018.10.045)

Parameters

auc_matrix : np.ndarray
    AUCell scores of shape (n_cells, n_regulons).
cell_type_labels : array-like
    Cell type label per cell (length n_cells). Can be a list, numpy array, or pandas Series.
regulon_names : list of str, optional
    Names for each regulon column. If None, integer indices are used.

Returns

dict
    - 'rss': np.ndarray of shape (n_cell_types, n_regulons). RSS values; higher means more specific.
    - 'cell_types': list of str. Sorted unique cell type labels (row labels of rss).
    - 'regulon_names': list of str. Regulon names (column labels of rss).
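Following Suo et al., RSS is 1 minus the Jensen-Shannon distance (the square root of the divergence) between a regulon's normalized activity across cells and a cell type's normalized indicator vector. The NumPy sketch below assumes a base-2 logarithm so that scores span [0, 1]; the log base flashscenic uses is not stated above, and rss_sketch is a hypothetical name.

```python
import numpy as np


def _kl_base2(p, q):
    mask = p > 0  # terms with p == 0 contribute nothing
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def jensen_shannon_base2(p, q):
    m = 0.5 * (p + q)
    return 0.5 * _kl_base2(p, m) + 0.5 * _kl_base2(q, m)


def rss_sketch(auc_matrix, cell_type_labels, regulon_names=None):
    """RSS = 1 - sqrt(JSD(regulon distribution, cell-type indicator distribution))."""
    auc = np.asarray(auc_matrix, dtype=float)
    labels = np.asarray(cell_type_labels)
    cell_types = sorted(set(labels.tolist()))
    if regulon_names is None:
        regulon_names = list(range(auc.shape[1]))
    rss = np.zeros((len(cell_types), auc.shape[1]))
    for j in range(auc.shape[1]):
        p = auc[:, j] / auc[:, j].sum()  # assumes the column is not all zeros
        for i, cell_type in enumerate(cell_types):
            q = (labels == cell_type).astype(float)
            q /= q.sum()
            jsd = max(jensen_shannon_base2(p, q), 0.0)  # guard tiny negative round-off
            rss[i, j] = 1.0 - np.sqrt(jsd)
    return {"rss": rss, "cell_types": cell_types, "regulon_names": list(regulon_names)}


auc = np.array([[1.0, 0.5],
                [1.0, 0.5],
                [0.0, 0.5],
                [0.0, 0.5]])  # R1 active only in type A, R2 uniform
result = rss_sketch(auc, ["A", "A", "B", "B"], regulon_names=["R1", "R2"])
```

R1 scores 1.0 for type A and 0.0 for type B, while the uniformly active R2 gets the same intermediate score for both types.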

Helpers

flashscenic.regulons_to_adjacency(regulons: list[dict], gene_names: list[str]) → numpy.ndarray

Convert list of regulon dicts to adjacency matrix for AUCell.

Parameters

regulons : list of dict
    Output from CisTargetPruner.prune_modules(); each dict has a 'genes' key.
gene_names : list of str
    List of gene names matching the columns of the expression matrix.

Returns

np.ndarray
    Binary adjacency matrix of shape (n_regulons, n_genes).
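The conversion amounts to a name-to-column lookup followed by setting ones. In this NumPy sketch, skipping genes that are absent from gene_names and the float32 dtype are assumptions, and regulons_to_adjacency_sketch is a hypothetical name.

```python
import numpy as np


def regulons_to_adjacency_sketch(regulons, gene_names):
    """Binary (n_regulons, n_genes) membership matrix built from regulon dicts."""
    column = {gene: i for i, gene in enumerate(gene_names)}
    adj = np.zeros((len(regulons), len(gene_names)), dtype=np.float32)
    for r, regulon in enumerate(regulons):
        # Genes missing from gene_names are skipped (an assumption of this sketch).
        idx = [column[g] for g in regulon["genes"] if g in column]
        adj[r, idx] = 1.0
    return adj


adj = regulons_to_adjacency_sketch([{"genes": ["B", "C"]}, {"genes": ["A", "Z"]}], ["A", "B", "C"])
```

The resulting matrix has the (n_regulons, n_genes) shape that get_aucell expects for adj_array.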
