# Pipeline Guide

`run_flashscenic()` orchestrates five stages. This guide explains each stage and how to tune its parameters.

## Pipeline Overview

```text
Expression Matrix (n_cells x n_genes)
    │
    ▼
Step 1: GRN Inference (RegDiffusion)     ← grn_*
    │
    ▼
Step 2: TF Filtering                     ← tf_list_path, grn_sparsity_threshold
    │
    ▼
Step 3: Module Filtering                 ← module_*
    │
    ▼
Step 4: cisTarget Pruning                ← pruning_*, annotation_*
    │
    ▼
Step 5: AUCell Scoring                   ← aucell_*
    │
    ▼
Result dict (auc_scores, regulons, ...)
```

## Step 1: GRN Inference

Uses RegDiffusion to infer a gene-gene adjacency matrix from the expression data.

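| Parameter | Default | Description |
|---|---|---|
| `grn_n_steps` | 1000 | Training iterations. More steps generally improve convergence but increase runtime. |
| `grn_sparsity_threshold` | 1.5 | Edges with a weight below this value are set to zero. Higher values give a sparser network. |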

Tuning tips:

- Increase `grn_n_steps` to 2000 or more for larger datasets
- Raise `grn_sparsity_threshold` if you get too many TF modules; lower it if too few survive
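
The following minimal NumPy sketch shows the effect of `grn_sparsity_threshold` on a dense adjacency matrix. It assumes that the threshold is compared against the raw edge weight; the real implementation may differ, for example by using absolute values. `sparsify` is a hypothetical helper and is not part of the flashscenic API.

```python
import numpy as np

def sparsify(adj: np.ndarray, threshold: float = 1.5) -> np.ndarray:
    """Return a copy of `adj` with every edge weight below `threshold` set to zero.

    Hypothetical helper; assumes the threshold applies to the raw weight.
    """
    out = adj.copy()
    out[out < threshold] = 0.0
    return out

# Toy 3-gene adjacency matrix (rows = regulators, columns = targets).
adj = np.array([[0.0, 2.0, 1.0],
                [1.6, 0.0, 0.4],
                [3.1, 1.4, 0.0]])

print(np.count_nonzero(sparsify(adj, threshold=1.5)))  # 3 edges survive
print(np.count_nonzero(sparsify(adj, threshold=2.5)))  # 1 edge survives
```

Raising the threshold removes weaker edges. Each TF then keeps fewer candidate targets going into module construction.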

## Step 2: TF Filtering

Loads a known transcription factor list and subsets the adjacency matrix to rows corresponding to TFs.

| Parameter | Default | Description |
|---|---|---|
| `tf_list_path` | Auto-downloaded | Path to a TF gene list (one gene name per line) |
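
Here is a minimal sketch of the TF filtering step. It assumes a dense adjacency matrix whose rows and columns both follow the order of `gene_names`. `load_tf_list` and `subset_to_tfs` are hypothetical helpers. `io.StringIO` stands in for opening the file at `tf_list_path`.

```python
import io
import numpy as np

def load_tf_list(handle) -> list:
    """Read one TF name per line, ignoring blank lines (hypothetical helper)."""
    return [line.strip() for line in handle if line.strip()]

def subset_to_tfs(adj, gene_names, tfs):
    """Keep only the adjacency rows whose gene is a known TF, in gene_names order."""
    tf_set = set(tfs)
    idx = [i for i, g in enumerate(gene_names) if g in tf_set]
    return adj[idx, :], [gene_names[i] for i in idx]

gene_names = ["SOX2", "GAPDH", "PAX6", "ACTB"]
adj = np.arange(16, dtype=float).reshape(4, 4)   # toy gene x gene adjacency

tfs = load_tf_list(io.StringIO("PAX6\nSOX2\nFOXP3\n"))  # FOXP3 is absent from the data
tf_adj, tf_names = subset_to_tfs(adj, gene_names, tfs)
print(tf_names)       # ['SOX2', 'PAX6']
print(tf_adj.shape)   # (2, 4)
```

TFs in the list that are absent from the expression data, such as `FOXP3` here, are silently dropped.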

## Step 3: Module Filtering

Generates multiple module types per TF (matching pySCENIC’s strategy) and filters out TFs with too few targets.

By default, the pipeline creates three kinds of modules for each TF:

- Top-k: the top `module_k` targets by adjacency weight (e.g., `top50`)
- Percentile: targets above each percentile threshold in `module_percentile_thresholds` (e.g., `pct75`)
- Top-N per target: the top N regulators of each target gene, regrouped by TF (e.g., `top5perTarget`, `top10perTarget`, `top50perTarget`)
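
The three module types can be sketched in NumPy as follows. This is an illustration under stated assumptions, not flashscenic's implementation:

- rows of `adj` are TFs and columns are target genes;
- only positive weights count as edges;
- the percentile cutoff is computed over each TF's own nonzero weights;
- the `<TF>_<type>` module names and the `build_modules` helper are hypothetical.

```python
import numpy as np

def build_modules(adj, tf_names, gene_names, k=50, percentiles=(75,), top_n=(5, 10, 50)):
    """Return {module_name: set_of_targets} for top-k, percentile and top-N-per-target modules."""
    modules = {}
    for i, tf in enumerate(tf_names):
        w = adj[i]
        nz = np.flatnonzero(w > 0)                    # indices of this TF's edges
        order = nz[np.argsort(w[nz])[::-1]]           # targets by descending weight
        modules[f"{tf}_top{k}"] = {gene_names[j] for j in order[:k]}
        for p in percentiles:
            if nz.size:
                cut = np.percentile(w[nz], p)         # assumption: per-TF percentile
                modules[f"{tf}_pct{p}"] = {gene_names[j] for j in nz if w[j] > cut}
    for n in top_n:
        for j, gene in enumerate(gene_names):
            col = adj[:, j]
            # The n strongest regulators of this target, regrouped by TF.
            for r in np.argsort(col)[::-1][:n]:
                if col[r] > 0:
                    modules.setdefault(f"{tf_names[r]}_top{n}perTarget", set()).add(gene)
    return modules

tf_names = ["TF1", "TF2"]
gene_names = ["g0", "g1", "g2", "g3"]
adj = np.array([[0.9, 0.5, 0.1, 0.0],
                [0.2, 0.8, 0.7, 0.3]])
mods = build_modules(adj, tf_names, gene_names, k=2, percentiles=(50,), top_n=(1,))
for name in sorted(mods):
    print(name, sorted(mods[name]))
```

The top-N-per-target modules are built column-wise. A TF can therefore collect targets it would miss in its own top-k list, as `TF2` does with `g3` here.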

| Parameter | Default | Description |
|---|---|---|
| `module_k` | 50 | Top k target genes per TF |
| `module_percentile_thresholds` | `(75,)` | Percentile cutoffs for percentile-based modules |
| `module_top_n_per_target` | `(5, 10, 50)` | N values for top-N-per-target modules |
| `module_min_targets` | 20 | Minimum absolute target count |
| `module_min_fraction` | `None` | Minimum fraction of targets required (pySCENIC 80% rule) |
| `module_include_tf` | `True` | Include the TF in its own module |

Tuning tips:

- Increase `module_k` (e.g., 100) for broader modules
- Lower `module_min_targets` if few TFs survive filtering
- Set `module_min_fraction` (e.g., `0.8` for pySCENIC's 80% rule) to enable the fraction-based filter; the default `None` leaves it disabled
- Multiple module types increase the chance of detecting motif enrichment; set `module_percentile_thresholds=()` and `module_top_n_per_target=()` to use only top-k modules
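
Filtering can then be sketched as below. The helper `filter_modules` and the `<TF>_<type>` naming are hypothetical. The sketch assumes the TF itself does not count toward `module_min_targets`. It leaves out `module_min_fraction`, because this guide does not specify what that fraction is taken over.

```python
def filter_modules(modules, min_targets=20, include_tf=True):
    """Drop modules with fewer than `min_targets` targets; optionally add the TF to its module."""
    kept = {}
    for name, targets in modules.items():
        tf = name.split("_", 1)[0]            # hypothetical "<TF>_<type>" naming
        genes = set(targets) - {tf}           # assumption: the TF is not counted as a target
        if len(genes) < min_targets:
            continue
        if include_tf:
            genes.add(tf)
        kept[name] = genes
    return kept

modules = {
    "SOX2_top50": {"A", "B", "C"},
    "PAX6_top50": {"A"},
}
kept = filter_modules(modules, min_targets=2)
print(sorted(kept))                 # ['SOX2_top50']
print(sorted(kept["SOX2_top50"]))   # ['A', 'B', 'C', 'SOX2']
```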

## Step 4: cisTarget Pruning

Validates regulatory hypotheses against motif enrichment using ranking databases.

| Parameter | Default | Description |
|---|---|---|
| `pruning_rank_threshold` | 5000 | Maximum rank considered for the recovery curve |
| `pruning_auc_threshold` | 0.05 | Fraction of the ranked genome used to compute the AUC |
| `pruning_nes_threshold` | 3.0 | NES cutoff for calling a motif enriched |
| `pruning_min_genes` | 0 | Minimum number of genes per regulon |
| `pruning_merge_strategy` | `'union'` | How to combine results across ranking databases: `'union'` or `'best'` |
| `annotation_motif_similarity_fdr` | 0.001 | FDR threshold for motif annotations |
| `annotation_orthologous_identity` | 0.0 | Minimum orthologous identity for motif annotations |
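
The recovery-curve AUC and NES behind `pruning_auc_threshold` and `pruning_nes_threshold` can be sketched in the general cisTarget style:

- each row of the ranking database ranks all genes for one motif;
- the AUC is computed over the top `auc_threshold` fraction of ranks;
- the NES is the z-score of a motif's AUC across all motifs.

The helpers are hypothetical. The normalization is one simple choice, and it cancels out in the NES. `pruning_rank_threshold` is not modeled here.

```python
import numpy as np

def recovery_aucs(rankings, module_idx, auc_threshold=0.05):
    """rankings[m, g] = 0-based rank of gene g for motif m; returns one AUC per motif."""
    n_motifs, n_genes = rankings.shape
    max_rank = max(1, int(auc_threshold * n_genes))   # only the top fraction contributes
    aucs = np.empty(n_motifs)
    for m in range(n_motifs):
        hits = np.zeros(max_rank)
        r = rankings[m, module_idx]
        hits[r[r < max_rank]] = 1.0                    # module genes found in the top ranks
        curve = np.cumsum(hits)                        # recovery curve
        aucs[m] = curve.sum() / (max_rank * len(module_idx))
    return aucs

def nes(aucs):
    """Normalized enrichment score: z-score of each motif's AUC across all motifs."""
    return (aucs - aucs.mean()) / aucs.std()

rng = np.random.default_rng(0)
n_motifs, n_genes = 21, 1000
rankings = np.array([rng.permutation(n_genes) for _ in range(n_motifs)])
module_idx = np.argsort(rankings[0])[:10]   # motif 0 ranks these 10 genes at the very top
scores = nes(recovery_aucs(rankings, module_idx))
enriched = np.flatnonzero(scores >= 3.0)    # as with pruning_nes_threshold = 3.0
print(enriched)
```

Because the NES is relative to the other motifs in the same database, lowering `pruning_nes_threshold` admits motifs whose enrichment stands out less from the background.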

Tuning tips:

- Lower `pruning_nes_threshold` (e.g., 2.5) if too few regulons survive
- Use `pruning_merge_strategy='best'` to keep only the highest-NES regulon per TF across databases
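
The two merge strategies can be sketched as follows, assuming each database yields a `{tf: (nes, targets)}` mapping. The `'best'` branch follows the tip above. Treating `'union'` as the union of each TF's target sets across databases is an assumption based on the name. `merge_regulons` is hypothetical.

```python
def merge_regulons(per_db, strategy="union"):
    """Merge per-database regulons; per_db is a list of {tf: (nes, targets)} dicts."""
    if strategy not in ("union", "best"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    merged = {}
    for db in per_db:
        for tf, (score, targets) in db.items():
            if tf not in merged:
                merged[tf] = (score, set(targets))
            elif strategy == "union":
                # Keep every target found in any database; track the best NES seen.
                best, genes = merged[tf]
                merged[tf] = (max(best, score), genes | set(targets))
            elif score > merged[tf][0]:
                # "best": replace with the higher-NES regulon from this database.
                merged[tf] = (score, set(targets))
    return {tf: genes for tf, (_, genes) in merged.items()}

db1 = {"SOX2": (4.1, {"A", "B"}), "PAX6": (3.2, {"C"})}
db2 = {"SOX2": (5.0, {"B", "D"})}
print(sorted(merge_regulons([db1, db2], "union")["SOX2"]))  # ['A', 'B', 'D']
print(sorted(merge_regulons([db1, db2], "best")["SOX2"]))   # ['B', 'D']
```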

## Step 5: AUCell Scoring

Computes a regulatory activity score for each regulon in each cell, based on how highly the regulon's target genes rank in that cell's expression profile.

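| Parameter | Default | Description |
|---|---|---|
| `aucell_k` | `None` (uses `module_k`) | Top k targets per regulon used for scoring |
| `aucell_auc_threshold` | 0.05 | Fraction of each cell's gene ranking used to compute the AUC |
| `aucell_batch_size` | 32 | Cells per batch (trades memory for speed) |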

Tuning tips:

- Increase `aucell_batch_size` (e.g., 64 or 128) for faster processing if GPU memory allows
- `aucell_k` defaults to `module_k` but can be set independently
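
AUCell can be sketched in three parts:

- rank genes within each cell by expression;
- score each regulon by the area under its recovery curve over the top `auc_threshold` fraction of ranks;
- process `batch_size` cells at a time.

This is a CPU NumPy illustration, not flashscenic's GPU implementation. `aucell` is a hypothetical helper. Ties are broken by gene order rather than randomly, and truncation to `aucell_k` targets is left out.

```python
import numpy as np

def aucell(exp, regulons, auc_threshold=0.05, batch_size=32):
    """exp: (n_cells, n_genes) array; regulons: {name: array of target gene column indices}."""
    n_cells, n_genes = exp.shape
    max_rank = max(1, int(auc_threshold * n_genes))      # only the top ranks contribute
    scores = {name: np.empty(n_cells) for name in regulons}
    for start in range(0, n_cells, batch_size):          # batching bounds peak memory
        block = exp[start:start + batch_size]
        # Rank genes within each cell: highest expression gets rank 0.
        order = np.argsort(-block, axis=1, kind="stable")
        ranks = np.empty_like(order)
        rows = np.arange(block.shape[0])[:, None]
        ranks[rows, order] = np.arange(n_genes)
        for name, idx in regulons.items():
            r = ranks[:, idx]
            # Area under the recovery curve: a target at rank q < max_rank adds max_rank - q.
            area = np.clip(max_rank - r, 0, None).sum(axis=1)
            scores[name][start:start + batch_size] = area / (max_rank * len(idx))
    return scores

# Two cells, 100 genes; regulon "R" targets genes 0 and 1.
exp = np.tile(np.arange(100, 0, -1, dtype=float), (2, 1))
exp[1, [0, 1]] = 0.0                     # in cell 1 the targets are not expressed
regulons = {"R": np.array([0, 1])}
print(aucell(exp, regulons, batch_size=1)["R"])
```

The batch size changes only how many cells are ranked at once, not the scores. This is why `aucell_batch_size` can be tuned purely for memory and speed.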

## Custom Resource Files

You can provide your own files instead of the auto-downloaded defaults:

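```python
import flashscenic as fs

# exp_matrix: expression matrix (n_cells x n_genes); gene_names: the matching gene names
result = fs.run_flashscenic(
    exp_matrix, gene_names,
    tf_list_path='my_custom_tfs.txt',
    ranking_db_paths=['my_db1.feather', 'my_db2.feather'],
    motif_annotation_path='my_annotations.tbl',
)
```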

When all three path arguments are provided, no download occurs.
