Pipeline Guide
run_flashscenic() orchestrates five stages. This guide explains each stage and how to tune its parameters.
Pipeline Overview
Expression Matrix (n_cells x n_genes)
│
▼
Step 1: GRN Inference (RegDiffusion) ← grn_*
│
▼
Step 2: TF Filtering ← tf_list_path, grn_sparsity_threshold
│
▼
Step 3: Module Filtering ← module_*
│
▼
Step 4: cisTarget Pruning ← pruning_*, annotation_*
│
▼
Step 5: AUCell Scoring ← aucell_*
│
▼
Result dict (auc_scores, regulons, ...)
Step 1: GRN Inference
Uses RegDiffusion to infer a gene-gene adjacency matrix from the expression data.
| Parameter | Default | Description |
|---|---|---|
| `grn_n_steps` | 1000 | Training iterations. More steps generally improve convergence but take longer. |
| `grn_sparsity_threshold` | 1.5 | Edges below this weight are zeroed. Higher values give a sparser network. |
Tuning tips:
- Increase `grn_n_steps` to 2000+ for larger datasets
- Raise `grn_sparsity_threshold` if you get too many TF modules; lower it if too few survive
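To make the effect of `grn_sparsity_threshold` concrete, here is a minimal pure-Python sketch of the thresholding rule described above. The `sparsify` helper and the toy matrix are illustrative assumptions, not part of flashscenic. The sketch compares raw weights because the guide does not say whether absolute values are used.

```python
def sparsify(adjacency, threshold=1.5):
    """Return a copy of `adjacency` with weights below `threshold` set to 0.0."""
    return [
        [w if w >= threshold else 0.0 for w in row]
        for row in adjacency
    ]


# Toy 3x3 adjacency matrix (rows = regulators, columns = targets).
adj = [
    [0.0, 2.1, 0.4],
    [1.6, 0.0, 1.4],
    [0.2, 3.0, 0.0],
]

# A higher threshold leaves fewer non-zero edges.
for t in (1.0, 1.5, 2.5):
    kept = sum(w != 0.0 for row in sparsify(adj, t) for w in row)
    print(f"threshold={t}: {kept} edges kept")
```

On this toy matrix the number of surviving edges drops from 4 to 3 to 1 as the threshold rises, which is the behaviour the tuning tips rely on.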
Step 2: TF Filtering
Loads a known transcription factor list and subsets the adjacency matrix to rows corresponding to TFs.
| Parameter | Default | Description |
|---|---|---|
| `tf_list_path` | Auto-downloaded | Path to TF gene list (one gene per line) |
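The row subsetting can be illustrated with a small sketch. It assumes the adjacency matrix is a list of rows aligned with `gene_names`; the helper names `read_tf_list` and `keep_tf_rows` are hypothetical and only mirror the behaviour described above, not the library's internals.

```python
import io


def read_tf_list(handle):
    """Read one gene symbol per line, skipping blank lines."""
    return {line.strip() for line in handle if line.strip()}


def keep_tf_rows(adjacency, gene_names, tfs):
    """Keep only the rows whose gene is a known TF; columns are left untouched."""
    tf_names = [g for g in gene_names if g in tfs]
    tf_rows = [row for g, row in zip(gene_names, adjacency) if g in tfs]
    return tf_names, tf_rows


gene_names = ["GATA1", "ACTB", "SPI1", "GAPDH"]
adjacency = [
    [0.0, 1.2, 0.3, 2.0],
    [0.1, 0.0, 0.2, 0.4],
    [0.5, 1.9, 0.0, 0.7],
    [0.3, 0.2, 0.1, 0.0],
]

# Stand-in for open("my_custom_tfs.txt"): one symbol per line.
tfs = read_tf_list(io.StringIO("GATA1\nSPI1\nTP53\n"))
tf_names, tf_rows = keep_tf_rows(adjacency, gene_names, tfs)
print(tf_names)  # ['GATA1', 'SPI1']
```

TFs that appear in the list but not in the expression data (here `TP53`) simply produce no row.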
Step 3: Module Filtering
Generates multiple module types per TF (matching pySCENIC’s strategy) and filters out TFs with too few targets.
By default, the pipeline creates three kinds of modules for each TF:
- Top-k: The top `module_k` targets by adjacency weight (e.g., top50)
- Percentile: Targets above each percentile threshold in `module_percentile_thresholds` (e.g., pct75)
- Top-N per target: The top N regulators per target gene, regrouped by TF (e.g., top5perTarget, top10perTarget, top50perTarget)
| Parameter | Default | Description |
|---|---|---|
| `module_k` | 50 | Top k target genes per TF |
| `module_percentile_thresholds` | (75,) | Percentile cutoffs for percentile-based modules |
| `module_top_n_per_target` | (5, 10, 50) | N values for top-N-per-target modules |
| `module_min_targets` | 20 | Minimum absolute target count |
| `module_min_fraction` | None | Minimum fraction of targets required (pySCENIC 80% rule) |
| | True | Include TF in its own module |
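The three module types and the minimum-target filter can be sketched in plain Python as follows. This is an illustration under assumptions: the adjacency is a `{tf: {target: weight}}` dict, the percentile is a nearest-rank percentile taken over all edge weights, and every function name is hypothetical. flashscenic's own implementation may differ in these details.

```python
import math
from collections import defaultdict


def top_k_modules(adj, k):
    """For each TF, keep its k highest-weight targets."""
    return {
        tf: set(sorted(targets, key=targets.get, reverse=True)[:k])
        for tf, targets in adj.items()
    }


def percentile_modules(adj, pct):
    """For each TF, keep targets whose weight exceeds the pct-th percentile
    (nearest-rank) of all edge weights."""
    weights = sorted(w for targets in adj.values() for w in targets.values())
    cutoff = weights[max(0, math.ceil(pct / 100 * len(weights)) - 1)]
    return {
        tf: {g for g, w in targets.items() if w > cutoff}
        for tf, targets in adj.items()
    }


def top_n_per_target_modules(adj, n):
    """For each target, keep its n strongest regulators, then regroup by TF."""
    by_target = defaultdict(dict)
    for tf, targets in adj.items():
        for gene, w in targets.items():
            by_target[gene][tf] = w
    modules = defaultdict(set)
    for gene, regulators in by_target.items():
        for tf in sorted(regulators, key=regulators.get, reverse=True)[:n]:
            modules[tf].add(gene)
    return dict(modules)


def drop_small(modules, min_targets):
    """Discard modules with fewer than `min_targets` genes."""
    return {tf: genes for tf, genes in modules.items() if len(genes) >= min_targets}


# Toy network; real runs use the defaults from the table (k=50, pct=75, ...).
adj = {
    "TF_A": {"g1": 3.0, "g2": 2.0, "g3": 0.5},
    "TF_B": {"g1": 1.0, "g2": 2.5, "g4": 4.0},
}

print(top_k_modules(adj, k=2))
print(percentile_modules(adj, pct=75))
print(top_n_per_target_modules(adj, n=1))
print(drop_small(percentile_modules(adj, pct=75), min_targets=1))
```

In the toy data the percentile module for `TF_A` is empty and is removed by `drop_small`, which is the situation the minimum-target filter is meant to catch.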
Tuning tips:
- Increase `module_k` (e.g., 100) for broader modules
- Lower `module_min_targets` if few TFs survive filtering
- Set `module_min_fraction=None` to disable the fraction-based filter
- Multiple module types increase the chance of detecting motif enrichment; set `module_percentile_thresholds=()` and `module_top_n_per_target=()` to use only top-k modules
Step 4: cisTarget Pruning
Validates regulatory hypotheses against motif enrichment using ranking databases.
| Parameter | Default | Description |
|---|---|---|
| | 5000 | Max rank for recovery curve |
| | 0.05 | Fraction of genome for AUC |
| `pruning_nes_threshold` | 3.0 | NES cutoff for enrichment |
| | 0 | Min genes per regulon |
| `pruning_merge_strategy` | 'union' | 'union' or 'best' across databases |
| | 0.001 | FDR threshold for motif annotations |
| | 0.0 | Min orthologous identity |
Tuning tips:
- Lower `pruning_nes_threshold` (e.g., 2.5) if too few regulons survive
- Use `pruning_merge_strategy='best'` to keep only the highest-NES regulon per TF across databases
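The sketch below illustrates how an NES cutoff and the two merge strategies might behave. It uses the standard cisTarget-style definition, NES = (AUC - mean AUC) / standard deviation across motifs, and assumes that `'union'` pools target genes across databases. Both points, the `>=` comparison, and all function names are assumptions for illustration rather than confirmed flashscenic behaviour.

```python
from statistics import mean, pstdev


def nes_scores(aucs):
    """NES per motif: (AUC - mean AUC) / std of AUC across all motifs."""
    values = list(aucs.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {motif: 0.0 for motif in aucs}
    return {motif: (auc - mu) / sigma for motif, auc in aucs.items()}


def enriched_motifs(aucs, nes_threshold=3.0):
    """Motifs whose NES reaches the cutoff."""
    return sorted(m for m, nes in nes_scores(aucs).items() if nes >= nes_threshold)


def merge_regulons(per_db, strategy="union"):
    """Merge {database: {tf: (nes, targets)}} into {tf: (nes, targets)}."""
    if strategy not in ("union", "best"):
        raise ValueError("strategy must be 'union' or 'best'")
    merged = {}
    for regulons in per_db.values():
        for tf, (nes, targets) in regulons.items():
            targets = set(targets)
            if tf not in merged:
                merged[tf] = (nes, targets)
            elif strategy == "best":
                # Keep only the regulon with the highest NES.
                if nes > merged[tf][0]:
                    merged[tf] = (nes, targets)
            else:
                # Assumed meaning of 'union': pool targets, report the top NES.
                merged[tf] = (max(nes, merged[tf][0]), merged[tf][1] | targets)
    return merged


# Toy AUCs: 16 background motifs and one clear outlier.
aucs = {f"motif_{i}": 0.02 for i in range(16)}
aucs["motif_hit"] = 0.10
print(enriched_motifs(aucs))

per_db = {
    "db1": {"TF_A": (3.4, {"g1", "g2"})},
    "db2": {"TF_A": (4.1, {"g2", "g3"})},
}
print(merge_regulons(per_db, "best"))
print(merge_regulons(per_db, "union"))
```

Under these assumptions `'best'` yields a smaller, single-database regulon per TF, while `'union'` yields a larger one that combines evidence from every database.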
Step 5: AUCell Scoring
Computes cell-state-specific regulatory activity scores.
| Parameter | Default | Description |
|---|---|---|
| `aucell_k` | None (uses `module_k`) | Top k targets for scoring |
| | 0.05 | Fraction of genome for AUC |
| `aucell_batch_size` | 32 | Cells per batch (memory vs speed) |
Tuning tips:
- Increase `aucell_batch_size` (e.g., 64 or 128) for faster processing if GPU memory allows
- `aucell_k` defaults to `module_k` but can be set independently
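As a rough illustration of the scoring idea, the following sketch computes a simplified recovery-curve AUC per cell and processes cells in batches. The normalisation (dividing by the best achievable area) and the helper names are assumptions chosen for clarity. The library's GPU implementation and exact normalisation may differ.

```python
def aucell_score(expression, gene_names, regulon, auc_threshold=0.05):
    """Simplified recovery-curve AUC of `regulon` among one cell's top-ranked genes."""
    n_genes = len(gene_names)
    cutoff = max(1, round(auc_threshold * n_genes))
    # Rank genes from highest to lowest expression (rank 0 = most expressed).
    order = sorted(range(n_genes), key=lambda i: expression[i], reverse=True)
    hit_ranks = [r for r, i in enumerate(order[:cutoff]) if gene_names[i] in regulon]
    # A hit at rank r contributes (cutoff - r) to the area under the step curve.
    area = sum(cutoff - r for r in hit_ranks)
    # Best case: every regulon gene sits at the very top of the ranking.
    best = sum(cutoff - r for r in range(min(len(regulon), cutoff)))
    return area / best if best else 0.0


def batched(rows, batch_size=32):
    """Yield consecutive slices of `rows` with at most `batch_size` items."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


def score_cells(cells, gene_names, regulon, auc_threshold=0.05, batch_size=32):
    """Score every cell, one batch at a time."""
    scores = []
    for batch in batched(cells, batch_size):
        scores.extend(
            aucell_score(cell, gene_names, regulon, auc_threshold) for cell in batch
        )
    return scores


gene_names = [f"g{i}" for i in range(20)]
regulon = {"g0", "g1"}

cell_high = [20 - i for i in range(20)]   # g0 and g1 are the top two genes
cell_low = list(range(20))                # g0 and g1 are the bottom two genes
cell_mid = list(range(20))
cell_mid[0] = 100                         # only g0 reaches the top

scores = score_cells(
    [cell_high, cell_low, cell_mid], gene_names, regulon,
    auc_threshold=0.25, batch_size=2,
)
print(scores)
```

In this pure-Python sketch the batch size does not change the scores; it only controls how many cells are handled at once, which is where the memory versus speed trade-off comes from.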
Custom Resource Files
You can provide your own files instead of the auto-downloaded defaults:
result = fs.run_flashscenic(
    exp_matrix, gene_names,
    tf_list_path='my_custom_tfs.txt',
    ranking_db_paths=['my_db1.feather', 'my_db2.feather'],
    motif_annotation_path='my_annotations.tbl',
)
When all three path arguments are provided, no download occurs.
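Because a mistyped path is easy to miss, it can help to check the files before starting a long run. The helper below is a hypothetical convenience wrapper, not part of flashscenic; it only verifies that the files exist and returns the three keyword arguments shown above.

```python
from pathlib import Path


def custom_resource_kwargs(tf_list_path, ranking_db_paths, motif_annotation_path):
    """Check that every custom resource exists and return keyword arguments."""
    paths = [tf_list_path, *ranking_db_paths, motif_annotation_path]
    missing = [str(p) for p in paths if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError("missing resource files: " + ", ".join(missing))
    return {
        "tf_list_path": str(tf_list_path),
        "ranking_db_paths": [str(p) for p in ranking_db_paths],
        "motif_annotation_path": str(motif_annotation_path),
    }


# Intended use (requires the files to exist and flashscenic to be imported as fs):
# kwargs = custom_resource_kwargs(
#     "my_custom_tfs.txt",
#     ["my_db1.feather", "my_db2.feather"],
#     "my_annotations.tbl",
# )
# result = fs.run_flashscenic(exp_matrix, gene_names, **kwargs)
```

Failing early with a `FileNotFoundError` avoids discovering a bad path only after GRN inference has already finished.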