API reference

stampede

STAMPede - STAMP data Exploration and Differential Expression

stampede.read_cosmx(slides, samples_df, adata_file, samples_df_columns=None, metadata_df_columns=None, data_dir=None, overwrite=True, verbose=True, **kwargs)

Read exprMat_file for each slide, convert the contents to sparse anndata objects, and concatenate the results.

Parameters:
  • slides (dict) – a dictionary with the slide number as keys and a dictionary as values. Each value dict must contain the keys “exprmat” and “metadata”, which should map to the respective matching files

  • samples_df (DataFrame) – a dataframe with sample metadata to be added to adata.obs

  • adata_file (str) – filepath to write the adata object to

  • samples_df_columns (list) – list of columns in samples_df to add to adata.obs (default: all)

  • metadata_df_columns (list) – list of columns in the metadata file to add to adata.obs (default: all)

  • data_dir (str) – optional filepath prefix (default: “”)

  • overwrite (bool) – overwrite existing output (default: True)

  • verbose (bool) – provide written feedback (default: True)

  • **kwargs – keyword arguments passed to pd.read_csv

Return type:

str

Returns:

the value of the adata_file argument

stampede.validate_input(slides, samples_df, data_dir=None)

Check the contents of the slides dictionary and samples_df for expected keys and columns, respectively.

Parameters:
  • slides (dict) – a dictionary with the slide number as keys and a dictionary as values. Each value dict must contain the keys “exprmat” and “metadata”, which should map to the respective matching files

  • samples_df (DataFrame) – a dataframe with sample metadata

  • data_dir (str) – optional filepath prefix (default: “”)

Return type:

None

Returns:

Nothing
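The nested slides dictionary described above can be sketched as follows. The file names and the helper missing_slide_keys are hypothetical and only illustrate the expected shape; they are not part of stampede, and the actual checks performed by validate_input may differ.

```python
# Hypothetical file names; only the dictionary layout follows the description above.
slides = {
    1: {"exprmat": "slide1_exprMat_file.csv", "metadata": "slide1_metadata_file.csv"},
    2: {"exprmat": "slide2_exprMat_file.csv", "metadata": "slide2_metadata_file.csv"},
}

REQUIRED_KEYS = ("exprmat", "metadata")


def missing_slide_keys(slides):
    """Return {slide: [missing keys]} for slides that lack a required key."""
    problems = {}
    for slide, files in slides.items():
        missing = [key for key in REQUIRED_KEYS if key not in files]
        if missing:
            problems[slide] = missing
    return problems


# Both slides above are complete, so no problems are reported.
problems = missing_slide_keys(slides)
```

An empty result means every slide provides both required keys; a slide with only an “exprmat” entry would be reported with “metadata” missing.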

stampede.pp

preprocessing functions

stampede.pp.binarize(adata, verbose=True)

Binarize the values in adata.X

Parameters:
  • adata (AnnData) – adata object

  • verbose (bool) – provide written feedback (default: True)

Return type:

None

Returns:

Nothing, updates adata.layers and adata.X

stampede.pp.cell_qc_postfilter(adata)

Compute metadata after filtering

Parameters:

adata (AnnData) – an adata object

Return type:

None

Returns:

Nothing, updates adata.obs

stampede.pp.detection_rates(adata, samples_column, normalize=True)

Calculate gene detection rates per sample in the samples_column of adata.obs.

Parameters:
  • adata (AnnData) – adata object

  • samples_column (str) – column in adata.obs

  • normalize (bool) – normalize detection rates for sample quality

Return type:

DataFrame

Returns:

a dataframe with normalized gene detection rates
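As a rough illustration of a per-sample gene detection rate, the sketch below computes, for each sample, the fraction of cells in which a gene has at least one count. The toy data and the function name detection_rates_sketch are made up, and the normalization for sample quality is not described here, so it is left out.

```python
import numpy as np

# Toy data: 5 cells x 3 genes, with a sample label per cell (all values are made up).
counts = np.array([
    [2, 0, 1],
    [0, 0, 3],
    [1, 1, 0],
    [0, 0, 0],
    [4, 0, 1],
])
samples = np.array(["A", "A", "A", "B", "B"])


def detection_rates_sketch(counts, samples):
    """Fraction of cells per sample in which each gene has at least one count."""
    detected = counts > 0  # binarize: a gene is either detected in a cell or not
    return {str(s): detected[samples == s].mean(axis=0) for s in np.unique(samples)}


rates = detection_rates_sketch(counts, samples)
```

Each entry of rates holds one value per gene, in the column order of counts.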

stampede.pp.dim_red(adata, n_dims=50, key_added=None, random_state=42)

Dimensionality reduction using Term Frequency Latent Semantic Indexing.

Parameters:
  • adata (AnnData) – adata object

  • n_dims (int) – number of PCs to use (default: 50)

  • key_added (str) – key in adata.obsm for function output (default: “X_svd”)

  • random_state (int) – random seed value

Return type:

None

Returns:

Nothing, updates adata.obsm and adata.uns
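The exact weighting used by dim_red is not spelled out here. One common formulation of latent semantic indexing is a TF-IDF weighting followed by a truncated SVD, sketched below with NumPy. The function name tfidf_lsi_sketch and the specific IDF formula are assumptions of this example, not a description of stampede's implementation.

```python
import numpy as np


def tfidf_lsi_sketch(counts, n_dims=2):
    """TF-IDF weighting followed by a truncated SVD (one common LSI recipe)."""
    counts = np.asarray(counts, dtype=float)
    # Term frequency: scale each cell (row) by its total count.
    tf = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1.0)
    # Inverse document frequency: down-weight genes detected in many cells.
    n_cells_with_gene = np.maximum((counts > 0).sum(axis=0), 1.0)
    idf = np.log1p(counts.shape[0] / n_cells_with_gene)
    tfidf = tf * idf
    # Full SVD, then keep the leading n_dims components.
    u, s, _ = np.linalg.svd(tfidf, full_matrices=False)
    return u[:, :n_dims] * s[:n_dims], s[:n_dims]


# Toy matrix: 4 cells x 3 genes (values are made up).
toy_counts = [[1, 0, 2], [0, 1, 1], [3, 0, 0], [0, 2, 1]]
embedding, singular_values = tfidf_lsi_sketch(toy_counts, n_dims=2)
```

The embedding has one row per cell and n_dims columns, which is the kind of array typically stored in adata.obsm.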

stampede.pp.filter_cells(adata, dist2edge_px_min=0, falsecode_max=5, negprobe_max=3, ntranscript_min=250, ntranscript_max=1500, area_min=25, area_max=100, filter_columns=None, verbose=True)

Filter adata.obs by a set of qc_params.

Parameters:
  • adata (AnnData) – adata object

  • dist2edge_px_min (int) – minimum distance (in pixels) from the cell to the FOV edge

  • falsecode_max (int) – maximum number of false codes the cell may have

  • negprobe_max (int) – maximum number of negative probes the cell may have

  • ntranscript_min (int) – minimum number of transcripts the cell must have

  • ntranscript_max (int) – maximum number of transcripts the cell must have

  • area_min (int) – minimum area (in pixels) the cell must have

  • area_max (int) – maximum area (in pixels) the cell must have

  • filter_columns (list) – a list of additional columns to filter by. Columns must be (convertible to) boolean; cells with False values are removed.

  • verbose (bool) – provide written feedback (default: True)

Return type:

AnnData

Returns:

the filtered adata object
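The thresholds above combine into a single pass/fail decision per cell. The sketch below shows that logic on plain dictionaries. The per-cell keys ("falsecode", "negprobe", "ntranscript", "area") are hypothetical stand-ins for columns in adata.obs, and treating the bounds as inclusive is an assumption.

```python
# Default thresholds copied from the filter_cells signature above.
THRESHOLDS = {
    "falsecode_max": 5,
    "negprobe_max": 3,
    "ntranscript_min": 250,
    "ntranscript_max": 1500,
    "area_min": 25,
    "area_max": 100,
}


def passes_qc(cell, t=THRESHOLDS):
    """Return True if a cell (a dict of hypothetical QC values) meets every threshold.

    Inclusive bounds are an assumption of this sketch.
    """
    return (
        cell["falsecode"] <= t["falsecode_max"]
        and cell["negprobe"] <= t["negprobe_max"]
        and t["ntranscript_min"] <= cell["ntranscript"] <= t["ntranscript_max"]
        and t["area_min"] <= cell["area"] <= t["area_max"]
    )


cells = [
    {"falsecode": 0, "negprobe": 1, "ntranscript": 400, "area": 60},  # passes
    {"falsecode": 0, "negprobe": 0, "ntranscript": 100, "area": 60},  # too few transcripts
    {"falsecode": 7, "negprobe": 0, "ntranscript": 900, "area": 60},  # too many false codes
]
kept = [cell for cell in cells if passes_qc(cell)]
```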

stampede.pp.filter_genes(adata, ncell_min=0, ncell_max=inf, ntranscript_min=0, ntranscript_max=inf, signal2noise_threshold=1.0, filter_columns=None, verbose=True)

Filter adata.var by a set of qc_params.

Parameters:
  • adata (AnnData) – adata object

  • ncell_min (int) – minimum number of cells the gene is found in.

  • ncell_max (int) – maximum number of cells the gene is found in.

  • ntranscript_min (int) – minimum number of transcripts the gene must have.

  • ntranscript_max (int) – maximum number of transcripts the gene must have.

  • signal2noise_threshold (float) – the minimum signal-to-noise ratio the gene must have.

  • filter_columns (str | list) – one or more additional columns to filter by. Columns must be (convertible to) boolean; genes with False values are removed.

  • verbose (bool) – provide written feedback (default: True)

Return type:

AnnData

Returns:

the filtered adata object

stampede.pp.gene_qc(adata, signal2noise_threshold=None, mult=1, overwrite=False)

Add QC parameters to adata.var.

About the Signal-to-noise filter:

Approach from https://doi.org/10.1038/s41467-025-64990-y Wang et al. “Systematic benchmarking of imaging spatial transcriptomics platforms in FFPE tissues” Nat Com, 2025.

Calculate the mean expression and standard deviation (STD) of the negative control probes. Remove genes with average expression < mean + mult × STD of the control probes.

The paper used mult=2.

Parameters:
  • adata (AnnData) – an adata object

  • signal2noise_threshold (float | Iterable) – manually specify the threshold. If None, use the filter specified above.

  • mult (int | float) – if signal2noise_threshold is None, mult is used in the signal2noise threshold computation specified above.

  • overwrite (bool) – overwrite existing qc columns (default: False)

Return type:

None

Returns:

Nothing, updates adata.var
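The signal-to-noise rule described above can be worked through on a few numbers. In this sketch the negative-probe and gene averages are made up, the function name is hypothetical, and the use of the population standard deviation is an assumption (the description does not say which estimator is used).

```python
from statistics import mean, pstdev


def signal2noise_threshold_sketch(negprobe_means, mult=1):
    """Threshold = mean + mult * STD of the negative control probe averages."""
    return mean(negprobe_means) + mult * pstdev(negprobe_means)


# Made-up average expression values.
negprobe_means = [0.01, 0.03, 0.02, 0.02]
gene_means = {"GeneA": 0.5, "GeneB": 0.03, "GeneC": 0.02}

threshold = signal2noise_threshold_sketch(negprobe_means, mult=2)  # mult=2 as in the paper
kept = [gene for gene, value in gene_means.items() if value >= threshold]
```

With these numbers the negative probes average 0.02 with a standard deviation of about 0.0071, so mult=2 gives a threshold of about 0.034 and only GeneA remains.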

stampede.pp.gene_qc_postfilter(adata)

Compute metadata after filtering

Parameters:

adata (AnnData) – an adata object

Return type:

None

Returns:

Nothing, updates adata.var

stampede.pp.knn_count_smoothing(adata, layer_added=None, neighbors_use_rep=None, neighbors_key_added=None, neighbors_kwargs=None, verbose=True)

For each cell, replace its gene vector with the average of its KNN neighborhood.

Runs sc.pp.neighbors if it has not been run yet. See https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.neighbors.html

Parameters:
  • adata (AnnData) – adata object

  • layer_added (str) – key in adata.layers for function output (default: “KNN_binary_mean”)

  • neighbors_use_rep (str) – See sc.pp.neighbors for details

  • neighbors_key_added (str) – See sc.pp.neighbors for details

  • neighbors_kwargs (dict) – kwargs passed to sc.pp.neighbors

  • verbose (bool) – provide written feedback (default: True)

Return type:

None

Returns:

Nothing, updates adata.layers and adata.X
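The neighborhood averaging can be illustrated with a brute-force nearest-neighbor search in NumPy. This is only a sketch: the function name is hypothetical, it does not use the scanpy neighbor graph, and including the cell itself in its own neighborhood is an assumption.

```python
import numpy as np


def knn_mean_sketch(values, rep, k):
    """Average each cell's gene vector over itself and its k nearest neighbors in rep."""
    # Pairwise squared Euclidean distances between cells in the representation.
    sq_dist = ((rep[:, None, :] - rep[None, :, :]) ** 2).sum(axis=-1)
    # The k + 1 closest cells per row; the cell itself is at distance 0.
    neighbors = np.argsort(sq_dist, axis=1)[:, : k + 1]
    return values[neighbors].mean(axis=1)


# Toy data: 4 cells forming two tight pairs in a 1D representation.
rep = np.array([[0.0], [0.1], [10.0], [10.1]])
binary = np.array([[1, 0], [0, 0], [0, 1], [0, 1]])
smoothed = knn_mean_sketch(binary, rep, k=1)
```

The first pair disagrees on the first gene, so both cells end up with 0.5 there, while the second pair keeps its shared values.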

stampede.pp.pseudobulk(adata, samples_column, samples=None, cluster_column=None, cluster=None, layer=None)

Generate a pseudobulk table (genes x samples) for all samples in samples_column, restricted to the given cluster in cluster_column if specified.

Parameters:
  • adata (AnnData) – adata object

  • samples_column (str) – column in adata.obs

  • samples (Iterable) – samples in samples_column to use (default: all)

  • cluster_column (str) – column in adata.obs (only needed if cluster is specified)

  • cluster (str) – name of the cluster in cluster_column to aggregate to pseudobulk

  • layer (str) – layer to aggregate (default: “counts”)

Return type:

DataFrame

Returns:

a dataframe with summed layer values per sample
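A genes x samples pseudobulk table amounts to summing the counts of all cells that belong to each sample. The sketch below shows this on a toy matrix; the data and the function name pseudobulk_sketch are made up, and cluster selection is omitted.

```python
import numpy as np

# Toy data: 4 cells x 3 genes, with a sample label per cell (values are made up).
counts = np.array([
    [2, 0, 1],
    [0, 0, 3],
    [1, 1, 0],
    [0, 2, 0],
])
samples = np.array(["A", "A", "B", "B"])


def pseudobulk_sketch(counts, samples):
    """Sum counts over the cells of each sample; returns (sample names, genes x samples)."""
    names = [str(s) for s in np.unique(samples)]
    table = np.column_stack([counts[samples == s].sum(axis=0) for s in names])
    return names, table


names, table = pseudobulk_sketch(counts, samples)
```

Rows of table follow the gene order of counts, and columns follow names.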

stampede.pp.slide_qc(adata, slides, data_dir=None)

Use the fov_positions file to create a dataframe with metadata columns per slide and FOV, and store this in adata.uns[“fov_metadata”]. Additionally adds columns to adata.obs reflecting the distance from the cell to the camera’s FOV edge.

Parameters:
  • adata (AnnData) – adata object generated using the slides dict

  • slides (dict) – a dictionary with the slide number as keys and a dictionary as values. Each value dict must contain the keys “exprmat” and “metadata”, which should map to the respective matching files

  • data_dir (str) – optional filepath prefix (default: “”)

Return type:

None

Returns:

Nothing, updates adata.uns and adata.obs

stampede.pl

plotting functions

stampede.pl.avg_per_pixel(adata, column, fill_cell_area=False, normalize_cell_area=True, log1p=False, cmap=None, background_color=None, figsize=(20, 15), subplot_kwargs=None, plot_kwargs=None)

Plot the average values of the given column over all FOVs. Colors the cell’s center pixel, unless fill_cell_area is set to True (slow).

Parameters:
Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.column_distribution(adata, column, axis=None, min_quantile=0.0, max_quantile=0.95, subplot_kwargs=None, plot_kwargs=None)

Plot the distribution of values for a column present in either adata.obs or adata.var.

Parameters:
  • adata (AnnData) – an adata object.

  • column (str) – a column in either adata.obs or adata.var

  • axis (int) – only needed if the column name is present in both adata.obs and adata.var: use 0 for obs or 1 for var.

  • min_quantile (float) – lowest quantile of values to plot (default: 0.00)

  • max_quantile (float) – highest quantile of values to plot (default: 0.95)

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.correlations(adata, xcolumn, ycolumn, log1p_xcolumn=False, log1p_ycolumn=False, color_xcolumn=None, color_ycolumn=None, cmap_2d=None, bins_1d=50, bins_2d=None, stat=None, figsize=(8, 7), subplot_kwargs=None, plot_kwargs=None)

Plot the distributions and 2D correlation between two columns in adata.obs.

Parameters:
Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.dim_red(adata, columns, obsm_key=None, cmap='tab10', n_dims=6, subset_size=1000, random_state=42)

Scree plot

Parameters:
Return type:

list[tuple[Figure, Axes]]

Returns:

a list of tuples with matplotlib figure and axis

stampede.pl.ncell_per_condition(adata, columns, offset_between_conditions=1, palette=None, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)

Plot the number of cells per condition in a column in adata.obs.

Parameters:
Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.paired_binomial_glm_volcano(df, drop_perfect_separation=True, pval_thresh=0.05, or_thresh=0.75, to_label=5, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)

Generate a volcano plot from the detection_rates results dataframe.

Parameters:
  • df (DataFrame) – a dataframe

  • drop_perfect_separation (bool) – whether to drop the genes with perfect separations

  • pval_thresh (float) – p-value threshold for genes to be considered significant

  • or_thresh (float) – threshold for the log2 odds ratios to be considered significant

  • to_label (int | list) – the number of top genes (down and up each) to be labeled

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

  • text_kwargs (dict) – kwargs passed to ax.text

Return type:

tuple[Figure, Axes]

Returns:

matplotlib figure and axis object

stampede.pl.pydeseq2_volcano(df, symbol_column='index', log2fc_column='log2FoldChange', pvalue_column='padj', baseMean_column='baseMean', pval_thresh=0.05, log2fc_thresh=0.75, to_label=5, colors=None, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)

Generate a volcano plot from a pyDESeq2 results dataframe.

Adapted from mousepixels/sanbomics

Parameters:
  • df (DataFrame) – a pyDESeq2 results dataframe

  • symbol_column (str) – column name of gene IDs to use

  • log2fc_column (str) – column name of log2 Fold-Change values

  • pvalue_column (str) – column name of the adjusted p values to be converted to -log10 p-values

  • baseMean_column (str) – column name of base mean values for each gene

  • pval_thresh (float) – threshold on the values in pvalue_column for points to be considered significant

  • log2fc_thresh (float) – threshold for the absolute value of the log2 fold change to be considered significant

  • to_label (int | list) – If an int is passed, that number of top down and up genes will be labeled. If a list of gene IDs is passed, only those will be labeled

  • colors (list) – order and colors to use

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

  • text_kwargs (dict) – kwargs passed to ax.text

Return type:

tuple[Figure, Axes]

Returns:

matplotlib figure and axis object
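The two thresholds above jointly decide which points count as significant in a volcano plot. The sketch below classifies single genes using the default values from the signature. The function names are hypothetical, and whether the comparisons are strict or inclusive in the plotting function is not stated, so the choices here are assumptions.

```python
import math


def classify_gene(log2fc, padj, pval_thresh=0.05, log2fc_thresh=0.75):
    """Label a gene as 'up', 'down' or 'ns' (not significant)."""
    if padj < pval_thresh and log2fc >= log2fc_thresh:
        return "up"
    if padj < pval_thresh and log2fc <= -log2fc_thresh:
        return "down"
    return "ns"


def neg_log10(padj):
    """y-axis value of a volcano plot: smaller p-values plot higher."""
    return -math.log10(padj)


# Made-up (log2 fold change, adjusted p-value) pairs.
labels = [classify_gene(fc, p) for fc, p in [(1.2, 0.01), (-2.0, 0.001), (0.2, 0.001), (3.0, 0.2)]]
```

A gene needs both a small adjusted p-value and a large absolute fold change to be labeled; a large fold change with a weak p-value stays not significant.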

stampede.pl.scree(adata, obsm_key=None)

Scree plot

Parameters:
  • adata (AnnData) – adata object

  • obsm_key (str) – key in adata.obsm with dim_red output (default: “X_svd”)

Return type:

tuple[Figure, Axes]

Returns:

matplotlib figure and axis object

stampede.pl.sketch(adata, obs_column='subset', use_rep='X_svd', plot_kwargs=None)

Scatterplot highlighting the cells that were sampled. Requires the full adata object.

Parameters:
  • adata (AnnData) – adata object

  • obs_column (str) – column in adata.obs with boolean values if the cell is kept

  • use_rep (str) – use the indicated representation

  • plot_kwargs (dict) – kwargs passed to the main plotting function

Return type:

tuple[Figure, Axes]

Returns:

matplotlib figure and axis object

stampede.pl.slide_qc(adata, columns=None, figsize=None, subplot_kwargs=None, plot_kwargs=None)

Plot the values from one or more QC columns in adata.uns[“fov_metadata”] (added by stampede.pp.slide_qc()). Specify columns to limit the number of plots.

Parameters:
  • adata (AnnData) – an adata object

  • columns (str | Iterable) – columns in adata.uns[“fov_metadata”] to plot (default: all)

  • figsize (tuple) – figure size as a tuple; will be multiplied by the number of plots

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.value_distribution(adata, layer=None, min_quantile=0.0, max_quantile=0.95, subplot_kwargs=None, plot_kwargs=None)

Plot the number of occurrences of values in the dataset.

Parameters:
  • adata (AnnData) – an adata object.

  • layer (str) – the layer the values are drawn from (default: X)

  • min_quantile (float) – lowest quantile of values to plot (default: 0.00)

  • max_quantile (float) – highest quantile of values to plot (default: 0.95)

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.pl.violin(adata, columns, inner=None, fill=False, cut=0, log_scale=(False, True), subplot_kwargs=None, plot_kwargs=None)

Violin plots for one or more columns in adata.obs.

Wraps seaborn’s violinplot. See https://seaborn.pydata.org/generated/seaborn.violinplot.html

Parameters:
  • adata (AnnData) – an adata object

  • columns (str | list) – one or more columns in adata.obs

  • inner (str) – See sns.violinplot for more details.

  • fill (bool) – See sns.violinplot for more details.

  • cut (int) – See sns.violinplot for more details.

  • log_scale (tuple[bool, bool]) – See sns.violinplot for more details.

  • subplot_kwargs (dict) – kwargs passed to plt.subplots

  • plot_kwargs (dict) – kwargs passed to the main plotting function

Return type:

tuple[Figure, list[Axes]]

Returns:

matplotlib figure and array of axes

stampede.tl

analysis tools

stampede.tl.paired_binomial_glm(df, adata, samples_column, test_condition, reference_condition, condition_column='condition', covariate_columns=None, random_state=42)

Runs a paired donor-level binomial GLM:

gene_detection_rate ~ condition + covariate(s)

Parameters:
  • df (DataFrame) – dataframe with detection rates per gene per sample

  • adata (AnnData) – the adata from which the detection rates were obtained

  • samples_column (str) – the column in adata.obs from which the detection rate df column names were obtained

  • test_condition (str) – the condition to compare (e.g., “treated”)

  • reference_condition (str) – the baseline condition (e.g., “control”)

  • condition_column (str) – column with the condition

  • covariate_columns (str) – column(s) with covariates (e.g. “donor”)

  • random_state (int) – random seed value

Return type:

DataFrame | None

Returns:

per-gene results including beta, odds_ratio, pval, padj
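To relate the beta and odds_ratio outputs, the sketch below assumes the usual logit link of a binomial GLM, under which the condition coefficient is a log odds ratio. The detection rates are made up and no covariates are included, so this is a simplified illustration rather than the fitted model.

```python
import math


def odds(p):
    """Convert a detection rate (probability) into odds."""
    return p / (1.0 - p)


def log2_odds_ratio(p_test, p_ref):
    """log2 of the odds ratio between a test and a reference detection rate."""
    return math.log2(odds(p_test) / odds(p_ref))


# Made-up detection rates for one gene: 40% in the test and 20% in the reference condition.
beta = math.log(odds(0.4) / odds(0.2))  # coefficient on the natural-log odds scale
odds_ratio = math.exp(beta)             # back-transformed odds ratio
log2_or = log2_odds_ratio(0.4, 0.2)     # scale referred to by or_thresh in the volcano plot
```

Here the odds go from 0.25 to about 0.67, an odds ratio of about 2.67, which is larger than the ratio of the raw rates (2.0).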

stampede.tl.pydeseq2(adata, design, contrast, inference=None, n_cpus=16, return_objects=False, dds_kwargs=None, ds_kwargs=None)

Wrapper around pyDESeq2 for adata objects.

See https://pydeseq2.readthedocs.io/en/latest/auto_examples/plot_minimal_pydeseq2_pipeline.html

Parameters:
  • adata (AnnData) – adata object

  • design (str) – a formula in the format ‘x + z’ or ‘~x+z’. Each factor must be a column in adata.obs

  • contrast (list) – a list of three strings in the following format: [‘variable_of_interest’, ‘tested_level’, ‘ref_level’]

  • inference (Inference) – pyDESeq2 inference class instance

  • n_cpus (int) – number of threads to use

  • return_objects (bool) – return the DeseqDataSet, DeseqStats and the results_df. If False, only return the results_df

  • dds_kwargs (dict) – kwargs passed to DeseqDataSet

  • ds_kwargs (dict) – kwargs passed to DeseqStats

Return type:

tuple[DeseqDataSet, DeseqStats, DataFrame] | DataFrame

Returns:

pydeseq2 output
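The design and contrast arguments have to agree with each other: the variable of interest in the contrast should be one of the design factors. The sketch below shows both accepted design spellings and a simple consistency check. The helper functions and the column names “donor” and “condition” are hypothetical, and only additive terms joined by “+” are handled.

```python
def normalize_design(design):
    """Return the formula with a leading '~' and no surrounding whitespace."""
    formula = design.strip()
    if not formula.startswith("~"):
        formula = "~" + formula
    return formula


def design_factors(design):
    """Split an additive formula such as '~x+z' into its factor names."""
    return [term.strip() for term in normalize_design(design)[1:].split("+")]


# Hypothetical columns in adata.obs.
design = "donor + condition"
contrast = ["condition", "treated", "control"]  # [variable_of_interest, tested_level, ref_level]

factors = design_factors(design)
contrast_is_consistent = contrast[0] in factors
```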

stampede.tl.sketch(adata, n=None, frac=0.05, use_rep='X_svd', obs_column='subset', random_seed=42, return_subset=False, **kwargs)

Subset the cells in adata using GeoSketch.

Parameters:
  • adata (AnnData) – adata object

  • n (int) – the number of cells to keep. If None, frac will be used instead.

  • frac (float) – the fraction of cells to keep. Only used if n is None.

  • use_rep (str) – use the indicated representation.

  • obs_column (str) – add this column to adata.obs with boolean values if the cell is kept.

  • random_seed (int) – random seed passed to numpy.

  • return_subset (bool) – if True, return a subset adata object.

  • kwargs – kwargs passed to geosketch.gs.

Return type:

AnnData | None

Returns:

The subset anndata object (if specified)

Configuration

stampede.config

A dictionary with package specific settings that may be altered during runtime. Accessed using import stampede as st; st.config.

Keys may not be added or removed, but values may be changed.

Default config items:

{
    # columns found in the exprmat_file that represent metadata
    "exprmat_md_columns": ["fov", "cell_ID"],
    # columns found in the metadata_file that represent metadata
    "metadata_md_columns": ["fov", "cell_ID"],
    # columns found in the sample_file that represent metadata
    "sample_md_columns": ["sample", "slide", "fovs"],
    # directory to write (temporary) adata objects to
    "adata_dir": "adatas",
}
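The rule that values may change while keys stay fixed can be illustrated with a small dict subclass. This is a hypothetical sketch of the behavior, not stampede's implementation; it only guards item assignment and deletion, and leaves other mutating methods such as update or pop untouched.

```python
class FixedKeyDict(dict):
    """A dict whose values can be reassigned but whose key set is frozen."""

    def __setitem__(self, key, value):
        if key not in self:
            raise KeyError(f"unknown config key: {key!r}")
        super().__setitem__(key, value)

    def __delitem__(self, key):
        raise KeyError("config keys cannot be removed")


config = FixedKeyDict({"adata_dir": "adatas"})
config["adata_dir"] = "my_adatas"  # allowed: the key already exists

try:
    config["new_key"] = 1  # rejected: the key is not part of the defaults
except KeyError as err:
    rejected = str(err)
```

In stampede itself, the equivalent of the allowed assignment would be changing an existing entry of st.config at runtime.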