API reference#
stampede#
STAMPede - STAMP data Exploration and Differential Expression
- stampede.read_cosmx(slides, samples_df, adata_file, samples_df_columns=None, metadata_df_columns=None, data_dir=None, overwrite=True, verbose=True, **kwargs)#
Read exprMat_file for each slide, convert the contents to sparse anndata objects, and concatenate the results.
- Parameters:
slides (dict) – a dictionary with the slide number as keys and a dictionary as values. Each value dict must contain the keys “exprmat” and “metadata”, which should map to the respective files
samples_df (DataFrame) – a dataframe with sample metadata to be added to adata.obs
adata_file (str) – filepath to write the adata object to
samples_df_columns (list) – list of columns in samples_df to add to adata.obs (default: all)
metadata_df_columns (list) – list of columns in the metadata file to add to adata.obs (default: all)
data_dir (str) – optional filepath prefix (default: “”)
overwrite (bool) – overwrite existing output (default: True)
verbose (bool) – provide written feedback (default: True)
**kwargs – keyword arguments passed to pd.read_csv
- Return type:
- Returns:
the value of the adata_file argument
- stampede.validate_input(slides, samples_df, data_dir=None)#
Check the contents of the slides dictionary and samples_df for expected keys and columns, respectively.
- Parameters:
slides (dict) – a dictionary with the slide number as keys and a dictionary as values. Each value dict must contain the keys “exprmat” and “metadata”, which should map to the respective files
samples_df (DataFrame) – a dataframe with sample metadata
data_dir (str) – optional filepath prefix (default: “”)
- Return type:
- Returns:
Nothing
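The expected shape of the slides dictionary can be sketched in plain Python. The file names and the check_slides helper below are hypothetical; they only illustrate the key check described above and are not stampede's own code.

```python
# Hypothetical slides dictionary: slide number -> files for that slide.
slides = {
    1: {"exprmat": "slide1_exprMat_file.csv", "metadata": "slide1_metadata_file.csv"},
    2: {"exprmat": "slide2_exprMat_file.csv", "metadata": "slide2_metadata_file.csv"},
}

REQUIRED_KEYS = ("exprmat", "metadata")


def check_slides(slides):
    """Return a list of (slide, missing_key) pairs; empty if all keys are present."""
    problems = []
    for slide, files in slides.items():
        for key in REQUIRED_KEYS:
            if key not in files:
                problems.append((slide, key))
    return problems


missing = check_slides(slides)
```

An empty result means every slide provides both required keys.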
stampede.pp#
preprocessing functions
- stampede.pp.binarize(adata, verbose=True)#
Binarize the values in adata.X
- stampede.pp.cell_qc_postfilter(adata)#
Compute metadata after filtering
- stampede.pp.detection_rates(adata, samples_column, normalize=True)#
Calculate gene detection rates per sample in the samples_column of adata.obs.
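As a rough illustration of binarizing counts and computing detection rates, the plain-Python sketch below assumes that a detection rate is the fraction of a sample's cells in which a gene has at least one count. That definition, the toy data, and the toy_detection_rates helper are assumptions; this reference does not state how stampede computes or normalizes the rates.

```python
from collections import defaultdict

# Toy counts: one entry per cell as (sample, {gene: count}). All values are made up.
cells = [
    ("s1", {"GeneA": 3, "GeneB": 0}),
    ("s1", {"GeneA": 0, "GeneB": 0}),
    ("s2", {"GeneA": 1, "GeneB": 2}),
    ("s2", {"GeneA": 5, "GeneB": 0}),
]


def toy_detection_rates(cells):
    """Fraction of cells per sample in which each gene has a nonzero count."""
    detected = defaultdict(lambda: defaultdict(int))
    n_cells = defaultdict(int)
    for sample, counts in cells:
        n_cells[sample] += 1
        for gene, count in counts.items():
            # Binarize: any positive count is one detection.
            detected[gene][sample] += int(count > 0)
    return {
        gene: {sample: hits / n_cells[sample] for sample, hits in per_sample.items()}
        for gene, per_sample in detected.items()
    }


rates = toy_detection_rates(cells)
```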
- stampede.pp.dim_red(adata, n_dims=50, key_added=None, random_state=42)#
Dimensionality reduction using Term Frequency Latent Semantic Indexing.
- stampede.pp.filter_cells(adata, dist2edge_px_min=0, falsecode_max=5, negprobe_max=3, ntranscript_min=250, ntranscript_max=1500, area_min=25, area_max=100, filter_columns=None, verbose=True)#
Filter adata.obs by a set of qc_params.
- Parameters:
adata (AnnData) – adata object
dist2edge_px_min (int)
falsecode_max (int) – maximum number of false codes the cell may have
negprobe_max (int) – maximum number of negative probes the cell may have
ntranscript_min (int) – minimum number of transcripts the cell must have
ntranscript_max (int) – maximum number of transcripts the cell may have
area_min (int) – minimum area (in pixels) the cell must have
area_max (int) – maximum area (in pixels) the cell may have
filter_columns (list) – a list of additional columns to filter by. Columns must be (convertible to) boolean; cells with False values are removed.
verbose (bool) – provide written feedback (default: True)
- Return type:
- Returns:
the filtered adata object
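The threshold logic can be sketched in plain Python. The default values come from the signature above; the metric names ("n_falsecode" and so on), the inclusive bounds, and the passes_qc helper are assumptions made for illustration, and the dist2edge_px_min check is omitted.

```python
# Default thresholds taken from the filter_cells signature.
QC_DEFAULTS = {
    "falsecode_max": 5,
    "negprobe_max": 3,
    "ntranscript_min": 250,
    "ntranscript_max": 1500,
    "area_min": 25,
    "area_max": 100,
}


def passes_qc(cell, params=QC_DEFAULTS):
    """Return True if a cell (a dict of per-cell metrics) satisfies every threshold.

    The metric names are hypothetical, and treating the bounds as inclusive
    is an assumption of this sketch.
    """
    return (
        cell["n_falsecode"] <= params["falsecode_max"]
        and cell["n_negprobe"] <= params["negprobe_max"]
        and params["ntranscript_min"] <= cell["n_transcripts"] <= params["ntranscript_max"]
        and params["area_min"] <= cell["area"] <= params["area_max"]
    )


cells = [
    {"id": "c1", "n_falsecode": 0, "n_negprobe": 1, "n_transcripts": 600, "area": 60},
    {"id": "c2", "n_falsecode": 0, "n_negprobe": 0, "n_transcripts": 120, "area": 60},  # too few transcripts
    {"id": "c3", "n_falsecode": 7, "n_negprobe": 0, "n_transcripts": 900, "area": 80},  # too many false codes
]
kept = [cell["id"] for cell in cells if passes_qc(cell)]
```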
- stampede.pp.filter_genes(adata, ncell_min=0, ncell_max=inf, ntranscript_min=0, ntranscript_max=inf, signal2noise_threshold=1.0, filter_columns=None, verbose=True)#
Filter adata.var by a set of qc_params.
- Parameters:
adata (AnnData) – adata object
ncell_min (int) – minimum number of cells the gene must be found in.
ncell_max (int) – maximum number of cells the gene may be found in.
ntranscript_min (int) – minimum number of transcripts the gene must have.
ntranscript_max (int) – maximum number of transcripts the gene may have.
signal2noise_threshold (float) – the minimum signal-to-noise ratio the gene must have.
filter_columns (str | list) – a list of additional columns to filter by. Columns must be (convertible to) boolean; genes with False values are removed.
verbose (bool) – provide written feedback (default: True)
- Return type:
- Returns:
the filtered adata object
- stampede.pp.gene_qc(adata, signal2noise_threshold=None, mult=1, overwrite=False)#
Add QC parameters to adata.var.
- About the Signal-to-noise filter:
Approach from https://doi.org/10.1038/s41467-025-64990-y Wang et al. “Systematic benchmarking of imaging spatial transcriptomics platforms in FFPE tissues” Nat Com, 2025.
Calculate the mean expression and standard deviation (STD) of the negative control probes. Remove genes with average expression < mean + mult × STD of the control probes (the paper used mult=2).
- Parameters:
adata (AnnData) – an adata object
signal2noise_threshold (float | Iterable) – manually specify the threshold. If None, use the filter described above.
mult (int | float) – if signal2noise_threshold is None, mult is used in the signal-to-noise threshold computation described above.
overwrite (bool) – overwrite existing qc columns (default: False)
- Return type:
- Returns:
Nothing, updates adata.var
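The signal-to-noise threshold described above can be sketched with the standard library. All values and names below are made up for illustration, and using the sample standard deviation is an assumption; this is not stampede's implementation.

```python
import statistics

# Made-up mean expression values; the gene names are hypothetical.
negprobe_means = [0.01, 0.02, 0.03]
gene_means = {"GeneA": 0.5, "GeneB": 0.03, "GeneC": 0.041}


def signal2noise_threshold(negprobe_means, mult=1):
    """Threshold = mean + mult * STD of the negative control probes."""
    mean = statistics.mean(negprobe_means)
    # Sample standard deviation; whether stampede uses the sample or the
    # population estimate is not stated in this reference.
    std = statistics.stdev(negprobe_means)
    return mean + mult * std


threshold = signal2noise_threshold(negprobe_means, mult=2)  # mult=2 as in the paper
passing = [gene for gene, mean in gene_means.items() if mean >= threshold]
```

With these toy values the threshold is roughly 0.04, so GeneB falls below it and is dropped.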
- stampede.pp.gene_qc_postfilter(adata)#
Compute metadata after filtering
- stampede.pp.knn_count_smoothing(adata, layer_added=None, neighbors_use_rep=None, neighbors_key_added=None, neighbors_kwargs=None, verbose=True)#
For each cell, replace its gene vector with the average of its KNN neighborhood.
Runs sc.pp.neighbors if it has not run. See https://scanpy.readthedocs.io/en/stable/api/generated/scanpy.pp.neighbors.html
- Parameters:
adata (AnnData) – adata object
layer_added (str) – key in adata.layers for function output (default: “KNN_binary_mean”)
neighbors_use_rep (str) – See sc.pp.neighbors for details
neighbors_key_added (str) – See sc.pp.neighbors for details
neighbors_kwargs (dict) – kwargs passed to sc.pp.neighbors
verbose (bool) – provide written feedback (default: True)
- Return type:
- Returns:
Nothing, updates adata.layers and adata.X
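The smoothing idea, replacing each cell's gene vector with the mean over its neighborhood, can be sketched on a tiny matrix in plain Python. The knn_mean_smooth helper and the neighbor lists are hypothetical, and counting the cell itself as part of its neighborhood is an assumption.

```python
def knn_mean_smooth(matrix, neighbors):
    """Replace each row with the mean of the rows in its neighborhood.

    matrix: list of rows (cells x genes); neighbors: list of index lists,
    one per cell. Including the cell itself in its neighborhood is an
    assumption of this sketch.
    """
    smoothed = []
    for cell, nbrs in enumerate(neighbors):
        members = [cell] + [n for n in nbrs if n != cell]
        n_genes = len(matrix[cell])
        smoothed.append(
            [sum(matrix[m][g] for m in members) / len(members) for g in range(n_genes)]
        )
    return smoothed


binary = [[1, 0], [0, 0], [1, 1]]   # binarized counts, 3 cells x 2 genes
neighbors = [[1], [0], [0]]         # one nearest neighbor per cell
smoothed = knn_mean_smooth(binary, neighbors)
```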
- stampede.pp.pseudobulk(adata, samples_column, samples=None, cluster_column=None, cluster=None, layer=None)#
Generate a pseudobulk table (genes x samples) for all samples in the samples_column and, if specified, for the cluster in the cluster_column.
- Parameters:
adata (AnnData) – adata object
samples_column (str) – column in adata.obs
samples (Iterable) – samples in the samples_column to use (default: all)
cluster_column (str) – column in adata.obs (only needed if cluster is specified)
cluster (str) – name of the cluster in cluster_column to aggregate to pseudobulk
layer (str) – layer to aggregate (default: “counts”)
- Return type:
- Returns:
a dataframe with summed layer values per sample
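The aggregation can be illustrated without any dependencies. The toy_pseudobulk helper and the cell records below are hypothetical; the sketch only shows summing counts per gene and sample, with an optional restriction to one cluster.

```python
from collections import defaultdict


def toy_pseudobulk(cells, cluster=None):
    """Sum counts per gene and sample, optionally restricted to one cluster.

    cells: list of dicts with hypothetical keys "sample", "cluster", "counts".
    Returns {gene: {sample: summed count}}, i.e. a genes x samples table.
    """
    table = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        if cluster is not None and cell["cluster"] != cluster:
            continue
        for gene, count in cell["counts"].items():
            table[gene][cell["sample"]] += count
    return {gene: dict(per_sample) for gene, per_sample in table.items()}


cells = [
    {"sample": "s1", "cluster": "T", "counts": {"GeneA": 2, "GeneB": 1}},
    {"sample": "s1", "cluster": "B", "counts": {"GeneA": 4, "GeneB": 0}},
    {"sample": "s2", "cluster": "T", "counts": {"GeneA": 1, "GeneB": 3}},
]
all_cells = toy_pseudobulk(cells)
t_only = toy_pseudobulk(cells, cluster="T")
```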
- stampede.pp.slide_qc(adata, slides, data_dir=None)#
Use the fov_positions file to create a dataframe with metadata columns per slide and FOV, and store it in adata.uns[“fov_metadata”]. Additionally adds columns to adata.obs reflecting the distance from the cell to the camera’s FOV edge.
- Parameters:
adata (AnnData) – adata object generated using the slides dict
slides (dict) – a dictionary with the slide number as keys and a dictionary as values. Each value dict must contain the keys “exprmat” and “metadata”, which should map to the respective files
data_dir (str) – optional filepath prefix (default: “”)
- Return type:
- Returns:
Nothing, updates adata.uns and adata.obs
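One way to think about the distance from a cell to the FOV edge is the smallest distance to any of the four borders of a rectangular FOV. The helper and the FOV dimensions below are hypothetical; how stampede derives the distance from the fov_positions file is not described here.

```python
def distance_to_fov_edge(x, y, fov_width, fov_height):
    """Smallest distance (in pixels) from a point to the border of its FOV.

    Assumes local pixel coordinates with the origin in a corner of a
    rectangular FOV; both are assumptions of this sketch.
    """
    return min(x, y, fov_width - x, fov_height - y)


# Hypothetical FOV of 4000 x 4000 pixels.
near_left = distance_to_fov_edge(10, 200, 4000, 4000)
near_right = distance_to_fov_edge(3990, 2000, 4000, 4000)
centre = distance_to_fov_edge(2000, 2000, 4000, 4000)
```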
stampede.pl#
plotting functions
- stampede.pl.avg_per_pixel(adata, column, fill_cell_area=False, normalize_cell_area=True, log1p=False, cmap=None, background_color=None, figsize=(20, 15), subplot_kwargs=None, plot_kwargs=None)#
Plot the average values of the given column over all FOVs. Colors the cell’s center pixel, unless fill_cell_area is set to True (slow).
- Parameters:
adata (AnnData) – an adata object
column (str) – a column in adata.obs with numeric values
fill_cell_area (bool) – distribute the column value over all pixels covered by the cell, assuming square cells (default: False)
normalize_cell_area (bool) – if fill_cell_area is True, normalize the column value over the cell area (default: True)
log1p (bool) – apply a log1p transform to the final values per pixel (default: False)
cmap (tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]) – colormap (default: “gist_rainbow”)
background_color (tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]) – color for pixels with 0 values (default: “black”)
figsize (tuple) – figure size
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.column_distribution(adata, column, axis=None, min_quantile=0.0, max_quantile=0.95, subplot_kwargs=None, plot_kwargs=None)#
Plot the distribution of values for a column present in either adata.obs or adata.var.
- Parameters:
adata (AnnData) – an adata object.
column (str) – a column in either adata.obs or adata.var
axis (int) – specify if the column name is present in both obs (0) and var (1).
min_quantile (float) – lowest quantile of values to plot (default: 0.00)
max_quantile (float) – highest quantile of values to plot (default: 0.95)
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.correlations(adata, xcolumn, ycolumn, log1p_xcolumn=False, log1p_ycolumn=False, color_xcolumn=None, color_ycolumn=None, cmap_2d=None, bins_1d=50, bins_2d=None, stat=None, figsize=(8, 7), subplot_kwargs=None, plot_kwargs=None)#
Plot the distributions and 2D correlation between two columns in adata.obs.
- Parameters:
adata (AnnData) – an adata object
xcolumn (str) – column in adata.obs to plot on the x-axis
ycolumn (str) – column in adata.obs to plot on the y-axis
log1p_xcolumn (bool) – apply a log1p transform to the xcolumn (default: False)
log1p_ycolumn (bool) – apply a log1p transform to the ycolumn (default: False)
color_xcolumn (tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]) – color of the xcolumn plot
color_ycolumn (tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]) – color of the ycolumn plot
cmap_2d (tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]) – colormap of the 2d correlation plot (default: “Blues”)
bins_1d (str | int) – number of bins on the 1-dimensional histogram plots
bins_2d (str | int) – number of bins on the 2-dimensional histogram plot
stat (str) – which statistic to plot, see sns.histplot for more details (default: “percent”)
figsize (tuple) – figure size
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.dim_red(adata, columns, obsm_key=None, cmap='tab10', n_dims=6, subset_size=1000, random_state=42)#
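Plot the leading components of the dimensionality reduction stored in adata.obsm, with one multiplot per column in adata.obs.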
- Parameters:
adata (AnnData) – adata object
columns (str | Iterable) – one or more columns in adata.obs to plot. One multiplot per column.
obsm_key (str) – key in adata.obsm with dim_red output (default: “X_svd”)
cmap (tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]) – colormap
n_dims (int) – number of PCs to use (default: 6)
subset_size (int) – subsample the data to this number (per column)
random_state (int) – random seed value
- Return type:
- Returns:
a list of tuples with matplotlib figure and axis
- stampede.pl.ncell_per_condition(adata, columns, offset_between_conditions=1, palette=None, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)#
Plot the number of cells per condition in a column in adata.obs.
- Parameters:
adata (AnnData) – an adata object
columns (str | list) – one or more columns in adata.obs to visualize, in order of significance
offset_between_conditions (int | list) – distance between different conditions. Can be a single value, or a list of offset values for each column (length=len(columns)-1)
palette (tuple[float, float, float] | str | tuple[float, float, float, float] | tuple[tuple[float, float, float] | str, float] | tuple[tuple[float, float, float, float], float]) – color palette (default: “terrain”)
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
text_kwargs (dict) – kwargs passed to ax.set_xticks and ax.set_yticks
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.paired_binomial_glm_volcano(df, drop_perfect_separation=True, pval_thresh=0.05, or_thresh=0.75, to_label=5, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)#
Generate a volcano plot from the detection_rates results dataframe.
- Parameters:
df (DataFrame) – a dataframe
drop_perfect_separation (bool) – whether to drop the genes with perfect separation
pval_thresh (float) – p-value threshold for genes to be considered significant
or_thresh (float) – threshold for the log2 odds ratios to be considered significant
to_label (int | list) – the number of top genes (down and up each) to be labeled
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
text_kwargs (dict) – kwargs passed to ax.text
- Return type:
- Returns:
matplotlib figure and axis object
- stampede.pl.pydeseq2_volcano(df, symbol_column='index', log2fc_column='log2FoldChange', pvalue_column='padj', baseMean_column='baseMean', pval_thresh=0.05, log2fc_thresh=0.75, to_label=5, colors=None, subplot_kwargs=None, plot_kwargs=None, text_kwargs=None)#
Generate a volcano plot from a pyDESeq2 results dataframe.
Adapted from mousepixels/sanbomics
- Parameters:
df (DataFrame) – a pyDESeq2 results dataframe
symbol_column (str) – column name of gene IDs to use
log2fc_column (str) – column name of log2 fold-change values
pvalue_column (str) – column name of the adjusted p-values to be converted to -log10 p-values
baseMean_column (str) – column name of base mean values for each gene
pval_thresh (float) – threshold on pvalue_column for points to be considered significant
log2fc_thresh (float) – threshold for the absolute value of the log2 fold change to be considered significant
to_label (int | list) – if an int is passed, that number of top down and up genes will be labeled. If a list of gene IDs is passed, only those will be labeled
colors (list) – order and colors to use
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
text_kwargs (dict) – kwargs passed to ax.text
- Return type:
- Returns:
matplotlib figure and axis object
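How the two thresholds partition the points of a volcano plot can be sketched as follows. The default values come from the signature above; the classify helper, the toy results, the "up"/"down"/"ns" labels, and the use of strict inequalities are assumptions for illustration.

```python
import math


def classify(log2fc, padj, pval_thresh=0.05, log2fc_thresh=0.75):
    """Label one gene as "up", "down" or "ns" (not significant)."""
    if padj < pval_thresh and abs(log2fc) > log2fc_thresh:
        return "up" if log2fc > 0 else "down"
    return "ns"


# Made-up results: gene -> (log2 fold change, adjusted p-value).
results = {
    "GeneA": (1.8, 0.001),
    "GeneB": (-1.2, 0.01),
    "GeneC": (0.3, 0.0001),  # significant p-value, small effect
    "GeneD": (2.5, 0.2),     # large effect, not significant
}
labels = {gene: classify(lfc, padj) for gene, (lfc, padj) in results.items()}
# y-axis values of the volcano plot: -log10 of the adjusted p-value
neg_log10 = {gene: -math.log10(padj) for gene, (_, padj) in results.items()}
```

Only genes that pass both thresholds are marked; GeneC and GeneD each fail one of them.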
- stampede.pl.scree(adata, obsm_key=None)#
Scree plot
- stampede.pl.sketch(adata, obs_column='subset', use_rep='X_svd', plot_kwargs=None)#
Scatterplot highlighting the cells that were sampled. Requires the full adata object.
- Parameters:
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.slide_qc(adata, columns=None, figsize=None, subplot_kwargs=None, plot_kwargs=None)#
Plot the values from one or more QC columns in adata.uns[“fov_metadata”] (added by stampede.pp.slide_qc). Specify columns to limit the number of plots.
- Parameters:
adata (AnnData) – an adata object
columns (str | Iterable) – columns in adata.uns[“fov_metadata”] to plot (default: all)
figsize (tuple) – figure size; will be multiplied by the number of plots
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.value_distribution(adata, layer=None, min_quantile=0.0, max_quantile=0.95, subplot_kwargs=None, plot_kwargs=None)#
Plot the number of occurrences of values in the dataset.
- Parameters:
adata (AnnData) – an adata object.
layer (str) – the layer the values are drawn from (default: X)
min_quantile (float) – lowest quantile of values to plot (default: 0.00)
max_quantile (float) – highest quantile of values to plot (default: 0.95)
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
- stampede.pl.violin(adata, columns, inner=None, fill=False, cut=0, log_scale=(False, True), subplot_kwargs=None, plot_kwargs=None)#
Violin plots for one or more columns in adata.obs.
Wraps seaborn’s violinplot. See https://seaborn.pydata.org/generated/seaborn.violinplot.html
- Parameters:
adata (AnnData) – an adata object
inner (str) – See sns.violinplot for more details.
fill (bool) – See sns.violinplot for more details.
cut (int) – See sns.violinplot for more details.
log_scale (tuple[bool, bool]) – See sns.violinplot for more details.
subplot_kwargs (dict) – kwargs passed to plt.subplots
plot_kwargs (dict) – kwargs passed to the main plotting function
- Return type:
- Returns:
matplotlib figure and array of axes
stampede.tl#
analysis tools
- stampede.tl.paired_binomial_glm(df, adata, samples_column, test_condition, reference_condition, condition_column='condition', covariate_columns=None, random_state=42)#
- Runs paired donor-level binomial GLM:
gene_detection_rate ~ condition + covariate(s)
- Parameters:
df (DataFrame) – dataframe with detection rates per gene per sample
adata (AnnData) – the adata from which the detection rates were obtained
samples_column (str) – the column in adata.obs from which the detection rate df column names were obtained
test_condition (str) – the condition to compare (e.g., “treated”)
reference_condition (str) – the baseline condition (e.g., “control”)
condition_column (str) – column with the condition
covariate_columns (str) – column(s) with covariates (e.g., “donor”)
random_state (int) – random seed value
- Return type:
- Returns:
per-gene results including beta, odds_ratio, pval, padj
- stampede.tl.pydeseq2(adata, design, contrast, inference=None, n_cpus=16, return_objects=False, dds_kwargs=None, ds_kwargs=None)#
Wrapper around pyDESeq2 for adata objects.
See https://pydeseq2.readthedocs.io/en/latest/auto_examples/plot_minimal_pydeseq2_pipeline.html
- Parameters:
adata (AnnData) – adata object
design (str) – a formula in the format ‘x + z’ or ‘~x+z’. Each factor must be a column in adata.obs
contrast (list) – a list of three strings in the following format: [‘variable_of_interest’, ‘tested_level’, ‘ref_level’]
inference (Inference) – pyDESeq2 inference class instance
n_cpus (int) – number of threads to use
return_objects (bool) – return the DeseqDataSet, DeseqStats and the results_df. If False, only return the results_df
dds_kwargs (dict) – kwargs passed to DeseqDataSet
ds_kwargs (dict) – kwargs passed to DeseqStats
- Return type:
- Returns:
pydeseq2 output
- stampede.tl.sketch(adata, n=None, frac=0.05, use_rep='X_svd', obs_column='subset', random_seed=42, return_subset=False, **kwargs)#
Subset the cells in adata using GeoSketch.
- Parameters:
adata (AnnData) – adata object
n (int) – the number of cells to keep. If None, frac will be used instead.
frac (float) – the fraction of cells to keep. Only used if n is None.
use_rep (str) – use the indicated representation.
obs_column (str) – add this column to adata.obs with boolean values indicating whether the cell is kept.
random_seed (int) – random seed passed to numpy.
return_subset (bool) – if True, return a subset adata object.
kwargs – kwargs passed to geosketch.gs.
- Return type:
- Returns:
The subset anndata object (if specified)
Configuration#
- stampede.config#
A dictionary with package specific settings that may be altered during runtime. Accessed using import stampede as st; st.config.
Keys may not be added or removed, but values may be changed.
Default config items:
{
# columns found in the exprmat_file that represent metadata
"exprmat_md_columns": ["fov", "cell_ID"],
# columns found in the metadata_file that represent metadata
"metadata_md_columns": ["fov", "cell_ID"],
# columns found in the sample_file that represent metadata
"sample_md_columns": ["sample", "slide", "fovs"],
# directory to write (temporary) adata objects to
"adata_dir": "adatas",
}
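The "values may change, keys may not" behaviour can be approximated with a small dict subclass. This is a minimal sketch under that assumption, not how stampede implements its config; the FixedKeyDict class is hypothetical and does not guard methods such as update or pop.

```python
class FixedKeyDict(dict):
    """A dict whose values can be changed but whose keys are fixed."""

    def __setitem__(self, key, value):
        if key not in self:
            raise KeyError(f"unknown config key: {key!r}")
        super().__setitem__(key, value)

    def __delitem__(self, key):
        raise KeyError("config keys cannot be removed")


config = FixedKeyDict({"adata_dir": "adatas"})
config["adata_dir"] = "results/adatas"  # allowed: the key already exists
try:
    config["new_key"] = 1               # rejected: the key is not among the defaults
    added = True
except KeyError:
    added = False
```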