
Output Files Reference

This page describes all output files generated by Stage H (clustering and annotation) and Stage I (refinement).

Stage H Outputs

Stage H produces the initial clustering and hierarchical cell type annotation. All outputs are written to the specified output directory.

coarse_clusters.h5ad

The primary output AnnData file containing clustered and annotated cells.

Added .obs columns:

Column | Type | Description
cluster_lvl0 | string | Leiden cluster ID
cell_type_auto | string | Assigned cell type label
cell_type_auto_root | string | Root category (Epithelium, Immune, etc.)
cell_type_auto_conf | float | Confidence score (min margin along path)
cell_type_auto_path | string | Full hierarchy path
DAPI_intensity | float | Technical marker intensity
Collagen_IV_intensity | float | Technical marker intensity
Beta_actin_intensity | float | Technical marker intensity

Added .uns keys:

Key | Description
de_wilcoxon_{layer} | Differential expression results per layer
stage_h_params | Run parameters including resolution and thresholds
stage_h_gating_params | Gating thresholds used for marker positivity
stage_h_marker_map | Compiled marker map used for annotation
stage_h_timestamp | ISO timestamp of when Stage H was run
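
The snippet below is a minimal sketch of how a downstream script might confirm that a loaded file carries the annotation columns listed above. It assumes the `anndata` package is installed. The names `STAGE_H_OBS_COLUMNS`, `missing_obs_columns`, and `load_stage_h` are illustrative and not part of the pipeline. The technical marker intensity columns are left out of the check on the assumption that they depend on the panel.

```python
# Annotation columns that Stage H is documented to add to .obs.
STAGE_H_OBS_COLUMNS = [
    "cluster_lvl0",
    "cell_type_auto",
    "cell_type_auto_root",
    "cell_type_auto_conf",
    "cell_type_auto_path",
]


def missing_obs_columns(obs_columns, expected=STAGE_H_OBS_COLUMNS):
    """Return the expected column names that are absent, in documented order."""
    present = set(obs_columns)
    return [name for name in expected if name not in present]


def load_stage_h(path):
    """Read coarse_clusters.h5ad and fail early if annotation columns are missing."""
    import anndata  # imported lazily so the helper above has no dependencies

    adata = anndata.read_h5ad(path)
    missing = missing_obs_columns(adata.obs.columns)
    if missing:
        raise ValueError(f"missing Stage H columns: {missing}")
    return adata
```

A call such as `load_stage_h("coarse_clusters.h5ad")` would then return the AnnData object or raise with the list of absent columns.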

cluster_annotations.csv

Per-cluster summary with assignment details. This file provides a human-readable overview of all cluster assignments.

Column | Type | Description
cluster_id | string | Cluster identifier
assigned_label | string | Final cell type label
assigned_path | string | Full hierarchy path (e.g., "Immune Cells / Lymphoids / T Cells")
assigned_level | int | Hierarchy depth reached (0 = root)
assigned_score | float | Score at assigned level
root_label | string | Root category
confidence | float | Min margin along path
stop_reason | string | Why descent stopped
n_cells | int | Number of cells in cluster
coverage | float | Marker coverage (0-1)
resolved_markers | string | Comma-separated list of markers used
is_ambiguous_root | bool | True if root was ambiguous
composition | JSON | Per-cell vote composition (for ambiguous clusters)

Stop reason values:

Value | Meaning
leaf_reached | Reached a terminal node in hierarchy
no_children_pass | No child types passed gating threshold
low_margin | Winner margin below confidence threshold
insufficient_coverage | Too few markers available at next level
max_depth | Reached configured maximum depth
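
As a rough illustration of how the stop reasons can be used for triage, the sketch below tallies `stop_reason` values and lists clusters that stopped before reaching a leaf. The sample rows are invented for the example and carry only a subset of the documented columns. The function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Illustrative rows only (not real pipeline output).
SAMPLE = """cluster_id,assigned_label,stop_reason,confidence,n_cells
0,T Cells,leaf_reached,0.42,1200
1,Immune Cells,low_margin,0.05,300
2,Epithelium,no_children_pass,0.31,950
3,B Cells,leaf_reached,0.27,410
"""


def stop_reason_counts(handle):
    """Count clusters per stop_reason value."""
    return Counter(row["stop_reason"] for row in csv.DictReader(handle))


def stopped_early(handle):
    """Return (cluster_id, stop_reason) for clusters that did not reach a leaf."""
    return [
        (row["cluster_id"], row["stop_reason"])
        for row in csv.DictReader(handle)
        if row["stop_reason"] != "leaf_reached"
    ]


counts = stop_reason_counts(io.StringIO(SAMPLE))
early = stopped_early(io.StringIO(SAMPLE))
```

With a real run, the `io.StringIO(SAMPLE)` handle would be replaced by `open("cluster_annotations.csv", newline="", encoding="utf-8")`.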

marker_scores.csv

Full scoring matrix showing scores for all (cluster, cell_type) pairs. Useful for understanding why specific assignments were made.

Column | Type | Description
cluster_id | string | Cluster identifier
label | string | Cell type label
path | string | Hierarchy path
level | int | Hierarchy level
score | float | Total score
coverage | float | Marker coverage
mean_enrichment | float | Average Z-score
mean_positive_fraction | float | Average positive fraction
frac_markers_on | float | Fraction of markers above threshold
de_component | float | DE bonus contribution
anti_penalty | float | Anti-marker penalty
resolved_markers | string | Markers used
missing_markers | string | Markers not in panel
n_cells | int | Cluster size
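
One way to see why an assignment was made is to rank the candidate labels for each cluster by `score`. The following sketch does this with the standard library. The rows and some cell type labels are made up for illustration, `top_candidates` is a hypothetical helper, and the comparison assumes that scores at the same hierarchy level are directly comparable.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only; a subset of the documented columns, all at one level.
SAMPLE = """cluster_id,label,level,score
0,T Cells,2,1.8
0,B Cells,2,0.6
0,NK Cells,2,1.1
1,Macrophages,2,0.9
1,Dendritic Cells,2,1.4
"""


def top_candidates(handle, k=2):
    """Return the k highest-scoring labels for each cluster, best first."""
    by_cluster = defaultdict(list)
    for row in csv.DictReader(handle):
        by_cluster[row["cluster_id"]].append((float(row["score"]), row["label"]))
    return {
        cluster_id: [
            label
            for _, label in sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
        ]
        for cluster_id, pairs in by_cluster.items()
    }


ranked = top_candidates(io.StringIO(SAMPLE))
```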

decision_steps.csv

Step-by-step trace of hierarchical assignment decisions. Essential for debugging and understanding the decision process.

Column | Type | Description
cluster_id | string | Cluster identifier
step_idx | int | Step number (0 = root selection)
parent_label | string | Parent node
child_label | string | Candidate being evaluated
child_passed_gate | bool | Whether candidate passed gating
child_score | float | Candidate's score
child_coverage | float | Candidate's coverage
child_pos_frac | float | Candidate's positive fraction
child_enrichment | float | Candidate's enrichment
child_anti_penalty | float | Candidate's anti-penalty
selected | bool | Whether this candidate was selected
margin_to_runner_up | float | Score gap to second place
fail_reason | string | Why candidate failed (if applicable)

Fail reason values:

Value | Meaning
below_gate | Score below gating threshold
low_coverage | Insufficient marker coverage
anti_markers | Penalized by anti-marker expression
outscored | Another candidate had higher score
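
To make the trace concrete, here is a small sketch that prints the decision path for one cluster from decision_steps.csv. Several details are assumptions: booleans are taken to be serialized as the text `True`/`False`, the root parent is shown as `ROOT`, and the sample rows (including the `Myeloids` label) are invented. The helper name `trace` is hypothetical.

```python
import csv
import io

# Illustrative rows only; a subset of the documented columns.
SAMPLE = """cluster_id,step_idx,parent_label,child_label,selected,fail_reason
5,0,ROOT,Immune Cells,True,
5,0,ROOT,Epithelium,False,below_gate
5,1,Immune Cells,Lymphoids,True,
5,1,Immune Cells,Myeloids,False,outscored
"""


def trace(handle, cluster_id):
    """Return one readable line per candidate evaluated for the given cluster."""
    rows = [r for r in csv.DictReader(handle) if r["cluster_id"] == cluster_id]
    rows.sort(key=lambda r: int(r["step_idx"]))  # stable: keeps order within a step
    lines = []
    for r in rows:
        if r["selected"].strip().lower() == "true":
            outcome = "selected"
        else:
            outcome = f"rejected ({r['fail_reason']})"
        lines.append(
            f"step {r['step_idx']}: {r['parent_label']} -> {r['child_label']}: {outcome}"
        )
    return lines


steps = trace(io.StringIO(SAMPLE), "5")
```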

figures/

QC visualization plots for assessing annotation quality:

File | Description
cluster_size_distribution.png | Histogram of cluster sizes
confidence_distribution.png | Histogram of confidence scores
depth_distribution.png | Histogram of hierarchy depths reached
stop_reason_summary.png | Bar chart of stop reasons
root_composition.png | Pie chart of root category distribution
marker_coverage_heatmap.png | Heatmap of marker coverage by cluster

Stage I Outputs

Stage I performs iterative refinement of the initial annotations, including subclustering of heterogeneous clusters and relabeling of misannotated ones.

refined.h5ad

Refined AnnData with updated annotations after one or more refinement iterations.

Added/modified .obs columns:

Column | Type | Description
cluster_lvl1 | string | Refined cluster ID (may include subclusters like "3:0", "3:1")
cell_type_lvl1 | string | Refined cell type label
cell_type_lvl0 | string | Original Stage H label (preserved)
refinement_action | string | Action taken (SUBCLUSTER, RELABEL, KEEP)
refinement_iteration | int | Iteration when this cell was last modified
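
Because refined IDs such as "3:0" embed the parent cluster, they can be mapped back to Stage H clusters with a string split. The sketch below assumes that the text before the first colon is always the original cluster ID. The helper name and the sample IDs are illustrative.

```python
from collections import Counter


def parent_cluster(cluster_lvl1: str) -> str:
    """Map a refined ID such as "3:1" to its parent "3"; plain IDs map to themselves."""
    return cluster_lvl1.split(":", 1)[0]


# Stand-in for the values of the cluster_lvl1 column, one entry per cell.
refined_ids = ["3:0", "3:1", "5", "3:0", "12:2"]
cells_per_parent = Counter(parent_cluster(c) for c in refined_ids)
```

Under the same assumption, a nested ID like "3:0:1" would also resolve to "3".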

Added .uns keys:

Key | Description
stage_refine_iteration_N | Provenance for iteration N
stage_i_params | Refinement parameters used
stage_i_summary | Summary statistics of refinement

Provenance structure (stage_refine_iteration_N):

{
  "timestamp": "2024-01-15T10:30:00",
  "clusters_processed": 45,
  "subclustered": ["3", "7", "12"],
  "relabeled": {"5": "B Cells", "9": "Macrophages"},
  "skipped": ["1", "2", "4"],
  "parameters": {...}
}
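
The provenance entries can be collected and summarized per iteration. The sketch below works on a plain dictionary standing in for `adata.uns` and assumes that `N` in the key name is an integer suffix. The function name `iteration_summaries` is hypothetical.

```python
import re

# In-memory stand-in for adata.uns; values mirror the structure shown above.
uns = {
    "stage_i_params": {},
    "stage_refine_iteration_1": {
        "timestamp": "2024-01-15T10:30:00",
        "clusters_processed": 45,
        "subclustered": ["3", "7", "12"],
        "relabeled": {"5": "B Cells", "9": "Macrophages"},
        "skipped": ["1", "2", "4"],
    },
}

_ITERATION_KEY = re.compile(r"^stage_refine_iteration_(\d+)$")


def iteration_summaries(uns):
    """Return per-iteration action counts, ordered by iteration number."""
    found = []
    for key, value in uns.items():
        match = _ITERATION_KEY.match(key)
        if match:
            found.append((int(match.group(1)), value))
    return [
        {
            "iteration": number,
            "n_subclustered": len(entry["subclustered"]),
            "n_relabeled": len(entry["relabeled"]),
            "n_skipped": len(entry["skipped"]),
        }
        for number, entry in sorted(found, key=lambda item: item[0])
    ]


summaries = iteration_summaries(uns)
```

After a round trip through the .h5ad file, list values may come back as arrays rather than Python lists; `len()` works on both.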

diagnostic_report.csv

Recommendations from diagnostic mode, generated when Stage I is run with the --diagnostic flag.

Column | Type | Description
cluster_id | string | Cluster identifier
current_label | string | Current assignment
recommendation | string | SUBCLUSTER, RELABEL, or SKIP
reason | string | Why this recommendation
criterion | string | Which AutoPolicy criterion matched
best_child | string | Best child type (for RELABEL)
child_score | float | Best child's score
heterogeneity_score | float | Cluster heterogeneity measure
n_cells | int | Number of cells in cluster
priority | int | Suggested processing order

Recommendation values:

Value | Meaning
SUBCLUSTER | Cluster is heterogeneous, should be split
RELABEL | Cluster can be assigned to a more specific type
SKIP | No refinement needed
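
A diagnostic report can be turned into an ordered work list by dropping SKIP rows and sorting on `priority`. The sketch below assumes that lower `priority` values should be processed first, which this page does not state. The rows and the `work_queue` helper are illustrative.

```python
import csv
import io

# Illustrative rows only; a subset of the documented columns.
SAMPLE = """cluster_id,current_label,recommendation,priority,n_cells
3,Immune Cells,SUBCLUSTER,1,5400
5,Lymphoids,RELABEL,2,800
1,Epithelium,SKIP,3,2100
7,Immune Cells,SUBCLUSTER,0,3900
"""


def work_queue(handle):
    """Return (cluster_id, recommendation) pairs needing action, lowest priority value first."""
    rows = [r for r in csv.DictReader(handle) if r["recommendation"] != "SKIP"]
    rows.sort(key=lambda r: int(r["priority"]))
    return [(r["cluster_id"], r["recommendation"]) for r in rows]


queue = work_queue(io.StringIO(SAMPLE))
```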

refinement_log.csv

Detailed log of all refinement actions taken across iterations.

Column | Type | Description
iteration | int | Refinement iteration number
cluster_id | string | Original cluster identifier
action | string | Action taken
old_label | string | Label before refinement
new_label | string | Label after refinement
new_clusters | string | Comma-separated list of new cluster IDs (for SUBCLUSTER)
confidence_before | float | Confidence before action
confidence_after | float | Confidence after action
timestamp | string | ISO timestamp
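
The log lends itself to a quick check of whether each action improved confidence. The sketch below computes the change per row and splits `new_clusters` into a list. The sample rows are invented, the `timestamp` column is omitted for brevity, and `confidence_changes` is a hypothetical helper.

```python
import csv
import io

# Illustrative rows only. new_clusters contains commas, so it is double-quoted.
SAMPLE = """iteration,cluster_id,action,old_label,new_label,new_clusters,confidence_before,confidence_after
1,3,SUBCLUSTER,Immune Cells,Immune Cells,"3:0,3:1",0.10,0.35
1,5,RELABEL,Lymphoids,B Cells,,0.20,0.45
"""


def confidence_changes(handle):
    """Return (cluster_id, action, confidence delta, new cluster IDs) per log row."""
    changes = []
    for row in csv.DictReader(handle):
        delta = float(row["confidence_after"]) - float(row["confidence_before"])
        new_ids = row["new_clusters"].split(",") if row["new_clusters"] else []
        changes.append((row["cluster_id"], row["action"], round(delta, 3), new_ids))
    return changes


changes = confidence_changes(io.StringIO(SAMPLE))
```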

File Formats

AnnData (.h5ad)

All .h5ad files use the standard AnnData HDF5 format:

  • Compatible with scanpy and anndata Python packages
  • Can be read with anndata.read_h5ad()
  • Stores expression matrices, cell/gene metadata, and analysis results

CSV Files

All CSV files use:

  • UTF-8 encoding
  • Comma delimiter
  • First row as header
  • Double-quote escaping for fields containing commas

JSON Fields

Some CSV columns contain JSON-encoded data:

  • Parse with standard JSON libraries
  • Always valid JSON (not Python dict syntax)
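
The CSV and JSON conventions above can be exercised together with the standard library. This sketch writes one row in memory and reads it back, showing that a comma-separated field and a JSON field both survive quoting. The shape of `composition` (label mapped to vote fraction) and the marker names are assumptions made for the example.

```python
import csv
import io
import json

# Write a row the way the conventions describe: comma-delimited, with
# double-quote escaping for fields that contain commas or quotes.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["cluster_id", "resolved_markers", "composition"])
writer.writerow(
    ["4", "CD3,CD4,CD8", json.dumps({"T Cells": 0.7, "B Cells": 0.3})]
)

# Read it back: the csv module undoes the quoting, json.loads decodes the field.
buffer.seek(0)
row = next(csv.DictReader(buffer))
markers = row["resolved_markers"].split(",")
composition = json.loads(row["composition"]) if row["composition"] else {}
```

For files on disk, opening with `open(path, newline="", encoding="utf-8")` matches the encoding listed above.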

See Also