Iterative Refinement
Stage I supports iterative refinement chains until convergence.
Refinement Chain
H → I (iteration 1) → I (iteration 2) → I (iteration 3) → ...
Each iteration:
- Uses previous output's cluster_lvl1 as new cluster_lvl0
- Selects new candidates based on updated scores
- Further refines clusters that still need improvement
- Tracks full lineage in provenance
Convergence
Refinement typically converges in 2-3 iterations, at the point where no remaining cluster meets the candidate criteria:
- All clusters above score threshold
- No heterogeneous parent signals remain
- All clusters below min_cells threshold
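The stopping rule above can be sketched as a small check. This is one possible reading of the criteria, not the tool's actual implementation: a cluster counts as a candidate only if it is large enough to split and still shows a problem (low score or a heterogeneous parent signal). The `ClusterStats` type, its field names, and both threshold values are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the real defaults are not stated in this section.
SCORE_THRESHOLD = 0.7
MIN_CELLS = 100


@dataclass
class ClusterStats:
    """Per-cluster summary (assumed shape, for illustration only)."""
    cluster_id: str
    score: float
    n_cells: int
    heterogeneous_parent: bool


def is_candidate(c, score_threshold=SCORE_THRESHOLD, min_cells=MIN_CELLS):
    """Return True if the cluster would still be selected for refinement."""
    if c.n_cells < min_cells:
        # Too small to subcluster further, whatever its score.
        return False
    return c.score < score_threshold or c.heterogeneous_parent


def has_converged(clusters, score_threshold=SCORE_THRESHOLD, min_cells=MIN_CELLS):
    """The chain has converged once no cluster qualifies as a candidate."""
    return not any(is_candidate(c, score_threshold, min_cells) for c in clusters)


clusters = [
    ClusterStats("3:0", score=0.9, n_cells=500, heterogeneous_parent=False),
    ClusterStats("7", score=0.4, n_cells=50, heterogeneous_parent=False),
]
print(has_converged(clusters))
```

Under this reading, a low-scoring cluster below `min_cells` does not block convergence, because it could not be subclustered anyway.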
Workflow Example
Iteration 1: Initial Refinement
celltype-refinery refine \
--input stage_h/coarse_clusters.h5ad \
--auto --execute \
--out stage_i_v1
Iteration 2: Further Refinement
celltype-refinery refine \
--input stage_i_v1/refined.h5ad \
--auto --execute \
--out stage_i_v2
Iteration 3: Targeted Manual Fixes
celltype-refinery refine \
--input stage_i_v2/refined.h5ad \
--config manual_fixes.yaml --execute \
--out stage_i_final
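The chaining pattern in these commands, where each iteration reads the previous iteration's `refined.h5ad`, could be scripted along the following lines. This sketch only builds the command lines and does not run them. It uses only the flags shown above. The helper names are hypothetical, and it assumes every run writes `refined.h5ad` into its `--out` directory, as the example paths suggest.

```python
import shlex


def build_refine_command(input_path, out_dir, config=None):
    """Build one `celltype-refinery refine` invocation as an argument list."""
    cmd = ["celltype-refinery", "refine", "--input", input_path]
    if config is None:
        cmd += ["--auto", "--execute"]
    else:
        cmd += ["--config", config, "--execute"]
    cmd += ["--out", out_dir]
    return cmd


def build_chain(first_input, out_dirs, output_name="refined.h5ad"):
    """Chain automatic iterations: each one reads the previous output file.

    Assumes each run writes `output_name` into its output directory.
    """
    commands = []
    current_input = first_input
    for out_dir in out_dirs:
        commands.append(build_refine_command(current_input, out_dir))
        current_input = f"{out_dir}/{output_name}"
    return commands


for cmd in build_chain("stage_h/coarse_clusters.h5ad", ["stage_i_v1", "stage_i_v2"]):
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run(cmd, check=True)` to execute the iteration for real.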
Cluster ID Evolution
Subclustering creates hierarchical IDs:
Original cluster "3" subclustered:
→ "3:0", "3:1", "3:2"
Further subclustering "3:1":
→ "3:1:0", "3:1:1"
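Because the IDs are colon-separated paths, a cluster's ancestry can be recovered from the string alone. The helpers below are a minimal sketch with hypothetical names. They assume that `:` appears only as the hierarchy separator.

```python
def parent_id(cluster_id):
    """Return the parent cluster ID, or None for a top-level cluster."""
    head, sep, _ = cluster_id.rpartition(":")
    return head if sep else None


def depth(cluster_id):
    """Number of subclustering rounds that produced this ID (0 = original)."""
    return cluster_id.count(":")


def lineage(cluster_id):
    """All ancestors from the original cluster down to the ID itself."""
    parts = cluster_id.split(":")
    return [":".join(parts[:i]) for i in range(1, len(parts) + 1)]


print(parent_id("3:1:0"))  # 3:1
print(depth("3:1:0"))      # 2
print(lineage("3:1:0"))    # ['3', '3:1', '3:1:0']
```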
Provenance Tracking
Each iteration stores provenance in adata.uns["stage_refine_iteration_N"]:
{
"parent_stage": "I",
"parent_file": "stage_i_v1/refined.h5ad",
"iteration": 2,
"plan_summary": {"subcluster": 5, "relabel": 3},
"clusters_modified": ["3", "7", "12"],
"subclusters_created": ["3:0", "3:1", "7:0", "7:1"],
"n_cells_modified": 15420,
"timestamp": "2025-01-12T14:30:00"
}
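One way to reconstruct the refinement history from these records is sketched below. A plain dict stands in for `adata.uns`. The function names are hypothetical, and the code assumes the `N` in each key is the iteration number and that each record has the fields shown above.

```python
import re

# Matches keys such as "stage_refine_iteration_2" and captures the number.
PROVENANCE_KEY = re.compile(r"^stage_refine_iteration_(\d+)$")


def refinement_history(uns):
    """Return provenance records ordered by iteration number."""
    entries = []
    for key, value in uns.items():
        match = PROVENANCE_KEY.match(key)
        if match:
            entries.append((int(match.group(1)), value))
    return [value for _, value in sorted(entries, key=lambda pair: pair[0])]


def summarize(uns):
    """One human-readable line per recorded iteration."""
    return [
        f"iteration {e['iteration']}: from {e['parent_file']}, "
        f"{len(e['clusters_modified'])} clusters modified, "
        f"{e['n_cells_modified']} cells"
        for e in refinement_history(uns)
    ]


# Stand-in for adata.uns, using the record shown above.
uns = {
    "stage_refine_iteration_2": {
        "parent_stage": "I",
        "parent_file": "stage_i_v1/refined.h5ad",
        "iteration": 2,
        "plan_summary": {"subcluster": 5, "relabel": 3},
        "clusters_modified": ["3", "7", "12"],
        "subclusters_created": ["3:0", "3:1", "7:0", "7:1"],
        "n_cells_modified": 15420,
        "timestamp": "2025-01-12T14:30:00",
    },
}
for line in summarize(uns):
    print(line)
```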
Best Practices
- Start with diagnostic mode to review recommendations
- Use focus controls for targeted refinement per cell type category
- Save each iteration for reproducibility
- Monitor cluster counts; they should stabilize by iteration 3
- Check provenance to understand refinement history
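The cluster-count check can be automated with a small helper like the one below. It is a sketch under assumptions: the function name and the `window` parameter are hypothetical, and the per-iteration counts must be collected separately, for example by counting the distinct labels in each saved output.

```python
def counts_stabilized(counts, window=2):
    """Return True if the last `window` iterations report the same count.

    `counts` holds the number of clusters after each iteration, oldest first.
    """
    if window < 2 or len(counts) < window:
        return False
    return len(set(counts[-window:])) == 1


# Hypothetical counts after iterations 1, 2, 3 and 4.
print(counts_stabilized([12, 19, 21]))      # False: still growing
print(counts_stabilized([12, 19, 21, 21]))  # True: unchanged in the last run
```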