Expert Curation Workflow
The expert curation workflow gives you full manual control over cell-type assignments.
Overview
When to Use
- Publication-quality annotations required
- Complex tissue with unusual populations
- Marker panel has known limitations
- Domain expertise available
Step 1: Initial Clustering
Run clustering and annotation to see baseline results:
# Cluster
celltype-refinery cluster \
--input merged.h5ad \
--resolution 0.6 \
--out output/clustered
# Annotate (no refinement)
celltype-refinery annotate \
--input output/clustered/clustered.h5ad \
--marker-map markers.json \
--out output/annotated
Step 2: Review Results
Examine the annotation results:
import pandas as pd
import scanpy as sc
# Load data
adata = sc.read_h5ad("output/annotated/annotated.h5ad")
annotations = pd.read_csv("output/annotated/cluster_annotations.csv")
# Review annotations
print(annotations[["cluster_id", "assigned_label", "score", "stop_reason"]])
# Visualize marker expression
sc.pl.dotplot(adata, groupby="cluster_lvl0", var_names=["CD45", "EpCAM", "CD3"])
Step 3: Create Curation Config
Create a YAML configuration with your corrections:
# curation.yaml
version: "1.0"
iteration: 1
description: "Expert curation for my tissue"
# Direct label overrides
overrides:
- cluster_id: "3"
cell_type: "Ciliated_Epithelium"
reason: "Clear FOXJ1/TUBB4A expression"
- cluster_id: "7"
cell_type: "Regulatory_T_Cells"
reason: "CD3+FOXP3+ population"
# Merge over-split clusters
merge:
- source_clusters: ["5", "6"]
target_label: "Fibroblasts"
reason: "Same marker profile, split by batch"
# Subcluster heterogeneous populations
subcluster:
- cluster_id: "12"
resolution: 0.3
reason: "Mixed myeloid population"
# Relabel without re-clustering
relabel:
- from_cluster: "8"
to_label: "Smooth_Muscle_Cells"
reason: "Misnamed in auto-annotation"
Step 4: Apply Curation
Apply your configuration:
celltype-refinery refine \
--input output/annotated/annotated.h5ad \
--config curation.yaml \
--execute \
--out output/curated
Step 5: Iterate
Review results and repeat if needed:
# Load curated results
adata = sc.read_h5ad("output/curated/refined.h5ad")
# Check cell-type distribution
print(adata.obs["cell_type_curated"].value_counts())
# Verify specific corrections
cluster_3 = adata[adata.obs["cluster_lvl0"] == "3"]
print(f"Cluster 3 label: {cluster_3.obs['cell_type_curated'].iloc[0]}")
If further corrections needed, create curation_v2.yaml and apply:
celltype-refinery refine \
--input output/curated/refined.h5ad \
--config curation_v2.yaml \
--execute \
--out output/curated_v2
Operation Types
Override
Directly assign a cell type to a cluster:
overrides:
- cluster_id: "3"
cell_type: "My_Cell_Type"
reason: "Explanation"
Merge
Combine multiple clusters:
merge:
- source_clusters: ["5", "6", "7"]
target_label: "Combined_Type"
reason: "Explanation"
Subcluster
Re-cluster at finer resolution:
subcluster:
- cluster_id: "12"
resolution: 0.3
reason: "Explanation"
Relabel
Rename without re-clustering:
relabel:
- from_cluster: "8"
to_label: "New_Name"
reason: "Explanation"
Best Practices
- Document everything: Always include
reasonfields - Small iterations: Make a few changes, verify, repeat
- Version configs: Keep
curation.yaml,curation_v2.yaml, etc. - Preserve lineage: Use cluster IDs, not cell-type names
- Review DE genes: Check differential expression before overrides
Audit Trail
All operations are logged in curation_log.json:
{
"timestamp": "2025-01-15T10:30:00",
"operation": "override",
"cluster_id": "3",
"old_label": "Unknown",
"new_label": "Ciliated_Epithelium",
"reason": "Clear FOXJ1/TUBB4A expression",
"n_cells": 5234
}
Next Steps
- Hybrid Workflow - Combine with automation
- Refinement Operations - Detailed operation reference