Preprocessing Overview
The preprocessing module handles data loading, quality control, normalization, alignment, batch correction, and merging.
Pipeline Position
Stages
| Stage | Purpose | Input | Output |
|---|---|---|---|
| Data Loading | Load and validate data | Raw matrices | Verified metadata |
| Cell QC | Remove low-quality cells | Cell matrices | Filtered matrices |
| Normalization | Within-sample normalization | Filtered data | Normalized data |
| Alignment | Cross-sample alignment | Normalized data | Aligned data |
| Batch Correction | Remove batch effects | Aligned data | Corrected data |
| Data Merging | Merge into AnnData | All stages | merged.h5ad |
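
The table above describes a linear dataflow in which each stage consumes the previous stage's output. The sketch below illustrates that chaining in plain Python for the first three stages only. It is not the module's implementation: the function names, the `min_total` threshold, the per-cell scaling, and the toy matrix are all assumptions made for illustration.

```python
from typing import Callable, List

# Toy stand-in for an expression matrix: rows are cells, columns are features.
Matrix = List[List[float]]


def load(raw: Matrix) -> Matrix:
    """Data Loading (illustrative): check that every row has the same width."""
    widths = {len(row) for row in raw}
    if len(widths) > 1:
        raise ValueError("ragged matrix: rows have different lengths")
    return raw


def cell_qc(cells: Matrix, min_total: float = 5.0) -> Matrix:
    """Cell QC (illustrative): drop cells whose total signal is below
    a hypothetical threshold."""
    return [row for row in cells if sum(row) >= min_total]


def normalize(cells: Matrix) -> Matrix:
    """Normalization (illustrative): scale each cell so its values sum to 1.
    The real stage may use a different method."""
    return [[value / sum(row) for value in row] for row in cells]


# Stages run in table order; each one receives the previous stage's output.
STAGES: List[Callable[[Matrix], Matrix]] = [load, cell_qc, normalize]


def run_stages(data: Matrix) -> Matrix:
    """Apply every stage in sequence and return the final result."""
    for stage in STAGES:
        data = stage(data)
    return data


toy = [[1.0, 1.0, 0.0], [4.0, 4.0, 2.0], [10.0, 0.0, 10.0]]
result = run_stages(toy)
print(len(result), "cells kept after QC")
```

Keeping the stages in an ordered list makes the input/output contract from the table explicit: a stage can be swapped or skipped without touching the others, as long as it accepts and returns the same data shape.
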
CLI Usage
```bash
celltype-refinery preprocess \
  --input data/ \
  --config preprocess.yaml \
  --out output/preprocessed
```
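
When the CLI has to be launched from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, assuming only the `--input`, `--config`, and `--out` flags shown in the command above. The helper `build_preprocess_command` is hypothetical and not part of the package.

```python
import shlex
from typing import List


def build_preprocess_command(input_dir: str, config: str, out_dir: str) -> List[str]:
    """Return the argv list for the preprocess subcommand (hypothetical helper).

    Only the flags shown in the CLI usage example are used here.
    """
    return [
        "celltype-refinery", "preprocess",
        "--input", input_dir,
        "--config", config,
        "--out", out_dir,
    ]


cmd = build_preprocess_command("data/", "preprocess.yaml", "output/preprocessed")

# Print a shell-safe rendering of the command for logging or review.
print(shlex.join(cmd))

# To actually execute it (requires the CLI to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.
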
Python API
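```python
from celltype_refinery.core.preprocessing import PreprocessingPipeline

# Configure the pipeline from a YAML file
pipeline = PreprocessingPipeline(config_path="preprocess.yaml")

# Run the stages on data/ and write results under output/
adata = pipeline.run(input_dir="data/", output_dir="output/")
```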