
Preprocessing Overview

The preprocessing module handles data loading, cell quality control (QC), within-sample normalization, cross-sample alignment, batch correction, and merging of the results into a single AnnData object.

Pipeline Position

Stages

| Stage | Purpose | Input | Output |
|---|---|---|---|
| Data Loading | Load and validate data | Raw matrices | Verified metadata |
| Cell QC | Remove low-quality cells | Cell matrices | Filtered matrices |
| Normalization | Within-sample normalization | Filtered data | Normalized data |
| Alignment | Cross-sample alignment | Normalized data | Aligned data |
| Batch Correction | Remove batch effects | Aligned data | Corrected data |
| Data Merging | Merge into AnnData | All stages | merged.h5ad |
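The table does not say which method each stage uses. As a rough illustration of how the stages chain, each consuming the previous stage's output, the sketch below implements three of them with deliberately simple stand-ins: a minimum-total-signal filter for Cell QC, per-cell total scaling plus `log1p` for normalization, and per-batch mean-centering for batch correction. None of these are taken from the celltype_refinery package; the function names, the list-of-dicts cell representation, and the `min_total`/`target_sum` parameters are hypothetical. Loading, alignment, and merging are omitted.

```python
import math
from collections import defaultdict

# Hypothetical cell representation (not the package's format):
# {"id": str, "batch": str, "values": [float, ...]}


def cell_qc(cells, min_total):
    """Cell QC stand-in: keep cells whose summed signal is at least min_total."""
    return [c for c in cells if sum(c["values"]) >= min_total]


def normalize(cells, target_sum=100.0):
    """Normalization stand-in: scale each cell's values to sum to target_sum, then log1p."""
    out = []
    for c in cells:
        total = sum(c["values"])  # > 0 because run_stages requires min_total > 0
        scaled = [math.log1p(v * target_sum / total) for v in c["values"]]
        out.append({**c, "values": scaled})  # new dict; the input cell is not mutated
    return out


def batch_center(cells):
    """Batch-correction stand-in: subtract each batch's per-feature mean."""
    by_batch = defaultdict(list)
    for c in cells:
        by_batch[c["batch"]].append(c["values"])
    # zip(*rows) turns a batch's rows into per-feature columns.
    means = {b: [sum(col) / len(col) for col in zip(*rows)] for b, rows in by_batch.items()}
    return [
        {**c, "values": [v - m for v, m in zip(c["values"], means[c["batch"]])]}
        for c in cells
    ]


def run_stages(cells, min_total=2.0):
    """Apply the stand-ins in the table's order: Cell QC -> Normalization -> Batch Correction."""
    if min_total <= 0:
        raise ValueError("min_total must be positive so normalization never divides by zero")
    return batch_center(normalize(cell_qc(cells, min_total)))
```

Real pipelines typically use more robust methods at each step; the point here is only the data flow, where each function takes the previous stage's output as its input.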

CLI Usage

```bash
celltype-refinery preprocess \
  --input data/ \
  --config preprocess.yaml \
  --out output/preprocessed
```
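To drive the same command from a Python script (for example, to loop over several config files), one option is to build the argument list explicitly and pass it to `subprocess.run`. This sketch uses only the three flags shown above and assumes the `celltype-refinery` executable is on `PATH`; the helper names `preprocess_argv` and `run_preprocess` are hypothetical.

```python
import shlex
import subprocess


def preprocess_argv(input_dir, config, out_dir):
    """Build the argument list for `celltype-refinery preprocess` using only the documented flags."""
    return [
        "celltype-refinery", "preprocess",
        "--input", str(input_dir),
        "--config", str(config),
        "--out", str(out_dir),
    ]


def run_preprocess(input_dir, config, out_dir):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit status."""
    argv = preprocess_argv(input_dir, config, out_dir)
    print("running:", shlex.join(argv))  # shell-quoted echo of the command, for logs
    return subprocess.run(argv, check=True)
```

Calling `run_preprocess("data/", "preprocess.yaml", "output/preprocessed")` then corresponds to the shell command above. Passing a list rather than a single command string avoids shell quoting problems with paths that contain spaces.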

Python API

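```python
from celltype_refinery.core.preprocessing import PreprocessingPipeline

# Load the stage settings from the YAML config, then run every stage on data/.
pipeline = PreprocessingPipeline(config_path="preprocess.yaml")
# The returned object is the merged AnnData result.
adata = pipeline.run(input_dir="data/", output_dir="output/")
```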