Changelog
Version 0.0.4
Released 2026-05-14.
Scanpy-compatible ``groupby`` / ``reference`` parameter aliases – ``t_test``, ``wilcoxon_test``, ``nb_glm_test``, and ``cx.tl.rank_genes_groups`` now accept ``groupby`` as an alias for ``perturbation_column`` and ``reference`` as an alias for ``control_label``, matching the parameter names used by Scanpy’s ``sc.tl.rank_genes_groups``. The original names remain the canonical names and are not deprecated. Passing both a canonical name and its alias raises ``TypeError``.
Internal DRY refactor – four private helpers (``_resolve_de_aliases``, ``_try_load_existing_de_result``, ``_print_de_summary``, ``_print_de_perturbation_verbose``) consolidate previously triplicated boilerplate across the three DE functions. No behaviour change for existing callers.
Verbose improvements – all three DE test functions accept ``verbose: int | bool``. ``verbose=1`` prints a per-run summary (perturbations completed, mean genes tested). ``verbose=2`` additionally prints per-perturbation gene-count lines.
Decoupled per-condition pct thresholds – ``min_pct_both`` is complemented by independent ``min_pct_ctrl`` (default ``0.01``) and ``min_pct_pert`` (default ``0.002``) parameters across all three DE test functions (``t_test``, ``wilcoxon_test``, ``nb_glm_test``) and the internal ``_low_expr_in_both_mask`` helper. The lower ``min_pct_pert`` default prevents over-filtering genes induced from near-zero baseline (e.g. transcription-factor target genes). The old ``min_pct_both`` kwarg is retained as a convenience alias that silently sets both ``min_pct_ctrl`` and ``min_pct_pert`` to the same value.
Dual-condition pert filter with enabled ``min_mean_pert`` – The perturbed-side filter now always applies a dual condition: ``(pct_p < min_pct_pert) AND (mean_p < min_mean_pert)``. The default ``min_mean_pert`` is raised from ``0.0`` (v0.0.3) to ``0.005`` so that genes with very few but high-count expressing cells (possible doublets or ambient RNA) are correctly excluded. Existing code can restore the v0.0.3 behaviour by passing ``min_mean_pert=0.0``.
NaN initialisation for filtered-gene p-values (Wilcoxon) – The standard single-pass Wilcoxon path previously initialised the chunk p-value array with ``np.ones`` (p=1.0) rather than ``np.nan``, causing filtered genes to appear as nominally non-significant rather than missing. The array is now initialised with ``np.full(..., np.nan)``, consistent with the streaming path and with ``t_test`` / ``nb_glm_test``.
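
The alias rule for ``groupby`` / ``reference`` introduced in this release can be sketched roughly as follows. This is a minimal illustration and not the package's code: ``resolve_alias`` and ``run_de`` are hypothetical names, and the sketch assumes that ``None`` means "argument not passed".

```python
def resolve_alias(canonical_name, canonical_value, alias_name, alias_value):
    """Return the effective value for a parameter that also has an alias.

    Hypothetical helper for illustration only; it is not the package's
    private ``_resolve_de_aliases``. ``None`` is assumed to mean "not passed".
    """
    if canonical_value is not None and alias_value is not None:
        # Both spellings supplied: refuse rather than silently pick one.
        raise TypeError(
            f"Pass either '{canonical_name}' or its alias '{alias_name}', not both."
        )
    return alias_value if canonical_value is None else canonical_value


def run_de(perturbation_column=None, control_label=None, *,
           groupby=None, reference=None):
    """Toy stand-in for a DE entry point; it only returns the resolved names."""
    column = resolve_alias(
        "perturbation_column", perturbation_column, "groupby", groupby
    )
    control = resolve_alias(
        "control_label", control_label, "reference", reference
    )
    return column, control
```

Under these assumptions, ``run_de(groupby="gene", reference="NTC")`` and ``run_de(perturbation_column="gene", control_label="NTC")`` resolve to the same pair, while supplying ``perturbation_column`` together with ``groupby`` raises ``TypeError``.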
Version 0.0.3
Released 2026-05-13.
Auto-reload for DE results – ``wilcoxon_test``, ``t_test``, and ``nb_glm_test`` now accept a ``force: bool = False`` parameter. When ``False`` (default) and the expected output ``.h5ad`` file already exists on disk, the functions load and return the saved result instead of rerunning the analysis. Set ``force=True`` to rerun unconditionally and overwrite the existing file. Combined with ``verbose=True``, a notice is printed to stdout identifying the reloaded file path.
Fixed ``RecursionError`` when pickling DE results – ``AnnData.__getattr__`` now guards against access before ``__init__`` has run (e.g. during ``pickle.load``), eliminating infinite recursion. ``AnnData`` gains ``__getstate__`` / ``__setstate__`` so only the file path and access mode are serialised; the HDF5 handle is reopened lazily after unpickling. ``RankGenesGroupsResult`` and ``DifferentialExpressionResult`` likewise gain ``__getstate__`` / ``__setstate__`` that exclude the ``AnnData`` handle and group cache from the pickle payload, allowing round-trip serialisation with ``pickle.dumps`` / ``pickle.loads``.
Asymmetric low-expression filter – DE tests (t-test, Wilcoxon, NB-GLM) now accept a ``min_mean_pert`` parameter (default ``0.0``). With the default, the mean-expression check is applied only to the control group; the perturbed group is filtered on fraction-of-expressing-cells (``min_pct_both``) alone. This prevents the filter from discarding genes that are induced from near-zero baseline expression, which is common in unbalanced CRISPR-screen comparisons. To reproduce the v0.0.2 behaviour pass ``min_mean_pert=min_mean_ctrl`` (e.g. ``min_mean_pert=0.05``).
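
The pickling fix in this release follows a common pattern for file-backed objects, sketched below. The sketch is a toy and makes several assumptions: ``LazyFileBacked`` and ``roundtrip_demo`` are hypothetical names, and an ordinary binary file handle stands in for the HDF5 handle so that the example needs only the standard library. It is not the package's ``AnnData`` implementation.

```python
import os
import pickle
import tempfile


class LazyFileBacked:
    """Toy file-backed object; a plain file handle stands in for HDF5."""

    def __init__(self, path, mode="rb"):
        self.path = path
        self.mode = mode
        self._handle = None  # opened on first use

    @property
    def handle(self):
        if self._handle is None:
            self._handle = open(self.path, self.mode)
        return self._handle

    def __getattr__(self, name):
        # Runs only when normal lookup fails. While unpickling, the instance
        # exists before any attribute is set, so reading self.<attr> here
        # would re-enter __getattr__ without end. Inspect __dict__ instead.
        if name.startswith("__") or "path" not in self.__dict__:
            raise AttributeError(name)
        # Delegate unknown attributes to the underlying handle.
        return getattr(self.handle, name)

    def __getstate__(self):
        # Serialise only what is needed to reopen the file later.
        return {"path": self.path, "mode": self.mode}

    def __setstate__(self, state):
        self.path = state["path"]
        self.mode = state["mode"]
        self._handle = None  # reopened lazily on first access


def roundtrip_demo():
    """Pickle an object with an open handle and read through the clone."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"payload")
    try:
        original = LazyFileBacked(tmp.name)
        original.read()  # forces the handle open before pickling
        clone = pickle.loads(pickle.dumps(original))
        handle_deferred = clone._handle is None
        data = clone.read()  # the clone opens its own handle here
        original.handle.close()
        clone.handle.close()
        return handle_deferred, data
    finally:
        os.remove(tmp.name)
```

The guard in ``__getattr__`` is the part that prevents the recursion: an instance created with ``LazyFileBacked.__new__(LazyFileBacked)`` raises a plain ``AttributeError`` for any missing attribute instead of looping.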
Version 0.0.2
Released 2026-04-28.
Per-condition low-expression filter for DE tests – t-test, Wilcoxon, and NB-GLM now accept ``min_pct_both`` (default ``0.01``) and ``min_mean_both`` (default ``0.05``) parameters. A gene is excluded from a perturbation comparison (reported as NaN in ``pvalue`` / ``effect`` / ``logfoldchanges``) when the fraction of expressing cells and the mean expression are both below the respective thresholds in both the perturbation and control groups. Setting both thresholds to ``0.0`` recovers the 0.0.1 behaviour exactly. ``pts`` and mean expression values are always retained.
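
As a rough illustration of the 0.0.2 exclusion rule, the mask can be written in a few lines of NumPy. The function name ``low_expr_in_both`` and all input values are made up for this sketch; the package's internal helper may be organised differently.

```python
import numpy as np


def low_expr_in_both(pct_pert, mean_pert, pct_ctrl, mean_ctrl,
                     min_pct_both=0.01, min_mean_both=0.05):
    """Boolean mask of genes that are lowly expressed in both groups.

    Illustrative only. A gene is flagged when its expressing-cell fraction
    AND its mean expression fall below the thresholds in the perturbation
    group and also in the control group.
    """
    low_pert = (pct_pert < min_pct_both) & (mean_pert < min_mean_both)
    low_ctrl = (pct_ctrl < min_pct_both) & (mean_ctrl < min_mean_both)
    return low_pert & low_ctrl


# Made-up per-gene summaries for three genes.
pct_pert = np.array([0.300, 0.005, 0.004])
mean_pert = np.array([0.80, 0.01, 0.02])
pct_ctrl = np.array([0.250, 0.002, 0.200])
mean_ctrl = np.array([0.60, 0.01, 0.50])

mask = low_expr_in_both(pct_pert, mean_pert, pct_ctrl, mean_ctrl)

# Excluded genes are reported as NaN rather than as p = 1.0.
pvalues = np.array([0.03, 0.40, 0.001])  # made-up test results
pvalues = np.where(mask, np.nan, pvalues)
```

Only the second gene is low in both groups, so it alone is masked; the third gene is low in the perturbation group but well expressed in the control group and is kept. With both thresholds set to ``0.0`` no non-negative value can fall below them, so the mask is empty.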
Version 0.0.1
Initial release.
Streaming QC and preprocessing (filter cells, perturbations, genes; normalize and log-transform without loading the full matrix)
Pseudo-bulk aggregation: average log expression and pseudo-bulk count matrices
Differential expression: t-test, Wilcoxon rank-sum, NB-GLM with apeGLM LFC shrinkage, multi-core support, and adaptive memory management
Dimension reduction: memory-efficient PCA and KNN graph construction on backed data
Scanpy-compatible API and plotting: ``cx.pp``, ``cx.pb``, ``cx.tl``, ``cx.pl`` namespaces; rank genes plots, volcano, MA, PCA, UMAP, QC summaries, and overlap heatmaps
Data preparation utilities: edit backed metadata, standardise gene names, normalise perturbation labels, auto-detect metadata columns
HPC support: resume/checkpoint for long-running jobs, configurable ``memory_limit_gb``, Docker and Singularity support
Benchmarking suite across 12 CRISPR screen datasets