# Roadmap
Where the project is pointed.
## Near-term — close the expression-objective story
- Re-run the visualisation pipeline on the latent \(\boldsymbol{\alpha}\) to produce the cell-type DE distance boxplot for the latent objective — the mirror-image plot needed to finish post 5.
- Diagnose feature recruitment. The expression objective recruits ~5× more features (77 vs 16). Is that real dimensionality, or weak-feature overshoot? Current hypothesis: the L1 ratio correction over-recruits for small-effect DE genes. Plan: (a) ablate the top-\(n\) DEG cap, (b) check whether the extra features are individually above or below their own activation noise floor, (c) compare alpha sparsity across \(\lambda\).
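Step (b) above can be screened cheaply before any ablation. The sketch below splits recruited features into "strong" and "weak" relative to a crude noise floor; the function name, the peak-activation statistic, and the quantile-based floor are all illustrative placeholders, not the pipeline's actual diagnostics.

```python
import numpy as np

def recruitment_report(alpha, floor_quantile=0.05):
    """Split recruited SAE features into 'strong' and 'weak' relative to a
    crude noise floor. Thresholds here are illustrative placeholders.

    alpha : (n_cells, n_features) nonnegative SAE activations.
    """
    peak = alpha.max(axis=0)                # strongest activation per feature
    recruited = peak > 0                    # fires on at least one cell
    # Placeholder noise floor: a low quantile of the recruited peaks.
    floor = np.quantile(peak[recruited], floor_quantile) if recruited.any() else 0.0
    strong = recruited & (peak > floor)
    return {
        "n_recruited": int(recruited.sum()),
        "n_strong": int(strong.sum()),
        "n_weak": int(recruited.sum() - strong.sum()),
        "floor": float(floor),
    }
```

If the ~5× recruitment gap is overshoot, `n_weak` should absorb most of the extra features; if it is real dimensionality, they should land in `n_strong`.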
- Scale the SAE. Move from the current 8× overcomplete (k=32) SAE to a 16× overcomplete, k=32 SAE to reduce feature entanglement. Memory note: entanglement may only affect the long tail of weak DEGs — core CD4/CD8 identity features already look clean.
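The point of going 16× overcomplete while holding k=32 fixed is that the dictionary grows but the number of active features per cell does not. A minimal top-k encoder sketch (names and shapes are placeholders, not the repo's actual API):

```python
import numpy as np

def topk_sae_encode(x, W_enc, b_enc, k=32):
    """Minimal top-k SAE encoder sketch. x: (n, d_model);
    W_enc: (d_model, d_dict) with d_dict / d_model the expansion factor.
    Keeps only the k largest ReLU pre-activations per row (assumes
    k < d_dict; ties can let a few extra survive)."""
    pre = np.maximum(x @ W_enc + b_enc, 0.0)                 # ReLU pre-activations
    thresh = np.partition(pre, -k, axis=1)[:, -k][:, None]   # k-th largest per row
    return np.where(pre >= thresh, pre, 0.0)
```

With `d_model=64`, an 8× run uses `d_dict=512` and the proposed 16× run `d_dict=1024`; per-cell sparsity stays at ≤ 32 active features either way.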
## Medium-term — beyond PBMC3K
- Multi-tissue atlas. PBMC is clean but homogeneous. Train SAEs on a larger, more heterogeneous atlas to see whether features generalise, or whether we get tissue-specific dictionaries that require per-tissue SAEs.
- Multi-layer circuits. Right now everything happens on layer 12. A proper mechanistic story needs to trace features across layers: what does layer-9 activity feed into layer-12 features, and how is layer-15 driving the head? (Think transformer-circuits style work.)
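A cheap first pass at the cross-layer question is purely correlational: for each layer-12 feature, find the layer-9 features whose activations track it across cells. This is only a screen for candidate links — a real circuit claim needs causal (patching-style) evidence — and the function below is a sketch with made-up names:

```python
import numpy as np

def cross_layer_links(acts_lo, acts_hi, top=3):
    """Screen for cross-layer feature links: correlate each higher-layer
    feature with every lower-layer feature across cells and return, per
    high-layer feature, the indices of the `top` most correlated
    low-layer features. Shapes: (n_cells, n_features) per layer."""
    lo = (acts_lo - acts_lo.mean(0)) / (acts_lo.std(0) + 1e-8)
    hi = (acts_hi - acts_hi.mean(0)) / (acts_hi.std(0) + 1e-8)
    corr = (lo.T @ hi) / lo.shape[0]                  # (f_lo, f_hi) correlations
    return np.argsort(-np.abs(corr), axis=0)[:top].T  # (f_hi, top) candidate feeds
```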
## Longer-term — interpretable perturbation prediction
The end goal, and the concrete template from the paper: MLP adapters on perturbation datasets. Train a small adapter that maps a target cell state (e.g., an observed post-perturbation transcriptome from Perturb-seq) to a sparse feature-steering vector \(\boldsymbol{\alpha}\) that, when injected into AIDO.Cell, reconstructs that state. If it works, you get interpretable perturbation prediction: each prediction comes with the feature recipe that produced it, and the recipe is inspectable as GO programs.
Given a desired phenotypic change (e.g. “make this cell more cytotoxic”), the adapter gives you a sparse feature recipe; the features’ GO annotations tell you what biology was recruited; and the recipe can in principle be back-translated to gene-level perturbations. Two things have to hold for this to work:
- Feature-level steering must give coherent expression outputs — this is what post 5 is investigating.
- The feature dictionary must contain causally useful axes. The regulatory-logic probe suggests the scFM encodes states, not causal regulation — so the “perturbation recipe” is likely at the level of programs (antiviral, CD8-effector) rather than individual TFs. That’s still useful, and arguably the more tractable target.
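The adapter described above is small enough to sketch end to end. The forward pass below maps a target expression state to a sparse, nonnegative steering vector \(\boldsymbol{\alpha}\); layer sizes, the top-k sparsification, and all names are assumptions for illustration — training would backpropagate a reconstruction loss through the frozen scFM, which is omitted here.

```python
import numpy as np

def adapter_forward(target_state, params, k=8):
    """Hypothetical steering adapter: maps a target cell state (e.g. a
    post-perturbation expression profile) to a sparse feature-steering
    vector alpha. One hidden layer plus top-k sparsification; assumes
    k < d_dict. All shapes and names are illustrative."""
    W1, b1, W2, b2 = params
    h = np.maximum(target_state @ W1 + b1, 0.0)            # (n, hidden)
    alpha = np.maximum(h @ W2 + b2, 0.0)                   # (n, d_dict), nonnegative
    # Keep only the k strongest features per cell -> the sparse "recipe".
    thresh = np.partition(alpha, -k, axis=1)[:, -k][:, None]
    return np.where(alpha >= thresh, alpha, 0.0)
```

The sparsity constraint is what makes the output a readable recipe: k active features, each with a GO-program interpretation, rather than a dense latent edit.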
## Open questions
- Does feature identity survive across SAE runs? Same data, different seeds — do we recover the same programs, or just different bases for the same subspace? (Planned: dictionary-level similarity across retrains.)
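The planned dictionary-level similarity check could start as simply as this: normalize the two decoder matrices and count features with a high-cosine partner in the other run. This max-match version over-counts when several features map to one partner — a Hungarian matching would be stricter — and the threshold is arbitrary:

```python
import numpy as np

def dictionary_overlap(D1, D2, sim_threshold=0.9):
    """Crude similarity between two SAE decoder dictionaries
    (d_dict, d_model) trained with different seeds: for each feature in
    D1, find its best cosine match in D2 and count matches above a
    threshold. Returns (fraction matched, per-feature best cosines)."""
    n1 = D1 / np.linalg.norm(D1, axis=1, keepdims=True)
    n2 = D2 / np.linalg.norm(D2, axis=1, keepdims=True)
    best = (n1 @ n2.T).max(axis=1)          # best cosine match per D1 feature
    return float((best > sim_threshold).mean()), best
```

A high matched fraction would support "same programs"; a low fraction with matched *subspaces* (checked separately, e.g. via principal angles) would support "different bases for the same subspace".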
- What’s the right granularity for steering? Single feature → state. Sparse combination → cell type. But is there a middle tier for sub-programs, or is it continuous?
- Can we validate steering with a real perturbation dataset? e.g. CRISPR-perturb-seq: predict DEGs from a gene knockdown by finding the feature recipe that best matches the post-knockdown expression, and check whether the recipe’s constituent features overlap with the knocked-down gene’s known programs.
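The final overlap check in that validation is a plain set comparison. A minimal sketch, assuming feature→GO-term annotations are available as sets (the Jaccard score and all input names are illustrative choices, not a fixed protocol):

```python
def recipe_program_overlap(recipe_features, feature_annotations, gene_programs):
    """Jaccard overlap between a steering recipe's GO terms and the
    knocked-down gene's known programs. feature_annotations maps
    feature id -> set of GO terms; gene_programs is the gene's term set."""
    recipe_terms = set().union(
        *(feature_annotations.get(f, set()) for f in recipe_features)
    )
    union = recipe_terms | gene_programs
    if not union:
        return 0.0
    return len(recipe_terms & gene_programs) / len(union)
```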