# Roadmap
Where the project is pointed.
## Near-term — close the expression-objective story
- Re-run the visualisation pipeline on the latent \(\boldsymbol{\alpha}\) to produce the cell-type DE distance boxplot for the latent objective — the mirror-image plot needed to finish post 5.
- Diagnose feature recruitment. The expression objective recruits ~5× more features (77 vs 16). Is that real dimensionality, or weak-feature overshoot? Current hypothesis: the L1 ratio correction over-recruits for small-effect DE genes. Plan: (a) ablate the top-\(n\) DEG cap, (b) check whether the extra features are individually above or below their own activation noise floor, (c) compare alpha sparsity across \(\lambda\).
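Step (b) above can be screened cheaply before any ablation. The sketch below splits recruited features into "strong" and "weak" relative to a crude noise floor; the function name, the peak-activation statistic, and the quantile-based floor are all illustrative placeholders, not the pipeline's actual diagnostics.

```python
import numpy as np

def recruitment_report(alpha, floor_quantile=0.05):
    """Split recruited SAE features into 'strong' and 'weak' relative to a
    crude noise floor. Thresholds here are illustrative placeholders.

    alpha : (n_cells, n_features) nonnegative SAE activations.
    """
    peak = alpha.max(axis=0)                # strongest activation per feature
    recruited = peak > 0                    # fires on at least one cell
    # Placeholder noise floor: a low quantile of the recruited peaks.
    floor = np.quantile(peak[recruited], floor_quantile) if recruited.any() else 0.0
    strong = recruited & (peak > floor)
    return {
        "n_recruited": int(recruited.sum()),
        "n_strong": int(strong.sum()),
        "n_weak": int(recruited.sum() - strong.sum()),
        "floor": float(floor),
    }
```

If the ~5× recruitment gap is overshoot, `n_weak` should absorb most of the extra features; if it is real dimensionality, they should land in `n_strong`.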
- Scale the SAE. Move from the current 8× overcomplete (k=32) SAE to a 16× overcomplete, k=32 SAE to reduce feature entanglement. Memory note: entanglement may only affect the long tail of weak DEGs — core CD4/CD8 identity features already look clean.
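The point of going 16× overcomplete while holding k=32 fixed is that the dictionary grows but the number of active features per cell does not. A minimal top-k encoder sketch (names and shapes are placeholders, not the repo's actual API):

```python
import numpy as np

def topk_sae_encode(x, W_enc, b_enc, k=32):
    """Minimal top-k SAE encoder sketch. x: (n, d_model);
    W_enc: (d_model, d_dict) with d_dict / d_model the expansion factor.
    Keeps only the k largest ReLU pre-activations per row (assumes
    k < d_dict; ties can let a few extra survive)."""
    pre = np.maximum(x @ W_enc + b_enc, 0.0)                 # ReLU pre-activations
    thresh = np.partition(pre, -k, axis=1)[:, -k][:, None]   # k-th largest per row
    return np.where(pre >= thresh, pre, 0.0)
```

With `d_model=64`, an 8× run uses `d_dict=512` and the proposed 16× run `d_dict=1024`; per-cell sparsity stays at ≤ 32 active features either way.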
## Medium-term — beyond PBMC3K
- Multi-tissue atlas. PBMC is clean but homogeneous. Train SAEs on a larger, more heterogeneous atlas to see whether features generalise, or whether we get tissue-specific dictionaries that require per-tissue SAEs.
- Multi-layer circuits. Right now everything happens on layer 12. A proper mechanistic story needs to trace features across layers: what does layer-9 activity feed into layer-12 features, and how is layer-15 driving the head? (Think transformer-circuits style work.)
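A cheap first pass at the cross-layer question is purely correlational: for each layer-12 feature, find the layer-9 features whose activations track it across cells. This is only a screen for candidate links — a real circuit claim needs causal (patching-style) evidence — and the function below is a sketch with made-up names:

```python
import numpy as np

def cross_layer_links(acts_lo, acts_hi, top=3):
    """Screen for cross-layer feature links: correlate each higher-layer
    feature with every lower-layer feature across cells and return, per
    high-layer feature, the indices of the `top` most correlated
    low-layer features. Shapes: (n_cells, n_features) per layer."""
    lo = (acts_lo - acts_lo.mean(0)) / (acts_lo.std(0) + 1e-8)
    hi = (acts_hi - acts_hi.mean(0)) / (acts_hi.std(0) + 1e-8)
    corr = (lo.T @ hi) / lo.shape[0]                  # (f_lo, f_hi) correlations
    return np.argsort(-np.abs(corr), axis=0)[:top].T  # (f_hi, top) candidate feeds
```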
## Longer-term — interpretable perturbation prediction
The end goal, and the concrete template from the paper: MLP adapters on perturbation datasets. Train a small adapter that maps a target cell state (e.g., an observed post-perturbation transcriptome from Perturb-seq) to a sparse feature-steering vector \(\boldsymbol{\alpha}\) that, when injected into AIDO.Cell, reconstructs that state. If it works, you get interpretable perturbation prediction: each prediction comes with the feature recipe that produced it, and the recipe is inspectable as GO programs.
Given a desired phenotypic change (e.g. “make this cell more cytotoxic”), the adapter gives you a sparse feature recipe; the features’ GO annotations tell you what biology was recruited; and the recipe can in principle be back-translated to gene-level perturbations. Two things have to hold for this to work:
- Feature-level steering must give coherent expression outputs — this is what post 5 is investigating.
- The feature dictionary must contain causally useful axes. The regulatory-logic probe suggests the scFM encodes states, not causal regulation — so the “perturbation recipe” is likely at the level of programs (antiviral, CD8-effector) rather than individual TFs. That’s still useful, and arguably the more tractable target.
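The adapter described above is small enough to sketch end to end. The forward pass below maps a target expression state to a sparse, nonnegative steering vector \(\boldsymbol{\alpha}\); layer sizes, the top-k sparsification, and all names are assumptions for illustration — training would backpropagate a reconstruction loss through the frozen scFM, which is omitted here.

```python
import numpy as np

def adapter_forward(target_state, params, k=8):
    """Hypothetical steering adapter: maps a target cell state (e.g. a
    post-perturbation expression profile) to a sparse feature-steering
    vector alpha. One hidden layer plus top-k sparsification; assumes
    k < d_dict. All shapes and names are illustrative."""
    W1, b1, W2, b2 = params
    h = np.maximum(target_state @ W1 + b1, 0.0)            # (n, hidden)
    alpha = np.maximum(h @ W2 + b2, 0.0)                   # (n, d_dict), nonnegative
    # Keep only the k strongest features per cell -> the sparse "recipe".
    thresh = np.partition(alpha, -k, axis=1)[:, -k][:, None]
    return np.where(alpha >= thresh, alpha, 0.0)
```

The sparsity constraint is what makes the output a readable recipe: k active features, each with a GO-program interpretation, rather than a dense latent edit.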
## Open questions
- Does feature identity survive across SAE runs? Same data, different seeds — do we recover the same programs, or just different bases for the same subspace? (Planned: dictionary-level similarity across retrains.)
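The planned dictionary-level similarity check could start as simply as this: normalize the two decoder matrices and count features with a high-cosine partner in the other run. This max-match version over-counts when several features map to one partner — a Hungarian matching would be stricter — and the threshold is arbitrary:

```python
import numpy as np

def dictionary_overlap(D1, D2, sim_threshold=0.9):
    """Crude similarity between two SAE decoder dictionaries
    (d_dict, d_model) trained with different seeds: for each feature in
    D1, find its best cosine match in D2 and count matches above a
    threshold. Returns (fraction matched, per-feature best cosines)."""
    n1 = D1 / np.linalg.norm(D1, axis=1, keepdims=True)
    n2 = D2 / np.linalg.norm(D2, axis=1, keepdims=True)
    best = (n1 @ n2.T).max(axis=1)          # best cosine match per D1 feature
    return float((best > sim_threshold).mean()), best
```

A high matched fraction would support "same programs"; a low fraction with matched *subspaces* (checked separately, e.g. via principal angles) would support "different bases for the same subspace".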
- What’s the right granularity for steering? Single feature → state. Sparse combination → cell type. But is there a middle tier for sub-programs, or is it continuous?
- Can we validate steering with a real perturbation dataset? e.g. CRISPR-perturb-seq: predict DEGs from a gene knockdown by finding the feature recipe that best matches the post-knockdown expression, and check whether the recipe’s constituent features overlap with the knocked-down gene’s known programs.
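The final overlap check in that validation is a plain set comparison. A minimal sketch, assuming feature→GO-term annotations are available as sets (the Jaccard score and all input names are illustrative choices, not a fixed protocol):

```python
def recipe_program_overlap(recipe_features, feature_annotations, gene_programs):
    """Jaccard overlap between a steering recipe's GO terms and the
    knocked-down gene's known programs. feature_annotations maps
    feature id -> set of GO terms; gene_programs is the gene's term set."""
    recipe_terms = set().union(
        *(feature_annotations.get(f, set()) for f in recipe_features)
    )
    union = recipe_terms | gene_programs
    if not union:
        return 0.0
    return len(recipe_terms & gene_programs) / len(union)
```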