[Abstract & introduction] Three-line summary + problem statement
Three-line summary
- ① Fatal limitation of prior work: GP- or RL-based symbolic regression must restart search from scratch on every new dataset, barely reusing learned “formula grammar.” It is like reinventing the recipe every morning.
- ② Limits of classical tools: Tree boosters and LSTMs predict well but remain black boxes, while fully manual factor design cannot scale to the enormous search space.
- ③ Core idea: AlphaFormer pre-trains a Transformer on diverse synthetic price paths, then, given real market data, instantly generates alpha formulas in reverse Polish notation (RPN): a chef who practiced in many fake kitchens before cooking in a new one.
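To make the "RPN-style alpha formula" idea concrete, here is a minimal sketch of how a postfix token sequence could be evaluated against price data. The token names (`close`, `delay_1`, the operator set) and the evaluator itself are illustrative assumptions, not AlphaFormer's actual vocabulary or implementation.

```python
import numpy as np

def eval_rpn(tokens, features):
    """Evaluate a postfix (RPN) token list against named feature arrays.

    Hypothetical sketch: operator names and the lag-token convention
    (e.g. 'delay_1') are illustrative, not the paper's vocabulary.
    """
    ops = {
        "add": lambda a, b: a + b,
        "sub": lambda a, b: a - b,
        "mul": lambda a, b: a * b,
        "div": lambda a, b: a / np.where(b == 0, np.nan, b),
    }
    stack = []
    for tok in tokens:
        if tok in ops:                      # binary operator: pop two operands
            b, a = stack.pop(), stack.pop()
            stack.append(ops[tok](a, b))
        elif tok.startswith("delay_"):      # unary lag operator, e.g. delay_1
            k = int(tok.split("_")[1])
            x = stack.pop()
            stack.append(np.concatenate([np.full(k, np.nan), x[:-k]]))
        else:                               # operand: feature name or constant
            stack.append(features[tok] if tok in features else float(tok))
    assert len(stack) == 1, "malformed RPN expression"
    return stack[0]

close = np.array([10.0, 10.5, 10.2, 10.8, 11.0])
# (close - delay_1(close)) / delay_1(close): a one-day-return factor
alpha = eval_rpn(
    ["close", "close", "delay_1", "sub", "close", "delay_1", "div"],
    {"close": close},
)
```

Because the formula is just a token sequence, a Transformer decoder can emit it autoregressively, and the resulting factor stays human-readable.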
Analogy: recipe-randomizing robot vs. master chef with grammar in muscle memory
Legacy symbolic search is a robot that re-samples spice ratios from scratch whenever the “kitchen” (market) changes. AlphaFormer pre-trains on synthetic kitchens, learns composition rules, and then, on seeing real ingredients (market data), plates a formula (alpha factor) on the spot: interpretable without sacrificing predictive power.
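The summary does not specify how the "synthetic kitchens" are generated; a minimal sketch of one common choice for synthetic price paths, geometric Brownian motion, is shown below. The function name and parameters are assumptions for illustration.

```python
import numpy as np

def synthetic_paths(n_paths, n_steps, mu=0.0, sigma=0.02, s0=100.0, seed=0):
    """Generate geometric-Brownian-motion price paths as stand-in
    pre-training data (an assumed choice, not the paper's recipe)."""
    rng = np.random.default_rng(seed)
    # Log-returns drawn i.i.d. Gaussian; cumulative sum gives log-prices.
    log_ret = rng.normal(mu - 0.5 * sigma**2, sigma, size=(n_paths, n_steps))
    return s0 * np.exp(np.cumsum(log_ret, axis=1))

paths = synthetic_paths(n_paths=8, n_steps=250)  # 8 one-year daily paths
```

Varying `mu`, `sigma`, or the return distribution across samples is what makes the synthetic corpus "diverse", so the model learns formula grammar rather than one market's quirks.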