Autoregressive models are just the probability chain rule plus a conditional model. Here’s the mental model, the math, and what training is really doing.