This paper proposes a spatial-temporal autoregressive model for the mortality surface, in which the mortality rate at each age depends on its own past values (temporality) and on those of neighbouring ages (spatiality). The mortality dynamics are formulated as a large, first-order vector autoregressive model that encompasses standard factor models such as the Lee and Carter (1992) model. Sparsity and smoothness constraints are then introduced, based on the idea that the closer two ages are, the more important the dependence between their mortality rates. Our model has several novel features. First, it ensures that mortality rates at different ages do not diverge in the long run. Second, it provides a natural explanation of the so-called cohort effect without identifiability difficulties. Third, it extends easily and coherently to the multiple-population case. Finally, the model comes with a closed-form, non-parametric estimation method, penalized least squares, which ensures spatial smoothness of the age-dependent parameters. Using US and UK mortality data, we find that our model produces reasonable projected mortality profiles in the long run, as well as satisfactory short-run out-of-sample forecast performance.
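To make the estimation idea concrete, the following minimal Python sketch fits a sparse first-order vector autoregression by penalized least squares. It is an illustration under stated assumptions, not the specification used in the paper: the band structure (each age loads only on its own lag and on the lag of the next-younger age, which belongs to the same cohort), the second-difference roughness penalty along the age axis, and all names (`fit_banded_var1`, `simulate`, `lam`) are hypothetical choices made for this example.

```python
import numpy as np


def second_difference_penalty(n):
    """Return D'D, where D takes second differences along the age axis."""
    D = np.diff(np.eye(n), n=2, axis=0)  # shape (n - 2, n)
    return D.T @ D


def fit_banded_var1(Y, lam):
    """Penalized least squares for an assumed banded VAR(1).

    Assumed model: y[t, x] = a[x] + b[x] * y[t-1, x] + c[x] * y[t-1, x-1] + noise.
    Y has shape (T, N): rows are years, columns are ages.
    lam > 0 weights the roughness of a, b and c across ages.
    """
    T, N = Y.shape
    prev, curr = Y[:-1], Y[1:]
    # Lagged value of the next-younger age; the youngest age has no neighbour.
    nb = np.zeros_like(prev)
    nb[:, 1:] = prev[:, :-1]

    # Design matrix with age-specific columns for a, b and c.
    n_obs = (T - 1) * N
    rows = np.arange(n_obs)
    ages = np.tile(np.arange(N), T - 1)
    X = np.zeros((n_obs, 3 * N))
    X[rows, ages] = 1.0
    X[rows, N + ages] = prev.ravel()
    X[rows, 2 * N + ages] = nb.ravel()
    y = curr.ravel()

    # Closed-form solution of the penalized normal equations.
    P = np.kron(np.eye(3), second_difference_penalty(N))
    theta = np.linalg.solve(X.T @ X + lam * P, X.T @ y)
    return theta[:N], theta[N:2 * N], theta[2 * N:]


def simulate(a, b, c, T, sigma, seed=0):
    """Simulate the assumed banded VAR(1), starting from zero."""
    rng = np.random.default_rng(seed)
    N = len(a)
    Y = np.zeros((T, N))
    for t in range(1, T):
        nb = np.concatenate(([0.0], Y[t - 1, :-1]))
        Y[t] = a + b * Y[t - 1] + c * nb + sigma * rng.standard_normal(N)
    return Y


if __name__ == "__main__":
    N = 20
    Y = simulate(np.zeros(N), np.full(N, 0.5), np.full(N, 0.3), T=500, sigma=1.0)
    a_hat, b_hat, c_hat = fit_banded_var1(Y, lam=1e4)
    print(np.round(b_hat, 2))
    print(np.round(c_hat, 2))
```

In this sketch the coefficient for the youngest age's neighbour is not informed by the data and is determined only through the smoothness penalty, so `lam` must be strictly positive for the linear system to be solvable. Larger values of `lam` pull the age profiles of the coefficients towards straight lines, while values near zero approach age-by-age ordinary least squares.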