SAMformer
model paper
Lightweight transformer for multivariate long-term time series forecasting using sharpness-aware minimization and channel-wise attention. Identifies attention as responsible for poor generalization in forecasting transformers and addresses it via SAM optimization. Surpasses TSMixer by 14.33% on average with 4x fewer parameters. Presented as an oral at ICML 2024 from Huawei Paris Noah's Ark Lab.
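
For readers unfamiliar with sharpness-aware minimization (SAM), the sketch below illustrates the generic two-pass update on a toy quadratic loss in plain Python. It is not the SAMformer implementation: the names `sam_step` and `toy_grad`, the learning rate `lr`, and the neighborhood radius `rho` are all assumptions chosen for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


def sam_step(
    w: Vector,
    grad_fn: Callable[[Vector], Vector],
    lr: float = 0.1,   # hypothetical learning rate
    rho: float = 0.5,  # hypothetical neighborhood radius
) -> Vector:
    """One generic SAM update: ascend to a nearby worst-case point, then descend."""
    # First pass: gradient at the current weights.
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))

    # Move to the approximate worst-case point within an L2 ball of radius rho.
    scale = rho / (norm + 1e-12)
    w_adv = [wi + scale * gi for wi, gi in zip(w, g)]

    # Second pass: gradient at the perturbed weights.
    g_adv = grad_fn(w_adv)

    # Update the ORIGINAL weights using the perturbed gradient.
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]


def toy_grad(w: Vector) -> Vector:
    """Gradient of the toy loss 0.5 * (2.0 * w0**2 + 1.0 * w1**2)."""
    curvature = [2.0, 1.0]
    return [c * wi for c, wi in zip(curvature, w)]


w_next = sam_step([1.0, 0.0], toy_grad, lr=0.1, rho=0.5)
print(w_next)
```

In this toy case the SAM step moves farther along the high-curvature direction than a plain gradient step with the same learning rate would, because the gradient is evaluated at the perturbed point rather than at the current weights.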