SageFormer: Series-Aware Graph-Enhanced Transformers for Multivariate Time Series Forecasting

  • Zhang, Zhenwei
  • Wang, Xin
  • Gu, Yuantao

SageFormer is a model specialized for multivariate time series forecasting that captures dependencies between series using learned graph structures. Key features of SageFormer include:

  • Series-Aware and Graph-Enhanced Architecture: Extends Transformer models with series-aware global tokens and graph aggregation to capture both intra-series temporal patterns and inter-series dependencies while mitigating redundancy.
  • Multivariate Time Series Forecasting: Efficiently forecasts multivariate time series, tackling challenges like representing diverse temporal patterns across series and reducing redundant information.


SageFormer aims to resolve several challenges:

  • Modeling Dependencies: Captures and models dependencies between series using graph structures.
  • Temporal Pattern Handling: Enhances Transformer models to effectively address temporal patterns and redundancy.
  • Cross-Domain Applications: Seeks to provide robust solutions for time series analysis and forecasting across various domains.


SageFormer makes two main contributions:

  1. Effective Representation of Temporal Patterns:
    • Series-Aware Approach: Augments the series-independent framework by introducing global tokens before input tokens.
    • Global Tokens: Gathers global information for each variable through self-attention, enabling series interaction via graph aggregation.
    • Learning Temporal Patterns and Dependencies: SageFormer learns individual series’ temporal patterns and focuses on dependencies between series, enhancing diversity and overcoming series-independent limitations.
  2. Mitigating Redundancy Across Series:
    • Sparsely Connected Graph Structures: Proposed to reduce the impact of redundant information in unrelated series.
    • Low-Rank Datasets: Designed to evaluate model effectiveness with sparse data, showing stable performance as series dimensions increase.
    • Effective Utilization: Exploits low-rank properties effectively, in contrast with series-mixing methods, whose predictions deteriorate as the number of series grows.
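To make the low-rank evaluation setting concrete, here is a minimal numpy sketch of how such a synthetic dataset can be built: a small number of latent driver series are linearly mixed into many observed series, so the observed matrix has rank far below its dimension. The specific dimensions, waveforms, and noise level are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, r = 200, 32, 4  # time steps, observed series count, latent rank (r << D)

# r latent driver series: sinusoids with different periods, plus noise
t = np.arange(T)
latents = np.stack([np.sin(2 * np.pi * t / (20 + 10 * k)) for k in range(r)], axis=1)
latents += 0.05 * rng.standard_normal((T, r))

# Mix the r latents into D observed series; the result has rank at most r
mixing = rng.standard_normal((r, D))
series = latents @ mixing  # shape (T, D)

# Singular values beyond the r-th are (numerically) zero
s = np.linalg.svd(series, compute_uv=False)
```

Growing `D` while holding `r` fixed lets one check whether a model's error stays stable as redundant series are added, which is the behavior contrasted above.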



SageFormer is designed to augment the capability of Transformer-based models in addressing inter-series dependencies. It comprises three key components:

  • Series-aware global tokens
  • Graph structure learning
  • Iterative message passing

Series-aware Global Tokens

  • Inspiration: Derived from the class tokens used in language models (e.g., BERT's [CLS] token) and the Vision Transformer (ViT).
  • Implementation:
    • Prepend learnable tokens to each series.
    • Utilize these tokens to capture inter-series dependencies.
    • Reshape input multivariate time series, adding learnable embeddings (global tokens) before patched sequences.
  • Interaction: Facilitate interaction across series and enhance positional information through 1D position embeddings.
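The token layout above can be sketched in a few lines of numpy: learnable global tokens are prepended to each series' patched embeddings, and 1D position embeddings are added over the token axis. The tensor shapes, the number of global tokens, and the shared initialization are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, P = 7, 12, 16   # series count, patches per series, embedding dim
G = 2                 # global tokens per series (hyperparameter)

patch_tokens = rng.standard_normal((D, N, P))   # patched, embedded series
global_tokens = rng.standard_normal((1, G, P))  # learnable; shared across series here
globals_per_series = np.broadcast_to(global_tokens, (D, G, P))

# Prepend global tokens in front of the patch tokens of every series
tokens = np.concatenate([globals_per_series, patch_tokens], axis=1)  # (D, G+N, P)

# 1D position embedding over the (G + N)-long token axis
pos_emb = rng.standard_normal((1, G + N, P))  # learnable in a real model
tokens = tokens + pos_emb
```

Within each series, self-attention lets the global tokens gather series-level information, which the graph step later exchanges across series.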

Graph structure learning

  • End-to-End Learning: Learn adjacency matrix capturing implicit relationships across series.
  • Equations: Utilize equations for node embeddings and adjacency matrix transformation.
  • Unidirectional Dependencies: Follow MTGNN approach to learn unidirectional dependencies, yielding a sparse adjacency matrix.
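A compact numpy sketch of the MTGNN-style construction referenced above: two node-embedding tables produce an antisymmetric score, a ReLU keeps only one direction per pair of series, and a top-k step per row sparsifies the result. The embedding dimension, saturation rate `alpha`, and `k` are illustrative hyperparameters.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d = 7, 8          # number of series, node-embedding dim
alpha, k = 3.0, 3    # tanh saturation rate, neighbors kept per node

# Two learnable node-embedding tables (randomly initialized here)
M1 = rng.standard_normal((D, d))
M2 = rng.standard_normal((D, d))

# Antisymmetric score: ReLU zeroes one of the two directions for every pair,
# yielding unidirectional dependencies
A = np.maximum(0.0, np.tanh(alpha * (M1 @ M2.T - M2 @ M1.T)))

# Sparsify: keep only the top-k entries in each row, zero out the rest
drop_idx = np.argsort(A, axis=1)[:, :-k]  # indices of the D-k smallest scores
np.put_along_axis(A, drop_idx, 0.0, axis=1)
```

Because the pre-ReLU score matrix is antisymmetric, `A[i, j] > 0` implies `A[j, i] == 0`, which is what makes the learned dependencies unidirectional.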

Iterative message passing

  • Token Processing: Embedded tokens are processed by SageFormer encoder layers, alternating temporal encoding and graph aggregation.
  • Graph Aggregation: Fuses each series’ information with neighbors’, enhancing series with related patterns.
  • Temporal Encoding: Graph-enhanced embeddings processed by Transformer components, disseminating GNN-aggregated information via self-attention, enhancing model expressiveness.
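One layer's graph-aggregation step can be sketched as follows: each series' global tokens absorb information from the neighbors selected by the learned adjacency, with a residual connection preserving the series' own representation. The random adjacency, propagation depth, and residual form here are simplifying assumptions standing in for the learned components.

```python
import numpy as np

rng = np.random.default_rng(0)
D, G, P = 7, 2, 16   # series, global tokens per series, embedding dim
depth = 2            # number of propagation hops in the aggregation

# Stand-in for the learned sparse adjacency; row-normalize for stable mixing
A = np.maximum(0.0, rng.standard_normal((D, D)))
A /= A.sum(axis=1, keepdims=True) + 1e-8

# Global tokens previously gathered by per-series self-attention
g = rng.standard_normal((D, G, P))

# Multi-hop aggregation with a residual: series i's tokens are fused with
# its graph neighbors' tokens at each hop
h = g
for _ in range(depth):
    h = h + np.einsum("ij,jgp->igp", A, h)
g_enhanced = h  # placed back before the patch tokens for the next encoder layer
```

The subsequent Transformer sub-layer then lets the patch tokens attend to these graph-enhanced global tokens, spreading the aggregated cross-series information along each series.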


  • SageFormer: Novel model for long-term Multivariate Time Series (MTS) forecasting, amalgamating GNN with Transformer structures.
  • Performance: Demonstrates strong versatility and state-of-the-art performance on both real-world and synthetic datasets.
  • Future Research:
    • The captured dependencies do not strictly represent causality, so they may be unreliable in practice under time series non-stationarity; the current design prioritizes long-term forecasting accuracy over graph-structure interpretability.
    • Improving the GNN component toward causality learning, and applying the graph-enhanced approach to non-Transformer models.