SageFormer: Series-Aware Graph-Enhanced Transformers for Multivariate Time Series Forecasting
- Zhang, Zhenwei
- Wang, Xin
- Gu, Yuantao
SageFormer is a model for multivariate time series forecasting that captures dependencies between series using graph structures. Key features of SageFormer include:
- Series-Aware and Graph-Enhanced Architecture: Extends Transformer models with graph structures so they capture both intra-series temporal patterns and inter-series dependencies while reducing redundancy.
- Multivariate Time Series Forecasting: Efficiently forecasts multivariate time series, tackling challenges like representing diverse temporal patterns across series and reducing redundant information.
Introduction
SageFormer aims to resolve several challenges:
- Modeling Dependencies: Captures and models dependencies between series using graph structures.
- Temporal Pattern Handling: Enhances Transformer models to effectively address temporal patterns and redundancy.
- Cross-Domain Applications: Seeks to provide robust solutions for time series analysis and forecasting across various domains.
Challenges
- Effective Representation of Temporal Patterns:
  - Series-Aware Approach: Augments the series-independent framework by introducing global tokens before the input tokens of each series.
  - Global Tokens: Gather global information for each series through self-attention and enable series interaction via graph aggregation.
  - Learning Temporal Patterns and Dependencies: SageFormer learns each series' temporal patterns while focusing on dependencies between series, enhancing representation diversity and overcoming the limitations of series-independent models.
- Mitigating Redundancy Across Series:
  - Sparsely Connected Graph Structures: Proposed to reduce the impact of redundant information from unrelated series.
  - Low-Rank Datasets: Synthetic datasets designed to evaluate how models cope with redundant, low-rank data; SageFormer maintains stable performance as the number of series grows (see the sketch after this list).
  - Effective Utilization: Exploits the low-rank property effectively, in contrast to series-mixing methods, whose predictions deteriorate as the number of series grows.
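To make the low-rank setting concrete, below is a minimal NumPy sketch (not from the paper) that builds a redundant multivariate series: many observed channels are linear mixtures of a few shared latent signals, so the data matrix is approximately low-rank. The function name and all parameters are illustrative assumptions.

```python
import numpy as np

def make_low_rank_series(n_series=64, length=2000, rank=4, noise=0.05, seed=0):
    """Hypothetical generator: n_series channels that are linear mixtures of
    `rank` shared latent sinusoids, giving an (approximately) low-rank dataset."""
    rng = np.random.default_rng(seed)
    t = np.arange(length)
    # Latent factors: sinusoids with random periods and phases.
    periods = rng.uniform(20, 200, size=rank)
    phases = rng.uniform(0, 2 * np.pi, size=rank)
    latents = np.stack(
        [np.sin(2 * np.pi * t / p + ph) for p, ph in zip(periods, phases)],
        axis=1,
    )  # (length, rank)
    # A random mixing matrix maps the few latent factors to many observed series,
    # so most series carry redundant information about the same latent signals.
    mixing = rng.normal(size=(rank, n_series))
    return latents @ mixing + noise * rng.normal(size=(length, n_series))

data = make_low_rank_series()
print(data.shape)  # (2000, 64); the noiseless part has rank 4
```

In such data, adding more series adds little new information, so a model that exploits the low-rank structure should keep its error roughly flat as the number of series grows.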
Methodology
Overview
SageFormer is designed to augment the capability of Transformer-based models in addressing inter-series dependencies. It has three key components:
- Series-aware global tokens
- Graph structure learning
- Iterative message passing
Series-aware Global Tokens
- Inspiration: Derived from class tokens in language models and the Vision Transformer.
- Implementation:
  - Prepend learnable tokens (global tokens) to each series.
  - Use these tokens to capture inter-series dependencies.
  - Reshape the input multivariate time series and add the learnable global-token embeddings before the patched sequences (see the sketch below).
- Interaction: The global tokens facilitate interaction across series, and 1D position embeddings provide positional information.
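A minimal PyTorch sketch of this embedding step, assuming non-overlapping patches of length `patch_len`, hidden size `d_model`, and `n_global` global tokens per series; the class name, shapes, and defaults are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SeriesEmbedding(nn.Module):
    """Patch each series, project patches to d_model, and prepend learnable
    global tokens that will later carry inter-series information."""
    def __init__(self, patch_len=16, d_model=128, n_global=1, max_patches=64):
        super().__init__()
        self.patch_len = patch_len
        self.proj = nn.Linear(patch_len, d_model)
        self.global_tokens = nn.Parameter(torch.randn(n_global, d_model))
        # 1D position embeddings cover the global tokens plus the patch tokens.
        self.pos_emb = nn.Parameter(torch.randn(n_global + max_patches, d_model))

    def forward(self, x):
        # x: (batch, n_series, seq_len) -> non-overlapping patches per series
        b, c, _ = x.shape
        patches = x.unfold(-1, self.patch_len, self.patch_len)   # (b, c, n_patch, patch_len)
        tokens = self.proj(patches)                              # (b, c, n_patch, d_model)
        g = self.global_tokens.expand(b, c, -1, -1)              # (b, c, n_global, d_model)
        tokens = torch.cat([g, tokens], dim=2)                   # prepend global tokens
        return tokens + self.pos_emb[: tokens.size(2)]           # add positional information
```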
Graph structure learning
- End-to-End Learning: Learn adjacency matrix capturing implicit relationships across series.
- Node Embeddings: Learnable node embeddings are transformed into an adjacency matrix that encodes the relationships between series.
- Unidirectional Dependencies: Follows the MTGNN approach to learn unidirectional dependencies, yielding a sparse adjacency matrix (see the sketch below).
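Below is a minimal PyTorch sketch of MTGNN-style graph structure learning as referenced above: an anti-symmetric score keeps dependencies unidirectional, and top-k selection yields a sparse adjacency matrix. The class name, embedding size, and `k` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GraphLearner(nn.Module):
    """Learn a sparse, unidirectional adjacency matrix over series end-to-end,
    in the spirit of MTGNN."""
    def __init__(self, n_series, emb_dim=32, k=8, alpha=3.0):
        super().__init__()
        self.emb1 = nn.Parameter(torch.randn(n_series, emb_dim))
        self.emb2 = nn.Parameter(torch.randn(n_series, emb_dim))
        self.lin1 = nn.Linear(emb_dim, emb_dim)
        self.lin2 = nn.Linear(emb_dim, emb_dim)
        self.k, self.alpha = k, alpha

    def forward(self):
        m1 = torch.tanh(self.alpha * self.lin1(self.emb1))
        m2 = torch.tanh(self.alpha * self.lin2(self.emb2))
        # Anti-symmetric score: if adj[i, j] is large, adj[j, i] is suppressed,
        # which encodes unidirectional dependencies.
        scores = torch.relu(torch.tanh(self.alpha * (m1 @ m2.T - m2 @ m1.T)))
        # Keep only the k strongest neighbours per node -> sparse adjacency.
        adj = torch.zeros_like(scores)
        topk = scores.topk(min(self.k, scores.size(-1)), dim=-1)
        adj.scatter_(-1, topk.indices, topk.values)
        return adj  # (n_series, n_series)
```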
Iterative message passing
- Token Processing: The embedded tokens are processed by stacked SageFormer encoder layers, each combining graph aggregation with temporal encoding.
- Graph Aggregation: Fuses each series' global-token information with that of its neighbors on the learned graph, enriching series that share related patterns.
- Temporal Encoding: The graph-enhanced embeddings are then processed by standard Transformer components; self-attention disseminates the GNN-aggregated information to the remaining tokens of each series, enhancing model expressiveness (see the sketch below).
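A minimal PyTorch sketch of one such encoder layer, assuming the token layout from the embedding sketch above (global tokens first) and a simple degree-normalized aggregation over the learned graph; it is an illustrative approximation, not the paper's exact layer.

```python
import torch
import torch.nn as nn

class SageEncoderLayer(nn.Module):
    """One encoder layer: graph aggregation over global tokens (inter-series),
    then self-attention within each series (temporal encoding) so the
    aggregated information spreads to the remaining tokens."""
    def __init__(self, d_model=128, n_heads=8, n_global=1):
        super().__init__()
        self.n_global = n_global
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, tokens, adj):
        # tokens: (batch, n_series, n_tokens, d_model); adj: (n_series, n_series)
        b, c, n, d = tokens.shape
        g = tokens[:, :, : self.n_global]
        # Graph aggregation: each series' global tokens fuse neighbours' information.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        g = g + torch.einsum("ij,bjnd->bind", adj / deg, g)
        tokens = torch.cat([g, tokens[:, :, self.n_global:]], dim=2)
        # Temporal encoding: self-attention within each series disseminates
        # the graph-aggregated information to all tokens of that series.
        x = tokens.reshape(b * c, n, d)
        x = self.norm1(x + self.attn(x, x, x, need_weights=False)[0])
        x = self.norm2(x + self.ffn(x))
        return x.reshape(b, c, n, d)
```

Stacking several such layers realizes the iterative message passing: with each layer, information propagates further across the learned graph while self-attention refines the temporal representation of each series.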
Conclusion
- SageFormer: A novel model for long-term multivariate time series (MTS) forecasting that combines graph neural networks with Transformer structures.
- Performance: Shows strong versatility and achieves state-of-the-art results on both real-world and synthetic datasets.
- Future Research:
  - The captured dependencies do not strictly represent causality and may be unreliable in practice because of time series non-stationarity; the current work prioritizes long-term forecasting accuracy over graph-structure interpretability.
  - Improving the GNN component to learn causal relations, and applying the approach to non-Transformer models.