Understanding how probabilistic forecasting is combined with the Transformer architecture of Vaswani2017 in GluonTS #2817
Unanswered
deltaproximity asked this question in Q&A
Replies: 0
For a university project I used the Transformer implementation from the GluonTS package. The results were quite good, but I cannot find a clear explanation or documentation of how the original vanilla Transformer architecture of Vaswani2017 was adapted for probabilistic time series forecasting in GluonTS.

The GluonTS paper from 2019 (GluonTS: Probabilistic Time Series Models in Python) briefly mentions the Transformer implementation in the section "Discriminative Models", where it says that "... the prediction horizon τ has to be fixed beforehand in sequence-to-sequence models, and a complete retraining is needed if the forecast is required beyond τ steps", unlike auto-regressive models. What puzzles me is that the original Vaswani2017 model is itself auto-regressive when it generates sequences of multiple symbols. Moreover, the first published application of Vaswani's Transformer to forecasting (Deep Transformer Models for Time Series Forecasting: The Influenza Prevalence Case) uses the Transformer for one-step-ahead forecasting, so one would need an auto-regressive rollout to produce a multi-horizon forecast.

I did find one paper on probabilistic forecasting that touches on the Transformer architecture, Multivariate Probabilistic Time Series Forecasting via Conditioned Normalizing Flows by Kashif Rasul et al., dated Jan 2021. Although that paper appeared two years after the GluonTS paper, I have been wondering whether it might describe the way the probabilistic Transformer is actually implemented in the GluonTS package, since the package has kept evolving. The documentation in GluonTS only says that the Transformer implementation is similar to Vaswani2017.

I would appreciate it if anyone could clarify this. For reference, I include two sketches below of what I mean.
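First, this is roughly how I used the model in my project. It is a minimal sketch: the dataset choice, epoch count, and other hyperparameters are placeholders, and the import path differs between GluonTS versions (older releases expose the estimator under gluonts.model.transformer instead of gluonts.mx.model.transformer).

```python
# Minimal sketch of training the GluonTS Transformer (placeholder
# hyperparameters; import paths vary across GluonTS versions).
from gluonts.dataset.repository.datasets import get_dataset
from gluonts.mx.model.transformer import TransformerEstimator
from gluonts.mx.trainer import Trainer

dataset = get_dataset("electricity")  # any built-in dataset works here

estimator = TransformerEstimator(
    freq=dataset.metadata.freq,
    prediction_length=dataset.metadata.prediction_length,
    trainer=Trainer(epochs=10),
)

predictor = estimator.train(dataset.train)

# Each returned Forecast is probabilistic: it holds sample paths over the
# whole prediction horizon, from which means and quantiles are derived.
forecasts = list(predictor.predict(dataset.test))
print(forecasts[0].mean)
print(forecasts[0].quantile(0.9))
```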
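Second, this toy sketch illustrates the auto-regressive rollout I mean: a model that only emits a one-step-ahead predictive distribution is rolled forward by sampling each step and feeding the draw back in as input, so predictive uncertainty compounds over the horizon. The one_step_params "model" here is a made-up AR(1)-style stand-in, not GluonTS internals.

```python
# Toy auto-regressive rollout of a one-step-ahead probabilistic model.
import numpy as np

rng = np.random.default_rng(0)

def one_step_params(history):
    """Hypothetical one-step model: (mean, scale) of p(y_{t+1} | y_{1:t})."""
    return 0.9 * history[-1], 0.5

def sample_path(context, horizon):
    """Draw one multi-horizon trajectory by repeated one-step sampling."""
    history = list(context)
    path = []
    for _ in range(horizon):
        mu, sigma = one_step_params(history)
        y_next = rng.normal(mu, sigma)  # sample the next value ...
        history.append(y_next)          # ... and feed it back as input
        path.append(y_next)
    return np.asarray(path)

context = [1.0, 1.2, 0.8, 1.1]
paths = np.stack([sample_path(context, horizon=24) for _ in range(100)])
print(paths.mean(axis=0))               # point forecast per step
print(np.quantile(paths, 0.9, axis=0))  # empirical 90% quantile per step
```

Averaging many sampled paths gives point forecasts and empirical quantiles give prediction intervals, which is the kind of probabilistic output GluonTS returns; hence my question about how its Transformer produces it.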