Navigating Real-World Challenges in Time Series Forecasting

September 27, 2023

3 minute read

Tuning the Ear to Understand the Melody of Data

The key to mastering time series forecasting lies in learning the rhythm that drives the data. Benchmark datasets often provide a simplification of this rhythm, showcasing clear seasonal patterns and trends without strong influences from external factors. From air passenger numbers to Boston house prices, typical benchmark datasets in academia are united by simple underlying problem dynamics.

However, when operations move from this controlled environment into the chaos of real-world data, traditional methods can falter, giving rise to a wide array of challenges.

The World of Spikes, Cold Starts, and Intermittent Periods

In the business landscape, data spikes are common. Promotions and sales days can cause drastic deviations from normal demand that easily throw off a prediction model, with costly consequences for business planning. Companies also have to grapple with the so-called “cold start problem”: a newly launched product has no historical data yet, but accurate predictions are still required to steer the business effectively.

Intermittent time series, characterised mainly by a large share of zero values, are common in e-commerce and clothing retail, and they further destabilize common prediction approaches. However, creative application of statistical methods such as Croston's method or ADIDA (Aggregate-Disaggregate Intermittent Demand Approach) can lend stability during such periods.
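To make the idea behind Croston's method concrete, here is a minimal Python sketch. It smooths the size of non-zero demands and the interval between them separately, then divides one by the other to get a demand rate per period. The function name, the default `alpha`, and the initialisation with the first observed demand are assumptions made for illustration; published variants differ in these details, and this is not necessarily how paretos implements it.

```python
def croston_forecast(demand, alpha=0.1):
    """Sketch of Croston's method: forecast the average demand per period.

    demand: sequence of non-negative numbers, many of which may be zero.
    alpha:  smoothing factor in (0, 1]; the default is an illustrative choice.
    """
    size = None       # smoothed size of non-zero demands
    interval = None   # smoothed number of periods between non-zero demands
    periods_since_demand = 1

    for value in demand:
        if value > 0:
            if size is None:
                # Assumption: initialise with the first observed demand.
                size = float(value)
                interval = float(periods_since_demand)
            else:
                # Update both estimates only when a demand occurs.
                size += alpha * (value - size)
                interval += alpha * (periods_since_demand - interval)
            periods_since_demand = 1
        else:
            periods_since_demand += 1

    if size is None:
        return 0.0  # no demand observed at all
    return size / interval


# A demand of 4 units every third period gives a rate of 4/3 per period.
rate = croston_forecast([0, 0, 4, 0, 0, 4, 0, 0, 4])
```

Because the estimates change only in periods with demand, long runs of zeros do not drag the forecast toward zero the way plain exponential smoothing would.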

There is no Prediction Model yet to serve them all!

Real-world datasets can exceed millions of predicted items and include the data complexities described above, so the model approach has to match the problem. No single model covers all of these complexities best on its own. For low-volume time series with little variance and no external drivers, statistical models empirically perform well. For high-volume products whose high variance is driven by external factors, machine learning or deep learning models are considered best practice.
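The rule of thumb above can be sketched as a simple routing function. Everything specific here is hypothetical: the function name, the volume and variance thresholds, and the "compare_both" fallback for cases the rule of thumb does not cover. In practice such cut-offs would be tuned per dataset.

```python
from statistics import mean, pstdev


def choose_model_family(series, has_external_drivers,
                        volume_threshold=100.0, cv_threshold=0.5):
    """Route one time series to a model family (illustrative thresholds)."""
    avg = mean(series)
    # Coefficient of variation: spread relative to the average level.
    cv = pstdev(series) / avg if avg > 0 else 0.0

    high_volume = avg >= volume_threshold
    high_variance = cv >= cv_threshold

    if not high_volume and not high_variance and not has_external_drivers:
        return "statistical"
    if high_volume and high_variance and has_external_drivers:
        return "machine_learning"
    # Mixed cases are not covered by the rule of thumb: evaluate both.
    return "compare_both"


slow_mover = choose_model_family([5, 5, 6, 5, 4, 5], has_external_drivers=False)
promo_item = choose_model_family([100, 400, 150, 900, 120, 700],
                                 has_external_drivers=True)
```

Here `slow_mover` is routed to the statistical family and `promo_item` to machine learning; anything in between is flagged for a side-by-side comparison.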

Our tech stack at paretos lets us build ensembles of these approaches effectively and thereby provide optimal results for a wide range of problem characteristics.

A Guiding Light Amidst the Chaos: Running Baselines

But how do we know whether a prediction model's performance is good, bad, superb, or just mediocre?

The first and most crucial step in time series forecasting is running baselines. They provide a lot of context for the performance of the final prediction model and are essential for steering the machine learning training iterations.
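As a minimal sketch of what running baselines can look like, the Python snippet below implements two common ones, the naive forecast (repeat the last value) and the seasonal naive forecast (repeat the last season), and scores a model against them. The choice of these two baselines, the mean absolute error metric, the skill score, and all numbers are illustrative assumptions, not a description of the paretos pipeline.

```python
def naive_forecast(history, horizon):
    """Repeat the last observed value for every future period."""
    return [history[-1]] * horizon


def seasonal_naive_forecast(history, horizon, season_length):
    """Repeat the most recent full season."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def mean_absolute_error(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def skill_vs_baseline(model_mae, baseline_mae):
    """Positive: model beats the baseline. Zero or negative: it does not."""
    return 1.0 - model_mae / baseline_mae


# Made-up data with a season length of 4 periods.
history = [10, 20, 30, 40, 12, 22, 31, 41]
actual = [11, 21, 33, 42]
model_forecast = [11, 20, 32, 43]  # hypothetical output of a trained model

naive_mae = mean_absolute_error(actual, naive_forecast(history, 4))
seasonal_mae = mean_absolute_error(
    actual, seasonal_naive_forecast(history, 4, season_length=4))
model_mae = mean_absolute_error(actual, model_forecast)
skill = skill_vs_baseline(model_mae, seasonal_mae)
```

On this toy data the seasonal naive baseline is far stronger than the plain naive one, which shows why the choice of baseline matters: a model that only beats the weaker baseline has not learned much beyond the seasonality.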

Analysing model performance by data cluster (for example, clustered by volume, variance, or forecasting error) further helps in honing the predictions. This becomes essential in datasets with more than ~100k predicted items, where the decision maker needs to be guided straight to the most important items to review them quickly and effectively.
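A cluster-based review of this kind could be sketched as follows: items are bucketed by average volume, the error is averaged per bucket, and the worst items in each bucket are surfaced first. The bucket edges, the dictionary layout, and the function names are hypothetical choices for illustration.

```python
from collections import defaultdict


def volume_cluster(avg_volume, edges=(10.0, 100.0)):
    """Assign a volume bucket; the edges are illustrative, not recommended."""
    if avg_volume < edges[0]:
        return "low"
    if avg_volume < edges[1]:
        return "medium"
    return "high"


def summarize_by_cluster(items, top_n=3):
    """Return mean MAE and the worst-forecast items per volume cluster."""
    per_cluster = defaultdict(list)
    for item in items:
        actual, forecast = item["actual"], item["forecast"]
        mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)
        avg_volume = sum(actual) / len(actual)
        per_cluster[volume_cluster(avg_volume)].append((item["item_id"], mae))

    summary = {}
    for cluster, scored in per_cluster.items():
        # Largest error first, so reviewers see the worst items straight away.
        scored.sort(key=lambda pair: pair[1], reverse=True)
        summary[cluster] = {
            "mean_mae": sum(m for _, m in scored) / len(scored),
            "worst_items": [item_id for item_id, _ in scored[:top_n]],
        }
    return summary


# Made-up items for illustration.
items = [
    {"item_id": "A", "actual": [2, 3, 1], "forecast": [2, 2, 2]},
    {"item_id": "B", "actual": [50, 60, 40], "forecast": [55, 50, 45]},
    {"item_id": "C", "actual": [500, 700], "forecast": [450, 800]},
    {"item_id": "D", "actual": [300, 300], "forecast": [310, 290]},
]
summary = summarize_by_cluster(items)
```

The same pattern works for clusters built on variance or forecasting error; only the bucketing function changes.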

In the world of time series forecasting, paretos offers data-informed insights and optimized forecasts, ensuring that your business stays ahead of the curve. Navigate through the complexities of real-world data and turn every challenge into an opportunity. Begin forecasting with paretos today.

paretos

We are the leading AI-based decision intelligence platform for effective, data-driven decision-making processes in companies. No more bad decisions!

Robert Haase

AI Scientist

As a physicist, I always seek well-suited explanations and modelling approaches for complex mechanisms. Actively working with various SMEs and international companies in the field of forecasting and demand planning has enabled me to generate maximal business value with the help of state-of-the-art machine learning approaches.