How To Predict Multiple Time Series With Scikit-Learn (With a Sales Forecasting Example)

You have a lot of time series and want to predict the next step (or steps). What should you do? Train a model for each series? Is there a way to fit one model for all the series together? Which is better?

I have seen many data scientists approach this problem by creating a separate model for each product. Although this is one possible solution, it is unlikely to be the best.

Here I will demonstrate how to train a single model to predict multiple time series at the same time. This technique usually creates powerful models that help teams win machine learning competitions and can be used in your project.

  1. Hi Mario Filho,

    Thanks for providing this solution, but I think there is a flaw in the approach. You are training the model again to predict future weeks,
    whereas you should use your model’s output as input to predict future weeks. Only then will the mean error make sense.

    • Not necessarily. You can retrain your model every week when you get new data (if you get new data every week, of course). If you feed the model's output back in as input, errors can accumulate. If you need predictions for later weeks, you need to change the approach so that it does not rely on last week's inputs and the label is further in the future.

        • I was trying to implement the solution on a similar problem, where I have to predict sales based on other variables. I am stuck on creating features for the test data to predict.

  2. Navon Francis

    Hi Mario,

    I have a question regarding this article:

    When the model is built, you say “Now, to get predictions for each time series, your just need put the product code, week and compute the new features.”

    How are you supposed to implement this? Say I want to make a prediction of product 1 for week 52, or product 40 for week 53.

    Thank you!

    • In this example we get new sales numbers every week and try to predict the next week. So in production we run a retraining/prediction cycle every week, as soon as the new data arrives. If you want predictions further ahead than next week, I recommend changing the approach (inputs and outputs).
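
A minimal sketch of that weekly retraining/prediction cycle is shown below. It is an illustration under assumptions, not the code from the article: the toy data, the column names (`product`, `week`, `sales`), the two lag features, and the helper names `build_rows` and `predict_next_week` are all hypothetical, and a random forest is used only as an example model. The idea is to append placeholder rows for the week to be predicted, so that its features can be computed from sales that are already known.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical toy data: 3 products, weeks 0..51, one row per (product, week).
rng = np.random.default_rng(0)
sales = pd.DataFrame(
    [(p, w, int(rng.poisson(10 + 5 * p))) for p in range(3) for w in range(52)],
    columns=["product", "week", "sales"],
)

FEATURES = ["product", "week", "last_week_sales", "last_week_diff"]


def build_rows(df):
    """Add lag features; the row for week w only uses sales from before week w."""
    df = df.sort_values(["product", "week"]).copy()
    by_product = df.groupby("product")["sales"]
    df["last_week_sales"] = by_product.shift(1)
    df["last_week_diff"] = by_product.shift(1) - by_product.shift(2)
    return df


def predict_next_week(history):
    """Retrain on every known week, then predict the first unseen week."""
    next_week = history["week"].max() + 1
    # Placeholder rows for the week to predict; their sales are still unknown.
    future = pd.DataFrame(
        {"product": history["product"].unique(), "week": next_week, "sales": np.nan}
    )
    rows = build_rows(pd.concat([history, future], ignore_index=True))
    train = rows[rows["week"] < next_week].dropna(subset=FEATURES + ["sales"])
    test = rows[rows["week"] == next_week]
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(train[FEATURES], train["sales"])
    return test[["product", "week"]].assign(
        predicted_sales=model.predict(test[FEATURES])
    )


# One prediction per product for week 52, the week after the last known one.
predictions = predict_next_week(sales)
print(predictions)
```

When the real numbers for week 52 arrive, they would be appended to `sales` and the same function called again for week 53.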

  3. Hey, thanks for the explanation.
    If we want to forecast the next 2 weeks (currently you are doing it for 1 week), how do you prepare the validation set in Python so that the prediction for the current week is used as the previous week's sales for the next week?

  4. Hi Mario, great post

    I had a similar situation, but since the products were bought on a basket basis, there was an extra dimension of information that I could add. Instead of using the product identifier (1, 2, 3, …) as an ordinal variable, I extracted an embedding from the basket information using (word/prod)2vec and skipgrams. Now, instead of an ID, I have a higher-dimensional identifier that keeps similar products close to each other, and the resulting predictions were much better. In another scenario where there was no basket, I extracted this vector space from the user access log. If the user saw products 1, 3, 7, 13, I ran skipgrams on this small series, got the vector space and moved on. So when your problem is basket related, or if you can learn similarities between products from user visit patterns, there is another possibility to improve the solution.

  5. Parth Gadoya

    Hi sir,
    Thanks for an informative case study. I am a beginner in the forecasting domain.

    I am working on a sales forecasting problem in which my goal is to predict sales for each product in each store. I need an accuracy of more than 70% for 85% of store-product combinations. The problem is that I have lots of ZERO actual sales, and that is why Absolute Percentage Error is failing. Could you please suggest a robust way to measure error so that I can understand accuracy on store-product combinations?

    I also came across Mean Absolute Scaled Error (MASE), but I don’t know how to use it with multiple time series and compare it with my success criteria.

    Thank you very much.

  6. Shreyas Sabnis

    Hi Mario,

    Great example!
    I have one question, however. We are treating the product code as a numerical variable. Is that the best way to handle them? Would it not be more appropriate to treat product code as a categorical variable?


  7. Camille Toarmino

    Hi Mario,
    Fantastic post. This really helped me think about how I want to approach a project I’m working on.
    I do have a question. For my project, I am trying to predict when a battery in a specific lock will fail. I have about 4,000 locks that have various characteristics. The locks were not all installed at the same time, so I don’t have the same amount of data for each lock. Could I still use something like random forest if I have multiple time series of different lengths? Do you have any thoughts on a specific algorithm for a problem like this?

    • Mario Filho

      Hi Camille,

      Yes, you can still use RF, no problem. Just make sure that your features can be computed for new locks (that will come with different lengths of data). Your validation split is very important. I would guess validating with different locks between training and test is a good idea.



      • Camille Toarmino

        Thanks for your response! I have this method implemented for my data now. I am wondering if you have any thoughts on how I could use this to predict the battery charge for the next two weeks, rather than just the next week. Thanks.

        • Mario Filho


          You can change your target variable to be the next two weeks.

          Your target variable can be any period you would like to predict.

          • DEBNEIL SAHA ROY

            Thank you for the post. How do you change the target variable for a two-week-ahead prediction? Do you include the previous two weeks' sales?

          • Mario Filho

            In this case, you just change your target variable to be two weeks ahead. For example, instead of using features from week 40 to predict week 41, you use features from week 40 to predict week 42.
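
As a rough sketch of that change, the target column can be built by shifting each product's sales backwards by the forecast horizon. The data, the column names and the `HORIZON` constant below are made up for illustration, and the shift assumes every product has one row per week with no gaps.

```python
import pandas as pd

# Hypothetical long-format data: one row per (product, week), no missing weeks.
sales = pd.DataFrame({
    "product": [1] * 5 + [2] * 5,
    "week": list(range(40, 45)) * 2,
    "sales": [10, 12, 11, 15, 14, 3, 4, 6, 5, 7],
})

HORIZON = 2  # predict two weeks ahead instead of one

sales = sales.sort_values(["product", "week"])
# The target of the row at week w is the same product's sales at week w + HORIZON.
sales["target"] = sales.groupby("product")["sales"].shift(-HORIZON)

# The last HORIZON weeks of each product have no target yet, so they cannot be
# used for training (they are the rows you would predict on).
train = sales.dropna(subset=["target"])
print(train)
```

Here the row for product 1 at week 40 gets the sales of week 42 as its target, while its features are still computed only from data available at week 40.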

  8. Mario,
    Thank you so much for sharing. Would this approach work for a much larger number of products, say a little over 8,000, over a shorter time frame (daily data, 43 days), or is there another approach you would recommend?
    Thanks in advance for any guidance provided, it is much appreciated.

    • Mario Filho

      Hi Emily, this approach should work well.

      Usually, predicting a shorter time frame is actually more accurate than predicting a longer one.

      • Thanks for the response! It looks like your example provides an output of total sales for the week. Is it possible to adjust it to get the predicted sales per week for each product? If so, how would one go about making that adjustment?
        Thanks in advance for any guidance provided!

        • Mario Filho

          In the example I calculate the error for each week, but the predictions are at the (product, week) level. Each row we use to predict has a different product and week.
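
To make the distinction concrete, here is a small sketch with invented numbers. The `results` table and its column names are assumptions, and mean absolute error is used only to keep the arithmetic easy to follow. Each row is one (product, week) prediction, the weekly error is an aggregate over those rows, and the predictions for a single product are obtained by filtering.

```python
import pandas as pd

# Hypothetical row-level results: one prediction per (product, week).
results = pd.DataFrame({
    "product": [1, 2, 1, 2],
    "week": [50, 50, 51, 51],
    "actual": [10.0, 0.0, 12.0, 3.0],
    "predicted": [9.0, 1.0, 12.0, 6.0],
})

# The error is reported per week by averaging over that week's rows.
results["abs_error"] = (results["predicted"] - results["actual"]).abs()
weekly_error = results.groupby("week")["abs_error"].mean()

# The predictions themselves stay at the (product, week) level.
product_1 = results.loc[results["product"] == 1, ["week", "predicted"]]

print(weekly_error)
print(product_1)
```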

  9. Can you please explain how RMSLE is an approximation to MAPE?
    Because of zero actuals, I am planning to move to a measure similar to MAPE. Any reference or explanation would be a great help.

    • Mario Filho


      I don’t have a resource that explains it in detail, but you can create a plot of MAPE and RMSLE to compare.

      Basically, here we are using the fact that log(m / n) = log(m) - log(n). So when you minimize the squared difference between the logs of the variables, you are minimizing a proxy for their ratio.

      For the zeros, just use log(x + 1) as the transform and exp(x) - 1 as its inverse.
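
The comparison suggested above can be tried numerically. The sketch below is an illustration with made-up values: the helper names `rmsle` and `mape` are defined here rather than taken from a library, and the log(x + 1) transform is written with NumPy's `log1p` and `expm1`.

```python
import numpy as np


def rmsle(actual, predicted):
    """Root mean squared error between log(1 + predicted) and log(1 + actual)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.sqrt(np.mean((np.log1p(predicted) - np.log1p(actual)) ** 2))


def mape(actual, predicted):
    """Mean absolute percentage error; undefined when any actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(predicted - actual) / actual)


actual = np.array([100.0, 200.0, 50.0])
predicted = actual * 1.10  # every prediction is 10% too high

# Both metrics depend on the ratio predicted / actual, not on the raw scale.
print("MAPE :", mape(actual, predicted))
print("RMSLE:", rmsle(actual, predicted), "vs log(1.1) =", np.log(1.1))

# With zero actuals MAPE breaks down, but RMSLE on log(x + 1) stays finite.
with_zeros = rmsle([0.0, 5.0], [1.0, 5.0])
print("RMSLE with a zero actual:", with_zeros)

# log1p and expm1 are inverses, so a model trained on log1p(sales) can have its
# predictions mapped back to the original scale with expm1.
original = np.array([0.0, 3.0, 40.0])
round_trip = np.expm1(np.log1p(original))
```

For small relative errors log(1 + e) is close to e, which is why the two numbers end up near each other in this example. The match gets looser as the errors grow or as the values approach zero, where the +1 starts to matter.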
