Shared Task

Welcome to the Retail Sales Forecasting Challenge!

Sales forecasting is a real-life business forecasting task of particular importance to retail stores. It involves predicting future sales from historical data, with the goal of projecting future revenue over a specific period of time. Inaccurate business forecasts can result in actual or opportunity losses. The dataset used in this challenge is real-life data from one of South Africa’s large retail stores. The retail store is interested in forecasting sales for the next 70 days at two levels of time granularity (per day and per week). The data is multi-dimensional, with the following dimensions: date, department and store, over which the forecasts should be made.
The forecasts should be as follows:

  1. Sales per department
  2. Sales per store
  3. Sales per department per store

Apart from using traditional time series forecasting methods, you are challenged to use machine learning techniques for this task. The classical approach to this challenge, for each level of time granularity, would be to fit (i) d time series models, one for each of the d departments; (ii) s time series models, one for each of the s stores; and (iii) d*s models, one for each department-store pair. The problem with such an approach is that the individual models do not learn from each other. It is reasonable to hypothesize that some patterns are shared across stores and across departments, and that these can be leveraged to produce better sales forecasts. Whilst this challenge is not against the use of the classical approach, participants are encouraged to adopt the aforementioned hypothesis and to develop time series models that share patterns across departments and across stores to improve forecasting accuracy.
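As a rough illustration of the pattern-sharing idea, the sketch below pools every department-store series into a single training table with lag features, so that one model could be trained on all series at once instead of fitting d*s separate models. The record layout (an integer day index, department, store, sales), the function name build_pooled_rows and the choice of lags are assumptions made for this example; they are not the format of the competition data.

```python
from collections import defaultdict


def build_pooled_rows(records, lags=(1, 7)):
    """Build one pooled training table from many department-store series.

    records: iterable of (day_index, department, store, sales) tuples.
    Returns a list of dicts, one per (series, day) with full lag history.
    """
    # Group sales by series so lags never cross department-store boundaries.
    series = defaultdict(dict)
    for day, dept, store, sales in records:
        series[(dept, store)][day] = sales

    rows = []
    for (dept, store), by_day in series.items():
        for day in sorted(by_day):
            lagged = [by_day.get(day - k) for k in lags]
            if any(value is None for value in lagged):
                continue  # not enough history to build this row
            row = {"department": dept, "store": store, "day": day}
            for k, value in zip(lags, lagged):
                row[f"lag_{k}"] = value
            row["target"] = by_day[day]
            rows.append(row)
    return rows


# Hypothetical toy data: two departments in one store, ten days each.
toy = [(d, "D1", "S1", 10.0 + d) for d in range(10)]
toy += [(d, "D2", "S1", 20.0 + d) for d in range(10)]
pooled = build_pooled_rows(toy)
```

Because department and store appear as ordinary columns, a single learner trained on the pooled table can, in principle, pick up effects common to several series. Whether that helps on this dataset is exactly the hypothesis the challenge invites participants to test.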

Participants will be given training data to be used to train the forecasting models. Owing to the nature of time series data, no testing data will be released to the participants. The participants are expected to make predictions for the next 70 trading days and submit their forecasts.

If successful, your work will continue to advance the theory and practice of time series forecasting in retail sales data.

Contact the organizers via:

Evaluation criteria

Submissions will be ranked according to the Root Mean Squared Error (RMSE) on the held-out test set. Owing to the nature of time series data, no testing data will be released to the participants. Participants are expected to submit three CSV files with the following forecasts for the next 70 trading days:

  1. weekly sales per department per store
  2. daily sales per store
  3. daily sales per department
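One possible way to produce the three sets of forecasts listed above is a bottom-up aggregation: forecast daily sales per department per store, then sum to the required levels. The sketch below assumes trading days are numbered 1 to 70 and that a week is seven consecutive trading days; both are assumptions for illustration, as is the function name aggregate_forecasts. It does not reflect the required CSV layout, which is not described here.

```python
from collections import defaultdict


def aggregate_forecasts(base, days_per_week=7):
    """Sum daily department-store forecasts to the three submission levels.

    base: dict mapping (day, department, store) -> forecast sales,
          with day numbered from 1.
    Returns (weekly_dept_store, daily_store, daily_dept) as plain dicts.
    """
    weekly_dept_store = defaultdict(float)
    daily_store = defaultdict(float)
    daily_dept = defaultdict(float)
    for (day, dept, store), value in base.items():
        week = (day - 1) // days_per_week + 1  # days 1-7 -> week 1, etc.
        weekly_dept_store[(week, dept, store)] += value
        daily_store[(day, store)] += value
        daily_dept[(day, dept)] += value
    return dict(weekly_dept_store), dict(daily_store), dict(daily_dept)


# Hypothetical flat forecast of 1.0 per day for 2 departments and 3 stores.
base = {
    (day, dept, store): 1.0
    for day in range(1, 71)
    for dept in ("D1", "D2")
    for store in ("S1", "S2", "S3")
}
weekly, per_store, per_dept = aggregate_forecasts(base)
```

Bottom-up aggregation keeps the three files consistent with one another, but forecasting each level directly is an equally valid choice.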

To obtain the overall score for the competition, a weighted RMSE will be used. The Rand value of the sales will be used to formulate the weights.
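The exact weighting scheme is not given here, so the following is only a sketch of how a weighted RMSE can be computed in general. It assumes one non-negative weight per observation, for example a weight derived from the Rand value of the corresponding sales; the function name weighted_rmse and that choice of weights are illustrative assumptions, not the organizers' scoring code.

```python
import math


def weighted_rmse(actual, predicted, weights):
    """Return sqrt(sum(w * (a - p)^2) / sum(w)) over paired observations."""
    if not (len(actual) == len(predicted) == len(weights)):
        raise ValueError("actual, predicted and weights must have equal length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    squared_error = sum(
        w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights)
    )
    return math.sqrt(squared_error / total_weight)


# Hypothetical example: the same 2-unit error counts for more
# when it falls on the observation with the larger weight.
equal = weighted_rmse([10.0, 20.0], [12.0, 20.0], [1.0, 1.0])
skewed = weighted_rmse([10.0, 20.0], [12.0, 20.0], [3.0, 1.0])
```

With equal weights this reduces to the ordinary RMSE. With unequal weights, errors on heavily weighted observations dominate the score.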

Important dates

  • Start of the competition. Sample dataset released: 26 August 2022
  • Training data released: 1 September 2022 (No test data will be released. Participants are expected to make their predictions for the next 70 trading days. Predictions will be evaluated automatically against the held-out observed sales for the next 70 days)
  • Entry deadline: 22 September 2022
  • Competition ends: 14 October 2022 14 November 2022


  1. Participants must register on Codalab and provide a valid email address.
  2. To access the dataset for this competition, you must register for the competition.


1st Prize: R 5000.00 Takealot Voucher

2nd Prize: R 3000.00 Takealot Voucher

3rd Prize: R 1000.00 Takealot Voucher