Comparison of Machine Learning Algorithms in Demand Prediction Problem
This study addresses the major issues of applying econometric and machine learning techniques to a daily demand prediction problem. The purpose of the paper is achieved by comparing the predictive power of the models on data from a bakery retail chain.
The main disadvantage of a regression tree is high variance caused by overfitting and the low stability of the tree when input variables are selected randomly (Zhang & Suganthan, 2014). Tightening the rules that determine when splitting stops leads instead to underfitting of the model.
One possible way of overcoming these restrictions is ensembling. Combining the predictions of different regression trees improves the quality of the prediction. Although each regression tree has a high variance, its predictions are unbiased and quite accurate for specific groups of observations.
We selected two ways of ensembling regression trees: random forest and gradient boosting. The difference between the approaches lies in how the trees are trained. In a random forest ensemble, regression trees are grown independently of each other through bootstrap aggregating (bagging), in which the observations used to fit each tree are randomly drawn from the training sample (the sample size is set by the bagging fraction hyperparameter); the ensemble prediction is then a simple average of the trees' predictions. Gradient boosting offers another approach to ensembling trees. The training samples for the trees are formed sequentially, so that the results of fitting one tree influence the input sample for the next tree. Observations with the largest fitting errors enter the next training sample with a higher probability (controlled by the learning rate hyperparameter). This approach allows the ensemble to concentrate on the observations with the least accurate predictions.
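As an illustration, the sketch below fits both kinds of ensemble with scikit-learn on a synthetic feature matrix; the variable names and hyperparameter values are placeholders rather than the exact configuration used in this study.

```python
# Illustrative sketch (not the study's exact configuration): fitting the two
# tree ensembles compared in this section with scikit-learn.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))  # placeholder features (e.g., calendar and lag variables)
y = 2 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=1000)  # placeholder daily sales

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: trees grown independently on bootstrap samples,
# predictions averaged with equal weights.
rf = RandomForestRegressor(n_estimators=300, max_samples=0.667, random_state=0)
rf.fit(X_train, y_train)

# Gradient boosting: trees grown sequentially, each one fitted to the errors
# of the current ensemble and shrunk by the learning rate.
gb = GradientBoostingRegressor(n_estimators=300, learning_rate=0.02, random_state=0)
gb.fit(X_train, y_train)

print("RF MAE:", np.mean(np.abs(rf.predict(X_test) - y_test)))
print("GB MAE:", np.mean(np.abs(gb.predict(X_test) - y_test)))
```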
Both models aim at creating the most appropriate training data for the regression trees and at minimizing the overall prediction error in terms of bias and variance. However, the approaches target different desired properties of the prediction model, which influences the quality of the prediction (Zhang & Suganthan, 2014). The random forest model gives the regression trees equal weights and therefore minimizes the average error over the whole dataset. Gradient boosting, in contrast, concentrates on the outliers, the parts of the dataset where the dependent variable does not follow the common pattern of dependence on the explanatory variables.
Consequently, the random forest prediction has low bias and high variance, while the gradient boosting prediction has high bias and low variance. The reason is that the random forest model focuses on average prediction quality, which can lead to underestimation and therefore high variance of the forecasts. The gradient boosting model, on the other hand, is more prone to overfitting, since by construction it does not aim to reproduce the general distribution of the dependent variable. It focuses on the areas with the highest prediction errors and has a lower variance than the random forest model; however, gradient boosting predictions are often biased because of this overfitting.
The choice between gradient boosting and random forest in a demand prediction problem should be made depending on the business specifics. The presence of anomalies in demand (for instance, regular significant surges in sales of a particular SKU) makes random forest predictions less appropriate, as the model tends to underfit and smooth the outliers, so the gradient boosting model is preferable. Stable sales patterns, where outliers are mostly caused by statistical randomness, are better explained by the random forest model; in this case gradient boosting may overfit and emphasize outlier prediction, which leads to estimating false relationships.
The final model is an ensemble constructed from the previously described models. The input variables of the ensemble are their predictions, and the estimation method is a linear model. The ensemble is estimated on an additional validation set in order to avoid overfitting (Bajari & Nekipelov, 2015).
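A minimal sketch of such a combination step is given below; it assumes the base models are already fitted and uses a non-negative linear regression on a held-out validation set, which is one common way to implement this kind of stacking (function names and details are illustrative, not the study's exact procedure).

```python
# Illustrative stacking sketch: combine base-model predictions with a linear
# model fitted on a separate validation set (details are assumptions).
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_ensemble_weights(base_models, X_valid, y_valid):
    """Fit a linear combination of base-model predictions on validation data."""
    # Each column of P holds one base model's predictions on the validation set.
    P = np.column_stack([m.predict(X_valid) for m in base_models])
    # positive=True keeps the weights non-negative so they read as shares.
    combiner = LinearRegression(positive=True, fit_intercept=False)
    combiner.fit(P, y_valid)
    # Normalize to shares; assumes at least one weight is positive.
    weights = combiner.coef_ / combiner.coef_.sum()
    return combiner, weights

def ensemble_predict(combiner, base_models, X_new):
    P = np.column_stack([m.predict(X_new) for m in base_models])
    return combiner.predict(P)
```

Under this scheme, a base model whose prediction adds no information beyond the others simply receives a weight of zero.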
Results
In this part we present the results of the model estimation and give an interpretation of them. Taking into account the conclusions from the previous section, we aimed at finding the model with the best out-of-sample predictive power (the lowest accuracy metric MQE) and, where possible, the lowest bias. The quantile t in the quantile loss function is set to 0.67, as it reflects the average proportion between the marginal profit and the cost of producing one unit for the chosen retailer. The results of the comparison of the chosen models and loss functions are presented in table 7. MAPE reflects the error as a percentage of average sales volume, MQE is expressed in sales volume, and the economic effect is in rubles.
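For reference, the sketch below shows one standard way to compute the quantile (pinball) loss and the MQE metric with t = 0.67; the function names are ours, and the exact definition used in the study may differ in details.

```python
# Illustrative definition of the quantile (pinball) loss and MQE, assuming the
# standard formulation; t = 0.67 penalizes under-forecasting roughly twice as
# heavily as over-forecasting.
import numpy as np

def quantile_loss(y_true, y_pred, t=0.67):
    """Pinball loss: t * shortfall for under-forecasts, (1 - t) * excess otherwise."""
    error = y_true - y_pred
    return np.where(error >= 0, t * error, (t - 1) * error)

def mqe(y_true, y_pred, t=0.67):
    """Mean quantile error over the evaluation sample."""
    return float(np.mean(quantile_loss(np.asarray(y_true), np.asarray(y_pred), t)))

# Example: under-forecasting by one unit costs more than over-forecasting by one unit.
print(mqe([10.0], [9.0]))   # 0.67 (shortage)
print(mqe([10.0], [11.0]))  # 0.33 (excess)
```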
Table 7
Comparison of techniques
Model | Loss function | Average forecast | MAPE, % | MQE | Economic effect (5 bakeries per month, rubles)
Actual sales | - | 2.71 | 0.0 | 0.0 | 1 244 064
Baseline | - | 2.76 | 20.3 | 0.27 | 0
LM | MAE | 2.71 | 16.74 | 0.23 | 180 499
LM | Quantile | 2.97 | 17.73 | 0.19 | 330 900
SVR | MAE | 2.77 | 16.93 | 0.22 | 224 569
SVR | Quantile | 2.94 | 17.80 | 0.20 | 304 836
RF | MAE | 2.68 | 15.15 | 0.21 | 238 120
RF | Quantile | 2.83 | 15.23 | 0.18 | 376 205
GB | MAE | 2.68 | 15.13 | 0.21 | 266 477
GB | Quantile | 2.90 | 15.70 | 0.18 | 408 509
The first row in the table shows actual sales, which have no prediction error; the row reports the average sales in the sample and the economic effect relative to the baseline forecasting strategy. The baseline row shows that MAPE would be near 20% and MQE 0.27 if ordering were based on sales in the last week. The average baseline forecast is higher than actual sales. The table shows that the current ordering strategy in the bakeries yields savings of more than 1.2 million rubles per month over the baseline ordering strategy.
Forecasting with the linear model produces different results depending on the loss function. The MAE loss function gives a lower MAPE, while the quantile loss function gives a lower MQE, as expected from the theoretical results. The average prediction of the model estimated with the MAE loss function equals actual average sales, while the average prediction of the LM with the quantile loss function is an overvalued forecast. That is also in line with the described theory of optimal management behavior. The key result of the LM comparison is that the overvalued prediction obtained with the quantile loss function almost doubles the economic effect of the prediction, from 180 to 330 thousand rubles.
A similar pattern, a lower MAPE for the model estimated with the MAE loss function and a lower MQE for the quantile loss function, is observed for all other models, as is the higher economic effect for models estimated with quantile loss functions. These findings suggest that a sales prediction algorithm in retail should include an asymmetric loss function (for instance, the quantile loss), as it provides a higher economic effect from implementing the algorithm. Moreover, it is suggested to assess the algorithm's quality using an asymmetric accuracy metric.
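In practice this amounts to estimating the same models under a quantile objective; a minimal sketch with scikit-learn's gradient boosting is shown below, where alpha mirrors t = 0.67 and the other settings are placeholders rather than the study's tuned values.

```python
# Illustrative sketch: the same gradient boosting model estimated under an
# asymmetric (quantile) objective instead of a symmetric one.  Settings other
# than alpha = 0.67 are placeholders, not the study's tuned values.
from sklearn.ensemble import GradientBoostingRegressor

gb_mae = GradientBoostingRegressor(loss="absolute_error", n_estimators=300,
                                   learning_rate=0.02, random_state=0)
gb_q67 = GradientBoostingRegressor(loss="quantile", alpha=0.67, n_estimators=300,
                                   learning_rate=0.02, random_state=0)

# gb_mae.fit(X_train, y_train); gb_q67.fit(X_train, y_train)
# gb_q67 deliberately over-forecasts, trading a worse MAPE for a better MQE
# and, with shortage costs exceeding excess costs, a larger economic effect.
```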
Special attention also needs to be paid to the choice of the best model. According to the table, the biggest economic effect is provided by the GB and RF models estimated with the quantile loss function, which is the most appropriate for the data in this study. This result is in line with the fact that these models have the highest weights in the final ensemble model. The weights of the models in the ensemble are given in table 8.
Table 8
Weights in ensemble (in %)
Model | Weight
LM | 0.0 %
SVR | 0.0 %
RF | 55.7 %
GB | 44.3 %
On average, the best sales forecasting for the studied bakery chain is achieved by ensembling two models, in other words, by a weighted combination of the RF and GB predictions. Predictions by LM and SVR do not provide additional information to the ensemble, as they have zero weights in it. That is to say, we may conclude that on our dataset tree-based models are superior to the linear model and SVR; however, there is no evidence that these results hold for other datasets.
An additional review of the results can be made through an analysis of the best hyperparameters. The optimal values of the hyperparameters for SVR, RF and GB are presented in appendix 1. The optimal SVR kernel function is a polynomial of the fourth degree, which indicates a non-linear relation between sales volume and the other variables. This partly explains the low predictive power of the linear model. The linearity of the regression model is a limitation of the study, and it may be mitigated by including higher degrees of the variables. Another observation is that the RF model grows deep trees with weak restrictions on further splits, while GB includes a high level of randomness in feature selection and bagging with shallower trees. This helps it avoid overfitting; therefore, the predictive power of GB and RF on our dataset is similar.
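One typical way to obtain such optimal values is a randomized search over the hyperparameter space (Bergstra & Bengio, 2012); the sketch below illustrates this for the GB model, with an assumed search space rather than the exact grid used in the study.

```python
# Illustrative randomized hyperparameter search (Bergstra & Bengio, 2012) for
# the gradient boosting model; the search space is an assumption, not the
# study's exact grid.
from scipy.stats import randint, uniform
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import make_scorer, mean_pinball_loss
from sklearn.model_selection import RandomizedSearchCV

param_space = {
    "learning_rate": uniform(0.01, 0.19),   # 0.01 .. 0.20
    "max_depth": randint(3, 300),
    "min_samples_leaf": randint(10, 200),
    "subsample": uniform(0.5, 0.5),         # bagging fraction 0.5 .. 1.0
    "max_features": uniform(0.3, 0.7),      # feature fraction 0.3 .. 1.0
}

search = RandomizedSearchCV(
    GradientBoostingRegressor(loss="quantile", alpha=0.67),
    param_distributions=param_space,
    n_iter=50,
    # Score candidates with the pinball loss at t = 0.67, consistent with MQE.
    scoring=make_scorer(mean_pinball_loss, alpha=0.67, greater_is_better=False),
    cv=3,
    random_state=0,
)
# search.fit(X_train, y_train); search.best_params_
```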
Conclusion
This study set out to investigate different prediction techniques for the demand prediction task in retail. The research question was formulated as follows: “which prediction algorithm provides the highest accuracy in the 1-day-ahead retail sales prediction problem?”. In order to answer it, we studied literature from different fields of knowledge: economics, management and machine learning. As a result, we made a comparative analysis, from a theoretical point of view, of the techniques most frequently used in retail demand forecasting.
We came to the conclusion that choosing the most appropriate algorithm for the task should include a comparison of both the prediction methods and the ways of evaluating them. For this, we chose the linear model, support vector regression, random forest and gradient boosting as prediction models, and the mean absolute error and quantile loss function as evaluation techniques. We described their relation to the forecasting problem and highlighted the possible advantages and disadvantages of using them. Finally, we compared the predictive power of the techniques empirically using POS transaction data from a large Russian retailer.
The study suggests that there are a number of reasons to compare different models, as they may provide different forecasting accuracy depending on the conditions. The main finding of the work is that the quantile loss function, as an example of an asymmetric accuracy metric, provides better prediction accuracy measured as the economic effect of implementing the forecast, because it reflects a specific feature of food retail: asymmetric costs, with shortage costs prevailing over excess costs. This implies the need for a deeper cost-benefit analysis before constructing and evaluating a model in practice.
References
Arellano, M., & Bond, S. (1988). Dynamic panel data estimation using DPD-A guide for users. London: Institute for Fiscal Studies.
Armstrong, J. S., & Collopy, F. (1992). Error measures for generalizing about forecasting methods: Empirical comparisons. International journal of forecasting, 8(1), 69-80.
Auffhammer, M. (2007). The rationality of EIA forecasts under symmetric and asymmetric loss. Resource and Energy Economics, 29(2), 102-121.
Bajari, P., Nekipelov, D., Ryan, S. P., & Yang, M. (2015). Demand estimation with machine learning and model combination (No. w20955). National Bureau of Economic Research.
Bergmeir, C., & Benítez, J. M. (2012). On the use of cross-validation for time series predictor evaluation. Information Sciences, 191, 192-213.
Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of machine learning research, 13(Feb), 281-305.
Bouktif, S., Fiaz, A., Ouni, A., & Serhani, M. A. (2018). Optimal deep learning lstm model for electric load forecasting using feature selection and genetic algorithm: Comparison with machine learning approaches. Energies, 11(7), 1636.
Bozkir, A. S., & Sezer, E. A. (2011). Predicting food demand in food courts by decision tree approaches. Procedia Computer Science, 3, 759-763.
Darbellay, G. A., & Slama, M. (2000). Forecasting the short-term demand for electricity. International Journal of Forecasting, 16(1), 71-83.
Döpke, J., Fritsche, U., & Siliverstovs, B. (2009). Evaluating German business cycle forecasts under an asymmetric loss function (No. 5/2009). DEP (Socioeconomics) Discussion Papers, Macroeconomics and Finance Series.
Ehrenthal, J. C. F., Honhon, D., & Van Woensel, T. (2014). Demand seasonality in retail inventory management. European Journal of Operational Research, 238(2), 527-539.
Flores, B. E. (1986). A pragmatic view of accuracy measurement in forecasting. Omega, 14(2), 93-98.
Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International journal of forecasting, 22(4), 679-688.
Jin, Y. “Henry”, Williams, B. D., Tokar, T., & Waller, M. A. (2015). Forecasting With Temporally Aggregated Demand Signals in a Retail Supply Chain. Journal of Business Logistics, 36(2), 199-211.
Kimes, S. E., Chase, R. B., Choi, S., Lee, P. Y., & Ngonzi, E. N. (1998). Restaurant revenue management: Applying yield management to the restaurant industry. Cornell Hotel and Restaurant Administration Quarterly, 39(3), 32-39.
Lasek, A., Cercone, N., & Saunders, J. (2016). Restaurant sales and customer demand forecasting: Literature survey and categorization of methods. In Smart City 360° (pp. 479-491). Springer, Cham.
Liu, L.-M., Bhattacharyya, S., Sclove, S., Chen, R., & Lattyak, W. (2001). Data mining on time series: An illustration using fast-food restaurant franchise data. Computational Statistics & Data Analysis, 37, 455-476.
Nenni, M. E., Giustiniano, L., & Pirolo, L. (2013). Demand Forecasting in the Fashion Industry: A Review. International Journal of Engineering Business Management, 5, 37.
Qiu, X., Zhang, L., Ren, Y., Suganthan, P., & Amaratunga, G. (2014). Ensemble deep learning for regression and time series forecasting. 2014 IEEE Symposium on Computational Intelligence in Ensemble Learning (CIEL), 1-6.
Schweitzer, M. E., & Cachon, G. P. (2000). Decision bias in the newsvendor problem with a known demand distribution: Experimental evidence. Management Science, 46(3), 404-420.
Tanizaki, T., Hoshino, T., Shimmura, T., & Takenaka, T. (2019). Demand forecasting in restaurants using machine learning and statistical analysis. Procedia CIRP, 79, 679-683.
Tsoumakas, G. (2019). A survey of machine learning techniques for food sales prediction. Artificial Intelligence Review, 52(1), 441-447.
Wang, C. X., Webster, S., & Suresh, N. C. (2009). Would a risk-averse newsvendor order less at a higher selling price?. European Journal of Operational Research, 196(2), 544-553.
Weatherford, L. R., Kimes, S. E., & Scott, D. A. (2001). Forecasting for hotel revenue management: Testing aggregation against disaggregation. Cornell hotel and restaurant administration quarterly, 42(4), 53-64.
Willmott, C. J., & Matsuura, K. (2005). Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Climate research, 30(1), 79-82.
Yu, K., Lu, Z., & Stander, J. (2003). Quantile regression: applications and current research areas. Journal of the Royal Statistical Society: Series D (The Statistician), 52(3), 331-350.
Zhang, L., & Suganthan, P. N. (2014). Random Forests with ensemble of feature spaces. Pattern Recognition, 47(10), 3429-3437.
Appendix 1
Optimal values for hyperparameters with quantile loss function
Hyperparameter | SVR | RF | GB
Kernel function | Polynomial | - | -
C | 0.1 | - | -
Tolerance | 0.001 | - | -
Degree | 4 | - | -
Gamma | 0.1 | - | -
Learning rate | - | 0.10 | 0.02
Maximum depth | - | 270 | 150
Minimum samples in leaf | - | 80 | 80
Bagging fraction | - | - | 0.667
Feature fraction | - | - | 0.542