

Real estate valuation based on big data
https://doi.org/10.32609/0042-8736-2022-12-118-136
Abstract
The paper considers the application of web scraping and machine learning algorithms to assessing real estate prices on the secondary housing market in Moscow. For this purpose, we collect and process data from the CIAN website and from “Reforma GKH”. To value the properties, we consider the machine learning algorithms Elastic Net, Random Forest and Gradient Boosting. We also apply an approach based on the Shapley vector (Shapley values) to interpret the results of the black-box algorithms. The results suggest that using black-box algorithms to price apartments on the Moscow secondary housing market yields more accurate price estimates both for individual price segments and for the sample as a whole, with Gradient Boosting showing the best accuracy among the algorithms considered. The Shapley-based interpretation shows that total area, year of construction, ceiling height, renovation and monolithic construction technology had a positive effect on price, whereas the number of floors in the building, the availability of a mortgage and the absence of renovation had a negative effect. The developed methodology can be applied in real estate insurance, mortgage lending, cadastral valuation of real estate and other areas.
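The interpretation step above rests on Shapley values, which split a model's prediction into additive feature contributions. The sketch below illustrates the idea by computing exact Shapley values through enumeration of all feature coalitions, relative to a single reference apartment. It assumes a hypothetical three-feature price function `toy_price` whose coefficients are invented for illustration and are not estimates from the paper. The function names, the single-baseline value function and the feature set are all assumptions. The paper's own implementation may differ; for example, it may average over a background sample or use an approximation suited to tree ensembles.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(model: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of model(x) relative to one baseline point.

    The value of a coalition S is the model output when the features in S
    take their values from x and all other features take them from baseline.
    Cost grows as 2**n, so this is only practical for a few features.
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        # Subsets of the other n - 1 features have sizes 0 .. n - 1.
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Weighted marginal contribution of feature i to coalition s.
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_price(z: Sequence[float]) -> float:
    """Hypothetical price model (thousand rubles); coefficients are invented."""
    area, year, ceiling = z
    return 250.0 * area + 40.0 * (year - 1960) + 40.0 * area * (ceiling - 2.5)


apartment = [60.0, 2005.0, 3.0]   # total area (m2), year built, ceiling (m)
reference = [50.0, 1980.0, 2.7]   # hypothetical "typical" apartment
phi = shapley_values(toy_price, apartment, reference)

for name, contribution in zip(["area", "year", "ceiling"], phi):
    print(f"{name}: {contribution:+.1f}")
# Efficiency property: contributions sum to f(x) - f(baseline).
print(sum(phi), toy_price(apartment) - toy_price(reference))
```

Because the year of construction enters the toy model additively, its Shapley value is simply its own marginal effect. The area-ceiling interaction term, by contrast, is shared between those two features.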
About the Authors
Mariam O. Mamedli (M. O. Mamedli)
Moscow, Russian Federation
Andrey V. Umnov (A. V. Umnov)
Moscow, Russian Federation
References
1. Balash V., Balash O., Harlamov A. (2011). A spatial econometric analysis of the housing market. Applied Econometrics, No. 22, pp. 62–77. (In Russian).
2. Goncharov G., Natkhov T. (2020). Textual analysis of pricing in the Moscow residential real estate market. HSE Economic Journal, No. 1, pp. 101–116. (In Russian). https://doi.org/10.17323/1813-8691-2020-24-1-101-116
3. Leyfer L., Chernaya E. (2020). Mass appraisal of real estate objects based on machine learning technologies. Analysis of various methods for assessing the market value of apartments. Imushchestvennye Otnosheniya v Rossiyskoy Federatsii, No. 3, pp. 32–42. (In Russian).
4. Ozhegov E., Kosolapov N., Pozolotina Y. (2017). On dependence between housing value and school characteristics. Applied Econometrics, No. 47, pp. 28–48. (In Russian).
5. Bischl B. et al. (2021). Hyperparameter optimization: Foundations, algorithms, best practices and open challenges. Unpublished manuscript. https://doi.org/10.48550/arXiv.2107.05847
6. Breiman L. (2001). Random forests. Machine Learning, Vol. 45, pp. 5–32. https://doi.org/10.1023/A:1010933404324
7. Friedman J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, Vol. 29, No. 5, pp. 1189–1232. https://doi.org/10.1214/aos/1013203451
8. Friedman J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, Vol. 38, No. 4, pp. 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2
9. Johannemann J., Hadad V., Athey S., Wager S. (2019). Sufficient representations for categorical variables. Unpublished manuscript. https://doi.org/10.48550/arXiv.1908.09874
10. Loberto M., Luciani A., Pangallo M. (2018). The potential of big housing data: An application to the Italian real-estate market. Bank of Italy Working Paper, No. 1171. https://doi.org/10.2139/ssrn.3176962
11. Merrick L., Taly A. (2020). The explanation game: Explaining machine learning models using Shapley values. In: A. Holzinger, P. Kieseberg, A. Tjoa, E. Weippl (eds.). Machine learning and knowledge extraction. Cham: Springer, pp. 17–38. https://doi.org/10.1007/978-3-030-57321-8_2
12. Moosavi V. (2017). Urban data streams and machine learning: A case of Swiss real estate market. Unpublished manuscript. https://doi.org/10.48550/arXiv.1704.04979
13. Myttenaere A., Golden B., Grand B., Rossi F. (2017). Mean absolute percentage error for regression models. Neurocomputing, Vol. 192, pp. 38–48. https://doi.org/10.1016/j.neucom.2015.12.114
14. Nguyen T. (2019). Faster feature selection with a dropping forward-backward algorithm. Unpublished manuscript. https://doi.org/10.48550/arXiv.1910.08007
15. Tchuente D., Nyawa S. (2022). Real estate price estimation in French cities using geocoding and machine learning. Annals of Operations Research, Vol. 308, pp. 571–608. https://doi.org/10.1007/s10479-021-03932-5
16. Zou H., Hastie T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B, Vol. 67, No. 2, pp. 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x
For citation:
Mamedli M.O., Umnov A.V. Real estate valuation based on big data. Voprosy Ekonomiki. 2022;(12):118-136. (In Russ.) https://doi.org/10.32609/0042-8736-2022-12-118-136