Voprosy Ekonomiki

Real estate valuation based on big data

https://doi.org/10.32609/0042-8736-2022-12-118-136

Abstract

The paper applies web scraping and machine learning algorithms to estimate apartment prices on the secondary housing market in Moscow. To this end, we collect and process data from the CIAN website and from “Reforma GKH”. To value real estate objects, we consider three machine learning algorithms: Elastic Net, Random Forest and Gradient Boosting. We also apply a Shapley value-based approach to interpret the results of the black-box algorithms. The results suggest that black-box algorithms yield more accurate price estimates for apartments on the Moscow secondary housing market, both within individual price segments and for the sample as a whole. Gradient Boosting demonstrates the best accuracy among the algorithms considered. Interpretation based on Shapley values shows that total area, year of construction, ceiling height, renovation and monolithic construction technology have a positive effect on the price, whereas the number of floors in the building, the possibility of a mortgage and the lack of renovation have a negative effect. The developed methodology can be applied in real estate insurance, mortgage lending, the determination of cadastral value and other areas.
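To make the Shapley-based interpretation concrete, the following minimal sketch computes exact Shapley values for a single apartment under a toy pricing function. Everything in it is an assumption made for illustration: the function `toy_price`, its coefficients, the feature names and the baseline apartment are hypothetical and are not the paper's fitted model or data. In practice the black-box model would be a trained Random Forest or Gradient Boosting regressor, and the values would typically be approximated rather than enumerated.

```python
from itertools import combinations
from math import factorial


# Hypothetical toy price model (NOT the paper's fitted model): price in
# thousand rubles as a function of three illustrative features.
def toy_price(features):
    area = features["total_area"]
    floors = features["floors_in_building"]
    renovated = features["renovated"]
    # The renovation premium scales with area, so the features interact.
    return 200.0 * area - 15.0 * floors + 30.0 * area * renovated


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    A coalition's value is the model output when the features in the
    coalition take the instance's values and all others keep the
    baseline's values.
    """
    names = list(instance)
    n = len(names)

    def coalition_value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude `name`.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = coalition_value(set(subset) | {name})
                without_feature = coalition_value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


# Hypothetical reference apartment and the apartment being explained.
baseline = {"total_area": 50.0, "floors_in_building": 9, "renovated": 0}
apartment = {"total_area": 70.0, "floors_in_building": 17, "renovated": 1}

phi = shapley_values(toy_price, apartment, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.1f}")
```

The contributions sum to the difference between the predicted price of the apartment and that of the baseline, which is the property that makes Shapley values convenient for reading the sign and size of each feature's effect. Exact enumeration grows exponentially with the number of features, so it is only practical for small illustrations like this one.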

About the Authors

M. O. Mamedli
HSE University
Russian Federation

Mariam O. Mamedli

Moscow



A. V. Umnov
SberBank
Russian Federation

Andrey V. Umnov

Moscow



References

1. Balash V., Balash O., Harlamov A. (2011). A spatial econometric analysis of the housing market. Applied Econometrics, No. 22, pp. 62–77. (In Russian).

2. Goncharov G., Natkhov T. (2020). Textual analysis of pricing in the Moscow residential real estate market. HSE Economic Journal, No. 1, pp. 101–116. (In Russian). https://doi.org/10.17323/1813-8691-2020-24-1-101-116

3. Leyfer L., Chernaya E. (2020). Mass appraisal of real estate objects based on machine learning technologies. Analysis of various methods for assessing the market value of apartments. Imushchestvennye Otnosheniya v Rossiyskoy Federatsii, No. 3, pp. 32–42. (In Russian).

4. Ozhegov E., Kosolapov N., Pozolotina Y. (2017). On dependence between housing value and school characteristics. Applied Econometrics, No. 47, pp. 28–48. (In Russian).

5. Bischl B. et al. (2021). Hyperparameter optimization: Foundations, algorithms, best practices and open challenges. Unpublished manuscript. https://doi.org/10.48550/arXiv.2107.05847

6. Breiman L. (2001). Random forests. Machine Learning, Vol. 45, pp. 5–32. https://doi.org/10.1023/A:1010933404324

7. Friedman J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, Vol. 29, No. 5, pp. 1189–1232. https://doi.org/10.1214/aos/1013203451

8. Friedman J. H. (2002). Stochastic gradient boosting. Computational Statistics & Data Analysis, Vol. 38, No. 4, pp. 367–378. https://doi.org/10.1016/S0167-9473(01)00065-2

9. Johannemann J., Hadad V., Athey S., Wager S. (2019). Sufficient representations for categorical variables. Unpublished manuscript. https://doi.org/10.48550/arXiv.1908.09874

10. Loberto M., Luciani A., Pangallo M. (2018). The potential of big housing data: An application to the Italian real-estate market. Bank of Italy Working Paper, No. 1171. https://doi.org/10.2139/ssrn.3176962

11. Merrick L., Taly A. (2020). The explanation game: Explaining machine learning models using Shapley values. In: A. Holzinger, P. Kieseberg, A. Tjoa, E. Weippl (eds.). Machine learning and knowledge extraction. Cham: Springer, pp. 17–38. https://doi.org/10.1007/978-3-030-57321-8_2

12. Moosavi V. (2017). Urban data streams and machine learning: A case of Swiss real estate market. Unpublished manuscript. https://doi.org/10.48550/arXiv.1704.04979

13. Myttenaere A., Golden B., Grand B., Rossi F. (2017). Mean absolute percentage error for regression models. Neurocomputing, Vol. 192, pp. 38–48. https://doi.org/10.1016/j.neucom.2015.12.114

14. Nguyen T. (2019). Faster feature selection with a dropping forward-backward algorithm. Unpublished manuscript. https://doi.org/10.48550/arXiv.1910.08007

15. Tchuente D., Nyawa S. (2022). Real estate price estimation in French cities using geocoding and machine learning. Annals of Operations Research, Vol. 308, pp. 571–608. https://doi.org/10.1007/s10479-021-03932-5

16. Zou H., Hastie T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society Series B, Vol. 67, No. 2, pp. 301–320. https://doi.org/10.1111/j.1467-9868.2005.00503.x


For citations:


Mamedli M.O., Umnov A.V. Real estate valuation based on big data. Voprosy Ekonomiki. 2022;(12):118-136. (In Russ.) https://doi.org/10.32609/0042-8736-2022-12-118-136


ISSN 0042-8736 (Print)