Mi Casa no Es Tu Casa: An Agile Strategy to Generate Synthetic Data to Overcome Security Challenges in Mexico
Journal
New Challenges in Software Engineering : volume 1
ISSN
1860-949X
1860-9503
Publisher
Springer Nature Switzerland
Date Issued
2025
Author(s)
Rodríguez Pueblita, José Carlos
Díaz, Edgar Oswaldo
Type
text::book::book part
Abstract
This document describes an agile statistical-analysis strategy for generating synthetic data to overcome growing obstacles to carrying out face-to-face surveys in Mexico, such as rising insecurity, limited access to areas controlled by organized crime, and budgetary constraints. We use two data sources, the 2020 Income-Expenditure Survey (ENIGH, by its Spanish acronym) and the 2020 Population and Housing Census (CPyV), both carried out by the National Institute of Statistics and Geography of Mexico (INEGI), together with several statistical learning techniques, such as PCA, clustering, random forest and classification methods, to generate granular synthetic data with scientific, policy and commercial uses. The result is an algorithm that characterizes the socioeconomic level and the income and expenditure profiles of urban households at block level for the entire country. Because disaggregated data are not available to validate our results directly, we suggest metrics to validate the synthetic data. This agile strategy can be replicated in other contexts where data layers that satisfy certain basic conditions are available. © The authors © Springer.
License
Restricted Access
How to cite
Pueblita, J.C.R., Díaz, E.O. (2025). Mi Casa no Es Tu Casa: An Agile Strategy to Generate Synthetic Data to Overcome Security Challenges in Mexico. In: Mejía, J., Muñoz, M., Rocha, A., Espinosa-Faller, F.J., Trejo-Sanchez, J.A. (eds) New Challenges in Software Engineering. Studies in Computational Intelligence, vol 1209. Springer, Cham. https://doi.org/10.1007/978-3-031-90310-6_21
