Weighting a snowball sample through a pseudo-calibration: the case study of same-sex civil unions in Italy
DOI:
https://doi.org/10.71014/sieds.v79i1.295
Keywords:
snowballing, sampling, weighting, pseudo-calibration, calibration
Abstract
This paper presents an approach to the selection bias and non-probabilistic nature of snowball sampling designs, particularly in social research on hidden or hard-to-reach populations. The primary aim is to refine the weighting of snowball samples by introducing a pseudo-calibration technique that adjusts the direct weights, all equal to one, using available auxiliary variables, with the ultimate goal of producing more reliable and less biased estimates.
The data for this case study come from a snowball sample survey, the “Over the Rainbow” project, carried out among Instagram users who tag their photos with popular LGBTQ+ community hashtags. The list of usernames is collected with web-scraping tools that identify relevant users. To address the limitations of the sampling design, the study employs calibration and post-stratification. Calibration adjusts the sample weights so that the weighted sample totals match known population totals, while post-stratification divides the sample into subgroups and weights them to align with known demographic distributions.
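The post-stratification step described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the function name `post_stratify`, the stratum labels, and the population totals are all hypothetical and are not Istat figures. Each respondent starts from a direct weight of one and receives the weight N_h / n_h, where N_h is the known population total of stratum h and n_h is the number of respondents in it.

```python
from collections import Counter


def post_stratify(strata, population_totals):
    """Return one weight per respondent: N_h / n_h for the respondent's stratum h.

    strata            -- stratum label of each respondent (direct weight = 1)
    population_totals -- known population total N_h for each stratum label
    """
    sample_counts = Counter(strata)  # n_h: respondents observed per stratum
    missing = set(sample_counts) - set(population_totals)
    if missing:
        raise ValueError(f"no population total for strata: {sorted(missing)}")
    return [population_totals[h] / sample_counts[h] for h in strata]


# Hypothetical example: labels and totals are illustrative only.
strata = ["North", "North", "Centre", "South", "South", "South"]
totals = {"North": 500, "Centre": 200, "South": 300}
weights = post_stratify(strata, totals)
```

By construction, the weights within each stratum sum to that stratum's population total, so the weighted sample reproduces the known distribution of the stratifying variable.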
The proposed sampling weights are benchmarked against the Istat survey of all civil unions between same-sex couples celebrated by Italian municipalities since 2016; this source of known totals provides a reliable external reference for deriving the sampling weights. The expected results of this study are twofold. First, calibration should yield sample weights that are more representative of the target population, by aligning the sample distribution with known population totals. Second, post-stratification is expected to define pseudo-weights that adjust the direct weights equal to one, so that the subgroups within the sample are proportional to those in the broader population. The combination of these methods is expected to significantly reduce the biases associated with traditional snowball sampling and to assign sampling weights that are not simply equal to one, as is usual in snowball sampling.
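As a worked illustration of what calibrating unit direct weights involves, the standard linear (chi-square distance) calibration problem can be written as follows. This is a sketch under stated assumptions: the abstract does not say which distance function the paper uses, and the symbols here are introduced only for illustration, with $s$ the sample, $d_i$ the direct weights, $\mathbf{x}_i$ the auxiliary variables of respondent $i$, and $\mathbf{t}_x$ the known population totals.

```latex
\begin{align}
\min_{w_1,\dots,w_n}\ & \sum_{i \in s} \frac{(w_i - d_i)^2}{2 d_i}
\quad \text{subject to} \quad \sum_{i \in s} w_i \mathbf{x}_i = \mathbf{t}_x, \\
w_i &= d_i \left(1 + \mathbf{x}_i^{\top} \boldsymbol{\lambda}\right), \\
\boldsymbol{\lambda} &= \Big(\sum_{i \in s} d_i \mathbf{x}_i \mathbf{x}_i^{\top}\Big)^{-1}
\Big(\mathbf{t}_x - \sum_{i \in s} d_i \mathbf{x}_i\Big).
\end{align}
```

With $d_i = 1$ for every respondent, the calibrated weights reduce to $w_i = 1 + \mathbf{x}_i^{\top} \boldsymbol{\lambda}$. When $\mathbf{x}_i$ consists only of stratum indicators, this gives $w_i = N_h / n_h$ for a respondent in stratum $h$, which is the post-stratification weight.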
The paper contributes to the statistical and social research field by offering a methodologically sound approach to improving the accuracy of snowball sampling designs, with practical implications for studying hard-to-reach populations on social media platforms.
License
Copyright (c) 2025 Marco Dionisio Terribili

This work is licensed under a Creative Commons Attribution 4.0 International License.