Comparative Analysis of Imputation Methods for Missing Environmental Data: A Case Study on Ozone Concentrations
DOI:
https://doi.org/10.37934/ard.134.1.6376
Keywords:
Imputation method, missing data, mean-before-after, ozone concentrations
Abstract
Handling missing values is crucial to environmental data analysis, since incomplete datasets can lead to biased results. Using Weibull distributions, this study compared six single-imputation methods for estimating missing ozone concentration data in Petaling Jaya, Selangor: mean, median, mean-before-after (MBA), cubic interpolation, linear interpolation, and last observation carried forward (LOCF). Data were simulated for sample sizes of 50 and 150 with varying percentages of missing values (5%, 10%, 15%, 20%, and 25%). The performance of each imputation method was evaluated using prediction accuracy, root mean square error (RMSE), and mean absolute error (MAE). The findings suggested that the MBA approach performed best in all examined cases, followed by linear interpolation and LOCF. Conversely, cubic interpolation, mean substitution, and median substitution performed poorly, especially as the proportion of missing data increased. This study emphasises the critical role of selecting appropriate imputation methods to enable accurate and trustworthy environmental data analysis. The findings can help researchers select efficient approaches for addressing missing values in air quality datasets, thus improving the reliability of environmental studies.
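The comparison described in the abstract can be illustrated with a minimal sketch, which is not the authors' code. It assumes that MBA means the average of the nearest observed value before and after a gap, that missing values are removed at random from interior positions, and that the Weibull scale (40.0) and shape (2.0) are arbitrary illustrative values. Cubic interpolation and the prediction accuracy measure are omitted to keep the example within the Python standard library. All function names are hypothetical.

```python
import math
import random
import statistics


def make_gaps(values, frac, rng):
    """Blank out a fraction of interior points (endpoints are kept so
    every gap has an observed neighbour on both sides)."""
    n_missing = round(frac * len(values))
    idx = rng.sample(range(1, len(values) - 1), n_missing)
    holed = list(values)
    for i in idx:
        holed[i] = None
    return holed, sorted(idx)


def _neighbours(series, i):
    """Indices of the nearest observed values before and after position i."""
    lo = i - 1
    while series[lo] is None:
        lo -= 1
    hi = i + 1
    while series[hi] is None:
        hi += 1
    return lo, hi


def impute(series, method):
    """Fill None entries using one of five single-imputation rules."""
    observed = [v for v in series if v is not None]
    out = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        lo, hi = _neighbours(series, i)
        if method == "mean":
            out[i] = statistics.mean(observed)
        elif method == "median":
            out[i] = statistics.median(observed)
        elif method == "mba":  # assumed definition: mean of before and after
            out[i] = (series[lo] + series[hi]) / 2
        elif method == "linear":
            w = (i - lo) / (hi - lo)
            out[i] = series[lo] + w * (series[hi] - series[lo])
        elif method == "locf":
            out[i] = series[lo]
        else:
            raise ValueError(f"unknown method: {method}")
    return out


def rmse(truth, est, idx):
    """Root mean square error over the imputed positions only."""
    return math.sqrt(sum((truth[i] - est[i]) ** 2 for i in idx) / len(idx))


def mae(truth, est, idx):
    """Mean absolute error over the imputed positions only."""
    return sum(abs(truth[i] - est[i]) for i in idx) / len(idx)


rng = random.Random(0)
# Illustrative parameters only: scale 40.0, shape 2.0, sample size 50.
truth = [rng.weibullvariate(40.0, 2.0) for _ in range(50)]
holed, idx = make_gaps(truth, 0.10, rng)  # 10% missing
for method in ("mean", "median", "mba", "linear", "locf"):
    est = impute(holed, method)
    print(f"{method:>6}  RMSE={rmse(truth, est, idx):.2f}  "
          f"MAE={mae(truth, est, idx):.2f}")
```

The values drawn here are independent, so the series has no temporal structure and the ranking it prints need not match the one reported in the study. Repeating the loop for both sample sizes and each missing percentage would mirror the simulation design in outline.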
