ENHANCING PUBLIC HEALTH SURVEILLANCE: SARIMAX MODEL INCORPORATING BAIDU SEARCH INDEX FOR HCV PREDICTION IN CHINA



Abstract

Background: Hepatitis C virus (HCV) has become a serious global public health issue. Existing studies have confirmed the potential role of internet search data in predicting infectious diseases. This study aims to explore predictive models suitable for HCV, providing references and recommendations for actively responding to the World Health Organization's strategy to eliminate HCV by 2030.

Methods: We collected the publicly available daily reported case numbers of HCV and the daily search volume of HCV-related keywords in the Baidu Search Index (BSI) from January 2011 to September 2023. Keywords highly correlated with HCV were identified using Spearman rank correlation and time series cross-correlation analysis, and were then used to construct a comprehensive search index.
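The keyword screening step can be illustrated with a minimal sketch. It assumes two aligned monthly series (search volume for one keyword and reported cases) and checks the Spearman correlation at several lags. The function names, the `max_lag` value, and the toy numbers are all hypothetical; the study's actual thresholds and the way keywords are weighted into the comprehensive index are not stated here, so they are not reproduced.

```python
from statistics import mean

def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def lagged_spearman(search, cases, max_lag=3):
    """Correlate search volume in month t with cases in month t + lag."""
    out = {}
    for lag in range(max_lag + 1):
        s = search[: len(search) - lag] if lag else search
        out[lag] = spearman(s, cases[lag:])
    return out

def best_lag(search, cases, max_lag=3):
    """Lag (in months) with the highest rank correlation."""
    corr = lagged_spearman(search, cases, max_lag)
    return max(corr, key=corr.get)

# Toy data (not from the study): search volume leads cases by one month.
cases = [10, 12, 15, 11, 9, 14, 18, 13]
search = [12, 15, 11, 9, 14, 18, 13, 16]
print(best_lag(search, cases))
```

In this toy series the search volume is just the case series shifted one month earlier, so the correlation is strongest at a lag of one month. The study itself reports a lag of 0 months for its selected keywords.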

Finally, seasonal autoregressive integrated moving average (SARIMA) and seasonal autoregressive integrated moving average with exogenous variables (SARIMAX) models were developed based on monthly HCV case numbers and the comprehensive search index (CSI). The performance of these predictive models was evaluated using mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE).

Results: From January 2011 to September 2023, China reported a cumulative total of 2,949,160 HCV cases, with a monthly average of 19,276 cases. HCV incidence is highest in March each year and dips slightly in February, exhibiting a seasonal cycle. Influenced by the COVID-19 pandemic, HCV incidence decreased notably in 2020 and 2022, but overall it has been on an upward trend.

After correlation analysis, a total of five keywords were included, with a potential time lag of 0 months for the keywords. The selected SARIMA(0,1,1)(2,1,1)(12) + CSI(Lag = 0) model produced lower errors in both fitting and forecasting than the SARIMA(0,1,1)(2,1,1)(12) model: MAE (706.7052 versus 759.1066 and 1754.05 versus 3940.86), RMSE (973.3811 versus 1123.343 and 2733.02 versus 4846.49), and MAPE (3.72% versus 4.08% and 0.12% versus 0.24%).

Conclusions: The findings suggest that the SARIMA(0,1,1)(2,1,1)(12) + CSI(Lag = 0) model exhibits superior predictive performance, maintaining accuracy even when faced with the public health emergency of COVID-19.

The SARIMAX model proposed in this study can provide more robust support for achieving the goal of eliminating HCV by 2030.
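For readers who want to run a similar model comparison, the three error metrics named in the Methods (MAE, RMSE, MAPE) can be computed as in the sketch below. These are the standard textbook definitions; the function names and the toy numbers are hypothetical and are not taken from the study.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return (squared / len(actual)) ** 0.5

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100 * ratios / len(actual)

# Toy monthly case counts and forecasts (illustrative only).
actual = [100, 200, 400]
predicted = [110, 190, 380]

print(round(mae(actual, predicted), 2))
print(round(rmse(actual, predicted), 2))
print(round(mape(actual, predicted), 2))
```

MAE and RMSE are expressed in the same units as the case counts, while MAPE is relative, so it allows comparison across periods with different case volumes. RMSE penalizes large individual misses more heavily than MAE.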
