Kaur S, Larsen E, Harper J, Purandare B, Uluer A, Hasdianda MA, Umale NA, Killeen J, Castillo E, Jariwala S. J Med Internet Res 2023;25:e44410 doi: 10.2196/44410
Abstract
Background: Vocal biomarker–based machine learning approaches have shown promising results in the detection of various health conditions, including respiratory diseases, such as asthma.
Objective: This study aimed to determine whether a respiratory-responsive vocal biomarker (RRVB) model platform initially trained on an asthma and healthy volunteer (HV) data set can differentiate patients with active COVID-19 infection from asymptomatic HVs by assessing its sensitivity, specificity, and odds ratio (OR).
Methods: A logistic regression model using a weighted sum of voice acoustic features was previously trained and validated on a data set of approximately 1700 patients with a confirmed asthma diagnosis and a similar number of healthy controls. The same model has shown generalizability to patients with chronic obstructive pulmonary disease, interstitial lung disease, and cough. In this study, 497 participants (female: n=268, 53.9%; <65 years old: n=467, 94%; Marathi speakers: n=253, 50.9%; English speakers: n=223, 44.9%; Spanish speakers: n=25, 5%) were enrolled across 4 clinical sites in the United States and India and provided voice samples and symptom reports on their personal smartphones. The participants included symptomatic patients who were COVID-19 positive or negative as well as asymptomatic HVs. The RRVB model performance was assessed by comparing its output with the clinical diagnosis of COVID-19 confirmed by reverse transcriptase–polymerase chain reaction.
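As a rough illustration of what "a logistic regression model using a weighted sum of voice acoustic features" means, the following minimal sketch maps a feature vector to a score between 0 and 1. The feature names, weights, bias, and 0.5 decision threshold are all hypothetical placeholders chosen for illustration; the abstract does not disclose the actual features or coefficients of the RRVB model.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def respiratory_score(features, weights, bias):
    """Weighted sum of acoustic features passed through the logistic function."""
    z = bias + sum(weights[name] * features[name] for name in weights)
    return sigmoid(z)


# Hypothetical feature names and coefficients (NOT the RRVB model's values).
weights = {"pitch_variability": -0.8, "pause_rate": 1.2, "jitter": 0.5}
bias = -0.3

# Hypothetical standardized feature values for one voice sample.
sample = {"pitch_variability": -0.4, "pause_rate": 0.9, "jitter": 0.2}

score = respiratory_score(sample, weights, bias)
flagged = score >= 0.5  # assumed threshold, for illustration only
print(f"score={score:.3f}, flagged={flagged}")
```

In a sketch like this, moving the threshold trades sensitivity against specificity, which is why both are reported alongside the OR.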
Results: The ability of the RRVB model to differentiate patients with respiratory conditions from healthy controls was previously demonstrated on validation data in asthma, chronic obstructive pulmonary disease, interstitial lung disease, and cough, with ORs of 4.3, 9.1, 3.1, and 3.9, respectively. In this study, the same RRVB model applied to COVID-19 achieved a sensitivity of 73.2%, specificity of 62.9%, and OR of 4.64 (P<.001). Patients who experienced respiratory symptoms were detected more frequently than patients without respiratory symptoms and completely asymptomatic patients (sensitivity: 78.4% vs 67.4% vs 68%, respectively).
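The reported sensitivity, specificity, and OR can be cross-checked against one another. The sketch below assumes the OR is the diagnostic odds ratio, that is, the odds of a positive model result among patients with COVID-19 divided by the odds of a positive result among those without it; the abstract does not state how the OR was computed, and the function name is illustrative.

```python
def diagnostic_odds_ratio(sensitivity, specificity):
    """Diagnostic odds ratio from sensitivity and specificity (both in (0, 1)).

    DOR = [sens / (1 - sens)] / [(1 - spec) / spec]
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    odds_positive_if_diseased = sensitivity / (1.0 - sensitivity)
    odds_positive_if_healthy = (1.0 - specificity) / specificity
    return odds_positive_if_diseased / odds_positive_if_healthy


# Rounded values reported in the Results.
print(round(diagnostic_odds_ratio(0.732, 0.629), 2))  # prints 4.63
```

Under this assumption the rounded percentages give approximately 4.63, in line with the reported 4.64; the small gap is what one would expect if the published OR was computed from unrounded counts.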
Conclusions: The RRVB model has shown good generalizability across respiratory conditions, geographies, and languages. Results on a data set of patients with COVID-19 demonstrate its meaningful potential to serve as a prescreening tool for identifying individuals at risk for COVID-19 infection in combination with temperature and symptom reports. Although the RRVB model is not a COVID-19 test, these results suggest that it can encourage targeted testing. Moreover, the generalizability of this model for detecting respiratory symptoms across different linguistic and geographic contexts suggests a potential path for the development and validation of voice-based tools for broader disease surveillance and monitoring applications in the future.