LLM Chatbot Enhances Primary-to-Specialist Care Transitions: Randomized Controlled Trial
Summary of Statistical Analysis Methods
This text details the statistical methods used in a study comparing three groups: PreA-only, PreA-human, and No-PreA. Here’s a breakdown:
1. Baseline Covariate Balance:
* Methods: ANOVA (for continuous variables) and the chi-squared test (for categorical variables) were used to check that the groups were comparable at the start of the study.
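As a rough illustration of the balance check described above, the sketch below runs a one-way ANOVA on a continuous covariate and a chi-squared test on a categorical one. It assumes SciPy is available; the covariates (age, sex) and all values are hypothetical and do not come from the study.

```python
from scipy import stats

# Hypothetical baseline data for the three arms (illustrative values only).
age_by_group = {
    "PreA-only": [34, 45, 52, 29, 61, 47],
    "PreA-human": [38, 41, 55, 33, 58, 49],
    "No-PreA": [36, 44, 50, 31, 60, 46],
}

# Hypothetical contingency table for a categorical covariate.
# Rows: female, male. Columns: PreA-only, PreA-human, No-PreA.
sex_counts = [
    [3, 4, 3],
    [3, 2, 3],
]

# One-way ANOVA across the three arms for the continuous covariate.
f_stat, p_age = stats.f_oneway(*age_by_group.values())

# Chi-squared test of independence for the categorical covariate.
chi2, p_sex, dof, expected = stats.chi2_contingency(sex_counts)

print(f"ANOVA: F = {f_stat:.3f}, P = {p_age:.3f}")
print(f"Chi-squared: chi2 = {chi2:.3f}, dof = {dof}, P = {p_sex:.3f}")
```

A large P value here is read as no evidence of imbalance between the arms, not as proof that the arms are identical.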
2. Intergroup Comparisons (Healthcare Delivery):
* Primary Comparison: PreA-only vs. No-PreA. The relative treatment effect was calculated as (difference in means) / (mean of No-PreA group).
* Tests:
* Two-sample Student’s *t*-tests (unequal variances) for approximately normal data.
* Mann–Whitney *U*-test for skewed distributions.
* Significance: *P* < 0.05 (two-tailed).
* Multiple Comparisons: Benjamini-Hochberg procedure was used to adjust for multiple testing.
* Subgroup Analysis: Consistency of findings was checked across demographic and socioeconomic subgroups.
* Software: Python v.3.7 and R v.4.3.0.
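The relative treatment effect and the Benjamini-Hochberg adjustment listed above can be sketched in plain Python. This is a minimal illustration, not the study's code: the function names and example numbers are hypothetical, and the difference in means is assumed to be taken as PreA-only minus No-PreA.

```python
from statistics import mean


def relative_effect(treated, control):
    """(difference in means) / (mean of the control group).

    Assumption: the difference is treated minus control.
    """
    return (mean(treated) - mean(control)) / mean(control)


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up outcome values for illustration only.
effect = relative_effect([12, 14, 16], [10, 10, 10])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(f"relative effect = {effect:.2f}")
print("adjusted P values:", [round(p, 4) for p in adjusted])
```

An adjusted P value below 0.05 would then be treated as significant under the two-tailed threshold stated above.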
3. Matched-Pairs Analysis (Physician Workload & Patient Waiting Times):
* Physician Workload: Matched pairs based on medical specialty, age group, sex, and professional title. Outcome: number of patients seen per shift. Wilcoxon signed-rank test used for comparison.
* Patient Waiting Times: Matched pairs based on medical specialty, age group, sex, professional title, and working week (instead of shift). Matching also considered the number of patients seen per shift. Wilcoxon signed-rank test used for comparison.
* Significance: Both Wilcoxon signed-rank tests were two-sided.
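To make the matched-pairs comparison concrete, here is a small pure-Python sketch of an exact two-sided Wilcoxon signed-rank test. It assumes no tied absolute differences and drops zero differences; the function name and the paired values are hypothetical. In practice a library routine such as `scipy.stats.wilcoxon` would typically be used instead.

```python
def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Assumptions: zero differences are dropped and the remaining
    absolute differences are all distinct (no ties).
    Returns (W+, two-sided P value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)

    # Rank pairs by absolute difference (rank 1 = smallest).
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    w_plus = sum(rank for rank, i in enumerate(order, start=1) if diffs[i] > 0)

    # Null distribution of W+: every subset of ranks {1..n} is equally
    # likely to be the positive ones. Count subsets by their rank sum.
    max_sum = n * (n + 1) // 2
    counts = [0] * (max_sum + 1)
    counts[0] = 1
    for r in range(1, n + 1):
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]

    total = 2 ** n
    p_low = sum(counts[: w_plus + 1]) / total   # P(W+ <= observed)
    p_high = sum(counts[w_plus:]) / total       # P(W+ >= observed)
    return w_plus, min(1.0, 2 * min(p_low, p_high))


# Made-up patients-per-shift values for eight matched physician pairs.
with_prea = [22, 25, 21, 27, 24, 30, 26, 29]
without_prea = [20, 21, 22, 24, 19, 23, 20, 21]

w_plus, p_value = wilcoxon_signed_rank_exact(with_prea, without_prea)
print(f"W+ = {w_plus}, two-sided P = {p_value:.4f}")
```

The test works on the within-pair differences, so the quality of the matching matters more than the size of the raw groups.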
4. Analysis of Clinical Notes:
* Classification Analysis: To detect differences in clinical notes between PreA-assisted groups and the No-PreA group.
* Data: Randomly selected subset of notes (n=291, 285, 300 for each group respectively).
* Method: A binary classifier was trained to distinguish PreA-assisted notes from No-PreA notes.
* Data Split: Training and test sets (2:1 ratio).
* Performance Metric: F1 score (harmonic mean of precision and recall).
* Significance: A statistically significant F1 score exceeding a baseline (ΔF1 > 0.02) would suggest distinguishable clinical decision-making patterns. The baseline represents performance under the null hypothesis of no difference between groups (random classification).
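The ΔF1 criterion can be illustrated with a short sketch. The summary does not say how the null baseline was computed, so the label-permutation approach below is an assumption, as are the function names, the toy labels, and the number of permutations.

```python
import random
from statistics import mean


def f1_score(y_true, y_pred):
    """F1 for the positive class (1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def null_f1_scores(y_true, y_pred, n_perm=1000, seed=0):
    """F1 scores after shuffling the true labels (assumed null model)."""
    rng = random.Random(seed)
    shuffled = list(y_true)
    scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        scores.append(f1_score(shuffled, y_pred))
    return scores


# Toy test-set labels: 1 = PreA-assisted note, 0 = No-PreA note.
y_true = [1] * 10 + [0] * 10
y_pred = [1] * 10 + [0] * 10  # a classifier that separates the groups

observed = f1_score(y_true, y_pred)
null_scores = null_f1_scores(y_true, y_pred)
delta_f1 = observed - mean(null_scores)
# Permutation P value with the usual +1 correction.
p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + len(null_scores))

print(f"F1 = {observed:.3f}, delta F1 = {delta_f1:.3f}, P = {p_value:.4f}")
```

Under this reading, a ΔF1 above 0.02 with a small permutation P value would indicate that the notes from the two groups are distinguishable.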
In essence, the study employs a combination of standard statistical tests (t-tests, Mann-Whitney U, ANOVA, Chi-squared) alongside more specialized methods like matched-pairs analysis and machine learning (binary classification) to rigorously evaluate the impact of the PreA intervention.
