Adaptive Baseline Calibration for Voice Stress Assessment in Speech Disfluency Monitoring

Kozak, Nazar

Abstract

Voice stress assessment systems commonly use fixed thresholds to map acoustic features (jitter, shimmer, F0 variability) to stress levels. We show that fixed thresholds produce highly skewed stress score distributions when applied to diverse speakers: on the SEP-28K dataset, 61.4% of clips are scored as high-stress (≥0.8), which is likely an artifact of inter-speaker vocal variability rather than genuine variation in stress. We propose an adaptive baseline algorithm that uses Welford's online algorithm for per-speaker calibration, followed by exponential moving average (EMA) tracking. Applied to 14,645 clips with valid pitch estimates, the adaptive approach produces a more symmetric distribution (μ=0.530, σ=0.162) with substantially fewer extreme scores. We additionally report that YIN-based pitch detection achieves an F0 extraction rate of 98.1% on SEP-28K, compared with 12.1% for naive autocorrelation. We discuss implications for pediatric speech applications, where children's vocal characteristics differ substantially from those of adults.
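The abstract names the two ingredients of the adaptive baseline (Welford's online mean/variance per speaker, then EMA tracking) but not how they combine into a score. The following is a minimal sketch under stated assumptions, not the paper's implementation: one scalar acoustic feature per clip, a warm-up of five clips before scoring, a logistic mapping of the speaker-relative z-score to [0, 1], and an EMA factor of 0.3. The names `WelfordStats`, `AdaptiveStressScorer`, `min_baseline`, and `alpha` are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class WelfordStats:
    """Running mean and variance via Welford's online algorithm."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)  # uses the *updated* mean

    @property
    def variance(self) -> float:
        # Sample variance (n - 1 denominator); 0.0 until two observations.
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


@dataclass
class AdaptiveStressScorer:
    """Hypothetical scorer: z-score each clip against the speaker's own
    baseline of prior clips, squash to [0, 1], then smooth with an EMA."""
    alpha: float = 0.3      # EMA smoothing factor (assumed value)
    min_baseline: int = 5   # clips required before scoring (assumed value)
    baselines: Dict[str, WelfordStats] = field(default_factory=dict)
    ema: Dict[str, float] = field(default_factory=dict)

    def score(self, speaker: str, feature: float) -> Optional[float]:
        stats = self.baselines.setdefault(speaker, WelfordStats())
        result = None
        if stats.count >= self.min_baseline:
            std = math.sqrt(stats.variance)
            z = (feature - stats.mean) / std if std > 0 else 0.0
            z = max(-30.0, min(30.0, z))       # avoid overflow in exp()
            raw = 1.0 / (1.0 + math.exp(-z))   # logistic: z = 0 maps to 0.5
            prev = self.ema.get(speaker, raw)  # first score seeds the EMA
            result = self.alpha * raw + (1.0 - self.alpha) * prev
            self.ema[speaker] = result
        stats.update(feature)  # fold this clip into the speaker's baseline
        return result
```

Because each speaker is compared only with their own history, a speaker whose jitter is habitually high is not pushed toward the top of the scale, which is the kind of inter-speaker bias the abstract attributes to fixed thresholds. Scoring before updating keeps a clip from diluting its own deviation.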