Adaptive Baseline Calibration for Voice Stress Assessment in Speech Disfluency Monitoring

Nazar Kozak

engrXiv Preprint, April 2026 · Planned submission to Speech Communication

Abstract

Voice stress assessment systems commonly apply fixed thresholds to acoustic features (jitter, shimmer, F0 variability) to classify speech into stress levels. We show that fixed thresholds produce highly skewed stress score distributions when applied to diverse speakers: on the SEP-28K dataset, 61.4% of clips are scored as high-stress (≥0.8), likely an artifact of inter-speaker vocal variability rather than genuine stress variation. We propose an adaptive baseline method that uses Welford's online algorithm for per-speaker calibration, followed by exponential moving average tracking. Applied to 14,645 clips with valid pitch estimates, the adaptive approach produces a more symmetric distribution (μ=0.530, σ=0.162) with substantially fewer extreme scores. We additionally report that YIN-based pitch detection achieves a 98.1% F0 extraction rate on SEP-28K, compared to 12.1% with naive autocorrelation. We discuss implications for pediatric speech applications, where children's vocal characteristics differ substantially from those of adults.
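
To make the calibration idea in the abstract concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a single scalar feature per clip (for example jitter), a warm-up phase that accumulates the speaker's mean and variance with Welford's online algorithm, and a later phase that tracks both with an exponential moving average. The class name `SpeakerBaseline`, the `warmup` and `alpha` parameters, and the logistic mapping from z-score to a 0..1 score are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math
from dataclasses import dataclass


@dataclass
class SpeakerBaseline:
    """Running baseline for one acoustic feature of one speaker (illustrative)."""

    warmup: int = 10     # assumed: number of clips used for initial calibration
    alpha: float = 0.1   # assumed: EMA smoothing factor after warm-up
    n: int = 0           # clips seen so far
    mean: float = 0.0    # current baseline mean
    m2: float = 0.0      # Welford's running sum of squared deviations
    var: float = 0.0     # current baseline variance

    def update(self, x: float) -> None:
        """Fold one new feature value into the baseline."""
        self.n += 1
        delta = x - self.mean
        if self.n <= self.warmup:
            # Welford's online update: numerically stable mean and variance.
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
            self.var = self.m2 / (self.n - 1) if self.n > 1 else 0.0
        else:
            # Exponentially weighted mean and variance, so the baseline
            # can drift slowly with the speaker over time.
            self.mean += self.alpha * delta
            self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)

    def score(self, x: float) -> float:
        """Map x to (0, 1) relative to the baseline; 0.5 means 'at baseline'.

        The logistic squashing is an assumption of this sketch, not the
        scoring function used in the paper.
        """
        std = math.sqrt(self.var)
        if self.n < 2 or std == 0.0:
            return 0.5  # no usable baseline yet
        z = (x - self.mean) / std
        z = max(-20.0, min(20.0, z))  # clamp to keep exp() well-behaved
        return 1.0 / (1.0 + math.exp(-z))
```

In use, one such object would be kept per speaker and per feature: call `update` with each clip's feature value, then `score` to rate a new value against that speaker's own baseline rather than a global threshold.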

Keywords
Voice stress analysis, adaptive baseline, Welford's algorithm, pitch detection, YIN, pediatric speech, speaker normalization, F0 variability
Status
Preprint on engrXiv (April 2026) · Planned submission to Speech Communication
Key Results
Fixed thresholds: 61.4% of clips scored high-stress (≥0.8; likely overestimation) · Adaptive baseline: more symmetric distribution (μ=0.530, σ=0.162) · YIN: 98.1% F0 extraction rate vs 12.1% for naive autocorrelation
Dataset
SEP-28K (14,645 clips with valid pitch estimates)
Author
Nazar Kozak — Kozak Technologies Inc., Los Angeles, CA, USA
Contact
nzrkzk@gmail.com · ORCID
