Staff Software Engineer at Yahoo · IEEE Senior Member
On-device AI, mobile SDK architecture, speech signal processing
We present DisfluoSDK, an on-device framework for real-time speech disfluency detection and voice stress analysis on iOS. Its CNN models (617K parameters, 1.2 MB) achieve sub-millisecond Core ML inference. The models are evaluated on SEP-28K with episode-grouped cross-validation.
A large-scale analysis (N = 14,645) showing that all correlations between voice stress features and disfluency labels are negligible (|r| < 0.05, Cohen's d < 0.04). The result supports assessing stress and disfluency separately in multimodal systems.
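To make the two effect-size measures quoted above concrete, here is a minimal sketch. It assumes Pearson's r between a feature and a 0/1 label, and Cohen's d with a pooled standard deviation; the blurb does not say which variants the study used. The function names and toy numbers are hypothetical and are not data from the analysis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (assumed variant)."""
    na, nb = len(group_a), len(group_b)
    mean_a = sum(group_a) / na
    mean_b = sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd


# Toy example: one stress feature, split by a binary disfluency label.
feature = [2.0, 4.0, 6.0, 1.0, 3.0, 5.0]
label = [1, 1, 1, 0, 0, 0]
r = pearson_r(feature, label)
d = cohens_d(feature[:3], feature[3:])
```

With a binary label, Pearson's r is the point-biserial correlation, so r and d describe the same group difference on two different scales.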
A Welford-based adaptive baseline algorithm for per-speaker voice stress calibration. Fixed thresholds overestimate stress (61.4% high-stress), whereas the adaptive approach produces a symmetric distribution. YIN pitch detection achieves a 98.1% F0 rate.
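The idea of a Welford-based per-speaker baseline can be sketched as follows. This is an illustrative sketch, not the published algorithm: it keeps a running mean and variance with Welford's online update and scores each new feature value relative to that speaker's own history. The class name `AdaptiveBaseline`, the z-score output, and the sample values are assumptions made for the example.

```python
import math


class AdaptiveBaseline:
    """Running per-speaker mean and variance via Welford's online algorithm."""

    def __init__(self):
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new feature value into the baseline in O(1) time and memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        """Sample standard deviation; 0.0 until at least two samples are seen."""
        if self.n < 2:
            return 0.0
        return math.sqrt(self.m2 / (self.n - 1))

    def zscore(self, x):
        """Deviation of x from this speaker's baseline, in standard deviations."""
        s = self.std()
        return 0.0 if s == 0.0 else (x - self.mean) / s


# Hypothetical feature values (for example, per-frame pitch in Hz) for one speaker.
baseline = AdaptiveBaseline()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    baseline.update(value)
relative_stress = baseline.zscore(9.0)
```

Because the score is measured against the speaker's own mean and spread, a naturally high-pitched speaker is not flagged as stressed by default, which is the failure mode a fixed threshold invites.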
A system for runtime DEX loading on Android that solved the 65K method limit for mobile ad mediation SDKs. Deployed at Appodeal, serving millions of apps.