Staff Software Engineer at Yahoo · IEEE Senior Member
On-device AI, mobile SDK architecture, speech signal processing
A 616K-parameter CNN trained on SEP-28k that predicts whether the next three-second window will contain a disfluency. Stratified evaluation reveals a severity-selective predictive signal: block prediction reaches AUC 0.601 [0.554, 0.651] and sound-repetition prediction AUC 0.617 [0.567, 0.667] (95% bootstrap CIs exclude chance), while fillers and word repetitions stay at chance. Cross-corpus zero-shot transfer to a pediatric Children-Who-Stutter cohort reaches AUC 0.66. Lossless export to CoreML / ONNX / TFLite, with 0.25 ms Apple Neural Engine latency on iPhone 17 Pro Max.
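The bracketed intervals above are percentile bootstrap CIs on AUC. A minimal sketch of that idea, assuming a plain resample over individual windows (the actual evaluation may resample by episode or speaker instead); the function names `auc` and `bootstrap_auc_ci` are illustrative and not taken from the project:

```python
import random

def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for AUC; resamples windows with replacement (assumption)."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        # Skip degenerate resamples that contain only one class.
        if 0 < sum(ys) < len(ys):
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int((1 - level) / 2 * n_boot)]
    hi = stats[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

An interval whose lower bound sits above 0.5 is what "excludes chance" means here.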
DisfluoSDK, an on-device framework for real-time speech disfluency detection on iOS. CNN models (617K params, 1.2 MB) achieve sub-millisecond CoreML inference. 5-class evaluation on SEP-28k with episode-grouped cross-validation; cross-platform export fidelity verified across PyTorch and CoreML.
Large-scale analysis (N=14,645) showing voice stress markers are orthogonal to disfluency labels. TOST equivalence testing rejects non-equivalence for all 20 correlations at Bonferroni-adjusted α=0.0025. Predictive baselines under episode-grouped 5-fold CV land at chance (AUC 0.51-0.55, 95% CIs cover 0.5).
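The Bonferroni threshold follows from 0.05 / 20 = 0.0025. The sketch below shows one common way to run a TOST equivalence test on a correlation, using the Fisher z-transform. The equivalence bound of |r| < 0.1 is an assumption for illustration only; the bound used in the analysis is not stated here, and `tost_correlation` is a hypothetical helper name.

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tost_correlation(r, n, bound, alpha):
    """Two one-sided tests that the true correlation lies within (-bound, +bound).

    Uses the Fisher z-transform, whose standard error is 1 / sqrt(n - 3).
    Returns (p_value, equivalent), where p_value is the larger one-sided p.
    """
    se = 1.0 / math.sqrt(n - 3)
    z = math.atanh(r)
    zb = math.atanh(bound)
    p_lower = 1.0 - norm_cdf((z + zb) / se)  # H0: rho <= -bound
    p_upper = norm_cdf((z - zb) / se)        # H0: rho >= +bound
    p = max(p_lower, p_upper)
    return p, p < alpha

# Bonferroni adjustment for 20 correlations at a family-wise 0.05.
alpha = 0.05 / 20

# Illustrative call: r = 0.01 is a made-up value, bound = 0.1 is an assumption.
p, equivalent = tost_correlation(r=0.01, n=14645, bound=0.1, alpha=alpha)
```

Rejecting both one-sided nulls is what supports the claim of equivalence; a merely non-significant correlation would not.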
Welford-based adaptive baseline algorithm for per-speaker voice stress calibration. Fixed thresholds overestimate stress (61.4% high-stress); the adaptive approach produces a symmetric distribution. YIN pitch detection achieves 98.1% F0 rate.
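Welford's method updates a running mean and variance one sample at a time, which lets a baseline adapt to each speaker without storing history. A minimal sketch, assuming the calibration step is a z-score against that running baseline; the class name `RunningBaseline` and the z-score step are illustrative assumptions, not the published algorithm:

```python
import math

class RunningBaseline:
    """Welford's online mean and variance for one speaker's feature stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the already-updated mean, which keeps the recurrence stable.
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        """Sample standard deviation; 0.0 until at least two samples are seen."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def zscore(self, x):
        """Deviation of x from this speaker's own baseline (assumed calibration step)."""
        s = self.std
        return (x - self.mean) / s if s > 0 else 0.0
```

Scoring against a per-speaker baseline rather than a fixed cutoff is the reason a naturally high-pitched or fast speaker is not flagged as stressed by default.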
A system for runtime DEX loading on Android that solved the 65K method limit for mobile ad mediation SDKs. Deployed at Appodeal, serving millions of apps.
Smoothed transcription mode: recovers the speaker's intended words and drops disfluencies. WER 20.46% on FluencyBank Adults Who Stutter (vs 26.04% baseline, −21.4% relative). The 13.5 MB adapter applies on top of base Whisper-small.
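The relative figure is the WER change divided by the baseline WER: (20.46 − 26.04) / 26.04 ≈ −21.4%. A minimal sketch of word-level WER and that calculation; the helper names and toy sentences are illustrative, and real scoring pipelines usually normalize text first:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)

def relative_change(new, baseline):
    """Percent change of `new` relative to `baseline` (negative means improvement)."""
    return (new - baseline) / baseline * 100.0

# Toy example: a disfluent transcript scored against the intended words.
intended = "i want to go home"
disfluent = "i i want to uh go home"
toy_wer = wer(intended, disfluent)

# Figures quoted in the text above.
reported = relative_change(20.46, 26.04)
```

Against an intended-words reference, every retained repetition or filler counts as an insertion error, which is why dropping disfluencies lowers WER in this mode.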
Verbatim transcription mode: preserves filled pauses (uh, um), exact word repetitions, and partial-word fragments for clinical / research use. WER 20.29% on held-out validation (−22% relative).