Nazar Kozak

Staff Software Engineer at Yahoo · IEEE Senior Member
On-device AI, mobile SDK architecture, speech signal processing

Publications

Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device

Nazar Kozak · Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processing, 2026 · Under review

A 616K-parameter CNN trained on SEP-28K that predicts whether the next three-second window will contain a disfluency. Stratified evaluation shows that the predictive signal is severity-selective: blocks reach AUC 0.601 [0.554, 0.651] and sound repetitions reach AUC 0.617 [0.567, 0.667] (95% bootstrap CIs exclude chance), while fillers and word repetitions stay at chance. Zero-shot cross-corpus transfer to a pediatric Children-Who-Stutter cohort reaches AUC 0.66. The model exports losslessly to CoreML, ONNX, and TFLite, with 0.25 ms Apple Neural Engine latency on an iPhone 17 Pro Max.

On-Device Multi-Type Disfluency Detection with Sub-Millisecond Inference on Apple Silicon

Nazar Kozak · Submitted to Computer Speech & Language (Elsevier), 2026 · Under review

DisfluoSDK, an on-device framework for real-time speech disfluency detection on iOS. CNN models (617K params, 1.2 MB) achieve sub-millisecond CoreML inference. 5-class evaluation on SEP-28K with episode-grouped cross-validation; cross-platform export fidelity verified across PyTorch and CoreML.

Acoustic Voice-Stress Markers Are Empirically Separable from Speech-Disfluency Labels at Clip Level: A Large-Scale Equivalence and Predictive-Baseline Analysis on SEP-28K

Nazar Kozak · Submitted to Speech Communication (Elsevier), 2026 · Under review

Large-scale analysis (N=14,645) showing that voice stress markers are orthogonal to disfluency labels. TOST equivalence testing rejects non-equivalence for all 20 correlations at Bonferroni-adjusted α=0.0025. Predictive baselines under episode-grouped 5-fold CV land at chance (AUC 0.51–0.55, 95% CIs include 0.5).

Adaptive Baseline Calibration for Voice Stress Assessment

Nazar Kozak · engrXiv, 2026 · Preprint

Welford-based adaptive baseline algorithm for per-speaker voice stress calibration. Fixed thresholds overestimate stress (61.4% labeled high-stress); the adaptive approach produces a symmetric distribution. YIN pitch detection achieves a 98.1% F0 rate.
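
For readers unfamiliar with the Welford update this preprint builds on, here is a minimal sketch of a per-speaker running baseline. It is the textbook online mean/variance recurrence, not the preprint's actual implementation; the class name `RunningBaseline`, its methods, and the z-score normalization are illustrative assumptions.

```python
import math


class RunningBaseline:
    """Textbook Welford online mean/variance for one speaker's feature stream.

    Hypothetical illustration: names and the z-score step are assumptions,
    not taken from the preprint.
    """

    def __init__(self) -> None:
        self.n = 0        # number of samples seen so far
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Welford recurrence: numerically stable, single pass, O(1) memory.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two samples exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def zscore(self, x: float) -> float:
        # Express a new measurement relative to this speaker's own baseline.
        s = self.std
        return (x - self.mean) / s if s > 0 else 0.0
```

Scoring each new measurement against the speaker's own running statistics, rather than against a fixed global threshold, is one way a baseline can adapt per speaker as more audio arrives.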

Dynamic Multidex Loading for Cross-Platform SDK Integration

Nazar Kozak · Zenodo, 2026 · Published

A system for runtime DEX loading on Android that overcomes the 65K (65,536) method-reference limit of a single DEX file for mobile ad mediation SDKs. Deployed at Appodeal, serving millions of apps.

Software & Models

whisper-small-disfluent-smoothed-lora

LoRA adapter for OpenAI Whisper-small · ASR for stuttered / disfluent speech

Smoothed transcription mode: recovers the speaker's intended words and drops disfluencies. WER 20.46% on FluencyBank Adults Who Stutter (vs. 26.04% baseline, −21.4% relative). 13.5 MB adapter, applied on top of base Whisper-small.
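
The relative figure quoted for the smoothed adapter can be checked directly from the two reported WERs:

```latex
\[
\frac{\mathrm{WER}_{\text{adapter}} - \mathrm{WER}_{\text{baseline}}}{\mathrm{WER}_{\text{baseline}}}
= \frac{20.46 - 26.04}{26.04}
= \frac{-5.58}{26.04}
\approx -0.214
\]
```

That is a 5.58-point absolute reduction, or roughly 21.4% relative to the baseline.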

whisper-small-disfluent-verbatim-lora

LoRA adapter for OpenAI Whisper-small · verbatim ASR (preserves disfluencies)

Verbatim transcription mode: preserves filled pauses (uh, um), exact word repetitions, and partial-word fragments for clinical / research use. WER 20.29% on held-out validation (−22% relative).

Recognition & Service

Google Scholar · ORCID: 0009-0001-8858-6098 · GitHub · HuggingFace · nzrkzk@gmail.com