Nazar Kozak — Publications

Staff Software Engineer at Yahoo · IEEE Senior Member
On-device AI, mobile SDK architecture, speech signal processing

Publications

Predicting Upcoming Stuttering Events from Three-Second Audio: Stratified Evaluation Reveals Severity-Selective Precursors, and the Model Deploys Fully On-Device

Nazar Kozak · Preprint (arXiv), 2026

A 616K-parameter CNN trained on SEP-28k that predicts whether the next three-second window will contain a disfluency. Stratified evaluation shows that the predictive signal is severity-selective: block prediction reaches AUC 0.601 [0.554, 0.651] and sound-repetition prediction reaches AUC 0.617 [0.567, 0.667] (95% bootstrap CIs exclude chance), while fillers and word repetitions stay at chance. Zero-shot cross-corpus transfer to a pediatric Children-Who-Stutter cohort reaches AUC 0.66. The model exports losslessly to CoreML, ONNX, and TFLite, with 0.25 ms Apple Neural Engine latency on an iPhone 17 Pro Max.

arXiv:2604.27279 · DOI
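The bracketed intervals in the abstract above are bootstrap confidence intervals on AUC. A minimal sketch of one common way to obtain such an interval follows, assuming a plain percentile bootstrap over clips; the paper's actual resampling scheme is not described on this page, and the function names and toy data are hypothetical.

```python
import random

def auc(labels, scores):
    """Rank-based AUC: chance that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUC (assumed scheme, resampling clips with replacement)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # resample lacked one class; draw again
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int(round((alpha / 2) * n_boot))]
    hi = stats[int(round((1 - alpha / 2) * n_boot)) - 1]
    return lo, hi

# Toy data only: weakly informative scores, not results from the paper.
toy_rng = random.Random(1)
toy_labels = [toy_rng.randint(0, 1) for _ in range(200)]
toy_scores = [0.3 * y + toy_rng.random() for y in toy_labels]
toy_lo, toy_hi = bootstrap_auc_ci(toy_labels, toy_scores, n_boot=200)
```

An interval "excludes chance" in this sense when its lower bound stays above 0.5.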

On-Device Multi-Type Disfluency Detection with Sub-Millisecond Inference on Apple Silicon

Nazar Kozak · Preprint (engrXiv), 2026

DisfluoSDK, an on-device framework for real-time speech disfluency detection on iOS. CNN models (617K params, 1.2 MB) achieve sub-millisecond CoreML inference. 5-class evaluation on SEP-28K with episode-grouped cross-validation; cross-platform export fidelity verified across PyTorch and CoreML.

PDF · Details · engrXiv

Acoustic Voice-Stress Markers Are Empirically Separable from Speech-Disfluency Labels at Clip Level: A Large-Scale Equivalence and Predictive-Baseline Analysis on SEP-28K

Nazar Kozak · Submitted to Speech Communication (Elsevier), 2026 · Under review

Large-scale analysis (N=14,645) showing that acoustic voice-stress markers are empirically separable from disfluency labels at clip level. TOST equivalence testing rejects non-equivalence for all 20 correlations at a Bonferroni-adjusted α = 0.0025. Predictive baselines under episode-grouped 5-fold CV land at chance (AUC 0.51–0.55; 95% CIs cover 0.5).

PDF · Details · engrXiv

Adaptive Baseline Calibration for Voice Stress Assessment

Nazar Kozak · Preprint (engrXiv), 2026

A Welford-based adaptive baseline algorithm for per-speaker voice stress calibration. Fixed thresholds overestimate stress (61.4% classified as high-stress); the adaptive approach produces a symmetric distribution. YIN pitch detection achieves a 98.1% F0 rate.

PDF · Details · engrXiv

Dynamic Multidex Loading for Cross-Platform SDK Integration

Nazar Kozak · Preprint (Zenodo), 2026

A system for runtime DEX loading on Android that works around the 65K (65,536) method-reference limit of a single DEX file when integrating cross-platform SDKs. Originally developed at Appodeal.

PDF · Details · DOI

Software & Models

whisper-small-disfluent-smoothed-lora

LoRA adapter for OpenAI Whisper-small · ASR for stuttered / disfluent speech

Smoothed transcription mode — recovers the speaker's intended words and drops disfluencies. WER 20.46% on FluencyBank Adults Who Stutter (vs 26.04% baseline, −21.4% relative). 13.5 MB adapter, applies on top of base Whisper-small.

HuggingFace Hub

whisper-small-disfluent-verbatim-lora

LoRA adapter for OpenAI Whisper-small · verbatim ASR (preserves disfluencies)

Verbatim transcription mode — preserves filled pauses (uh, um), exact word repetitions, and partial-word fragments for clinical / research use. WER 20.29% on held-out validation (−22% relative).

HuggingFace Hub

Recognition & Service

IEEE Senior Member (No. 101490965) — peer-reviewed elevation
Fellow, IET (Institution of Engineering and Technology) — application under assessment, 2026
Toptal Top 3% Network Member since 2021