NEWS
whisper 0.3.0 (2026-03-13)
- Language auto-detection:
transcribe() now defaults to language = NULL,
which detects the spoken language from the audio before decoding. New
exported function detect_language() for standalone language identification.
Breaking: previous default was language = "en". Code relying on the
default now auto-detects instead of assuming English. Pass language = "en"
explicitly to restore old behavior.
- Segment-level and word-level timestamps via DTW alignment
- Beam search decoding with temperature sampling and fallback
- SDPA attention (FlashAttention on GPU)
whisper_pipeline() for cached model reuse across multiple transcriptions
- Hardcoded special token table (eliminates
added_tokens.json download)
- Fixed invalid multibyte string crash in BPE decoder
- Fixed DTW boundary guards and seek loop in
transcribe_chunk()
whisper 0.1.0 (2026-02-06)
- Initial CRAN submission
- Native R torch implementation of OpenAI Whisper
- Support for all model sizes: tiny, base, small, medium, large-v3
- Automatic model download from HuggingFace
- Model-specific special token handling for large-v3 compatibility
- KV caching for efficient autoregressive decoding
- Long audio chunking for files longer than 30 seconds
- Optional timestamp and segment extraction