Package: whisper 0.3.0

Troy Hernandez

whisper: Native R 'torch' Implementation of 'OpenAI' 'Whisper'

Speech-to-text transcription using a native R 'torch' implementation of the 'OpenAI' 'Whisper' model <https://github.com/openai/whisper>. Supports multiple model sizes, from tiny (39M parameters) to large-v3 (1.5B parameters), with integrated download from 'HuggingFace' <https://huggingface.co/> via the 'hfhub' package. Provides automatic speech recognition with optional language detection and translation to English. Audio preprocessing, mel spectrogram computation, and transformer-based encoder-decoder inference are all implemented in R using the 'torch' package.

Authors: Troy Hernandez [aut, cre], cornball.ai [cph], OpenAI [cph]

whisper_0.3.0.tar.gz
whisper_0.3.0.zip (r-4.7) | whisper_0.3.0.zip (r-4.6) | whisper_0.3.0.zip (r-4.5)
whisper_0.3.0.tgz (r-4.6-any) | whisper_0.3.0.tgz (r-4.5-any)
whisper_0.3.0.tar.gz (r-4.7-any) | whisper_0.3.0.tar.gz (r-4.6-any)
whisper_0.3.0.tgz (r-4.6-emscripten)
manual.pdf | manual.html
card.svg | card.png
whisper/json (API)
NEWS

# Install 'whisper' in R:
install.packages('whisper', repos = c('https://cornball-ai.r-universe.dev', 'https://cloud.r-project.org'))
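
Once the package is installed, a first transcription call might look like the sketch below. This page lists only the exported names, not their signatures, so the argument names (`model = "tiny"`) and the `text` element of the result are assumptions to check against the help manual. The file name `audio.wav` is a hypothetical placeholder. The call is guarded so the snippet does nothing harmful when the package or the file is missing.

```r
# Hypothetical input file; replace with the path to a real recording.
audio_path <- "audio.wav"

if (requireNamespace("whisper", quietly = TRUE) && file.exists(audio_path)) {
  # Assumption: transcribe() takes a file path and a model size name,
  # and returns a list with a `text` element holding the transcript.
  result <- whisper::transcribe(audio_path, model = "tiny")
  print(result$text)
} else {
  message("Install 'whisper' and supply an audio file to run this sketch.")
}
```

The first run of a given model size will probably download weights from 'HuggingFace', so a network connection is likely needed at that point.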

Bug tracker: https://github.com/cornball-ai/whisper/issues

4.45 score | 7 stars | 9 scripts | 544 downloads | 14 exports | 35 dependencies

Last updated from: 1a34db6b99. Checks: 9 OK. Indexed: yes.

Target | Result | Time
linux-devel-x86_64 | OK | 137
source / vignettes | OK | 174
linux-release-x86_64 | OK | 144
macos-release-arm64 | OK | 130
macos-oldrel-arm64 | OK | 89
windows-devel | OK | 129
windows-release | OK | 92
windows-oldrel | OK | 91
wasm-release | OK | 106

Exports: audio_to_mel, detect_language, download_whisper_model, list_downloaded_models, list_whisper_models, load_audio, load_whisper_model, model_exists, transcribe, whisper_config, whisper_device, whisper_dtype, whisper_pipeline, whisper_tokenizer

Dependencies: askpass, av, bit, bit64, callr, cli, coro, curl, desc, farver, filelock, fs, glue, hfhub, httr, jsonlite, labeling, lifecycle, magrittr, mime, openssl, processx, ps, R6, RColorBrewer, Rcpp, rlang, safetensors, scales, sys, torch, triebeard, urltools, viridisLite, withr

Readme and manuals

Help Manual

Help page | Topics
Apply BPE Merges | apply_bpe
Apply Timestamp Token Rules | apply_timestamp_rules
Get Audio Duration | audio_duration
Convert Audio to Mel Spectrogram | audio_to_mel
Beam Search Decode | beam_search_decode
Build Reverse Byte Decoder | build_byte_decoder
Convert Byte to BPE Token | byte_to_token
Clean Transcribed Text | clean_text
Compression Ratio | compression_ratio
Compute STFT Magnitude | compute_stft
Word-Level Timestamp Alignment | compute_word_timestamps
Copy Weight if Exists | copy_if_exists
Create Decoder from Config | create_decoder
Create Encoder from Config | create_encoder
Create Mel Filterbank (Fallback) | create_mel_filterbank_fallback
Decode BPE Bytes Back to Text | decode_bpe_bytes
Decode Timestamp Token | decode_timestamp
Decode with Temperature Fallback | decode_with_fallback
Language Detection | detect_language
Detect Language from Mel Spectrogram | detect_language_from_mel
Detect Language from Pipeline | detect_language_from_pipeline
Download Tokenizer Files from HuggingFace | download_tokenizer_files
Download Model from HuggingFace | download_whisper_model
DTW Alignment | dtw_align
Ensure Tokenizer Files are Downloaded | ensure_tokenizer_files
Expand KV Cache for Beam Search | expand_kv_cache
Extract Segments with Timestamps | extract_segments
Forced Decode | forced_decode
Get Initial Decoder Tokens | get_initial_tokens
Get Model Cache Path | get_model_path
Get Path to Model Weights | get_weights_path
Greedy Decoding | greedy_decode
Group Subword Tokens into Words | group_into_words
Convert Hz to Mel Scale | hz_to_mel
Check if Token is Timestamp | is_timestamp_token
List Downloaded Models | list_downloaded_models
List Available Models | list_whisper_models
Load and Preprocess Audio | load_audio
Load Decoder Weights | load_decoder_weights
Load Encoder Weights | load_encoder_weights
Load Pre-computed Mel Filterbank | load_mel_filterbank
Load Whisper Model | load_whisper_model
Load Weights from Safetensors | load_whisper_weights
1D Median Filter | medfilt1
Convert Mel Scale to Hz | mel_to_hz
Check if Model is Downloaded | model_exists
Pad or Trim Audio to Fixed Length | pad_or_trim
Parse Device Argument | parse_device
Parse Dtype Argument | parse_dtype
Rearrange KV Cache by Beam Indices | rearrange_kv_cache
Sample Decode | sample_decode
Split Long Audio into Chunks | split_audio
Decode Token IDs to Text | tokenizer_decode
Encode Text to Token IDs | tokenizer_encode
Transcribe Audio | transcribe
Transcribe Single Chunk | transcribe_chunk
Transcribe Long Audio | transcribe_long
Whisper Encoder | whisper_attention
Whisper Model Configurations | whisper_config
Text Decoder | whisper_decoder
Whisper Decoder | whisper_decoder_layer
Device and Dtype Management | whisper_device
Get Default Dtype | whisper_dtype
Audio Encoder | whisper_encoder
Encoder Layer | whisper_encoder_layer
Get Language Code from Token ID | whisper_lang_from_id
Get Language Token ID | whisper_lang_token
Whisper Language Table | whisper_language_table
Whisper Model | whisper_model
Whisper Transcription | whisper_pipeline
Audio Preprocessing for Whisper | WHISPER_SAMPLE_RATE
Special Token IDs | whisper_special_tokens
Whisper BPE Tokenizer | whisper_tokenizer