Audio

Misc

  • Packages
    • {carelesswhisper} - Speech-to-text (automatic speech recognition) and translation in R via whisper.cpp; can use Hugging Face and ggerganov models; no dependencies
  • Strong Baseline Model: ResNet/EffNet
    • These are typically image models, but they produce good results on audio once the waveform is converted to a spectrogram (a time-by-frequency image) and fed to the model like any other image
    • Also see this guide for suitable baseline models: link
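
The audio-to-spectrogram step mentioned above can be sketched as follows. This is a minimal illustration, not a recommended pipeline: it uses only the Python standard library and a naive DFT so each step is visible, whereas real projects would normally use an FFT-based library and often a mel scale. The function names (`hann`, `dft_magnitudes`, `log_spectrogram`), the frame and hop sizes, and the synthetic 100 Hz test tone are all assumptions made for the example.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (reduces spectral leakage)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins, via a naive O(n^2) DFT."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def log_spectrogram(signal, frame_len=64, hop=32, eps=1e-10):
    """Slice the signal into overlapping windowed frames and return
    log-magnitudes in dB: rows are time frames, columns are frequency bins."""
    window = hann(frame_len)
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        rows.append([20 * math.log10(m + eps) for m in dft_magnitudes(frame)])
    return rows


# Hypothetical input: one second of a 100 Hz sine sampled at 800 Hz.
sample_rate = 800
tone_hz = 100
signal = [math.sin(2 * math.pi * tone_hz * t / sample_rate)
          for t in range(sample_rate)]

spec = log_spectrogram(signal)
print(len(spec), "frames x", len(spec[0]), "frequency bins")
```

The resulting 2-D array is what gets passed to the image model, usually after resizing and after repeating the single channel three times if the network expects RGB input. Each frequency bin spans `sample_rate / frame_len` Hz, so with these settings the 100 Hz tone falls in bin 8.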