LinuxQuestions.org
Old 02-01-2023, 10:24 PM   #1
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,762

Rep: Reputation: 263
Whisper: a speech transcriber you can download


This guy in The New Yorker thinks highly of it: https://www.newyorker.com/tech/annal...modular-future .

I haven't been able to get it to run yet. I get
Quote:
whisper_model_load: invalid model data (bad magic)

https://github.com/ggerganov/whisper.cpp

Later:

I got it to work: one needs to download a database first to build successfully. I also can't get the '-m' switch, which points to the directory of the databases, to work, so I run it from its own directory.
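A "bad magic" error usually means the loader did not find the marker it expects at the start of the model file, for example because the download was incomplete or saved an error page. The sketch below checks the first four bytes. It assumes the ggml model format of that period, which as far as I know begins with the 32-bit little-endian value 0x67676d6c; that constant and the helper name `looks_like_ggml` are assumptions for illustration, not something stated in this thread.

```python
import struct

# Assumption: ggml model files of this era begin with this 32-bit magic number.
GGML_MAGIC = 0x67676D6C


def looks_like_ggml(path):
    """Return True if the file starts with the assumed ggml magic number."""
    with open(path, "rb") as fh:
        head = fh.read(4)
    if len(head) < 4:
        return False  # empty or truncated file
    # Assumes the file was written on a little-endian machine.
    (magic,) = struct.unpack("<I", head)
    return magic == GGML_MAGIC
```

If the check fails, downloading the model again is the first thing to try.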

It works well: it gets many rare words, punctuates mostly correctly, which means it gets the ends of sentences correct. It identifies quotes, usually correctly. It gets most proper nouns (i.e., capitalized) correct.

Last edited by RandomTroll; 02-05-2023 at 08:34 PM. Reason: More information.
 
Old 02-07-2023, 07:03 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 14,887

Rep: Reputation: 2060
Very interesting. Runs on a RazPi, huh? Two questions:
  1. Where do you get the database?
  2. Do you know how big a chunk of speech it will take at a time?
One other thing: is it half or full duplex? Will it translate and listen at the same time?
 
Old 02-07-2023, 04:16 PM   #3
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,696

Rep: Reputation: 2025
It seems like this only makes sense to use on Macs?

https://github.com/ggerganov/whisper.cpp/issues/430
Quote:
Whisper is the original speech recognition model created and released by OpenAI. It is implemented in Python and supports running both on the CPU and on the GPU.

whisper.cpp is a custom inference implementation of the same model. It's implemented in C/C++ and runs only on the CPU. It tries to provide the same functionality as the original, but there are differences in the implementation.

In terms of speed, Whisper running on a GPU would in general be more efficient than the other options - multiple times faster on modern GPUs. However, if you want to run the model on a CPU, in some cases whisper.cpp can give you advantage. For example, currently on Apple Silicon, whisper.cpp should be faster.

In terms of accuracy, Whisper is the "gold standard". whisper.cpp should be similar and sometimes worse.
 
Old 02-08-2023, 06:53 AM   #4
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,762

Original Poster
Rep: Reputation: 263
Quote:
Originally Posted by business_kid
Where do you get the database?
From the same place I got the source. I didn't read the README before I built the first time.

Quote:
Originally Posted by business_kid
Do you know how big a chunk of speech it will take at a time?
I've used it on hour-long files. It takes 4 hours on my 14-year-old single CPU 4 processor computer for an hour's worth of speech.

Quote:
Originally Posted by business_kid
is it half or full duplex? Will it translate and listen at the same time?
Half.

Quote:
Originally Posted by ntubski
It seems like this only makes sense to use on Macs?
I use it on not-Mac.
 
Old 02-08-2023, 12:03 PM   #5
mjolnir
Member
 
Registered: Apr 2003
Posts: 802

Rep: Reputation: 88
Will it work from a list of files or would a person have to script it to do so? Thanks for posting this, it sounds very interesting.
 
Old 02-08-2023, 12:12 PM   #6
suramya
Member
 
Registered: Jan 2022
Location: Earth
Distribution: Debian
Posts: 246

Rep: Reputation: 100
Thanks for sharing. This looks very interesting.
 
Old 02-08-2023, 11:16 PM   #7
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,762

Original Poster
Rep: Reputation: 263
Quote:
Originally Posted by mjolnir
Will it work from a list of files or would a person have to script it to do so?
I don't know. How's thor?
 
Old 02-09-2023, 07:42 AM   #8
mjolnir
Member
 
Registered: Apr 2003
Posts: 802

Rep: Reputation: 88
Quote:
Originally Posted by RandomTroll
I don't know. How's thor?
Unworthy.
 
Old 02-21-2023, 05:57 AM   #9
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,762

Original Poster
Rep: Reputation: 263
After 20 transcriptions of hour-long programs that took about 4 hours each, the 21st took 10 hours, and the 22nd spent 6 hours on 5 minutes of audio before I killed it. I downloaded the latest version and it seems to be working as well as it did before. Odd.
 
Old 02-21-2023, 12:33 PM   #10
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 14,887

Rep: Reputation: 2060
4 hours working on 1 hour of speech? I doubt if I'll try it on a RazPi, then.

I got ffmpeg installed on the RazPi. But whereas my box re-encodes 1080p videos at about 3.75x, the RazPi only achieved 0.2x. So it would take 5 hours to re-encode a 1-hour video.
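The arithmetic behind these figures is just a ratio of media length to wall-clock time. A minimal sketch, using only the numbers quoted in this thread (the function names are made up for illustration):

```python
def hours_needed(media_hours, speed):
    """Wall-clock hours to process media at `speed` times real time."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return media_hours / speed


def speed_of(media_hours, wall_hours):
    """Speed as a multiple of real time, given how long the job took."""
    if wall_hours <= 0:
        raise ValueError("wall_hours must be positive")
    return media_hours / wall_hours


# ffmpeg on the RazPi at 0.2x: a 1-hour video takes 5 hours.
print(hours_needed(1, 0.2))
# Whisper taking 4 hours for 1 hour of speech runs at 0.25x real time.
print(speed_of(1, 4))
```

By this measure the laptop transcribes at about 0.25x, in the same range as the RazPi's 0.2x for re-encoding, though the two workloads are not directly comparable.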
 
Old 02-21-2023, 11:32 PM   #11
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,762

Original Poster
Rep: Reputation: 263
Quote:
Originally Posted by mjolnir
Will it work from a list of files or would a person have to script it to do so?
It accepts multiple input files.
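For anyone who would rather drive it from a script, here is a minimal sketch that builds a single invocation for several files. It assumes the binary is called `./main`, that input files are given as plain arguments after the options, and that `-m` takes the path to a model file (as far as I can tell from the program's usage text, it expects a file rather than a directory). The model path and the WAV file names are placeholders.

```python
import shlex


def build_command(wav_files, model="models/ggml-base.en.bin", binary="./main"):
    """Build one whisper.cpp command line covering several input files.

    Assumptions: the binary name, the default model path, and passing
    inputs as positional arguments after the options.
    """
    if not wav_files:
        raise ValueError("need at least one input file")
    return [binary, "-m", model, *wav_files]


cmd = build_command(["talk1.wav", "talk2.wav"])  # hypothetical file names
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the command as a list avoids quoting problems with file names that contain spaces.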
Quote:
Originally Posted by business_kid
4 hours working on 1 hour of speech? I doubt if I'll try it on a RazPi,
I have a 14-year-old laptop. I hope a modern Pi is more powerful. It won't hurt to try.
 
Old 02-22-2023, 04:15 AM   #12
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 14,887

Rep: Reputation: 2060
Nope, the Pi will hardly be more powerful.

A twin-core laptop will have two threads per core, and so have a maximum of 400% in top. The Pi is 4-core but only does 1 thread per core. Your laptop will be clocked faster. The cores are also more powerful.

Arm charges a premium for its beefy cores, so people are buying A-72 & A-76 cores because you can bundle those into price-sensitive SoCs. They tend to be fabricated on clunky (by today's standards) wafer fab lines, so current & wattage increase with speed. There's half a motherboard in those SoCs.
 
Old 02-23-2023, 06:29 PM   #13
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,762

Original Poster
Rep: Reputation: 263
Quote:
Originally Posted by business_kid
Nope, the Pi will hardly be more powerful
You can use a smaller model. I use the base. You can try the tiny model, which is half the size.

Last edited by RandomTroll; 02-23-2023 at 11:06 PM. Reason: Corrected.
 
Old 02-26-2023, 10:21 PM   #14
RandomTroll
Senior Member
 
Registered: Mar 2010
Distribution: Slackware
Posts: 1,762

Original Poster
Rep: Reputation: 263
I found a glitch: it takes huge amounts of time on singing that is close to intelligibility. Unintelligible singing it codes as music without taking a lot of time; clear singing it transcribes about as well as talking. It has a switch, -ot, that sets the point in the file at which it starts transcribing. I kill a process hung up on singing and restart it after the singing.
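The restart can be scripted. The sketch below turns a timestamp into an offset and builds the restart command. It assumes `-ot` takes its value in milliseconds (which is how I read the program's help text) and that `-f` names the input file; the binary name, model path, and file name are placeholders.

```python
def to_milliseconds(timestamp):
    """Convert 'HH:MM:SS' or 'MM:SS' to milliseconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds * 1000


def resume_command(wav_file, resume_at, model="models/ggml-base.en.bin",
                   binary="./main"):
    """Build a command that starts transcribing at `resume_at`.

    Assumption: -ot expects the offset in milliseconds.
    """
    offset = str(to_milliseconds(resume_at))
    return [binary, "-m", model, "-ot", offset, "-f", wav_file]


# Skip a song that ends 12 minutes 30 seconds into a hypothetical file.
print(resume_command("program.wav", "00:12:30"))
```

The transcript from the killed run and the one from the restarted run then have to be joined by hand.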
 
Old 03-02-2023, 06:35 AM   #15
mjolnir
Member
 
Registered: Apr 2003
Posts: 802

Rep: Reputation: 88
@RandomTroll - Thanks for the on-going review.
 
  


All times are GMT -5. The time now is 07:49 AM.
