I got it to work: one needs to download a database first to build successfully. I also can't get the '-m' switch, which points to the directory of the databases, to work, so I just run it from that directory instead.
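For reference, a minimal sketch of that sequence, assuming the standard whisper.cpp repository layout (the model and file names here are only examples):

    bash ./models/download-ggml-model.sh base.en    # fetch a ggml model ("database") first
    make                                            # then the build
    ./main -m models/ggml-base.en.bin -f program.wav    # -m points at the downloaded model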
It works well: it gets many rare words and punctuates mostly correctly, which means it gets the ends of sentences right. It identifies quotes, usually correctly. It gets most proper nouns (i.e., capitalization) correct.
Whisper is the original speech recognition model created and released by OpenAI. It is implemented in Python and supports running both on the CPU and on the GPU.
whisper.cpp is a custom inference implementation of the same model. It's implemented in C/C++ and runs only on the CPU. It tries to provide the same functionality as the original, but there are differences in the implementation.
In terms of speed, Whisper running on a GPU would in general be more efficient than the other options - multiple times faster on modern GPUs. However, if you want to run the model on a CPU, in some cases whisper.cpp can give you an advantage. For example, currently on Apple Silicon, whisper.cpp should be faster.
In terms of accuracy, Whisper is the "gold standard". whisper.cpp should be similar and sometimes worse.
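To make the comparison concrete, roughly the same job can be run either way (a sketch; the model and file names are only placeholders):

    whisper program.wav --model base                        # original Python implementation, uses the GPU if available
    ./main -m models/ggml-base.bin -f program.wav -t 4      # whisper.cpp on the CPU; -t sets the thread count

Note that whisper.cpp expects 16 kHz WAV input, so other formats have to be converted first (e.g. with ffmpeg).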
After 20 transcriptions of hour-long programs that took about 4 hours each, the 21st took 10 hours, and the 22nd had spent 6 hours transcribing 5 minutes before I killed it. I downloaded the latest version and it seems to be working as well as it did before. Odd.
4 hours working on 1 hour of speech? I doubt if I'll try it on a RazPi, then.
I got ffmpeg installed on the RazPi. But whereas my box re-encodes 1080p videos at about 3.75x, the RazPi only achieved 0.2x, so it would take 5 hours to re-encode a 1-hour video.
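Those multipliers are the speed= value ffmpeg prints in its progress line while encoding; for example (the file names and codec choice are just illustrative):

    ffmpeg -i input-1080p.mkv -c:v libx264 -c:a copy output.mkv
    # the progress line ends with something like ... speed=3.75x (or speed=0.2x on the Pi)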
A twin-core laptop will have two threads per core, and so show a maximum of 400% in top. The Pi is 4-core but only does 1 thread per core. Your laptop will also be clocked faster, and its cores are more powerful.
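If you want to check what top is counting, standard tools show the core and thread layout:

    lscpu | grep -E 'Core\(s\) per socket|Thread\(s\) per core'
    nproc    # number of logical CPUs; top's per-process %CPU maxes out at 100% times this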
Arm charges a premium for its beefy cores, so people buy A72 & A76 cores because you can bundle those into price-sensitive SoCs. They tend to be fabricated on clunky (by today's standards) wafer fab lines, so current and wattage increase with speed. There's half a motherboard in those SoCs.
I found a glitch: it takes huge amounts of time on singing that is close to intelligibility. Unintelligible singing it codes as music without taking a lot of time; clear singing it transcribes about as well as talking. It has a switch, -ot, that sets the point in the file at which it starts transcribing. I kill a process that gets hung up on singing and restart it after the singing.
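Since -ot takes the offset in milliseconds, restarting after a song that ends at, say, the 25-minute mark looks something like this (the model and file names are placeholders):

    ./main -m models/ggml-base.en.bin -f program.wav -ot 1500000    # 25 min x 60 x 1000 ms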