LinuxQuestions.org - Finding Corrupt MP3s in Library

A while ago, I had a hard drive failure which resulted in me having to rebuild/reinstall the ReiserFS. As a result, some of my MP3 collection was corrupted. I'd say 90%ish of my MP3s are fine, but there are a few that either won't play, or will gum up Amarok. In either case, there are enough corrupted files that Amarok can't finish building a collection (dies at 98% complete).

Since I have over 6000 songs, and I'd really rather not go through and just re-rip/encode everything, is there some sort of script or other doo-dad that will go through my collection and identify corrupt files for me?

Thanks for your help!

Well you could file them (see $(man file)); any file that doesn't identify as MP3 will most probably be corrupt.

This worked for me:

Code:

file * | grep -v 'MP3 file with' | less -S

Needless to say, it was executed in a dir. of all MP3's. If you have more than one dir. of MP3's, you will have to loop through them -- I couldn't find a recurse option for file.

Thanks. That does help with finding the more corrupt ones. Is there a way to pipe a rm command to remove anything that isn't a valid MP3 file?

Also, is there a way to find those MP3s that checkout ok, but are either cut short or are scrambled on playback?

You could try mp3info. For instance, the following 'find' construct will find all the mp3 files in the current directory (and below), and check them for errors. It prints a message if any corrupt ones are found.

Code:

find . -iname '*.mp3' -exec sh -c \
    'curfname="{}";\
      errors=`mp3info -p "%b" "$curfname"`;\
      if [ $errors -gt 0 ];\
      then \
        echo $curfname has $errors errors; \
      fi ' \;

Change the "echo" to an rm if you really want to delete the files.

Hope this is what you want, but be aware that it will pick up mp3's with even a little corruption. These would probably play okay, so you might want to increase the error check value. Also, it'd probably be better to just move the corrupt ones to a temporary directory for checking manually. Have a play.
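For the "move them to a temporary directory" idea, here is a minimal sketch rather than a tested tool. The function name quarantine_bad_mp3s, the threshold argument and the holding directory are all made up for illustration. It assumes mp3info is installed, that the holding directory lives outside the tree being scanned, and that your mv supports -n (GNU and BSD do).

```shell
#!/bin/sh
# Sketch only: quarantine_bad_mp3s is a hypothetical helper, not part of mp3info.
# Usage: quarantine_bad_mp3s SRC_DIR HOLD_DIR [MAX_BAD_FRAMES]
quarantine_bad_mp3s() {
    src=$1                # directory tree to scan
    dest=$2               # holding directory (assumed to be outside SRC_DIR)
    threshold=${3:-0}     # bad frames tolerated before a file is moved

    mkdir -p "$dest" || return 1
    find "$src" -type f -iname '*.mp3' -exec sh -c '
        threshold=$1; dest=$2; shift 2
        for f in "$@"; do
            # %b asks mp3info for the number of bad audio frames
            errors=$(mp3info -p "%b" "$f" 2>/dev/null)
            case $errors in
                ""|*[!0-9]*) continue ;;   # no usable count, leave the file alone
            esac
            if [ "$errors" -gt "$threshold" ]; then
                echo "$f has $errors errors"
                # -n: do not overwrite a same-named file already in the holding dir
                mv -n -- "$f" "$dest"/
            fi
        done
    ' sh "$threshold" "$dest" {} +
}
```

Something like quarantine_bad_mp3s ~/music ~/music-suspect 5 would then move only files reporting more than 5 bad frames. Files with the same name in different directories would collide in the holding directory, so with -n the later ones stay where they are.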

Good luck

edit: Changed it according to archtoad's suggestions. Never realised about Konqueror. And it does make it easier to read. Cheers.

beadyallen,
Please do me & all other Konqueror users a favor: edit your post to break your code into multiple lines -- it's not handled right by the browser, it's triggering horizontal scrolling that makes all posts difficult to read. I believe this is a stylesheet problem & I believe it's inherited from vBulletin because other fora have the same problem. AND your code will be easier to read for everyone. TIA.

dlackovic,
Using find will solve the recursion/looping problem. Of course, it's possible that you might want to process your music dirs. one at a time -- that way you can re-rip/encode the bad ones & put the fresh copies where the old ones were.
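As a rough sketch of what that could look like, the file check from earlier in the thread can be fed by find so that it covers every subdirectory in one pass. The function name list_non_mp3 is hypothetical, and the 'MP3 file with' pattern is simply the one used above; the wording that file prints may differ between versions, so it is worth checking what a known-good MP3 reports first.

```shell
#!/bin/sh
# Sketch only: list_non_mp3 is a hypothetical helper name.
# Prints one line for each *.mp3 file that file(1) does not describe as MP3.
list_non_mp3() {
    # find recurses by default; "{} +" hands many filenames to each run of file
    find "${1:-.}" -type f -iname '*.mp3' -exec file {} + |
        grep -v 'MP3 file with'
}
```

Running list_non_mp3 ~/music | tee ./corruptfiles would then give a single list for the whole collection.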

How do you want to handle the replacement problem? Is it necessary to ID them all immediately, or just the ones that "gum up Amarok"?

Thanks everyone for the info.

I was originally planning on going directory by directory, but archtoad6, are you saying that the Find will recurse through all the directories of my music folder? If that's the case I can redirect that echo part of the above command to dump to say, a text file so that then I'd have a list of all the bad files, right?

Try this:

Code:

find . -maxdepth 1 -iname '*.mp3' -exec sh -c 'curfname="{}";\
      errors=`mp3info -p "%b" "$curfname"`;\
      if [ $errors -gt 0 ];\
      then \
        echo $curfname has $errors errors; \
      fi ' \; | tee ./badfiles.log

Run the command in each of the directories you want to check. It'll create a 'badfiles.log' file (as well as printing on the screen). It's the '-maxdepth 1' that stops it traversing directories.

I also have a large collection that I stream. Occasionally the stream will stop because of one or more bad mp3s. I installed the mp3info package and ran your command, however I get the following error: bash: syntax error near unexpected token `|'. What do I need to do to resolve this?

Hey thanks a bunch, beadyallen. That code works like a dream for me.

Now that I'm looking through the data, I'm noticing that the find didn't log any of the files too corrupt to even be recognized as an mp3. I found that I can make a log file that points out the corrupt files using archtoad6's code:

Code:

file * | grep -v 'MP3 file with' | tee ./corruptfiles

But is there any way to make this recurse through everything, so that I can have 1 corruptfiles file? I'm guessing this could be done with some sort of loop? How would that work?

Quote:

Originally Posted by {BBI}Nexus{BBI} (Post 3111252)

Ignore this, I found the error, I had an extra space.

Quote:

But is there any way to make this recurse through everything, so that I can have 1 corruptfiles file? I'm guessing this could be done with some sort of loop? How would that work?

You could do it with a loop, but why not just use find, like for the others? Maybe I'd better explain a bit about what my find construct does.

Code:

find . -iname '*.mp3' -exec sh -c

The "-iname '*.mp3'" tells find to search for ALL filenames matching *.mp3 (iname just means match upper and lower case, so myfile.Mp3 will get matched as well). Next, '-exec sh -c' tells find that for each match found, execute the command 'sh -c ....'. So it's going to run a shell command for each match. The rest of the code

Code:

curfname="{}";\
      errors=`mp3info -p "%b" "$curfname"`;\
      if [ $errors -gt 0 ];\
      then \
        echo $curfname has $errors errors; \
      fi '

Is just a shell script that detects whether the file is corrupted. The "{}" is a special placeholder that tells find to substitute the current matched filename at this point. So the script first sets the variable curfname to the currently matched file, then runs mp3info on that file. '-p "%b"' is how you print out the number of errors. Have a look at the man page for mp3info for details.
Anyway, now it's got the number of errors in the file, there's a check to see if 'errors' is more than zero (i.e. there IS an error in the file). If there is, then we output something to standard output.
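One caveat with pasting "{}" straight into the script text: a filename containing quotes or a $ can end up being interpreted by the shell. A possible variant, offered as a sketch and not as the method used above, passes the filenames to sh as positional arguments instead. The function name check_mp3s is hypothetical, and mp3info is assumed to be installed.

```shell
#!/bin/sh
# Sketch only: check_mp3s is a hypothetical wrapper name.
check_mp3s() {
    find "${1:-.}" -type f -iname '*.mp3' -exec sh -c '
        # The filenames arrive as "$@", so the shell never parses them as code
        for curfname in "$@"; do
            errors=$(mp3info -p "%b" "$curfname" 2>/dev/null)
            # An empty or non-numeric result makes the test fail quietly
            if [ "${errors:-0}" -gt 0 ] 2>/dev/null; then
                echo "$curfname has $errors errors"
            fi
        done
    ' sh {} +
}
```

The extra "sh" after the script text only fills in $0; the matched files start at $1.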
So, to grab both totally knackered (not even recognized as mp3), and slightly corrupt (a few bad blocks but still 'playable') files, you'd just need to first check if the mp3 was recognized, before you check for bad blocks.
Something like this should work:

Code:

find . -iname '*.mp3' -exec sh -c 'curfname="{}";\
errors=`mp3info -p "%b" "$curfname" 2>&1`; \
notmp3=`echo $errors | grep corrupt`; \
if [ "$notmp3" ]; \
  then echo "$curfname is not even detected as mp3"; \
else \
  if [ $errors -gt 0 ];\
    then echo $curfname has $errors errors; \
  fi; \
fi' \; | tee ./allbad.log

The only change is that there's an extra check to see if the mp3info command was successful. This means you've got to redirect stderr to stdout (hence the 2>&1).
To be honest, the script is getting a bit long, so it might be better to create a separate 'check.sh' file, containing the check lines, and then call that from find, with

Code:

find . -iname '*.mp3' -exec check.sh {} \;
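A minimal sketch of what such a check.sh might contain, assuming it sits in the directory you run find from and has been made executable with chmod +x check.sh. The check_file function name and the extra "unexpected output" branch are my own additions for illustration; the "corrupt" match follows the grep used above.

```shell
#!/bin/sh
# check.sh (sketch): report problem MP3s among the files named on the command line.
# check_file is a hypothetical helper name.
check_file() {
    curfname=$1
    # Capture stderr too, since that is where mp3info complains about bad files
    errors=$(mp3info -p "%b" "$curfname" 2>&1)
    case $errors in
        *corrupt*)
            echo "$curfname is not even detected as mp3" ;;
        ""|*[!0-9]*)
            # Anything else that is not a plain number is reported as-is
            echo "$curfname: unexpected mp3info output: $errors" ;;
        *)
            if [ "$errors" -gt 0 ]; then
                echo "$curfname has $errors errors"
            fi ;;
    esac
}

for f in "$@"; do
    check_file "$f"
done
```

If check.sh is not on your PATH, it would need to be called with an explicit path, for example find . -iname '*.mp3' -exec ./check.sh {} \; | tee ./allbad.log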

Hope this is what you're looking for.

@beadyallen, your script runs fine and recurses. It does identify corrupt mp3s (I see this in the console output) except at the end all I get is an empty allbad.log file :(

Well all I can say is it works okay for me. Are you copying and pasting from the post, or are you typing it in? Sounds like you've made a typo somewhere. Especially as the output seems to be working, just the log file isn't created.
Are you working from the full command, or have you split it up and created a check.sh?