I have used 'mp3report' to make a text file listing all my MP3s, suitable for import into OO-Calc. It looks like the excerpt shown below. Despite how this excerpt looks, the file is not sorted by Artist/Title (fields 2 and 3, if you will). Obviously, if the actual files were all in one directory there could be no duplicates, but they're spread across about 30 different directories.
Code:
0001|April Stevens|Teach Me Tiger|01|2.19 MB|128 kbps|44.1 kHz|02:23
0002|Billy Joe Shaver|I'm Going Crazy in 3-4 Time|01|3.23 MB|128 kbps|44.1 kHz|03:31
0003|Billy Joe Shaver|Old Chunk of Coal|01|2.34 MB|128 kbps|44.1 kHz|02:33
0004|Billy Joe Shaver|Serious Souls|01|2.00 MB|128 kbps|44.1 kHz|02:10
0005|Blue Velvet Band|Hitch Hiker|01|3.32 MB|160 kbps|44.1 kHz|02:53
0006|Blue Velvet Band|Ramblin' Man|01|3.25 MB|160 kbps|44.1 kHz|02:50
0007|Blue Velvet Band|Sittin' on Top of the World|01|3.89 MB|160 kbps|44.1 kHz|03:23
0008|Blue Velvet Band|Somebody Else You've Known|01|2.83 MB|160 kbps|44.1 kHz|02:28
0009|Blue Velvet Band|Sweet Moments|01|2.86 MB|160 kbps|44.1 kHz|02:29
0010|Blue Velvet Band|The Knight Upon the Road|01|4.21 MB|160 kbps|44.1 kHz|03:40
0011|Blue Velvet Band|Weary Blues From Waitin'|01|3.51 MB|160 kbps|44.1 kHz|03:03
0012|Blue Velvet Band|You'll Find Her Name Written There|01|3.16 MB|160 kbps|44.1 kHz|02:45
0013|Bonnie Raitt|Let me In|01|3.36 MB|128 kbps|44.1 kHz|03:40
0014|Burl Ives|Time|01|2.70 MB|128 kbps|44.1 kHz|02:57
.
.
.
I would like to have a program or script that will scan the entire file and identify any Artist/Title duplicates.
If necessary, I could of course create another file that contains only that (Artist/Title) data... but if I could scan the file as is, that would be even better.
I've looked through Google and these forums, but the suggestions I've seen for identifying duplicate files are based on file size or some sort of hashing scheme rather than file names. I could easily have the same song twice with different file sizes, so that won't work.
Up to now, I've moved such a file to Windows, and used MS Access to identify the duplicates. For obvious reasons, I'd like to stop doing that. Suggestions would be appreciated.
I'm pretty sure you're going to need human intervention to positively ID all the dupes, but if the artist and titles are identical for many of them, you could do this:
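One way to get there, as a sketch rather than the exact commands originally posted: pull out fields 2 and 3, sort them, keep only the pairs that repeat, and then use that list to fish the full records out of the listing. The function names `list_dupe_keys` and `show_dupes` are made up for illustration, and the sketch assumes that `|` never appears inside an artist or title. It uses `grep -F` so that characters like `.` or `?` in a title are matched literally instead of being read as a regular expression.

```shell
# List every Artist|Title pair (fields 2 and 3) that occurs more than once.
# Assumption: '|' is only ever used as the field separator.
list_dupe_keys() {
    cut -d'|' -f2,3 "$1" | sort | uniq -d
}

# Print the full records whose Artist|Title pair is duplicated.
# The key list is saved in a file named "dupes" in the current directory.
# -F makes grep treat each key as a literal string, not a regular expression.
show_dupes() {
    list_dupe_keys "$1" > dupes
    grep -F -f dupes "$1"
}
```

Calling `show_dupes mp3.txt` would print the matching records and leave the key list in `dupes` for inspection. Note that grep matches a key anywhere in a line, so a key that happens to be a substring of a longer artist or title can still produce a false hit; a human check of the output is still worthwhile.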
All right! Those commands, exactly as you gave them, work perfectly.
Code:
debian:~$ grep -f dupes mp3.txt
3612|June Carter Cash|Meeting in the Air|18|1.88 MB|128 kbps|44.1 kHz|02:02
3769|June Carter Cash|Meeting in the Air|19|1.88 MB|128 kbps|44.1 kHz|02:03
4684|Cluster Pluckers|Keep on the Sunny Side|25|2.64 MB|128 kbps|44.1 kHz|02:52
4981|Cluster Pluckers|Keep on the Sunny Side|27|2.65 MB|128 kbps|44.1 kHz|02:53
debian:~$
Code:
0001|April Stevens|Teach Me Tiger|01|2.19 MB|128 kbps|44.1 kHz|02:23
0002|Billy Joe Shaver|I'm Going Crazy in 3-4 Time|01|3.23 MB|128 kbps|44.1 kHz|03:31
0003|Billy Joe Shaver|Old Chunk of Coal|01|2.34 MB|128 kbps|44.1 kHz|02:33
0004|Billy Joe Shaver|Serious Souls|01|2.00 MB|128 kbps|44.1 kHz|02:10
0013|Bonnie Raitt|Let me In|01|3.36 MB|128 kbps|44.1 kHz|03:40
0014|Burl Ives|Time|01|2.70 MB|128 kbps|44.1 kHz|02:57
3612|June Carter Cash|Meeting in the Air|18|1.88 MB|128 kbps|44.1 kHz|02:02
3769|June Carter Cash|Meeting in the Air|19|1.88 MB|128 kbps|44.1 kHz|02:03
xxxx|June Carter Cash|Keep on the Sunny Side|25|2.64 MB|128 kbps|44.1 kHz|02:52
4684|Cluster Pluckers|Keep on the Sunny Side|25|2.64 MB|128 kbps|44.1 kHz|02:52
4981|Cluster Pluckers|Keep on the Sunny Side|27|2.65 MB|128 kbps|44.1 kHz|02:53
0011|Blue Velvet Band|Weary Blues From Waitin'|01|3.51 MB|160 kbps|44.1 kHz|03:03
0012|Blue Velvet Band|You'll Find Her Name Written There|01|3.16 MB|160 kbps|44.1 kHz|02:45
Notice the made-up line "xxxx", added to make sure that two artists doing the same title aren't reported as a duplicate.
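For a check like this one, a sketch of an alternative that compares fields 2 and 3 exactly, rather than searching for the key text anywhere in the line, is an awk script that reads the file twice. The function name `show_dupes_awk` is hypothetical, and the same assumption applies: no `|` inside an artist or title.

```shell
# Print every record whose Artist|Title pair (fields 2 and 3) appears
# more than once. The file is passed to awk twice: the first pass counts
# each pair, the second pass prints the records whose pair was seen repeatedly.
show_dupes_awk() {
    awk -F'|' '
        NR == FNR { count[$2 FS $3]++; next }   # first pass: count pairs
        count[$2 FS $3] > 1                     # second pass: print repeats
    ' "$1" "$1"
}
```

Run as `show_dupes_awk mp3.txt`, this needs no intermediate file, keeps the records in their original order, and only counts a pair when both the artist and the title match in full. Wrapping the key in awk's `tolower()` would additionally catch pairs that differ only in capitalization, if that is wanted.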