Objective:
Match the "Title" and "Artist" of each song in file 1, "Playlist", to the "Title" and "Artist" in file 2, "Music-List". Both lists are tab-delimited.
Print the result only if both the song title and the artist match the Playlist; otherwise print "Not Found".
Playlist: <artist> <title>
ZZ Top Tush
Peter Gabriel Sledgehammer (Remix)
Music-List: <artist> <title> <bitrate> <path>
ZZ Top Tush 32000 /Music/Z/ZZ Top - Tush.mp3
Peter Gabriel 32000 /Music/P/Peter Gabriel - Sledgehammer.mp3
Cactus Jack Tush 128000 /Music/C/Cactus Jack - Tush.mp3
Others' suggestions:
1. Use awk's index() function to avoid problems with special characters (-, ., [, etc.).
2. Use an awk array for the Playlist file's <artist> <title> pairs.
For example, the song "Tush" appears twice with different artists. If a line in the Playlist searches for "Tush", it should also check for
the artist "ZZ Top", to avoid accidentally printing the other "Tush" song by Cactus Jack.
Could someone please recommend a one-liner to accomplish this?
Thank you much.
Last edited by Seemoi; 06-21-2012 at 12:11 PM.
Reason: Corrections
Please use [code][/code] tags around your code and data, to preserve formatting and to improve readability. Please do not use quote tags, colors, or other fancy formatting.
Unless the files are formatted with a clear delimiter between the fields that's not found anywhere in the text itself, I can't see any easy or reliable way to accomplish this. It will certainly take more than a one-liner.
On the other hand, it would probably be trivial to do if the input was, say, tab-delimited.
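To make the delimiter point concrete, here is a small sketch (count_fields is a made-up name for illustration) that counts how many fields awk sees in the line "ZZ Top<TAB>Tush", first with awk's default whitespace splitting and then with -F'\t':

```shell
#!/bin/sh
# Hypothetical helper: print the number of fields awk finds in one line,
# first with the default separator (runs of blanks), then with a single tab.
count_fields() {
    printf '%s\n' "$1" | awk '{ print NF }'
    printf '%s\n' "$1" | awk -F'\t' '{ print NF }'
}

# Artist and title separated by one tab character.
count_fields "$(printf 'ZZ Top\tTush')"
```

With default splitting the line has three fields ("ZZ", "Top", "Tush"), so there is no way to tell where the artist ends and the title begins; with a tab separator it has exactly two.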
BTW, is this a homework question? It reads like one.
Quote:
Do not post homework assignments verbatim. We're happy to assist if you have specific questions or have hit a stumbling point, however. Let us know what you've already tried and what references you have used (including class notes, books, and searches) and we'll do our best to help. Keep in mind that your instructor might also be an LQ member.
I would also be curious what output you would expect from your example. Assuming, as David has mentioned, that an appropriate delimiter is used, the only match to your criteria would be the ZZ Top entry... is this correct?
No code was entered.
Both lists are Tab delimited.
This is not homework, I am consolidating many music playlists.
Expected result? OK...
Playlist Entry: ZZ Top Tush
For each entry in Playlist...
* Search for the title "Tush" (Playlist field 2) in field 2 of Music-List
* If it matches, check for the artist "ZZ Top" (Playlist field 1) in field 1 of that Music-List line
* If both title and artist match, print the Music-List line
Result: ZZ Top Tush 32000 /Music/Z/ZZ Top - Tush.mp3
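A minimal sketch of the steps above, assuming both files are strictly tab-delimited with the artist in field 1 and the title in field 2, and that exact matches are wanted (lookup_playlist is a hypothetical name). Instead of searching the Music-List once per playlist entry, it reads the Music-List once into an awk array keyed on artist and title together, so "Tush" can only match along with its own artist; each playlist line then prints the stored Music-List line(s) or "Not Found", as the original objective asked.

```shell
#!/bin/sh
# Hypothetical helper. Usage: lookup_playlist PLAYLIST MUSIC_LIST
# For each "artist<TAB>title" line in PLAYLIST, print the matching
# MUSIC_LIST line(s), or "Not Found" when there is no exact match.
lookup_playlist() {
    awk -F'\t' '
        # First file read (MUSIC_LIST): index each line by artist and title.
        NR == FNR {
            key = $1 "\t" $2
            if (key in found)
                found[key] = found[key] "\n" $0   # keep duplicates too
            else
                found[key] = $0
            next
        }
        # Second file read (PLAYLIST): both artist and title must match.
        {
            key = $1 "\t" $2
            if (key in found)
                print found[key]
            else
                print "Not Found"
        }
    ' "$2" "$1"
}
```

Run it as lookup_playlist Playlist Music-List. With the sample data, the ZZ Top entry prints its Music-List line and the Peter Gabriel entry prints "Not Found", because its title is missing from the Music-List. Note that NR == FNR only identifies the first file if that file is non-empty.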
Actually, if you read a little more closely, David said that both code and data benefit from being placed in code tags.
Quote:
Both lists are Tab delimited.
This is an important detail to have left out, and it would also have been preserved had you used code tags.
Quote:
Result: ZZ Top Tush 32000 /Music/Z/ZZ Top - Tush.mp3
So just to confirm again, based on the following input data:
Code:
$ cat playlist
ZZ Top Tush
Peter Gabriel Sledgehammer (Remix)
$ cat music-list
ZZ Top Tush 32000 /Music/Z/ZZ Top - Tush.mp3
Peter Gabriel 32000 /Music/P/Peter Gabriel - Sledgehammer.mp3
Cactus Jack Tush 128000 /Music/C/Cactus Jack - Tush.mp3
Then the result is still only ZZ Top?
Assuming all the above is correct, I would say awk is your friend, and there are several examples here on LQ, never mind the rest of the net.
The standard approach is:
1. Store the relevant fields (here, artist and title) of each line of the first file in an array
2. Check whether the corresponding fields of each line of the second file are in the array
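The two steps above map onto the common awk idiom for reading two files, where NR == FNR holds only while the first file is being read. A minimal sketch, assuming tab-delimited input and exact artist/title matches (filter_music_list is a hypothetical name):

```shell
#!/bin/sh
# Hypothetical helper. Usage: filter_music_list PLAYLIST MUSIC_LIST
# Prints the MUSIC_LIST lines whose artist and title appear in PLAYLIST.
filter_music_list() {
    awk -F'\t' '
        # Step 1: while reading PLAYLIST, store each artist/title pair as a key.
        NR == FNR { wanted[$1 "\t" $2]; next }
        # Step 2: print MUSIC_LIST lines whose first two fields were stored.
        ($1 "\t" $2) in wanted
    ' "$1" "$2"
}
```

Joining the two fields with a tab, a character that cannot occur inside a tab-delimited field, keeps the artist/title pair unambiguous, so "Cactus Jack" + "Tush" never matches "ZZ Top" + "Tush".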
Here is a link to the gawk manual, if you do not already have one:
Code:
#!/usr/bin/mawk -f
BEGIN {
    # Each line (using any convention) is a separate record.
    # Also remove any leading and trailing whitespace on a line.
    RS = "[\t\v\f ]*(\r\n|\n\r|\r|\n)[\t\v\f ]*"
    # Fields are separated by a single tab character.
    FS = "[\t]"
    # For output, use linefeeds.
    ORS = "\n"
    # Input file number.
    file = 0
}

# Increase input file number before processing its first record.
(FNR == 1) {
    file++
}

# Compute an associative array key from the artist and song names (for all input files).
{
    # Start with "artist" "|" "song", converted to lower case.
    key = tolower($1 "|" $2)
    # Remove all non-alphanumeric characters from the key.
    gsub(/[^0-9a-z|]+/, "", key)
}

# First file is the playlist we compare against. Remember the keys seen here.
(file == 1) {
    playlist[key]
}

# All other files are music lists.
(file > 1) {
    # Output the record to standard output, if this artist-song
    # was listed in the initial playlist.
    # Otherwise, output the record to standard error.
    if (key in playlist)
        printf("%s%s", $0, ORS)
    else
        printf("%s%s", $0, ORS) > "/dev/stderr"
}
The script takes two or more file names. The first names the file containing just the artist and song names. For all the other files, the script will output each record to standard output if its artist and song were named in the first file, and to standard error otherwise.
The idea is that the first two fields from each record in the first file are saved as keys in the associative array playlist. Usually there are some typos in the names, so the script converts the array key to lowercase, then removes everything except digits, letters and the separator. A pipe character is used to keep the artist and song names separate (so that "Artist" "Song Name" and "Artist Song" "Name" are distinguishable from each other).
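To see what the normalized keys look like, here is a small sketch that applies the same two steps (tolower on artist "|" title, then deleting everything except 0-9, a-z and "|") to a single artist/title pair. playlist_key is a hypothetical name, and plain awk is used rather than mawk specifically:

```shell
#!/bin/sh
# Hypothetical helper. Usage: playlist_key ARTIST TITLE
# Prints the lower-cased "artist|title" key with every character other
# than a-z, 0-9 and "|" removed, mirroring the script above.
playlist_key() {
    printf '%s\t%s\n' "$1" "$2" | awk -F'\t' '{
        key = tolower($1 "|" $2)
        gsub(/[^0-9a-z|]+/, "", key)
        print key
    }'
}
```

For example, playlist_key 'ZZ Top' 'Tush' prints zztop|tush, while 'Artist' 'Song Name' and 'Artist Song' 'Name' give artist|songname and artistsong|name; without the "|" both would collapse to artistsongname.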
Given first file
Code:
ZZ Top Tush
Peter Gabriel Sledgehammer (Remix)
and second file
Code:
ZZ Top Tush 32000 /Music/Z/ZZ Top - Tush.mp3
Peter Gabriel 32000 /Music/P/Peter Gabriel - Sledgehammer.mp3
Cactus Jack Tush 128000 /Music/C/Cactus Jack - Tush.mp3
the script will output to standard output
Code:
ZZ Top Tush 32000 /Music/Z/ZZ Top - Tush.mp3
and to standard error
Code:
Peter Gabriel 32000 /Music/P/Peter Gabriel - Sledgehammer.mp3
Cactus Jack Tush 128000 /Music/C/Cactus Jack - Tush.mp3
Note that the "Peter Gabriel" entry in the second file is missing the song name; the script sees "Peter Gabriel" as the artist and "32000" as the song name, so it does not match "Sledgehammer (Remix)" in the playlist.
To get better results in practice, I'd tweak how the key string is built. For example, if the playlist has annotations in parentheses or brackets you want to ignore, you could replace the key code with
Code:
# Compute an associative array key from the artist and song names (for all input files).
{
    # Start with "artist" "|" "song", converted to lower case.
    key = tolower($1 "|" $2)
    # Remove stuff in parentheses
    gsub(/ *\([^\)]*\) */, " ", key)
    # Remove stuff in brackets
    gsub(/ *\[[^\]]*\] */, " ", key)
    # Remove all non-alphanumeric characters from the key.
    gsub(/[^0-9a-z|]+/, "", key)
}
Depending on how much variance and how many typos there are in your playlists, a bit of hacking on the above can probably save you a lot of hand-editing.
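Here is a sketch of that extended key applied to the playlist example, assuming the same tab-delimited input (playlist_key_loose is a hypothetical name). As a portability precaution it avoids backslashes inside bracket expressions: square brackets are first turned into parentheses, so a single [^)] rule removes both kinds of annotation. That is a small variation on the code above, not the same regexes.

```shell
#!/bin/sh
# Hypothetical helper. Usage: playlist_key_loose ARTIST TITLE
# Like the basic key, but annotations in (...) or [...] are dropped first.
playlist_key_loose() {
    printf '%s\t%s\n' "$1" "$2" | awk -F'\t' '{
        key = tolower($1 "|" $2)
        # Turn [...] into (...) so one rule handles both kinds.
        gsub(/\[/, "(", key)
        gsub(/]/, ")", key)
        # Drop every (...) group together with the spaces around it.
        gsub(/ *\([^)]*\) */, " ", key)
        # Keep only digits, lower-case letters and the | separator.
        gsub(/[^0-9a-z|]+/, "", key)
        print key
    }'
}
```

With this key, 'Peter Gabriel' 'Sledgehammer (Remix)' and 'Peter Gabriel' 'Sledgehammer' both become petergabriel|sledgehammer, so once the missing title is restored in the Music-List, the two entries match.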
Thank you for helping me without reprimands and non-answers.
I very much appreciate it.
I am not a programmer, so I will have to study what you did to try to get this into
one line. It's been difficult transitioning from a FileMaker database to accomplish this.
I noticed the omission in the Peter Gabriel entry too late; thanks for catching that.
So far I've tried many incarnations that didn't work, for example:
cat Playlist | while read z; do tit="$z" ; awk -v title="$tit" '{FS = "\t"} $2 ~ title {print $0}' Music-List ; done
awk -F'\t' '{for(N in var){if(index($2,N)){print; next}}}' Playlist Music-List
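For comparison, here is a sketch of what the index() attempt might look like once repaired. In the first attempt, FS is assigned inside the main action, so it only takes effect from the second record on, and the whole playlist line (artist included) is used as a regular expression against field 2; in the second, the array var is never filled, so the loop has nothing to iterate over. This version (match_by_index is a hypothetical name) loads the playlist first, then requires an exact artist match and uses index() for a literal, non-regex title match.

```shell
#!/bin/sh
# Hypothetical helper. Usage: match_by_index PLAYLIST MUSIC_LIST
# Prints MUSIC_LIST lines whose artist equals a playlist artist and whose
# title contains that playlist entry's title as a literal substring.
match_by_index() {
    awk -F'\t' '
        # Playlist: remember artist and title, indexed by line number.
        NR == FNR { artist[FNR] = $1; title[FNR] = $2; n = FNR; next }
        {
            for (i = 1; i <= n; i++) {
                # Exact artist match, and the playlist title occurs as a
                # literal substring of the Music-List title.
                if ($1 == artist[i] && index($2, title[i]) > 0) {
                    print
                    next
                }
            }
        }
    ' "$1" "$2"
}
```

Because index() tests for a substring, a playlist title "Tush" would also match a Music-List title such as "Tush (Live)" by the same artist; use $2 == title[i] instead if only exact titles should count.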
It is difficult to decipher what is asked when the data and inputs do not match the description or the intent. I basically just guessed.
While the responses here may read like reprimands, they really were just requests for clarification. I may be overstepping some social bounds, but I can guarantee you that both grail and David the H. only wanted to help you; they just found your description frustratingly difficult to understand.
Quote:
Originally Posted by Seemoi
I am not a programmer so I will have to study what you did to try and get this into one line.
If you save the entire script from my previous post, exactly as written, into a file named, say, merge-playlist in your home directory (not the desktop, your home directory), you can make it executable by running the command
Code:
chmod a+x ~/merge-playlist
once; one time only. The ~/ refers to your home directory.
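Then run ~/merge-playlist with the Playlist file as the first argument and the Music-List file as the second, redirecting its standard output into New-List (or appending it to Combined-List), to save all the Music-List entries that match the Playlist. All the Music-List entries that were not listed in the Playlist (and are not saved in New-List or Combined-List) will be shown on-screen.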
You can even edit the script (using gedit, emacs, vim, nano, or any text editor you wish -- just don't use a word processor like Abiword, OpenOffice Writer, LibreOffice Writer, or so on). All editors I've used retain the executable flag, so you won't need to run the chmod command again; you can just save your changes to the script, and run the ~/merge-playlist command immediately.
If you seriously need the mawk command to work on a single command line, then you can use this:
You can either keep it as it is (because the script part is in single quotes, the shell passes it as a single argument even if it spans more than one line) or omit the newlines, putting it all on a single very long line. Both work exactly the same. I basically just omitted all the comments and added semicolons to separate statements, and that's about it.
Hope you find this useful. And cut grail and David the H. some slack; they too were just trying to help you.
Nominal Animal is correct. We get a lot of poorly-worded requests here, so perhaps we sometimes get a little impatient, but we really just want to clarify the requirements first so that we can give the most appropriate solutions, instead of trying to guess what you really want.
We also hesitate to simply give out complete scripting solutions, as we know you'll learn more by doing it yourself. We expect you to do as much as you can on your own, and to come back for more help whenever you get stuck. We volunteer our time here as guides and helpers, not tech support.
To get the maximum benefit in help forums like this, please read Eric S. Raymond's excellent How To Ask Questions The Smart Way when you have the time.
+1 to both David and NA. I have provided too many solutions to what I 'thought' was where the question was going, only to find it had nothing to do with my solution.