LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 05-04-2008, 01:45 PM   #1
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1947
text formatting in bash variable


I'm trying to take the output from "mplayer -identify" and place it in a bash variable. So I do something like this:

Code:
OUTPUT="$(mplayer -vo /dev/null -ao /dev/null -identify mediafile.ogg 2>/dev/null)"
This puts the output into the variable like I want, but all the formatting is lost. $OUTPUT shows everything concatenated into one line, with only spaces as delimiters.

If I run the command directly from the prompt or pipe it to a file, mplayer outputs each entry on a separate line, which is what I want.

How can I get the contents of the variable to mirror the output in stdout?
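
For what it's worth, the capture itself most likely keeps the newlines; they usually vanish only when the variable is expanded without quotes, because the shell then splits the value on whitespace and echo rejoins the words with single spaces. A minimal sketch, with printf standing in for the mplayer call (an assumption, so that it runs without a media file):

```shell
# printf stands in for the mplayer call so the sketch runs anywhere.
OUTPUT="$(printf 'ID_AUDIO_ID=0\nID_DEMUXER=ogg\nID_LENGTH=247.29\n')"

# Unquoted: the shell splits on whitespace and echo rejoins with spaces.
flat=$(echo $OUTPUT | wc -l)

# Quoted: the value is passed through intact, one entry per line.
kept=$(echo "$OUTPUT" | wc -l)

echo "unquoted lines: $flat, quoted lines: $kept"
```

So echo "$OUTPUT" (with the double quotes) should mirror what mplayer prints to stdout.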
 
Old 05-04-2008, 02:05 PM   #2
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Original Poster
Rep: Reputation: 1947
Even better, I could use some help extracting tag data from the list. What I really need to do is locate the string that says "artist" for example, then get the value for that string.

Here's some sample output, after grepping for "ID":

Code:
ID_AUDIO_ID=0 ID_CLIP_INFO_NAME0=Name ID_CLIP_INFO_VALUE0=Ballroom Blitz ID_CLIP_INFO_NAME1=Artist ID_CLIP_INFO_VALUE1=Sweet ID_CLIP_INFO_NAME2=Genre ID_CLIP_INFO_VALUE2=Rock ID_CLIP_INFO_NAME3=Comments ID_CLIP_INFO_VALUE3=1973 ID_CLIP_INFO_N=4 ID_FILENAME=Sweet_-_Ballroom_Blitz.ogg ID_DEMUXER=ogg ID_AUDIO_FORMAT=vrbs ID_AUDIO_BITRATE=0 ID_AUDIO_RATE=44100 ID_AUDIO_NCH=2 ID_LENGTH=247.29 ID_AUDIO_BITRATE=128024 ID_AUDIO_RATE=44100 ID_AUDIO_NCH=2
I'm sure I can do it with grep or awk, but how can I grab just the tags I need? My regex-fu just isn't good enough to figure it out on my own.

Last edited by David the H.; 05-04-2008 at 02:46 PM.
 
Old 05-04-2008, 02:12 PM   #3
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
If I am not mistaken, this is the normal behavior. Perhaps the question is: Can a variable contain a newline character?

A series of experiments says "maybe":

Code:
echo '

a

b    

c

'>fil
cat fil|hexdump -C            # shows all the newlines
var=$(cat fil); echo $var     # eliminates all newlines except the last one

If I use echo ' to enter 3 newlines, the variable winds up with just one.
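
A small check that may help separate the two effects here: command substitution strips only the trailing newlines, while the interior ones stay in the variable and are lost later, when it is expanded unquoted. This is a sketch with printf producing the test data instead of the file above:

```shell
# One leading newline, blank lines between a, b and c, three trailing newlines.
var=$(printf '\na\n\nb\n\nc\n\n\n')

# Count the newline characters actually stored in the variable.
# printf '%s' adds nothing of its own, unlike echo.
stored=$(printf '%s' "$var" | wc -l)

echo "newlines stored in var: $stored"
```

Of the eight newlines printed, the three trailing ones are dropped by $( ) and the other five are still in var.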

By the way: I have been meaning to ask for a long time. Were you (or are you) active in an Epson printer forum? (Or maybe one on digital cameras?)
 
Old 05-04-2008, 02:42 PM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Original Poster
Rep: Reputation: 1947
Ok, so what does it mean?

[Can you get formatted output or not? Can you pipe the output through something that can read the newlines?]

edit: Whoops, sorry, I think I misread your post. If I understand you correctly now, you're saying that there are no newline characters in the variable contents. So does that mean I'd have to work through a temporary file instead?

Anyway, the reason I was asking is that I thought it would make it easier to grep the tags I want. But further investigation shows that the tags don't always come in the same order for different files, so I have to be able to read the ID_NAME first, then get the corresponding ID_VALUE. And that's probably easier to do when it's all on a single line--maybe.

(This whole project is turning out to be a lot harder than I thought it would.)


Finally, to answer your last question, no, I haven't been active in any printing forums. I just got tired of always linking to the same printer & scanner pages, so I put them in my sig to make things easier.

Last edited by David the H.; 05-04-2008 at 02:56 PM.
 
Old 05-04-2008, 03:20 PM   #5
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
I know nothing except the brief experiments that I reported...

So, there is another David in Japan that is really into Epson printers.....
 
Old 05-04-2008, 03:56 PM   #6
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,426

Rep: Reputation: 822
More experiments indicate this is possible by changing IFS:
Code:
~$ var='x
>
> x'
~$ echo $var
x x
~$ IFS=' ' var='x
>
> x'
~$ echo $var
x

x
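
As far as I can tell, this works because with IFS set to a single space the shell no longer treats the newline as a word separator. The value is still split at spaces, but echo puts the pieces back together with single spaces, so the only visible change is that runs of spaces collapse. A sketch with a made-up two-line value:

```shell
var='a b
c  d'

# Default IFS (space, tab, newline): the value splits into four words.
set -- $var
default_words=$#

# IFS=' ' inside a subshell so the change does not leak out.
# Only spaces split now; echo rejoins the pieces with single spaces,
# so the newline survives and the run of two spaces collapses to one.
joined=$(IFS=' '; echo $var)

echo "words with default IFS: $default_words"
echo "$joined"
```

Quoting the expansion ("$var") avoids the splitting altogether and leaves IFS alone.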
Quote:
And that's probably easier to do when it's all on a single line--maybe.
I think that without newlines, you would need non-greedy operators to match the tags.

Code:
~$ echo $ids
ID_AUDIO_ID=0
ID_CLIP_INFO_NAME0=Name
ID_CLIP_INFO_VALUE0=Ballroom Blitz
ID_CLIP_INFO_NAME1=Artist
ID_CLIP_INFO_VALUE1=Sweet
ID_CLIP_INFO_NAME2=Genre
ID_CLIP_INFO_VALUE2=Rock
ID_CLIP_INFO_NAME3=Comments
ID_CLIP_INFO_VALUE3=1973
ID_CLIP_INFO_N=4
ID_FILENAME=Sweet_-_Ballroom_Blitz.ogg
ID_DEMUXER=ogg
ID_AUDIO_FORMAT=vrbs
ID_AUDIO_BITRATE=0
ID_AUDIO_RATE=44100
ID_AUDIO_NCH=2
ID_LENGTH=247.29
ID_AUDIO_BITRATE=128024
ID_AUDIO_RATE=44100
ID_AUDIO_NCH=2

~$ id_name=$(echo $ids | sed -n 's/\(ID[A-Z0-9_]\+\)=Artist/\1/p')
~$ echo $id_name
ID_CLIP_INFO_NAME1
~$ id_value=${id_name/NAME/VALUE}
~$ echo $id_value
ID_CLIP_INFO_VALUE1
~$ echo $ids | sed -n "s/$id_value=\(.*\)/\1/p"
Sweet
Is that what you had in mind?
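
The two-step lookup above (find the NAMEn entry, then read the matching VALUEn entry) could also be wrapped in one helper. This is only a sketch: tag_value is a hypothetical function name, the data is a few sample lines from this thread rather than live mplayer output, and it assumes each VALUEn line comes after its NAMEn line.

```shell
# Sample lines from the thread, kept one entry per line.
ids='ID_AUDIO_ID=0
ID_CLIP_INFO_NAME0=Name
ID_CLIP_INFO_VALUE0=Ballroom Blitz
ID_CLIP_INFO_NAME1=Artist
ID_CLIP_INFO_VALUE1=Sweet
ID_CLIP_INFO_N=4'

# tag_value (hypothetical helper): print the value for the tag named in $1,
# compared case-insensitively. Reads the global variable ids.
tag_value() {
    printf '%s\n' "$ids" | awk -v want="$1" '
        # Split each line at the first "=" only, so values may contain "=".
        {
            eq = index($0, "=")
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
        }
        # A NAMEn line whose value is the wanted tag selects VALUEn.
        key ~ /^ID_CLIP_INFO_NAME[0-9]+$/ && tolower(val) == tolower(want) {
            target = "ID_CLIP_INFO_VALUE" substr(key, length("ID_CLIP_INFO_NAME") + 1)
        }
        target != "" && key == target { print val; exit }
    '
}

artist=$(tag_value Artist)
title=$(tag_value name)
echo "$artist / $title"
```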


EDIT: without changing IFS, not sure if this can work without non-greedy ops
Code:
~$ echo $ids_n
ID_AUDIO_ID=0 ID_CLIP_INFO_NAME0=Name ID_CLIP_INFO_VALUE0=Ballroom Blitz ID_CLIP_INFO_NAME1=Artist ID_CLIP_INFO_VALUE1=Sweet ID_CLIP_INFO_NAME2=Genre ID_CLIP_INFO_VALUE2=Rock ID_CLIP_INFO_NAME3=Comments ID_CLIP_INFO_VALUE3=1973 ID_CLIP_INFO_N=4 ID_FILENAME=Sweet_-_Ballroom_Blitz.ogg ID_DEMUXER=ogg ID_AUDIO_FORMAT=vrbs ID_AUDIO_BITRATE=0 ID_AUDIO_RATE=44100 ID_AUDIO_NCH=2 ID_LENGTH=247.29 ID_AUDIO_BITRATE=128024 ID_AUDIO_RATE=44100 ID_AUDIO_NCH=2
~$ id_name=$(echo $ids_n | sed -n 's/.*\(ID[A-Z0-9_]\+\)=Artist.*/\1/p')
~$ echo $id_name
ID_CLIP_INFO_NAME1
~$ id_value=${id_name/NAME/VALUE}
~$ echo $id_value
ID_CLIP_INFO_VALUE1

Actually, I think this could fail if the value has an "I" in
it. What I meant was to match anything that's not "ID_", but I think 
that would be a much more complicated regex
~$ echo $ids_n | sed -n "s/.*$id_value=\(\([^I][^D][^_]\)*\).*/\1/p"
Sweet

Last edited by ntubski; 05-04-2008 at 04:57 PM.
 
Old 05-06-2008, 10:09 AM   #7
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Original Poster
Rep: Reputation: 1947
Well, I'm proud to say that I actually thought about the IFS variable, but I wasn't sure what to change it to. Changing it to IFS=' ' certainly does work to output the format correctly. I don't understand why it doesn't break on the spaces inside the tags though.

Thanks for all the work you did figuring this out ntubski. I really appreciate it. It does seem to be a bit more effort than I had hoped for, having to read one variable in order to get another.

And you seem to be right that your regex chokes on a tag with an I in it. But your last regex also fails to work. Indeed it shows some really strange behavior, and I can't quite figure it out.

Code:
$ output="$(mplayer -vo /dev/null -ao /dev/null -identify \
"Iron_Maiden_-_A_Real_Live_One_-_03_-_Can_I_Play_With_Madness_(Live).ogg" 2>/dev/null|grep ID)"

$ echo $output
ID_AUDIO_ID=0 ID_CLIP_INFO_NAME0=Comments ID_CLIP_INFO_VALUE0= ID_CLIP_INFO_NAME1=Name ID_CLIP_INFO_VALUE1=Can I Play With Madness (Live) ID_CLIP_INFO_NAME2=Artist ID_CLIP_INFO_VALUE2=Iron Maiden ID_CLIP_INFO_NAME3=Genre ID_CLIP_INFO_VALUE3=Heavy Metal ID_CLIP_INFO_NAME4=Creation Date ID_CLIP_INFO_VALUE4=1993 ID_CLIP_INFO_NAME5=Album ID_CLIP_INFO_VALUE5=A Real Live One ID_CLIP_INFO_NAME6=Track ID_CLIP_INFO_VALUE6=03 ID_CLIP_INFO_N=7 ID_FILENAME=Iron_Maiden_-_A_Real_Live_One_-_03_-_Can_I_Play_With_Madness_(Live).ogg ID_DEMUXER=ogg ID_AUDIO_FORMAT=vrbs ID_AUDIO_BITRATE=0 ID_AUDIO_RATE=44100 ID_AUDIO_NCH=2 ID_LENGTH=284.20 ID_AUDIO_BITRATE=192000 ID_AUDIO_RATE=44100 ID_AUDIO_NCH=2

$ id_name=$(echo $output | sed -n 's/.*\(ID[A-Z0-9_]\+\)=Name.*/\1/p')
$ echo $id_name
ID_CLIP_INFO_NAME1

$ id_value=${id_name/NAME/VALUE}
$ echo $id_value
ID_CLIP_INFO_VALUE1

$ echo $output | sed -n "s/.*$id_value=\(\([^I][^D][^_]\)*\).*/\1/p"
Can I Play With Madness (Live) ID_CL
Nothing I can do with the values of ([^I][^D][^_]) gives me a consistent match. It almost always either truncates or overshoots the line. However, after playing around a bit I found this, which seems to work:

Code:
$ echo $output | sed -n "s/.*$id_value=\(\([^_]\)*\s\).*/\1/p"
Can I Play With Madness (Live)
I couldn't tell you why, though.
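
One likely explanation for the truncating and overshooting: the group ([^I][^D][^_]) always consumes exactly three characters, so "ID_" is only rejected when it happens to start on a three-character boundary. A sketch with a made-up V= prefix (hypothetical, just to keep the lines short) standing in for the real tag name:

```shell
# extract (hypothetical helper) applies the same three-character group
# to a single "V=<value> ID_..." line.
extract() {
    printf '%s\n' "$1" | sed -n 's/^V=\(\([^I][^D][^_]\)*\).*/\1/p'
}

# "Sweet " is 6 characters, so "ID_" starts on a boundary and is rejected.
aligned=$(extract 'V=Sweet ID_N=1')

# "Sweets " is 7 characters, so the groups read " ID" and "_N=" and run on.
shifted=$(extract 'V=Sweets ID_N=1')

printf '[%s]\n[%s]\n' "$aligned" "$shifted"
```

In the Iron Maiden example the value plus its trailing space is 31 characters, which is not a multiple of three, so the match slides past "ID_" and only stops at the "I" in "IP_".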

Still, the biggest problem for me seems to be that there's just too much variability in tag names. I've been trying to find a simple all-in-one solution to extracting metadata tags, but it looks like I'll have to use a combination of tools. I'm still looking at tagpy and exiftool also.

Last edited by David the H.; 05-06-2008 at 10:13 AM.
 
Old 05-06-2008, 10:39 AM   #8
ararus
Member
 
Registered: Mar 2008
Location: UK
Distribution: Slackware
Posts: 56

Rep: Reputation: 15
e.g.

Code:
mplayer -vo /dev/null -ao /dev/null -identify some_file.ogg 2>/dev/null | \
    sed -n 's/^\(ID[^=]*\)=\(.*\)$/\1 \2/p' | \
    while read ID TAG; do
        echo "id is \"$ID\", tag is \"$TAG\""
    done
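If you need to reference the variables outside the loop, eval them inside it. Note that in bash a loop at the end of a pipeline runs in a subshell, so feed the loop with done < <(mplayer ... | sed ...) instead of a pipe if the variables have to survive it. Escaping $TAG keeps values with spaces intact:

Code:
while read ...; do
    eval "$ID=\$TAG"
done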
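
A variation that avoids eval, offered as a sketch rather than a drop-in replacement: split each line at the first "=" with IFS and pick out the tags with case. The sample lines come from this thread and stand in for the mplayer output; the variable names artist, title and length are my own, and treating Author like Artist is an assumption.

```shell
# Sample lines from the thread stand in for the mplayer output.
output='ID_CLIP_INFO_NAME0=Name
ID_CLIP_INFO_VALUE0=Ballroom Blitz
ID_CLIP_INFO_NAME1=Artist
ID_CLIP_INFO_VALUE1=Sweet
ID_LENGTH=247.29'

# The here-document feeds the loop without a pipeline, so assignments
# made inside the loop are still visible after it.
while IFS='=' read -r key value; do
    case $key in
        ID_CLIP_INFO_NAME[0-9]*)
            name=$value ;;              # remember which tag comes next
        ID_CLIP_INFO_VALUE[0-9]*)
            case $name in
                Artist|Author) artist=$value ;;
                Name)          title=$value ;;
            esac ;;
        ID_LENGTH)
            length=$value ;;
    esac
done <<EOF
$output
EOF

echo "$artist - $title ($length s)"
```

Because read only splits at the first "=", anything after it, spaces included, ends up in value.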

Last edited by ararus; 05-06-2008 at 10:43 AM.
 
Old 05-06-2008, 05:52 PM   #9
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,261

Rep: Reputation: 2028
To be honest, I'd do that in Perl; it's very strong at text munging.
 
Old 05-06-2008, 05:59 PM   #10
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,426

Rep: Reputation: 822
Quote:
Originally Posted by David the H.
And you seem to be right that your regex chokes on a tag with an I in it. But your last regex also fails to work. Indeed it shows some really strange behavior, and I can't quite figure it out.

Nothing I can do with the values of ([^I][^D][^_]) gives me a consistent match. It almost always either truncates or overshoots the line.
Well, that regex isn't really close to working; it's just the point where I stopped bothering to get the thing working. I know there is a regex that can do this, because I can see how to make the DFA for it, but the DFA to regexp algorithm doesn't guarantee pretty results. Plus the solution with setting IFS should work fine as far as I can tell.

Quote:

However, after playing around a bit I found this, which seems to work:

Code:
$ echo $output | sed -n "s/.*$id_value=\(\([^_]\)*\s\).*/\1/p"
Can I Play With Madness (Live)
I couldn't tell you why, though.
That should be fine as long as there's no underscore in the value. I think it could be written better as
Code:
"s/.*$id_value=\([^_]*\)\s.*/\1/p"
Basically that takes any number of non-underscore characters followed by a space.

In this example the non-underscore characters run from "Can" through "ID", and the ")" is the last one of them that is followed by a space:
Code:
Can I Play With Madness (Live) ID_
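
A quick sketch of that limitation, using a literal space instead of \s (which is a GNU sed extension) and a made-up value containing underscores for the second case:

```shell
# grab (hypothetical helper) applies the simplified regex to one line.
grab() {
    printf '%s\n' "$1" | sed -n 's/.*ID_CLIP_INFO_VALUE1=\([^_]*\) .*/\1/p'
}

# No underscore in the value: the match backs up to the last space
# before "ID_" and returns the whole title.
ok=$(grab 'ID_CLIP_INFO_VALUE1=Can I Play With Madness (Live) ID_CLIP_INFO_NAME2=Artist')

# Underscores in the value: [^_]* stops at "Rock", no space follows any
# prefix of it, and sed prints nothing.
bad=$(grab 'ID_CLIP_INFO_VALUE1=Rock_n_Roll ID_CLIP_INFO_NAME2=Artist')

printf '[%s]\n[%s]\n' "$ok" "$bad"
```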
Quote:
Still, the biggest problem for me seems to be that there's just too much variability in tag names. I've been trying to find a simple all-in-one solution to extracting metadata tags, but it looks like I'll have to use a combination of tools. I'm still looking at tagpy and exiftool also.
Was there something wrong with the first solution? I probably shouldn't have posted the half-baked second attempt, it seems to have distracted us from the original problem, I just got caught up in trying to see if it's possible.

I think ararus' solution looks even nicer. And yes, it probably would work better in perl, where you could have an actual hash table instead of having to resort to creating variables.
 
Old 05-09-2008, 12:02 PM   #11
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Original Poster
Rep: Reputation: 1947
Thank you all again for your help. I'm still trying to figure it all out, but I am making headway.

To make it clear, I'm trying to write a fairly simple script that extracts the metatags from various media files and uses them to create playlist files. The script itself isn't at all complicated, but finding a reliable, universal way to get the tags is.

There are three possible solutions I'm working with, mplayer, exiftool, and tagpy, and all of them have strengths and weaknesses.

At a glance mplayer seemed promising. It can read just about any file and it's a commonly-available program on most machines. But getting the right tags requires a lot of work, as we've seen. Complicating it even more is that different file types don't always use the same tags. wma files, for example, seem to use "author" instead of "artist". I've also seen some files that use "name" as well.

For a while tagpy (python extension for taglib) seemed to be the best bet, because taglib has universal functions for calling the basic meta info no matter what the file type. But I've since discovered that it only supports a few common file types, and segfaults on anything it doesn't understand, such as wma or m4a files, not to mention video files such as avi or mpg. Also, I don't like having to depend on having an interpreted language environment+extensions installed. It makes the script less portable (such as to my pda, which doesn't have python or perl, but does have mplayer).

Finally there's exiftool. It's a perl-based metadata library with a cli frontend. The frontend is very nice and easy to use, and it doesn't seem to have the file limitations of tagpy. But it has the same problem with variable tag names as mplayer. Also, it can't extract the playing time length unless the file has a metadata tag containing it. Finally, it also depends on an interpreted environment.

I'm currently thinking about going with exiftool for most of the data, and extracting the playing length with mplayer. But if I can get the mplayer solution to work reliably I might use it exclusively. Either way, I'd have to either ignore or test for non-standard metatags. Unless there's some other solution I don't know about.

Perhaps it would be better to do it in perl or python directly, but I don't know anything about those languages yet. I'm still working on my bash scripting right now. In any case, I'm learning a lot about scripting and regex from this exercise, so I don't consider any of this a waste of time.


Edit: I've just run across another possible option that appears promising. It's a different python wrapper called kaa.metadata, (was mmpython). It doesn't seem to be able to extract comment tags for some reason, but that's not a big loss. It looks like it will give me pretty much everything else I want without much hassle and it can handle a larger number of file types (though it's still a bit limited; it doesn't appear to handle mpc files, a few of which I have). I'm going to have to experiment a bit.

Last edited by David the H.; 05-09-2008 at 12:29 PM.
 
  

