LinuxQuestions.org
Old 08-19-2007, 01:07 AM   #1
PatrickNew
Senior Member
 
Registered: Jan 2006
Location: Charleston, SC, USA
Distribution: Debian, Gentoo, Ubuntu, RHEL
Posts: 1,148
Blog Entries: 1

Rep: Reputation: 48
/bin/sh awk command troubles


I am just now learning awk, and I really want to do some basic things in a script.

I have, essentially, a key-value pair file: each line holds a text key, then a text value, separated by a tab. I want to search the file for the key, then return the value.

Code:
TWO='$2'
RESULT=$(awk -F'\t' "/$KEY_VAL/ {print $TWO}" /path/to/file)
The TWO='$2' assignment is there because the whole awk program has to sit inside one pair of double quotes so that KEY_VAL gets expanded, yet I need the literal text $2 to reach awk.

This worked fine, until I realized that keys could contain the '/' character. Now I'm stuck, as I don't know how to escape the '/' character to prevent the expansion of KEY_VAL from artificially ending the search block.

Ideally, I would also like to only return the entry from the first line the key is found on.

And yes, I swear I did search this first, but I couldn't find anything on escaping the '/'.
 
Old 08-19-2007, 01:38 AM   #2
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
How is the goofy KEY_VAL being read in? Perhaps the simplest thing is to edit the input, so that an argument passed like 'goofy/key' becomes 'goofy\/key' before awk ever sees it, e.g. with: echo "$ORIG_KEY_VAL" | sed 's,/,\\/,g'
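A minimal sketch of that escaping step, assuming the key is already in a shell variable. ORIG_KEY_VAL and the sample value are only illustrative names, not anything from the original script. The g flag is there because a key may contain more than one slash.

```shell
# Hypothetical key containing several slashes.
ORIG_KEY_VAL='8-bit/color and/or goofy/key'

# Escape every '/' as '\/' so the value can sit inside an awk /.../ pattern.
# ',' is used as the sed delimiter so '/' needs no escaping in the sed script.
ESCAPED=$(printf '%s\n' "$ORIG_KEY_VAL" | sed 's,/,\\/,g')

printf '%s\n' "$ESCAPED"
```

This only deals with the slash. Other regex metacharacters in the key (such as '.' or '+') would still be interpreted by awk.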

Last edited by slakmagik; 08-19-2007 at 01:48 AM.
 
Old 08-19-2007, 01:49 AM   #3
PatrickNew
Senior Member
 
Registered: Jan 2006
Location: Charleston, SC, USA
Distribution: Debian, Gentoo, Ubuntu, RHEL
Posts: 1,148

Original Poster
Blog Entries: 1

Rep: Reputation: 48
In particular, the first column of my file holds the output of file and the second column holds an executable handler. When I feed it the key "PNG image data, 253 x 338, 8-bit/color RGBA, non-interlaced" I get the error message

awk: /PNG image data, 253 x 338, 8-bit/color RGBA, non-interlaced/ {print $2}
awk: ^ syntax error

but this works fine for keys not containing "/". If I escape the '/' before sending it, what will awk see in argv, "...bit\/color..." or just "...bit/color..."?
 
Old 08-19-2007, 01:56 AM   #4
PatrickNew
Senior Member
 
Registered: Jan 2006
Location: Charleston, SC, USA
Distribution: Debian, Gentoo, Ubuntu, RHEL
Posts: 1,148

Original Poster
Blog Entries: 1

Rep: Reputation: 48
Okay, so a quick test script confirms that awk will see ...bits/color...

----
But alas, this leaves me with the same problem. The '\/' escape sequence prevents /bin/sh from reading any special meaning into the '/', but it's awk that is having trouble with it.

Last edited by PatrickNew; 08-19-2007 at 02:00 AM.
 
Old 08-19-2007, 02:06 AM   #5
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Quote:
Originally Posted by PatrickNew View Post
Okay, so a quick test script confirms that awk will see ...bits/color...

----
But alas, this leaves me with the same problem. The '\/' escape sequence prevents /bin/sh from reading any special meaning into the '/', but it's awk that is having trouble with it.
No, awk will see 'bits\/color' if it's quoted. This isn't directed at you, as such, but at *everybody*. Every question like this should have three code blocks. One for input, one for the script, one for expected or actual output.


Code:
:cat test.txt
foo     bar
baz     mu
goofy/key       fnord
PNG image data, 253 x 338, 8-bit/color RGBA, non-interlaced picture     wtf
Code:
:cat test.sh
KEY_VAL="$(grep RGBA test.txt | cut -d' ' -f1 | sed 's,/,\\/,g')"
TWO='$2'
RESULT=$(awk -F'\t' "/$KEY_VAL/ {print $TWO}" test.txt)
echo $RESULT
Code:
:sh test.sh
wtf
Now, I have to do that absolutely daft bit of assignment to KEY_VAL because I don't know how you're getting it. But that's one way to illustrate a demonstration case that extracts 'wtf' from the file, where it's $2 attached to the PNG junk. If that's not what you're doing, throw three code blocks back at me.

-- Oh, and I should note that the delimiter for cut is a literal tab, though it looks like a space.

-- And this:

Code:
KEY_VAL="$(grep RGBA test.txt | cut -d' ' -f1 | sed 's,/,\\/,g')"
RESULT=$(awk -F'\t' "/$KEY_VAL/ {print \$2}" test.txt)
echo $RESULT
produces the same output and gets rid of the assignment to TWO.

Last edited by slakmagik; 08-19-2007 at 02:11 AM. Reason: whitespace issue - and getting rid of the assignment to TWO
 
Old 08-19-2007, 02:21 AM   #6
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Actually, even better would be modifying 'file's output so your input is pre-processed:
file -b foo.png | sed 's,/,_,g'
 
Old 08-19-2007, 02:21 AM   #7
PatrickNew
Senior Member
 
Registered: Jan 2006
Location: Charleston, SC, USA
Distribution: Debian, Gentoo, Ubuntu, RHEL
Posts: 1,148

Original Poster
Blog Entries: 1

Rep: Reputation: 48
All right, the KEY_VAL is the exact output of `file -b` on a file, so I cannot trust its formatting, except that I know it does not contain tabs, since /etc/magic is tab-delimited.

~/rundb.gz (zcat-ed)
Code:
Rich Text Format data, version 1, ANSI  oowriter
OpenDocument Text       oowriter
ASCII C program text    gedit
ASCII text      gedit
PNG image data, 253 x 338, 8-bit/color RGBA, non-interlaced     gthumb
run.sh
Code:
KEY_VAL=`file -b "$1"`

#to prevent expansion of $2
TWO='$2'

#look it up
CMD=`zcat ~/rundb.gz | awk -F'\t' "/$KEY_VAL/ {print $TWO}"`

#echo the command instead of executing while developing
echo $CMD
Yes, I know better implementations of a file-type-recognizing launcher have been done, but I'm writing this as a learning experience, not to produce a new tool.
 
Old 08-19-2007, 02:42 AM   #8
PatrickNew
Senior Member
 
Registered: Jan 2006
Location: Charleston, SC, USA
Distribution: Debian, Gentoo, Ubuntu, RHEL
Posts: 1,148

Original Poster
Blog Entries: 1

Rep: Reputation: 48
Well, it works much better if I replace the call to awk with a grep piped to cut.

Code:
CMD=`zcat ~/rundb.gz | grep --max-count=1 "$KEY_VAL" | cut -f2 -s`
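By default grep still treats the key as a regular expression, so a fixed-string search may be a safer variant of the same idea. The sketch below assumes a grep that supports -F and -m (GNU and BSD grep do) and uses a temporary file with made-up sample rows in place of the real ~/rundb.gz.

```shell
# Build a small tab-separated stand-in for the rundb (hypothetical sample data).
DB=$(mktemp)
printf 'ASCII text\tgedit\n' > "$DB"
printf 'PNG image data, 253 x 338, 8-bit/color RGBA, non-interlaced\tgthumb\n' >> "$DB"

KEY_VAL='PNG image data, 253 x 338, 8-bit/color RGBA, non-interlaced'

# -F: match the key as a literal string; -m 1: stop after the first matching line.
# cut -f2 -s: print the second tab-separated field, skipping lines without a tab.
CMD=$(grep -F -m 1 -- "$KEY_VAL" "$DB" | cut -f2 -s)

echo "$CMD"
rm -f "$DB"
```

The "--" keeps a key that happens to start with a dash from being read as an option.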
 
Old 08-19-2007, 03:16 AM   #9
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Glad it works for you, but I thought we were learning awk. If you can change that much, you should probably change a whole lot more. For instance, within the context of creating a 'rundb' I might use file's '-i' option because how many 253 x 338 PNGs are you going to have? I just feel that most of the work should be done getting a rundb in a useful format in the first place.

But my brain is obviously *completely* non-functional these days, so I'll leave you in the hands of the rest of LQ.

Anyway - FWIW, for my 'open files with programs' script, I also used file, but just used the shell's case statement to match stuff:
Code:
    TYPE="$(file -Lb -- "$1")"
    case $TYPE in
...
        JPEG*|PNG*|X\ pixmap*|GIF*) f_pictureapp "$1" ;;
...
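For anyone who wants to try the case approach in isolation, here is a self-contained sketch. It is not the elided script above: pick_handler and the program names are hypothetical placeholders, and the function takes a `file -b` style description as its argument so it can be tested without calling file.

```shell
# pick_handler is a hypothetical helper: it maps a `file -b` style
# description to a program name. The program names are placeholders.
pick_handler() {
    case $1 in
        JPEG*|PNG*|GIF*|X\ pixmap*) echo gthumb ;;
        ASCII*text*)                echo gedit ;;
        OpenDocument\ Text*)        echo oowriter ;;
        *)                          echo unknown ;;
    esac
}

pick_handler 'PNG image data, 253 x 338, 8-bit/color RGBA, non-interlaced'
pick_handler 'ASCII C program text'
```

Because case uses shell glob patterns and not regular expressions, the '/' in the description needs no special treatment.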
 
Old 08-19-2007, 05:05 AM   #10
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244Reputation: 244Reputation: 244
Quote:
Originally Posted by PatrickNew View Post
I am just now learning awk, and I really want to do some basic things in a script.

I have, essentially, a key-value pair file: each line holds a text key, then a text value, separated by a tab. I want to search the file for the key, then return the value.

Code:
TWO='$2'
RESULT=$(awk -F'\t' "/$KEY_VAL/ {print $TWO}" /path/to/file)
The TWO='$2' assignment is there because the whole awk program has to sit inside one pair of double quotes so that KEY_VAL gets expanded, yet I need the literal text $2 to reach awk.

This worked fine, until I realized that keys could contain the '/' character. Now I'm stuck, as I don't know how to escape the '/' character to prevent the expansion of KEY_VAL from artificially ending the search block.

Ideally, I would also like to only return the entry from the first line the key is found on.

And yes, I swear I did search this first, but I couldn't find anything on escaping the '/'.
Pass it in as a variable to awk and all will be fine.
Code:
awk -v two="$some_value" -v key="$key_value" '{ if (match($0, key)) print two }' /path/to/file
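One caveat worth sketching: match() treats its second argument as a regular expression, so the '/' stops being a problem but characters like '.' or '+' are still special. If a purely literal substring match is wanted, awk's index() is one option. The helper name lookup and the sample rows below are made up for illustration.

```shell
# Hypothetical tab-separated sample data in a temporary file.
DB=$(mktemp)
printf 'C++ source text\tgedit\n' > "$DB"
printf 'PNG image data, 8-bit/color RGBA\tgthumb\n' >> "$DB"

# index($0, key) returns the position of key in the line, or 0 if absent,
# so it works as a literal (non-regex) substring test.
lookup() {
    awk -F'\t' -v key="$1" 'index($0, key) { print $2 }' "$DB"
}

A=$(lookup '8-bit/color')
B=$(lookup 'C++')
echo "$A $B"
rm -f "$DB"
```

Note that awk processes backslash escape sequences in -v assignments, so a key containing backslashes would need extra care.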

Last edited by ghostdog74; 08-19-2007 at 05:06 AM.
 
Old 08-19-2007, 01:04 PM   #11
PatrickNew
Senior Member
 
Registered: Jan 2006
Location: Charleston, SC, USA
Distribution: Debian, Gentoo, Ubuntu, RHEL
Posts: 1,148

Original Poster
Blog Entries: 1

Rep: Reputation: 48
Quote:
Originally Posted by digiot View Post
Glad it works for you, but I thought we were learning awk.
Alas, you are correct. My instinct to get something working overpowered my common sense. Learning a bit of awk is indeed one of the major goals of this project.

Quote:
If you can change that much, you should probably change a whole lot more. For instance, within the context of creating a 'rundb' I might use file's '-i' option because how many 253 x 338 PNGs are you going to have?
Actually, MIME types would have been my first choice, but they have the opposite problem from file's native type descriptions: they are too coarse. My 'file' identifies OpenDocument Text files as simply "application/x-zip". The inability to recognize ODTs is a deal-breaker for me.

Quote:
I just feel that most of the work should be done getting a rundb in a useful format in the first place.
And if this silly little script had an intended user base beyond myself, I would agree. However, this is little more than a toy that I can design and redesign at will, no legacy users to support. If I have to start over, at least I'll have learned a bit of awk.

And in future versions of this script, it will probably use a more intelligent matching algorithm, perhaps only matching before the comma, as file's format seems to use that.
 
Old 08-19-2007, 01:34 PM   #12
PatrickNew
Senior Member
 
Registered: Jan 2006
Location: Charleston, SC, USA
Distribution: Debian, Gentoo, Ubuntu, RHEL
Posts: 1,148

Original Poster
Blog Entries: 1

Rep: Reputation: 48
To ghostdog74:

Many thanks, the match() function was exactly what I wanted. Using it instead of the /some_stuff/ syntax allowed awk to ignore the '/' in the string. I also implemented a way of ensuring that only one match is made. Here it is now:

Code:
CMD=`zcat ~/rundb.gz | awk -F'\t' -v fnd="0" -v key="$KEY_VAL" '{if (match($0, key) && fnd==0) {print $2; fnd++}}'`
Since I can now use single quotes around the actual awk commands, I don't need to do the TWO='$2' garbage anymore. Hooray!
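An alternative way to get only the first hit, sketched under the assumption that the key should equal the whole first column: compare $1 to the key and exit after printing. The sample rows (including the second, duplicate PNG entry) are invented for the demonstration.

```shell
# Hypothetical tab-separated sample data with a duplicate key.
DB=$(mktemp)
printf 'ASCII text\tgedit\n' > "$DB"
printf 'PNG image data\tgthumb\n' >> "$DB"
printf 'PNG image data\teog\n' >> "$DB"

KEY_VAL='PNG image data'

# $1 == key compares the whole first field against the key;
# exit stops awk after the first hit, so only one handler is printed.
CMD=$(awk -F'\t' -v key="$KEY_VAL" '$1 == key { print $2; exit }' "$DB")

echo "$CMD"
rm -f "$DB"
```

Using exit avoids the counter variable entirely, and awk stops reading as soon as the answer is found.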
 
Old 08-19-2007, 01:42 PM   #13
PatrickNew
Senior Member
 
Registered: Jan 2006
Location: Charleston, SC, USA
Distribution: Debian, Gentoo, Ubuntu, RHEL
Posts: 1,148

Original Poster
Blog Entries: 1

Rep: Reputation: 48
And by adding this line before the search for the command,

Code:
KEY_VAL=`echo "$KEY_VAL" | awk -F, '{print $1}'`
I can search for only the part preceding the first comma, if there is one. While I don't know this for certain, it appears that the part of file's output preceding the comma is sufficient to identify the file type, and whatever follows the comma is merely additional information. So one rundb entry for PNGs can match all PNGs, etc.
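The same trimming can probably be done without a second awk process, using POSIX shell parameter expansion. This is only a sketch; SHORT_KEY, NO_COMMA and SAME are illustrative variable names.

```shell
KEY_VAL='PNG image data, 253 x 338, 8-bit/color RGBA, non-interlaced'

# ${VAR%%,*} removes the longest suffix matching ",*", i.e. everything
# from the first comma onward. A value without a comma is left unchanged.
SHORT_KEY=${KEY_VAL%%,*}

NO_COMMA='ASCII text'
SAME=${NO_COMMA%%,*}

echo "$SHORT_KEY"
echo "$SAME"
```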
 
Old 08-19-2007, 01:55 PM   #14
slakmagik
Senior Member
 
Registered: Feb 2003
Distribution: Slackware
Posts: 4,113

Rep: Reputation: Disabled
Glad you and ghostdog got it sorted out. I had a feeling I wasn't functioning properly and something very simple would be very good.
 
  

