LinuxQuestions.org


taskmaster 11-12-2010 08:43 AM

Simple awk question search for string1 and extract string2
 
Hi All,

I haven't used awk in a while; this is probably simple.

I have a file with the following line:

<xref image="00001234.tif|V3|1999:11:19:22:13|49487|0"> image: </xref>

I want to use awk to search for: <xref image="
(something like /<xref image=\"/ ???)

and then set a variable called imagename to the next 12 characters.
How do I do that?

******************

I'm also looking for a way to display the hidden (non-printing) characters in a character-based text file.

Thanks in advance for your assistance.

ee437 11-12-2010 08:59 AM

If the filename is always followed by a "|" character, this might work:

Code:

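awk -F \" '$1=="<xref image=" {print $2}' file.in.name | \
awk -F "|" '{print $1}'

(This assumes you want to use awk.) The -F \" declares the delimiter as a double quote. The first awk selects the lines in file.in.name that begin with

<xref image=

and prints the second field of each, i.e. the text between the first pair of double quotes. The | at the end of the first line pipes that output into the second awk (the \ continues the command onto the next line), which uses | as its delimiter and prints everything up to the first |.

In the first awk I escaped the double quote with a backslash instead of quoting it; in the second I put the | in quotes, which it needs so the shell doesn't treat it as a pipe. The awk program itself goes between the two single quotes.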

I didn't test the whole script; I hope that helps.
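The same extraction can also be done in a single awk process by cutting the quoted value at the first "|" with awk's split(). This is only a sketch, assuming the layout from the first post (the line starts with `<xref image="` and the image name ends at the first "|"); the function name extract_image and the temporary sample file are hypothetical:

```shell
#!/bin/sh
# One-awk variant of the two-awk pipeline (a sketch).
# Assumes each target line begins with <xref image=" and that the
# image name ends at the first "|" inside the quoted value.
extract_image() {
    # -F '"' makes $1 the text before the first double quote and $2 the
    # quoted value; split() cuts $2 at each "|" and parts[1] is the name.
    awk -F '"' '$1 == "<xref image=" { split($2, parts, "|"); print parts[1] }' "$1"
}

# Hypothetical sample file holding the example line from the question.
sample=$(mktemp)
printf '%s\n' '<xref image="00001234.tif|V3|1999:11:19:22:13|49487|0"> image: </xref>' > "$sample"
imagename=$(extract_image "$sample")
echo "$imagename"    # prints 00001234.tif
rm -f "$sample"
```

Keeping it in one process avoids the pipe; the trade-off is that the exact-match test on $1 still requires the tag to start in column 1.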

GrapefruiTgirl 11-12-2010 09:04 AM

Code:

VARIABLE=$(awk '/^<xref image=".*<\/xref>/{print substr($0,14,12)}' input_file)
Try that...
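If the tag might not start in column 1 (leading indentation, say), one variation is to locate the marker with awk's index() instead of anchoring with ^ and hard-coding column 14. A sketch only: get_imagename is a hypothetical name, and it still assumes the image name is exactly 12 characters long, as in the example line:

```shell
#!/bin/sh
get_imagename() {
    awk -v marker='<xref image="' '
        {
            pos = index($0, marker)   # 0 when the marker is not on the line
            if (pos > 0) {
                # print the 12 characters right after the marker
                print substr($0, pos + length(marker), 12)
            }
        }' "$1"
}

# Hypothetical sample: the example line, indented by two spaces.
sample=$(mktemp)
printf '  %s\n' '<xref image="00001234.tif|V3|1999:11:19:22:13|49487|0"> image: </xref>' > "$sample"
imagename=$(get_imagename "$sample")
echo "$imagename"    # prints 00001234.tif
rm -f "$sample"
```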

As for the second issue, what "hidden characters" are you referring to, and in what sort of "text based" file? Is it a text file, or some other sort of file? How do you know they are there, if they are hidden? ;)

taskmaster 11-12-2010 10:03 AM

Thanks to you both for your responses. I'll probably try the second approach first: a computer program generated these files, so the structure is set in stone, so to speak, at least for the part of the line I included here.

Regarding the second question: if I open the files in vi I see funny stuff, if I display them in an XTerm I see other funny stuff, etc.

Additionally, the cursor position counter in vim goes nutty when moving across what appears to be empty space. If a line reads "THORPE KISHIE C ", the cursor position on the C is 1,21 and on the space after the C it is 1,22; on the next space it jumps to 1,24-23, then 1,26-24, then 1,28-25, then 1,30-26, and so on, with the first number going up by 2 and the second by 1 at each step.

Also, although these files are named .txt, they were never intended to be looked at raw. They were only meant for use by a program's back end, with a GUI for the victim, I mean user.

Thanks again, guys.

GrapefruiTgirl 11-12-2010 10:25 AM

For the funny characters, maybe try:
Code:

cat -v file
The -v option "shows non-printing characters" using ^ (CTRL) and M- (META) notation. Pipe the output into `less` (cat -v file | less) to get it on screen in an easily scrollable form.

If that doesn't work, let us know; someone will have another idea.
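To make the notation concrete, here is a sketch on a made-up sample line (the bytes are hypothetical, not taken from the poster's file). With GNU cat, -A is shorthand for -vET, which additionally marks each line end with $ and each tab with ^I, two things plain -v leaves alone:

```shell
#!/bin/sh
# Hypothetical sample: "J", the two bytes 0xC2 0xA0, a tab, then a newline.
# cat -v prints a byte >= 0x80 as "M-" followed by the notation for
# (byte - 0x80): 0xC2 becomes "M-B" and 0xA0 becomes "M- " (M- then a space).
printf 'J\302\240\t\n' | cat -v    # the tab is passed through unchanged
printf 'J\302\240\t\n' | cat -A    # prints JM-BM- ^I$

# For a real file, page through the result:
# cat -v file | less
```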

taskmaster 11-12-2010 11:51 AM

Well, there is some ugly stuff in that file, but again, I can work around it. I might close this thread and come back for help in a new one. Want a sneak preview? OK!

The part of the line I am interested in grabbing looks like this:
..."STUBER ROBERT J "...

with all those spaces showing up as follows under cat -v:
..."STUBER ROBERT J- M- M- M- M- M- M- M etc - M- M- "...

I want awk to fill a name variable with "Stuber Robert J" and stop at the J, because the next character is something other than A-Z. It's almost as if I need a hex editor to see what that first character is, so I know where to stop filling the variable. Is it really a dash, or does it just show up as a dash on screen with cat -v?
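A hex editor isn't strictly needed: od can dump the raw bytes. The sketch below uses a hypothetical sample in which the padding after the J is UTF-8 no-break spaces (bytes C2 A0). That is only a guess, though it would fit the vim counter described earlier (two bytes per screen column) and cat -v output containing "M- "; running od on the real file is what settles it:

```shell
#!/bin/sh
# Hypothetical sample: "J" followed by three UTF-8 no-break spaces.
sample=$(mktemp)
printf 'J\302\240\302\240\302\240\n' > "$sample"

# -An suppresses the offset column; -tx1 prints each byte as two hex digits.
od -An -tx1 "$sample"

# od -c shows the same bytes as characters, using octal escapes for the
# non-printing ones, which can be easier to line up against the text.
od -c "$sample"

# On the real file, something like: od -An -tx1 yourfile.txt | less
rm -f "$sample"
```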

Take care, all.

GrapefruiTgirl 11-12-2010 11:58 AM

Using regular expressions, you can match against non-printing characters too. So for example, if you had the word "John" followed by a bunch of hidden garbage, you could match for that.
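Along those lines, here is a rough awk sketch for the "STUBER ROBERT J" case: it keeps the run of capital letters and plain spaces and stops at the first byte that is neither, so hidden padding ends the match. It assumes (hypothetically, since the surrounding fields were elided with "...") that the name starts right after a double quote; extract_name and the sample line are made up for illustration. LC_ALL=C makes awk treat the input as raw bytes, so [A-Z ] cannot match any part of a multibyte character:

```shell
#!/bin/sh
extract_name() {
    LC_ALL=C awk '
        # A double quote, then a capital, then any capitals or plain spaces.
        match($0, /"[A-Z][A-Z ]*/) {
            name = substr($0, RSTART + 1, RLENGTH - 1)   # drop the quote
            sub(/ +$/, "", name)                          # trim trailing plain spaces
            print name
        }' "$1"
}

# Hypothetical sample: the name padded with UTF-8 no-break spaces (C2 A0).
sample=$(mktemp)
printf 'x,"STUBER ROBERT J\302\240\302\240",y\n' > "$sample"
name=$(extract_name "$sample")
echo "$name"    # prints STUBER ROBERT J
rm -f "$sample"
```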

For the time being (if it's all the same to you), I suggest continuing the discussion in this thread, since I'm guessing there may yet be more to this situation. Put another way: if the new question continues this one, don't start a new thread; leave this one un-SOLVED until you're sure there's nothing more to the issue. That will help people find your thread(s) more easily, and in fewer places, when they search for things on this subject. :)

If it's a new issue, then by all means, make a new thread. Actually, sed comes to mind for your "Robert Stuber" issue, but we'll know more when you tell us more about this.
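For the sed idea, a comparable sketch under the same assumptions (the name starts right after the first double quote on the line; the function name and sample are hypothetical):

```shell
#!/bin/sh
extract_name_sed() {
    # -n plus the p flag prints only lines where the substitution succeeded;
    # \( \) captures the capitals and plain spaces after the first quote.
    # LC_ALL=C keeps [A-Z ] byte-based; the second sed trims trailing spaces.
    LC_ALL=C sed -n 's/^[^"]*"\([A-Z][A-Z ]*\).*/\1/p' "$1" | sed 's/ *$//'
}

# Hypothetical sample: the name padded with UTF-8 no-break spaces (C2 A0).
sample=$(mktemp)
printf 'x,"STUBER ROBERT J\302\240\302\240",y\n' > "$sample"
extract_name_sed "$sample"    # prints STUBER ROBERT J
rm -f "$sample"
```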

Also, a suggestion: when posting chunks of data or text files, especially when whitespace matters to the layout of the data, please put the data in code tags. You can see how to use them here:
http://www.phpbb.com/community/faq.php?mode=bbcode#f2r1

Keep us posted!


All times are GMT -5.