Simple awk question: search for string1 and extract string2
Hi All,
I haven't used awk in a while, so this is probably simple. I have a file with the following line:
<xref image="00001234.tif|V3|1999:11:19:22:13|49487|0"> image: </xref>
I want to use awk to search for <xref image=" (something like /<xref image=\"/ ?) and then set a variable called imagename to the next 12 characters. Any help?
******************
I'm also looking for a way to display the hidden characters in a character-based text file.
Thanks in advance for your assistance.
If the filename is always followed by a "|" character, this might work:
Code:
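awk -F \" '$1=="<xref image=" {print $2}' file.in.name \
| awk -F "|" '{print $1}'
The first line uses awk with " as the delimiter and prints the second field of every line whose first field is <xref image=, and the second line (note the \ to continue the line) uses awk with | as the delimiter and prints everything up to the first |. In the first line I didn't quote the -F argument (the " is escaped with a backslash instead), but I did in the second. The awk script is between the two ' characters. I didn't try the whole script, but I hope that helps.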
Code:
VARIABLE=$(awk '/^<xref image=".*<\/xref>/{print substr($0,14,12)}' input_file)
As for the second issue, what "hidden characters" are you referring to, and in what sort of "text based" file? Is it a text file, or some other sort of file? How do you know they are there, if they are hidden? ;)
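A quick way to check the substr approach above against the sample line from the first post is sketched below. The scratch file is made up for the test, and the variable is named imagename only because that is what the original question asked for. The offsets assume the line starts with <xref image=" at column 1 and that the name is always exactly 12 characters long.

```shell
# Hypothetical scratch file holding the sample line from the first post.
input_file=$(mktemp)
printf '%s\n' '<xref image="00001234.tif|V3|1999:11:19:22:13|49487|0"> image: </xref>' > "$input_file"

# '<xref image="' is 13 characters long, so the name starts at column 14
# and substr() takes the 12 characters from there.
imagename=$(awk '/^<xref image=".*<\/xref>/{print substr($0,14,12)}' "$input_file")

echo "$imagename"   # prints 00001234.tif
rm -f "$input_file"
```

If the name length ever varies, the fixed 12 would need to change, and splitting on the | character as in the earlier reply may be the safer choice.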
Thanks, both, for your responses. I might try the second approach initially, as a computer program generated these files and the structure is set in stone, so to speak, at least for the part of the line that I included here.
Regarding the second question: if I vi the files I see funny stuff, if I display the file in XTerm I see other funny stuff, etc. Additionally, the cursor position counter in vim goes nutty when moving across what appears to be empty space. If the line reads "THORPE KISHIE C ", the cursor position on the C is 1,21 and on the space after the C it is 1,22, but on the next space it goes to 1,24-23, then 1,26-24, then 1,28-25, then 1,30-26, etc. (the first number goes up by 2 and the second by 1 each time).
Also, these files, although named .txt, were never intended for someone to look at raw. They were only intended for use by a program's back end with a GUI for the victim, I mean user.
Thanks again, guys.
For the funny characters, maybe try:
Code:
cat -v file
If that doesn't work, let us know; someone will have another idea.
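One such idea, in case the cat -v output is hard to interpret, is a byte dump with od, which shows the exact value of every byte in the file. The sketch below uses a made-up sample, not the real file: a name followed by two bytes of value 0xA0 (octal 240), which is purely an assumption about what the padding might be.

```shell
# Made-up sample: a name followed by two 0xA0 bytes (octal 240) standing in
# for whatever the real padding turns out to be.
sample=$(mktemp)
printf 'KISHIE C\240\240\n' > "$sample"

# cat -v shows bytes with the high bit set in M- notation.
cat -v "$sample"

# od -c prints each byte as a character, an escape, or a 3-digit octal value.
od -c "$sample"

# The same bytes in hex, with the address column and whitespace stripped.
hexbytes=$(od -An -tx1 "$sample" | tr -d ' \n')
echo "$hexbytes"
rm -f "$sample"
```

On the real file, od -c file | head is usually enough to see which byte follows the last letter of the name.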
Well it is some ugly stuff in that file, but again I can work around it. I might close this thread and be back for assistance on a new thread. Want a sneak preview? OK!
The part of the line I am interested in grabbing looks like this:
..."STUBER ROBERT J "...
with all those spaces showing up as follows when you cat -v:
..."STUBER ROBERT J- M- M- M- M- M- M- M etc - M- M- "...
I want awk to fill a name variable with "Stuber Robert J" and stop at the J, based on the next character being something other than A-Z. It's almost like I need a hex editor to see what that first character is and where to stop filling the variable. Is it really a dash, or is it just showing as a dash on screen with the cat -v command?
Take care, all.
Using regular expressions, you can match against non-printing characters too. So for example, if you had the word "John" followed by a bunch of hidden garbage, you could match for that.
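A minimal sketch of that idea in awk follows, assuming the name field has already been isolated and consists only of capital letters and spaces. The sample input is made up, and the 0xA0 padding bytes (octal 240) are only a guess at what the hidden garbage is. The match stops at the first byte that is neither A-Z nor a space, so the exact value of the garbage does not matter much.

```shell
# Made-up sample: a name followed by 0xA0 bytes (an assumption about the padding).
name=$(printf 'STUBER ROBERT J\240\240\240\240\n' | LC_ALL=C awk '
    # Grab the leading run of capital letters and spaces ...
    match($0, /^[A-Z ]+/) {
        s = substr($0, RSTART, RLENGTH)
        # ... then drop any ordinary trailing spaces.
        sub(/ +$/, "", s)
        print s
    }')

echo "$name"   # prints STUBER ROBERT J
```

LC_ALL=C makes awk treat the input as plain bytes, so [A-Z] means only the ASCII capitals. The sketch leaves the capitalisation as it is in the file.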
For the time being (and if it's all the same to you), I suggest continuing the discussion in this thread, since I am guessing there may yet be more to this situation. Put another way: don't start a new thread if it is to continue this one; instead, leave this one un-SOLVED until you're sure there's no more to this issue. It'll help people find your thread(s) more easily, and in fewer places, when they search for things on this subject. :) If it's a new issue, then by all means, make a new thread.
Actually, sed comes to mind for your "Robert Stuber" issue, but we'll know more when you tell us more about this.
Also, a suggestion: when posting chunks of data files or text files, especially when whitespace is relevant in the formatted listing of the data, please put the data in code tags. You can see their usage here: http://www.phpbb.com/community/faq.php?mode=bbcode#f2r1
Keep us posted!