UrbanDruid 04-05-2005 03:42 PM

Extracting text with grep or awk?
Hey, kids!

I have a couple of text file reports in a format like this:

8/25/2004 8:23:30 AM 0x1000000 0x1 5,059 E:\Project 68950\Project 68950 2-up cover report
8/25/2004 8:23:32 AM 0x1000000 0x1 675,328 E:\Project 68950\Project 68950 2-up covers
7/13/1990 1:00:14 PM 0x1000000 0x1 0 E:\Project 68950\Fonts\Helve
8/5/1993 5:02:32 AM 0x1000000 0x1 0 E:\Project 68950\Fonts\HelveNeuMedCon
8/25/2004 8:19:02 AM 0x1000000 0x1 0 E:\Project 68950\Fonts\Helvetica
8/13/2004 10:49:30 AM 0x1000000 0x1 0 E:\Project 68950\Fonts\Helvetica Neue Condensed 3

All I need out of it are the "Project 68950" sections, and preferably only unique occurrences.

Columns are separated by varying numbers of spaces, rather than tabs, but the text I need will always appear between the first pair of backslashes in each line. Because of this I thought grep might be the way to go, but can't figure out how to do it. And unfortunately I know more about full-contact knitting than I do about awk.

Any advice? (For added marks: in the other file the text I need is between the second and third backslashes.)



Tinkster 04-05-2005 03:53 PM

In awk ...

awk '{for(i=7;i<=NF;i++)printf "%s ", $i;print""}' <report_name> | uniq
Or, a bit more elegant, using sed :)

sed 's/.*\(E:\)/\1/g' <report_name>|uniq
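
To see what the sed version keeps, here is a minimal sketch run against a hypothetical file `sample.txt` holding two lines copied from the report above. The file name and the `from_drive` helper are assumptions made for illustration, not part of the original commands. The substitution drops everything before `E:` and leaves the rest of the line, so the full path comes back.

```shell
# Hypothetical sample file: two lines copied from the report in the first post.
cat > sample.txt <<'EOF'
8/25/2004 8:23:30 AM 0x1000000 0x1 5,059 E:\Project 68950\Project 68950 2-up cover report
7/13/1990 1:00:14 PM 0x1000000 0x1 0 E:\Project 68950\Fonts\Helve
EOF

# Illustrative helper: keep everything from "E:" to the end of each line.
from_drive() { sed 's/.*\(E:\)/\1/' "$1"; }

from_drive sample.txt
```

Because every path differs, `uniq` has nothing to collapse at this stage.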


UrbanDruid 04-07-2005 12:01 PM


Many thanks. Unfortunately, those give me the whole path (e.g., "E:\Project 68950\Project 68950 2-up cover report") for each file/directory in "Project 68950," when all I want is unique occurrences of "Project 68950."

These reports are the contents of backup tapes and all I need to know is which projects are on them, not every file.

I dug around in the man pages of awk and sed to figure out your examples and I think I burst a blood vessel. If you can take another crack I'd appreciate it.



Tinkster 04-07-2005 01:53 PM

Oh ... I got you wrong the first time round; that makes it even easier ;) ... When you referred to it as a section, I assumed you wanted the font names.

awk -F\\ '{print $2}' <report_name> | uniq
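
For anyone following along, a rough sketch of how that one-liner splits a line, assuming a hypothetical file `report.txt` built from lines in the first post. With a backslash as the field separator, `$1` is everything up to and including `E:`, `$2` is the text between the first pair of backslashes, and `$3` is whatever follows the second backslash (the "added marks" case). The `between_backslashes` helper and its field-number argument are made up for illustration. The separator is written in its quoted form `-F'\\'`, and `sort -u` is swapped in for `uniq` as an assumption, since `uniq` only collapses duplicates that sit on adjacent lines.

```shell
# Hypothetical sample file: lines copied from the report in the first post.
cat > report.txt <<'EOF'
8/25/2004 8:23:30 AM 0x1000000 0x1 5,059 E:\Project 68950\Project 68950 2-up cover report
8/25/2004 8:23:32 AM 0x1000000 0x1 675,328 E:\Project 68950\Project 68950 2-up covers
7/13/1990 1:00:14 PM 0x1000000 0x1 0 E:\Project 68950\Fonts\Helve
8/5/1993 5:02:32 AM 0x1000000 0x1 0 E:\Project 68950\Fonts\HelveNeuMedCon
EOF

# Illustrative helper: print backslash-separated field number $2 of file $1,
# one line per distinct value.
between_backslashes() {
    awk -F'\\' -v n="$2" '{print $n}' "$1" | sort -u
}

# Field 2 is the text between the first pair of backslashes.
between_backslashes report.txt 2
```

For the other report, passing `3` instead of `2` would select the text after the second backslash.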


UrbanDruid 04-07-2005 02:08 PM

Perfect! Grazie!

Tinkster 04-07-2005 02:57 PM


Originally posted by UrbanDruid: "Perfect! Grazie!"

Prego ;)