reading a tagged log

Rosemary-Lane · 04-20-2024, 06:20 AM

I have a log file with simple (no metacharacters) text lines. Each entry starts with a + followed by the name of the person who added the entry to the log. For example:

+Alan
line1
line2

more lines
some blanks lines

+David
line
more lines

still more lines
+Alan
...
...
+Chris
..
..

What is a good way to filter out all the entries by, say, Alan? Blank lines must be maintained as part of the entry.

pan64 · 04-20-2024, 06:47 AM

I don't really know the desired output, but probably awk with a well defined record separator (RS="+" probably) can do the job.

MadeInGermany · 04-20-2024, 10:35 AM

Yes, awk is good for this.
RS is best if it's at the end of the record. But here the + is at the beginning, so perhaps a state variable is simpler. (Set the state if the + is met. If state is good then print the current line.)
Waiting for some attempt of the O/P...

Rosemary-Lane · 04-20-2024, 12:39 PM

My first goal was to find a list of all unique names. I can do that with:

awk '/^+/' logfile | awk '!a[$0]++'

But I am still struggling with filtering out all entries of the same person. Ideally I could call a function with any of the names as parameter and have all his/her entries printed.

TB0ne · 04-20-2024, 01:53 PM

Quote:

Originally Posted by Rosemary-Lane

My first goal was to find a list of all unique names. I can do that with:

Code:

awk '/^+/' logfile |  awk '!a[$0]++'

But I am still struggling with filtering out all entries of the same person. Ideally I could call a function with any of the names as parameter and have all his/her entries printed.

My first question would be, where does this information come from, and can the output format be changed? Often times you can modify what the program(s) in question output, so if you could get all the data for each person on one line, that'd make it FAR easier to get out info for just one person. I'm not an awk expert, and I'd probably write a perl script to do this.

My approach would be to read the file line-by-line (you don't say how big these files are), and look for the + sign, then compare the name to what you're looking for...if you find it, all other lines would be pushed into an array, until you hit the NEXT line with a + at the beginning. Name match? Keep shoving data out to the array. Doesn't match? keep reading. When you're done, you'll have an array with all of Alan's data in it, and you can output to screen/file/whatever. There are also approximately 10,000 other ways to do this, but for quick-and-dirty (this sounds like homework, honestly), that'd be my approach.
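
The buffering approach described above can be sketched in awk instead of perl. This is only an illustration: the sample log, the function name collect_entries and the variable names are made up for the example, and /^\+/ escapes the + so that it is read as a literal character.

```shell
# Hypothetical sample log, created only so the sketch runs on its own.
log=$(mktemp)
printf '%s\n' '+Alan' 'line1' '' 'more lines' '+David' 'line' '+Alan' 'last' > "$log"

# Collect every line of the wanted person's entries in an array,
# then print the array once the whole file has been read.
collect_entries() {
    awk -v tag="$1" '
        /^\+/ { keep = ($1 == ("+" tag)) }  # a tag line switches collecting on or off
        keep  { buf[++n] = $0 }             # store the line, blank lines included
        END   { for (i = 1; i <= n; i++) print buf[i] }
    ' "$2"
}

collect_entries Alan "$log"
```

For a log of a few thousand lines, holding the matches in memory costs nothing noticeable; the array mainly helps if the entries need further processing before being printed.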

syg00 · 04-20-2024, 06:11 PM

I can't imagine any circumstance where you should need to pipe awk to awk - it has all the conditionals needed, and the END block for tidying up after the input has reached EOF if needed.
The typical solution for this sort of thing is to search for your key and set a flag - print while the flag is true. Turn the flag off at the next non-key. You can pass the key in by a bash variable.
Pretty straightforward.

MadeInGermany · 04-21-2024, 04:09 AM

Quote:

My first goal was to find a list of all unique names. I can do that with:

Code:

awk '/^+/' logfile | awk '!a[$0]++'

Can be efficiently done in one awk:

Code:

awk '/^+/ && !a[$0]++' logfile

What did you attempt with your actual challenge?
You can pass a parameter like this

Code:

awk -v tag=Alan '...'

or like this

Code:

awk '...' tag=Alan
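
One practical difference between the two forms, shown as a small sketch (the shell variable names are arbitrary): a -v assignment is made before the program starts, while an assignment given as an operand is made only when awk reaches it in the argument list, so a BEGIN block does not see it yet.

```shell
# -v assigns the variable before the program starts, so BEGIN can see it.
with_v=$(awk -v tag=Alan 'BEGIN { print "[" tag "]" }')

# An operand assignment has not been processed when BEGIN runs,
# so tag is still empty here.
as_operand=$(awk 'BEGIN { print "[" tag "]" }' tag=Alan </dev/null)

# In a main rule the operand assignment is already in effect.
in_rule=$(printf 'x\n' | awk '{ print "[" tag "]" }' tag=Alan)

printf '%s\n' "with -v, in BEGIN:      $with_v"
printf '%s\n' "as operand, in BEGIN:   $as_operand"
printf '%s\n' "as operand, main rule:  $in_rule"
```

For a program that only uses the variable in its main rules, as in this thread, either form should behave the same.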

Rosemary-Lane · 04-21-2024, 08:16 AM

Thanks for the improvements. Yes the single awk is neat and concise.

This log has just over 2000 lines.
The log was used by several engineers as they installed a refrigeration plant. It all went well. But I am now preparing an activity report on each engineer's contribution.

I think I may have found a way to pull all blocks of text for each person (tag) with this command:

awk '/^+/{f=(/tag/)} f' logfile # e.g tag=Alan

No failures so far. I don't like the fact that the tag is hard-coded in the command. I guess I could use something like:

<logfile awk '/^+/{f=(/tag/)} f' tag=Alan

If there is a more efficient way I would be grateful to hear.

MadeInGermany · 04-21-2024, 08:42 AM

Yes that's the efficient way.
But / / encloses a literal regular expression, so /tag/ looks for the text "tag", not for the value of the variable. Because /re/ is short for $0 ~ /re/, you can write $0 ~ variable instead:

Code:

<logfile awk '/^+/{f=($0 ~ tag)} f' tag=Alan

Since an RE without anchors matches a substring, Alan would also match +Alana.
The following are more precise:

Code:

<logfile awk '/^+/{f=($0 ~ ("^+" tag " *$"))} f' tag=Alan

I allow a trailing space.

Code:

<logfile awk '/^+/{f=($1 == ("+" tag))} f' tag=Alan

By choosing $1, the space-separated first field, I allow trailing space (and further fields).
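
The $1 variant could be wrapped in a shell function, which is roughly what was asked for earlier in the thread. This is a minimal sketch under some assumptions: the function names entries_by and names_in, the sample log and the per-name output files are invented for the example, names are assumed to contain no whitespace or slashes, and the + is escaped as \+ so it is read literally.

```shell
# Hypothetical sample log, created only so the sketch runs on its own.
log=$(mktemp)
printf '%s\n' '+Alan' 'line1' '' 'more lines' '+Alana' 'other' \
              '+David' 'line' '+Alan' 'last' > "$log"

# entries_by NAME FILE: print every entry whose tag line is exactly +NAME.
entries_by() {
    awk -v tag="$1" '/^\+/ { f = ($1 == ("+" tag)) } f' "$2"
}

# names_in FILE: print each name once, without the leading +.
names_in() {
    awk '/^\+/ && !seen[$1]++ { print substr($1, 2) }' "$1"
}

entries_by Alan "$log"

# One file per engineer, e.g. as raw material for a report.
outdir=$(mktemp -d)
for name in $(names_in "$log"); do
    entries_by "$name" "$log" > "$outdir/$name.txt"
done
ls "$outdir"
```

The loop rereads the log once per name, which should be fine for about 2000 lines; a single awk pass writing to several files would also be possible.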

pan64 · 04-21-2024, 10:52 AM

I guess something like this may work:

Code:

awk 'BEGIN{RS="+"} /^Alan/ {next};1'

(not tested)
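
A note on the RS idea, with a sketch of how it might be adapted: as written, the command above skips the records that start with Alan and prints everyone else's, and the + used as separator is not printed. The variation below prints only the wanted person's entries and puts the + back. It assumes that + never occurs inside the log text (every + would start a new record), and the function name entries_by_rs and the sample log are made up for the example.

```shell
# Hypothetical sample log, created only so the sketch runs on its own.
log=$(mktemp)
printf '%s\n' '+Alan' 'line1' '' 'more lines' '+David' 'line' '+Alan' 'last' > "$log"

# With RS="+" each record is one entry minus its leading "+".
# $1 is then the name; printf puts the "+" back and adds no extra
# newline, because the record already ends with one.
entries_by_rs() {
    awk -v tag="$1" 'BEGIN { RS = "+" } $1 == tag { printf "+%s", $0 }' "$2"
}

entries_by_rs Alan "$log"
```

Comparing $1 with the tag, rather than matching /^Alan/, keeps Alana from being picked up as well.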