Assistance on file output - awk/sed

maddyfreaks · 02-16-2019, 10:58 PM

Hi Experts

I have a file (1000+ lines) and need to produce output in a specific format, so I need your input/help.
All I need to see is how many lines each thr has.
In the case below,
I have 3 thr's, and 5, 4, 2 are their respective line counts.
I need assistance in generating the output.

Am new to shell scripting couldn't get an idea on how to spot the output

File :

19608250477[thr=22321]: Res90 at
1: 0x00007f1fb38d5089
2: 0x00007f1fb5565c79
3: 0x00007f1fbb097775
4: 0x00007f1fbb034a69
5: 0x00007f1fbb035467
19601889333[thr=19068]: Res87 at
1: 0x00007f1fc15f86c0
2: 0x00007f1fc1a27d7c
3: 0x00007f1fc1d0f312
4: 0x00007f1fc1caf054
16236545786[thr=55528]: Res67 at
1: 0x00007f1fb4959a90
2: 0x00007f1fb557ad94

I need the output like below

thr=22321 ; Count 5 # Count the number of lines for each thr; each block's lines start at number 1: and end at some number

thr=19068 ; Count 4
thr=55528 ; Count 2

berndbausch · 02-17-2019, 01:24 AM

I use awk for everything, including the laundry.

You search for lines that contain "[thr". When you find such a line, reset the line counter to 0 (the thr line itself is not counted), then peel off the value between the square brackets. This can be done with the sub() function: for example, sub(/.*\[/,"") removes everything up to and including the opening bracket, and sub(/\].*/,"") removes everything starting with the closing bracket. Remember what remains after the two sub()'s. For example:

Code:

/\[thr/ { linectr = 0          # start a new block; the header is not counted
          sub(/.*\[/,"")       # strip everything up to and including "["
          sub(/\].*/,"")       # strip everything from "]" onward
          thr = $0             # what remains, e.g. thr=22321
          next              }

Note that $0 refers to the entire line, but after stripping the unwanted parts. The next directive advances to the next line.

When awk encounters other lines, it just increments the counter:

Code:

{ linectr += 1 }

How are you going to output the line count? Whenever the line contains [thr, you print the thr number and line count of the previous block. The only time you don't do this is at the first occurrence of [thr. I assume that this is the first line of the file. Thus, the entire program looks like this:

Code:

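/\[thr/ { if (NR>1) print thr " ; Count " linectr   # report the previous block
          linectr = 0          # the header line itself is not counted
          sub(/.*\[/,"")
          sub(/\].*/,"")
          thr = $0
          next              }

        { linectr += 1 }       # any other line belongs to the current block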

Put this in a file, for example process-thr.awk, and run awk -f process-thr.awk YOUR-INPUT.
WARNING: This is not tested. I assume that the first thr block starts in the first line, and that the file contains no other lines than thr blocks.

This can certainly be improved. Instead of counting the lines, for example, one could use the number at the beginning of each line.
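
The last idea could be sketched roughly like this: instead of counting lines, remember the number in front of the colon on the most recent line of each block. This is only a sketch under assumptions. The function name count_by_last_index is made up for illustration, and it assumes every block is numbered 1, 2, 3, ... without gaps, as in the sample file.

```shell
# Hypothetical helper (name invented for this sketch).
# Reads the files given as arguments, or stdin if there are none.
count_by_last_index() {
  awk -F: '
    /\[thr=/ {                          # header line of a new block
      if (thr != "") print thr " ; Count " last   # report the previous block
      thr = $0
      sub(/.*\[/, "", thr)              # drop everything up to and including "["
      sub(/\].*/, "", thr)              # drop everything from "]" onward
      last = 0                          # no numbered line seen yet in this block
      next
    }
    $1 ~ /^[0-9]+$/ { last = $1 }       # "5: 0x..." -> remember 5
    END { if (thr != "") print thr " ; Count " last }   # report the final block
  ' "$@"
}
```

Called as count_by_last_index YOUR-INPUT on the sample from the first post, this should print thr=22321 ; Count 5, thr=19068 ; Count 4 and thr=55528 ; Count 2, one per line. It trusts the numbering in the file, so a gap or restart in the numbers would give a different answer than counting lines.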

grail · 02-17-2019, 01:56 AM

Quote:

Am new to shell scripting couldn't get an idea on how to spot the output

After 85 posts on here I think saying you are new may be a little excessive, plus you have been told multiple times to include some kind of an attempt.
You will definitely not get any better if you simply ask for solutions without working out how to do some of it yourself.

As above, I too would use awk and maybe just play with the FS a little to get your thr value. The counting has already been demonstrated.
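
One way to read the FS hint, as a minimal sketch: make both square brackets field separators, so the thr value simply becomes the second field of a header line. The function name extract_thr is invented for illustration, and the sketch assumes that only header lines contain square brackets.

```shell
# Hypothetical helper: print the text between "[" and "]" on each header line.
extract_thr() {
  # FS is a bracket expression matching either "]" or "[", so a header such as
  #   19608250477[thr=22321]: Res90 at
  # splits into three fields and $2 is thr=22321.
  # Lines without brackets have fewer than three fields and are skipped.
  awk -F'[][]' 'NF >= 3 { print $2 }' "$@"
}
```

Run as extract_thr YOUR-INPUT, this only lists the thr values; the counting still has to be added on top.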

chrism01 · 02-17-2019, 09:47 PM

Edit - had a nice soln for "thr" lines but I forgot the count requirement - never mind

allend · 02-18-2019, 02:07 AM

You can build appropriate arrays with

Code:

awk -F "[][]" '!/^$/ {if ($2) {t[i++]=$2} else {c[i-1]++}}'

Adding an END rule to produce output is left as an exercise.
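
For readers who want to check their answer, one possible END rule is sketched below. The main rule is the one from the post above; the wrapper name count_with_arrays, the loop variable k and the output wording are assumptions made for this sketch.

```shell
# Hypothetical wrapper around the one-liner above, plus one possible END rule.
count_with_arrays() {
  awk -F '[][]' '
    # Non-empty lines: a header has text between the brackets in $2,
    # any other line is counted for the most recent header.
    !/^$/ { if ($2) { t[i++] = $2 } else { c[i-1]++ } }
    # Print the blocks in input order; "+ 0" turns a missing count into 0.
    END   { for (k = 0; k < i; k++) print t[k] " ; Count " (c[k] + 0) }
  ' "$@"
}
```

With the sample file from the first post, count_with_arrays YOUR-INPUT should print thr=22321 ; Count 5, thr=19068 ; Count 4 and thr=55528 ; Count 2. Because everything is stored in arrays until END, nothing special is needed for the first or the last block.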

MadeInGermany · 02-18-2019, 07:33 AM

Regarding post#2, if you suppress the 1st printout (e.g. by the NR>1 condition), then for symmetry reason consider another printout in the END section.
Then it's handy to put the printout in a function.

Code:

function printout(){
  if (printctr++) print thr " ; Count " linectr
}
/\[thr/ {
  printout()
  linectr = 0
  sub(/.*\[/,"")
  sub(/\].*/,"")
  thr = $0
  next
}
{ linectr++ }
END { printout() }

berndbausch · 02-18-2019, 08:51 AM

Quote:

Originally Posted by MadeInGermany

Regarding post#2, if you suppress the 1st printout (e.g. by the NR>1 condition), then for symmetry reason consider another printout in the END section.

Of course, you are right. My program would not have printed the last thr block. It would have been a nice problem for the OP to solve.