How to accomplish this?

Xossarian · 04-15-2008, 12:47 PM

Hi All

I am new to Linux and Shell scripting. So bear with me.

I have two files, log.txt and output.txt.
log.txt looks like this:
-------------------------------------------------------------------------
####
- r34327 | #####| 2008-03-31 10:18:35 -0400 (Mon, 31 Mar 2008) | 1 line Tag created for 00.04.11.04
- r34077 | #####| 2008-03-27 17:38:58 -0400 (Thu, 27 Mar 2008) | 3 lines Pilot CSC Inbound error fixed ID: 11237

#####
- r34327 | #######| 2008-03-31 10:18:35 -0400 (Mon, 31 Mar 2008) | 1 line 00.04.11.04
- r34193 | #######| 2008-03-28 10:29:17 -0400 (Fri, 28 Mar 2008) | 3 lines D09285 - Code change for Edit filtering ID:12345
-------------------------------------------------------------------------
The output.txt file looks like this. It contains the descriptions of the IDs in log.txt:
-------------------------------------------------------------------------
Defect/LifeCycle Mgmt/11237:CSC : Pilot: CrossSystemInbound Error
Fixed Release No: 11.02
search with some specific customers who are in 442 and another with 7932.
- 1 occured 2 days ago
- 2 occured yesterday
-------------------------------------------------------------------------
What I want to accomplish is a script that finds every ID in log.txt, retrieves the entry for that ID from output.txt, keeps only its first two lines, and writes them back into log.txt right after the line containing the ID.
E.g.
-------------------------------------------------------------------------
####
- r34327 | #####| 2008-03-31 10:18:35 -0400 (Mon, 31 Mar 2008) | 1 line Tag created for 00.04.11.04
- r34077 | #####| 2008-03-27 17:38:58 -0400 (Thu, 27 Mar 2008) | 3 lines Pilot CSC Inbound error fixed ID: 11237
Defect/LifeCycle Mgmt/11237:CSC : Pilot: CrossSystemInbound Error
Fixed Release No: 11.02
-------------------------------------------------------------------------
And do this for all the IDs in log.txt.

Can somebody point me in the right direction?
Thank you
X

beadyallen · 04-15-2008, 06:13 PM

From what I can tell from your post, you'll need two things. Firstly, you need a way to grab the individual IDs from the log.txt file, and secondly, you want to grab 1 line before that ID in log.txt, and 1 line after the match in output.txt.
You can use grep for both of these. First off, use the '-o' option to just grab the ID using a suitable regexp. Then strip out the ID (not sure if this is needed, but better safe than sorry). Something like

Code:

 grep -o 'ID: *[0-9]\+' log.txt | sed 's/ID: *//'

This'll match 'ID:' followed by 0 or more spaces (' *'), followed by a number of 1 or more digits ('[0-9]\+'). With '-o', grep prints only the matched text rather than the whole line. The sed 's/ID: *//' then removes the 'ID:' and spaces from that text, leaving just the number.
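To see what that pipeline produces, here is a small sketch run against three lines taken from the original post. The function name extract_ids and the sample variable are made up for illustration, and it assumes GNU grep, since '-o' and '\+' are not guaranteed everywhere.

```shell
#!/bin/bash
# Hypothetical helper: print every ID number found on stdin, one per line.
extract_ids() {
  grep -o 'ID: *[0-9]\+' | sed 's/ID: *//'
}

# Three sample lines copied from the log.txt shown in the first post.
sample='- r34077 | #####| 2008-03-27 17:38:58 -0400 (Thu, 27 Mar 2008) | 3 lines Pilot CSC Inbound error fixed ID: 11237
- r34327 | #######| 2008-03-31 10:18:35 -0400 (Mon, 31 Mar 2008) | 1 line 00.04.11.04
- r34193 | #######| 2008-03-28 10:29:17 -0400 (Fri, 28 Mar 2008) | 3 lines D09285 - Code change for Edit filtering ID:12345'

# Lines without "ID:" produce nothing; the other two yield one number each.
ids=$(printf '%s\n' "$sample" | extract_ids)
echo "$ids"
```

The second sample line has no 'ID:' in it, so only two numbers come out, whether or not there is a space after the colon.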

So having got the individual IDs, you then want to extract lines before in log.txt, and lines after in output.txt. This is easily achieved using the '-A' and '-B' options in grep (lines after and before).
So putting it all together:

Code:

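for x in $( grep -o 'ID: *[0-9]\+' log.txt | sed 's/ID: *//' )
do
  echo "-----------------------------"
  # the matching line in log.txt plus the line before it
  linebefore=$(grep -B 1 -- "$x" log.txt)
  # the matching line in output.txt plus the line after it
  lineafter=$(grep -A 1 -- "$x" output.txt)
  echo "${linebefore}"
  echo "${lineafter}"
done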

The above is just based on what you posted, and will probably need editing. For instance, it assumes that the ID number will only ever exist in output and log as the actual ID, not as just a number. It might be worth removing the pipe to sed, but then the spaces between ID: and the number need to match between log.txt and output.txt. It also assumes every ID in log.txt has a corresponding entry in output.txt. Still, maybe it's a start.
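Two of those caveats can be tightened up a little. The sketch below is one possible way, not the only one: it searches output.txt for "/NUMBER:" instead of the bare number (assuming entries always look like the "Mgmt/11237:CSC" sample), and it checks grep's exit status so a missing entry is reported rather than silently printed as an empty line. The function name lookup_id and the temporary file are invented for the example.

```shell
#!/bin/bash
# Hypothetical helper: print the description line for an ID plus the line after it.
# Assumes entries in the descriptions file look like ".../11237:...".
lookup_id() {
  local id=$1 file=$2
  grep -A 1 -- "/${id}:" "$file"
}

# Throwaway sample data (copied from the first post) so the sketch runs on its own.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Defect/LifeCycle Mgmt/11237:CSC : Pilot: CrossSystemInbound Error
Fixed Release No: 11.02
search with some specific customers who are in 442 and another with 7932.
EOF

missing=""
for id in 11237 442 12345; do
  # grep exits non-zero when nothing matched, so the assignment can be tested.
  if found=$(lookup_id "$id" "$tmp"); then
    echo "$found"
  else
    echo "no entry for ID $id" >&2
    missing="$missing $id"
  fi
done
rm -f "$tmp"
```

Here 442 appears in the file only as a plain number inside a sentence, so it is reported as missing instead of producing a false match.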
Hope you follow what's happening.

osor · 04-15-2008, 07:13 PM

As beadyallen said, you have two “searches” going on here. First, you are searching each line for the occurrence of the extended regular expression

Code:

ID: *([0-9]+)

(where the stuff in parens is what you want to capture). Then, if such a match occurs, you will search the other file for a line containing the fixed expression:

Code:

$ID:

where $ID has been replaced with whatever you captured previously. If such a line is found, print it and the following line.

Generally, this would be easiest to do in awk or perl, but I will give a bash solution, simply for fun (and also since bash’s new regex functionality is rarely put to good use). We will iterate over every line in log, and print its contents. But if a match occurs, other stuff will also happen. Here is some code illustrating what I mean:

Code:

#!/bin/bash

# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal
while IFS= read -r line; do
	echo "$line"
	if [[ "$line" =~ ID:[[:space:]]*([0-9]+) ]]; then
		sed -ne "/${BASH_REMATCH[1]}:/{p;n;p}" output.txt
	fi
done < log.txt > log2.txt

#mv log2.txt log.txt

I have made some assumptions:

In log.txt, the id number always follows “ID:” after zero or more spaces.
In output.txt, the id is always followed by a “:” character.

Also, the above is very slow, as it reads and searches through (the potentially large) output.txt file on every match. If you want to use perl, you could slurp the entire output.txt into memory, and build a hash containing the “line numbers” of any potential ID numbers in one fell search.
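The same one-pass idea can be sketched without leaving bash, using an associative array in place of the perl hash. This is only an illustration under a few assumptions: bash 4 or newer, IDs in output.txt appearing as "/NUMBER:", and each entry's second line directly following the first. The names desc, build_index and annotate, and the shortened sample log lines, are made up for the example.

```shell
#!/bin/bash
# Sketch only: needs bash 4+ for associative arrays.
declare -A desc   # desc[ID] = matching line + newline + the line after it

# build_index FILE: read the descriptions file once and fill "desc".
build_index() {
  local line next id
  while IFS= read -r line; do
    if [[ $line =~ /([0-9]+): ]]; then
      id=${BASH_REMATCH[1]}
      # Consume the following line as the second line of the entry.
      IFS= read -r next || next=''
      desc[$id]="$line"$'\n'"$next"
    fi
  done < "$1"
}

# annotate FILE: print each log line, followed by its description if one exists.
annotate() {
  local line id
  while IFS= read -r line; do
    printf '%s\n' "$line"
    if [[ $line =~ ID:[[:space:]]*([0-9]+) ]]; then
      id=${BASH_REMATCH[1]}
      if [[ -n ${desc[$id]:-} ]]; then
        printf '%s\n' "${desc[$id]}"
      fi
    fi
  done < "$1"
}

# Small self-contained demo with shortened sample data.
dir=$(mktemp -d)
printf '%s\n' \
  'Defect/LifeCycle Mgmt/11237:CSC : Pilot: CrossSystemInbound Error' \
  'Fixed Release No: 11.02' \
  'search with some specific customers who are in 442 and another with 7932.' \
  > "$dir/output.txt"
printf '%s\n' \
  'Tag created for 00.04.11.04' \
  'Pilot CSC Inbound error fixed ID: 11237' \
  > "$dir/log.txt"

build_index "$dir/output.txt"
annotated=$(annotate "$dir/log.txt")
echo "$annotated"
rm -rf "$dir"
```

output.txt is read exactly once, and each log line then costs a single array lookup. Like the sed version, the line after a match is swallowed as part of the entry, so two ID lines directly in a row would need extra handling.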

ghostdog74 · 04-15-2008, 09:18 PM

Code:

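awk 'FNR==NR {
  # first file (the log): remember every ID number
  if (/ID:/) {
    gsub(/.*ID: */,"")
    id[$0]
  }
  next   # never let log lines fall through to the block below
}
{
  # second file (the descriptions): print a matching line and the one after it
  for (i in id) {
    if ( $0 ~ i ) {
      print $0; getline; print $0
    }
  }
}
' file file1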

output:

Code:

# ./test.sh
Defect/LifeCycle Mgmt/11237:CSC : Pilot: CrossSystemInbound Error
Fixed Release No: 11.02