LinuxQuestions.org - [SOLVED] Bash script to parse a file to get a set of lines between specific characters

Hi all,
I have a log file that contains information like this:

----------------------------
r11141 | prasath-palani | 2010-12-23 16:21:24 +0530 (Thu, 23 Dec 2010) | 1 line
Changed paths:
M /projects/
M /projects/
M /applications
updated for integration test
----------------------------
r11140 | upendra.sahu | 2010-12-23 16:09:38 +0530 (Thu, 23 Dec 2010) | 1 line
Changed paths:
M /projects/trunk/
M /projects/trunk/
A /projects/trunk/
updated to use the Manifest function
----------------------------

What I need is to copy the data between the "----------------------------" lines into separate files. For example, the first set of data between the "----------------------------" lines should go into one file, and the next set into another file.

Can anyone help me on how to write a bash script for this?

What have you tried and where are you stuck?

While I'm sure you can do it in bash, I find bash rather hard to get my head around - it seems hard to debug and incredibly picky about syntax. If you're just starting out scripting, a language like perl or python would seem to be a better choice to me...

Quote:

Originally Posted by grail (Post 4201762)

What have you tried and where are you stuck?

Hi,
I finally spent 4 hours on it and created the script. I'm posting it here in case it is useful to others....

#!/bin/bash
logfile="$1/trunklog.txt"

#Exit if the trunklog.txt file doesn't exist in $1 {path to the log file}
if [ ! -f "$logfile" ]; then
echo "File trunklog.txt doesn't exist in the $1 directory"
exit 1
fi

#Go to the frags directory, where the log fragments will be created
cd "$1"
cd "../frags"
#Delete all the files/folders in the ../frags directory
rm -rf *

#Parses the input "tmp.txt" file and creates the logfrags file with file name as
#revision number given in the first line of the input "tmp.txt" file.
#Deletes the passed "tmp.txt" file once the logfrags file is created.
processFile()
{
firstline="TRUE"
fname="xxx.txt"

# Set loop separator to end of line
BAKIFS=$IFS
IFS=$(echo -en "\n\b")

exec 3<&0
exec 0<"$1"
while read -r line
do
#echo $line
var="$line"

#Read the first line in the file to get the revision number
if [[ "$var" =~ "| 1 line" ]] && [ $firstline = "TRUE" ] ; then
fname=${var%%\|*}
fname=${fname#r}
fname=${fname//[[:space:]]}
echo "$var" >> "$fname"
firstline="FALSE"
else
echo "$var" >> "$fname"
fi
done
exec 0<&3
# restore $IFS which was used to determine what the field separators are
IFS=$BAKIFS

#Delete the tmp.txt file which is passed as argument to this function
rm "$1"
}

# Read the ./state/trunklog.txt file line by line and parse it,
# calling the processFile function to create the frag files
readlogfile()
{
first="TRUE"

# Set loop separator to end of line
BAKIFS=$IFS
IFS=$(echo -en "\n\b")

exec 3<&0
exec 0<"$1"

while read -r line
do
#echo $line
#Remove the first line in the trunklog.txt file
if [ $first = "TRUE" ]; then
first="FALSE"
elif [ $first = "FALSE" ]; then
if [[ ! "$line" =~ ---------* ]]; then
echo "$line" >> "temp.txt"
else
processFile "temp.txt"
fi
fi
done

exec 0<&3

# restore $IFS which was used to determine what the field separators are
IFS=$BAKIFS
}

echo "Creating individual numbered revision log fragments (logfrags) files. . . . ."

readlogfile "$logfile"

echo ". . . Done"

echo;echo "logfrags files created in $PWD/ directory"
exit 0

You can also use an awk one-liner for this.

See this thread http://www.linuxquestions.org/questi...-shell-595506/
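For illustration only, here is one way such an awk approach might look. It is a sketch, not necessarily what the linked thread does. It assumes that separator lines contain nothing but dashes, that each record begins with a header like "r11141 | user | date | 1 line", and that fragments are written to the current directory, named after the revision number. split_log_awk is a hypothetical name. Spread over a few lines for readability, it is still a single awk command.

```shell
#!/bin/bash
# Sketch: split an svn log into one file per revision with a single awk call.
# Assumes dash-only separator lines and "rNNNN | ..." header lines; output
# files go to the current directory, named after the revision number.
split_log_awk() {
    awk '
        /^-+$/ { if (out != "") close(out); out = ""; next }   # separator ends a record
        out == "" && /^r[0-9]+ [|]/ { out = substr($1, 2) }    # header: "r11141" -> "11141"
        out != "" { print > out }                              # copy line to the current file
    ' "$1"
}
```

Usage: split_log_awk trunklog.txt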

Also, if your problem is resolved, go to top of your thread and in the Thread Tools click on "Mark this thread as solved".

venkatrg - First, I commend you on your script, as it is quite in-depth. Second, if you put [code][/code] tags around your code, it will keep its formatting and be a lot easier to read.

As vikas has pointed out, there are easier ways, but I would like to help you with what you have presented.

I will start from the top:

1. You refer to $1 all the way through the script. Are you aware that when the script is called, $1 is the first word on the command line after the script name, but inside one of your own functions it is the first argument passed to that function? I ask because, when reading the code, it is very confusing to know which $1 is being referenced.
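One way to avoid that ambiguity is to copy each positional parameter into a descriptively named variable as soon as it arrives. A minimal sketch; log_dir, tmp_file, process_file and the printed message are hypothetical names chosen for illustration:

```shell
#!/bin/bash
# The script's own first argument, saved once under a descriptive name.
log_dir=${1:-}

# Inside a function, "$1" is the function's first argument, not the script's.
process_file() {
    local tmp_file=$1
    printf 'log dir: %s, fragment: %s\n' "$log_dir" "$tmp_file"
}
```

Calling process_file temp.txt then makes it obvious which value came from where.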

2. Let me know what you think might happen if the following were the only lines in your code:

Code:

#Go to the frags directory, where the log fragments will be created
cd "$1"
cd "../frags"
#Delete all the files/folders in the ../frags directory
rm -rf *

Now we call the script but make a little typo in our haste:

Code:

./script / folder
#         ^ this is the space between the slash and the word folder (typo)

Think about what might happen here.
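A hedged sketch of a more defensive version. It rejects anything other than exactly one existing directory argument (so the "/ folder" typo is caught), stops if cd fails instead of carrying on in whatever directory it happens to be in, and uses ./* together with -- so file names starting with - are not read as options. prepare_frags_dir is a hypothetical name, and it assumes frags sits next to the log directory, as in the script above.

```shell
#!/bin/bash
# Sketch: change into LOG_DIR/../frags and empty it, but only if exactly one
# directory argument was given and cd actually succeeded.
prepare_frags_dir() {
    if [[ $# -ne 1 || ! -d $1 ]]; then
        echo "usage: prepare_frags_dir LOG_DIR" >&2
        return 1
    fi
    # If ../frags does not exist, stop here instead of deleting in the wrong place.
    cd -- "$1/../frags" || return 1
    # ./* rather than *, plus --, so names starting with - are not taken as options.
    rm -rf -- ./*
}
```

Call it with all the script's arguments, e.g. prepare_frags_dir "$@" || exit 1, so a stray extra argument is rejected rather than silently ignored.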

3. echo is not necessary here:

Code:

IFS=$(echo -en "\n\b")
# could just be
IFS=$'\n\b'

I am also curious how backspace (\b) is meant to act as a separator.
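On the \b: command substitution strips trailing newlines, so $(echo -en "\n") on its own would leave IFS empty. A trailing backspace is a common trick to keep the newline, and $'\n' quoting makes that trick unnecessary. For reading a file line by line, the usual idiom goes one step further and sets IFS only for the read itself, so the global IFS never has to be saved or restored. A minimal sketch (copy_lines is a hypothetical name):

```shell
#!/bin/bash
# Sketch: read a file line by line without touching the global IFS.
copy_lines() {
    local line
    # IFS= applies to this read only, so leading/trailing spaces are kept.
    # -r keeps backslashes literal. The || test keeps a final line that
    # has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        printf '%s\n' "$line"
    done < "$1"
}
```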

4. In the processFile function, why the need to set var=$line? Is there a reason you could not simply use $line wherever you currently use $var?

5. There is no need to mix test constructs:

Code:

if [[ "$var" =~ "| 1 line" ]] && [ $firstline = "TRUE" ] ; then
# becomes
if [[ "$var" =~ "| 1 line" && $firstline = "TRUE" ]] ; then
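A related detail about that test: in bash 3.2 and later, a quoted right-hand side of =~ is matched as a literal string, so "| 1 line" only ever finds that exact text, and a header ending in "| 3 lines" would not be recognised. A sketch that matches any line count, with the regex kept unquoted in a variable. header_re and is_header are hypothetical names, and the pattern assumes the header layout shown in the first post.

```shell
#!/bin/bash
# Sketch: match an svn log header whatever its line count ("| 1 line",
# "| 3 lines", ...). The regex lives in a variable and is used unquoted
# after =~; quoted text there would be compared as a literal string.
header_re='^r[0-9]+ [|] .* [|] [0-9]+ lines?$'

is_header() {
    [[ $1 =~ $header_re ]]
}
```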

6. You open file descriptor 3:

Code:

exec 3<&0

in both functions, but it is never actually closed. The line to close it would be:

Code:

exec 3<&-
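Another option is to avoid exec altogether and redirect the loop itself. The redirection lasts only for the loop, so there is no saved descriptor to restore or close afterwards. A minimal sketch (count_separators is a hypothetical example function):

```shell
#!/bin/bash
# Sketch: redirect the while loop itself rather than re-pointing stdin
# with exec; stdin is untouched once the loop finishes.
count_separators() {
    local line n=0
    while IFS= read -r line; do
        if [[ $line =~ ^-+$ ]]; then
            n=$((n + 1))
        fi
    done < "$1"
    echo "$n"
}
```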

7. The escape (\) is not required here:

Code:

fname=${var%%\|*}
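For example, the revision extraction works the same without the backslash, because a plain | inside ${...%%pattern} is just an ordinary character. A small sketch wrapping the same three expansions as the original (revision_from_header is a hypothetical name):

```shell
#!/bin/bash
# Same three expansions as the original, but with an unescaped |.
revision_from_header() {
    local rev=$1
    rev=${rev%%|*}             # cut from the first | onward: "r11141 "
    rev=${rev#r}               # drop the leading r: "11141 "
    rev=${rev//[[:space:]]}    # delete all whitespace: "11141"
    printf '%s\n' "$rev"
}
```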

8. The following compound if statement inside the while loop of the readlogfile function has me confused:

Code:

while read -r line
do
#echo $line
#Remove the first line in the trunklog.txt file
    if [ $first = "TRUE" ]; then
        first="FALSE"
    elif [ $first = "FALSE" ]; then
        if [[ ! "$line" =~ ---------* ]]; then
            echo "$line" >> "temp.txt"
        else
            processFile "temp.txt"
        fi
    fi
done

So, starting from the top of this snippet:

a. We use a while loop to read from the passed-in log file.
b. The if/elif is not required, as $first can only hold one of two values, so this could become a simple if/else construct:

Code:

    if [ $first = "TRUE" ]; then
        first="FALSE"
    else
        if [[ ! "$line" =~ ---------* ]]; then
            echo "$line" >> "temp.txt"
        else
            processFile "temp.txt"
        fi
    fi

c. Once in the else, we check whether the line contains dashes. If it does, we call processFile. My concern is that if the dashes appear on the first line we examine, we will call the function on a file that does not exist. I realise the previous if is probably coping with this, but it might be a good idea to test that the file exists before calling the function.
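A sketch of that check, using -s (exists and is non-empty) so an empty temp.txt is skipped too. The processFile here is only a hypothetical stand-in that prints the fragment and deletes it, so the guard can be tried on its own; flush_fragment is likewise a made-up name.

```shell
#!/bin/bash
# Hypothetical stand-in for the script's processFile: print, then delete.
processFile() {
    cat -- "$1"
    rm -f -- "$1"
}

# Only hand temp.txt over when it exists and is non-empty.
flush_fragment() {
    if [[ -s temp.txt ]]; then
        processFile temp.txt
    fi
}
```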

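9. When using =~ you are doing a regular expression comparison, so placing * at the end of the pattern is not required. In a regex, * is not the "anything" wildcard it is in a glob; it repeats the preceding character (here the last -), and because =~ is unanchored, the match succeeds wherever the dashes appear in the line:

Code:

if [[ ! "$line" =~ ---------* ]]; then
# same result, for the separator lines in this log, as
if [[ ! "$line" =~ --------- ]]; then

Once the match is made, you do not care what comes after it.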
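Going one step further, anchoring the pattern with ^ and $ means only lines made up entirely of dashes count as separators, so a commit message that happens to contain a run of dashes is not mistaken for one. A minimal sketch (is_separator is a hypothetical name):

```shell
#!/bin/bash
# Sketch: a separator is a line consisting only of one or more dashes.
is_separator() {
    [[ $1 =~ ^-+$ ]]
}
```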

I hope you do not take any of the above negatively. It is solely meant to help you improve what you have :)

Something to think about: functions are generally created for tasks you repeat several times in your code, but most of your tasks are fairly linear, so it may be just as easy to write most of this as one continuous piece of code (just a thought).
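To illustrate, here is a sketch of the whole job as a single straight-line function that folds in several of the points above. Assumptions not taken from the original post: the input is LOG_DIR/trunklog.txt, fragments go to LOG_DIR/../frags (created if missing, not wiped), and each fragment is named after the revision number on its header line. split_trunklog is a hypothetical name.

```shell
#!/bin/bash
# Sketch only: one file per revision, written into LOG_DIR/../frags.
split_trunklog() {
    # Exactly one existing directory argument, so "./script / folder" is rejected.
    if [[ $# -ne 1 || ! -d $1 ]]; then
        echo "usage: split_trunklog LOG_DIR" >&2
        return 1
    fi
    local logfile="$1/trunklog.txt" out_dir="$1/../frags"
    local header_re='^r([0-9]+) [|]' line out=""

    if [[ ! -f $logfile ]]; then
        echo "$logfile not found" >&2
        return 1
    fi
    mkdir -p -- "$out_dir" || return 1

    # IFS= keeps leading spaces, -r keeps backslashes; the || test keeps a
    # last line that has no trailing newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ ^-+$ ]]; then
            out=""                                  # separator: current record is finished
        elif [[ -z $out && $line =~ $header_re ]]; then
            out="$out_dir/${BASH_REMATCH[1]}"       # header: start the file for this revision
            printf '%s\n' "$line" > "$out"
        elif [[ -n $out ]]; then
            printf '%s\n' "$line" >> "$out"         # body line of the current record
        fi
    done < "$logfile"
}
```

Usage: split_trunklog ./state. Unlike the original, this sketch does not empty frags first; add a guarded cleanup step if you need one.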

Look forward to seeing how you go.