LinuxQuestions.org
Old 03-17-2005, 12:44 PM   #1
Grafbak
Member
 
Registered: Jun 2003
Location: /dev/null
Distribution: Knoppix 3.3
Posts: 61

Rep: Reputation: 15
grep detecting carriage return, how ?


Hello,

I am trying to get grep to detect a carriage return, but I'm not sure how to do it. I have looked around but I can't find a good grep tutorial that covers this, or how to match an ASCII code with grep.

I have a large XML file which contains several blocks of <BF:EVENT> </BF:EVENT>, and I want to extract these blocks and save them in a separate file.
It's a little tricky to detect the last </bf:event> because they come in groups of something like 50000 repetitions of <bf:event> <bf:param> </bf:param> </bf:event>.
I want to find the line number of the last </bf:event> in a block by the fact that </bf:event> is NOT followed by <bf:event> on the next line.

I think it should be something like this :

for i in counter
do
x = `grep -n "<bf:event"|cut -f1 -d: | head -n1`
y = `grep -n "</bf:event>/CR<bf:event>"|cut -f1 -d: | tail -n1`
head -n"$X" xmlfile.xml|tail ("$X"-"$Y") > partfile"$i".xml
done


However, I can't get it to do what I want. Can somebody please help me out?
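On the literal question in the title: grep can match a carriage return (ASCII 13) when the shell puts that character into the pattern. A minimal sketch, assuming bash (for the `$'\r'` quoting) and a made-up sample file:

```shell
#!/bin/bash
# Build a small sample file: line 2 ends in CR LF, the others in plain LF.
sample=$(mktemp)
printf 'plain line\nline with CR\r\nanother plain line\n' > "$sample"

# $'\r' is bash ANSI-C quoting for the carriage return character (ASCII 13).
cr_lines=$(grep -n $'\r' "$sample" | cut -d: -f1)
echo "lines containing a carriage return: $cr_lines"

rm -f "$sample"
```

Note that this only finds a CR character inside a line. It does not let a pattern span two lines, because grep still looks at one line at a time.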
 
Old 03-17-2005, 12:48 PM   #2
Matir
LQ Guru
 
Registered: Nov 2004
Location: San Jose, CA
Distribution: Debian, Arch
Posts: 8,507

Rep: Reputation: 128
Grep works on a line-by-line basis, so a grep expression cannot match across multiple lines. I believe awk can do this, however.
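As a rough illustration of the awk idea (a sketch only, using a made-up sample file and assuming every tag starts at the beginning of its line, as in the poster's description), awk can remember the previous line and report each closing tag that is not followed by a new opening tag:

```shell
#!/bin/bash
# Made-up sample: two event groups separated by a settings block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<bf:setting>
</bf:setting>
<bf:event name="a">
</bf:event>
<bf:event name="b">
</bf:event>
<bf:setting>
</bf:setting>
<bf:event name="c">
</bf:event>
</bf:round>
EOF

# awk remembers whether the previous line closed an event. If it did and the
# current line does not open a new one, the previous line ended a group.
last_lines=$(awk '
    prev_closed && !/^<bf:event/ { print NR - 1 }
    { prev_closed = /^<\/bf:event>/ }
    END { if (prev_closed) print NR }   # file ends right after a closing tag
' "$sample")
echo "$last_lines"

rm -f "$sample"
```

For this sample the script prints 6 and 10, the line numbers of the last </bf:event> in each group.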
 
Old 03-17-2005, 12:53 PM   #3
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,552

Rep: Reputation: 872
Try this Perl script from another thread:
http://www.linuxquestions.org/questi...=xml+file+perl
 
Old 03-17-2005, 02:01 PM   #4
Grafbak
Member
 
Registered: Jun 2003
Location: /dev/null
Distribution: Knoppix 3.3
Posts: 61

Original Poster
Rep: Reputation: 15
Thank you for the Perl program, Cedrik. However, I would like to avoid using it, because I have already written some lines in bash and I want to see whether I can solve it with bash. There is some educational purpose in it for me as well; I like bash, almost anything is allowed. I hope you are willing to help me using bash.

This is the outline of the XML file I am dealing with:

<BF:LOG>
<BF:ROUND>
<BF:SERVER>
<BF:SETTING>
</BF:SETTING>
<BF:EVENT>
</BF:EVENT>
<BF:SETTING>
</BF:SETTING>
<BF:EVENT>
</BF:EVENT>
<BF:ROUNDSTATS>
<BF:WINNINGTEAM> </BF:WINNINGTEAM>
<BF:VICTORYTYPE> </BF:VICTORYTYPE>
<BF:TEAMTICKETS> </BF:TEAMTICKETS>
<BF:PLAYERSTAT>
<BF:STATPARAM> </BF:STATPARAM>
</BF:PLAYERSTAT>
</BF:ROUNDSTATS>
</BF:ROUND>
</BF:LOG>

The event blocks keep popping up everywhere in the file, and take up 99% of the space in the file.
I need to split these files into pieces of at most 550,000 lines, so the parser does not give me a memory error.
Once I have split them up, I need to put the other tags back in the smaller files to make each log complete.
I am not exactly a highly experienced bash programmer, so forgive me my code.

This is what I have so far:


#!/bin/bash
filelist=""
filename=""
file_max_size="30000k"
i=""
a=0
xml_dest_dir="$HOME/download/logs/xmltest/test2/"
xml_feed_dir="$HOME/download/logs/xmltest/"
max_number_of_lines=0
global_line_counter=45
filelist=`ls "$xml_feed_dir"*.xml`
echo "$filelist"
#copy every file bigger than 30 mb to xml_dest_dir
find -size +"$file_max_size" -exec mv {} "$xml_dest_dir". \;
filelist_to_process=`ls "$xml_dest_dir"*.xml`
echo "filelist of files being processed: " $filelist_to_process
#MAIN LOOP
for i in $filelist_to_process;
do
let a=a+1
echo "now working on file: ""$i"
#strip the first 45 lines of setting, always the same in every file
head -n 45 "$i">"$xml_dest_dir""topfile_$a"
#set some variables for the next loop
max_number_of_lines=`wc -l "$i"|cut -f1 -d/`
echo "value for max_number_of_lines is " $max_number_of_lines
number_of_event_entries=`grep -n "<bf:event" "$i"|cut -f1 -d:|wc -l`
number_of_event_terms=`grep -n "</bf:event" "$i"|cut -f1 -d:|wc -l`
number_of_lines_in_file=`wc -l "$i"`
# subloop to detect the first event-blocks with create players
for j in $number_of_event_terms
do
linenumber_event_entry=`grep -n "<bf:event" "$i"|cut -f1 -d:|head -n"$j"|tail -n1`
linenumber_event_term=`grep -n "</bf:event" "$i"|cut -f1 -d:|head -n"$j"|tail -n1`
let linenumber_next=$linenumber_event_term+1
echo "value of linenumber_event_entry is " $linenumber_event_entry
echo "value of linenumber_event_terms is " $linenumber_event_term
echo "value of linenumber_next is " $linenumber_next

if [ $linenumber_next -ne $number_of_event_entries ]
then
let s=($j - 45)
head -n"$j" "$i"|tail -n"$s">"$xml_dest_dir/eventfile1_$a"
fi
done
done
exit


I know it is not working, but my problem seems to be more one of method in doing this with grep, head and tail.
Yet I believe it must be possible to detect the end of a <bf:event> block by the fact that the line following </bf:event> is not <bf:event>. I don't know how to build my counter correctly for the line number I need in the if-statement.
It seemed so simple -=sigh=-

Can you help me with this, please?
 
Old 03-17-2005, 02:11 PM   #5
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
How about using tac and looking for the first occurrence of </bf:event>?

Or I could be barking up the wrong tree.
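A minimal sketch of the tac idea, assuming GNU tac and a grep that supports -m (the sample file is made up). Reading the file backwards, the first match is the last </bf:event>, and its position from the end converts back to a normal line number:

```shell
#!/bin/bash
# Made-up sample: two events followed by a trailing roundstats block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<bf:event name="a">
</bf:event>
<bf:event name="b">
</bf:event>
<bf:roundstats>
</bf:roundstats>
EOF

total=$(wc -l < "$sample")
# tac prints the file last line first; -m1 stops grep at the first match,
# which is the last </bf:event> of the original file.
from_end=$(tac "$sample" | grep -n -m1 '</bf:event>' | cut -d: -f1)
last_event_line=$(( total - from_end + 1 ))
echo "last </bf:event> is on line $last_event_line"

rm -f "$sample"
```

This only finds the last closing tag in the whole file, not the last one of every group.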
 
Old 03-17-2005, 02:18 PM   #6
Grafbak
Member
 
Registered: Jun 2003
Location: /dev/null
Distribution: Knoppix 3.3
Posts: 61

Original Poster
Rep: Reputation: 15
Well, the problem is that inside a real file it looks like this:

<bf:event name="createPlayer" timestamp="12.5835">
<bf:param type="int" name="player_id">0</bf:param>
<bf:param type="vec3" name="player_location">937.2/18.07/961.78</bf:param>
<bf:param type="string" name="name">&lt;BeC.bF&gt;Pank</bf:param>
<bf:param type="int" name="is_ai">0</bf:param>
<bf:param type="int" name="team">2</bf:param>
</bf:event>
<bf:event name="playerKeyHash" timestamp="13.1593">
<bf:param type="int" name="player_id">0</bf:param>
<bf:param type="string" name="keyhash">9b15a1ae21024d3e978398603bb636f4</bf:param>
</bf:event>
<bf:event name="createPlayer" timestamp="13.6766">
<bf:param type="int" name="player_id">1</bf:param>
<bf:param type="vec3" name="player_location">937.2/18.07/961.78</bf:param>
<bf:param type="string" name="name">Razorlight</bf:param>
<bf:param type="int" name="is_ai">0</bf:param>
<bf:param type="int" name="team">1</bf:param>
</bf:event>
<bf:event name="playerKeyHash" timestamp="13.8777">
<bf:param type="int" name="player_id">1</bf:param>
<bf:param type="string" name="keyhash">77688ed320616376490dfdf7a5ac288a</bf:param>
</bf:event>
<bf:event name="roundInit" timestamp="22.5044">
<bf:param type="int" name="tickets_team1">0</bf:param>
<bf:param type="int" name="tickets_team2">0</bf:param>
</bf:event>
<bf:event name="createPlayer" timestamp="37.067">
<bf:param type="int" name="player_id">2</bf:param>
<bf:param type="vec3" name="player_location">937.2/18.07/961.78</bf:param>
<bf:param type="string" name="name">DAS_BeKiffte_SChaAf</bf:param>
<bf:param type="int" name="is_ai">0</bf:param>
<bf:param type="int" name="team">1</bf:param>


So I have several of these blocks, I don't know how many, and I want the line number of the last </bf:event> in every such block. Hope that helps.

Last edited by Grafbak; 03-17-2005 at 02:20 PM.
 
Old 03-17-2005, 02:33 PM   #7
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
Does the <bf:param type...> exist outside of the event tags?
 
Old 03-17-2005, 02:37 PM   #8
Grafbak
Member
 
Registered: Jun 2003
Location: /dev/null
Distribution: Knoppix 3.3
Posts: 61

Original Poster
Rep: Reputation: 15
No, bf:param does not exist outside of the event tags.
 
Old 03-17-2005, 03:09 PM   #9
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
So if you want to move all occurrences of bf:event ... /bf:event to another file,
Code:
grep "bf:event\|bf:param" xmlfile.xml > newfile
should do it.

And
Code:
grep -v "bf:event\|bf:param" xmlfile.xml > anotherfile
will give you the rest.

Is this what you were after?
 
Old 03-17-2005, 03:21 PM   #10
Grafbak
Member
 
Registered: Jun 2003
Location: /dev/null
Distribution: Knoppix 3.3
Posts: 61

Original Poster
Rep: Reputation: 15
That is almost what I'm after. However, would you know a simple command to split the output of that grep command into separate 30-megabyte files? (I'll be thinking along with you.)
I need to put all the files back together again as well.

Last edited by Grafbak; 03-17-2005 at 03:23 PM.
 
Old 03-17-2005, 03:48 PM   #11
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
Split could do that.

Maybe
Code:
grep "bf:event\|bf:param" xmlfile.xml | split -C 30m
 
Old 03-17-2005, 03:50 PM   #12
Matir
LQ Guru
 
Registered: Nov 2004
Location: San Jose, CA
Distribution: Debian, Arch
Posts: 8,507

Rep: Reputation: 128
The disadvantage of the split is that the individual files would likely be completely unparsable.
 
Old 03-17-2005, 03:55 PM   #13
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
Well, with the -C option as opposed to the -b option, at least you will only get complete lines.

It should be possible to add required tags to the top & bottom of these files to enable them to be parsed. And of course, they will still be readable as text.
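Adding the tags back could look roughly like this. It is only a sketch: header.xml, footer.xml, events.xml and the piece_ prefix are made-up names, and split -l with a tiny line count stands in for split -C 30m so the example stays small:

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1

# Made-up stand-ins: events.xml plays the role of the grep output, header.xml
# and footer.xml hold the opening and closing tags the parser expects.
printf '<bf:log>\n<bf:round>\n' > header.xml
printf '</bf:round>\n</bf:log>\n' > footer.xml
for n in 1 2 3 4; do
    printf '<bf:event name="e%s">\n</bf:event>\n' "$n"
done > events.xml

# Split into pieces of whole lines (4 lines per piece here, tiny on purpose).
split -l 4 events.xml piece_

# Wrap every piece in the header and footer.
for piece in piece_*; do
    cat header.xml "$piece" footer.xml > "complete_$piece.xml"
done

piece_count=$(ls complete_* | wc -l)
echo "$piece_count complete files written in $work"
```

With real data a piece can still end in the middle of an event, whether the split is by line count or by size, so the wrapped files are only parsable if every cut happens to fall between two events.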
 
Old 03-17-2005, 03:57 PM   #14
Grafbak
Member
 
Registered: Jun 2003
Location: /dev/null
Distribution: Knoppix 3.3
Posts: 61

Original Poster
Rep: Reputation: 15
I think I can use that command; I'm going to try and play with it. The problem is that I cannot just collect all the <bf:event> blocks and throw them in one file. I need to split them up into separate files of at most 30 MB.
But when I reconstruct the file, I need all the other blocks, like <bf:server>, in the correct place again. I have just tried to parse the output of grep "bf:event\|bf:param" > newfile, but then the parser gives an error because the server settings and other tags are missing.
So if I have 700000 lines with bf:event tags, I want to take 550000 lines (roughly 30 MB) of bf:event, and simply copy the bf:server tags and closing tags into the 30 MB file so it will get through the parser. The remaining 150000 lines of bf:event will get the same bf:server tags, and will also be made 'complete' again.
I'm sorry if I wasn't clear on that.
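One way to sketch what is described here is to let awk cut only directly after a </bf:event> line once a line limit has been reached, wrapping each part in the same header and footer. The names header.xml, footer.xml, events.xml and part_NNN.xml are made up, the input is assumed to contain only event blocks, and the limit is 7 instead of 550000 so that the tiny sample actually splits:

```shell
#!/bin/bash
work=$(mktemp -d)
cd "$work" || exit 1

# Made-up stand-ins for the real tags and for the extracted events.
printf '<bf:log>\n<bf:round>\n' > header.xml
printf '</bf:round>\n</bf:log>\n' > footer.xml
for n in 1 2 3 4 5; do
    printf '<bf:event name="e%s">\n<bf:param>%s</bf:param>\n</bf:event>\n' "$n" "$n"
done > events.xml

awk -v max_lines=7 '
    function start_part() {
        part++; lines = 0; is_open = 1
        out = sprintf("part_%03d.xml", part)
        while ((getline h < "header.xml") > 0) print h > out
        close("header.xml")
    }
    function end_part() {
        while ((getline f < "footer.xml") > 0) print f > out
        close("footer.xml"); close(out); is_open = 0
    }
    # Open a new part lazily, so no empty part is written at the end.
    { if (!is_open) start_part(); print > out; lines++ }
    # Only close a part right after a complete event.
    /^<\/bf:event>/ && lines >= max_lines { end_part() }
    END { if (is_open) end_part() }
' events.xml

part_count=$(ls part_*.xml | wc -l)
echo "$part_count parts written in $work"
```

Because the cut waits for the closing tag, a part can run past max_lines by up to one event. Settings blocks that sit between event groups in the real log are not handled by this sketch.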
 
Old 03-17-2005, 04:11 PM   #15
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
Sorry if I seem a bit dim here, but let's see if I've got this straight:

You have one large file.

You want to split it into several 30M files with the events, and several 30M files with the rest.

Then you want to be able to put it back together?

If this is correct, do you need to reconstruct it? The original file will not have changed by grepping it.
 
  



All times are GMT -5. The time now is 06:31 PM.
