Thank you for the Perl program, Cedrik. However, I would like to avoid using it, because I have already written some lines in bash and I want to see whether I can solve this in bash. There is also an educational purpose in it for me: I like bash, and almost anything is allowed. I hope you are willing to help me with a bash solution.
This is the scheme of the XML file I am dealing with:
<BF:LOG>
<BF:ROUND>
<BF:SERVER>
<BF:SETTING>
</BF:SETTING>
<BF:EVENT>
</BF:EVENT>
<BF:SETTING>
</BF:SETTING>
<BF:EVENT>
</BF:EVENT>
<BF:ROUNDSTATS>
<BF:WINNINGTEAM> </BF:WINNINGTEAM>
<BF:VICTORYTYPE> </BF:VICTORYTYPE>
<BF:TEAMTICKETS> </BF:TEAMTICKETS>
<BF:PLAYERSTAT>
<BF:STATPARAM> </BF:STATPARAM>
</BF:PLAYERSTAT>
</BF:ROUNDSTATS>
</BF:ROUND>
</BF:LOG>
The event blocks keep popping up everywhere in the file and take up 99% of its space.
I need to split these files into pieces of at most 550,000 lines, so that the parser does not give me a memory error.
Once I have split them up, I need to put the other tags back into the smaller files to make each log complete.
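To make the goal concrete, this is roughly how I imagine one smaller file being written once I know which lines belong in it. It is only a sketch: the function name write_piece and its parameters are made up for illustration, and I am assuming that the first 45 lines hold the opening tags and settings and that closing a piece with </bf:round> and </bf:log> is enough for the parser.

```shell
#!/bin/bash
# Sketch only: write one smaller log file from lines FIRST..LAST of SRC.
# Assumptions (not checked against real logs):
#   - the first HEADER_LINES lines of SRC hold the opening tags and settings
#   - FIRST is greater than HEADER_LINES, so the header is not written twice
#   - the piece is not the last one; the last piece already ends with the
#     closing tags and must not get them appended again
write_piece() {
    local src=$1 first=$2 last=$3 out=$4 header_lines=${5:-45}
    {
        head -n "$header_lines" "$src"               # opening tags and settings
        sed -n "${first},${last}p;${last}q" "$src"   # the slice of event lines
        printf '%s\n' '</bf:round>' '</bf:log>'      # closing tags
    } > "$out"
}
```

It would be called as, for example, write_piece big.xml 46 550000 piece_1.xml, where the file names are placeholders.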
I am not exactly a highly experienced bash programmer, so forgive me my code.
This is what I have so far:
#!/bin/bash
filelist=""
filename=""
file_max_size="30000k"
i=""
a=0
xml_dest_dir="$HOME/download/logs/xmltest/test2/"
xml_feed_dir="$HOME/download/logs/xmltest/"
max_number_of_lines=0
global_line_counter=45
filelist=`ls "$xml_feed_dir"*.xml`
echo "$filelist"
#move every file bigger than 30 MB from xml_feed_dir to xml_dest_dir
find "$xml_feed_dir" -maxdepth 1 -name "*.xml" -size +"$file_max_size" -exec mv {} "$xml_dest_dir" \;
filelist_to_process=`ls "$xml_dest_dir"*.xml`
echo "filelist of files being processed: " $filelist_to_process
#MAIN LOOP
for i in $filelist_to_process
do
    let a=a+1
    echo "now working on file: $i"
    #save the first 45 lines of settings, always the same in every file
    head -n 45 "$i" > "${xml_dest_dir}topfile_$a"
    #set some variables for the next loop
    max_number_of_lines=`wc -l < "$i"`
    echo "value for max_number_of_lines is " $max_number_of_lines
    #count the opening and closing event tags
    number_of_event_entries=`grep -c "<bf:event" "$i"`
    number_of_event_terms=`grep -c "</bf:event" "$i"`
    number_of_lines_in_file=$max_number_of_lines
    # subloop to detect the first event-blocks with create players
    for j in `seq 1 "$number_of_event_terms"`
    do
        #line numbers of the j-th opening and closing event tag
        linenumber_event_entry=`grep -n "<bf:event" "$i"|cut -f1 -d:|head -n "$j"|tail -n 1`
        linenumber_event_term=`grep -n "</bf:event" "$i"|cut -f1 -d:|head -n "$j"|tail -n 1`
        let linenumber_next=linenumber_event_term+1
        echo "value of linenumber_event_entry is " $linenumber_event_entry
        echo "value of linenumber_event_terms is " $linenumber_event_term
        echo "value of linenumber_next is " $linenumber_next
        #this comparison and the counter below are the part i cannot get right
        if [ "$linenumber_next" -ne "$number_of_event_entries" ]
        then
            let "s = j - 45"
            head -n "$j" "$i"|tail -n "$s" > "${xml_dest_dir}eventfile1_$a"
        fi
    done
done
exit
I know it is not working, but my problem seems to be more one of method in doing this with grep, head and tail.
Yet I believe it must be possible to detect where the <bf:event> blocks end by the fact that the line following </bf:event> is not <bf:event>. I don't know how to build my counter correctly to get the line number I need for the if-statement.
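What I have in mind could perhaps look like the sketch below, which reads the file once and remembers the previous line instead of calling grep for every block. The function name find_event_run_ends is made up, and I am assuming that every tag sits on its own line and that the tags are written in lowercase, as in my grep patterns.

```shell
#!/bin/bash
# Sketch only: print the line number of every </bf:event> line that is NOT
# directly followed by a <bf:event ...> line, i.e. where a run of event
# blocks ends.
# Assumptions: one tag per line, lowercase tag names.
find_event_run_ends() {
    local file=$1 line prev="" n=0
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        # previous line closed an event and this line does not open a new one
        if [[ $prev == *"</bf:event>"* && $line != *"<bf:event"* ]]; then
            echo $((n - 1))
        fi
        prev=$line
    done < "$file"
    # a closing event tag on the very last line also ends a run
    if [[ $prev == *"</bf:event>"* ]]; then
        echo "$n"
    fi
    return 0
}
```

Called as find_event_run_ends "$i", it should give me the candidate line numbers at which a file could be cut without breaking an event block in two.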
It seemed so simple -=sigh=-
Can you help me with this, please?