Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
The BF-server outputs a giant logfile, around 160 MB in size.
99.9% of this file consists of <bf:event><bf:param></bf:param></bf:event> entries.
However, they come in blocks of 10000 or 40000 and are scattered all over the file.
In an earlier post I described what the XML structure looks like.
The other tags enclose them every time, and there are also timestamps involved.
At the end of a <bf:round> there are <bf:roundstats> with <bf:playerstats>. I only want the parser to parse them once, and the parser allows that. However, the <bf:server> and <bf:setting> tags are required for flawless parsing. So I need these in a separate file, to copy them into each home-made 30 MB file in order to prevent parsing errors.
The log parser from a script I am using to try to build a stats page does not accept files as large as 160 MB.
So that is my main reason for wanting to split them up. But I thought it would be easier than this ;).
I think what it comes down to is detecting the first <bf:event> in a block that contains multiple <bf:event> entries.
I also need to detect the last </bf:event> in a block. Both should give me line numbers, and with head and tail I should then be able to write the output to a file.
So my output-starting line number is a line that contains a <bf:event> entry, where the line before it may NOT contain a </bf:event> entry.
The end of my output (the end-of-block line number) is either the output-starting line number + 550000 lines, OR a line that contains a </bf:event> entry where the next line (end-of-block line number + 1) does NOT contain a <bf:event> entry.
I'm going to think about it and try some things.
So far, thank you very much, you have been most helpful.
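The block-detection rule described above can be sketched in Python, which is suggested later in this thread. This is only an illustration under assumptions that are not confirmed by the posts: every opening `<bf:event` tag and every `</bf:event>` tag ends up on a line of its own (or both on the same line), and a block that reaches the size cap is cut right after a `</bf:event>` line rather than in the middle of an event. The function name `find_event_blocks` and the `SAMPLE` lines are made up for the example.

```python
from typing import Iterable, Iterator, Tuple

OPEN_TAG = "<bf:event"     # also matches tags that carry attributes
CLOSE_TAG = "</bf:event>"


def find_event_blocks(
    lines: Iterable[str], max_lines: int = 550_000
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) 1-based, inclusive line numbers of event blocks.

    A block starts at a line containing <bf:event and ends at a
    </bf:event> line whose next line does not open another event,
    or at the first </bf:event> once the block holds max_lines lines.
    """
    start = None        # first line of the block currently being tracked
    last_close = None   # line number of the most recent </bf:event>
    for number, line in enumerate(lines, start=1):
        opens = OPEN_TAG in line
        if start is not None and last_close == number - 1:
            # The previous line closed an event: does the block go on?
            too_long = last_close - start + 1 >= max_lines
            if not opens or too_long:
                yield (start, last_close)
                start = None
        if opens and start is None:
            start = number
        if CLOSE_TAG in line:
            last_close = number
    # Flush a block that runs up to the end of the input.
    if start is not None and last_close is not None and last_close >= start:
        yield (start, last_close)


# Hypothetical miniature log, one list item per line.
SAMPLE = [
    "<bf:round>",
    '<bf:event name="a">',
    "<bf:param>x</bf:param>",
    "</bf:event>",
    '<bf:event name="b">',
    "</bf:event>",
    "<bf:roundstats>",
    '<bf:event name="c">',
    "</bf:event>",
]

if __name__ == "__main__":
    for first, last in find_event_blocks(SAMPLE):
        print(first, last)
```

Because the function only reads lines in order, it can be given an open file object for the real log, so the 160 MB file never has to be held in memory. The resulting line ranges could then be fed to head and tail as planned, or the lines could be written out from the same Python script.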
Carriage return has the code "\r". There is also the "newline" symbol "\n".
Unix separates lines using "\n", and Windows separates lines using "\r\n" (carriage return first, then newline).
They can be detected/inserted using perl, sed, etc.
However, regular expressions have a special "end-of-line" symbol "$" (and a "beginning of line" symbol "^"), which should be used instead of "\r" or "\n" when possible, because it avoids hard-coding the line terminator. Be aware that Unix tools still treat the "\r" of a Windows file as part of the line, so it sits just before "$".
Note: If you want to do complicated text processing that doesn't fit into one line, try Python or Perl.
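To make the line-ending point concrete, here is a small Python sketch. The two sample strings are invented for the illustration. It shows that "$" (with `re.MULTILINE`) matches just before "\n", so a leftover "\r" from a Windows file blocks a pattern that expects a word at the end of the line, and that allowing an optional "\r" or using `str.splitlines()` copes with both conventions.

```python
import re

unix_text = "first\nsecond\n"        # Unix line endings
dos_text = "first\r\nsecond\r\n"     # Windows line endings

# "$" matches right before each "\n"; the "\r" is just another character.
word_at_eol = re.compile(r"\w+$", re.MULTILINE)
unix_hits = word_at_eol.findall(unix_text)   # words found at end of line
dos_hits = word_at_eol.findall(dos_text)     # nothing: "\r" is in the way

# Allowing an optional "\r" before "$" handles both kinds of file.
tolerant = re.compile(r"(\w+)\r?$", re.MULTILINE)
tolerant_hits = tolerant.findall(dos_text)

# str.splitlines() understands "\n", "\r\n" and "\r" without any regex.
split_hits = dos_text.splitlines()

if __name__ == "__main__":
    print(unix_hits, dos_hits, tolerant_hits, split_hits)
```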
Erv is right. It's theoretically possible (I think) to do it your way, but you are making it needlessly hard on yourself.
Perl has modules like XML::Simple etc. to do it all for you.
Note also that you have files of 500K+ lines, and shell is an interpreted language that has to start a new process for every call to awk, sed, etc.
It's going to be slow.
Perl is compiled on the fly and is roughly 90% as quick as C for this kind of work.
(I don't know about Python.)
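For comparison, a rough Python counterpart to letting a module do the XML work is the standard library's `xml.etree.ElementTree.iterparse`, which reads the file as a stream inside a single process. The sketch below is an assumption-laden illustration, not the real log format: the `<bf:log>` root element, the namespace URI and the attribute names in `SAMPLE` are invented, and the real file must declare the `bf` prefix somewhere or the parser will reject it.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature log; the namespace URI is a placeholder.
SAMPLE = b"""<bf:log xmlns:bf="http://example.invalid/bf">
  <bf:round>
    <bf:event name="kill"><bf:param name="weapon">rifle</bf:param></bf:event>
    <bf:event name="spawn"></bf:event>
    <bf:roundstats><bf:playerstats playerid="1"/></bf:roundstats>
  </bf:round>
</bf:log>"""


def count_tags(source) -> Counter:
    """Count elements by local tag name while streaming through the XML."""
    counts = Counter()
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Tags arrive as "{namespace-uri}name"; keep only the name part.
        local_name = elem.tag.rsplit("}", 1)[-1]
        counts[local_name] += 1
        # Drop the element's text, attributes and children to limit memory use.
        elem.clear()
    return counts


if __name__ == "__main__":
    print(count_tags(io.BytesIO(SAMPLE)))
```

`count_tags` also accepts a filename in place of the `io.BytesIO` object. The same loop could write each finished `<bf:event>` to an output file instead of only counting it.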
To get the ^M on the command line, after the first " press Ctrl+V and then Ctrl+M. A caret typed with Shift+6 followed by an M looks the same, but the system does not interpret it as the same character.