LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 03-17-2005, 04:20 PM   #16
Grafbak
Member
 
Registered: Jun 2003
Location: /dev/null
Distribution: Knoppix 3.3
Posts: 61

Original Poster
Rep: Reputation: 15

It's like this:

The BF server outputs a giant logfile, around 160 MB.

99.9% of this file consists of <bf:event><bf:param></bf:param></bf:event> entries.
They come in blocks of 10000 or 40000 and are scattered all over the file.
In an earlier post I described what the XML schema looks like.
Other tags enclose each block, and there are also timestamps involved.
At the end of a <bf:round> there are <bf:roundstats> with <bf:playerstats>. I only want the parser to parse those once, which the parser allows. However, the <bf:server> and <bf:setting> tags are required for flawless parsing, so I need them in a separate file that I can copy into each home-made 30 MB file to prevent parsing errors.
The logparser from the script I use to build a stats page does not accept 160 MB files.
That is my main reason for wanting to split the file up. But I thought it would be easier than this ;).

Last edited by Grafbak; 03-17-2005 at 04:21 PM.
 
Old 03-17-2005, 04:31 PM   #17
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
It was really the reconstruction I was wondering about. That will take some thinking about!

Adding required headers and footers to allow parsing shouldn't be *too* difficult.
 
Old 03-17-2005, 04:43 PM   #18
Grafbak
Member
 
Registered: Jun 2003
Location: /dev/null
Distribution: Knoppix 3.3
Posts: 61

Original Poster
Rep: Reputation: 15
Well, I am approaching the verge of insanity.

I think it comes down to detecting the first <bf:event> in a block that contains multiple <bf:event> entries, and the last </bf:event> in that block. Both should give me line numbers, and with head and tail I should be able to write the block to a file.

So my output-starting-linenumber is a line that contains a <bf:event> entry where the line before it (linenumber minus 1) does NOT contain a </bf:event> entry.

The end of my output (end-of-block-linenumber) is either output-starting-linenumber + 550000 lines OR a line that contains a </bf:event> entry where the next line (end-of-block-linenumber + 1) does NOT contain a <bf:event> entry.
I'm going to think about it and try some stuff.
So far, thank you very much, you have been most helpful.
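
A minimal sketch of the start/end rule described above, as a single awk pass wrapped in a shell function. The function name find_event_blocks is made up for illustration, the pattern assumes the opening tag looks like <bf:event> or <bf:event ...>, and the 550000-line cap is left out to keep it short. It prints one "start end" pair of line numbers per block:

```shell
# find_event_blocks FILE (hypothetical helper, not from the thread)
# Prints "START END" line numbers for every run of <bf:event> entries.
find_event_blocks() {
    awk '
    {
        # Assumption: the opening tag is "<bf:event>" or "<bf:event ...>".
        has_open  = ($0 ~ /<bf:event[ >]/)
        has_close = ($0 ~ /<\/bf:event>/)

        # The previous line closed an event and this line does not open
        # a new one, so the block ended on the previous line.
        if (inblock && prev_close && !has_open) {
            print start, NR - 1
            inblock = 0
        }

        # A block starts on an opening line whose predecessor was not a
        # closing line.
        if (!inblock && has_open && !prev_close) {
            start = NR
            inblock = 1
        }

        prev_close = has_close
    }
    # The file ended while still inside a block.
    END { if (inblock) print start, NR }
    ' "$1"
}
```

Each pair could then be fed to sed -n 'START,ENDp' (or to head and tail) to write that block to its own file.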
 
Old 03-17-2005, 04:55 PM   #19
ahh
Member
 
Registered: May 2004
Location: UK
Distribution: Gentoo
Posts: 293

Rep: Reputation: 31
One last suggestion before bed.

What I was thinking is you could check the created files, and if the first line is not <bf:event> you could add it.

Similarly, if the last line is not </bf:event>, add it.
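
A rough sketch of that check, assuming the tags sit alone on their lines exactly as <bf:event> and </bf:event>, with no attributes and no trailing carriage return. The function name fix_chunk and its two arguments are hypothetical:

```shell
# fix_chunk SRC DST (hypothetical helper)
# Copies SRC to DST, adding <bf:event> as the first line and </bf:event>
# as the last line when they are missing.
fix_chunk() {
    src=$1
    dst=$2
    {
        # Add the opening tag only if the first line is not already one.
        [ "$(head -n 1 "$src")" = '<bf:event>' ] || printf '%s\n' '<bf:event>'
        cat "$src"
        # Add the closing tag only if the last line is not already one.
        [ "$(tail -n 1 "$src")" = '</bf:event>' ] || printf '%s\n' '</bf:event>'
    } > "$dst"
}
```

Running it on a file that is already complete leaves the content unchanged, so it should be safe to apply to every created file.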
 
Old 01-22-2009, 11:37 AM   #20
lackita
LQ Newbie
 
Registered: Sep 2005
Posts: 4

Rep: Reputation: 0
One option is to use a loop in conjunction with sed:

Code:
# GNU sed; "|" is the s delimiter because $file contains "/"
for file in [directory]/*; do sed "s|\r|\n$file: |g" "$file" | grep [whatever]; done
It's not pretty, but it gets the job done.
 
Old 01-22-2009, 05:15 PM   #21
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by Grafbak View Post
Hello,

i am trying to let grep detect a carriage return,
Carriage return is written "\r". There is also the "newline" character "\n".
Unix separates lines using "\n", and Windows separates lines using "\r\n".
They can be detected/inserted using perl, sed, etc.
However, regular expressions have a special "end-of-line" symbol "$" (and a "beginning of line" symbol "^"), which should be used instead of "\r" or "\n" when possible, because it avoids hard-coding the OS-specific line ending. Keep in mind that Unix tools treat only "\n" as the line ending, so in a Windows file the "\r" is still the last character before "$".
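
For the original grep question, a small sketch that avoids typing the control character by hand: printf produces the carriage return and the result is passed to grep as the pattern. The function names are made up for illustration.

```shell
# has_cr FILE: succeeds if FILE contains at least one carriage return.
has_cr() {
    cr=$(printf '\r')
    grep -q "$cr" "$1"
}

# count_crlf_lines FILE: prints how many lines end in a carriage return,
# i.e. how many lines have a Windows-style "\r\n" ending.
count_crlf_lines() {
    cr=$(printf '\r')
    grep -c "$cr\$" "$1"
}

# strip_cr FILE: writes FILE to stdout with every carriage return removed.
strip_cr() {
    tr -d '\r' < "$1"
}
```

For example, "has_cr server.log && strip_cr server.log > server.unix.log" would convert a file only when it needs it (both file names are placeholders).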

Note: If you want to do complicated text processing that doesn't fit into one line, try python, or perl.
 
Old 01-22-2009, 05:29 PM   #22
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,348

Rep: Reputation: 2749
Quote:
Note: If you want to do complicated text processing that doesn't fit into one line, try python, or perl.
ErV is right. It's theoretically possible (I think) to do it your way, but you are making it needlessly hard on yourself.
Perl has modules like XML::Simple etc. to do it all for you.
Note also that your files have 500K+ lines, and shell is an interpreted language that has to start a new process for every call to awk/sed etc.
It's going to be slow.
Perl compiles the script on the fly and runs the whole job in one process, so it is much quicker for this kind of work.
(I don't know about Python.)

Of course it's your decision.
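
To illustrate the point about process start-up cost, here is a sketch that cuts a log into pieces in one awk process, reading the file once. The function name split_after_events and its arguments are hypothetical, and the pieces do not get the <bf:server>/<bf:setting> header that the parser is said to need; it only shows the single-pass idea.

```shell
# split_after_events FILE MAXLINES PREFIX (hypothetical helper)
# Writes FILE to PREFIX.1, PREFIX.2, ... Each piece is closed at the first
# line containing </bf:event> once it holds at least MAXLINES lines, so no
# event is cut in half.
split_after_events() {
    awk -v max="$2" -v prefix="$3" '
        BEGIN { part = 1; out = prefix "." part }

        # Every line goes to the current piece.
        { print > out; n++ }

        # Piece is big enough and this line closes an event: start a new one.
        n >= max && /<\/bf:event>/ {
            close(out)
            part++
            n = 0
            out = prefix "." part
        }
    ' "$1"
}
```

Something like "split_after_events server.log 550000 piece" (placeholder names) would then do the whole split with a single process instead of one head/tail call per block.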
 
Old 01-13-2010, 01:19 PM   #23
dray
LQ Newbie
 
Registered: Jan 2010
Posts: 1

Rep: Reputation: 0
grep for carriage return within files

Code:
find ./ -type f -exec fgrep -il "^M" {} \;

To get the ^M on the command line, press Ctrl+V and then Ctrl+M after the first ". A caret typed with Shift+6 followed by an M looks the same on screen, but the system does not interpret it as a carriage return.
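
A variant that does not need the Ctrl+V trick, as a sketch: printf generates the carriage return, and find hands the files to grep in batches with {} +. The function name list_files_with_cr is hypothetical.

```shell
# list_files_with_cr [DIR] (hypothetical helper)
# Prints the name of every regular file under DIR (default: current
# directory) that contains at least one carriage return.
list_files_with_cr() {
    cr=$(printf '\r')
    find "${1:-.}" -type f -exec grep -l "$cr" {} +
}
```

Because grep gets many file names per invocation, this starts far fewer processes than the \; form on a large directory tree.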
 
  


All times are GMT -5.
