LinuxQuestions.org
05-25-2008, 02:48 AM   #1
lothario
Member
 
Registered: Apr 2004
Posts: 340

Rep: Reputation: 30
need help processing large data files


I have to process some very large text data files.
Each is over 100 MB.

Here is a sample:
Quote:
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
system A 2008_05_24:22: ...data...
system A 2008_05_24:22: ...data...
system B 2008_05_24:22: ...data...
system B 2008_05_24:22: ...data...
system A 2008_05_24:22: ...data...
system A 2008_05_24:21: ...data...
system A 2008_05_24:21: ...data...
system B 2008_05_24:21: ...data...
I want to insert a "-----" separator whenever there
is any difference in the first 22 characters of the
current and next line.

In other words, the above data file should look
like this:
Quote:
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
-----
system A 2008_05_24:22: ...data...
system A 2008_05_24:22: ...data...
-----
system B 2008_05_24:22: ...data...
system B 2008_05_24:22: ...data...
-----
system A 2008_05_24:22: ...data...
-----
system A 2008_05_24:21: ...data...
system A 2008_05_24:21: ...data...
-----
system B 2008_05_24:21: ...data...
I am not a bash scripting expert, so I need some help.

Thanks.
 
05-25-2008, 06:11 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Would this do:

Code:
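#!/bin/bash

# Print "-----" whenever the first 22 characters differ from the previous line.
awk 'BEGIN { prevPart = "" }
{
  currPart = substr($0, 1, 22)   # leading 22 characters form the group key
  if ( prevPart != currPart && NR != "1" ) { print "-----" }   # no separator before line 1
  print $0
  prevPart = currPart
}' infile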
A test run with the provided data:
Code:
$ ./seperate.sh
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
-----
system A 2008_05_24:22: ...data...
system A 2008_05_24:22: ...data...
-----
system B 2008_05_24:22: ...data...
system B 2008_05_24:22: ...data...
-----
system A 2008_05_24:22: ...data...
-----
system A 2008_05_24:21: ...data...
system A 2008_05_24:21: ...data...
-----
system B 2008_05_24:21: ...data...
$
The && NR != "1" part is there to suppress the ----- line that would otherwise be printed before the first line of the file. If anything is unclear, just ask.
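If the prefix length or the input file varies, the same idea can be wrapped in a small shell function. This is only a sketch: the function name insert_separators and the file names in the usage comment are made up for illustration, and it assumes a POSIX awk. Because awk reads one line at a time, memory use should stay small even on files over 100 MB.

```shell
#!/bin/bash

# Hypothetical helper (name is an assumption): insert_separators WIDTH [FILE...]
# Prints "-----" before every line whose first WIDTH characters differ
# from those of the previous line. Reads standard input when no FILE is given.
insert_separators() {
    width="$1"
    shift
    awk -v width="$width" '
        { key = substr($0, 1, width) }            # leading characters used as the group key
        NR > 1 && key != prev { print "-----" }   # group changed, so print the separator first
        { print; prev = key }                     # print the line and remember its key
    ' "$@"
}

# Usage (file names are placeholders):
#   insert_separators 22 infile > outfile
```

The NR > 1 test plays the same role as the NR check in the script above: it keeps a separator from appearing before the very first line.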

Hope this helps.
 
05-27-2008, 10:16 PM   #3
lothario
Member
 
Registered: Apr 2004
Posts: 340

Original Poster
Rep: Reputation: 30
Thanks. That was exactly what I needed.
 
  



All times are GMT -5.
