LinuxQuestions.org
Old 05-25-2008, 01:48 AM   #1
lothario
Member
 
Registered: Apr 2004
Posts: 340

Rep: Reputation: 30
need help processing large data files


I have to process some very large text data files.
Each is over 100 MB.

Here is a sample:
Quote:
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
system A 2008_05_24:22: ...data...
system A 2008_05_24:22: ...data...
system B 2008_05_24:22: ...data...
system B 2008_05_24:22: ...data...
system A 2008_05_24:22: ...data...
system A 2008_05_24:21: ...data...
system A 2008_05_24:21: ...data...
system B 2008_05_24:21: ...data...
I want to insert a "-----" separator whenever there
is any difference in the first 22 characters of the
current and next line.

In other words, the above data file should look
like this:
Quote:
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
-----
system A 2008_05_24:22: ...data...
system A 2008_05_24:22: ...data...
-----
system B 2008_05_24:22: ...data...
system B 2008_05_24:22: ...data...
-----
system A 2008_05_24:22: ...data...
-----
system A 2008_05_24:21: ...data...
system A 2008_05_24:21: ...data...
-----
system B 2008_05_24:21: ...data...
I am not a bash scripting expert, so I need some help.

Thanks.
 
Old 05-25-2008, 05:11 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2371
Hi,

Would this do?

Code:
#!/bin/bash

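awk 'BEGIN { prevPart = "" }
{
  # The first 23 characters cover the system name and the date:hour: stamp.
  currPart = substr($0, 1, 23)
  # Print a separator when the prefix changes, except before the first line.
  if ( prevPart != currPart && NR != "1" ) { print "-----" }
  print $0
  prevPart = currPart
}
' infile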
A test run with the provided data:
Code:
$ ./seperate.sh
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
system A 2008_05_24:23: ...data...
-----
system A 2008_05_24:22: ...data...
system A 2008_05_24:22: ...data...
-----
system B 2008_05_24:22: ...data...
system B 2008_05_24:22: ...data...
-----
system A 2008_05_24:22: ...data...
-----
system A 2008_05_24:21: ...data...
system A 2008_05_24:21: ...data...
-----
system B 2008_05_24:21: ...data...
$
The && NR != "1" part is there to suppress a ----- line before the very first line of input. If anything is unclear, just ask.
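
If you need to run this on several files, or with a different prefix length, the same idea could be wrapped in a shell function. This is only a minimal sketch: the function name insert_separators and the awk variable width are made up for illustration and are not part of the script above. It passes the prefix length in with awk's -v option and reads standard input when no file is given.

```shell
#!/bin/bash

# Hypothetical helper (not from the original script): print the input with a
# "-----" line wherever the first WIDTH characters differ from the previous line.
# Usage: insert_separators WIDTH [FILE...]   (reads stdin when no file is given)
insert_separators() {
    local width=$1
    shift
    awk -v width="$width" '
        # Take the leading characters of the current line.
        { currPart = substr($0, 1, width) }
        # Separator only from the second line on, and only when the prefix changed.
        NR > 1 && currPart != prevPart { print "-----" }
        { print; prevPart = currPart }
    ' "$@"
}
```

Something like insert_separators 23 infile > outfile should then behave like the script above. awk reads its input line by line, so files over 100 MB should not need to fit in memory.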

Hope this helps.
 
Old 05-27-2008, 09:16 PM   #3
lothario
Member
 
Registered: Apr 2004
Posts: 340

Original Poster
Rep: Reputation: 30
Thanks. That was exactly what I needed.
All times are GMT -5.