LinuxQuestions.org
Old 10-28-2013, 10:38 AM   #1
matajaz
LQ Newbie
 
Registered: Oct 2013
Posts: 1

Rep: Reputation: Disabled
Split a file and process the resulting files in parallel


Hi,

I have huge (100 GB) files that I need to post-process after they have been generated.

I wonder if there is an easy way to split a file and then process each piece that split produces.
I mean doing it automatically on one command line, without waiting for the split output before starting the processing commands.

Here is an example where I want to split a file and then count the number of McDonald words in each piece.

split -b 1000m -a 3 sourcefile.txt resultfile | foreach splitted file do "grep -c McDonald"

I hope you understand what I want to do.

Br Mathias
PS. The file server is very fast, so I do not expect I/O wait to be the limiting factor.
 
Old 10-29-2013, 06:27 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,846

Rep: Reputation: 7309
Code:
# Step 1: split the file into smaller parts.
split ....
# Step 2: run your grep on all the parts at the same time.
for f in <list of split parts>
do
    grep -c McDonald "$f" > "$f.count"
done
# Step 3: wait for the results.
wait
# Step 4: sum up the results.
SUM=0
for f in <list of split parts>
do
    SUM=$((SUM + $(cat "$f.count")))
done
# But it will probably take longer than a simple grep on the single input file.
This is not a runnable script, but a plan for implementing it.
 
Old 10-29-2013, 01:18 PM   #3
smeezekitty
Senior Member
 
Registered: Sep 2009
Location: Washington U.S.
Distribution: M$ Windows / Debian / Ubuntu / DSL / many others
Posts: 2,339

Rep: Reputation: 231
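That won't actually run in parallel because there is no "&" after the grep command, so each grep runs in the foreground and finishes before the next one starts.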
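
For what it's worth, here is a minimal sketch of the same plan with the "&" added, so that `wait` really has background jobs to wait for. The tiny sample file, the `part.` prefix and the `.count` suffix are placeholders I made up for illustration; for the real file you would use something like the `split -b 1000m -a 3` from the first post. Note that `grep -c` counts matching lines, not individual words.

```shell
#!/bin/sh
# Sketch: run one grep per chunk in the background, then sum the counts.
# The sample file and the names "part." / ".count" are placeholders.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Small stand-in for the real 100 GB sourcefile.txt.
printf 'McDonald one\nnothing here\nMcDonald two\nfiller\nMcDonald three\nend\n' > sourcefile.txt

# Step 1: split into pieces (2 lines each here; the thread uses -b 1000m).
split -l 2 -a 3 sourcefile.txt part.

# Step 2: start one grep per piece; the trailing & puts each in the background.
for f in part.???; do
    grep -c McDonald "$f" > "$f.count" &
done

# Step 3: block until every background grep has finished.
wait

# Step 4: add up the per-piece counts.
total=0
for f in part.???.count; do
    total=$((total + $(cat "$f")))
done
echo "total matching lines: $total"
```

This starts one grep per piece all at once. With a 100 GB file cut into 1000 MB pieces that is roughly a hundred simultaneous processes, so you may want to limit how many run at a time.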

One thing to watch out for is the unlikely but possible case where split chops the desired word in two, which will cause that instance to be missed.
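
One possible way around the chopped-word problem, assuming GNU coreutils `split` is available: the `-n l/N` form cuts the file into N chunks without ever dividing a line. The sketch below compares it with a byte-based split on a tiny made-up file. The `byte.` and `line.` prefixes and the `count_chunks` helper are names invented for this example, and `grep -h` (suppress file names) is assumed to be supported, as it is in GNU and BSD grep.

```shell
#!/bin/sh
# Sketch: compare a byte-based split with a line-aligned one (GNU split -n l/N).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Tiny stand-in for the real file: three lines, each containing the word once.
printf 'aaMcDonald\nbbMcDonald\nccMcDonald\n' > sourcefile.txt
whole=$(grep -c McDonald sourcefile.txt)

# Sum the per-chunk "grep -c" results for all files starting with the given prefix.
count_chunks() {
    grep -ch McDonald "$1"* | awk '{ s += $1 } END { print s + 0 }'
}

# Byte-based split: 5-byte pieces cut through every occurrence of the word.
split -b 5 -a 3 sourcefile.txt byte.
by_bytes=$(count_chunks byte.)

# Line-aligned split: 3 chunks, and no line is divided between two files.
split -n l/3 -a 3 sourcefile.txt line.
by_lines=$(count_chunks line.)

echo "whole file: $whole, byte chunks: $by_bytes, line chunks: $by_lines"
```

With this sample the byte-based pieces report 0 matches in total, while the line-aligned chunks agree with the whole-file count of 3. The 5-byte piece size is deliberately tiny to force the problem; with 1000 MB pieces only a match that straddles a piece boundary is at risk.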
 
  

