LinuxQuestions.org
Old 10-05-2011, 05:27 PM   #1
vamsiv
LQ Newbie
 
Registered: Sep 2011
Posts: 5

Perl: Extracting 450000+ files in subdirs into single output file in current dir


Hi All

I have the following structure in a folder containing a very large chunk of data
Code:
Current Folder
    A lot of SubFolders (maybe 2000+)
       A lot more of Other Subfolders (maybe another 2000+) in each single "upper" folder from above
          About 500+ text files per folder from each above folder.
In total that is about 460000 (more or less) text files. I want to pull certain data out of each file and write it to a single output file, one line per input file.

i.e. my final output.txt file will have about 460000 entries like this:

Blah, blah, blah, blah (from file 1)
Blah, blah, blah, blah (from file 2)
........
Blah, blah, blah, blah (from the last file)

The following script doesn't work as expected: it gives me a blank output.txt file, and I can't trace where or why it goes wrong. I am pretty new to Perl (this is actually my first script) and would appreciate your help.

Code:
#!/usr/bin/perl -w
use File::Find;
use Digest::MD5 qw(md5 md5_hex md5_base64);


$dir = `pwd`;
chomp($dir);
#globals
@directories = ($dir);
@foundFiles = ();

print("======================\n");
print("searching...\n");
foreach my $d(@directories){
find( sub { push @foundFiles, $File::Find::name if(/\.txt/) }, @directories );
print("======================\n");
print("found " . $#foundFiles . " files\n");
print("======================\n");


open(OUT,">output.txt") or die "cant create output file";

#get files to update:
foreach my $f (@foundFiles) {

    open(F, $f) or die("WARNING: could not open $f\n");    
    
	$foundDate = 0;
	$foundTime = 0;
	$getInfo = 0;
	$i = 0;
	while($line = <F>) {
		
		next if($line =~ /^\s*$/);
		
		chomp($line);
		
		if($line =~ /\d{4}-\d{2}-\d{2}/) {
			#1992-09-01
			$date = $line;
			$foundDate++;
		} elsif ($line =~ /\d{2}:\d{2}:\d{2}/) {
			#10:59:32
			$time = $line;
			$foundTime++;
		}
		
		$i++ if($getInfo);
		$getInfo = 1 if($foundTime == 2 && $foundDate == 2);
		
		if ($i == 3) {
			#ENT
			$code = $line;
		} elsif ($i == 4) {
			#55
			$pages = $line;
		} elsif ($i == 6) {
			#S/Holder Details Change in Substantial (S.43)
			$info = $line;
		}
		
	}
	#ENT,1992-09-01,10:59:32,55, S/Holder Details Change in Substantial (S.43) 
	print OUT "$code|$date|$time|$pages|$info\n" if($code && $code =~ /\d{3}/);
	
	#system("db insert something something");
	
	close(F);
}
	
}

close(OUT);

Last edited by vamsiv; 10-05-2011 at 05:29 PM. Reason: make it clearer to show subdirectory tree
 
Old 10-05-2011, 08:27 PM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,362

A few notes that will help

1. Add 'use strict;' thus
Code:
#!/usr/bin/perl -w
use File::Find;
use Digest::MD5 qw(md5 md5_hex md5_base64);
use strict;
2. to check code (syntax/declarations) without running it use
Code:
perl -wc myprog.pl
3. Try this to get curr dir
Code:
use Cwd;  # Get current full dir for error msgs
my $curr_dir = cwd();  # 'my' is needed once 'use strict' is on
4. I usually use a few judiciously placed print cmds to see what the prog is doing. Set up a test dir with just the min num of sub-dirs and files, e.g. 3 dirs at each level and 3 files in the bottom level; that is much easier to follow than the full tree.

5. Good Perl docs
http://perldoc.perl.org/
http://www.perlmonks.org/?node=Tutorials


6. Are these lines in the wrong order? As written, $i is not incremented on the line where $getInfo first becomes 1.
Code:
$i++ if($getInfo);
$getInfo = 1 if($foundTime == 2 && $foundDate == 2);
7. use 3 arg open with full err msg
Code:
open(OUT,">output.txt") or die "cant create output file";

# becomes
my $out_file = "output.txt";  # 'my' is needed once 'use strict' is on
open(OUT, ">", $out_file ) or die "Can't open $out_file: $!\n";

.
.
.

close(OUT) or die "Can't close $out_file: $!\n";
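One more thing worth checking alongside the notes above is the guard on the print line. I don't know what your data really looks like, so treat this as a sketch: it assumes $code holds something like the "ENT" shown in your own comment. The sub name passes_guard is made up for the demo; the condition inside it is copied from your script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Mirrors the guard on the original print line:
#   print OUT "..." if($code && $code =~ /\d{3}/);
# passes_guard() is a hypothetical helper name used only for this demo.
sub passes_guard {
    my ($code) = @_;
    return ($code && $code =~ /\d{3}/) ? 1 : 0;
}

# 'ENT' and '55' come from the comments in the posted script;
# '123' and 'A100' are made-up values for comparison.
for my $code ('ENT', '55', '123', 'A100') {
    printf "%-5s %s\n", $code, passes_guard($code) ? 'printed' : 'skipped';
}
```

/\d{3}/ needs three digits in a row, so a purely alphabetic code never passes and nothing reaches output.txt. If that matches your data, the guard may be one reason for the empty file. Note also that $code, $date and friends are not reset per file in your loop, so a value from an earlier file can leak into a later line.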
HTH & Welcome to LQ
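PS: a minimal sketch of the small test tree from note 4, combined with the file search. The directory and file names (dirN, subN, fileN.txt, notes.txt.bak) and the helper names are made up for illustration; only File::Find, File::Path, File::Spec and File::Temp from the core distribution are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Create a file containing one sample line (hypothetical helper).
sub write_sample {
    my ($file) = @_;
    open(my $fh, '>', $file) or die "Can't open $file: $!\n";
    print {$fh} "sample line\n";
    close($fh) or die "Can't close $file: $!\n";
}

# Build a throwaway tree: 3 dirs, 3 sub-dirs each, 3 .txt files per sub-dir,
# plus one decoy per sub-dir that /\.txt/ matches but /\.txt\z/ does not.
sub build_test_tree {
    my ($root) = @_;
    for my $d (1 .. 3) {
        for my $s (1 .. 3) {
            my $path = File::Spec->catdir($root, "dir$d", "sub$s");
            make_path($path);
            write_sample(File::Spec->catfile($path, "file$_.txt")) for 1 .. 3;
            write_sample(File::Spec->catfile($path, "notes.txt.bak"));
        }
    }
}

# Collect every plain file below $root whose name ends in .txt.
# find() is called once; no outer foreach over the directory list is needed.
sub find_txt_files {
    my ($root) = @_;
    my @found;
    find(sub { push @found, $File::Find::name if -f $_ && /\.txt\z/ }, $root);
    return @found;
}

my $root = tempdir(CLEANUP => 1);   # removed automatically at exit
build_test_tree($root);
my @files = find_txt_files($root);

# scalar(@files) is the count; $#files is the last index (count - 1).
print "found ", scalar(@files), " files (last index ", $#files, ")\n";
```

With this tree the search returns 27 files, while $#files is 26, which is why the "found" message in the original script reads one short. The original also wraps find() in a foreach that never uses $d, so with more than one start directory every file would be collected once per directory.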
 
Old 10-05-2011, 08:39 PM   #3
Tinkster
Moderator
 
Registered: Apr 2002
Location: earth
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Moved: This thread is more suitable in <PROGRAMMING> and has been moved accordingly to help your thread/question get the exposure it deserves.
 
  

