Old 06-21-2019, 07:54 AM   #1
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,897

Rep: Reputation: 7318
perl md5sum big file segfaults


Code:
#!/usr/bin/perl
use strict;

use Digest::MD5 qw(md5_hex);
use Digest::SHA1 qw(sha1_hex);

my $debug    = 1;
my $filename = "/tmp/a.tgz";
my $content  = "";

# Slurp the whole file into memory in raw (binary) mode
open(my $fh, '<:raw', $filename) or die "cannot open file $filename";
{
    local $/;           # disable the input record separator
    $content = <$fh>;
}
close($fh);

print length($content) . "\n";

# Check that sha1 and md5 are correct
my $md5Value  = md5_hex($content);
print "md5sum is ok\n" if $debug;
my $sha1Value = sha1_hex($content);
print "sha1sum is ok\n" if $debug;
This code segfaults if /tmp/a.tgz is big enough (3 GB).
Do you know how I can handle it?
 
Old 06-21-2019, 08:45 AM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,143

Rep: Reputation: 1264
Is your Perl 64-bit? How much memory does your computer have?
 
Old 06-21-2019, 08:49 AM   #3
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,897

Original Poster
Rep: Reputation: 7318
This occurred on several different hosts with huge amounts of RAM (around 256 GB).
This is perl 5, version 16, subversion 3 (v5.16.3) built for x86_64-linux-thread-multi
 
Old 06-21-2019, 08:50 AM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,652

Rep: Reputation: 7970
Quote:
Originally Posted by pan64
<...code snipped; see post #1....>
This code segfaults if /tmp/a.tgz is big enough (3 GB). Do you know how I can handle it?
That would be an expensive (memory-wise) thing to do, reading the entire huge file into memory at once. I'd personally read it in chunks.
Code:
my $MD5SUM = do {
    open(my $fh, '<:raw', "/tmp/a.tgz" ) or die "cannot open file: $!";
    my $MDChunk = Digest::MD5->new;
    local $/ = \131072;        # read in fixed 128k records
    while (<$fh>) {
        $MDChunk->add($_);     # feed each chunk to the running digest
    }
    $MDChunk->hexdigest;
};
<...rest of code....>
...would process it one 128k chunk at a time. Totally untested; just a thought.
 
Old 06-21-2019, 08:53 AM   #5
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,897

Original Poster
Rep: Reputation: 7318
Yes, but the file is downloaded from the net using something like this:
Code:
        my $ua = LWP::UserAgent->new(
            ssl_opts => { SSL_verify_mode => 'SSL_VERIFY_NONE'},
            );
        my $req = HTTP::Request->new( GET => "$url" );
        $req->authorization_basic( "$User", "$Password" );
        my $response = $ua->request($req);
        $content = $response->content;
So I don't really have a chance to do that; the whole content arrives in one scalar.
Downloading itself works.

Last edited by pan64; 06-21-2019 at 08:55 AM.
 
Old 06-26-2019, 08:59 AM   #6
Guttorm
Senior Member
 
Registered: Dec 2003
Location: Trondheim, Norway
Distribution: Debian and Ubuntu
Posts: 1,453

Rep: Reputation: 447
Hi

I really suck at Perl, but there is a way to set up a callback with LWP's request(), so you can process the data "chunk by chunk".

https://stackoverflow.com/questions/...-http-resource
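Something along these lines might do it (completely untested, written from the docs, and it reuses the $url, $User and $Password variables from your snippet):

Code:
use LWP::UserAgent;
use HTTP::Request;
use Digest::MD5;
use Digest::SHA1;

my $md5  = Digest::MD5->new;
my $sha1 = Digest::SHA1->new;

my $ua  = LWP::UserAgent->new;   # plus whatever ssl_opts you already pass
my $req = HTTP::Request->new( GET => $url );
$req->authorization_basic( $User, $Password );

# request() accepts a callback as its second argument; LWP calls it
# with each chunk of the body as it arrives, so the full content
# never has to sit in one huge scalar.
my $response = $ua->request( $req, sub {
    my ($chunk, $res, $proto) = @_;
    $md5->add($chunk);
    $sha1->add($chunk);
} );

die "download failed: " . $response->status_line unless $response->is_success;

print "md5sum:  ", $md5->hexdigest,  "\n";
print "sha1sum: ", $sha1->hexdigest, "\n";
As far as I know the body is then no longer stored in $response->content, so anything else that needs the raw data has to happen inside the callback (or be written to a file there).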

If you can use shell commands, it's a lot easier. Something like this will pipe the downloaded data to md5sum and use a lot less memory:

Code:
wget -q -O - http://example.org | md5sum
 
Old 06-27-2019, 02:06 AM   #7
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,897

Original Poster
Rep: Reputation: 7318
Yes, you are almost right. There are workarounds, so for now I'm forced to "find another way".
I need to download a file and check its md5sum/sha1sum, and the Perl script can do that (and more besides, like working out the name and location of the file and other special things). It works perfectly; the only problem is when the file itself is too big.

The current workaround is to save the file to disk and run the checksum calculation on that file.
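Another thing I may try (just an untested sketch, using the $content scalar that $response->content already gives me) is to keep the data in memory but feed it to the digest objects in smaller substr() pieces instead of one 3 GB string:

Code:
use Digest::MD5;
use Digest::SHA1;

my $md5  = Digest::MD5->new;
my $sha1 = Digest::SHA1->new;

# hash the in-memory content one 1 MB piece at a time
my $chunk_size = 1024 * 1024;
my $len = length($content);
for (my $off = 0; $off < $len; $off += $chunk_size) {
    my $piece = substr($content, $off, $chunk_size);
    $md5->add($piece);
    $sha1->add($piece);
}

my $md5Value  = $md5->hexdigest;
my $sha1Value = $sha1->hexdigest;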

But this thread is about the segfault in the *_hex calls, which looks like a bug.
 
Old 06-27-2019, 04:15 AM   #8
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,868
Blog Entries: 1

Rep: Reputation: 1869
You don't have to process all the data in one step; for example:
Code:
use strict;
use Digest::MD5;

# Compute the MD5 of a file by reading it in 1 KiB blocks,
# so the whole file never has to be held in memory at once.
sub DoOne {
    my ($filename, $fh, $block, $blocklen, $ctx, $digest);
    $filename = $_[0];

    open $fh, '<', $filename or return '';
    binmode $fh;

    $ctx = Digest::MD5->new;

    do {
        $blocklen = read $fh, $block, 1024;
        if ($blocklen > 0) {
            $ctx->add($block);
        }
    } while ($blocklen > 0);
    close $fh;

    $digest = $ctx->hexdigest;

    return $digest;
}
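Called like this it prints the hex digest, or an empty string if the file cannot be opened:

Code:
print DoOne('/tmp/a.tgz'), "\n";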
 
  

