LinuxQuestions.org
Old 03-19-2016, 07:28 AM   #1
kubuntu-man
Member
 
Registered: Oct 2011
Posts: 36

Rep: Reputation: Disabled
Is there a non-perl replacement for shasum ?


I rely heavily on shasums for most of my archived files (photos, videos from my action cam, ...). The shasums of my self-created files are all sha512sums.
But I also have some downloaded files that come with sha256sums.

Having a mix of sha256 and sha512 sums is why I need to use shasum (instead of the more specific commands) for regularly checking my files. All sha files are named e.g. my_biking_video.mp4.sha or sometimes shasums.txt for all shasums of one directory, so there is currently no way to auto-detect the format by filename.

Unfortunately, the general shasum command (which is able to autodetect the sha variant) is written in Perl and is extremely slow compared to the native sha256sum and sha512sum commands, especially on slower computers such as single board computers (SBCs).

I just did a test on my Banana Pi. Checking a directory with shasum took over 4 times longer than using sha512sum. This was in a subdirectory where I knew all files are sha512sums, so the test was possible here (but only here).
I want to build a small NAS with my Banana Pi, so this will become a common use-case for me.
On my desktop PC, shasum is still 25% slower than sha512sum. This is relevant, too: it makes a real difference if a full check on an external backup disk takes 4 or 5 hours.

I'm using a script similar to
Code:
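# Find every checksum file below the current directory
find . -name shasums.txt -o -name "*.sha" | while IFS= read -r SUMFILE
do
  # Paths inside a checksum file are relative, so check from its own directory
  pushd "$(dirname "$SUMFILE")" >/dev/null
  shasum -c "$(basename "$SUMFILE")"
  popd >/dev/null
done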
So is there a general shasum command that can autodetect the sha format but is not as slow as the Perl command?

The only alternative would be checking the format of all of my sha files and renaming them to .sha256 or .sha512 respectively, but I'd really like to avoid that effort.
 
Old 03-21-2016, 07:44 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Pi OS & Android
Posts: 12,029

Rep: Reputation: 1416
Have you looked for other utilities? There could be something in Python, awk, or some other scripting language, or code that you can write/employ.
 
Old 03-21-2016, 08:01 AM   #3
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 15,643

Rep: Reputation: 5127
if you want to speed it up you can use awk/python/whatever language, but even this script can be tuned:
Code:
${SUMFILE##*/} # can be used instead of basename
${SUMFILE%/*}  # dirname
It looks like you can decide based on the length of the checksum itself: a SHA-256 digest is 64 hex characters long, a SHA-512 digest is 128.
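A minimal bash sketch of that length check, assuming the checksum files use the coreutils line format (hex digest, whitespace, filename) and that each file contains only one kind of digest. The function names sum_tool_for and check_sumfile are made up for illustration:

```shell
#!/usr/bin/env bash
# sum_tool_for (hypothetical helper): print the native tool that matches
# the length of the hex digest on the first line of a checksum file.
sum_tool_for() {
  local sumfile=$1 digest=
  # The first whitespace-delimited field of the first line is the digest.
  read -r digest _ < "$sumfile" || :
  # coreutils prefixes the line with a backslash when the filename is escaped.
  digest=${digest#\\}
  case ${#digest} in
    40)  echo sha1sum ;;
    56)  echo sha224sum ;;
    64)  echo sha256sum ;;
    96)  echo sha384sum ;;
    128) echo sha512sum ;;
    *)   echo "unknown digest length ${#digest} in $sumfile" >&2; return 1 ;;
  esac
}

# check_sumfile (hypothetical helper): verify one checksum file from inside
# its own directory, because the paths it lists are relative.
check_sumfile() {
  local sumfile=$1 tool
  case $sumfile in */*) ;; *) sumfile=./$sumfile ;; esac
  tool=$(sum_tool_for "$sumfile") || return 1
  ( cd "${sumfile%/*}" && "$tool" -c "${sumfile##*/}" )
}
```

Inside the original loop, calling check_sumfile "$SUMFILE" would stand in for the pushd / shasum -c / popd lines. BSD-style lines ("SHA256 (file) = ...") are not handled by this sketch.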

Last edited by pan64; 03-21-2016 at 03:49 PM. Reason: typo
 
Old 03-21-2016, 03:27 PM   #4
kubuntu-man
Member
 
Registered: Oct 2011
Posts: 36

Original Poster
Rep: Reputation: Disabled
@business_kid
Yes, I also thought about having the script check the length of the checksum and then use the corresponding command.

@pan64
Thanks for the hint, I'm going to tune my script this way, although the basename and dirname calls are not the most time consuming part. :-)
 
Old 03-21-2016, 03:50 PM   #5
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 15,643

Rep: Reputation: 5127
You can post the other parts too; we can probably give you additional hints. By the way, you can try to use more cores.
 
Old 04-07-2016, 08:05 PM   #6
kubuntu-man
Member
 
Registered: Oct 2011
Posts: 36

Original Poster
Rep: Reputation: Disabled
The most time consuming part is the checksum calculation, as I'm often using it for large files (> 1G). All the rest is only about finding the checksum files and formatting the output a little bit.
So, choosing the specific sha[X]sum instead of the Perl version has by far the largest effect.

Parallelizing is a good idea, but I don't know how to do this in a bash script.
 
Old 04-08-2016, 02:24 AM   #7
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Pi OS & Android
Posts: 12,029

Rep: Reputation: 1416
Parallelizing - untested
Code:
shaXsum file1 & shaXsum file2 &   # shaXsum = sha256sum or sha512sum
wait                              # wait for both background jobs to finish
etc. Expand as needed. Or you could do something with a 'for file in ...; do' loop that puts each command in the background with &, so they would run together. Be careful you don't bring the box to its knees on CPU or memory by overloading it, e.g. calculating 50 checksums simultaneously.
 
Old 04-08-2016, 02:55 AM   #8
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 15,643

Rep: Reputation: 5127
Something like this will do the job:
Code:
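N=10   # number of parallel jobs
# -d '\n' (GNU xargs) keeps filenames with spaces intact; shaXsum = sha256sum or sha512sum
xargs -d '\n' -n1 -P"${N}" shaXsum < filelist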
N can be set to about the number of your cores; that will not really overload the system. A higher N will not give you faster execution. You can also try nice, which may help in some cases when the CPUs are overloaded.
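Applied to the checksum files from the first post, the xargs idea might look roughly like this. It is only a sketch: verify_all_parallel is a made-up name, sha512sum is hard-coded on the assumption that every checksum file holds SHA-512 sums, and nproc (coreutils) supplies the default job count:

```shell
#!/usr/bin/env bash
# verify_all_parallel (hypothetical wrapper): check every checksum file
# below a root directory, running up to $jobs checks at the same time.
verify_all_parallel() {
  local root=${1:-.} jobs=${2:-$(nproc)}
  # -print0 / -0 keep filenames with spaces or newlines intact.
  # xargs appends one checksum file per invocation; it arrives as $1.
  find "$root" \( -name shasums.txt -o -name '*.sha' \) -print0 |
    xargs -0 -n1 -P "$jobs" nice -n 10 sh -c '
      cd "$(dirname "$1")" && sha512sum -c "$(basename "$1")"
    ' sh
}
```

For example, verify_all_parallel /mnt/backup 4 would run four checks at a time (the path is a placeholder). xargs exits non-zero if any single check fails, and the lines from different jobs can appear in any order.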
 
Old 05-10-2016, 06:01 PM   #9
kubuntu-man
Member
 
Registered: Oct 2011
Posts: 36

Original Poster
Rep: Reputation: Disabled
@pan64

Looks like it's worth a try. But another thing just came into my mind:
When I run the checksums sequentially, it produces a nice log. All of the shasum variants I know produce a line like
Code:
/path/to/filename: OK
unlike most Linux commands, which say nothing when there is no error. I really appreciate that log.
With parallel processes, I don't know how to get the output without it getting mixed up.
 
Old 05-10-2016, 06:14 PM   #10
keefaz
LQ Guru
 
Registered: Mar 2004
Distribution: Slackware
Posts: 6,230

Rep: Reputation: 724
Do you have Digest::SHA perl module installed?
(check with 'perl -MDigest::SHA -e 1', if no output and $?=0 then it's installed)

I ask because I saw this comment in the code:
Code:
## Try to use Digest::SHA.  If not installed, use the slower
## but functionally equivalent Digest::SHA::PurePerl instead.
http://web.mit.edu/barnowl/arch/sun4x_511/bin/shasum
 
Old 05-11-2016, 02:52 AM   #11
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Pi OS & Android
Posts: 12,029

Rep: Reputation: 1416
Also, generally on Perl modules, beware that some install to (from memory) /usr/local/lib64/perl5 while the system is using somewhere else like /usr/lib64/perl5. Perl only searches the directories listed in @INC. A quick move of the files and install of a symlink sorted that for me.
 
Old 05-11-2016, 03:48 AM   #12
sag47
Senior Member
 
Registered: Sep 2009
Location: Raleigh, NC
Distribution: Kubuntu x64, Raspbian, CentOS
Posts: 1,861
Blog Entries: 36

Rep: Reputation: 459
Why mix algorithms? Use either sha256 or sha512. I see no reason or sense to mix algorithms throughout your data.

If you still think you need to mix (I can't imagine why unless you explain) then you can use grep to filter the lines by the character length of the sum. Pipe the output to the stdin of sha256sum/sha512sum. As you say, the intense part is the actual summing, so a script with grep isn't going to add much overhead.
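A rough sketch of that grep filter, assuming coreutils-format lines and that it is run from the directory the checksum file's paths are relative to. check_mixed is a made-up name; the 64 and 128 are the hex lengths of SHA-256 and SHA-512 digests:

```shell
#!/usr/bin/env bash
# check_mixed (hypothetical helper): verify a checksum file that may mix
# SHA-256 (64 hex chars) and SHA-512 (128 hex chars) lines.
check_mixed() {
  local sumfile=$1 status=0
  # grep -E keeps only lines whose digest has exactly the given length;
  # "-c -" makes the tool read the checksum list from stdin.
  if grep -Eq '^[0-9a-fA-F]{64} ' "$sumfile"; then
    grep -E '^[0-9a-fA-F]{64} ' "$sumfile" | sha256sum -c - || status=1
  fi
  if grep -Eq '^[0-9a-fA-F]{128} ' "$sumfile"; then
    grep -E '^[0-9a-fA-F]{128} ' "$sumfile" | sha512sum -c - || status=1
  fi
  return "$status"
}
```

Because the filter works per line, this also copes with a single shasums.txt that contains both kinds of sums. Lines with an escaped filename (leading backslash) are skipped by this sketch.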

As for downloaded files; verify the sha256 and then recalculate using sha512.

When I get around to it I can write a script that supports both parallel execution and the native utilities. I still think you should recalculate and append.

Last edited by sag47; 05-11-2016 at 04:01 AM.
 
Old 05-11-2016, 03:54 AM   #13
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 15,643

Rep: Reputation: 5127
With parallel processing you need to handle those messages yourself (collect and print them), I think. That means a little work, in Perl for example.
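The collecting can also be done in plain shell. The following is a sketch under assumptions, not a tested production script: verify_buffered is a made-up name, sha512sum is hard-coded, and GNU mktemp and xargs are assumed. Each job writes to its own temporary log, and the logs are printed whole once all jobs have finished:

```shell
#!/usr/bin/env bash
# verify_buffered (hypothetical helper): run the checks in parallel, but
# buffer each job's output in its own log file and print the logs afterwards.
verify_buffered() {
  local root=${1:-.} jobs=${2:-2} logdir rc=0
  logdir=$(mktemp -d) || return 1
  find "$root" \( -name shasums.txt -o -name '*.sha' \) -print0 |
    LOGDIR=$logdir xargs -0 -n1 -P "$jobs" sh -c '
      # mktemp gives every job a unique log, so no two jobs share a file.
      log=$(mktemp "$LOGDIR/job.XXXXXX") || exit 1
      { echo "== $1"
        cd "$(dirname "$1")" && sha512sum -c "$(basename "$1")"
      } >"$log" 2>&1
    ' sh || rc=1
  # Print one complete log after another, then clean up.
  cat "$logdir"/job.* 2>/dev/null
  rm -rf "$logdir"
  return "$rc"
}
```

The logs come out in the order of their random temporary names, not in path order, but each "== file" header is always followed by the result lines that belong to it.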
 
Old 05-25-2016, 05:08 PM   #14
kubuntu-man
Member
 
Registered: Oct 2011
Posts: 36

Original Poster
Rep: Reputation: Disabled
@sag47
I don't mix algorithms when I create the sha sums. I always use sha512 sums. But if I download files, e.g. from apache.org, they come with sha256 sums.

Yes, I could check them and after that, throw away the sha256 sum and create my own sha512 sum. But that's additional work and it can still happen that I miss one, leaving me with unnecessary errors on the next check.
 
  


All times are GMT -5. The time now is 01:25 PM.
