LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   Is there a non-perl replacement for shasum ? (https://www.linuxquestions.org/questions/linux-software-2/is-there-a-non-perl-replacement-for-shasum-4175575342/)

kubuntu-man 03-19-2016 07:28 AM

Is there a non-perl replacement for shasum ?
 
I'm massively using shasums for most of my archived files (photos, videos from my action cam, ...). The shasums of my self-created files are all sha512sums.
But I also have some downloaded files that come with sha256sums.

Having a mix of sha256 and sha512 sums is why I need shasum (instead of the more specific commands) to check my files regularly. The checksum files are named e.g. my_biking_video.mp4.sha, or sometimes shasums.txt for all checksums in one directory, so there is no way to auto-detect the format from the filename.

Unfortunately, the general shasum command (which can auto-detect the SHA variant) is written in Perl and is extremely slow compared to the native sha256sum and sha512sum commands, especially on slower computers such as single-board computers (SBCs).

I just did a test on my banana pi. Checking a directory with shasum took over 4 times longer than using sha512sum. This was in a subdirectory where I knew all checksums were sha512, so the comparison was possible there (but only there).
I want to build a small NAS with my banana pi, so this will become a common use-case for me.
On my desktop PC, shasum is still 25% slower than sha512sum. This is relevant, too: it makes a real difference whether a full check of an external backup disk takes 4 or 5 hours.

I'm using a script similar to
Code:

find . \( -name shasums.txt -o -name "*.sha" \) | while IFS= read -r SUMFILE
do
  pushd "$(dirname "$SUMFILE")" >/dev/null
  shasum -c "$(basename "$SUMFILE")"
  popd >/dev/null
done

So, is there a general shasum command that can auto-detect the SHA variant but is not as slow as the Perl one?

The only alternative would be to check the format of all of my sha files and rename them to .sha256 or .sha512 respectively, but I would really like to avoid that effort.
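A wrapper that sniffs the digest length in each checksum file and dispatches to the native tool might also work, something like this (an untested sketch; 64 hex characters means SHA-256, 128 means SHA-512):

```shell
# Untested sketch: pick the native tool from the digest length in the
# checksum file (64 hex chars = SHA-256, 128 hex chars = SHA-512).
fast_shasum_check() {
  local sumfile=$1 digest
  digest=$(awk '{print $1; exit}' "$sumfile")
  case ${#digest} in
    64)  sha256sum -c "$sumfile" ;;
    128) sha512sum -c "$sumfile" ;;
    *)   echo "unknown digest length in $sumfile" >&2; return 1 ;;
  esac
}
```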

business_kid 03-21-2016 07:44 AM

Have you looked for other utilities? There could be something in Python, awk, or some other scripting language, or code that you can write or adapt.

pan64 03-21-2016 08:01 AM

if you want to speed it up you can use awk/python/whatever language, but even this script can be tuned:
Code:

${SUMFILE##*/} # can be used instead of basename
${SUMFILE%/*}  # dirname

Looks like you can decide based on the length of the checksum itself (64 hex characters for SHA-256, 128 for SHA-512).
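Applied to your loop, it could look like this (untested; find prints paths starting with "./", so ${SUMFILE%/*} always has a directory part):

```shell
# Untested: the loop from the first post, with parameter expansions
# replacing the dirname/basename subshells.
check_all_sums() {
  find . \( -name shasums.txt -o -name "*.sha" \) | while IFS= read -r SUMFILE
  do
    pushd "${SUMFILE%/*}" >/dev/null || continue
    shasum -c "${SUMFILE##*/}"
    popd >/dev/null
  done
}
```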

kubuntu-man 03-21-2016 03:27 PM

@business_kid
Yes, I also thought about having the script check the length of the checksum and then use the corresponding command.

@pan64
Thanks for the hint, I'm going to tune my script this way, although the basename and dirname calls are not the most time-consuming part. :-)

pan64 03-21-2016 03:50 PM

You can post the other parts too; maybe we can give you additional hints. By the way, you could also try to use more cores.

kubuntu-man 04-07-2016 08:05 PM

The most time consuming part is the checksum calculation, as I'm often using it for large files (> 1G). All the rest is only about finding the checksum files and formatting the output a little bit.
So, choosing the specific sha[X]sum instead of the Perl version has by far the largest effect.

Parallelizing is a good idea, but I don't know how to do this in a bash script.

business_kid 04-08-2016 02:24 AM

Parallelizing - untested
Code:

shaXsum file1 & shaXsum file2 &
wait

Expand as needed, or do something with a for loop over the files; the jobs run together. Be careful you don't bring the box to its knees on CPU or memory by overloading it, e.g. by calculating 50 checksums simultaneously.
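To keep the number of simultaneous jobs bounded, something like this might work (untested; wait -n needs bash >= 4.3, and MAX_JOBS is just a guessed tuning knob):

```shell
# Untested sketch: run checksums in parallel, but never more than
# MAX_JOBS at once (wait -n needs bash >= 4.3).
checksum_parallel() {
  local MAX_JOBS=4 f
  for f in "$@"; do
    sha512sum "$f" &
    # Block while MAX_JOBS background jobs are still running.
    while (( $(jobs -rp | wc -l) >= MAX_JOBS )); do
      wait -n
    done
  done
  wait  # collect the stragglers
}
```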

pan64 04-08-2016 02:55 AM

Something like this will do the job:
Code:

N=10
xargs -n1 -P"$N" shaXsum < filelist

N can be set to about the number of your cores; that will not really overload the system, and a higher N will not give you faster execution. You can also try nice; in some cases it helps when the CPUs are overloaded.
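With find instead of a prepared file list, and NUL-delimited so filenames with spaces survive, it could be sketched as (untested):

```shell
# Untested: one sha512sum job per core over a whole tree,
# NUL-delimited so odd filenames are handled.
sum_tree() {
  find "${1:-.}" -type f -print0 | xargs -0 -n1 -P"$(nproc)" sha512sum
}
```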

kubuntu-man 05-10-2016 06:01 PM

@pan64

Looks like it's worth a try. But another thing just came into my mind:
When I run the checksums sequentially, it produces a nice log. All of the shasum variants I know produce a line like
Code:

/path/to/filename: OK
unlike most Linux commands, which say nothing when there is no error. I really appreciate that log.
With parallel processes, I don't know how to keep the output from getting mixed up.
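Maybe buffering each file's result and printing it in one go would keep the lines intact, something like this (an untested idea):

```shell
# Untested idea: buffer each result and emit it with a single printf,
# so parallel jobs cannot interleave in the middle of a line.
check_one() {
  local out
  out=$(sha512sum -c "$1" 2>&1)
  printf '%s\n' "$out"
}
export -f check_one
# The driver would be something like:
#   find . -name "*.sha" -print0 | xargs -0 -n1 -P4 bash -c 'check_one "$0"'
```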

keefaz 05-10-2016 06:14 PM

Do you have Digest::SHA perl module installed?
(check with 'perl -MDigest::SHA -e 1'; if there is no output and $? is 0, it's installed)

I ask because I saw this comment in the code:
Code:

## Try to use Digest::SHA.  If not installed, use the slower
## but functionally equivalent Digest::SHA::PurePerl instead.

http://web.mit.edu/barnowl/arch/sun4x_511/bin/shasum

business_kid 05-11-2016 02:52 AM

Also, generally on Perl modules: beware that some install to (from memory) /usr/local/lib64/perl5 while the system is looking somewhere else like /usr/lib64/perl5, and Perl wasn't searching the directory the module landed in. A quick move of the files and a symlink sorted that for me.

sag47 05-11-2016 03:48 AM

Why mix algorithms? Use either sha256 or sha512. I see no reason or sense to mix algorithms throughout your data.

If you still think you need to mix (I can't imagine why unless you explain), then you can use grep to split the entries by digest length and pipe each set to sha256sum -c or sha512sum -c on stdin. As you say, the intense part is the actual summing, so a script with grep isn't going to add much overhead.

As for downloaded files: verify the sha256 and then recalculate using sha512.

When I get around to it I can write a script that supports both parallelism and both utilities. I still think you should recalculate and append.

pan64 05-11-2016 03:54 AM

With parallel processing you need to handle those messages (collect and print them), I think. That means a little work in Perl, for example.

kubuntu-man 05-25-2016 05:08 PM

@sag47
I don't mix algorithms when I create the sha sums. I always use sha512 sums. But if I download files, e.g. from apache.org, they come with sha256 sums.

Yes, I could check them and, after that, throw away the sha256 sum and create my own sha512 sum. But that's additional work, and it can still happen that I miss one, leaving me with unnecessary errors on the next check.
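If I ever do the conversion, it would probably be with a one-shot helper like this (untested sketch): verify the downloaded sha256, write my usual .sha file, drop the old one:

```shell
# Untested one-shot helper: verify a downloaded SHA-256 sum, then
# replace it with my usual self-generated SHA-512 .sha file.
upgrade_sum() {
  local file=$1 sumfile=$2
  sha256sum -c "$sumfile" || return 1  # stop if the download is bad
  sha512sum "$file" > "${file}.sha"
  rm -- "$sumfile"
}
```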

