comparing quality of sound files from the shell ( using kurtosis )
"comparing audio files using a remote ssh session, using just the shell, without speakers, and without downloading of images or the media files"
This is what forced me to think about a way to achieve this using one of the topics from a statistics class... 20 years ago.
Many audio encoders limit ( or remove ) frequencies above 14-16 kHz to shrink the encoded file size.
Analyzing the flac, wav, mp3, ogg/vorbis and aac "families" of codecs with a graphical representation ( spectrogram ) confirms that lossy encoding lowers the quality ( read: amount of information ) by removing certain types of sounds, or even by replacing them with an "alternative" that takes less space and tricks the human ear, eg. AAC-LC.
- I use 'sox' to draw/build spectrograms of the frequencies above 16kHz in the sound files I want to compare.
- then I run 'identify' ( from the ImageMagick package ) to calculate the kurtosis value of the spectrogram image of each audio file.
- in the hundreds of cases I tested, the file that shows a lower kurtosis value ( more even distribution ) has a higher presence of tones above 16kHz, which I understand as higher quality.
Interpretations of kurtosis vary, and not everyone agrees with the statement that higher quality samples show a higher presence of high frequency tones ( in my opinion that is true for most music samples, unless the high tones were artificially amplified to "trick" the human ear ).
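For reference, the usual definition of excess kurtosis over the N pixel intensities x_i of a spectrogram image is sketched below. I am assuming that the value printed by 'identify' is this excess form ( a normal distribution gives 0 ), which would explain why it can be negative; check the ImageMagick documentation for your version.

```latex
\[
\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad
\mathrm{kurt} = \frac{1}{N}\sum_{i=1}^{N} \left(\frac{x_i - \bar{x}}{\sigma}\right)^4 - 3
\]
```

A few isolated dark peaks on an otherwise empty background give a heavy-tailed intensity distribution and a high value, while a band that is filled more evenly gives a lower one.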
The attached PNGs show the 'sox' results for libmp3lame encoded files ( variable bitrate, aq1 & aq2, 224 kbps & 192 kbps ) and for the flac.
I wrote the script below to compare the original with the encoded result and to see the loss after the encoding. It can be used to compare samples of the same content, or different audio formats ( ogg/aac... ).
Code:
#!/bin/bash
# sound_q-compare script
# idea and approach by Mike Fiedler (2014)
# GPL/LGPL

# both arguments must be existing files
if [ ! -f "$1" ] || [ ! -f "$2" ]; then
    echo "SRC/DST???"
    exit 1
fi

# spectrogram of the band above 16 kHz, then the kurtosis of the resulting image
sox "$1" -n sinc 16k-0k spectrogram -m -l -z 100 -o /tmp/1spectro.png
KURT1=$(identify -verbose /tmp/1spectro.png | grep kurtosis | sed 's/^.*sis:\ //')
sox "$2" -n sinc 16k-0k spectrogram -m -l -z 100 -o /tmp/2spectro.png
KURT2=$(identify -verbose /tmp/2spectro.png | grep kurtosis | sed 's/^.*sis:\ //')

echo "SRC: $KURT1"
echo "DST: $KURT2"

# the values are floats, so bc does the comparison ( it prints 1 for true, 0 for false )
if [ "$(echo "$KURT1 > $KURT2" | bc)" -eq 1 ]; then
    echo "WTF - DST higher Q than SRC ?!?!?"
elif [ "$(echo "$KURT1 == $KURT2" | bc)" -eq 1 ]; then
    echo "DST same Q"
else
    echo "DST lower Q"
fi

# relative difference in percent; sed trims trailing zeros and adds the leading 0 to "-.5"
DIFF=$(echo "scale=5; $KURT1/$KURT2*100-100" | bc -l | sed -e 's/00$//' -e 's/-\./-0\./')
echo "diff: $DIFF %"
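
If 'identify' prints more than one kurtosis line for an image ( I assume this can happen with per-channel plus overall statistics, depending on the image type and the ImageMagick version ), grep hands all of them to the variable and the bc expression breaks. A minimal sketch of a helper that keeps only the first value; first_kurtosis and sample_report are made-up names, and the report text is an invented excerpt for illustration only:

```shell
#!/bin/bash
# first_kurtosis: print the first "kurtosis:" value found in `identify -verbose`
# text read from stdin, so the result is always a single number.
first_kurtosis() {
    awk '/kurtosis:/ { print $2; exit }'
}

# hypothetical excerpt of `identify -verbose` output, for illustration only
sample_report() {
    printf '%s\n' \
        '  Channel statistics:' \
        '    Gray:' \
        '      mean: 229.479 (0.899918)' \
        '      kurtosis: 4.98912' \
        '      skewness: -2.40127' \
        '  Image statistics:' \
        '    Overall:' \
        '      kurtosis: 4.98912'
}

sample_report | first_kurtosis
```

In the script it would be used as KURT1=$(identify -verbose /tmp/1spectro.png | first_kurtosis).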
The script produces the following output ( comparing q1 and q2 variable bitrate libmp3lame ):
ps: personally, I went with OGG, q6 setting - veeery nice
why?
encoding a high quality FLAC sample to q6 OGG ( libvorbis ) ( ~18 MB ) and comparing it to a vbr 256kbps mp3 ( lame ) ( ~20MB ) produces this output from my sound_q-compare approach:
Hello Paziulek, would you please be so kind as to send this file via email to Jeremy so we can include this in www.linuxquestions.org/linux/answers/? Your post really deserves a permanent place there!
After the kurtosis attempt to compare the quality of two samples, I added "white balance" as another comparison method.
It represents the space not used by the spectrogram: the background, #FFFFFF.
I assumed that the #FFFFFF ( white ) area in the PNG files generated by 'sox' corresponds to the absence of sound ( or noise ). The lower the quality of the sample, the more #FFFFFF area is present in the PNG image. By comparing the #FFFFFF counts of both samples, it is possible to determine which one is more "rich" in tones between 16 and 21kHz.
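
A raw #FFFFFF count only compares well between images of the same size. A small sketch of one way to turn it into a share of all pixels, assuming the histogram rows in the 'identify -verbose' text look like "count: (r,g,b) #HEX name"; white_share and sample_histogram are hypothetical names and the sample rows are invented:

```shell
#!/bin/bash
# white_share: read `identify -verbose` text on stdin and print the share of
# pure white (#FFFFFF) pixels as a percentage of all pixels in the histogram.
white_share() {
    awk -F: '
        /^ *[0-9]+: *\(/ {                    # histogram rows start with "count: ("
            total += $1
            if ($0 ~ /#FFFFFF/) white += $1
        }
        END { if (total > 0) printf "%.2f\n", white * 100 / total }
    '
}

# invented histogram excerpt, for illustration only
sample_histogram() {
    printf '%s\n' \
        '  Histogram:' \
        '       750: (255,255,255) #FFFFFF white' \
        '       200: (0,0,0) #000000 black' \
        '        50: (128,128,128) #808080 gray(128)'
}

sample_histogram | white_share
```

With the invented rows above, 750 of 1000 pixels are white, so the function prints 75.00.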
for the images I used a source file which is a FLAC ~1050 kbit/s, 44.1kHz, encoded to ogg (q6) and mp3 (q0 and 192kbit/s constant )
here are the results:
flac:
192k mp3 constant br:
ogg q6 var:
mp3 q0 var:
sample output when comparing 2 samples, a 192 constant and q0 mp3:
Code:
sample1: 07.flac.mp3 - 11376 KB
sample2: 07.flac.q0.mp3 - 15616 KB
### KURTOSIS ###
sample1: 4.98912
sample2: 1.60372
diff: 211.096 %
### WHITE BALANCE ###
sample1: 349014
sample2: 306595
2nd sample has a higher Q by : 13.800%
same, but OGG q6 and mp3 q0:
Code:
sample1: 07.flac.ogg - 12736 KB
sample2: 07.flac.q0.mp3 - 15616 KB
### KURTOSIS ###
sample1: 1.62902
sample2: 1.60372
diff: 1.577 %
### WHITE BALANCE ###
sample1: 306365
sample2: 306595
1st sample has a higher Q by : 0%
( false result possible* )
looks like both samples have ABOUT THE SAME Q )
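
A note on the percentages in these outputs: with scale=3, bc truncates the ratio itself to three decimals before it is multiplied by 100, so 349014/306595 becomes 1.138 and is printed as 13.800%, and 306595/306365 becomes 1.000 and is printed as 0%. A small sketch that subtracts and multiplies before dividing, which keeps the lost digits; pct_diff is a hypothetical helper and it uses awk instead of bc:

```shell
#!/bin/bash
# pct_diff A B: print by how many percent A is larger than B, three decimals.
# Subtracting and multiplying first means nothing is cut off before the division.
pct_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.3f\n", (a - b) * 100 / b }'
}

pct_diff 349014 306595   # white counts of the 192k constant mp3 and the q0 mp3
pct_diff 306595 306365   # white counts of the q0 mp3 and the ogg q6
```

For the two pairs of white counts shown above this gives 13.836 and 0.075, so the ogg q6 and mp3 q0 samples really are within a tenth of a percent of each other.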
and the code:
Code:
#!/bin/bash
# by mike fiedler
# sound.q_compare.wb
# GPL/LGPL

if [ ! -f "$1" ] || [ ! -f "$2" ]; then
    echo " $0 SAMPLE1 SAMPLE2 "
    exit 1
fi

echo
echo "sample1: $(basename "$1") - $(ls -sk "$1" | cut -d" " -f1 ) KB"
echo "sample2: $(basename "$2") - $(ls -sk "$2" | cut -d" " -f1 ) KB"
echo

# spectrograms of the 16-21 kHz band, then the image statistics of each
sox "$1" -n sinc 16k-21k spectrogram -m -l -z 100 -o /tmp/1spectro.png
sox "$2" -n sinc 16k-21k spectrogram -m -l -z 100 -o /tmp/2spectro.png
identify -verbose /tmp/1spectro.png > /tmp/1spectro.out
identify -verbose /tmp/2spectro.png > /tmp/2spectro.out

KURT1=$(grep kurtosis /tmp/1spectro.out | sed 's/^.*sis:\ //')
KURT2=$(grep kurtosis /tmp/2spectro.out | sed 's/^.*sis:\ //')
echo "### KURTOSIS ###"
echo "sample1: $KURT1"
echo "sample2: $KURT2"
DIFF=$(echo "scale=5; $KURT1/$KURT2*100-100" | bc -l | sed -e 's/00$//' -e 's/-\./-0\./')
echo "diff: $DIFF %"

echo -e "\n### WHITE BALANCE ###"
# pixel count of the first #FFFFFF ( background ) line in the identify output
WBAL1=$(grep -m 1 '#FFFFFF' /tmp/1spectro.out | cut -d":" -f1 | tr -d " ")
WBAL2=$(grep -m 1 '#FFFFFF' /tmp/2spectro.out | cut -d":" -f1 | tr -d " ")
echo "sample1: $WBAL1"
echo "sample2: $WBAL2"

# less white means more sound in the band, so the smaller count wins
DIFF1=$( echo "scale=3;$WBAL2/$WBAL1*100-100" | bc )
DIFF2=$( echo "scale=3;$WBAL1/$WBAL2*100-100" | bc )
if [ "$WBAL1" -lt "$WBAL2" ]; then
    echo "1st sample has a higher Q by : $DIFF1%"
elif [ "$WBAL1" -gt "$WBAL2" ]; then
    echo "2nd sample has a higher Q by : $DIFF2%"
fi

# repeat with two decimals, so that differences below 1% come out as 0
DIFF1=$( echo "scale=2;$WBAL2/$WBAL1*100-100" | bc )
DIFF2=$( echo "scale=2;$WBAL1/$WBAL2*100-100" | bc )
if [ "$WBAL1" -eq "$WBAL2" ]; then
    echo "looks like the same file? or a lossless copy?"
elif [ "$DIFF1" == "0" ] || [ "$DIFF2" == "0" ]; then
    echo -e "( false result possible* ) \nlooks like both samples have ABOUT THE SAME Q ) "
fi
- improved accuracy by selecting a specific area of the spectrograms and extending the image to cover both channels.
- fixed the arithmetic comparison of negative float results in kurtosis.
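
The reason floats need special care: the [ ... -lt ... ] test in bash only understands integers, so kurtosis values have to be compared by an external tool. The script uses bc for this; the sketch below shows the same idea with awk instead, and float_lt is a hypothetical helper name:

```shell
#!/bin/bash
# float_lt A B: succeed ( exit status 0 ) when A < B, compared as floating point
# numbers; negative values work as well.
float_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 < b + 0) }'
}

if float_lt -1.20279 -0.626176; then
    echo "first value is lower"
fi
```

Because the function reports through its exit status, it can be used directly in if and elif without capturing any output.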
Code:
#!/bin/bash
SPECPNG1=/tmp/spectro1.png
SPECPNG2=/tmp/spectro2.png

if [ ! -f "$1" ] || [ ! -f "$2" ]; then
    echo "usage: $0 SAMPLE1 SAMPLE2"
    exit 1
fi

echo
echo "sample1: $(basename "$1") - $(ls -sk "$1" | cut -d" " -f1 ) KB"
echo "sample2: $(basename "$2") - $(ls -sk "$2" | cut -d" " -f1 ) KB"
echo

# 5000 px wide spectrograms of the 16-21 kHz band
sox "$1" -n sinc 16k-21k spectrogram -x 5000 -m -l -z 100 -o "$SPECPNG1"
sox "$2" -n sinc 16k-21k spectrogram -x 5000 -m -l -z 100 -o "$SPECPNG2"

# crop the selected area of each channel and stack both crops into one image
for PNG in "$SPECPNG1" "$SPECPNG2"; do
    convert "$PNG" -crop 5141x140-100+30 "$PNG.1.png"
    convert "$PNG" -crop 5141x140-100+290 "$PNG.2.png"
    convert "$PNG.1.png" "$PNG.2.png" -append "$PNG"
done

identify -verbose "$SPECPNG1" > "$SPECPNG1.out"
identify -verbose "$SPECPNG2" > "$SPECPNG2.out"

KURT1=$(grep kurtosis "$SPECPNG1.out" | sed 's/^.*sis:\ //')
KURT2=$(grep kurtosis "$SPECPNG2.out" | sed 's/^.*sis:\ //')
echo "### KURTOSIS ###"
echo "sample1: $KURT1"
echo "sample2: $KURT2"

# lower kurtosis is read as higher Q; bc compares the ( possibly negative ) floats
if [ "$(echo "$KURT1 < $KURT2" | bc -l)" -gt 0 ]; then
    KURT=$(echo "scale=5; $KURT1/$KURT2*100-100" | bc -l | sed -e 's/00$//' -e 's/-\./-0\./')
    echo "sample1 higher Q by: ${KURT#-}%"
elif [ "$(echo "$KURT1 > $KURT2" | bc -l)" -gt 0 ]; then
    KURT=$(echo "scale=5; $KURT2/$KURT1*100-100" | bc -l | sed -e 's/00$//' -e 's/-\./-0\./')
    echo "sample2 higher Q by: ${KURT#-}%"
else
    # no percentage to print when both values are equal
    echo "both samples seem to have the same Q"
fi

echo -e "\n### WHITE BALANCE ###"
WBAL1=$(grep -m 1 '#FFFFFF' "$SPECPNG1.out" | cut -d":" -f1 | tr -d " ")
WBAL2=$(grep -m 1 '#FFFFFF' "$SPECPNG2.out" | cut -d":" -f1 | tr -d " ")
echo "sample1: $WBAL1"
echo "sample2: $WBAL2"

# less white means more sound in the band, so the smaller count wins
DIFF1=$( echo "scale=3;$WBAL2/$WBAL1*100-100" | bc )
DIFF2=$( echo "scale=3;$WBAL1/$WBAL2*100-100" | bc )
if [ "$WBAL1" -lt "$WBAL2" ]; then
    echo "sample1 higher Q by: $DIFF1%"
elif [ "$WBAL1" -gt "$WBAL2" ]; then
    echo "sample2 higher Q by: $DIFF2%"
fi

if [ "$WBAL1" -eq "$WBAL2" ]; then
    echo "looks like the same file? or a lossless copy?"
elif [ "$DIFF1" == "0" ] || [ "$DIFF2" == "0" ]; then
    echo -e "( false result possible* ) \nlooks like both samples have ABOUT THE SAME Q ) "
fi
If you think this is anything useful, please let me know... otherwise I will just stop adding to this thread...
Regarding the sound, I wonder about using the kurtosis instead of skewness. I suspect, but have no idea let alone proof, that the curve is skewed towards lower frequencies in an effort to "pump up the bass", which would throw out the kurtosis analysis as an effect of quality (instead of an effect of variance of frequency). I wonder if you have tried a multi-factor analysis using both kurtosis and skewness.
I detest modern mixing, by which I mean anything from the late-90s on. I have a good quality system and CDs, which nowadays typically are mixed for use in a car, are so bass-heavy that they are not listenable.
It is a bit of a coincidence that I looked at kurtosis for this purpose. In the past I worked with a large set of images where I needed to get as many properties as I could out of the bitmap, and I have found statistics very interesting since the day I found a world atlas on a shelf.
Depending on the source of data and on what you want to prove, show or expose, statistics ( and kurtosis ) can simply lie.
eg. if you change the specifications of the generated PNG ( the source of data ), the results will vary and often become the opposite of what you expected.
Before I had anything ready that I could consider a representation of sound quality, I looked at a number of sample results, and for one reason or another skewness did not come near the results of kurtosis.
This was sometime last year, when I felt the need to compare sound samples in the simplest way possible, using just a "dumb terminal".
Analyzing the full spectrum will give you different results. I decided to go with the range above 16 kHz because, after lossy re-encoding, some or most of those tones disappear. Looking at the spectrogram of lower quality samples, the difference between peaks and troughs is larger, because the lower intensity tones were removed or lost during the lossy encoding. This is ( I think ) the right place for an analysis using kurtosis.
Now, since the "white balance" serves ( I assume ) as a confirmation of the kurtosis results, adding skewness into the game might be a perfect idea.
I also have to experiment with analyzing the channels separately and merging the results.
Thank you for your reply, and your thoughts padeen.
edit: as for the low frequencies, I will look into this.
A few changes:
- converted the grayscale to black, which "sharpened" the results
- comparing the space occupied by the spectrogram instead of the "white" background
example outputs:
decreasing the BASS by 10dB on sample2:
Code:
### 15-300Hz ###
### KURTOSIS ###
sample1: -1.20279
sample2: -0.626176
sample1 more BASS by: 92.084%
### BLACK BALANCE ###
sample1: 209021 sample2: 174452
sample1 more BASS by: 16.600%
sample compare of 128 and 192kbit respectively, mp3 cbr: