LinuxQuestions.org

jroggow 02-03-2013 03:05 PM

Zero Padded Files
 
I've written a script (Bash 4.2.37) to convert a whole heap of audio files to a common format, and move them to a particular directory. I've done some limited testing and so far it works like a champ, but a couple of finer points have me scratching my head.

Specifically, this snippet:

Code:

#find greatest increment amongst files
files=(~/music/*)
greatest=$(basename ${files[-1]%.*})
((i=$greatest+3)) #Does arithmetic expression see -1 in array and evaluate mathematically?  Add three to increment by one. Weird.  Why is it subtracting twice?
#Is there an alternative?  Seems messy.  It's bugging me.
#Actually, this shouldn't work.  What is eating my leading zeros? #Aren't zero padded numbers considered octal by Bash?
echo "$greatest"
echo "$i"

My stream-of-consciousness comments include my questions.

When I first ran the script, $greatest had a value of 0008 and $i a value of 9. The second go, $greatest was 0012, $i 13.

Which is as it should be. Except not if Bash treats numbers with leading zeros as octal.

The $greatest+3 bit seems to work just fine, but it irks me. I chased my own tail for an age trying to make something like (($greatest++)) work. It didn't.

PTrenholme 02-03-2013 03:12 PM

Well, you could remove the leading zeros with ${greatest##0}, couldn't you? That might help.

jroggow 02-03-2013 03:23 PM

That's the weird thing. I didn't remove the leading zeros, but they're gone when I use $i in my script.

Maybe I should clear the directory and start at 0. It was only a circumstance that 0008 was the largest number in the directory. Did Bash automatically determine that I wasn't using octal because 0008 would be invalid?

PTrenholme 02-03-2013 04:20 PM

I just took a quick look at pinfo bash, and was reminded that you can state the number base explicitly by prefixing the number with the base and a #, so, for example:
Code:

$ echo $((10#017)) " != " $((017))
17  !=  15

As to the leading zeros being gone, the arithmetic evaluation converts the number to an integer, and bash prints integers without leading zeros. If you want, say, a four digit number with leading zeros, try something like this:
Code:

$ for ((i=0;i<21;i=i+3)); do j="0000"${i};echo ${i} "=" ${j: -4};done
0 = 0000
3 = 0003
6 = 0006
9 = 0009
12 = 0012
15 = 0015
18 = 0018

Note that the space after the : in ${j: -4} is needed, since :- has another use.
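
To make the difference between the two forms concrete, here is a minimal sketch. The variable names (j, empty, last_four, whole, fallback) are made up for illustration, and it assumes Bash rather than a plain POSIX sh, since substring expansion is a Bash feature.

```shell
#!/bin/bash
# Hypothetical names, for illustration only.
j="000012"
empty=""

last_four=${j: -4}   # substring expansion: the last four characters of $j
whole=${j:-4}        # ":-" means "use 4 if j is unset or empty"; j is set, so this is $j itself
fallback=${empty:-4} # here the variable is empty, so the default is used

echo "$last_four"    # 0012
echo "$whole"        # 000012
echo "$fallback"     # 4
```

Writing ${j:(-4)} should also work if the space looks too easy to lose.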

<edit>
I forgot the printf command!
Code:

$ for ((i=0;i<=21;i=i+3)); do printf "%2d = %04d\n" ${i} ${i};done
 0 = 0000
 3 = 0003
 6 = 0006
 9 = 0009
12 = 0012
15 = 0015
18 = 0018
21 = 0021

</edit>

jroggow 02-03-2013 04:51 PM

That makes sense. Have I stumbled across a convenient shortcut or will this bite me in the ass?

Counting on my toes, 0012 should convert to 10 rather than 13. Since my script is returning 13 (the value I expect) for $i, it's making a sort of literal conversion rather than a mathematical conversion.

If I don't have to explicitly strip zero padding from the filenames, I will opt for the lazy solution and leave it as it stands.

ntubski 02-03-2013 05:38 PM

This doesn't work for me:

Code:

~/tmp/music$ ls
0001.music  0002.music  0003.music  0004.music  0005.music  0006.music  0007.music  0008.music
~/tmp/music$ files=(~/tmp/music/*)
~/tmp/music$ echo ${files[-1]}
/home/npostavs/tmp/music/0008.music
~/tmp/music$ echo $(basename ${files[-1]})
0008.music
~/tmp/music$ echo $(basename ${files[-1]%.*})
0008
~/tmp/music$ greatest=$(basename ${files[-1]%.*})
~/tmp/music$ ((i=$greatest+3))
bash: ((: i=0008: value too great for base (error token is "0008")

Do you have some strange characters in your filenames? An unusual locale setting?

jroggow 02-03-2013 06:16 PM

No, ntubski. That's why I don't particularly understand it.

Here's the code that names my files:

Code:

for file in ${files[@]}
do
        name="$i.mp3"
        num=$(expr length ${name%.*})
        #Zero-padding sorts file numerically.
        if (($num < 4)) #Four seems reasonable.  I don't anticipate having more than 10,000 files.
        then
                pad=$(head -c $num /dev/zero | tr '\0' '0')
        fi
        mv $file ~/music/$pad$name
        i=$(( $i+1 ))

There are no special characters or anything that I can identify. ls in my music directory shows 0000.mp3, 0001.mp3, etc.

That's why I think it shouldn't work. My understanding is that it should have broken the first time I ran the script, when the highest-numbered file was 0008 (which I put there manually).

ntubski 02-03-2013 06:57 PM

Quote:

Originally Posted by jroggow (Post 4883815)
Here's the code that names my files:

The padding calculation doesn't look right: wouldn't you get 01.mp3, 02.mp3, ..., 0010.mp3, 0011.mp3, ..., 000100.mp3, 000101.mp3, ... from that? Also, you don't need $ on variables within arithmetic expressions: i=$((i+1)) or just ((i++)).
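
A minimal sketch of the arithmetic point, assuming Bash. The values are made up and deliberately have no leading zeros, because a value such as 0008 would still trip the octal error here. It also suggests why (($greatest++)) could not work: the $ is expanded first, so the shell is asked to increment a literal number rather than a variable.

```shell
#!/bin/bash
# Illustrative values only; no leading zeros, so octal parsing is not an issue here.
i=12
((i++))          # the name is looked up without "$" and the variable is incremented
j=$((i + 1))     # same inside $(( ))

greatest=12
# (($greatest++)) would expand to ((12++)): there is no variable left to assign to,
# so Bash rejects the expression.
((greatest++))   # naming the variable lets the increment happen

echo "$i $j $greatest"   # 13 14 13
```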


Back to the original question, how about you isolate the problem: make a new script in an empty directory:
Code:

#!/bin/bash

# run this in a new directory

touch 000{1..8}.mp3

files=(*.mp3)
greatest=$(basename ${files[-1]%.*})
((i=$greatest+3))
echo "i = $i, greatest = $greatest"

My output for this is:
Code:

./next-name.bash: line 9: ((: i=0008: value too great for base (error token is "0008")
i = , greatest = 0008

What's your output?

jroggow 02-03-2013 07:29 PM

I get that same error running your script.
Code:

line 9: ((: i=0008: value too great for base (error token is "0008")
0008


jroggow 02-03-2013 07:40 PM

The padding should be fine. It counts the number of characters in the base filename and if less than four adds as many zeros as needed to make it four digits.

ntubski 02-03-2013 10:00 PM

Quote:

Originally Posted by jroggow (Post 4883838)
I get that same error running your script.
Code:

line 9: ((: i=0008: value too great for base (error token is "0008")
0008


Okay, now the question is what's the difference between that script and the code in your original post? Do you get the same result if you run it in the ~/music directory?


And the padding code is fine, I just had a temporary comprehension failure.

konsolebox 02-04-2013 07:23 AM

I think it's likely that the code you first ran and the output you first saw don't match what you posted. Even if 0008 were not treated as octal, "i" would not have a value of 9.

Code:

greatest=0008
(( i = $greatest + 3 )) # could be 11 but not 9


mina86 02-04-2013 08:02 AM

Quote:

Originally Posted by PTrenholme (Post 4883722)
Well, you could remove the leading zeros with ${greatest##0}, couldn't you? That might help.

This will remove only the first leading zero, since the pattern “0” matches exactly one character; the longest-match form “##” is of no help here. You'd have to do:
Code:

while [ x"$greatest" != x0 ] && [ x"$greatest" != x"${greatest#0}" ]; do
    greatest=${greatest#0}
done

Or if you don't like loops:
Code:

greatest=${greatest#"${greatest%%[1-9]*}"}
greatest=${greatest:-0}
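
For comparison, a small sketch of what each expansion yields for the values discussed in this thread. The helper variable names (one_stripped, prefix, all_stripped, zeros) are hypothetical.

```shell
#!/bin/bash
greatest=0008

one_stripped=${greatest##0}          # pattern "0" matches one character, so only one zero goes
prefix=${greatest%%[1-9]*}           # everything before the first nonzero digit
all_stripped=${greatest#"$prefix"}   # remove that whole run of zeros
all_stripped=${all_stripped:-0}      # an all-zero input would leave "", so fall back to 0

zeros=0000
zeros=${zeros#"${zeros%%[1-9]*}"}
zeros=${zeros:-0}

echo "$one_stripped"   # 008
echo "$all_stripped"   # 8
echo "$zeros"          # 0
```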

Quote:

Originally Posted by ntubski (Post 4883828)
Also, you don't need $ on variables within arithmetic expressions: i=$((i+1)) or just ((i++)).

Nonetheless, it's more portable to use a dollar sign.

konsolebox 02-04-2013 08:34 AM

Quote:

Originally Posted by mina86 (Post 4884055)
Nonetheless, it's more portable to use a dollar sign.

Can I ask in what sense it would be more portable? Which shell or version of Bash?

Also, extended globbing would be a better solution for trimming leading zeros:
Code:

shopt -s extglob
... ${VAR##+(0)}

And I don't think portability with Bash < 3.0 would be necessary for that?
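
Spelled out a little more, as a sketch assuming Bash with extglob available. The names stripped and zero are made up, and the :-0 fallback is an addition here to cover an all-zero input.

```shell
#!/bin/bash
shopt -s extglob        # enables +(...) and the other extended patterns

VAR=0012
stripped=${VAR##+(0)}   # +(0) matches one or more zeros; ## takes the longest match
stripped=${stripped:-0}

zero=0000
zero=${zero##+(0)}      # everything matches, leaving an empty string
zero=${zero:-0}         # so fall back to a single 0

echo "$stripped"        # 12
echo "$zero"            # 0
```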

David the H. 02-04-2013 02:33 PM

As konsolebox mentioned in passing, when a number has a leading zero, the shell treats it as an octal value. Any arithmetic operation on a number that includes an 8 or 9 will result in an error, and other values will silently be read as octal and give you incorrect results (0012 evaluates to 10, for example).

Edit: I just noticed that the OP actually said it first. Oh well.

The best way to avoid this is to strip the leading zeroes off before doing any math, and only re-pad them when you really need it.

http://mywiki.wooledge.org/ArithmeticExpression
http://mywiki.wooledge.org/BashFAQ/018


The leading base string is another option instead of stripping them off, but you'll still have to worry about re-padding the results afterwards.
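
Putting the two halves together, a minimal sketch of the base-prefix route with re-padding, assuming Bash and four-digit names as in the original post. The variable name next is hypothetical.

```shell
#!/bin/bash
greatest=0008

# 10# forces base 10, so the leading zeros are harmless inside the arithmetic.
# The result is a plain integer, which printf then pads back to four digits.
next=$(printf '%04d' "$((10#$greatest + 1))")

echo "$next"   # 0009
```

Passing the padded string straight to printf '%04d' would not help, since printf also reads a leading zero as octal; the arithmetic expansion has to come first.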


All times are GMT -5.