converting number representation

kshkid · 04-13-2008, 10:18 AM

Hi All,

Are there any modules available in Perl to convert numbers such as 125345 to 125,345, that is, into the comma-separated form used in international formats?

I'm sure there must be some Perl modules available for that already.

Thanks for the pointers

taylor_venable · 04-13-2008, 10:23 AM

http://kobesearch.cpan.org/htdocs/Nu...er/Format.html

osor · 04-13-2008, 02:55 PM

In something that uses a POSIX-compatible printf(), you can specify the ' flag to use your locale’s thousands separator. For example, with the command-line printf utility:

Code:

$ printf "%'d\n" 125345
125,345

I am surprised that this is not part of Perl’s printf (even with “use POSIX qw(printf)”), since this has been in POSIX for at least 10 years.
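Since Perl's printf lacks the ' flag, a minimal sketch of getting a similar effect is to ask the locale for its separator through POSIX::localeconv() and insert it by hand. The helper name locale_commify, the optional second argument, the fallback to "," and the fixed group size of three are all assumptions made for illustration; a real locale may specify other group sizes.

```perl
use strict;
use warnings;
use POSIX qw(setlocale localeconv LC_ALL);

# Hypothetical helper: group the integer digits of $number in threes,
# using $sep if given, otherwise the locale's thousands separator.
sub locale_commify {
    my ($number, $sep) = @_;
    unless (defined $sep) {
        setlocale(LC_ALL, "");                  # adopt the environment's locale
        $sep = localeconv()->{thousands_sep};
        # Assumption: fall back to "," when the locale defines no separator.
        $sep = "," unless defined $sep && length $sep;
    }
    my ($sign, $int, $rest) = $number =~ /^([+-]?)(\d+)(.*)$/s
        or return $number;                      # not a decimal number: leave as is
    1 while $int =~ s/^(\d+)(\d{3})/$1$sep$2/;  # peel off three digits at a time
    return $sign . $int . $rest;
}

print locale_commify(125345), "\n";             # separator depends on the locale
```

Passing the separator explicitly, as in locale_commify(125345, "."), bypasses the locale lookup.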

makyo · 04-13-2008, 03:48 PM

Hi.

Quote:

Originally Posted by osor

In something that uses a POSIX-compatible printf(), you can specify the ' flag to use your locale’s thousands separator. For example, with the command-line printf utility:

I am surprised that this is not part of Perl’s printf (even with “use POSIX qw(printf)”), since this has been in POSIX for at least 10 years.

Yes, that does seem surprising, but the Camel book says that the system sprintf is emulated, not used.

For other solutions, searching for the string commify (or more rarely, commafy) and perl will yield a number of hits to help perform the task of inserting a comma every 3 digits ... cheers, makyo

kshkid · 04-14-2008, 12:27 AM

thank you very much for all the replies

here is an additional way of doing it

Code:

sub commify {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}

taylor_venable · 04-14-2008, 01:12 PM

This analysis is totally unnecessary, but the problem intrigues me, so here it goes.

That's clever, but I would think it would run slowly. The bad part is the lookahead in the global regex: it searches through the string for the decimal point on every match, which is unnecessary because once we've found it we shouldn't have to think about it anymore. I propose the following, which may net an additional performance improvement because it involves only simple operations like comparison against specific characters, without having to rely on an overly complicated regex.

Code:

sub commify($) {
    my @decimal = split /\./, shift;
    my $length = length $decimal[0];
    for (my $x = 3; $x < $length; $x += 3) {
        substr $decimal[0], $length - $x, 0, ",";
    }
    return $decimal[0] . ((scalar @decimal != 1) ? ("." . $decimal[1]) : "");
}

This assumes correct number formatting to work without chopping off data, as "xyz.abc.def" would return "xyz.abc" -- but I would think also that Perl might optimize away the regex for a single character, and turn it into a simple rindex() and substr(). If that's not the case, one can easily adapt this to use that method, rather than a call to split().

osor · 04-14-2008, 08:58 PM

Quote:

Originally Posted by taylor_venable

This analysis is totally unnecessary, but the problem intrigues me, so here it goes.

Well, now that you started it, your solution will not work if the number has a plus or minus sign. Also, the loop does not seem very “perlish”. For the integer part of the number (without a possible sign), you can reverse the digits, “split” them into groups of three digits using unpack, join the resulting list with commas, and reverse again to get the integer part back.
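The reverse, unpack, join, reverse idea can be sketched roughly as follows. The name commify_unpack_reverse and the way the sign and fractional part are split off are assumptions for illustration, and the "(A3)*" group template needs Perl 5.8 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: commify the integer part via reverse/unpack/join/reverse.
sub commify_unpack_reverse {
    my ($num) = @_;

    # Separate optional sign, integer digits, and whatever follows them.
    my ($sign, $int, $rest) = $num =~ /^([+-]?)(\d+)(.*)$/s
        or return $num;             # not a plain decimal number: leave untouched

    # Reversed, the groups of three start at the units digit;
    # the last group may hold fewer than three characters.
    my @groups = unpack "(A3)*", scalar reverse $int;

    return $sign . scalar(reverse join ",", @groups) . $rest;
}

print commify_unpack_reverse("-1234567.1234"), "\n";   # -1,234,567.1234
```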

taylor_venable · 04-14-2008, 09:22 PM

Quote:

Originally Posted by osor

Well, now that you started it, your solution will not work if the number has a plus or minus sign. Also, the loop does not seem very “perlish”. For the integer part of the number (without a possible sign), you can reverse the digits, “split” them into groups of three digits using unpack, join the resulting list with commas, and reverse again to get the integer part back.

Ooh, quite right about that negative sign - that's a problem. Perlosity aside, my goal was to make something that was as fast as possible. Reversing a string is costly, so I was trying to avoid it since you'd have to do it twice. Unpack probably takes the cake for raw speed (I guess I'm not totally sure, I've not used it much; demo code would be awesome), since it works at the byte level, but here's my improved version, which should at least be better than my first one.

Code:

sub commify($) {
    my $number = shift;
    my $d = index $number, "."; $d = length $number if ($d == -1);
    for (my $x = $d - 3; $x > 0; $x -= 3) {
        if ((substr $number, $x - 1, 1) =~ m/[0-9]/) {
            substr $number, $x, 0, ",";
        }
    }
    return $number;
}

The most expensive operations are index(), which is O(n), and the loop, which is O(n/3). Now, that part really hinges on how fast substr() is when doing an insertion. You could also speed things along by replacing the regex match against [0-9] with a simple comparison of the substr() against "-":

Code:

if ((substr $number, $x - 1, 1) ne "-") {

That works only if you're using a number as the argument: although you can specify a "+" in the Perl code, it gets discarded in the conversion to a string, so the first character of the string is always either a digit or the minus sign. The unpack() thing interests me, I'd be happy if you could show it.

osor · 04-14-2008, 09:55 PM

Quote:

Originally Posted by taylor_venable

The unpack() thing interests me, I'd be happy if you could show it.

Here is an unpack version without any reversing:

Code:

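sub commify($) {
	my $num = shift;

	# Optional sign, integer digits, and the rest (fraction, etc.).
	# Return the input unchanged if it does not look like a number.
	my @part = ($num =~ m/^(-?)(\d+)(.*)/) or return $num;

	# Width of the leading group; 0 when the length is a multiple of 3.
	# (Not "my $d = ... if ...": a skipped "my" keeps its value from the
	# previous call.)
	my $d = length($part[1]) % 3;

	my $pre = $d ? "A$d" : "";

	$part[1] = join ",", unpack "$pre(A3)*", $part[1];

	return join "", @part;
}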

It first strips out the integer part (as the second element of the array), and then does its work on that.

makyo · 04-15-2008, 12:03 PM

Hi.

J Friedl showed a version of commify that he said was a "full third faster" than the usual version. I added a few things, and made one correction:

Code:

use Carp;    # provides carp()

sub commify {
  unless ( @_ == 1 ) {
    carp('Sub usage: $withcomma = commify($somenumber);');
    return undef;
  }

  my ($t1) = $_[0];

  # From Mastering Regular Expressions, page 292.
  # Corrected by adding "x": ">gx;".

  $t1 =~ s<
    (\d{1,3})       # before a comma, 1 to 3 digits
    (?=             # followed by, but not part of what's matched
        (?:\d\d\d)+ # some number of triplets
        (?!\d)      # not followed by another digit
    )               # (in other words, which ends the number)
  ><$1,>gx;

  return $t1;
}

Running these with a simple harness that prints each line before and after calling commify produces:

Code:

% ./taylor_commify data1
999
9,99

1000
10,00

12345
123,45

and

Code:

% ./osor_commify data1
999
999

1000
1,000

12345
12,345

and finally:

Code:

% ./friedl_commify data1
999
999

1000
1,000

12345
12,345

Apologies if I copied anything incorrectly. I have not tested the speed for comparison ... cheers, makyo

taylor_venable · 04-15-2008, 10:42 PM

Well, makyo, I'm not sure how you're running my last-posted version, but it seems to be coming out wrong!

Anyway, when I run it, the right answer is given, so I'm not sure what the problem is there... Still, thanks for finding that other implementation.

I went ahead and benchmarked these four versions (mine, osor's, kshkid's, and now Jeffrey Friedl's) and the results were ... interesting:

Code:

         Rate   osor friedl kshkid taylor
osor   3420/s     --   -19%   -35%   -55%
friedl 4202/s    23%     --   -20%   -45%
kshkid 5274/s    54%    26%     --   -30%
taylor 7576/s   122%    80%    44%     --

Not entirely expected, but given that my version is the closest to what you would write in a lower-level language like C, it seems there is little really to reject about it. Please, feel free to analyze or criticize: the code is available at http://real.metasyntax.net:2357/code/arch/commify.pl -- it runs each method over my chosen example data 25000 (or rather: 25,000) times. To ensure accuracy, you can uncomment the other lines to get the real output printed to your screen. Note that if you remove the additional overhead of the second function call to do_commify() (which is there so you can selectively show the result) there are some performance gains across the board, but nothing that actually changes the results.
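For anyone who cannot reach that script, a comparison of this kind can be sketched with the core Benchmark module, whose cmpthese() prints a table in the same layout as the one above. This is not the linked script: the sample data is an assumption, only two of the four versions are included, and the sub names are made up for the sketch.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# kshkid's reverse-and-regex version, as posted earlier in the thread.
sub commify_reverse {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}

# The lookahead version makyo posted, without the argument check.
sub commify_lookahead {
    my ($t) = @_;
    $t =~ s/(\d{1,3})(?=(?:\d\d\d)+(?!\d))/$1,/g;
    return $t;
}

# Assumed sample data: integers only, since the lookahead version
# mishandles long fractional parts.
my @samples = (999, 1000, 12345, -1234567, 1234567890);

# Run each candidate 25,000 times over the samples and print the table.
cmpthese(25_000, {
    reverse   => sub { commify_reverse($_)   for @samples },
    lookahead => sub { commify_lookahead($_) for @samples },
});
```

The rates will differ by machine and Perl version, so only the relative ordering is worth reading from the table.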

Perl version details are found at http://real.metasyntax.net:2357/perl-version.txt

EDIT:
Oh, also I just noticed that the version of Friedl's method you supplied fails for numbers with fractional parts of more than three digits: -1234567.1234 => -1,234,567.1,234

makyo · 04-16-2008, 06:27 AM

Hi, taylor_venable.

Thanks for doing the speed comparison tests.

I recopied your code and tested it again. I got the same results, so I tried feeding your function a chomped line and it worked as desired, so the newline causes the result I posted.
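A minimal sketch of such a harness with the chomp in place is below. The run_harness name and the in-memory sample data are assumptions; the commify sub is taylor_venable's second version from earlier in the thread.

```perl
use strict;
use warnings;

# taylor_venable's index/substr version, copied from earlier in the thread.
sub commify($) {
    my $number = shift;
    my $d = index $number, "."; $d = length $number if ($d == -1);
    for (my $x = $d - 3; $x > 0; $x -= 3) {
        if ((substr $number, $x - 1, 1) =~ m/[0-9]/) {
            substr $number, $x, 0, ",";
        }
    }
    return $number;
}

# Hypothetical harness: print each input line, then its commified form.
sub run_harness {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;    # without this the newline counts as one more character
        next unless length $line;
        print {$out} "$line\n", commify($line), "\n\n";
    }
}

# Assumed sample data, read through an in-memory filehandle.
my $data = "999\n1000\n12345\n";
open my $in, "<", \$data or die "cannot open in-memory data: $!";
run_harness($in, \*STDOUT);
```

The newline matters because the function measures from the end of the string when there is no decimal point, so an unchomped "1000\n" is treated as five characters and the comma lands one place too far left.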

Friedl mentions the "FAQ" method:

Code:

1 while s/^(-?\d+)(\d{3})/$1,$2/;

very concise, but can be improved (3%) simply with:

Code:

1 while s/^(-?\d+)(\d\d\d)/$1,$2/;

and finally, removing the anchor to allow entire strings:

Code:

1 while s/(\d+)(\d\d\d)/$1,$2/;

It was this code against which he tested for his "third faster". However, it will not handle strings with decimal points correctly.
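The difference between the anchored and unanchored forms can be seen in a small sketch. The wrapper names are assumptions; the substitutions themselves are the ones shown above. As far as I can tell, it is the unanchored form that stumbles over decimal points, because it also groups the digits after the point.

```perl
use strict;
use warnings;

# Anchored form: only the run of digits at the start of the string is grouped.
sub commify_anchored {
    my ($s) = @_;
    1 while $s =~ s/^(-?\d+)(\d\d\d)/$1,$2/;
    return $s;
}

# Unanchored form: every run of four or more digits in the string is grouped.
sub commify_anywhere {
    my ($s) = @_;
    1 while $s =~ s/(\d+)(\d\d\d)/$1,$2/;
    return $s;
}

print commify_anchored("1234567.1234"), "\n";          # 1,234,567.1234
print commify_anywhere("1234567.1234"), "\n";          # 1,234,567.1,234
print commify_anywhere("cost: 1234567 units"), "\n";   # cost: 1,234,567 units
```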

He added a bit of code later in the discussion (for his faster version) to handle the problem of decimal points:

Code:

\G((?:^-)?\d{1,3})       # before a comma, 1 to 3 digits
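Assuming that line simply replaces the first line of the pattern in the lookahead version posted earlier, the whole substitution might look like the sketch below. The sub name commify_anchored_g is made up, and I have not checked this against the book's final listing.

```perl
use strict;
use warnings;

# Hypothetical name; the lookahead version with the \G line swapped in.
sub commify_anchored_g {
    my ($t) = @_;

    $t =~ s<
      \G((?:^-)?\d{1,3})  # from the previous match end: optional sign, 1 to 3 digits
      (?=                 # followed by, but not part of what is matched
          (?:\d\d\d)+     # some number of triplets
          (?!\d)          # not followed by another digit
      )                   # (in other words, which ends the number)
    ><$1,>gx;

    return $t;
}

print commify_anchored_g("-1234567.1234"), "\n";   # -1,234,567.1234
```

Because \G ties each match to the end of the previous one, the substitution stops at the decimal point and never reaches the fractional digits. The cost is that the number has to start at the beginning of the string.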

The underlying theme of all of Friedl's code, of course, is regular expressions. I have the 1997 edition of MRE, and there is now a third edition available.

The code posted by kshkid corresponds to Recipe 2.17 from the Perl Cookbook (1999).

For one-off solutions, I like the final FAQ version, but if I had a lot of integers, I'd use the other solutions. If I had more general numbers, I'd need to use something other than the basic regular expression codes.

I have enjoyed this excursion into the problem. Thanks to everyone for their posts.

So, readers of this thread can make their choices based on speed, brevity, understandability, etc.

Best wishes ... cheers, makyo

taylor_venable · 04-16-2008, 06:40 AM

Quote:

Originally Posted by makyo

I recopied your code and tested it again. I got the same results, so I tried feeding your function a chomped line and it worked as desired, so the newline causes the result I posted.

Ah, OK. That makes sense; I wrote the code for my method assuming the input was a number type, not a string. Partly because I thought that's what was wanted, but mostly because it frees me from a lot of input validation problems. (For one, it simply allows me to assume the only sign character that can be present is a "-" since the "+" you might write in your source code gets dropped in the conversion from number to string.)

I would expect Friedl's method to be all about the regexen, as I would many Perl approaches to this particular problem, as it is fundamentally text manipulation. My point was just simply not to overdo it, as there are many problems where simpler methods are much faster and less complicated. Even still, excellent work, all.