LinuxQuestions.org
Old 04-13-2008, 11:18 AM   #1
kshkid
Member
 
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383

Rep: Reputation: 30
converting number representation


Hi All,

Are there any modules available in Perl

to convert numbers of the form 125345 to 125,345,

i.e. international formats with comma separators?

I'm sure there must already be some Perl modules available for that.

Thanks for the pointers

 
Old 04-13-2008, 11:23 AM   #2
taylor_venable
Member
 
Registered: Jun 2005
Location: Indiana, USA
Distribution: OpenBSD, Ubuntu
Posts: 892

Rep: Reputation: 41
http://kobesearch.cpan.org/htdocs/Nu...er/Format.html
 
Old 04-13-2008, 03:55 PM   #3
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 70
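In something that uses a POSIX-compatible printf(), you can specify the ' flag to use your locale’s thousands separator. For example, with the command-line printf utility (the output below assumes a locale such as en_US.UTF-8; in the C/POSIX locale no separator is inserted):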
Code:
$ printf "%'d\n" 125345
125,345
I am surprised that this is not part of Perl’s printf (even with “use POSIX qw(printf)”), since this has been in POSIX for at least 10 years.
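Perl's own sprintf won't do this, but the locale's separator can still be fetched through the POSIX module. Here is a minimal sketch, assuming POSIX::localeconv() reports the thousands separator of the locale chosen with setlocale(). The name commify_locale is hypothetical, and falling back to a comma when the locale defines no separator is our own choice (printf's ' flag would print none in that case).

```perl
use strict;
use warnings;
use POSIX qw(setlocale localeconv LC_NUMERIC);

# Insert the locale's thousands separator into the integer part of $num.
# Assumption: if the locale has no separator (e.g. the "C" locale),
# fall back to a comma instead of printing none.
sub commify_locale {
    my ($num) = @_;
    my $conv = localeconv();
    my $sep  = $conv->{thousands_sep};
    $sep = ',' unless defined $sep && length $sep;

    # Split into sign, integer digits, and the rest (fraction etc.)
    my ($sign, $int, $rest) = $num =~ /^([-+]?)(\d+)(.*)$/s
        or return $num;    # not a number: return it untouched

    # Each pass inserts one separator, working leftward
    1 while $int =~ s/^(\d+)(\d{3})/$1$sep$2/;
    return $sign . $int . $rest;
}

setlocale(LC_NUMERIC, "");    # adopt LC_NUMERIC/LANG from the environment
print commify_locale(125345), "\n";
```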
 
Old 04-13-2008, 04:48 PM   #4
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Hi.
Quote:
Originally Posted by osor
In something that uses a POSIX-compatible printf(), you can specify the ' flag to use your locale’s thousands separator. For example, with the command-line printf utility:

I am surprised that this is not part of Perl’s printf (even with “use POSIX qw(printf)”), since this has been in POSIX for at least 10 years.
Yes, that does seem surprising, but the Camel book says that the system sprintf is emulated, not used.

For other solutions, searching for the string commify (or, more rarely, commafy) together with perl will turn up a number of ways to insert a comma every 3 digits ... cheers, makyo
 
Old 04-14-2008, 01:27 AM   #5
kshkid
Member
 
Registered: Dec 2005
Distribution: RHEL3, FC3
Posts: 383

Original Poster
Rep: Reputation: 30
Thank you very much for all the replies.

Here is another way of doing it:

Code:
sub commify {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}
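To see what the reversal and the two lookaheads do, here is that sub with a small driver; the sample values are ours, not from the thread. Working on the reversed string means groups of three are counted from the units digit, and (?!\d*\.) keeps commas out of the fractional part, which comes first in the reversed string.

```perl
use strict;
use warnings;

# kshkid's commify, unchanged: reverse, add a comma after every three
# digits that are followed by another digit but not (through more
# digits) by a decimal point, then reverse back.
sub commify {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}

for my $n (999, 1000, 12345, -1234567, "1234567.1234") {
    printf "%-14s => %s\n", $n, commify($n);
}
```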
 
Old 04-14-2008, 02:12 PM   #6
taylor_venable
Member
 
Registered: Jun 2005
Location: Indiana, USA
Distribution: OpenBSD, Ubuntu
Posts: 892

Rep: Reputation: 41
This analysis is totally unnecessary, but the problem intrigues me, so here goes.

That's clever, but I would expect it to run slowly. The bad part is the lookahead in the global regex: for each group of three digits it scans ahead through the string for a decimal point, which is unnecessary because once we've found the decimal point we shouldn't have to think about it anymore. I propose the following, which may perform better because it involves only simple operations, like comparisons against specific characters, without relying on an overly complicated regex.
Code:
sub commify($) {
    my @decimal = split /\./, shift;
    my $length = length $decimal[0];
    for (my $x = 3; $x < $length; $x += 3) {
        substr $decimal[0], $length - $x, 0, ",";
    }
    return $decimal[0] . ((scalar @decimal != 1) ? ("." . $decimal[1]) : "");
}
This assumes correctly formatted numbers; otherwise it chops off data, since "xyz.abc.def" would return "xyz.abc". I would also think that Perl might optimize away a single-character regex and turn it into a simple rindex() and substr(). If that's not the case, one can easily adapt this to use that method rather than a call to split().
 
Old 04-14-2008, 09:58 PM   #7
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 70
Quote:
Originally Posted by taylor_venable
This analysis is totally unnecessary, but the problem intrigues me, so here goes.
Well, now that you started it, your solution will not work if the number has a plus or minus sign. Also, the loop does not seem very “perlish”. For the integer part of the number (without any sign), you can reverse the digits, “split” them into groups of three digits using unpack, join the resulting list with commas, and reverse again to get the integer part back.
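A sketch of the reverse-and-unpack approach described above might look like the following. The name commify_unpack is hypothetical, and the regex that peels off the sign also accepts "+", which the loop version mishandles.

```perl
use strict;
use warnings;

# Peel off an optional sign, reverse the integer digits, cut them into
# groups of three with unpack, join with commas, and reverse back.
sub commify_unpack {
    my ($num) = @_;
    my ($sign, $int, $rest) = $num =~ /^([-+]?)(\d+)(.*)$/s
        or return $num;    # not a number: return it untouched

    # Scalar assignment puts the outer reverse in scalar context,
    # so it reverses the joined string rather than a list.
    my $grouped = reverse join ',', unpack '(A3)*', scalar reverse $int;
    return $sign . $grouped . $rest;
}

print commify_unpack($_), "\n" for qw(999 1000 -1234567 +12345.678);
```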
 
Old 04-14-2008, 10:22 PM   #8
taylor_venable
Member
 
Registered: Jun 2005
Location: Indiana, USA
Distribution: OpenBSD, Ubuntu
Posts: 892

Rep: Reputation: 41
Quote:
Originally Posted by osor
Well, now that you started it, your solution will not work if the number has a plus or minus sign. Also, the loop does not seem very “perlish”. For the integer part of the number (without any sign), you can reverse the digits, “split” them into groups of three digits using unpack, join the resulting list with commas, and reverse again to get the integer part back.
Ooh, quite right about that negative sign; that's a problem. Perlosity aside, my goal was to make something that was as fast as possible. Reversing a string is costly, and you'd have to do it twice, so I was trying to avoid it. Unpack probably takes the cake for raw speed, since it works at the byte level (though I'm not totally sure; I've not used it much, so demo code would be awesome), but here's my improved version, which should at least be better than my first one.
Code:
sub commify($) {
    my $number = shift;
    my $d = index $number, "."; $d = length $number if ($d == -1);
    for (my $x = $d - 3; $x > 0; $x -= 3) {
        if ((substr $number, $x - 1, 1) =~ m/[0-9]/) {
            substr $number, $x, 0, ",";
        }
    }
    return $number;
}
The most expensive operations are index(), which is O(n), and the loop, which runs about n/3 times; how fast the loop really is hinges on how fast substr() is when doing an insertion. You could also speed things along by replacing the regex match against [0-9] with a simple comparison of the substr() result against "-":
Code:
if ((substr $number, $x - 1, 1) ne "-") {
That comparison is safe only if you're passing a number as the argument: although you can write a "+" in Perl code, it gets discarded in the conversion to a string, so the first character of the string is always either a digit or the minus sign. The unpack() idea interests me; I'd be happy if you could show it.
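Put together, the improved sub with the suggested ne "-" comparison might read as follows (a sketch; the driver values are ours). Because the loop walks right to left from the decimal point, each insertion only shifts characters that have already been handled.

```perl
use strict;
use warnings;

# index()/substr() commify with the ne "-" test instead of m/[0-9]/.
# Assumes a Perl number (or clean numeric string): no "+", no trailing
# newline, at most one decimal point.
sub commify {
    my $number = shift;
    my $d = index $number, ".";
    $d = length $number if $d == -1;    # no decimal point: use the end

    # Step left three digits at a time from the decimal point; skip the
    # insertion when the character in front of it is the minus sign.
    for (my $x = $d - 3; $x > 0; $x -= 3) {
        substr($number, $x, 0, ",") if substr($number, $x - 1, 1) ne "-";
    }
    return $number;
}

print commify($_), "\n" for (999, 1000, -123456, -1234567.25);
```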

Last edited by taylor_venable; 04-14-2008 at 10:24 PM.
 
Old 04-14-2008, 10:55 PM   #9
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 70
Quote:
Originally Posted by taylor_venable
The unpack() thing interests me, I'd be happy if you could show it.
Here is an unpack version without any reversing:
Code:
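sub commify($) {
	my $num = shift;

	# Split into: optional minus sign, integer digits, everything else.
	my @part = ($num =~ m/^(-?)(\d+)(.*)/);
	return $num unless @part;	# not a number: leave it untouched

	# Width of the leading group (1 or 2 digits); 0 when the digit
	# count is a multiple of 3 or there are no more than 3 digits.
	my $len = length($part[1]);
	my $d = $len > 3 ? $len % 3 : 0;

	my $pre = $d ? "A$d" : "";

	# e.g. "A1(A3)*" cuts "1234567" into ("1", "234", "567")
	$part[1] = join ",", unpack "$pre(A3)*", $part[1];

	local $" = "";	# interpolate @part with no separator
	return "@part";
}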
It first extracts the integer part (the second element of @part), and then does its work on that alone.

Last edited by osor; 04-14-2008 at 11:18 PM. Reason: revised
 
Old 04-15-2008, 01:03 PM   #10
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Hi.

Jeffrey Friedl showed a version of commify that he said was a "full third faster" than the usual version. I added a few things and made one correction:
Code:
use Carp qw(carp);    # commify() reports misuse with carp()
sub commify {
  unless ( @_ == 1 ) {
    carp('Sub usage: $withcomma = commify($somenumber);');
    return undef;
  }

  my ($t1) = $_[0];

  # From Mastering Regular Expressions, page 292.
  # Corrected by adding "x": ">gx;".

  $t1 =~ s<
    (\d{1,3})       # before a comma, 1 to 3 digits
    (?=             # followed by, but not part of what's matched
        (?:\d\d\d)+ # some number of triplets
        (?!\d)      # not followed by another digit
    )               # (in other words, which ends the number)
  ><$1,>gx;

  return $t1;
}
Running these with a simple harness that prints each line before and after calling commify produces:
Code:
% ./taylor_commify data1
999
9,99

1000
10,00

12345
123,45
and
Code:
% ./osor_commify data1
999
999

1000
1,000

12345
12,345
and finally:
Code:
% ./friedl_commify data1
999
999

1000
1,000

12345
12,345
Apologies if I copied anything incorrectly. I have not tested the speed for comparison ... cheers, makyo
 
Old 04-15-2008, 11:42 PM   #11
taylor_venable
Member
 
Registered: Jun 2005
Location: Indiana, USA
Distribution: OpenBSD, Ubuntu
Posts: 892

Rep: Reputation: 41
Well, makyo, I'm not sure how you're running my last-posted version, but it seems to be coming out wrong! When I run it, the right answer is given, so I'm not sure what the problem is there... Still, thanks for finding that other implementation.

I went ahead and benchmarked these four versions (mine, osor's, kshkid's, and now Jeffrey Friedl's) and the results were ... interesting:
Code:
         Rate   osor friedl kshkid taylor
osor   3420/s     --   -19%   -35%   -55%
friedl 4202/s    23%     --   -20%   -45%
kshkid 5274/s    54%    26%     --   -30%
taylor 7576/s   122%    80%    44%     --
Not entirely expected, but given that my version is the closest to what you would write in a lower-level language like C, there seems to be little reason to reject it. Please feel free to analyze or criticize: the code is available at http://real.metasyntax.net:2357/code/arch/commify.pl. It runs each method over my chosen example data 25000 (or rather: 25,000) times. To check correctness, you can uncomment the other lines to get the actual output printed to your screen. Note that if you remove the overhead of the second function call to do_commify() (which is there so you can selectively show the result), there are some performance gains across the board, but nothing that changes the ranking.
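The table above has the layout printed by cmpthese() from Perl's core Benchmark module. A comparison of that kind can be set up roughly as below; this is not taylor_venable's actual script (that is at the URL above), and the sample data are made up. A negative count asks Benchmark to run each sub for at least that many CPU seconds.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# kshkid's reverse-and-regex version, as posted in the thread
sub commify_kshkid {
    my $text = reverse $_[0];
    $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
    return scalar reverse $text;
}

# The anchored "FAQ" loop quoted later in the thread
sub commify_faq {
    my $n = shift;
    1 while $n =~ s/^(-?\d+)(\d\d\d)/$1,$2/;
    return $n;
}

# Hypothetical sample data; substitute your own
my @data = (999, 1000, 12345, -1234567, 1234567.5);

# -1: run each sub for at least 1 CPU second, then print a Rate table
cmpthese(-1, {
    kshkid => sub { commify_kshkid($_) for @data },
    faq    => sub { commify_faq($_)    for @data },
});
```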

Perl version details are found at http://real.metasyntax.net:2357/perl-version.txt

EDIT:
Oh, also I just noticed that the version of Friedl's method you supplied fails for numbers with fractional parts of more than three digits: -1234567.1234 => -1,234,567.1,234

Last edited by taylor_venable; 04-16-2008 at 12:12 AM. Reason: Found a bug in another implementation; changed URL on my site to be more permanent.
 
Old 04-16-2008, 07:27 AM   #12
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Hi, taylor_venable.

Thanks for doing the speed comparison tests.

I recopied your code and tested it again. I got the same results, so I tried feeding your function a chomped line and it worked as desired, so the newline causes the result I posted.
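A harness along the lines makyo describes can be sketched as below; the contents of data1 are inferred from the output above and read from an in-memory file, so this is an assumption, not makyo's script. The chomp call is what matters: without it, index() finds no ".", length() counts the trailing "\n", and every comma lands one position too far right, giving 10,00 for 1000.

```perl
use strict;
use warnings;

# taylor_venable's second version, unchanged
sub commify {
    my $number = shift;
    my $d = index $number, ".";
    $d = length $number if $d == -1;
    for (my $x = $d - 3; $x > 0; $x -= 3) {
        substr($number, $x, 0, ",") if substr($number, $x - 1, 1) =~ m/[0-9]/;
    }
    return $number;
}

my $data = "999\n1000\n12345\n";    # stands in for makyo's data1 file
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;    # drop "\n" before calling commify
    print "$line\n", commify($line), "\n\n";
}
close $fh;
```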

Friedl mentions the "FAQ" method:
Code:
1 while s/^(-?\d+)(\d{3})/$1,$2/;
It is very concise, but it can be improved (by 3%) simply with:
Code:
1 while s/^(-?\d+)(\d\d\d)/$1,$2/;
and, finally, removing the anchor to allow matching within entire strings:
Code:
1 while s/(\d+)(\d\d\d)/$1,$2/;
It was this code against which he tested for his "third faster". However, it will not handle strings with decimal points correctly.
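The difference between the anchored and unanchored loops is easy to see side by side. The sub names and sample strings below are ours; both loops are the ones quoted above.

```perl
use strict;
use warnings;

# Anchored: only the leading run of digits (with optional "-") is touched
sub commify_anchored {
    my $n = shift;
    1 while $n =~ s/^(-?\d+)(\d\d\d)/$1,$2/;
    return $n;
}

# Unanchored: any run of 4+ digits anywhere, including after a "."
sub commify_unanchored {
    my $n = shift;
    1 while $n =~ s/(\d+)(\d\d\d)/$1,$2/;
    return $n;
}

for my $s ("-1234567", "1234.5678", "Total: 1234567 items") {
    printf "%-22s %-22s %s\n", $s, commify_anchored($s), commify_unanchored($s);
}
```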

He added a bit of code later in the discussion (for his faster version) to handle the problem of decimal points:
Code:
\G((?:^-)?\d{1,3})       # before a comma, 1 to 3 digits
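Slotting that line into the substitution posted earlier gives something like the sketch below; the assembled sub is our reconstruction, not necessarily Friedl's exact code. Because \G requires each match to begin where the previous one ended, the /g substitution stops at the decimal point and never reaches the fraction, which avoids the -1,234,567.1,234 result reported above.

```perl
use strict;
use warnings;

sub commify {
    my ($t1) = @_;
    $t1 =~ s<
        \G((?:^-)?\d{1,3})  # 1 to 3 digits, with an optional leading minus
        (?=                 # followed by, but not part of what's matched
            (?:\d\d\d)+     # some number of triplets
            (?!\d)          # not followed by another digit
        )
    ><$1,>gx;
    return $t1;
}

print commify($_), "\n" for ("-1234567.1234", 1000000, 999);
```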
The underlying theme of all of Friedl's code, of course, is regular expressions. I have the 1997 edition of MRE, and there is now a third edition available.

The code posted by kshkid corresponds to Recipe 2.17 from the Perl Cookbook (1999).

For one-off solutions, I like the final FAQ version, but if I had a lot of integers, I'd use the other solutions. If I had more general numbers, I'd need to use something other than the basic regular expression codes.

I have enjoyed this excursion into the problem. Thanks to everyone for their posts.

So, readers of this thread can make their choices based on speed, brevity, understandability, etc.

Best wishes ... cheers, makyo
 
Old 04-16-2008, 07:40 AM   #13
taylor_venable
Member
 
Registered: Jun 2005
Location: Indiana, USA
Distribution: OpenBSD, Ubuntu
Posts: 892

Rep: Reputation: 41
Quote:
Originally Posted by makyo
I recopied your code and tested it again. I got the same results, so I tried feeding your function a chomped line and it worked as desired, so the newline causes the result I posted.
Ah, OK. That makes sense; I wrote the code for my method assuming the input was a number type, not a string. Partly because I thought that's what was wanted, but mostly because it frees me from a lot of input validation problems. (For one, it simply allows me to assume the only sign character that can be present is a "-" since the "+" you might write in your source code gets dropped in the conversion from number to string.)

I would expect Friedl's method to be all about the regexen, as I would many Perl approaches to this particular problem, since it is fundamentally text manipulation. My point was simply not to overdo it, as there are many problems where simpler methods are much faster and less complicated. Still, excellent work, all.
 
  


All times are GMT -5.