In something that uses a POSIX-compatible printf(), you can specify the ' flag to use your locale’s thousands separator. For example, with the command-line printf utility:
Code:
$ printf "%'d\n" 125345
125,345
I am surprised that this is not part of Perl’s printf (even with “use POSIX qw(printf)”), since this has been in POSIX for at least 10 years.
Yes, that does seem surprising, but the Camel book says that the system sprintf is emulated, not used.
For other solutions, searching for the string commify (or more rarely, commafy) and perl will yield a number of hits to help perform the task of inserting a comma every 3 digits ... cheers, makyo
This analysis is totally unnecessary, but the problem intrigues me, so here it goes.
That's clever, but I would think it would run slowly. The bad part is the lookahead in the global regex. It searches through the string for the decimal point each time, which is unnecessary because once we've found it we shouldn't have to think about it anymore. I propose the following, which may net additional performance improvement because it involves only simple relationships like equivalence to specific characters, without having to rely on an overly complicated regex.
This assumes correct number formatting to work without chopping off data, as "xyz.abc.def" would return "xyz.abc" -- but I would think also that Perl might optimize away the regex for a single character and turn it into a simple rindex() and substr(). If that's not the case, one can easily adapt this to use that method rather than a call to split().
Well, now that you started it, your solution will not work if the number has a plus or minus sign. Also, the loop does not seem very “perlish”. For the integer part of the number (without a possible sign), you can reverse the digits, “split” them into groups of three digits using unpack, join the resulting list with commas, and reverse again to get the integer part back.
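The reverse/unpack/join/reverse idea described above could be sketched roughly as follows. This is not code from the thread: the name commify_unpack is hypothetical, and the sketch assumes the input looks like an optional sign, some digits, and then anything else (such as a fractional part), which is passed through untouched.

```perl
use strict;
use warnings;

# Hypothetical helper; a sketch of the unpack approach, not the poster's code.
sub commify_unpack {
    my ($number) = @_;

    # Split into optional sign, integer digits, and whatever follows.
    my ($sign, $int, $rest) = $number =~ /^([+-]?)(\d+)(.*)$/s
        or return $number;    # leave unexpected input unchanged

    # Reverse the digits, cut them into chunks of up to three characters,
    # join the chunks with commas, and reverse the result back.
    # "scalar" forces string reversal; in list context reverse() would
    # reverse the argument list instead.
    my $grouped = scalar reverse join ',',
        unpack '(A3)*', scalar reverse $int;

    return $sign . $grouped . $rest;
}

print commify_unpack('-1234567.1234'), "\n";
```

The grouped unpack template '(A3)*' needs Perl 5.8 or later. The two string reversals are the cost mentioned in the replies, so whether this beats an index-based loop is something to measure rather than assume.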
Ooh, quite right about that negative sign - that's a problem. Perlosity aside, my goal was to make something that was as fast as possible. Reversing a string is costly, so I was trying to avoid it since you'd have to do it twice. Unpack probably takes the cake for raw speed (I guess I'm not totally sure, I've not used it much; demo code would be awesome), since it works at the byte level, but here's my improved version, which should at least be better than my first one.
Code:
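sub commify($) {
    my $number = shift;
    # Position of the decimal point, or end of string if there is none.
    my $d = index $number, "."; $d = length $number if ($d == -1);
    # Walk left from the decimal point in steps of three.
    for (my $x = $d - 3; $x > 0; $x -= 3) {
        # Insert a comma only if a digit (not a sign) precedes it.
        if ((substr $number, $x - 1, 1) =~ m/[0-9]/) {
            substr $number, $x, 0, ",";
        }
    }
    return $number;
}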
The most expensive operations are index(), which is O(n), and the loop, which is O(n/3). Now, that part really hinges on how fast substr() is when doing an insertion. You could also speed things along by replacing the regex match against [0-9] with a simple comparison of the substr() against "-":
Code:
if ((substr $number, $x - 1, 1) ne "-") {
That shortcut is safe only if you're passing a number as the argument: although you can write a "+" in the Perl code, it gets discarded in the conversion to a string, so the first character of the string is always either a digit or the minus sign. The unpack() thing interests me; I'd be happy if you could show it.
Last edited by taylor_venable; 04-14-2008 at 09:24 PM.
J Friedl showed a version of commify that he said was a "full third faster" than the usual version. I added a few things, and made one correction:
Code:
use Carp;    # carp() below comes from the core Carp module

sub commify {
    unless ( @_ == 1 ) {
        carp('Sub usage: $withcomma = commify($somenumber);');
        return undef;
    }
    my ($t1) = $_[0];
    # From Mastering Regular Expressions, page 292.
    # Corrected by adding "x": ">gx;".
    $t1 =~ s<
        (\d{1,3})        # before a comma, 1 to 3 digits
        (?=              # followed by, but not part of what's matched
          (?:\d\d\d)+    # some number of triplets
          (?!\d)         # not followed by another digit
        )                # (in other words, which ends the number)
    ><$1,>gx;
    return $t1;
}
Running these with a simple harness that prints a line before and after calling commify produces:
Well, makyo, I'm not sure how you're running my last-posted version, but it seems to be coming out wrong! Anyway, when I run it, the right answer is given, so I'm not sure what the problem is there... Still, thanks for finding that other implementation.
I went ahead and benchmarked these four versions (mine, osor's, kshkid's, and now Jeffrey Friedl's) and the results were ... interesting:
Not entirely expected, but given that my version is the closest to what you would write in a lower-level language like C, it seems there is little really to reject about it. Please, feel free to analyze or criticize: the code is available at http://real.metasyntax.net:2357/code/arch/commify.pl -- it runs each method over my chosen example data 25000 (or rather: 25,000) times. To ensure accuracy, you can uncomment the other lines to get the real output printed to your screen. Note that if you remove the additional overhead of the second function call to do_commify() (which is there so you can selectively show the result) there are some performance gains across the board, but nothing that actually changes the results.
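For readers who cannot reach that script, a comparison along the same lines could be set up with the core Benchmark module. The sketch below is not the linked code: the sample data, the sub names faq_loop and lookahead, and the choice of only two methods are assumptions made for illustration, and it says nothing about which method wins on any given machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumed sample data; integers only, so both methods agree on the output.
my @samples = (0, 999, 1234, -1234567, 9876543210);

# The anchored "FAQ" loop.
sub faq_loop {
    my $t = shift;
    1 while $t =~ s/^(-?\d+)(\d\d\d)/$1,$2/;
    return $t;
}

# The single-pass lookahead regex.
sub lookahead {
    my $t = shift;
    $t =~ s/(\d{1,3})(?=(?:\d\d\d)+(?!\d))/$1,/g;
    return $t;
}

# Run each method 25,000 times over the sample data and print a
# comparison table of rates.
cmpthese(25_000, {
    faq_loop  => sub { faq_loop($_)  for @samples },
    lookahead => sub { lookahead($_) for @samples },
});
```

Checking that every candidate returns the same strings for the sample data before timing them avoids comparing the speed of functions that do different things.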
EDIT:
Oh, also I just noticed that the version of Friedl's method you supplied fails for numbers with fractional parts of more than three digits: -1234567.1234 => -1,234,567.1,234
Last edited by taylor_venable; 04-15-2008 at 11:12 PM.
Reason: Found a bug in another implementation; changed URL on my site to be more permanent.
I recopied your code and tested it again. I got the same results, so I tried feeding your function a chomped line and it worked as desired, so the newline causes the result I posted.
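The newline effect can be reproduced with a short sketch. The sub below is the index-based version posted earlier in the thread (minus the prototype); the sample line and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Index-based version from earlier in the thread.
sub commify {
    my $number = shift;
    my $d = index $number, ".";
    $d = length $number if $d == -1;
    for (my $x = $d - 3; $x > 0; $x -= 3) {
        if (substr($number, $x - 1, 1) =~ m/[0-9]/) {
            substr $number, $x, 0, ",";
        }
    }
    return $number;
}

my $line = "1234567\n";        # e.g. a line read from a file or STDIN

# With no decimal point, grouping starts from the end of the string,
# so the trailing newline is counted as if it were the last digit.
my $raw = commify($line);      # "12,345,67\n"

# Stripping the newline first gives the expected grouping.
chomp(my $clean = $line);
my $ok = commify($clean);      # "1,234,567"

print $raw, $ok, "\n";
```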
Friedl mentions the "FAQ" method:
Code:
1 while s/^(-?\d+)(\d{3})/$1,$2/;
very concise, but can be improved (3%) simply with:
Code:
1 while s/^(-?\d+)(\d\d\d)/$1,$2/;
and, finally, removing the anchor to allow entire strings:
Code:
1 while s/(\d+)(\d\d\d)/$1,$2/;
It was this code against which he tested for his "third faster". However, it will not handle strings with decimal points correctly.
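A small sketch makes the difference visible. The wrapper names faq_anchored and faq_unanchored are hypothetical; the substitutions themselves are the ones quoted above. With the anchor, \d+ has to start at the beginning of the string and cannot cross the decimal point, so the fraction is left alone. Without it, the loop carries on into the fractional digits.

```perl
use strict;
use warnings;

# Hypothetical wrappers around the two one-liners quoted above.
sub faq_anchored {
    my ($t) = @_;
    1 while $t =~ s/^(-?\d+)(\d\d\d)/$1,$2/;
    return $t;
}

sub faq_unanchored {
    my ($t) = @_;
    1 while $t =~ s/(\d+)(\d\d\d)/$1,$2/;
    return $t;
}

for my $n ('1234567', '-1234567.1234') {
    printf "%-16s anchored: %-18s unanchored: %s\n",
        $n, faq_anchored($n), faq_unanchored($n);
}
```

For -1234567.1234 the unanchored loop yields -1,234,567.1,234, the same kind of output reported earlier in the thread for the lookahead version.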
He added a bit of code later in the discussion (for his faster version) to handle the problem of decimal points:
Code:
\G((?:^-)?\d{1,3}) # before a comma, 1 to 3 digits
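One way that line might slot into the earlier lookahead regex is sketched below. The assembly is an assumption, not a quotation from the book, and the name commify_anchored is hypothetical. The \G anchor makes each match start exactly where the previous one ended, so the substitution stops at the first non-digit (such as a decimal point) and never restarts inside the fraction. The number must begin at the start of the string for this to work.

```perl
use strict;
use warnings;

# Hypothetical name; combining the two fragments this way is an assumption.
sub commify_anchored {
    my ($t) = @_;
    $t =~ s<
        \G((?:^-)?\d{1,3})   # optional leading minus, then 1 to 3 digits,
                             # starting exactly where the last match ended
        (?=
          (?:\d\d\d)+        # some number of triplets
          (?!\d)             # not followed by another digit
        )
    ><$1,>gx;
    return $t;
}

print commify_anchored('-1234567.1234'), "\n";
```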
The underlying theme of all of Friedl's code, of course, is regular expressions. I have the 1997 edition of MRE, and there is now a third edition available.
The code posted by kshkid corresponds to Recipe 2.17 from the Perl Cookbook (1999).
For one-off solutions, I like the final FAQ version, but if I had a lot of integers, I'd use the other solutions. If I had more general numbers, I'd need to use something other than the basic regular expression codes.
I have enjoyed this excursion into the problem. Thanks to everyone for their posts.
So, readers of this thread can make their choices based on speed, brevity, understandability, etc.
Ah, OK. That makes sense; I wrote the code for my method assuming the input was a number type, not a string. Partly because I thought that's what was wanted, but mostly because it frees me from a lot of input validation problems. (For one, it simply allows me to assume the only sign character that can be present is a "-" since the "+" you might write in your source code gets dropped in the conversion from number to string.)
I would expect Friedl's method to be all about the regexen, as I would many Perl approaches to this particular problem, as it is fundamentally text manipulation. My point was just simply not to overdo it, as there are many problems where simpler methods are much faster and less complicated. Even still, excellent work, all.