Linux - ServerThis forum is for the discussion of Linux Software used in a server related context.
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
MySQL format(sum(ID), '000000000000000'): 61,050,470,996,395,050
MySQL SUM(CAST(ID AS UNSIGNED)): 61050470996456889
Excel: 61050470996395000
I don't think you've been hitting MySQL precision limit (maximum BIGINT UNSIGNED value is 18446744073709551615).
1 and 2 are equal.
3 is unique. For me, it's the only mystery currently... Are all your IDs>=0, unsigned and without any decimals?
4 is unique. It has already been explained why (Excel limits its precision to 15 digits only).
MySQL format(sum(ID), '000000000000000'): 61,050,470,996,395,050
MySQL SUM(CAST(ID AS UNSIGNED)): 61050470996456889
Excel: 61050470996395000
I don't think you've been hitting MySQL precision limit (maximum BIGINT UNSIGNED value is 18446744073709551615).
1 and 2 are equal.
3 is unique. For me, it's the only mystery currently... Are all your IDs>=0, unsigned and without any decimals?
4 is unique. It has already been explained why (Excel limits its precision to 15 digits only).
yes, all ID's are > 0, exist, unsigned, and no decimals.
Ok, I guess its a limit of hardware. I'm just trying to sum ID to confirm an import matches the source. Thanks for your help.
In that case, given you really do care about this, I'd also write a short prog (Perl for pref) to SELECT all the IDs and sum them.
Be interesting to see what answer you get ...
Ok, I guess its a limit of hardware. I'm just trying to sum ID to confirm an import matches the source. Thanks for your help.
OK, thanks.
If the source is an excel sheet then you will need to figure out some intermediate way to get a reference result without the excel precision limit.
You could generate a checksum with bit mask operations in MySQL, but I have no exposure to excel so do not know if it could generate a reference checksum.
Your question has prompted me to think about the basic math, and one question that came to mind was if we sum the series 1 + 2 + 3... +N, what would be the largest N for which the sum would be less than the unsigned 64 bit integer limit. This works out to be 6074000999. That is,
Code:
The sum of all positive integers 1+2+3...6074000999 == 18446744070963499500
18446744070963499500 < 18446744073709551615 (Diff == 2746052115)
That may not help with your ultimate solution, but it helps to visualize the relative scale of the numbers you are working with and the various limits involved.
The scale of the limit N from above: 6074000999
The scale of IDs from your example: 170318068021
Think of it as an indicator of how fast your sum is moving toward the limit.
We know nothing about the distribution of your data, the sequence obviously has gaps, but the scale of the values you are adding exceed the limit N by two digits or a scale of 100. I don't know if that is helpful to you, but I thought I'd post it here while I had the thought in view.
Good luck!
Last edited by astrogeek; 12-20-2018 at 06:42 PM.
Reason: formatting
Thanks for your help. I'm not too worried because when I do a text file compare of the giant list of ID's from excel and mysql they match. no text differences. So it the numbers are the same the result has to be the same. I was just curious why the calc was always off. Its good enough.
MySQL format(sum(ID), '000000000000000'): 61,050,470,996,395,050
MySQL SUM(CAST(ID AS UNSIGNED)): 61050470996456889
Excel: 61050470996395000
I don't think you've been hitting MySQL precision limit (maximum BIGINT UNSIGNED value is 18446744073709551615).
1 and 2 are equal.
3 is unique. For me, it's the only mystery currently... Are all your IDs>=0, unsigned and without any decimals?
4 is unique. It has already been explained why (Excel limits its precision to 15 digits only).
Commands #1/#2 give a different result than #3 and, frankly, I would rather rely on #3.
It seems that there is an implicit conversion VARCHAR->INT by MySQL during #1/#2 but I don't really know how it is done exactly so you'd better force the conversion yourself...
More, your PHP script confirms #3
Quote:
Originally Posted by kenw232
Thanks for your help. I'm not too worried because when I do a text file compare of the giant list of ID's from excel and mysql they match. no text differences. So it the numbers are the same the result has to be the same. I was just curious why the calc was always off. Its good enough.
Did you get an error after importing your data into MySQL? If number of affected rows matched your number of Excel rows, it should mean that everything went fine (except if some data were already present in Table).
Finally, please note that same ID sum result between Excel and MySQL doesn't necessarily mean that your IDs are all the same (there can be some further balance).
However I think this bias cannot happen if number of rows is the same. Did you compare COUNT(ID) FROM Table; with your number of Excel rows?
just a trick: it is enough to sum up the last 4 or 5 digits, no need to handle bigints or similar.
Also you may try to sum up only 1000 (or 10000) lines in one.
You can split the table and that may help you to avoid the mentioned problem.
Probably sum of 10000 lines is still "small" enough and that way you will have 6 [partial] results to compare.
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.