LinuxQuestions.org
Old 05-20-2017, 11:11 AM   #1
skirtum
LQ Newbie
 
Registered: May 2017
Location: Lithuania
Distribution: Linux mint 18.1 Cinnamon 64-bit
Posts: 10

Rep: Reputation: Disabled
LibreOffice Calc performance vs MS Excel with a large number of data lines


I'm working on a research project with a fairly large amount of data. I'm not a programmer, and when it comes to preparing data for analysis my knowledge is limited to Excel and LibreOffice Calc.

My problem is with LibreOffice Calc, where my data has ~500 000 lines. MS Excel crunches it without breaking a sweat (inserting a new column, doing some VLOOKUPs). LibreOffice Calc, however, stalls when I ask it to insert a new column or even to sort the data, and each stall lasts from 10 to 60 seconds or more. I went to Tools > Options > Memory and increased some of the numbers there, but it had no noticeable effect.
The Linux machine I work on is not short of resources: 8GB RAM, Intel Core i7 CPU, SSD drive.
My Windows machine, where Excel has no problems with the same amount of data, has an even weaker CPU (i5).

I have also noticed that during these stalls only one CPU core reaches 100 percent on my Linux machine; all the others show no load.

I guess something could be tweaked. Can anyone suggest anything?

P.S. My project results look like this:
https://flic.kr/p/USpV3C
https://flic.kr/p/TB8BZY
These are Lithuanian Parliament votes rendered with Gephi.
 
Old 05-23-2017, 07:19 AM   #2
remma12
Member
 
Registered: May 2017
Distribution: Arch
Posts: 65

Rep: Reputation: 28
Quote:
Originally Posted by skirtum View Post
[...]
Might be worth asking over at https://ask.libreoffice.org/en/questions/
 
Old 05-23-2017, 09:25 AM   #3
skirtum
LQ Newbie
 
Registered: May 2017
Location: Lithuania
Distribution: Linux mint 18.1 Cinnamon 64-bit
Posts: 10

Original Poster
Rep: Reputation: Disabled
Thanks. I'll do that.
 
Old 05-23-2017, 11:51 AM   #4
skirtum
LQ Newbie
 
Registered: May 2017
Location: Lithuania
Distribution: Linux mint 18.1 Cinnamon 64-bit
Posts: 10

Original Poster
Rep: Reputation: Disabled
It seems there are quite a lot of similar bug reports about poor Calc performance with large data sets. So this does not look like a configuration issue, or at least not one that I could solve... I'll wait for the next LibreOffice version.

Thanks for the link again anyways!
 
Old 05-23-2017, 03:35 PM   #5
jefro
Moderator
 
Registered: Mar 2008
Posts: 20,988

Rep: Reputation: 3405
I tried to use Calc instead of upgrading MS Excel, but its speed on some large calculations made buying the MS product cost-effective. Excel rates well for many tasks. For a strictly database-type job there are other programs that may be faster, but you'll have to learn them. IBM DB2 has a host of accelerator products for massive databases.
 
Old 05-23-2017, 03:42 PM   #6
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-30
Posts: 5,290

Rep: Reputation: 916
heres my quick-and-dirty research:
http://www.linuxquestions.org/questi...7/#post5216232
 
Old 05-24-2017, 12:51 AM   #7
skirtum
LQ Newbie
 
Registered: May 2017
Location: Lithuania
Distribution: Linux mint 18.1 Cinnamon 64-bit
Posts: 10

Original Poster
Rep: Reputation: Disabled
Yeah... Common problem. But it seems the best solution is to use the right tool, and neither Calc nor Excel is the right tool for a large amount of data. I have to learn to use another tool. Are there any open source tools for that? I guess IBM DB2 is not open source. Or is it?
 
Old 05-24-2017, 12:55 AM   #8
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,676
Blog Entries: 3

Rep: Reputation: 2908
Depending on what you are doing, R might be an option. It is very widely used.
 
Old 05-24-2017, 08:37 AM   #9
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-30
Posts: 5,290

Rep: Reputation: 916
Quote:
Originally Posted by skirtum View Post
Yeah... Common problem. But it seems the best solution is to use the right tool, and neither Calc nor Excel is the right tool for a large amount of data. I have to learn to use another tool. Are there any open source tools for that? I guess IBM DB2 is not open source. Or is it?
ibm-db2 isn't (it runs on s/390 mainframes -- the version of db2 for unix would be ibm-udb, which is also proprietary) but mysql is.
 
Old 05-24-2017, 10:54 PM   #10
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Posts: 945

Rep: Reputation: 216
Just a thought.

How does Gnumeric stack up against MS Excel? As a quick check, would anyone recommend that the OP try it before moving away from spreadsheets?

OK

Last edited by AnanthaP; 05-24-2017 at 10:57 PM.
 
Old 05-25-2017, 03:07 PM   #11
jefro
Moderator
 
Registered: Mar 2008
Posts: 20,988

Rep: Reputation: 3405
It boils down to the type of data and the way you want to access it and crunch it really.
You'd have to decide what tool is best.

Official differences: https://wiki.openoffice.org/wiki/Doc...Calc_and_Excel One can generally export/import data from either program, or link the data to a standard database, to speed things up or to handle more in terms of size, scope, or joins. I think you can use different backends with either Calc or Excel.
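As a rough illustration of the "export and link to a database" idea, here is a minimal Python sketch. SQLite is used only as a stand-in because it ships with Python; it is not one of the databases named in this thread, and the table and column names (`votes`, `vote_id`, `member`, `vote`) and the sample rows are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical miniature export; real files and column names will differ.
votes_csv = io.StringIO(
    "vote_id,member,vote\n"
    "1,A,for\n"
    "1,B,against\n"
    "2,A,for\n"
    "2,B,for\n"
)

con = sqlite3.connect(":memory:")  # pass a file path instead to keep the data on disk
con.execute("CREATE TABLE votes (vote_id INTEGER, member TEXT, vote TEXT)")

reader = csv.reader(votes_csv)
next(reader)  # skip the header row
con.executemany("INSERT INTO votes VALUES (?, ?, ?)", reader)
con.execute("CREATE INDEX idx_votes_member ON votes (member)")
con.commit()

# Count how each member voted; the database does the grouping and sorting.
rows = con.execute(
    "SELECT member, vote, COUNT(*) FROM votes "
    "GROUP BY member, vote ORDER BY member, vote"
).fetchall()
print(rows)
```

The same pattern should carry over to a server database such as MySQL with a different connection library; only the loading step and the SQL dialect details would change.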

From OODB this is about right. https://en.wikipedia.org/wiki/Compar...gement_systems

Some databases here. http://www.yolinux.com/TUTORIALS/LinuxDatabases.html

Ideas. http://wiki.linuxquestions.org/wiki/...ndows_software

Then you get into similar things like FileMaker Pro, and ways to use something similar on Linux.

If one had a huge amount of data to crunch in some way and they needed it fast, IBM DB2 and maybe Oracle would be my first two choices. My huge and your huge may be two different things.

Last edited by jefro; 05-25-2017 at 03:09 PM.
 
Old 05-27-2017, 03:53 PM   #12
skirtum
LQ Newbie
 
Registered: May 2017
Location: Lithuania
Distribution: Linux mint 18.1 Cinnamon 64-bit
Posts: 10

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by jefro View Post
It boils down to the type of data and the way you want to access it and crunch it really.
You'd have to decide what tool is best.

[...]

If one had a huge amount of data to crunch in some way and they needed it fast, IBM DB2 and maybe Oracle would be my first two choices. My huge and your huge may be two different things.
Thanks for the links. I'll try to understand what these are.

My large amount of data is ~5.5 million rows. It's not a secret (all recorded votes from the Lithuanian Parliament (from http://lrs.lt)): http://atviriduomenys.lt/data/lrs/balsavimai/

I'm trying to analyze it using Gephi. I render these results into coalitions: https://flic.kr/p/TB8BZY . It seems like fun to me. It also helps me understand Lithuanian politics better, and much faster than I would otherwise.

The functions I use in Excel or Calc are: CONCATENATE, VLOOKUP, and filter.
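For comparison, the three spreadsheet operations named here can be sketched outside a spreadsheet in a few lines of Python using only the standard library. This is a minimal sketch under stated assumptions, not the poster's actual workflow: the two tiny inputs, the column names, and the member names are all hypothetical.

```python
import csv
import io

# Hypothetical miniature inputs; the real files and column names will differ.
votes_csv = io.StringIO(
    "vote_id,member_id,vote\n"
    "1,10,for\n"
    "1,11,against\n"
    "2,10,abstain\n"
    "2,12,for\n"
)
members_csv = io.StringIO(
    "member_id,first_name,last_name\n"
    "10,Ona,Onaite\n"
    "11,Jonas,Jonaitis\n"
)

# VLOOKUP equivalent: build the lookup table once as a dict, so each
# lookup is a hash access instead of a scan through the whole table.
members = {}
for row in csv.DictReader(members_csv):
    # CONCATENATE equivalent: join two columns into one label.
    members[row["member_id"]] = row["first_name"] + " " + row["last_name"]

result = []
for row in csv.DictReader(votes_csv):  # rows are read one at a time
    name = members.get(row["member_id"])  # None when there is no match
    if name is None:
        continue
    if row["vote"] != "for":  # filter equivalent: keep only "for" votes
        continue
    result.append((row["vote_id"], name, row["vote"]))

print(result)
```

Because the vote rows are streamed rather than held in a grid, the memory needed depends mostly on the size of the lookup table, not on the number of vote rows.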

Last edited by skirtum; 05-27-2017 at 03:59 PM.
 
Old 05-27-2017, 08:07 PM   #13
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,513

Rep: Reputation: 1010
Nice graphic. It seems as though Flickr thinks it's a firework.
 
Old 05-27-2017, 09:07 PM   #14
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792

Rep: Reputation: 331
I'm curious about how you are parsing those files. The header in "klausimai.csv" shows six fields but the number of commas in a record varies. For example the following line has 40 commas.
Code:
2017-05-02,11:56:24,rytinis,; pateikimas,"Darbo kodekso patvirtinimo, įsigaliojimo ir įgyvendinimo įstatymo Nr. XII-2603 1 straipsniu patvirtinto Lietuvos Respublikos darbo kodekso 21, 23, 31, 32, 40, 43, 48, 52, 53, 57, 63, 65, 71, 79, 112, 114, 115, 117, 120, 127, 144, 147, 169, 171, 179, 181, 185, 195, 197, 204, 209, 221, 237, 240, 241 ir 242 straipsnių pakeitimo įstatymo projektas (Nr. XIIIP-587)",http://www.lrs.lt/sip/portal.show?p_r=15275&p_k=1&p_a=sale_bals&p_bals_id=-26038
Kind of hard to import as a csv file.

EDIT: Never mind. I was trying to import some of the data into MySQL and it was ignoring the quotes around the "klausimai.pavadinimas" field. Figured it out by importing into LibreOffice Calc.
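To illustrate why the quotes matter, here is a small Python sketch that compares a naive split on commas with a quote-aware CSV parser. The record is a shortened, made-up line in the same shape as the one quoted above (six fields, the fifth one quoted), and the URL is a placeholder.

```python
import csv
import io

# Shortened, made-up record in the same shape as the line quoted above:
# six fields, the fifth quoted and containing commas of its own.
line = (
    '2017-05-02,11:56:24,rytinis,; pateikimas,'
    '"Darbo kodekso 21, 23, 31 ir 242 straipsnių pakeitimo įstatymo projektas",'
    'http://example.org/vote\n'
)

naive = line.rstrip("\n").split(",")          # also splits inside the quotes
parsed = next(csv.reader(io.StringIO(line)))  # honours the double quotes

print(len(naive))   # more pieces than there are fields
print(len(parsed))  # six fields
print(parsed[4])    # the quoted title, commas intact
```

Any importer that treats the double quote as the field enclosure should see six fields per record, which matches the header.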

Please elaborate on how you manipulate the data.

Last edited by norobro; 05-28-2017 at 11:34 AM.
 
Old 05-28-2017, 01:13 PM   #15
skirtum
LQ Newbie
 
Registered: May 2017
Location: Lithuania
Distribution: Linux mint 18.1 Cinnamon 64-bit
Posts: 10

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by norobro View Post
[...]

Please elaborate on how you manipulate the data.
Here is one of the "Gephi-ready" files: https://1drv.ms/f/s!Alz2FMnmDud9huNSn2YTASmvCoEGpw

Here's Gephi file itself: https://1drv.ms/f/s!Alz2FMnmDud9huNTSppCmXljMBve2Q

Every politician belongs to a faction, and the factions form coalitions. Some of them change factions during the term, so I needed to put all of a politician's factions next to the name. I then colored the factions so I could see how they render into coalitions.
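One possible way to handle members who change faction mid-term is to look up the faction that was valid on the date of each vote. The sketch below is only an illustration of that idea: the member names, faction names, and dates are invented, and the `history` structure and `faction_on` helper are hypothetical, not taken from the published data.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical membership history: for each member, a list of
# (start_date, faction) pairs sorted by start date.
history = {
    "Member A": [(date(2012, 11, 16), "Faction X"), (date(2015, 3, 1), "Faction Y")],
    "Member B": [(date(2012, 11, 16), "Faction Z")],
}

def faction_on(member, day):
    """Return the faction the member belonged to on the given day, or None."""
    periods = history.get(member, [])
    starts = [start for start, _ in periods]
    i = bisect_right(starts, day)  # number of periods that began on or before day
    return periods[i - 1][1] if i else None

print(faction_on("Member A", date(2014, 6, 1)))
print(faction_on("Member A", date(2016, 6, 1)))
print(faction_on("Member B", date(2010, 1, 1)))
```

With a lookup like this, each vote row can be labelled with a single faction, which may be easier to color in Gephi than a combined list of factions per name.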
 
  

