Programming
This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I want to write a script that will read from a text file, line by line, dissect each line, and load each dissected value into a column of a table. I was going to use Perl for that, but I read that since Perl is an interpreted language, it is slower than other languages. My question is: what language(s) would you use to write this script? Performance is of high importance, since I want to load over 170 million records. The database is SQL Server 2005. Thanks in advance.
Most of the time will probably go into actually reading the input from the file and writing the output. That time is pretty much independent of the language you use for the processing in between. So you don't necessarily need to choose the language with the fastest processing (which is asm, C, or C++, depending on your coding abilities). Maybe even Perl will be fast enough that the processing time won't be noticed.
I would write it in C++, because that is the language I know best and I know it well enough to avoid (when appropriate) any loss of performance relative to C, and this problem obviously doesn't justify the effort needed for asm.
But the language that is right for you should be one you know already.
BTW, how does the SQL Server fit in? You said the input is "text file". Is the output directly into requests to the SQL server? Or is the output to some file format that the SQL server can import? Either way, the SQL server work to receive the data may dwarf the processing in your program to reformat the data. So you may be worrying about performance in the wrong place.
I knew C and C++ some 15 years ago. No worries, I can pick it up again and keep going at it until I can make it work. I know a little bit of Perl.
The script is supposed to open a connection to SQL Server, read a text file on the filesystem, parse each line of the file, and then load it into SQL Server using the open database connection. At least that's the plan.
SQL Server actually loads pretty fast. I remember a load of 4 million rows in a minute or two. All I want is to make sure that whatever language I use will be able to keep up. Do you think Perl should be fine for this project?
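For the "dissect each line" part, a minimal Perl sketch might look like the following. The input format, filename, and delimiter are assumptions for illustration; the insert itself would normally go through DBI with a driver such as DBD::ODBC and a prepared statement, which is only hinted at in a comment here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Dissect one line into its column values.
# Assumes comma-delimited input; adjust the split pattern to your real format.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return split /,/, $line, -1;   # -1 keeps trailing empty fields
}

# Read the input file line by line. Each record would then be handed to a
# DBI prepared statement (e.g. via DBD::ODBC) for the actual insert.
my $file = shift @ARGV;            # hypothetical input filename from the command line
if (defined $file) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    while (my $line = <$fh>) {
        my @cols = parse_line($line);
        # $sth->execute(@cols);    # one placeholder per column; $sth prepared once
    }
    close $fh;
}
```

The point of preparing the statement once and executing it per row is to avoid re-parsing the SQL 170 million times.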
You might want to start a new thread (or rename this one) to get some SQL experts to look. (Rename thread by editing first post and using "Go Advanced" button).
I know barely more than zero about SQL, but I think it could add 170 million records a LOT faster if you put them in an appropriately formatted file for some kind of bulk import than if you give it the records one at a time. That difference probably matters a lot more than Perl vs. C++ for the reformatting.
Is the SQL server running on the same system as the reformatting program (so Perl would be taking CPU cycles from it)? Or is it accessed across a LAN?
My wild guess is a Perl program might take several minutes of CPU time more than C++ for this task. If those CPU cycles are taken from a local SQL server working on the same job, then the whole thing could run several minutes slower because you chose Perl. But if the CPU isn't shared, then those extra minutes would likely be entirely overlapped by all the minutes an SQL server would take to accept 170 million records one at a time.
Actually, Perl isn't interpreted (like e.g. bash); it's compiled on the fly (sort of). The net effect is that it's roughly 80-90% as fast as C, but easier to program.
See http://www.perl.com/doc/FMTEYEWTK/comp-vs-interp.html for the gory details.
I also agree that using e.g. Perl to reformat the data and then using SQL Server's bulk load tool (whatever it's called) to load the new file would be faster overall.
In this case you can also do the reformatting on a different machine if you are paranoid about the performance load.
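A sketch of that approach: use Perl only to reformat the records into a flat file that SQL Server's bulk tools can swallow, then load it with bcp (which ships with SQL Server 2005) or a T-SQL BULK INSERT. The filenames, table name, and field layout below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reformat comma-delimited input into a tab-delimited file for bulk loading.
sub reformat {
    my ($in, $out) = @_;
    open my $rfh, '<', $in  or die "Cannot open $in: $!";
    open my $wfh, '>', $out or die "Cannot open $out: $!";
    while (my $line = <$rfh>) {
        chomp $line;
        my @cols = split /,/, $line, -1;       # adjust to your input format
        print {$wfh} join("\t", @cols), "\n";  # tab-delimited suits bcp's -c char mode
    }
    close $rfh;
    close $wfh;
}
reformat(@ARGV) if @ARGV == 2;   # e.g.: perl reformat.pl records.txt out.tsv

# Then, on the SQL Server side (server/table names are hypothetical):
#   bcp MyDb.dbo.MyTable in out.tsv -c -S myserver -T
```

Pushing the rows through bcp in one stream avoids 170 million individual insert round-trips, which is where the real time usually goes.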
I doubt that Perl will be too slow, especially on the text-processing side (reading lines, parsing them, doing stuff...).
The Perl regex engine is extremely optimized (yours would have to be better...), and Perl was made to handle text as well as possible.
Why don't you simply write two test scripts, one in Perl (good Perl of course, not some bad hack missing all the things that make Perl fast) and one in C++, and compare them?
Just stuff some ten thousand entries into your database and measure what's actually taking the time. There are a bunch of Perl modules to help you with that.
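Perl's core Benchmark module makes that kind of comparison easy. For example, to see which of two parsing styles is faster on your data (the sample line and iteration count here are arbitrary stand-ins):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);   # core module, no installation needed

my $line = "alpha,beta,gamma,delta,epsilon";   # stand-in for a real record

# cmpthese runs each sub the given number of times and prints a rate table
# showing how much faster one approach is than the other.
cmpthese(50_000, {
    'split' => sub { my @c = split /,/, $line, -1 },
    'regex' => sub { my @c = ($line =~ /([^,]+)/g) },
});
```

The same technique works for timing a few thousand trial inserts against the database before committing to a language.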
I assume this is a repeating task which you will perform. Otherwise if you have to process those 170 million records only once, the development time in C/C++ will be an order of magnitude more than the processing time in almost any interpreted language.
When you use an interpreted language, don't forget that processing time is used for two parts: interpreting your script, and executing the statements. Interpreting is generally slow, but executing the statements is very fast since that is compiled code again.
If your script is written smartly, the interpretation time might be small compared to the execution time. Think about a foreach() statement in PHP operating on a 1000-member array.
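The same point can be seen in Perl: a loop written out statement by statement goes through the interpreter on every iteration, while a single call like List::Util's sum (a core module implemented in C) does the whole pass in compiled code. Both give the same answer; the second just spends almost no time in the interpreter.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(sum);

my @values = (1 .. 1000);   # a 1000-member array, as in the foreach() example

# Interpreted: every iteration is executed opcode by opcode.
my $slow = 0;
$slow += $_ for @values;

# Compiled: one call, the summing loop itself runs as C code.
my $fast = sum(@values);

die "mismatch" unless $slow == $fast;   # both are 500500
```

So a script that leans on builtins and library calls for the inner loops can stay close to compiled speed even in an "interpreted" language.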
As proposed in another post, it might be worth doing some quick measurements using a scripting language; it could save you a lot of development time.
I think you will find that Perl's performance is actually very good, and its optimized text-processing performance would be challenging to match in most other (compiled) languages. There are Perl bindings for most common SQL databases, although I'm not sure about yours. I would think that would factor heavily into your choice.
I believe you would be well served to heed jlinkels' advice.