
LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   What programming language(s) to use? (https://www.linuxquestions.org/questions/programming-9/what-programming-language-s-to-use-722324/)

shahgols 04-28-2009 03:34 PM

What programming language(s) to use?
 
Hi all,

I want to write a script that will read from a text file, line by line, and then dissect each line and load each dissected value into a column of a table. I was going to use Perl for that, but I read that since perl is an interpreted language, that it is slower than other languages. My question is, what language(s) would you use to write this script? Performance is of high importance, since I want to load over 170 million records. The database is a SQL Server 2005. Thanks in advance.

johnsfine 04-28-2009 03:45 PM

What languages do you know?

Most of the time will probably go into actually reading the input from the file and writing the output. That time is pretty much independent of the language you use for the processing in between. So you don't necessarily need to choose the language with the fastest processing (which is asm, C, or C++, depending on your coding abilities). Maybe even Perl will be fast enough that processing time won't be noticed.

I would write it in C++, because that is the language I know best and I know it well enough to avoid (when appropriate) any loss of performance relative to C, and this problem obviously doesn't justify the effort needed for asm.

But the language that is right for you should be one you know already.

BTW, how does the SQL Server fit in? You said the input is "text file". Is the output directly into requests to the SQL server? Or is the output to some file format that the SQL server can import? Either way, the SQL server work to receive the data may dwarf the processing in your program to reformat the data. So you may be worrying about performance in the wrong place.

shahgols 04-28-2009 03:52 PM

I knew C and C++ some 15 years ago. :) No worries, I can pick it up again and keep going at it until I can make it work. I know a little bit of Perl.

The script is supposed to open a connection to SQL Server, read a text file on the filesystem, and parse each line of the file and then using the opened database connection, load it to SQL Server. At least that's the plan.
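That plan can be sketched in outline. Here is a minimal illustration in Python (the delimiter, column names, and placeholder-style insert are assumptions for illustration; the same shape carries over to Perl's DBI or a C++ ODBC loop, and the actual insert would go through a real driver such as pyodbc):

```python
# Minimal sketch of the line-by-line parse-and-load loop described above.
# The "|" delimiter and the three-column table are assumptions.
import csv

def parse_lines(lines, delimiter="|"):
    """Dissect each line into its column values."""
    return [row for row in csv.reader(lines, delimiter=delimiter)]

def load(rows, cursor):
    # A parameterized insert avoids rebuilding SQL text per row and lets
    # the driver handle quoting; `cursor` would come from a real database
    # connection (e.g. pyodbc), shown here only as a stub.
    for row in rows:
        cursor.execute(
            "INSERT INTO mytable (col1, col2, col3) VALUES (?, ?, ?)", row
        )

if __name__ == "__main__":
    sample = ["a|b|c", "d|e|f"]
    print(parse_lines(sample))  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```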

SQL Server actually loads pretty fast. I remember a load of 4 million rows in a minute or two. All I want is to make sure that whatever language I use will be able to keep up. Do you think Perl should be fine for this project?

johnsfine 04-28-2009 04:14 PM

You might want to start a new thread (or rename this one) to get some SQL experts to look. (Rename thread by editing first post and using "Go Advanced" button).

I know barely more than zero about SQL, but I think it could add 170 million records a LOT faster if you put them in an appropriately formatted file for some kind of bulk import than if you give it the records one at a time. That difference probably matters a lot more than Perl vs. C++ for the reformatting.

Is the SQL server running on the same system as the reformatting program (so Perl would be taking CPU cycles from it)? Or is it accessed across a LAN?

My wild guess is a Perl program might take several minutes more of CPU time than C++ for this task. If those CPU cycles are taken from a local SQL server working on the same job, then the whole thing could run several minutes slower because you chose Perl. But if the CPU isn't shared, then those extra minutes would likely be entirely overlapped by all the minutes an SQL server would take to accept 170 million records one at a time.

chrism01 04-29-2009 08:27 AM

Actually, Perl isn't interpreted (like e.g. bash); it's compiled on the fly (sort of). The net effect is that it's e.g. 80-90% as fast as C, but easier to program.
See http://www.perl.com/doc/FMTEYEWTK/comp-vs-interp.html for the gory details.
I also agree that using e.g. Perl to reformat the data and then using SQL Server's bulk load tool (whatever it's called) to load the new file would be faster overall.
In this case you can also do the reformat on a different machine if you are paranoid about the performance load.

Su-Shee 04-29-2009 08:32 AM

I doubt that Perl won't be fast enough - especially on the text processing side (reading lines, parsing them, doing stuff...).

The Perl regex engine is extremely optimized (yours would have to be better...) and Perl was made to handle text as well as possible.

Why don't you simply write two test scripts, one in Perl (good Perl of course, not some bad hack missing all the things that make Perl fast) and one in C++, and compare them?

Just stuff some ten thousand entries into your database and measure what's actually taking the time. There's a bunch of Perl modules to help you with that.

Surely C++ has some libraries doing the same.
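As a rough illustration of that kind of measurement, here is a sketch using Python's timeit module (the parsing function and the synthetic line format are stand-ins for the real dissection logic; Perl's Benchmark module plays the same role there):

```python
# Time a parsing routine on a small batch of synthetic lines to estimate
# per-record cost before committing to a language.
import timeit

def parse(line):
    return line.rstrip("\n").split("|")

def run_batch(n=10_000):
    lines = ["%d|name%d|%f\n" % (i, i, i * 1.5) for i in range(n)]
    return [parse(ln) for ln in lines]

if __name__ == "__main__":
    # Best of a few repeats in seconds per 10k-line batch; scale up to
    # 170M rows for a ballpark total.
    per_batch = min(timeit.repeat(run_batch, number=1, repeat=3))
    print("seconds per 10k lines:", per_batch)
```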

jlinkels 04-29-2009 08:42 AM

I assume this is a task you will perform repeatedly. Otherwise, if you have to process those 170 million records only once, the development time in C/C++ will be an order of magnitude more than the processing time in almost any interpreted language.

When you use an interpreted language, don't forget that processing time goes into two parts: interpreting your script, and executing the statements. Interpreting is generally slow, but executing the statements is very fast, since that is compiled code again.

If your script is written smartly, the interpretation time might be small compared to the execution time. Think about a foreach() statement in PHP operating on a 1000-member array.
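That point can be made concrete: one statement that operates on a whole collection is interpreted once but executes compiled-library code for every element. A small Python illustration (the array size mirrors the 1000-member example above):

```python
# One interpreted statement, many compiled-code operations: summing a
# 1000-element list dispatches to library code once, instead of paying
# interpreter overhead on 1000 separate additions.
values = list(range(1000))

# Interpreted loop: each iteration pays interpreter overhead.
total_loop = 0
for v in values:
    total_loop += v

# Built-in: the loop runs inside compiled code.
total_builtin = sum(values)

assert total_loop == total_builtin == 499500
print(total_builtin)  # 499500
```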

As proposed in another post, it might be worth doing some quick measurements using a scripting language; it could save you a lot of development time.

jlinkels

JoeBleaux 04-29-2009 09:13 AM

Perl

theYinYeti 04-29-2009 09:56 AM

I can confirm (I did the test once) that perl is much faster than bash/awk/sed.

Yves.

ghostdog74 04-29-2009 10:03 AM

Quote:

Originally Posted by theYinYeti (Post 3524771)
I can confirm (I did the test once) that perl is much faster than bash/awk/sed.

Yves.

debatable, at least with awk. you left out Python and Ruby.

theNbomr 04-29-2009 10:13 AM

I think you will find that Perl's performance is actually very good, and its optimized performance in text processing would be challenging to match in most other (compiled) languages. There are Perl bindings for most common SQL databases, although I'm not sure about yours. I would think that should factor heavily into your choice.
I believe you would be well served to heed jlinkels' advice.

--- rod.

Sergei Steshenko 04-29-2009 12:11 PM

Perl. And HOP == Higher Order Perl.

H_TeXMeX_H 04-29-2009 12:55 PM

Quote:

Originally Posted by ghostdog74 (Post 3524779)
debatable, at least with awk. you left out Python and Ruby.

I agree, awk is relatively fast as well. Nowhere near C of course.

For a more quantitative answer see here:
http://shootout.alioth.debian.org/u3...ng2=perl&box=1

Unfortunately awk is not on there.

Actually, the difference between Perl and Python is not that great in terms of performance; check for yourself on the above site.
