Old 04-28-2009, 03:34 PM   #1
shahgols
Member
 
Registered: Dec 2006
Posts: 97

Rep: Reputation: 15
What programming language(s) to use?


Hi all,

I want to write a script that will read a text file line by line, dissect each line, and load each dissected value into a column of a table. I was going to use Perl for that, but I read that, since Perl is an interpreted language, it is slower than other languages. My question is, what language(s) would you use to write this script? Performance is very important, since I want to load over 170 million records. The database is SQL Server 2005. Thanks in advance.
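
For what it's worth, this is roughly what I had in mind - just a minimal sketch; the comma-delimited format, DSN, table and column names are placeholders, and it assumes DBI with the ODBC driver:

Code:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder DSN, credentials, table and column names -- adjust to the real schema.
my $dbh = DBI->connect('dbi:ODBC:MySqlServerDsn', 'user', 'password',
                       { RaiseError => 1, AutoCommit => 0 });
my $sth = $dbh->prepare('INSERT INTO my_table (col1, col2, col3) VALUES (?, ?, ?)');

open my $fh, '<', 'input.txt' or die "Cannot open input.txt: $!";
my $count = 0;
while (my $line = <$fh>) {
    chomp $line;
    my @fields = split /,/, $line;          # dissect the line into its values
    $sth->execute(@fields[0 .. 2]);         # load them into the table's columns
    $dbh->commit unless ++$count % 10_000;  # commit in batches rather than per row
}
close $fh;
$dbh->commit;
$dbh->disconnect;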
 
Old 04-28-2009, 03:45 PM   #2
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
What languages do you know?

Most of the time will probably go into actually reading the input from the file and writing the output. That time is pretty much independent of the language you use for the processing in between. So you don't necessarily need to choose the language with the fastest processing (which is asm, C or C++, depending on your coding abilities). Maybe even Perl will be fast enough that the processing time won't be noticed.

I would write it in C++, because that is the language I know best, and I know it well enough to avoid (where appropriate) any loss of performance relative to C. This problem obviously doesn't justify the effort needed for asm.

But the language that is right for you should be one you know already.

BTW, how does the SQL Server fit in? You said the input is a text file. Does the output go directly into requests to the SQL server, or into some file format that the SQL server can import? Either way, the SQL server's work to receive the data may dwarf the processing your program does to reformat it. So you may be worrying about performance in the wrong place.

Last edited by johnsfine; 04-28-2009 at 03:47 PM.
 
Old 04-28-2009, 03:52 PM   #3
shahgols
Member
 
Registered: Dec 2006
Posts: 97

Original Poster
Rep: Reputation: 15
I knew C and C++ some 15 years ago. No worries, I can pick it up again and keep going at it until I can make it work. I know a little bit of Perl.

The script is supposed to open a connection to SQL Server, read a text file on the filesystem, parse each line of the file, and then, using the open database connection, load it into SQL Server. At least that's the plan.

SQL Server actually loads pretty fast; I remember a load of 4 million rows taking a minute or two. All I want is to make sure that whatever language I use will be able to keep up. Do you think Perl would be fine for this project?
 
Old 04-28-2009, 04:14 PM   #4
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1197
You might want to start a new thread (or rename this one) to get some SQL experts to look at it. (Rename the thread by editing the first post and using the "Go Advanced" button.)

I know barely more than zero about SQL, but I think it could add 170 million records a LOT faster if you put them in an appropriately formatted file for some kind of bulk import than if you give it the records one at a time. That difference probably matters a lot more than Perl vs. C++ for the reformatting.

Is the SQL server running on the same system as the reformatting program (so Perl would be taking CPU cycles from it)? Or is it accessed across a LAN?

My wild guess is that a Perl program might take several minutes more CPU time than C++ for this task. If those CPU cycles are taken from a local SQL server working on the same job, then the whole thing could run several minutes slower because you chose Perl. But if the CPU isn't shared, those extra minutes would likely be entirely overlapped by the many minutes an SQL server would take to accept 170 million records one at a time.

Last edited by johnsfine; 04-28-2009 at 04:19 PM.
 
Old 04-29-2009, 08:27 AM   #5
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,348

Rep: Reputation: 2749
Actually, Perl isn't interpreted (like, e.g., bash); it's compiled on the fly (sort of). The net effect is that it's roughly 80-90% as fast as C, but easier to program in.
See http://www.perl.com/doc/FMTEYEWTK/comp-vs-interp.html for the gory details.
I also agree that using e.g. Perl to reformat the data and then using SQL Server's bulk load tool (whatever it's called) to load the new file would be faster overall.
In this case you can also do the reformatting on a different machine if you are paranoid about the performance load.
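
A minimal sketch of the reformat step - the fixed-width layout, field widths, file names and pipe delimiter are only placeholders; substitute your own dissection logic:

Code:
#!/usr/bin/perl
use strict;
use warnings;

# Reformat the raw input into a pipe-delimited file for the bulk loader.
open my $in,  '<', 'input.txt'     or die "Cannot open input.txt: $!";
open my $out, '>', 'formatted.txt' or die "Cannot open formatted.txt: $!";

while (my $line = <$in>) {
    chomp $line;
    # Example only: three fixed-width fields of 10, 30 and 12 characters.
    my ($id, $name, $amount) = unpack 'A10 A30 A12', $line;
    print {$out} join('|', $id, $name, $amount), "\n";
}
close $in;
close $out;

The reformatted file could then be handed to the bulk loader (presumably bcp or a BULK INSERT statement on SQL Server) with a matching field terminator, instead of inserting row by row.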
 
Old 04-29-2009, 08:32 AM   #6
Su-Shee
Member
 
Registered: Sep 2007
Location: Berlin
Distribution: Slackware
Posts: 510

Rep: Reputation: 53
I doubt Perl would be too slow - especially on the text processing side (reading lines, parsing them, doing stuff...).

The Perl regex engine is extremely optimized (yours would have to be better...) and Perl was made to handle text as well as possible.

Why don't you simply write two test scripts, one in Perl (good Perl of course - not some bad hack missing all the things that make Perl fast) and one in C++, and compare them?

Just stuff some ten thousand entries into your database and measure what's actually taking the time; there are a bunch of Perl modules to help you with that (see the sketch below).

Surely C++ has some libraries doing the same.
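
For the Perl side, a rough timing sketch using the core Benchmark module - the sample file name and the comma split are placeholders; time the parsing separately from the database inserts so you can see which one dominates:

Code:
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Time only the read-and-parse step over a sample file, apart from the inserts.
timethis(10, sub {
    open my $fh, '<', 'sample.txt' or die "Cannot open sample.txt: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my @fields = split /,/, $line;
    }
    close $fh;
});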
 
Old 04-29-2009, 08:42 AM   #7
jlinkels
LQ Guru
 
Registered: Oct 2003
Location: Bonaire, Leeuwarden
Distribution: Debian /Jessie/Stretch/Sid, Linux Mint DE
Posts: 5,195

Rep: Reputation: 1043
I assume this is a task you will perform repeatedly. Otherwise, if you have to process those 170 million records only once, the development time in C/C++ will be an order of magnitude more than the processing time in almost any interpreted language.

When you use an interpreted language, don't forget that the processing time goes to two parts: interpreting your script, and executing the statements. Interpreting is generally slow, but executing the statements is very fast, since that is compiled code again.

If your script is written smartly, the interpretation time might be small compared to the execution time. Think about a foreach() statement in PHP operating on a 1000-member array.

As proposed in another post, it might be worth doing some quick measurements using a scripting language; it could save you a lot of development time.

jlinkels
 
Old 04-29-2009, 09:13 AM   #8
JoeBleaux
LQ Newbie
 
Registered: Mar 2009
Distribution: Slamd64
Posts: 26

Rep: Reputation: 15
Perl
 
Old 04-29-2009, 09:56 AM   #9
theYinYeti
Senior Member
 
Registered: Jul 2004
Location: France
Distribution: Arch Linux
Posts: 1,897

Rep: Reputation: 66
I can confirm (I did the test once) that Perl is much faster than bash/awk/sed.

Yves.
 
Old 04-29-2009, 10:03 AM   #10
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by theYinYeti View Post
I can confirm (I did the test once) that perl is much faster than bash/awk/sed.

Yves.
Debatable, at least with awk. You left out Python and Ruby.

Last edited by ghostdog74; 04-29-2009 at 10:05 AM.
 
Old 04-29-2009, 10:13 AM   #11
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
I think you will find that Perl's performance is actually very good, and its optimized text-processing performance would be challenging to match in most other (compiled) languages. There are Perl bindings for most common SQL databases, although I'm not sure about yours. I would think that would factor heavily into your choice.
I believe you would be well served to heed jlinkels' advice.

--- rod.
 
Old 04-29-2009, 12:11 PM   #12
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454Reputation: 454Reputation: 454Reputation: 454Reputation: 454
Perl. And HOP == Higher Order Perl.
 
Old 04-29-2009, 12:55 PM   #13
H_TeXMeX_H
LQ Guru
 
Registered: Oct 2005
Location: $RANDOM
Distribution: slackware64
Posts: 12,928
Blog Entries: 2

Rep: Reputation: 1301
Quote:
Originally Posted by ghostdog74 View Post
debatable, at least with awk. you left out Python and Ruby.
I agree, awk is relatively fast as well. Nowhere near C of course.

For a more quantitative answer see here:
http://shootout.alioth.debian.org/u3...ng2=perl&box=1

Unfortunately awk is not on there.

Actually, the difference between Perl and Python is not that great in terms of performance; check for yourself on the site above.
 
  

