LinuxQuestions.org, Programming forum: This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
11-23-2011, 10:58 AM
#1
LQ Newbie
Registered: Nov 2011
Location: London, UK
Distribution: ubuntu
Posts: 10
script.pl with sed shell calls: sh error syntax error near unexpected token `('
Hi,
Just my second cry for help here, so be gentle!
I am using sed to do a substitution. The target strings in myfile look like this:
Code:
@stuffdeleted#TGACCA/1
I can do the substitution just fine on the command line (bash shell, Fedora 14):
Code:
cat myfile | sed 's_\(#[TCGA]\{6\}\)/1_\1/TESTSUBSTITUTIONVALUE2_g'
It all works OK. The output is:
Code:
@stuffdeleted#TGACCA/TESTSUBSTITUTIONVALUE2
Because the string that matches the [TCGA]{6} pattern varies, I'm using the enclosing () to capture it into \1, which allows me to paste it back into the RHS of the substitution operation. I can't substitute just on the '/1' because that's not unique in the file.
So far so good. Then I put it into a Perl script, and it breaks. The error message is widely posted online, but I haven't found a context comparable to my code:
Code:
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `sed s_(#[TCGA]{6})/1_1/2_g > head20_2.txt.test.sync'
In the Perl script, I first define the sed command as a string:
Code:
my $sed2cmd = "sed 's_\(#[TCGA]\{6\}\)/1_\1/2_g'";
and then embed it in a second command line, which is run using backticks:
Code:
my $cmdreads2 = "awk \'{print \$1$kNL \$5$kNL \$7 | \"$sed2cmd > $read2\.test\.sync\"}\' $read1\.join\.tmp";
`$cmdreads2`;
($kNL is a constant to tidy up the code, which would otherwise have "\\n\")
$sed2cmd seems to be interpolated (expanded?) correctly into the second command, but sh doesn't like it. Neither do I, having spent 'quite a bit' of time on it )-:
I'd appreciate suggestions. Solutions would be even better! I'm guessing this is going to be real simple ....
Regards
m
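Two quoting layers are at work in this setup. First, in a double-quoted Perl string `\(` becomes `(`, `\{` becomes `{`, and `\1` is read as an octal escape (the control character chr(1)), so the backslashes sed needs can be lost before any shell is called. Second, the single quotes around the sed script close the single-quoted awk program early, so the outer shell treats the sed script as unquoted text and strips its quotes and any remaining backslashes; the error line `sed s_(#[TCGA]{6})/1_1/2_g`, with no quotes or backslashes left, is consistent with that. The minimal sketch below shows only the first layer, in Perl alone; the variable names `$dq` and `$sq` are illustrative and not from the original script. It compares the original double-quoted string with the same text written using `q!...!`, which keeps backslashes as typed.

```perl
use strict;
use warnings;

# The line from the question: double quotes, so Perl processes the backslashes.
my $dq = "sed 's_\(#[TCGA]\{6\}\)/1_\1/2_g'";

# Same text with q!...!: backslashes reach the string exactly as typed.
my $sq = q!sed 's_\(#[TCGA]\{6\}\)/1_\1/2_g'!;

print "double-quoted: $dq\n";
print "q!...!:        $sq\n";
```

Fixing the Perl layer alone is not enough: the sed script's single quotes still clash with the quotes around the awk program, which is one reason the replies in this thread suggest dropping the shell pipeline altogether.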

11-23-2011, 11:18 AM
#2
Moderator
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,903
It would be interesting/enlightening to me if you could explain why you use "text processing tools" like sed & awk from within the "Swiss army knife" of data mangling.
If you could explain what the perl script does as a whole, w/ some actual data (before & after processing examples) .... maybe the thing can be done w/o external programs, and without the problems that quoting and escaping introduce?
Cheers,
Tink
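As a rough illustration of doing the job without external programs, here is a minimal pure-Perl sketch of what `$cmdreads2` appears to do: print whitespace-separated fields 1, 5 and 7 of each input line on separate lines, passing each through the `/1` to `/2` substitution. The record layout and the sample line are assumptions (the real data couldn't be shared), the sub names are hypothetical, and in-memory filehandles stand in for `$read1.join.tmp` and `$read2.test.sync`.

```perl
use strict;
use warnings;

# Same rule as the sed script: keep "#" plus the six-base tag, turn "/1" into "/2".
sub fix_read_id {
    my ($s) = @_;
    $s =~ s{(#[TCGA]{6})/1}{$1/2}g;
    return $s;
}

# Roughly awk '{print $1 "\n" $5 "\n" $7}' piped through that sed rule.
sub convert_stream {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        my @f = split ' ', $line;    # awk-like splitting on runs of whitespace
        for my $field (@f[0, 4, 6]) {
            # A missing field prints as an empty line, as awk would.
            print {$out} fix_read_id($field // ''), "\n";
        }
    }
    return;
}

# Hypothetical one-record sample; real code would open "$read1.join.tmp" for
# reading and "$read2.test.sync" for writing instead of these in-memory handles.
my $sample = '@r1#TGACCA/1 f2 f3 f4 ACGTACGT f6 IIIIIIII' . "\n";
open my $in, '<', \$sample or die "cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "cannot open output: $!";
convert_stream($in, $out);
close $out;

print $result;
```

Nothing here is re-parsed by a shell, so there is no quoting to get wrong, and the regex is written once in Perl syntax.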

11-23-2011, 11:42 AM
#3
Guru
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 11,808
I've got to agree with Tinkster. Perl was designed for chopping up/manipulating data, and has sed-style substitution (s///) built in, so there shouldn't be a need to fork to a system utility unless you've got no other choice.
Try something like:
Code:
$variable =~ s{(#[TCGA]{6})/1}{$1/TESTSUBSTITUTIONVALUE2}g;
where the $variable is the incoming data line from your input file.
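sed's basic regular expressions and Perl's regexes disagree on escaping: in Perl, grouping `(...)` and the count `{6}` are written without backslashes (a backslash makes them literal characters), and the captured text is `$1` in the replacement. A sed-style `_` delimiter also doesn't work directly after `s` in Perl, because `s_` is read as the start of a name rather than as the `s` operator, so this sketch uses `s{...}{...}`. The sub name `substitute_tag` is just for illustration.

```perl
use strict;
use warnings;

# sed (BRE):  s_\(#[TCGA]\{6\}\)/1_\1/TESTSUBSTITUTIONVALUE2_g
# Perl:       grouping and {6} are unescaped; the capture is $1.
sub substitute_tag {
    my ($line) = @_;
    $line =~ s{(#[TCGA]{6})/1}{$1/TESTSUBSTITUTIONVALUE2}g;
    return $line;
}

# The second line has a "/1" without the "#" tag, so it is left alone.
for my $line ('@stuffdeleted#TGACCA/1', 'no tag here/1') {
    print substitute_tag($line), "\n";
}
```

For a whole file, the same rule also works as a one-liner, `perl -pe 's{(#[TCGA]{6})/1}{$1/TESTSUBSTITUTIONVALUE2}g' myfile`, where `-p` loops over the input lines and prints each one after the substitution.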

11-23-2011, 11:57 AM
#4
LQ Newbie
Registered: Nov 2011
Location: London, UK
Distribution: ubuntu
Posts: 10
Original Poster
@Tinkster
(-: OK
answer_1: probably because I don't know any better; I'm a beginner
answer_2: because text processing tools seem useful, and play is a good way to learn; yes I probably could do it all using Perl but how much fun would that be?
answer_3: because my supervisor does it (so it must be right, huh?)
answer_4: retraining by learning *nix/perl/awk/sed/R, among others, delivers a hard-to-resist temptation to use them all in everything (be glad I didn't work some R and Java in there)
take your pick!
I can't share the data because they're not mine, and they include details that would be trivial to trace back to their origin; a bad scene would surely follow. I've just seen TB0ne's comment, which I will explore (I didn't know about sed built into perl, see?)
I was nonetheless under the impression that these system tools, awk and sed, were inherently faster at munching through large files (e.g. 20 million lines) than a perl while(<>) loop. Is that not right?

11-23-2011, 12:18 PM
#5
Guru
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 11,808
Quote:
Originally Posted by MMaddoxx
I was nonetheless under the impression that these system tools, awk and sed, were inherently faster at munching through large files (e.g. 20 million lines) than a perl while(<>) loop. Is that not right?
They may be in SOME cases, but remember what you're asking the program to do. You're first invoking Perl and beginning processing there...then you're forking THAT to a system call (multiple, actually, since you're first cat'ing it, then piping that into sed). That uses more resources than you need to, and it's not as clean.
If it were me, I'd try it both ways: perl-only, and time it; then pre-parse the data using the cat/sed method, run THAT input data through the perl program, and see which is quicker. But at 20 million lines of input data, you're going to be waiting anyway.
Also, the beauty of doing it perl-only, is that you then have the $variable defined as the line with your changes, and the line without your changes is what fed it. So, doing multiple substitutions becomes a trivial change, if your data output needs change later.
Last edited by TB0ne; 11-23-2011 at 12:20 PM.
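A small sketch of the "original line plus changed copy" idea: copying into a new variable and substituting on the copy leaves the input line untouched, and each extra rule is one more line. The second rule (renaming the `@stuffdeleted` prefix) is purely hypothetical, added only to show how further substitutions stack.

```perl
use strict;
use warnings;

my $line = '@stuffdeleted#TGACCA/1';

# Copy, then substitute on the copy; $line still holds the original text.
(my $changed = $line) =~ s{(#[TCGA]{6})/1}{$1/2};

# Adding a rule later is one more line (this prefix rule is hypothetical).
$changed =~ s{^\@stuffdeleted}{\@read};

print "before: $line\n";
print "after:  $changed\n";
```

On Perl 5.14 or later, the `/r` modifier does the copy and substitution in one expression: `my $changed = $line =~ s{(#[TCGA]{6})/1}{$1/2}r;`.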

11-24-2011, 03:42 AM
#6
Senior Member
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,211
please, don't use sed and awk from perl.
it's not big or clever, to be honest it just makes you look silly
There's nowt wrong with programming for fun, but it's more fun if you try and do it well.
Michelangelo didn't paint the thingy chapel with a bread knife and a screwdriver did he?
(You need to get a new supervisor).

11-24-2011, 06:37 AM
#7
LQ Newbie
Registered: Nov 2011
Location: London, UK
Distribution: ubuntu
Posts: 10
Original Poster
OK ... but
Quote:
Originally Posted by bigearsbilly
(You need to get a new supervisor).
Quite possibly.
But I find that calling awk and sed etc. from perl is pretty rife in this field (informatics processing of data from next-generation DNA sequencing machines). I have .pl scripts from 5 different sources which make calls not only to awk and sed but also to the system sort.
Following the earlier comments on this thread I have started in on a 100% perl version of what I'm trying to do; it's waaaaaaay less compact than a version calling on awk and system sort. If perl is used merely as the glue/wrapper to make these various system calls ... is it really so bad? Is it really the equivalent of hacking at the ceiling of the Sistine Chapel with a waney-edged knife and 'driver?
I'm a learner, so here to listen, learn, be guided. But it seems to me that the minimal perl script calling system tools does at least - and compactly - get the jobs done (or it would do if I could get that quoting issue sorted :-)
further guidance always welcomed ...
m
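If Perl is kept as glue, the quoting problem can be sidestepped by not going through a shell at all. This sketch assumes `sed` is on the PATH. It stores the sed script in a single-quoted Perl string, so its backslashes survive, and starts sed with the list form of a pipe `open`, which runs the program directly instead of via `/bin/sh`. A temporary file stands in for `$read1.join.tmp`, and the awk stage is left out for brevity; all names are illustrative.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Single-quoted Perl string: every backslash reaches sed exactly as typed.
my $sed_script = 's_\(#[TCGA]\{6\}\)/1_\1/2_g';

# Hypothetical input file standing in for "$read1.join.tmp".
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} '@stuffdeleted#TGACCA/1', "\n";
close $tmp_fh or die "close: $!";

# List-form pipe open: sed is exec'd directly with these arguments, with no
# /bin/sh in between, so there is no shell quoting layer to mangle the script.
open(my $sed, '-|', 'sed', $sed_script, $tmp_name)
    or die "cannot run sed: $!";
my @out = <$sed>;
close $sed or die "sed failed: exit status $?";

print @out;
```

The lines in `@out` could then be written to `$read2.test.sync` with an ordinary `open(my $fh, '>', ...)`, again with no shell involved.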

11-24-2011, 08:00 AM
#8
Senior Member
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140
Longer to write, faster to execute?
You can test script modifications with the time command, e.g.: time script.pl
Last edited by Cedrik; 11-24-2011 at 08:02 AM.
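Alongside the shell's `time`, a rough in-script measurement is possible with the core Time::HiRes module. This sketch times only the pure-Perl substitution over a synthetic batch of lines; the line format and count are made up, and a fair comparison with the awk/sed version should still run `time` on each whole pipeline with real data.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);    # time() now returns fractional seconds

# Synthetic input (hypothetical): 200_000 tagged read IDs.
my @lines = map { '@read' . $_ . '#TGACCA/1' } 1 .. 200_000;

my $start   = time;
my $changed = 0;
for my $line (@lines) {
    # s///g returns the number of substitutions made on this line.
    $changed += ($line =~ s{(#[TCGA]{6})/1}{$1/2}g);
}
my $elapsed = time - $start;

printf "%d lines changed in %.3f s\n", $changed, $elapsed;
```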

All times are GMT -5.