LinuxQuestions.org

MMaddoxx 11-23-2011 11:58 AM

script.pl with sed shell calls: sh error syntax error near unexpected token `('
 
Hi,

Just my second cry for help here, so be gentle!

I am using sed to do a substitution. The target strings in myfile are
Code:

@stuffdeleted#TGACCA/1
I can do the substitution just fine on command line (in bash shell, Fedora 14)
Code:

cat myfile |  sed 's_\(#[TCGA]\{6\}\)/1_\1/TESTSUBSTUTIONVALUE2_g'
It all works OK. The output is:
Code:

@stuffdeleted#TGACCA/TESTSUBSTUTIONVALUE2
Because the string that matches the [TCGA]{6} pattern varies, I'm using the enclosing () to capture it into \1, which allows me to paste it back into the RHS of the substitution operation. I can't substitute just on the '/1' because that's not unique in the file.


So far so good. And then I put it into a perl script. And it breaks. The error code is widely posted online but I haven't found a code context that's comparable to my code.
Code:

sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `sed s_(#[TCGA]{6})/1_1/2_g  > head20_2.txt.test.sync'

The code context in the perl script first defines the sed line as a command
Code:

my $sed2cmd = "sed 's_\(#[TCGA]\{6\}\)/1_\1/2_g'";
and then includes this in a second commandline which is subsequently called using backticks
Code:

my $cmdreads2 = "awk \'{print \$1$kNL \$5$kNL \$7 | \"$sed2cmd  > $read2\.test\.sync\"}\' $read1\.join\.tmp";
`$cmdreads2`;

($kNL is a constant to tidy up the code, which would otherwise have "\\n\")

$sed2cmd is being interpolated (?expanded?) correctly into the second command but sh doesn't like it. Neither do I (having spent 'quite a bit' of time on it )-:

I'd appreciate suggestions. Solutions would be even better! I'm guessing this is going to be real simple ....

Regards

m

Tinkster 11-23-2011 12:18 PM

It would be interesting/enlightening to me if you could
explain why you use "text processing tools" like sed & awk
from within the "Swiss army knife" of data mangling.

If you could explain what the perl script does as a whole,
w/ some actual data (before & after processing examples)
.... maybe the thing can be done w/o external programs
and the problems that quoting and escaping introduce?



Cheers,
Tink

TB0ne 11-23-2011 12:42 PM

I've got to agree with Tinkster. Perl was designed for chopping up/manipulating data, and its built-in substitution operator (s///) does what sed does, so there shouldn't be a need to fork to a system utility unless you've got no other choice.

Try something like:
Code:

# Perl regex syntax: groups and counts are unescaped, and the capture is $1
$variable =~ s{(#[TCGA]{6})/1}{$1/TESTSUBSTUTIONVALUE2}g;
where the $variable is the incoming data line from your input file.
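
A minimal Perl-only sketch of that substitution, assuming the sample line from the first post. Unlike sed's basic regular expressions, Perl writes the capturing group and the repeat count without backslashes, and the captured text is pasted back with $1. The helper name retag is hypothetical, and the braces are used as delimiters only to keep the / in the pattern readable.

```perl
use strict;
use warnings;

# Hypothetical helper: swap the "/1" that follows a six-base tag for "/$new".
sub retag {
    my ($line, $new) = @_;
    # (#[TCGA]{6}) captures the tag; $1 puts it back in the replacement.
    $line =~ s{(#[TCGA]{6})/1}{$1/$new}g;
    return $line;
}

# Sample line taken from the first post.
print retag('@stuffdeleted#TGACCA/1', '2'), "\n";   # prints @stuffdeleted#TGACCA/2
```

Lines without the #TAG/1 pattern pass through unchanged, so the same call can be applied to every line of the file.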

MMaddoxx 11-23-2011 12:57 PM

@Tinkster
(-: OK
answer_1: probably because I don't know any better; I'm a beginner
answer_2: because text processing tools seem useful, and play is a good way to learn; yes I probably could do it all using Perl but how much fun would that be?
answer_3: because my supervisor does it (so it must be right, huh?)
answer_4: retraining by learning nix/perl/awk/sed/R among others delivers a hard-to-resist temptation to use them all in everything (be glad I didn't work some R and Java in there)
take your pick!

I can't share the data because they're not mine, and include details that would be trivial to track back to origin; a bad scene would surely follow. I've just seen TB0ne's comment, which I will explore (didn't know about sed built into perl, see?)

I was nonetheless under the impression that system tools such as awk and sed were inherently faster at munching through large files (e.g. 20 million lines) than a perl while(<>) loop. Is that not right?

TB0ne 11-23-2011 01:18 PM

They may be in SOME cases, but remember what you're asking the program to do. You're first invoking Perl and beginning processing there...then you're forking THAT to system calls (multiple, actually, since you're first cat'ing it, then piping that into sed). That uses more resources than you need to, and it's not as clean.

If it was me, I'd try it both ways: do it perl-only and time it, then pre-parse the data using the cat/sed method, run THAT output through the perl program, and see which is quicker. But at 20 mil. lines of input data, you're going to be waiting anyway.

Also, the beauty of doing it perl-only, is that you then have the $variable defined as the line with your changes, and the line without your changes is what fed it. So, doing multiple substitutions becomes a trivial change, if your data output needs change later.
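
A rough sketch of that point, under stated assumptions: the edit happens on a copy, so the untouched input line stays available, and adding another rule is a one-line change. The helper name edit_line, the second sample line and the whitespace-trimming rule are illustrative only, and an in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: return an edited copy of one input line.
sub edit_line {
    my ($line) = @_;
    my $changed = $line;                     # work on a copy; $line stays as read
    $changed =~ s{(#[TCGA]{6})/1}{$1/2}g;    # the substitution from this thread
    $changed =~ s/\s+\z//;                   # illustrative extra rule: trim trailing whitespace
    return $changed;
}

# Stand-in for the real input file (the second line is made up).
my $data = "\@stuffdeleted#TGACCA/1\n\@other#GGCATA/1  \n";
open(my $in, '<', \$data) or die "cannot open in-memory data: $!";

while (my $line = <$in>) {
    chomp $line;
    my $changed = edit_line($line);
    print "$line => $changed\n";             # original and edited version side by side
}
close($in);
```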

bigearsbilly 11-24-2011 04:42 AM

please, don't use sed and awk from perl.
it's not big or clever, to be honest it just makes you look silly ;)

There's nowt wrong with programming for fun but it's more
fun if you try and do it well.

Michelangelo didn't paint the thingy chapel with a bread knife and a screwdriver did he?

(You need to get a new supervisor).

MMaddoxx 11-24-2011 07:37 AM

OK ... but
 
Quite possibly.

But I find that calling awk and sed etc from perl is pretty rife in this field (informatics processing of data from next generation DNA sequencing machines). I have .pl scripts from 5 different sources which not only make calls on awk and sed but also on system sort().

Following the earlier comments on this thread I have started in on a 100% perl version of what I'm trying to do; it's waaaaaaay less compact than a version calling on awk and system sort. If perl is used merely as the glue/wrapper to make these various system calls ... is it really so bad? Is it really the equivalent of hacking at the ceiling of the Sistine Chapel with a waney-edged knife and 'driver?

I'm a learner, so here to listen, learn, be guided. But it seems to me that the minimal perl script calling system tools does at least - and compactly - get the jobs done (or it would do if I could get that quoting issue sorted :-)

further guidance always welcomed ...
m
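
On the quoting issue itself, a hedged sketch of two things that may be worth checking. In a double-quoted Perl string, Perl processes the backslash escapes itself: \( becomes ( and \1 is read as an octal character escape, so the sed expression can be altered before any shell sees it. A single-quoted string keeps the backslashes as typed. Whether that is the whole story for the script in the first post is an assumption, since the nested awk and sh quoting adds further layers. The second half shows one way to sidestep those layers: the list form of open hands the arguments straight to sed with no shell in between. The helper name sed_lines and the temporary file are hypothetical stand-ins.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Double quotes: Perl itself consumes the backslashes.
my $double = "s_\(#[TCGA]\{6\}\)/1_\1/2_g";
# Single quotes: the expression reaches sed exactly as typed.
my $single = 's_\(#[TCGA]\{6\}\)/1_\1/2_g';

print "double-quoted keeps backslashes? ", ($double =~ /\\/ ? "yes" : "no"), "\n";
print "single-quoted keeps backslashes? ", ($single =~ /\\/ ? "yes" : "no"), "\n";

# Hypothetical helper: run sed on a file with no shell in between.
sub sed_lines {
    my ($expr, $path) = @_;
    # List-form open passes each argument directly to sed,
    # so no shell quoting is involved.
    open(my $sed, '-|', 'sed', $expr, $path) or die "cannot run sed: $!";
    chomp(my @lines = <$sed>);
    close($sed) or die "sed failed: $?";
    return @lines;
}

# Stand-in input file holding the sample line from the first post.
my ($fh, $tmp) = tempfile(UNLINK => 1);
print {$fh} '@stuffdeleted#TGACCA/1', "\n";
close($fh);

print "$_\n" for sed_lines($single, $tmp);
```

The output can then be written to the .sync file from Perl with an ordinary open and print, which removes the need for a shell redirect.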

Cedrik 11-24-2011 09:00 AM

Longer to write, faster to execute?

You can test script mods with the time command, like: time script.pl
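
For timing just one step from inside the script, a small sketch using the core Time::HiRes module. The helper name elapsed and the synthetic input lines are assumptions for illustration; a real comparison would read the actual file and time the awk/sed route the same way.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: run a code reference and return the elapsed seconds.
sub elapsed {
    my ($code) = @_;
    my $start = time();
    $code->();
    return time() - $start;
}

# Synthetic stand-in data; replace with lines read from the real file.
my @lines = map { "\@read${_}#TGACCA/1" } 1 .. 100_000;

my $seconds = elapsed(sub {
    for my $line (@lines) {
        # Copy each line, then apply the substitution from this thread.
        (my $copy = $line) =~ s{(#[TCGA]{6})/1}{$1/2};
    }
});

printf "perl-only pass over %d lines: %.3f s\n", scalar @lines, $seconds;
```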


All times are GMT -5.