LinuxQuestions.org
Old 11-23-2011, 10:58 AM   #1
MMaddoxx
LQ Newbie
 
Registered: Nov 2011
Location: London, UK
Distribution: ubuntu
Posts: 10

script.pl with sed shell calls: sh error syntax error near unexpected token `('


Hi,

Just my second cry for help here, so be gentle!

I am using sed to do a substitution. The target strings in myfile are
Code:
@stuffdeleted#TGACCA/1
I can do the substitution just fine on the command line (in a bash shell, Fedora 14)
Code:
cat myfile | sed 's_\(#[TCGA]\{6\}\)/1_\1/TESTSUBSTUTIONVALUE2_g'
It all works OK. The output is:
Code:
@stuffdeleted#TGACCA/TESTSUBSTUTIONVALUE2
Because the string that matches the [TCGA]{6} pattern varies, I'm using the enclosing () to capture it into \1, which allows me to paste it back into the RHS of the substitution operation. I can't substitute just on the '/1' because that's not unique in the file.
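The capture-and-paste-back idea described above can be tried on a single line without any input file. This is only a minimal sketch: the sample line is the one from the post, and /2 stands in for whatever replacement text is wanted.

```shell
# Sample header line, copied from the post.
line='@stuffdeleted#TGACCA/1'

# \( \) capture the 6-base tag, \1 pastes it back, and _ is the s-command delimiter.
result=$(printf '%s\n' "$line" | sed 's_\(#[TCGA]\{6\}\)/1_\1/2_g')

printf '%s\n' "$result"
```

Because the pattern insists on # plus six bases directly before /1, any other /1 in the file is left untouched.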


So far so good. And then I put it into a perl script. And it breaks. The error message is widely posted online but I haven't found a code context that's comparable to mine.
Code:
sh: -c: line 0: syntax error near unexpected token `('
sh: -c: line 0: `sed s_(#[TCGA]{6})/1_1/2_g  > head20_2.txt.test.sync'
The code context in the perl script first defines the sed line as a command
Code:
my $sed2cmd = "sed 's_\(#[TCGA]\{6\}\)/1_\1/2_g'";
and then includes this in a second commandline which is subsequently called using backticks
Code:
my $cmdreads2 = "awk \'{print \$1$kNL \$5$kNL \$7 | \"$sed2cmd  > $read2\.test\.sync\"}\' $read1\.join\.tmp";
`$cmdreads2`;
($kNL is a constant to tidy up the code, which would otherwise have "\\n\")

$sed2cmd is being interpolated (?expanded?) correctly into the second command but sh doesn't like it. Neither do I (having spent 'quite a bit' of time on it )-:
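Going by the error text quoted above, the sed script reached sh with neither its single quotes nor its backslashes, so sh read the bare ( as shell syntax before sed ever ran. One likely contributor (an assumption, since the full script isn't shown) is that a double-quoted Perl string processes backslash escapes, so \( is stored as a plain (. The sketch below reproduces the failure in plain shell and then runs the same command with the sed script quoted; the variable names are made up for illustration.

```shell
# The command roughly as sh reported it: no quotes, no backslashes.
broken="printf '%s\n' '@stuffdeleted#TGACCA/1' | sed s_(#[TCGA]{6})/1_1/2_g"

# The same command with the sed script protected by single quotes.
# (sh double quotes leave \( and \{ alone, unlike Perl's double quotes.)
fixed="printf '%s\n' '@stuffdeleted#TGACCA/1' | sed 's_\(#[TCGA]\{6\}\)/1_\1/2_g'"

# sh rejects the first command line itself; capture its message and exit status.
status=0
err=$(sh -c "$broken" 2>&1) || status=$?
printf 'broken: exit status %s, message: %s\n' "$status" "$err"

# The quoted version gets as far as sed and does the substitution.
out=$(sh -c "$fixed")
printf 'fixed:  %s\n' "$out"
```

The exit status is non-zero in the first case because the shell refuses to parse the line, so sed is never started.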

I'd appreciate suggestions. Solutions would be even better! I'm guessing this is going to be real simple ....

Regards

m
 
Old 11-23-2011, 11:18 AM   #2
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 22,950
Blog Entries: 11

It would be interesting/enlightening to me if you could
explain why you use "text processing tools" like sed & awk
from within the "Swiss army knife" of data mangling.

If you could explain what the perl script does as a whole,
w/ some actual data (before & after processing examples)
.... maybe the thing can be done w/o external programs
and the problems quoting and escaping introduces?



Cheers,
Tink
 
Old 11-23-2011, 11:42 AM   #3
TB0ne
Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 13,803

I've got to agree with Tinkster. Perl was designed for chopping up/manipulating data, and has sed-style substitution built in, so there shouldn't be a need to fork to a system utility unless you've got no other choice.

Try something like:
Code:
$variable =~ s|(#[TCGA]{6})/1|$1/TESTSUBSTUTIONVALUE2|g;
where $variable is the incoming data line from your input file. In a Perl regex the grouping ( ) and repetition { } are written without backslashes, and the captured text is $1 in the replacement.
 
Old 11-23-2011, 11:57 AM   #4
MMaddoxx
@Tinkster
(-: OK
answer_1: probably because I don't know any better; I'm a beginner
answer_2: because text processing tools seem useful, and play is a good way to learn; yes I probably could do it all using Perl but how much fun would that be?
answer_3: because my supervisor does it (so it must be right, huh?)
answer_4: retraining by learning *nix/perl/awk/sed/R among others delivers a hard-to-resist temptation to use them all in everything (be glad I didn't work some R and Java in there)
take your pick!

I can't share the data because they're not mine, and include details that would be trivial to track back to origin; a bad scene would surely follow. I've just seen TB0ne's comment, which I will explore (didn't know about sed built into perl, see?)

I was nonetheless under the impression that system tools like awk and sed were inherently faster at munching through large files (e.g. 20 million lines) than a perl while(<>) loop. Is that not right?
 
Old 11-23-2011, 12:18 PM   #5
TB0ne
Quote:
Originally Posted by MMaddoxx
I was nonetheless under the impression that system tools like awk and sed were inherently faster at munching through large files (e.g. 20 million lines) than a perl while(<>) loop. Is that not right?
They may be in SOME cases, but remember what you're asking the program to do. You're first invoking Perl, and beginning processing there...then, you're forking THAT to a system call (multiple, actually, since you're first cat'ing it, then piping that into sed). That uses more resources than you need to, and it's not as clean.

If it was me, I'd try it both ways: time a perl-only run, then pre-parse the data using the cat/sed method, run THAT input data through the perl program, and see which is quicker. But at 20 mil. lines of input data, you're going to be waiting anyway.
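A both-ways comparison of this kind could be set up roughly as follows. This is a sketch under assumptions: it uses synthetic header lines rather than real data, a hypothetical line count of 200000, and a shell such as bash where time is a keyword. The Perl route is skipped if perl is not installed.

```shell
# Build a throwaway input of synthetic header lines (not real sequencing data).
tmp=$(mktemp)
awk 'BEGIN { for (i = 0; i < 200000; i++) print "@read" i "#TGACCA/1" }' > "$tmp"

# Route 1: the external sed substitution from the first post.
time sed 's_\(#[TCGA]\{6\}\)/1_\1/2_g' "$tmp" > "$tmp.sed"

# Route 2: the same substitution done by Perl alone, if perl is installed.
if command -v perl >/dev/null 2>&1; then
    time perl -pe 's|(#[TCGA]{6})/1|$1/2|g' "$tmp" > "$tmp.perl"
    # Confirm both routes produced identical output before comparing times.
    cmp -s "$tmp.sed" "$tmp.perl" && echo "outputs match"
fi

# Keep one line for inspection, then clean up the temporary files.
first=$(head -n 1 "$tmp.sed")
printf 'first output line: %s\n' "$first"
rm -f "$tmp" "$tmp.sed" "$tmp.perl"
```

The timings only mean something on input of realistic size, so the loop count would need raising towards the real line count before drawing conclusions.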

Also, the beauty of doing it perl-only is that you then have $variable defined as the line with your changes, and the line without your changes is what fed it. So doing multiple substitutions becomes a trivial change if your data output needs change later.

Last edited by TB0ne; 11-23-2011 at 12:20 PM.
 
Old 11-24-2011, 03:42 AM   #6
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,269

please, don't use sed and awk from perl.
it's not big or clever, to be honest it just makes you look silly

There's nowt wrong with programming for fun but it's more
fun if you try and do it well.

Michelangelo didn't paint the thingy chapel with a bread knife and a screwdriver did he?

(You need to get a new supervisor).
 
Old 11-24-2011, 06:37 AM   #7
MMaddoxx
OK ... but

Quote:
Originally Posted by bigearsbilly
(You need to get a new supervisor).
Quite possibly.

But I find that calling awk and sed etc from perl is pretty rife in this field (informatics processing of data from next generation DNA sequencing machines). I have .pl scripts from 5 different sources which not only make calls on awk and sed but also on system sort().

Following the earlier comments on this thread I have started in on a 100% perl version of what I'm trying to do; it's waaaaaaay less compact than a version calling on awk and system sort. If perl is used merely as the glue/wrapper to make these various system calls ... is it really so bad? Is it really the equivalent of hacking at the ceiling of the Sistine Chapel with a waney-edged knife and 'driver?

I'm a learner, so here to listen, learn, be guided. But it seems to me that the minimal perl script calling system tools does at least - and compactly - get the jobs done (or it would do if I could get that quoting issue sorted :-)
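One way to sidestep the nested quoting, if the system tools are kept, is to let the shell join awk and sed with an ordinary pipe instead of placing the sed command inside awk's own string. The sketch below assumes a made-up seven-field input line standing in for the join.tmp file, with the header in field 1; the real field layout is not shown in the thread.

```shell
# Hypothetical one-line stand-in for the join.tmp file: seven whitespace-separated fields.
input='@r1#TGACCA/1 f2 f3 f4 ACGTAC f6 IIIIII'

# awk emits fields 1, 5 and 7 on separate lines; sed then rewrites the /1 suffix.
# Each script sits in its own single quotes, so neither needs extra escaping.
out=$(printf '%s\n' "$input" |
    awk '{ print $1 "\n" $5 "\n" $7 }' |
    sed 's_\(#[TCGA]\{6\}\)/1_\1/2_g')

printf '%s\n' "$out"
```

Written this way there is one layer of quoting per tool, which is the layer the command-line version in the first post already had working.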

further guidance always welcomed ...
m
 
Old 11-24-2011, 08:00 AM   #8
Cedrik
Senior Member
 
Registered: Jul 2004
Distribution: Slackware
Posts: 2,140

Longer to write, faster to execute?

You can test script changes with the time command, like: time script.pl

Last edited by Cedrik; 11-24-2011 at 08:02 AM.
 
  

