LinuxQuestions.org
Forums > Non-*NIX Forums > Programming
05-27-2009, 03:41 AM #1
john.daker
Member
Registered: Jul 2008
Posts: 33
Reputation: 15
[perl] How to treat a string like "a b" as a single string when splitting?


Hi, everyone,

How can I treat a string like "a b" as a single string when splitting?

For example, when I split a string:
Code:
my $str = "a b c d";
my @sa = split(/\s+/, $str);
foreach my $t (@sa) {
    print $t . "\n";
}

I got this:
Code:
a
b
c
d
Then:
Code:
my $str = 'a "b c" d';
my @sa = split(/\s+/, $str);
foreach my $t (@sa) {
    print $t . "\n";
}

I got this:
Code:
a
"b
c"
d
But what I really want is to treat "b c" as a single string, and the same for '', [], {} and () pairs.

So a string like:
a b "c d" [e f] g

should split as:
a
b
"c d"
[e f]
g

How can I do that in Perl?

Thanks.
 
05-27-2009, 03:59 AM #2
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481
Reputation: 454
Quote:
Originally Posted by john.daker
How can I do that in Perl?
Perl can do that, but why do you insist on 'split'?
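Expanding on that question: instead of telling split what the separators look like, you can describe what a field looks like and collect every match of a global pattern in list context. The sketch below assumes the pairs are never nested and contain no escaped delimiters; tokenize is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the fields of $str, keeping "...", '...',
# [...], {...} and (...) groups whole. Assumes no nesting and no escapes.
sub tokenize {
    my ($str) = @_;
    # A global match in list context returns every captured field. The
    # alternatives are tried in order at each position, so a quoted or
    # bracketed group wins over a plain word starting at the same place.
    return $str =~ /
        ( "[^"]*"      # double-quoted group
        | '[^']*'      # single-quoted group
        | \[[^\]]*\]   # square brackets
        | \{[^}]*\}    # curly braces
        | \([^)]*\)    # parentheses
        | \S+          # any other run of non-whitespace
        )
    /xg;
}

print "$_\n" for tokenize(q{a b "c d" [e f] g});
```

Because nothing is being split, the whitespace between fields is simply never matched; the trade-off is that nested groups such as [a [b] c] need a real parser.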
 
05-27-2009, 05:50 AM #3
thangappan
Member
Registered: May 2009
Posts: 52
Reputation: 16
Perl Implementation

I have tried it in Perl. It gives more or less your desired output (the quoted and bracketed fields come out after the plain ones). I think it will be helpful for you.

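use strict;
use warnings;

my (@fields, @sfields);
while (<STDIN>) {
    chomp;

    # Cut each quoted or bracketed group (preceded by whitespace) out of the
    # line and remember it. The non-greedy .*? keeps two groups of the same
    # kind from being merged into one field.
    push @fields, $1 while s/\s+(".*?")//;
    push @fields, $1 while s/\s+(\(.*?\))//;
    push @fields, $1 while s/\s+(\[.*?\])//;
    push @fields, $1 while s/\s+(\{.*?\})//;

    # Whatever is left is split on whitespace.
    @sfields = split(/\s+/, $_);
    local $" = "\n";    # join interpolated arrays with newlines
    print "@sfields\n", "@fields\n";
    @fields = ();
}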

Input:
a b "c d" [e f] g

Output:
a
b
g
"c d"
[e f]
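For the quote part of the problem alone, the core module Text::ParseWords already splits on a delimiter while respecting quotes; with a true KEEP argument, parse_line leaves the quote characters in place. It does not know about brackets, so this is only a partial answer (a sketch):

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $line = q{a b "c d" [e f] g};

# parse_line(DELIMITER_REGEX, KEEP, LINE). A true KEEP leaves the quote
# characters in the result; brackets are not special to Text::ParseWords,
# so "[e f]" still comes back as two pieces.
my @words = parse_line('\s+', 1, $line);

print "$_\n" for @words;
```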
 
05-27-2009, 06:01 AM #4
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481
Reputation: 454
Quote:
Originally Posted by thangappan
I have tried it in Perl. It gives more or less your desired output. I think it will be helpful for you.
http://perldoc.perl.org/Text/Balanced.html
 
06-01-2009, 03:32 AM #5
john.daker
Member
Registered: Jul 2008
Posts: 33
Original Poster
Reputation: 15
Quote:
Originally Posted by Sergei Steshenko
Any code?
I cannot figure it out...
 
06-01-2009, 03:44 AM #6
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481
Reputation: 454
Quote:
Originally Posted by john.daker
Any code?
I cannot figure it out...
What "it"? What exactly is not clear? I.e., which is the first sentence you do not understand, and what exactly in that sentence do you not understand?

The document begins with one- or two-line examples, e.g.:

Code:
 # Extract the initial substring of $text that is delimited by
 # two (unescaped) instances of the first character in $delim.

	($extracted, $remainder) = extract_delimited($text,$delim);
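Applied to the string from this thread, that first example might look like the sketch below. It assumes the quoted field is at the start of the text (after optional whitespace), because that is the only place extract_delimited looks by default.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# The quoted field must come first (after optional whitespace).
my $text = q{"c d" [e f] g};

# In list context: (extracted text, remainder, skipped prefix).
# The second argument lists the delimiter characters to accept.
my ($extracted, $remainder) = extract_delimited($text, '"');

print "extracted: $extracted\n";    # the delimiters are included
print "remainder: $remainder\n";
```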
 
06-01-2009, 03:59 AM #7
john.daker
Member
Registered: Jul 2008
Posts: 33
Original Poster
Reputation: 15
Just my first question. I've tried some code, but I cannot do what I want...
I can hardly understand the manual:
#!/usr/bin/perl -w
use strict;
use Text::Balanced qw (
extract_delimited
extract_bracketed
extract_quotelike
extract_codeblock
extract_multiple
);

my $text = 'a b "c d" [e f] g';
my ($extracted, $remainder) = extract_bracketed( $text, '[' );
print $extracted."\n";
print $remainder."\n";

Last edited by john.daker; 06-01-2009 at 04:08 AM.
 
06-01-2009, 04:29 AM #8
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481
Reputation: 454
Quote:
Originally Posted by john.daker
my ($extracted, $remainder) = extract_bracketed( $text, '[' );
And what does your script produce?
 
06-01-2009, 04:35 AM #9
john.daker
Member
Registered: Jul 2008
Posts: 33
Original Poster
Reputation: 15
Code:
extracted: undef
remainder:a b "c d" [e f] g
No, I don't think Text::Balanced can do this.
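The undef is the documented failure result: in list context the extract_... functions return (undef, original text) when they fail and set $@, whose {error} and {pos} fields describe the problem. A sketch that surfaces that diagnostic for the call above:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = 'a b "c d" [e f] g';
my ($extracted, $remainder) = extract_bracketed($text, '[');

if (defined $extracted) {
    print "extracted: $extracted\n";
}
else {
    # On failure $@ holds the diagnostic: {error} is the message and
    # {pos} the offset at which the problem was detected.
    print "failed: $@->{error} (offset $@->{pos})\n";
    print "remainder is the untouched input: $remainder\n";
}
```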
 
06-01-2009, 04:58 AM #10
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481
Reputation: 454
Quote:
Originally Posted by john.daker
Code:
extracted: undef
remainder:a b "c d" [e f] g
No, I don't think Text::Balanced can do this.
Really? Have you read the DIAGNOSTICS section:
http://perldoc.perl.org/Text/Balanced.html#DIAGNOSTICS ?

Have you tried enabling diagnostics?

Most importantly, have you read the part on prefixes: http://perldoc.perl.org/Text/Balance...about-prefixes ?

Print diagnostics first.

I already have a version which prints this:

Code:
$extracted=[e f] at ./try_tex_balanced.pl line 15.
$remainder= g at ./try_tex_balanced.pl line 16.
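Sergei's script was not posted, so the following is only one possible way to get that output: pass extract_bracketed a third argument, the prefix pattern, so that everything up to the first '[' is skipped instead of only whitespace.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = 'a b "c d" [e f] g';

# Third argument: the prefix to skip before the bracket. The default,
# '\s*', cannot get past "a b ...", which is why the two-argument call
# fails. Here the prefix is "any run of characters other than '['".
my ($extracted, $remainder, $prefix) =
    extract_bracketed($text, '[', qr/[^\[]*/);

print "\$extracted=$extracted\n";
print "\$remainder=$remainder\n";
print "\$prefix=$prefix\n";
```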
 
06-01-2009, 05:02 AM #11
ghostdog74
Senior Member
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5
Reputation: 244
Here's what I do in Python; use it as a guide to write the Perl equivalent. The algorithm is easy to understand. Note that this only handles the simplistic case.
Code:
f = 0  # flag: are we inside a group?
# s = 'a b "c d" [e f] g {h i j k} l m'  # test string
s = 'a b "c d" [e f] g {h i j k} l m [n o p q r s ] t u v'  # test string
items = s.split()  # split on whitespace
punct = ['"', "[", "]", "{", "}"]  # list of relevant punctuation
for i in items:  # iterate over the split items
    if i[-1] in punct:  # last character is punctuation: print and end the line
        print(i)
        f = 0
        continue  # use the "next" keyword in Perl
    if i[0] in punct and i[-1] not in punct:  # things like "c or [e
        f = 1
    if f:
        print(i, end=" ")  # print without a newline (side by side)
    if f == 0:
        print(i)  # print when no punctuation
Output:
Code:
# ./test.py
a
b
"c d"
[e f]
g
{h i j k}
l
m
[n o p q r s ]
t
u
v

Last edited by ghostdog74; 06-01-2009 at 05:04 AM.
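A fairly literal Perl port of that flag-based loop, as a sketch: like the Python, it only knows the characters in its punctuation list, assumes no nesting, and rebuilds a group from its whitespace-split pieces with single spaces. Instead of printing pieces side by side it collects and joins them; group_fields is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the Python flag algorithm: split on
# whitespace, then glue pieces together from an opening piece ("c, [e, {h)
# up to the next piece whose last character is in the punctuation list.
sub group_fields {
    my ($s) = @_;
    my %punct = map { $_ => 1 } ('"', '[', ']', '{', '}');
    my (@out, @group);
    for my $item (split ' ', $s) {          # like Python's s.split()
        my $first = substr($item, 0, 1);
        my $last  = substr($item, -1);
        if ($punct{$last}) {                 # closes a group (or is whole)
            push @out, join(' ', @group, $item);
            @group = ();
            next;                            # Python's "continue"
        }
        if (@group or $punct{$first}) {     # inside a group, or opens one
            push @group, $item;
        }
        else {
            push @out, $item;                # plain field
        }
    }
    push @out, join(' ', @group) if @group;  # unterminated group, if any
    return @out;
}

my $s = 'a b "c d" [e f] g {h i j k} l m [n o p q r s ] t u v';
print "$_\n" for group_fields($s);
```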
 
06-01-2009, 05:04 AM #12
john.daker
Member
Registered: Jul 2008
Posts: 33
Original Poster
Reputation: 15
OK, I don't know...
I need the code to split a b "c d" [e f] g into
a
b
"c d"
[e f]
g
using Text::Balanced, if you would be willing to show me; I'd appreciate it.
I cannot do this; I'm too new to Perl.

Last edited by john.daker; 06-01-2009 at 05:08 AM.
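One possible Text::Balanced answer for the whole string, offered as a sketch rather than the module author's recipe: extract_multiple applies a list of extractors repeatedly along the string. Here a bracketed group is tried first, then a quoted string, then a plain run of non-whitespace; the last argument (1) tells extract_multiple to drop the unmatched whitespace between fields. Unlike a simple regex, extract_bracketed also copes with nested brackets. balanced_split is a hypothetical helper name.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_multiple extract_bracketed extract_delimited);

# Hypothetical helper: split on whitespace but keep [...], {...}, (...),
# "..." and '...' whole. At each position the extractors are tried in
# order; the regex at the end is the fallback for plain words.
sub balanced_split {
    my ($text) = @_;
    return extract_multiple(
        $text,
        [
            sub { extract_bracketed($_[0], '[{(') },   # [..], {..}, (..)
            sub { extract_delimited($_[0], q{"'}) },   # "..." or '...'
            qr/\S+/,                                    # anything else
        ],
        undef,    # no limit on the number of fields
        1,        # skip unmatched text, i.e. the separating whitespace
    );
}

print "$_\n" for balanced_split('a b "c d" [e f] g');
```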
 
06-01-2009, 05:12 AM #13
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481
Reputation: 454
Quote:
Originally Posted by john.daker
I need the code to split a b "c d" [e f] g using Text::Balanced, if you would like to show me.
I cannot do this.
I am still waiting for answers to my questions:
  1. Have you read the part of the document describing what a prefix is? The part is http://perldoc.perl.org/Text/Balance...about-prefixes
  2. Have you enabled diagnostics? The relevant part of the document is http://perldoc.perl.org/Text/Balanced.html#DIAGNOSTICS

If you enable diagnostics, you'll see what the problem with your code is.

If you (re)read the part on prefixes, you'll see that "you have been warned", i.e., your original code couldn't work by construction.

I can show you my code, but the point is that I have read the document in order to understand why your code doesn't work, and you apparently haven't.

As many people have said here, the point is not to bring you fish but to teach you to fish, so in this case, reading the portions of the document mentioned above and acting accordingly is part of the process.
 
06-01-2009, 07:42 AM #14
Telemachos
Member
Registered: May 2007
Distribution: Debian
Posts: 754
Reputation: 60
@ Sergei: I understand that you are following the style you prefer (teach a man to fish...), but the OP is obviously new to Perl, and frankly Text::Balanced is a hideously complicated module that assumes that you already have an excellent understanding of regular expressions and complex text operations. As for the note about prefixes, I'm relatively comfortable with Perl, and I can't see at all how it applies to the OP's problem. Would you perhaps explain what you have in mind by referring him to that note a bit more clearly?
 
06-01-2009, 07:54 AM #15
Sergei Steshenko
Senior Member
Registered: May 2005
Posts: 4,481
Reputation: 454
Quote:
Originally Posted by Telemachos
As for the note about prefixes, I'm relatively comfortable with Perl, and I can't see at all how it applies to the OP's problem. Would you perhaps explain what you have in mind by referring him to that note a bit more clearly?
The OP posted code which doesn't work. The code doesn't work for an obvious reason: the wrong prefix.

I.e., the prefix the OP needs to make his code work is not the default prefix, and the documentation explains what the default prefix is.

I learned about Text::Balanced years ago, but the OP's code is the first time I've actually used Text::Balanced.

Yet again I've found that reading documentation is useful.

At the beginning of the document one can read:


Quote:
DESCRIPTION

The various extract_... subroutines may be used to extract a delimited substring, possibly after skipping a specified prefix string. By default, that prefix is optional whitespace (/\s*/ ), but you can change it to whatever you wish (see below).

So, the root cause of the OP's problem is explained in the very first two sentences of the DESCRIPTION.

What else can be done if what the documentation says is simply ignored?

And the ignored items are not at the end, but in the beginning.
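A small sketch of what that DESCRIPTION paragraph means in practice: with the default prefix ('\s*') the bracket has to be the first non-blank character, while a custom prefix can skip other text first. The prefix '\w+\s+' here is just an illustrative choice.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $padded = '  [e f] g';
my $worded = 'a [e f] g';

# Default prefix '\s*': the two leading blanks are skipped and returned
# as the third element of the list.
my ($ext1, $rem1, $pre1) = extract_bracketed($padded, '[');

# A custom prefix (here a regex string meaning "one word, then blanks")
# lets the call get past the leading "a ".
my ($ext2, $rem2, $pre2) = extract_bracketed($worded, '[', '\w+\s+');

print "default: extracted='$ext1' skipped='$pre1'\n";
print "custom:  extracted='$ext2' skipped='$pre2'\n";
```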
 
  


All times are GMT -5.