[perl]How to treat string like "a b" as a single string when split?

john.daker · 05-27-2009, 03:41 AM

Hi, everyone.

How can I treat a string like "a b" as a single string when splitting?

For example, when I split a string:

Code:

$str="a b c d";
@sa=split(/\s+/,$str); 
foreach $t (@sa){ 
print $t."\n"; 
}

I got this:

Code:

a
b
c
d

then:

Code:

$str='a "b c" d';
@sa=split(/\s+/,$str); 
foreach $t (@sa){ 
print $t."\n"; 
}

I got this:

Code:

a
"b
c"
d

But what I really want is to treat "b c" as a single string, and likewise for '', [], {} and () pairs.

So a string like:
a b "c d" [e f] g

should split as:
a
b
"c d"
[e f]
g

How can Perl do that?

Thanks.

Sergei Steshenko · 05-27-2009, 03:59 AM

Perl can do that, but why do you insist on 'split'?
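A minimal sketch of one alternative to 'split', assuming the groups are never nested and contain no escaped delimiters: match the fields themselves with a global regex instead of matching the separators. The sub name split_keeping_groups is made up for this illustration.

```perl
use strict;
use warnings;

# Return the fields of a string, keeping quoted and bracketed groups whole.
# Assumption: groups are not nested and contain no escaped delimiters.
sub split_keeping_groups {
    my ($str) = @_;
    return $str =~ /
        (  "[^"]*"        # double-quoted group
        |  '[^']*'        # single-quoted group
        |  \[[^\]]*\]     # square-bracket group
        |  \{[^}]*\}      # curly-brace group
        |  \([^)]*\)      # parenthesised group
        |  \S+            # any other run of non-whitespace
        )
    /gx;
}

print "$_\n" for split_keeping_groups('a b "c d" [e f] g');
```

The alternatives are tried left to right, so the group patterns must come before the catch-all \S+, otherwise "c would be taken as a field of its own.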

thangappan · 05-27-2009, 05:50 AM

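I tried this in Perl. It gives more or less your desired output, so I think it will be helpful for you.

use strict;
use warnings;

my (@fields, @sfields);
while (<STDIN>) {
    chomp;

    # Pull out each quoted or bracketed group, together with its leading whitespace.
    # Non-greedy .*? keeps two groups of the same kind on one line separate.
    push @fields, $1 while s/\s+(".*?")//;
    push @fields, $1 while s/\s+(\(.*?\))//;
    push @fields, $1 while s/\s+(\[.*?\])//;
    push @fields, $1 while s/\s+(\{.*?\})//;

    # Whatever is left is plain whitespace-separated words.
    @sfields = split ' ', $_;
    local $" = "\n";    # print array elements one per line
    print "@sfields\n", "@fields\n";
    @fields  = ();
    @sfields = ();
}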

Input:
a b "c d" [e f] g

Output:
a
b
g
"c d"
[e f]

Sergei Steshenko · 05-27-2009, 06:01 AM

http://perldoc.perl.org/Text/Balanced.html

john.daker · 06-01-2009, 03:32 AM

Quote:

Originally Posted by Sergei Steshenko

http://perldoc.perl.org/Text/Balanced.html

Any code?
I cannot figure it out...

Sergei Steshenko · 06-01-2009, 03:44 AM

Quote:

Originally Posted by john.daker

Any code?
I cannot figure it out...

What is "it"? What exactly is not clear? That is, which is the first sentence you do not understand, and what exactly in that sentence do you not understand?

The document begins with one- or two-line examples, e.g.:

Code:

 # Extract the initial substring of $text that is delimited by
 # two (unescaped) instances of the first character in $delim.

	($extracted, $remainder) = extract_delimited($text,$delim);
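To make that two-line example concrete, here is a small sketch with made-up input. The text is chosen so that it starts with the delimited part, which means the default prefix (optional whitespace) is enough.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited);

# Sample input (an assumption for this sketch): the quoted part comes first.
my $text = '"c d" [e f] g';

# In list context the call returns the extracted substring (delimiters
# included), the remainder of the text, and the skipped prefix.
my ($extracted, $remainder) = extract_delimited($text, '"');

print "extracted: $extracted\n";    # "c d"
print "remainder: $remainder\n";    #  [e f] g
```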

john.daker · 06-01-2009, 03:59 AM

Just my first question. I've tried some code, but it doesn't do what I want.
I can hardly understand the manual.

Code:

#!/usr/bin/perl -w
use strict;
use Text::Balanced qw (
extract_delimited
extract_bracketed
extract_quotelike
extract_codeblock
extract_multiple
);

my $text = 'a b "c d" [e f] g';
my ($extracted, $remainder) = extract_bracketed( $text, '[' );
print $extracted."\n";
print $remainder."\n";

Sergei Steshenko · 06-01-2009, 04:29 AM

And what does your script produce?

john.daker · 06-01-2009, 04:35 AM

Code:

extracted: undef
remainder:a b "c d" [e f] g

No, I don't think Text::Balanced can do this.

Sergei Steshenko · 06-01-2009, 04:58 AM

Really? Have you read the DIAGNOSTICS part:
http://perldoc.perl.org/Text/Balanced.html#DIAGNOSTICS ?

Have you tried to enable diagnostics?

Most importantly, have you read the part on prefixes: http://perldoc.perl.org/Text/Balance...about-prefixes ?

Print diagnostics first.

I already have a version which prints this:

Code:

$extracted=[e f] at ./try_tex_balanced.pl line 15.
$remainder= g at ./try_tex_balanced.pl line 16.
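The script that printed this is not shown in the thread. One way to get equivalent output, assuming the fix is simply a custom prefix passed as the third argument of extract_bracketed, might look like this:

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = 'a b "c d" [e f] g';

# Third argument: a pattern for the prefix to skip before the bracket.
# '[^\[]*' means "any run of characters that are not an opening bracket",
# replacing the default prefix of optional whitespace.
my ($extracted, $remainder) = extract_bracketed($text, '[', '[^\[]*');

# warn without a trailing newline appends the file name and line number.
warn "\$extracted=$extracted";
warn "\$remainder=$remainder";
```

This only pulls out the first bracketed group; it does not split the whole string into fields.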

ghostdog74 · 06-01-2009, 05:02 AM

Here's what I do with Python; use it as a guide to write the Perl equivalent. The algorithm is easy to understand. Note that this handles only the simplest cases.

Code:

f = False  # flag: True while inside a quoted or bracketed group
# s = 'a b "c d" [e f] g {h i j k} l m'  # test string
s = 'a b "c d" [e f] g {h i j k} l m [n o p q r s ] t u v'  # test string
items = s.split()  # split on whitespace
punct = ['"', "[", "]", "{", "}"]  # the relevant punctuation characters
for i in items:  # iterate over the split items
    if i[-1] in punct:  # last character is punctuation: the group ends here
        print(i)
        f = False
        continue  # use the "next" keyword in Perl
    if i[0] in punct:  # things like "c or [e open a group
        f = True
    if f:
        print(i, end=" ")  # print without a newline (side by side)
    else:
        print(i)  # print on its own line when outside any group

output

Code:

# ./test.py
a
b
"c d"
[e f]
g
{h i j k}
l
m
[n o p q r s ]
t
u
v
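A rough Perl translation of the same algorithm, as a sketch only: it collects the fields in a list instead of printing them as it goes, and the sub name regroup is made up here. Like the Python version, it only looks at the first and last character of each token, so it does not handle nesting or a delimiter in the middle of a word.

```perl
use strict;
use warnings;

# Rejoin whitespace-split tokens that belong to one quoted or bracketed group.
sub regroup {
    my ($str) = @_;
    my %punct = map { $_ => 1 } ('"', '[', ']', '{', '}');
    my (@out, @pending);
    for my $tok (split ' ', $str) {
        if ($punct{ substr($tok, -1) }) {
            # Last character is punctuation: the current group ends here.
            push @out, join(' ', @pending, $tok);
            @pending = ();
        }
        elsif (@pending or $punct{ substr($tok, 0, 1) }) {
            # Either already inside a group, or this token opens one.
            push @pending, $tok;
        }
        else {
            push @out, $tok;    # plain word outside any group
        }
    }
    push @out, join(' ', @pending) if @pending;    # group that was never closed
    return @out;
}

print "$_\n" for regroup('a b "c d" [e f] g {h i j k} l m [n o p q r s ] t u v');
```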

john.daker · 06-01-2009, 05:04 AM

OK, I don't know.
I need the code to split a b "c d" [e f] g into
a
b
"c d"
[e f]
g
using Text::Balanced. If you would show me, I would appreciate it.
I cannot do this; I'm too new to Perl.

Sergei Steshenko · 06-01-2009, 05:12 AM

Quote:

Originally Posted by john.daker

Ok, I dont know..
I need the code to split a b "c d" [e f] g to
a
b
"c d"
[e f]
g
using Text::Balanced, if you would like to show me.
I cannot do this.

I am still waiting for answers to my questions:

Have you read the part of the document describing what a prefix is: http://perldoc.perl.org/Text/Balance...about-prefixes ?
Have you enabled diagnostics, as described in http://perldoc.perl.org/Text/Balanced.html#DIAGNOSTICS ?

If you enable diagnostics, you'll see what the problem with your code is.

If you (re)read the part on prefixes, you'll see that "you have been warned", i.e. your original code couldn't work by construction.

I can show you my code, but the point is that I read the document in order to understand why your code doesn't work, and you apparently haven't.

As many people have said here, the point is not to bring you fish but to teach you to fish, so in this case reading the portions of the document mentioned above and acting accordingly is part of the process.
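For reference, a small sketch of what "enabling diagnostics" can look like, based on the DIAGNOSTICS section: on failure the extract_... functions set $@ to an object with "error" and "pos" fields. The exact wording of the message depends on the module version, so it is printed here rather than assumed.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $text = 'a b "c d" [e f] g';

# Same call as in the failing script: default prefix, i.e. optional whitespace.
my ($extracted, $remainder) = extract_bracketed($text, '[');
my $err = $@;    # copy at once, before anything else can overwrite $@

if (!defined $extracted) {
    # Report why the extraction failed and where in the text it gave up.
    print "extract_bracketed failed: $err->{error} (offset $err->{pos})\n";
}
```

With this input the call fails because the text does not start with optional whitespace followed by an opening bracket.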

Telemachos · 06-01-2009, 07:42 AM

@ Sergei: I understand that you are following the style you prefer (teach a man to fish...), but the OP is obviously new to Perl, and frankly Text::Balanced is a hideously complicated module that assumes that you already have an excellent understanding of regular expressions and complex text operations. As for the note about prefixes, I'm relatively comfortable with Perl, and I can't see at all how it applies to the OP's problem. Would you perhaps explain what you have in mind by referring him to that note a bit more clearly?

Sergei Steshenko · 06-01-2009, 07:54 AM

The OP posted code which doesn't work. It doesn't work for an obvious reason: the wrong prefix.

That is, the prefix the OP needs to make his code work is not the default prefix, and the documentation explains what the default prefix is.

I learned about Text::Balanced years ago, but the OP's code is the first time I've actually used it.

Yet again I've found that reading documentation is useful.

At the beginning of the document one can read:

Quote:

DESCRIPTION

The various extract_... subroutines may be used to extract a delimited substring, possibly after skipping a specified prefix string. By default, that prefix is optional whitespace (/\s*/ ), but you can change it to whatever you wish (see below).

So the root cause of the OP's problem is explained in the very first two sentences of the DESCRIPTION.

What else can be done if what the documentation says is simply ignored?

And the ignored items are not at the end, but at the beginning.
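For completeness, a hedged sketch of how the whole split the OP asked for might be done with Text::Balanced, using extract_multiple with a list of extractors. This is one possible arrangement and not code posted by anyone in the thread; the sub name split_balanced is made up here.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_delimited extract_bracketed extract_multiple);

# Split a string on whitespace, keeping quoted and bracketed groups whole.
sub split_balanced {
    my ($text) = @_;
    return extract_multiple(
        $text,
        [
            # Each extractor is tried at the current position, in this order.
            sub { extract_delimited($_[0], q{"'}) },    # "..." and '...'
            sub { extract_bracketed($_[0], '[{(') },    # [...], {...}, (...)
            qr/\S+/,                                    # bare words
        ],
        undef,    # no limit on the number of fields
        1,        # skip text that no extractor matched (the whitespace)
    );
}

print "$_\n" for split_balanced('a b "c d" [e f] g');
```

The default prefix of optional whitespace is what lets the first two extractors succeed when the current position is just before the space in front of a group. The bare-word pattern goes last so that it cannot grab "c before the quote extractor has been tried.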