LinuxQuestions.org


andres_fever 09-12-2018 03:00 PM

I want to take the value of certain column of a string in perl
 
I have a string

Var=/dev/sda1 4.7G 4.3G 202M 96% /

I want to extract the value 96% from the variable Var so I can perform operations on it later. Help!!! I'm writing a Perl script for this.

TB0ne 09-12-2018 03:08 PM

Quote:

Originally Posted by andres_fever (Post 5902633)
I have a string

Var=/dev/sda1 4.7G 4.3G 202M 96% /

I want to extract the value 96% from the variable Var so I can perform operations on it later. Help!!! I'm writing a Perl script for this.

Read the LQ Rules and "Question Guidelines". You posted this same question twice within five minutes of each other, and you aren't showing us anything that you've done/tried. We are happy to help you, but you have to show us your efforts first. Post the pieces of the script where you're trying to get this done, and the results you're getting.

There are MANY ways to do what you're after, including using a system call and awk, along with the built-in perl split function.

andres_fever 09-12-2018 03:29 PM

I have tried various approaches with strip. In bash I did this with:
column5=`echo $Var | cut -d' ' -f5`;
echo $column5;

I started with Perl a few days ago and it has been difficult for me. I could not get this working in Perl.

BW-userx 09-12-2018 03:38 PM

With bash string manipulation, you can use:
Code:

Substring Removal

${string#substring}

    Deletes shortest match of $substring from front of $string.
${string##substring}

    Deletes longest match of $substring from front of $string.

Code:

$ echo $Var
/dev/sda1 4.7G 4.3G 202M 96%
 
$ v=${Var##* }
 
$ echo $v
96%


andres_fever 09-12-2018 03:45 PM

Thanks for your help @BW-userx, but I want to do this with Perl because I am learning it. With bash I already solved it as I showed in my previous post. Thank you very much.

BW-userx 09-12-2018 03:50 PM

'how to print columns in perl'

https://www.perlmonks.org/?node_id=1011694

just one thing I found on it.

individual 09-12-2018 03:55 PM

Quote:

Originally Posted by andres_fever (Post 5902662)
Thanks for your help @BW-userx, but I want to do this with Perl because I am learning it. With bash I already solved it as I showed in my previous post. Thank you very much.

Here are some links that should help you.
split
difference between $array[1] and @array[1]
Perl Regular Expressions
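
A minimal sketch of the split approach those links describe, assuming the string from the first post and that the percentage is always the fifth whitespace-separated field. The variable names are only illustrative:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample string from the first post.
my $var = '/dev/sda1 4.7G 4.3G 202M 96% /';

# split with a single-space string splits on runs of whitespace
# and skips leading whitespace, much like awk does.
my @fields = split ' ', $var;

# Arrays are zero-indexed, so the fifth column is $fields[4].
# $fields[4] is a single scalar element; @fields[4] would be a one-element slice.
my $pct = $fields[4];
print "$pct\n";            # prints 96%

# Strip the percent sign to get a plain number for later arithmetic.
(my $num = $pct) =~ s/%$//;
print $num + 0, "\n";      # prints 96
```

If the column can move around, a fixed index like this stops working and a pattern match is the safer choice.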

scasey 09-12-2018 04:47 PM

Generally speaking, when one starts learning a new programming language it works best to study the syntax of the new language thoroughly. Don't fall into the trap of thinking that because some things appear similar, the syntax is also similar.

For example, both bash and perl use a dollar sign to identify a variable, say $var. One significant syntactic difference is in how values are assigned to a variable.
bash:
Code:

var=something
perl:
Code:

$var = 'something';
You've been given some links to sites that discuss perl syntax, and you can search for more. Back in the day, one pretty much had to buy a book. I have a well-worn copy of O'Reilly's Programming Perl.

Good luck. Let us know what you've tried.

individual 09-12-2018 06:08 PM

Quote:

Originally Posted by scasey (Post 5902688)
You've been given some links to sites that discuss perl syntax, and you can search for more. Back in the day, one pretty much had to buy a book. I have a well-worn copy of O'Reilly's Programming Perl.

I'd recommend buying Programming Perl. When I first started learning it was a lot of help.

rtmistler 09-12-2018 10:50 PM

Moved: This thread is more suitable in Programming and has been moved accordingly to help your thread/question get the exposure it deserves.

Turbocapitalist 09-13-2018 04:59 AM

Others have mentioned split() and using columns. However, "There is more than one way to do it" with perl. Since one of the biggest strengths of perl is its pattern matching, you could write a regular expression to identify and extract the data you want, say a run of digits preceding a percent sign and delimited by whitespace. It can be done in two steps or one, depending on how familiar you are with capture groups. There are also named capture groups if you want to try something advanced, though one would not be needed in a simple pattern.

The pattern needed here is so simple that it's tempting to just show the solution or a variation. For that very reason, and especially since you said you want to learn, I'll just nudge you toward regex.
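
For readers who want to check their own attempt afterwards, here is one possible sketch of the pattern described above: a run of digits followed by a percent sign, bounded by whitespace or the ends of the string. It is an illustration under those assumptions, not the only way to write it, and the named capture group needs perl 5.10 or later:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample string from the first post.
my $var = '/dev/sda1 4.7G 4.3G 202M 96% /';

# One step: in list context a match returns its capture groups.
# (\d+)% captures the digits in front of a percent sign;
# (?<!\S) and (?!\S) require whitespace or a string edge on each side.
my ($pct) = $var =~ /(?<!\S)(\d+)%(?!\S)/;
print "$pct\n" if defined $pct;        # prints 96

# The same idea with a named capture group, read back through %+.
my $named;
if ($var =~ /(?<!\S)(?<percent>\d+)%(?!\S)/) {
    $named = $+{percent};
    print "$named\n";                  # prints 96
}
```

Because the capture excludes the percent sign, the result can be used directly in arithmetic.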

individual 09-13-2018 03:57 PM

Quote:

Originally Posted by Turbocapitalist (Post 5902831)
Others have mentioned split() and using columns. However, "There is more than one way to do it" with perl. Since one of the biggest strengths of perl is its pattern matching, you could write a regular expression to identify and extract the data you want, say a run of digits preceding a percent sign and delimited by whitespace. It can be done in two steps or one, depending on how familiar you are with capture groups. There are also named capture groups if you want to try something advanced, though one would not be needed in a simple pattern.

The pattern needed here is so simple that it's tempting to just show the solution or a variation. For that very reason, and especially since you said you want to learn, I'll just nudge you toward regex.

Indeed, regular expressions would work well here, but assuming that value will always be in that column it is easier to split the string and grab the value that way. The flip side would be: that value isn't guaranteed to be in that column, in which case regex would be the better choice.

Turbocapitalist 09-13-2018 11:46 PM

Quote:

Originally Posted by individual (Post 5903027)
Indeed, regular expressions would work well here, but assuming that value will always be in that column it is easier to split the string and grab the value that way. The flip side would be: that value isn't guaranteed to be in that column, in which case regex would be the better choice.

When I first read that I saw 'easier' as 'faster' in the context of the program running, not necessarily in writing. Personal style and familiarity determine what is easier to write.

To answer the part about which runs fastest, I whipped up three data files, each 500000 lines long, and processed them on a slower computer that was doing as little else as possible. One data file has fixed positions in fixed columns, one has a variable column position in a fixed number of columns, and one has a variable column position in a variable number of columns. The latter two are just for fun because split() with a fixed index won't work there:

Code:

$ head -n 4 01a.data
75nKylBe x3Se7I94 PCmwA90F b3px4aBF 41% ZV7iU0su
HUmP0BaC cyJZKvGS DZY6kCof 8YfRJXxB 16% jksP5D2I
WEeYx6aS 9tLI0aPy CBFy0tRr Ze65vBIO 54% dPyNVhuq
4R6ajwqs nSThEVgc aROkBX4K lhoaGW4H 10% 8bH9DWQX

$ head -n 4 01b.data
83% t3ZjfLwz 70Q4FAPf QLacDTzg mrCY9tvc 67zSes0Z
XT0x482f rQ54bWSV xSUYC5gB CvqWz9d6 70% tR52MLZz
34% YN8OemgE UnFMzNmL AdjK0SFU KUaJyIn8 aibPj80w
Xe3fxBHu 1z9ZNA2L YtGI1Uh2 55% ktQEdioO uC9hSH7N

$ head -n 4 01c.data
13% H4eP1xWB Ua4qkLRM 539Qs08i drzGD17Z
76sZXwK4 Gy90t1u7 K1NVadwG 89% med7x5Ti M6yIaGj0
PIxOKL8g 48% QjLglfBE DcYXaMnJ ixZfq4wS cr0jO5aG PhK8RJeY
4ajKdMy6 VMbrd192 8p5zrfWl 69% BGUqjnAJ YhWnF1qC hlbfXyq4
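
The generator for these files was not posted. As a rough sketch of how similar lines could be produced, assuming 8-character alphanumeric tokens and a two-digit percentage as in the samples above (token and make_line are hypothetical helper names, and the column counts are guesses from the sample lines):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @chars = ('A'..'Z', 'a'..'z', 0..9);

# Hypothetical helper: one random 8-character alphanumeric token.
sub token { return join '', map { $chars[rand @chars] } 1..8 }

# Hypothetical helper: a line of $ncols columns with a two-digit
# percentage placed at zero-based column $pos.
sub make_line {
    my ($ncols, $pos) = @_;
    my @cols = map { token() } 1..$ncols;
    $cols[$pos] = (10 + int rand 90) . '%';
    return join ' ', @cols;
}

# Fixed position in a fixed number of columns (like 01a.data).
print make_line(6, 4), "\n";

# Variable position in a fixed number of columns (like 01b.data).
print make_line(6, int rand 6), "\n";

# Variable position in a variable number of columns (like 01c.data).
my $n = 5 + int rand 3;
print make_line($n, int rand $n), "\n";
```

Wrapping each print in a loop of 500000 iterations and redirecting the output to a file would give data sets of the size used here.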

In the first data set, the one where both split() and regex can be used, I found that split() was a bit faster in subjective time. I expected that. However, regex seemed a bit faster on the variable positions than on the fixed positions. I did five runs of each, hoping that the slower hardware would amplify the differences. I saved the results from the last three runs of each:

split() on fixed columns

real 0m3.735s user 0m3.713s sys 0m0.020s
real 0m3.763s user 0m3.733s sys 0m0.030s
real 0m3.707s user 0m3.677s sys 0m0.030s

Regex on fixed columns

real 0m4.877s user 0m4.819s sys 0m0.050s
real 0m4.988s user 0m4.968s sys 0m0.020s
real 0m4.915s user 0m4.885s sys 0m0.020s

Regex on variable column position in a fixed number of columns

real 0m4.483s user 0m4.452s sys 0m0.030s
real 0m4.511s user 0m4.464s sys 0m0.040s
real 0m4.455s user 0m4.405s sys 0m0.050s

Regex on a variable column position in a variable number of columns:

real 0m4.523s user 0m4.503s sys 0m0.020s
real 0m4.517s user 0m4.515s sys 0m0.000s
real 0m4.513s user 0m4.477s sys 0m0.030s

I tried to make the scripts as similar as possible. Here is the first script:

Code:

#!/usr/bin/perl
# extract the 5th column using split

use strict;
use warnings;
use integer;

while (<>) {
    chomp;
[redacted]
    print $line[4],qq(\n);
}

exit ( 0 );

and the second script:

Code:

#!/usr/bin/perl
# extract a column using regex

use strict;
use warnings;
use integer;

while (<>) {
    chomp;
[redacted]
    print $pct,qq(\n);
}

exit ( 0 );

Not quite scientific but suggests some choices.
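
For a comparison that does not depend on wall-clock runs of whole scripts, the core Benchmark module can time the two approaches side by side. This is only a sketch: the extraction lines in the scripts above are redacted, so by_split and by_regex are assumed stand-ins rather than the code that produced those timings, and the iteration count is arbitrary:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# A sample line in the fixed-column layout shown above.
my $line = '75nKylBe x3Se7I94 PCmwA90F b3px4aBF 41% ZV7iU0su';

# Assumed stand-in: take the fifth whitespace-separated field.
sub by_split { return (split ' ', $_[0])[4] }

# Assumed stand-in: capture digits plus a percent sign bounded by whitespace.
sub by_regex { my ($pct) = $_[0] =~ /(?<!\S)(\d+%)(?!\S)/; return $pct }

# Run each sub a fixed number of times and print a comparison table.
cmpthese(500_000, {
    split => sub { by_split($line) },
    regex => sub { by_regex($line) },
});
```

The rates printed will vary by machine and perl version, so treat them as a relative comparison only.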

pan64 09-14-2018 02:55 AM

Quote:

Originally Posted by andres_fever (Post 5902652)
I started with Perl a few days ago and it has been difficult for me. I could not get this working in Perl.

It would be nice to see what you tried; then we can probably help you fix it...

individual 09-14-2018 09:15 AM

Quote:

Originally Posted by Turbocapitalist (Post 5903133)
When I first read that I saw 'easier' as 'faster' in the context of the program running, not necessarily in writing. Personal style and familiarity determine what is easier to write.

In order to answer the part about which runs fastest, I whipped up three data files each 500000 lines long and processed them on a slower computer that was doing as little else as possible.

SNIPPED

Not quite scientific but suggests some choices.

Hey, thanks for taking the time to test those out. I agree and disagree with your statement about personal style dictating which is easier to write. For a beginner who may not have used regular expressions before, split is most likely going to be easier to write and comprehend.

