Counting to a specific character

mikehalfogre · 02-12-2010, 10:45 AM

Hello,

I am loading cXtXdXsX disk names into a script, and at present I have only accounted for there being 3 characters from c to t. I need to change that to a variable count so that it handles any number of characters, such as c1t, c10t, or c100t.

I can then use that count with the following code to strip off the leading characters, either by making the 3 in the substr call a variable or by dispatching to multiple versions of raw based on the count returned.

sub raw {
    $substr = substr ($_, 3);
    $raw1 = substr ($substr, 0, -4);
    $raw = lc($raw1);
}

Any tips would be appreciated on how to count from the c to the t inclusive, so I get 3, 4, 5, etc.

nadroj · 02-12-2010, 11:02 AM

First: Next time you post a programming question, tell us what programming language it is specific to. I'm assuming you are using Perl for this.

Now, it seems like what you want is regular expressions. If you've never heard of them or used them, then certainly look into them, as they are extremely powerful. So you want a string "c" followed by any number of characters, up to a "t"? A regular expression for that might be "c.*t". This may be represented in Perl using

Code:

my $inputString = "c100t";

if ( $inputString =~ /c(.*)t/ )
{   # string matched
    my $matchedString = $1;
    print "match is $1\n";
}
else
{
    # string didn't match
}

Again, look into how REs work in Perl if you don't already know. In short, the "=~" says it's going to match the string on the left against the RE on the right. The first and second "/"s denote the start and end of the RE. The "." matches any single character (it won't match newlines; you need an extra option for that, the /s modifier, so look into it if need be), and the "*" means zero or more of them. So the RE will match "c" followed by zero or more characters followed by a "t". The parentheses mean that part will be captured into a variable, namely "$1". If there were a second pair of parentheses, that matched string would be put into a variable "$2", etc.

So for the above example input of "c100t", "$1" will be "100" (exclusive of "c" and "t"). If you want "$1" to be "c100t" then move the parentheses to include that string, so it would be changed to "(c.*t)". Similarly for other combinations including/excluding "c" or "t".

Of course after you have this string you can do whatever with it (i.e. count length, convert to lc, etc), which is all straightforward.
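As a minimal sketch of the counting step asked about in the first post: capture the prefix including the "c" and the "t", then take its length with Perl's built-in length function. The helper name prefix_length and the use of \d+ (digits only between "c" and "t") are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the number of characters from the leading
# "c" through the first "t" inclusive, or undef if the name does not start
# with c<digits>t. Assumes the controller number is digits only.
sub prefix_length {
    my ($name) = @_;
    return undef unless $name =~ /^(c\d+t)/;
    return length($1);
}

for my $name ("c1t0d0s2", "c10t0d0s2", "c100t0d0s2") {
    printf "%s -> %d\n", $name, prefix_length($name);
}
```

The returned count could stand in for the hard-coded 3 in substr($_, 3).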

mikehalfogre · 02-12-2010, 12:22 PM

Thanks for the tips. I think I can make use of this, and apologies for the omission of the language. Yes, it is Perl. The only thing I'm not sure of in your response: the code you have below will print the c*t to the screen and not count it, correct?

my $inputString = "c100t";

if ( $inputString =~ /c(.*)t/ )
{   # string matched
    my $matchedString = $1;
    print "match is $1\n";
}
else
{
    # string didn't match
}

If I wanted it to provide the number of characters, I would then have to run the variable through Perl's length function, or use a Unix command like wc -c, so I can change my script to:

sub raw {
    $substr = substr ($_, $count);
    $raw1 = substr ($substr, 0, -4);
    $raw = lc($raw1);
}

Since the last 4 are usually in the same setup, I should be able to manage the same string search for it as well. This has been invaluable. Thanks for the tips.

nadroj · 02-12-2010, 12:32 PM

Quote:

is the code you have below will print the c*t to the screen and not count it correct?

Quote:

Originally Posted by nadroj

Of course after you have this string you can do whatever with it (i.e. count length, convert to lc, etc), which is all straightforward.

So, yes, the code is just to extract the part matched by that regular expression. After you have the string ("100t", "100tx", whatever) then it is straightforward to check the length of it, convert the case, etc. As an example, I simply saved it to a variable and printed it. The main thing is just getting that string. Or at least I thought the main thing was that you wanted to get some string starting with c, ending with t, but you don't know how many characters are in between. I think I'm more confused about what your requirement actually is... if it isn't what I just described, then explain further.

If you know exactly how many characters, you can specify in the RE. For example, matching "c", then any 3 characters, then a "t":

Code:

if ( $inputString =~ /c(.{3})t/ )

Where "{n}" specifies exactly "n" characters. So something like "c123t" would match but "c12t" would not match.

If you expect them to be exactly 3 numbers (digits) between you could do

Code:

if ( $inputString =~ /c(\d{3})t/ )
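To make the difference between the two quantified patterns concrete, here is a small sketch. The helper names any3 and digits3 are hypothetical wrappers around the two REs shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns from the post.
# any3:    "c", any 3 characters, "t"
# digits3: "c", exactly 3 digits, "t"
sub any3    { my ($s) = @_; return $s =~ /c(.{3})t/  ? 1 : 0; }
sub digits3 { my ($s) = @_; return $s =~ /c(\d{3})t/ ? 1 : 0; }

for my $s ("c123t", "c12t", "cabct") {
    printf "%-6s any3=%d digits3=%d\n", $s, any3($s), digits3($s);
}
```

Here "c123t" matches both patterns, "c12t" matches neither, and "cabct" matches only the "any 3 characters" version.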

A random search brings up this reference which may help if you want to learn more or modify the RE: http://www.troubleshooters.com/codec...impleWildcards.

If your input is something like "c123t c456t", etc., then you have to do a non-greedy ("ungreedy") match, and a loop. If this applies, let me know and I'll do an example.
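For reference, a minimal sketch of what a non-greedy match in a loop might look like, assuming the tokens are separated by arbitrary text. The helper name find_tokens is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns every "c...t" token found in the input,
# left to right. The non-greedy ".*?" stops at the first "t" after each
# "c", and the /g flag makes each loop pass resume after the last match.
sub find_tokens {
    my ($input) = @_;
    my @tokens;
    while ($input =~ /(c.*?t)/g) {
        push @tokens, $1;
    }
    return @tokens;
}

print "$_\n" for find_tokens("c123t c456t");
```

With a greedy "(c.*t)" the same input would produce a single match spanning from the first "c" to the last "t".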

Hopefully you are seeing how useful they are!

forrestt · 02-12-2010, 12:40 PM

And, why would you need to do that? As nadroj has alluded to:

Code:

#!/usr/bin/perl

sub raw {
        my $raw = shift;
        $raw =~ s/c(.*)t.*/$1/;
        return lc($raw);
}

my $inputString = "c100t3d0s1";

print &raw($inputString);

HTH

Forrest

mikehalfogre · 02-12-2010, 01:32 PM

You're pretty close on the explanation. What I am trying to do is dummy-proof the way disks are handled with Solaris MPxIO and assigned to ASM.

c5t4849544143484920373730353032303331393235d0s2
c5t4849544143484920373730353032303331393236d0s2
c5t4849544143484920373730353032303331393237d0s2
c5t4849544143484920373730353032303331393238d0s2

I am stripping off the c5t and the d0s2 above so I can use the target name to track down the correct device in /devices/scsi_vhci, but I need to use the variable approach you first mentioned because the channel identifier can be 1 or 2 characters (commonly), and if my memory is working right it could conceivably go to 3 characters, although I don't recall seeing that many channels in a box before. The other thing I will be able to use is the same fuzzy string matching for the last 4 characters to strip those off, whether they are d1s2, d10s2, or d255s2. While this scenario is unlikely with MPxIO, I have seen it once before and am preparing for it.

Your help above explains it better than I have been able to find on many of the pages I've searched through, but I also may not have been using the best keywords to search against.

nadroj · 02-12-2010, 01:43 PM

Well, I have no clue what any of that stuff is. But try this example, and try to modify it to do whatever exactly you need.

Code:

#! /usr/bin/perl

my @array =     ("c5t4849544143484920373730353032303331393235d0s2",
                "c5t4849544143484920373730353032303331393236d0s2",
                "c5t4849544143484920373730353032303331393237d0s2",
                "c5t4849544143484920373730353032303331393238d0s2");
           
foreach $disk (@array)
{
    if ( $disk =~/(c\d{1,3}t).*(d\d+s2)/ )
    {
        # $1 now stores whatever the first part represents (e.g. "c5t")
        # $2 now stores whatever the last part represents (e.g. "d255s2")

        print "first part is '$1', last part is '$2'\n";
    }
    else
    {
        print "'$disk' is in invalid format\n";
    }
}

Prints

Code:

first part is 'c5t', last part is 'd0s2'
first part is 'c5t', last part is 'd0s2'
first part is 'c5t', last part is 'd0s2'
first part is 'c5t', last part is 'd0s2'

EDIT: FYI:

Code:

/(c\d{1,3}t).*(d\d+s2)/

Matches a "c", followed by 1 to 3 (inclusive) digits, and then a "t". All of that is stored in "$1". The last part matches a "d", then 1 or more digits, then "s2", and stores it in "$2".
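Since the stated goal is the target name between those two parts, the same idea can be turned around to capture the middle instead. The following is only a sketch: the helper name target_of, the anchors, the assumption that the target is made of hexadecimal digits, and the second disk name are all illustrative rather than taken from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strips the leading c<digits>t and the trailing
# d<digits>s<digits> from a disk name and returns the lower-cased target,
# or undef if the name is not in that shape.
# Assumes the target consists of hexadecimal digits only.
sub target_of {
    my ($disk) = @_;
    return undef unless $disk =~ /^c\d+t([0-9A-Fa-f]+)d\d+s\d+$/;
    return lc($1);
}

my @disks = (
    "c5t4849544143484920373730353032303331393235d0s2",
    "c10t600A0B80001234ADd255s2",    # made-up name for illustration
);

for my $disk (@disks) {
    my $target = target_of($disk);
    print defined $target
        ? "$disk -> $target\n"
        : "'$disk' is in an invalid format\n";
}
```

Anchoring with ^ and $ lets the regex engine backtrack to the final d<digits>s<digits>, so a "d" inside a hexadecimal target is not mistaken for the start of the suffix.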

forrestt · 02-12-2010, 02:02 PM

I had already figured you were trying to manipulate Solaris SCSI Strings. I just don't see the need to use the approach you were trying to use. I am only familiar with these strings being of the form "c.*t.*d.*s.*". Perhaps this will give you what you want.

Code:

#!/usr/bin/perl

sub SCSI {
        my $raw = shift;
        my %SCSI;

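        # Bail out if the name is not in cXtXdXsX form; otherwise $1..$4
        # would not hold captures from this string.
        return undef unless $raw =~ m/c(.*)t(.*)d(.*)s(.*)/;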

        $SCSI{'controller'} = $1;
        $SCSI{'target'}     = $2;
        $SCSI{'disk'}       = $3;
        $SCSI{'slice'}      = $4;

        return \%SCSI
}

my @array =     ("c5t4849544143484920373730353032303331393235d0s2",
                "c5t4849544143484920373730353032303331393236d0s2",
                "c5t4849544143484920373730353032303331393237d0s2",
                "c5t4849544143484920373730353032303331393238d0s2");

foreach $disk (@array) {
        $SCSI = &SCSI($disk);

        foreach $key (keys(%$SCSI)) {
                print "$key - $$SCSI{$key}\n";
        }
}

HTH

Forrest

mikehalfogre · 02-12-2010, 02:06 PM

I see what that's doing -- I'm doing some of that already with how to match usernames and groups based on some internal standards -- that makes much more sense to me now. You hit it right on the nose, and now I know how to handle getting this set up. Thanks for the help -- I'm still learning how to set up the code you put in for editing, that's still a bit of a sticking point for me, but getting better.

Thanks again!!

mikehalfogre · 02-12-2010, 02:12 PM

forrestt, for what I am doing I actually need to count the c(.*)t and then strip it off the disk name, same with the d(.*)s2 portion. When you convert the raw disks on Sol10 with MPxIO, you aren't chowning /dev/rdsk, but the ssd@g,<wwn disk identifier>:<slice>, and the wwn disk identifier is used to name the disk as opposed to the old method of c3t5d47s2. There are too many chances for mixups when you have to change VTOCs and chown the right slice, so the script I am creating just asks for the new disks, a disk to model after, who should own it and the mode, and does all the work for the user. It's two-fold: you don't have to do anything but know some variables, which eliminates typos (with some built-in checks to verify variables), and if you have to put 50 disks into RAW use for Oracle, this is a big time-saver -- even over a couple of for loops.

Thanks for the input. I will play with both your script and nadroj's to see which is better suited to my needs in this script.