LinuxQuestions.org


jhrbek 12-27-2003 05:42 PM

PHP: egrep For Loops and If Statements
 
Hi, hopefully someone can help me. :)

I have a really stupid problem that is driving me crazy. I've been using php for over a year now and I have never seen it behave this way. Here is what's happening. I have a for loop with a few if statements inside. Normally, each if statement would be evaluated on each iteration of the for loop. However, this is not the case this time. I think it might have something to do with my use of the ereg function, nonetheless, I am at a loss. Here is a sample of my script:

Code:

for($i="0"; $i < "50"; $i++) {
 // Parse our data

 $string=$rawdata[$i];
  // Catch the names
  if (ereg ("([^0-9])(,)", $string, $regs)) {
  $data[$i][name]=$string;
}
 // Catch the regular phone lines
  if (ereg ("([1-9]\-[1-9])", $string) && !(ereg ("(Teen)", $string))) {
  $data[$i][phone]=$string;
}
  // Catch the teen phone lines
  if (ereg ("(Teen)", $string)) {
  $data[$i][tphone]=$string;
}
}

What is happening is that instead of creating a 2 dimensional array, it is making a messed up array, still 2-d, but not correct.

If I had 10 names (this is for an address book), each name has a phone number (sometimes 2), an address, etc. So, I should get an array like:

Code:

array (
  0: (
        [name]
        [address]
        [phone]
        [etc...]
      )
  1: (
        [name]
        [address]
        [phone]
        [etc...]
      )
)

Above is what I expect to get. Instead, this is what I get:

Code:

array (
  0: (
        [name]
        )
  1: (
        [address]
        )
  2: (
        [phone]
        )
  3: (
        [etc...]
        )
)


It seems to increment the counter after ereg evaluates a statement to be true. Why is this so? I've tried while loops, for loops, and everything else I can think of. This is just plain frustrating. Please help if you think you can. :)

-j

codedv 12-28-2003 08:37 AM

Could you please post an example of the format in which the data appears?

jhrbek 12-28-2003 10:01 AM

Here is some sample data:
Code:

Zeller, Charles & Eve
453-5432
2509 Jefferson Road
(265)
Ziemba, & Darlene
275-2949_
Teen 345-2332
2720 Overlook Cr.
(2078)
Zimmer, Richard
255-2600
1528 Trumball Terrace
(4333)

Its structure is:

NAME
PHONE
TEEN PHONE(sometimes)
ADDRESS
LOT NUMBER

That's what I need in the array. If you paste that into a file and then read it in with those egrep statements, let me know if you can get it to work. :)

codedv 12-28-2003 01:30 PM

It appears that you might have got into a mess with your regular expressions. You appear to be using them here only to check for the presence of data.

Let us evaluate the one you have in your first if statement:
if (ereg ("([^0-9])(,)", $string, $regs))

This regular expression "([^0-9])(,)" will match a single non-numeric character followed by a comma "," anywhere in $string. Let's use your second address and apply this expression:

Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)

The match occurs at the "a," at the end of "Ziemba,". You then load the entire contents of $string into the array item [name].

Now the second if statement:
if (ereg ("([1-9]\-[1-9])", $string) && !(ereg ("(Teen)", $string)))

This regular expression "([1-9]\-[1-9])" will match a digit between 1 and 9 (not including 0), a "-", and another digit between 1 and 9. Let's now apply this regular expression to the variable $string:

Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)

This regular expression matches the string; however, you only proceed if the string does not contain the word "Teen". This string does contain it, so nothing is loaded into the array item [phone].

The third if statement:
if (ereg ("(Teen)", $string))

Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)

Quite simply, you proceed to load the entire contents of the $string variable into the array item [tphone].

I'm quite sure that this behaviour is not what you intended. The main thing wrong with the regular expressions is that they do not recognise the repetition of characters. For example, [1-9]\-[1-9] matches only three characters, so it cannot match the whole of a number such as 123-1234; it finds only the "3-1" in the middle, and it fails outright when a 0 sits next to the hyphen. It appears that you are also making the assumption that the ereg function somehow modifies the $string variable.
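A minimal sketch of that point, assuming preg_match() stands in for ereg() (ereg() was removed in PHP 7). The variable names are only illustrative:

```php
<?php
// The original pattern, rewritten in preg syntax.
$loose = '/[1-9]-[1-9]/';

// It finds three characters inside the number, not the whole number.
preg_match($loose, '123-1234', $m);
$found = $m[0];

// A zero on either side of the hyphen defeats it entirely.
$zeroCase = preg_match($loose, '250-0600');

// With repetition counts and anchors, the whole number is checked.
$strict   = '/^[0-9]{3}-[0-9]{4}$/';
$strictOk = preg_match($strict, '250-0600');

echo "$found $zeroCase $strictOk\n";
```

None of these calls changes the string being tested; preg_match() only reports whether (and where) the pattern matched.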

Assuming that the data you supply is in the following format, you can construct several regular expressions to match each data item and then join them together in one regular expression to match the entire string.

I am assuming the following format:
Code:

[name]{NL}
[number]{NL}
Teen [teen]{NL}
[address]{NL}
[lot]{NL}

{NL} = new line character
[teen] and [number] shall be in the format xxx-xxxx
[lot] shall be in the format (:number:)

Firstly to match name we can use the following regular expression:
(.+)\n

The period "." will match any character except newline characters.
The plus "+" requires that there be one or more of the preceding character.
and "\n" matches a newline character.

Anything included in brackets, i.e. "(.+)", will be captured separately. Therefore this regular expression will match:

"Ziemba, & Darlene{nl}" and capture the text "Ziemba, & Darlene"

We then need to match the first telephone number:
([0-9]{3}\-[0-9]{4})\n

The character class [0-9] will match any numeric character.
{3} means there must be exactly 3 of the preceding character.
\- matches the - character
[0-9]{4} matches exactly 4 numeric characters.
Finally we match the newline character.
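As a quick check of this piece on its own, here is a small sketch using preg_match(); the test strings are made up for illustration:

```php
<?php
// Three digits, a hyphen, four digits, then a newline.
$pattern = '/([0-9]{3}-[0-9]{4})\n/';

// A well-formed line: the number is captured without the newline.
$ok     = preg_match($pattern, "275-2949\n", $m);
$number = $m[1];

// Only two leading digits: {3} cannot be satisfied, so there is no match.
$short = preg_match($pattern, "75-2949\n");

echo "$ok $number $short\n";
```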

Therefore this regular expression will match:

"275-2949{NL}" and capture the text "275-2949"

We then need to match the teen telephone number:
(Teen ([0-9]{3}\-[0-9]{4})\n)?

This regular expression is similar to the one that matches the number; however, it is enclosed entirely in brackets. This is referred to as grouping. As the teen number may or may not appear, we use the "?" to signify that there will be either zero or one of the preceding grouped expression.

Therefore this regular expression will match:
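A short sketch of how the optional group behaves with preg_match(), assuming the two test strings below (they are illustrative only):

```php
<?php
$teen = '/(Teen ([0-9]{3}-[0-9]{4})\n)?/';

// Teen line present: the outer group holds the whole line,
// the inner group holds just the number.
$r1 = preg_match($teen, "Teen 345-2332\n", $with);

// Teen line absent: the match still succeeds, but it is empty
// and neither group takes part in it.
$r2 = preg_match($teen, "2720 Overlook Cr.\n", $without);

var_dump($with, $without);
```

Because the whole group is optional, a record without a teen line does not make the overall match fail.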

"Teen 345-2332{NL}" and capture the text "Teen 345-2332{NL}" and "345-2332"

Next to match the address:
(.+)\n

This will match a single line containing one or more characters and will therefore, match:

"2720 Overlook Cr.{NL}" and capture the text "2720 Overlook Cr."

Finally to match the lot number:
\(([0-9]+)\)

Note that to match the brackets they have to be escaped. This expression will match one or more numeric characters enclosed in brackets:

"(2078)" will be matched and the text "2078" will be captured.

Finally we need to put the regular expression together. When we do we come up with this:
(.+)\n([0-9]{3}\-[0-9]{4})\n(Teen ([0-9]{3}\-[0-9]{4})\n)?(.+)\n\(([0-9]+)\)

This will match an entire string containing all the details in the format I mentioned above. If it does not then we can assume the data is corrupt.
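One way to see the joining step is to keep each piece in its own variable and concatenate them. This is only a sketch; the variable names are illustrative, and the record is the third sample address with "\n" line endings assumed:

```php
<?php
// Each piece as described above, kept separate for readability.
$name    = '(.+)\n';
$number  = '([0-9]{3}-[0-9]{4})\n';
$teen    = '(Teen ([0-9]{3}-[0-9]{4})\n)?';
$address = '(.+)\n';
$lot     = '\(([0-9]+)\)';

// Join the pieces and wrap them in delimiters for preg_match().
$pattern = '/' . $name . $number . $teen . $address . $lot . '/';

$record  = "Zimmer, Richard\n255-2600\n1528 Trumball Terrace\n(4333)";
$matched = preg_match($pattern, $record, $m);

// Groups 3 and 4 (the teen line) are empty strings for this record.
print_r($m);
```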

The following script uses this regular expression and also the preg_match() function. I prefer the perl style regular expressions as they are a lot more powerful. The function is as follows:

preg_match (regex, string, matches);

The regex argument should be a valid regular expression enclosed in forward slashes, i.e. "/regex/". The string is the text the regular expression is being applied to.

The matches argument is used to load the match and all the captured text into an array. i.e. if this expression were applied to the string I used above $matches would be an array in the following format:

Code:

$matches[0] =>
  "Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)"

$matches[1] => "Ziemba, & Darlene"
$matches[2] => "275-2949"
$matches[3] => "Teen 345-2332{NL}"
$matches[4] => "345-2332"
$matches[5] => "2720 Overlook Cr."
$matches[6] => "2078"

I hope this script is what you are looking for. If not then hopefully you can modify the regular expression to fit your needs:
PHP Code:

<?php

/* load in some test data */
$rawdata = Array (
"Zeller, Charles & Eve
453-5432
2509 Jefferson Road
(265)"
,

"Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)"
,

"Zimmer, Richard
255-2600
1528 Trumball Terrace
(4333)"
);

for ($i = 0; $i < 3; $i++) { /* N.B: don't enclose numbers in quotes unless they are text - it confuses things */

  $string = $rawdata[$i];

  /* this regular expression matches an entire $string in the following format:
   *
   * name[NL]
   * number[NL] (number must be in format xxx-xxxx)
   * Teen teen[NL] (optional - if present must be in format xxx-xxxx)
   * address[NL]
   * lot[NL] (must be in format (:number:))
   *
   * where [NL] = newline character
   *
   * If the data in $string fails to conform to this format it will be skipped.
   * The data collected from this expression is loaded into the array $matches by the preg_match() function
   */

  if (! preg_match ("/(.+)\n([0-9]{3}\-[0-9]{4})\n(Teen ([0-9]{3}\-[0-9]{4})\n)?(.+)\n\(([0-9]+)\)/", $string, $matches))
    continue;

  /* load the collected data into the $data array */
  /* N.B index names should be included in quotes */
  $data[$i]['name'] = $matches[1];
  $data[$i]['number'] = $matches[2];
  $data[$i]['teen'] = $matches[4];
  $data[$i]['address'] = $matches[5];
  $data[$i]['lot'] = $matches[6];

}

/* for demonstration purposes dump the $data array variable */
print_r ($data);
?>


jhrbek 12-28-2003 09:06 PM

Thanks for the detailed answer! Some of the things you mentioned I didn't know about and others, I forgot about because I grew lazy with my coding practices. :D

I tried using your code and I met it with limited success. For some reason the long regexp statement is evaluating everything as false, so I get a big empty array. :( Better than before though, at least it is formatted correctly. BTW, I had no idea you could put all those regexp cases together like that, thanks for showing that to me!

I think my lines are not terminated with a \n though. The original text file came from a windows machine. So, is there an easy way to determine how my EOL's are marked? I think that is why the regexp statement is failing.

-j

codedv 12-29-2003 02:22 AM

That may well be the reason. On Windows, lines are terminated with a carriage return and linefeed "\r\n" in text files. PHP should really map \r\n into \n but if you are opening the file in binary mode it won't. You could always modify the regex to allow for Windows-style EOLs though. Every time you check for a newline, check for an optional \r as well:

\r?\n - will do it
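A rough sketch of both steps: checking which line ending a file uses, and matching with \r?\n. The sample string stands in for the file contents, and eol_style() is a hypothetical helper, not a PHP built-in:

```php
<?php
// Stand-in for the contents of a text file saved on Windows.
$contents = "Zimmer, Richard\r\n255-2600\r\n1528 Trumball Terrace\r\n(4333)\r\n";

// Hypothetical helper: report the line-ending style found in $text.
function eol_style(string $text): string
{
    if (strpos($text, "\r\n") !== false) {
        return 'CRLF';
    }
    return strpos($text, "\r") !== false ? 'CR' : 'LF';
}
$style = eol_style($contents);

// A pattern that expects a bare \n fails on this data...
$strictHit = preg_match('/(.+)\n([0-9]{3}-[0-9]{4})\n/', $contents);

// ...while \r?\n accepts either ending.
$tolerantHit = preg_match('/(.+)\r?\n([0-9]{3}-[0-9]{4})\r?\n/', $contents, $m);

// "." also matches \r, so the captured name may carry one; strip it.
$name = rtrim($m[1], "\r");

echo "$style $strictHit $tolerantHit $name\n";
```

An alternative is to strip every "\r" from the data before matching, which leaves the original pattern unchanged.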

jhrbek 12-31-2003 10:23 AM

Still stuck. :(

I just don't know why this isn't working. I modified your code to see what was happening and the preg match is evaluating to false for everything. :( Not sure why. I attached my entire script this time.

Code:

<?php

 // vars and data source
 $filename = "beaverlake.txt";
 $fsize = filesize($filename);
 $handle = fopen ($filename, "rt");

 while (!feof ($handle)) {
  $rawdata[] = fgets($handle, $fsize);
 }
 // Close our file handle
  fclose ($handle);

//print_r($rawdata);

$arrSize = count($rawdata);

for($i=0; $i < 10; $i++) {

  $string = $rawdata[$i];
//  echo "STRING: $string\n";

if (!preg_match ("/(.+)\r?\n([0-9]{3}-[0-9]{4})\r?\n(Teen ([0-9]{3}-[0-9]{4})\r?\n)?(.+)\r?\n(([0-9$
{  echo "STRING: $string\n"; continue; }

  print_r($matches);

  $data[$i]['name'] = $matches[1];
  $data[$i]['number'] = $matches[2];
  $data[$i]['teen'] = $matches[4];
  $data[$i]['address'] = $matches[5];
  $data[$i]['lot'] = $matches[6];

}
print_r($data);
?>

Also, if you want the entire datafile, I can email it to you. Just send me a message at jhrbek-postfix at gplsinc dot com :D

codedv 12-31-2003 10:39 AM

Entire data file????

Are you not loading each address into an array called $rawdata, each array item being an address? Or are you loading the entire file into one variable?

jhrbek 12-31-2003 10:44 AM

I loop through the file, save it to an array ($rawdata). Then I close the file. If you uncomment the print_r($rawdata) you will see all the data is there, in a big one dimensional array...well, probably not as big for you. My data file has around 1,500 lines, so I have a big array. I originally wanted to avoid storing an array all together and process the data line by line, but problems necessitated an array so I could see the data better.

I tried running trim() on each line, appending a \n to the end of it and then saving it to an array, but that didn't work either. I thought trim would kill any "unknown" characters that might be causing problems, it didn't. I even tried putting commas on the end of each line in the array after I trimmed it and still nothing. :( (of course modifying the regexp pattern to look for commas instead of \r\n or \n)

codedv 12-31-2003 11:56 AM

Looking at your code - you use fgets() to read a single line at a time from the file. You insert the data read by this line into an array item of $rawdata[]. This means that each single element of your array will contain only one line of the file. Not an entire address.

You need to have a method of identifying the beginning and end of an address from the file.
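One possible approach, sketched under the assumption (taken from the sample data) that every address ends with a lot-number line such as "(2078)". The $lines array is a hypothetical stand-in for what fgets() returns:

```php
<?php
// One element per line, as the fgets() loop produces them.
$lines = [
    "Ziemba, & Darlene\n", "275-2949\n", "Teen 345-2332\n",
    "2720 Overlook Cr.\n", "(2078)\n",
    "Zimmer, Richard\n", "255-2600\n", "1528 Trumball Terrace\n", "(4333)\n",
];

$records = [];
$current = '';
foreach ($lines as $line) {
    $line = rtrim($line, "\r\n");      // drop either style of line ending
    if ($line === '') {
        continue;                      // ignore blank lines
    }
    $current .= $line . "\n";
    // Assumption: a line holding only "(digits)" closes the address.
    if (preg_match('/^\([0-9]+\)$/', $line)) {
        $records[] = rtrim($current, "\n");
        $current = '';
    }
}

print_r($records);
```

Each element of $records is then one complete address, which is the form the long regular expression expects.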

jhrbek 12-31-2003 11:58 AM

I tried using commas, but that didn't work. I'll have to think about it some more. Hmm.

codedv 12-31-2003 12:00 PM

It needs to be a character or word which will not appear in the address. Something like:

<end>

or

#

or, even easier, separate each address with two blank lines.
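If the file were edited so that blank lines separate the addresses, splitting could look roughly like this. The $contents string is a made-up stand-in for the whole file read into one variable:

```php
<?php
// Two addresses separated by two blank lines.
$contents = "Zeller, Charles & Eve\n453-5432\n2509 Jefferson Road\n(265)\n\n\n"
          . "Zimmer, Richard\n255-2600\n1528 Trumball Terrace\n(4333)\n";

// Split wherever two or more line breaks occur in a row (LF or CRLF),
// so one blank line or several all act as a separator.
$records = preg_split('/(?:\r?\n){2,}/', trim($contents));

print_r($records);
```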

jhrbek 12-31-2003 12:08 PM

I'm going to try something else, this isn't working. The delimiters you suggested I have already tried. :(

codedv 12-31-2003 12:24 PM

OK then. If you are still stuck, send me an email with the file attached (www.sccode.com, use the contact me link) and I'll see if I can sort it out.

jhrbek 12-31-2003 12:26 PM

cool, thanks. I'll try to crack it myself for awhile longer but this is going on a week and I'm just feeling really retarded right now. :D


All times are GMT -5.