LinuxQuestions.org
Old 12-27-2003, 05:42 PM   #1
jhrbek
LQ Newbie
 
Registered: Dec 2003
Distribution: RedHat 8
Posts: 17

Rep: Reputation: 0
PHP: egrep For Loops and If Statements


Hi, hopefully someone can help me.

I have a really stupid problem that is driving me crazy. I've been using php for over a year now and I have never seen it behave this way. Here is what's happening. I have a for loop with a few if statements inside. Normally, each if statement would be evaluated on each iteration of the for loop. However, this is not the case this time. I think it might have something to do with my use of the ereg function, nonetheless, I am at a loss. Here is a sample of my script:

Code:
for($i="0"; $i < "50"; $i++) {
 // Parse our data

 $string=$rawdata[$i];
  // Catch the names
   if (ereg ("([^0-9])(,)", $string, $regs)) {
   $data[$i][name]=$string;
}
 // Catch the regular phone lines
   if (ereg ("([1-9]\-[1-9])", $string) && !(ereg ("(Teen)", $string))) {
   $data[$i][phone]=$string;
}
  // Catch the teen phone lines
   if (ereg ("(Teen)", $string)) {
   $data[$i][tphone]=$string;
}
}
What is happening is that instead of creating a 2 dimensional array, it is making a messed up array, still 2-d, but not correct.

If I had 10 names (this is for an address book), each name has a phone number (sometimes 2), an address, etc. So, I should get an array like:

Code:
array (
   0: (
        [name]
        [address]
        [phone]
        [etc...]
       )
   1: (
        [name]
        [address]
        [phone]
        [etc...]
       )
)
Above is what I expect to get. Instead, this is what I get:

Code:
array (
   0: (
        [name]
        )
   1: (
        [address]
        )
   2: (
        [phone]
        )
   3: (
        [etc...]
        )
)

It seems to increment the counter after ereg evaluates a statement to be true. Why is this so? I've tried while loops, for loops, and everything else I can think of. This is just plain frustrating. Please help if you think you can.

-j
 
Old 12-28-2003, 08:37 AM   #2
codedv
Member
 
Registered: Nov 2003
Location: Slough, UK
Distribution: Debian
Posts: 146

Rep: Reputation: 15
Could you please post an example of the format in which the data appears?
 
Old 12-28-2003, 10:01 AM   #3
jhrbek
LQ Newbie
 
Registered: Dec 2003
Distribution: RedHat 8
Posts: 17

Original Poster
Rep: Reputation: 0
Here is some sample data:
Code:
Zeller, Charles & Eve
453-5432
2509 Jefferson Road
(265)
Ziemba, & Darlene
275-2949_
Teen 345-2332
2720 Overlook Cr.
(2078)
Zimmer, Richard
255-2600
1528 Trumball Terrace
(4333)
Its structure is:

NAME
PHONE
TEEN PHONE(sometimes)
ADDRESS
LOT NUMBER

That's what I need in the array. If you paste that into a file and then read it in with those ereg statements, let me know if you can get it to work.
 
Old 12-28-2003, 01:30 PM   #4
codedv
Member
 
Registered: Nov 2003
Location: Slough, UK
Distribution: Debian
Posts: 146

Rep: Reputation: 15
It appears that you might have got into a mess with your regular expressions. The reason you appear to be using them here is to check for the presence of data.

Let us evaluate the one you have in your first if statement:
if (ereg ("([^0-9])(,)", $string, $regs))

This regular expression "([^0-9])(,)" will match a single non-numeric character followed by a comma "," anywhere in $string. Let's use your second address and apply this expression:

Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)

The match would have occurred at "a," in "Ziemba," (the last letter of the surname plus the comma). You then load the entire contents of $string into the array item [name].

Now the second if statement:
if (ereg ("([1-9]\-[1-9])", $string) && !(ereg ("(Teen)", $string)))

This regular expression "([1-9]\-[1-9])" will match a digit between 1 and 9 (not including 0), a "-", and another digit between 1 and 9. Let's now apply this regular expression to the variable $string:

Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)

This regular expression matches the string (at "5-2" in "275-2949"); however, you only proceed if the string does not contain the word "Teen". When it does not, the entire contents of $string are loaded into the array item [phone].

The third if statement:
if (ereg ("(Teen)", $string))

Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)

Quite simply you proceed to load the entire contents of $string variable into the array item [tphone].

I'm quite sure that this behaviour is not what you intended. The main thing wrong with the regular expressions is that they do not recognise the repetition of characters, e.g. [1-9]\-[1-9] matches only three characters, so it can never match a whole telephone number such as 123-1234 (only the "3-1" in the middle). It appears that you are also making the assumption that the ereg function somehow modifies the $string variable.
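
To see this for yourself, here is a minimal sketch. It uses preg_match() rather than ereg() (an assumption on my part, the pattern itself is unchanged) and a made-up test number:

```php
<?php
// Made-up test value, not taken from the data file.
$string = "123-1234";

// Same pattern as in the second if statement, wrapped in / delimiters.
preg_match('/[1-9]\-[1-9]/', $string, $m);

echo $m[0], "\n";    // prints 3-1 (only three characters are matched)
echo $string, "\n";  // prints 123-1234 ($string is not modified by the match)
```

The match only tells you that the pattern occurs somewhere in $string; it neither consumes nor changes the variable.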

Assuming that the data you supply is in the following format, you can construct several regular expressions to match each data item and then join them together in one regular expression to match the entire string.

I am assuming the following format:
Code:
[name]{NL}
[number]{NL}
Teen [teen]{NL}
[address]{NL}
[lot]{NL}
{NL} = new line character
[teen] and [number] shall be in the format xxx-xxxx
[lot] shall be in the format (:number:)

Firstly to match name we can use the following regular expression:
(.+)\n

The period "." will match any character except newline characters.
The plus "+" requires that there be one or more of the preceding character.
and "\n" matches a newline character.

Anything included in brackets, i.e. "(.+)", will be captured separately. Therefore this regular expression will match:

"Ziemba, & Darlene{nl}" and capture the text "Ziemba, & Darlene"

We then need to match the first telephone number:
([0-9]{3}\-[0-9]{4})\n

The character class [0-9] will match any numeric character.
{3} means there must be exactly 3 of the preceding character.
\- matches the - character
[0-9]{4} matches exactly 4 numeric characters.
Finally we match the newline character.

Therefore this regular expression will match:

"275-2949{NL}" and capture the text "275-2949"

We then need to match the teen telephone number:
(Teen ([0-9]{3}\-[0-9]{4})\n)?

This regular expression is similar to the one that matches the number; however, it is enclosed entirely in brackets. This is referred to as grouping. As the teen number may or may not appear, we use the "?" to signify that there will be either zero or one of the preceding grouped expression.

Therefore this regular expression will match:

"Teen 345-2332{NL}" and capture the text "Teen 345-2332{NL}" and "345-2332"

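A small sketch of how the optional group behaves with preg_match(), using two made-up inputs (one with the Teen line, one without). The variable names are only for illustration, and the pattern is anchored with ^ to keep the example short:

```php
<?php
// Two sample inputs: one with the optional "Teen" line, one without.
$with    = "Teen 345-2332\n2720 Overlook Cr.\n";
$without = "2509 Jefferson Road\n";

// The outer group is optional; the inner group holds just the number.
$pattern = '/^(Teen ([0-9]{3}-[0-9]{4})\n)?(.+)\n/';

preg_match($pattern, $with, $a);
preg_match($pattern, $without, $b);

echo "[", $a[2], "]\n";  // [345-2332]
echo "[", $b[2], "]\n";  // [] (the optional group did not take part, so it is empty)
echo "[", $b[3], "]\n";  // [2509 Jefferson Road]
```

When the optional group does not take part in the match, its entries are empty strings, so the groups after it keep the same index numbers.
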
Next to match the address:
(.+)\n

This will match a single line containing one or more characters and will therefore, match:

"2720 Overlook Cr.{NL}" and capture the text "2720 Overlook Cr."

Finally to match the lot number:
\(([0-9]+)\)

Note that to match the brackets they have to be escaped. This expression will match one or more numeric characters enclosed in brackets:

"(2078)" will be matched and the text "2078" will be captured.

Finally we need to put the regular expression together. When we do we come up with this:
(.+)\n([0-9]{3}\-[0-9]{4})\n(Teen ([0-9]{3}\-[0-9]{4})\n)?(.+)\n\(([0-9]+)\)

This will match an entire string containing all the details in the format I mentioned above. If it does not then we can assume the data is corrupt.

The following script uses this regular expression and also the preg_match() function. I prefer the Perl-style regular expressions as they are a lot more powerful. The function is as follows:

preg_match (regex, string, matches);

The regex argument should be a valid regular expression enclosed in forward slashes (solidi), i.e. "/regex/". The string is the text the regular expression is being applied to.

The matches argument is used to load the match and all the captured text into an array. i.e. if this expression were applied to the string I used above $matches would be an array in the following format:

Code:
$matches[0] => 
   "Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)"

$matches[1] => "Ziemba, & Darlene"
$matches[2] =>  "275-2949"
$matches[3] =>  "Teen 345-2332{NL}"
$matches[4] => "345-2332"
$matches[5] => "2720 Overlook Cr."
$matches[6] => "2078"
I hope this script is what you are looking for. If not then hopefully you can modify the regular expression to fit your needs:
PHP Code:
<?php

/* load in some test data */
$rawdata = Array (
"Zeller, Charles & Eve
453-5432
2509 Jefferson Road
(265)"
,

"Ziemba, & Darlene
275-2949
Teen 345-2332
2720 Overlook Cr.
(2078)"
,

"Zimmer, Richard
255-2600
1528 Trumball Terrace
(4333)"
);

for ($i = 0; $i < 3; $i++) { /* N.B: don't enclose numbers in quotes unless they are text - it confuses things */

  $string = $rawdata[$i];

  /* this regular expression matches an entire $string in the following format:
   *
   * name[NL]
   * number[NL] (number must be in format xxx-xxxx)
   * Teen teen[NL] (optional - if present must be in format xxx-xxxx)
   * address[NL]
   * lot[NL] (must be in format (:number:))
   *
   * where [NL] = newline character
   *
   * If the data in $string fails to conform to this format it will be skipped.
   * The data collected from this expression is loaded into the array $matches by the preg_match() function
   */

  if (! preg_match ("/(.+)\n([0-9]{3}\-[0-9]{4})\n(Teen ([0-9]{3}\-[0-9]{4})\n)?(.+)\n\(([0-9]+)\)/", $string, $matches))
    continue;

  /* load the collected data into the $data array */
  /* N.B index names should be included in quotes */
  $data[$i]['name'] = $matches[1];
  $data[$i]['number'] = $matches[2];
  $data[$i]['teen'] = $matches[4];
  $data[$i]['address'] = $matches[5];
  $data[$i]['lot'] = $matches[6];

}

/* for demonstration purposes dump the $data array variable */
print_r ($data);
?>

Last edited by codedv; 12-28-2003 at 01:31 PM.
 
Old 12-28-2003, 09:06 PM   #5
jhrbek
LQ Newbie
 
Registered: Dec 2003
Distribution: RedHat 8
Posts: 17

Original Poster
Rep: Reputation: 0
Thanks for the detailed answer! Some of the things you mentioned I didn't know about and others, I forgot about because I grew lazy with my coding practices.

I tried using your code and met with limited success. For some reason the long regexp statement is evaluating everything as false, so I get a big empty array. Better than before though, at least it is formatted correctly. BTW, I had no idea you could put all those regexp cases together like that, thanks for showing that to me!

I think my lines are not terminated with a \n though. The original text file came from a Windows machine. So, is there an easy way to determine how my EOLs are marked? I think that is why the regexp statement is failing.

-j
 
Old 12-29-2003, 02:22 AM   #6
codedv
Member
 
Registered: Nov 2003
Location: Slough, UK
Distribution: Debian
Posts: 146

Rep: Reputation: 15
That may well be the reason. On Windows, lines in text files are terminated with a carriage return and linefeed, "\r\n". PHP on Linux does not map \r\n to \n when it reads a file, so the \r stays at the end of each line. You could always modify the regex to allow for Windows style EOLs though. Every time you check for a newline, check for an optional \r as well:

\r?\n - will do it
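
As a rough sketch of both steps (checking for \r and matching either style of EOL), assuming a hand-built sample record with Windows line endings. One detail to watch: "." can also match \r, so (.+) may capture a trailing \r. The sketch uses [^\r\n]+ instead of .+ for that reason, which is my own substitution:

```php
<?php
// Hand-built sample record with Windows line endings, for illustration only.
$string = "Zimmer, Richard\r\n255-2600\r\n1528 Trumball Terrace\r\n(4333)\r\n";

// Quick check: does the data contain carriage returns at all?
$hasCr = strpos($string, "\r") !== false;
echo $hasCr ? "Windows style EOLs found\n" : "No carriage returns found\n";

// Same structure as the pattern above, with \r?\n for every newline
// and [^\r\n]+ in place of .+ so that no \r ends up in the captures.
$pattern = '/([^\r\n]+)\r?\n([0-9]{3}-[0-9]{4})\r?\n'
         . '(Teen ([0-9]{3}-[0-9]{4})\r?\n)?([^\r\n]+)\r?\n\(([0-9]+)\)/';

if (preg_match($pattern, $string, $matches)) {
    echo $matches[1], "\n";  // name
    echo $matches[5], "\n";  // address
    echo $matches[6], "\n";  // lot
}
```

The same pattern still matches data that uses plain \n, since the \r is optional everywhere.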
 
Old 12-31-2003, 10:23 AM   #7
jhrbek
LQ Newbie
 
Registered: Dec 2003
Distribution: RedHat 8
Posts: 17

Original Poster
Rep: Reputation: 0
Still stuck.

I just don't know why this isn't working. I modified your code to see what was happening and the preg match is evaluating to false for everything. Not sure why. I attached my entire script this time.

Code:
<?php

 // vars and data source
 $filename = "beaverlake.txt";
 $fsize = filesize($filename);
 $handle = fopen ($filename, "rt");

 while (!feof ($handle)) {
  $rawdata[] = fgets($handle, $fsize);
 }
 // Close our file handle
  fclose ($handle);

//print_r($rawdata);

$arrSize = count($rawdata);

for($i=0; $i < 10; $i++) {

  $string = $rawdata[$i];
//  echo "STRING: $string\n";

if (!preg_match ("/(.+)\r?\n([0-9]{3}-[0-9]{4})\r?\n(Teen ([0-9]{3}-[0-9]{4})\r?\n)?(.+)\r?\n(([0-9$
{  echo "STRING: $string\n"; continue; }

  print_r($matches);

  $data[$i]['name'] = $matches[1];
  $data[$i]['number'] = $matches[2];
  $data[$i]['teen'] = $matches[4];
  $data[$i]['address'] = $matches[5];
  $data[$i]['lot'] = $matches[6];

}
print_r($data);
?>
Also, if you want the entire datafile, I can email it to you. Just send me a message at jhrbek-postfix at gplsinc dot com

Last edited by jhrbek; 12-31-2003 at 10:25 AM.
 
Old 12-31-2003, 10:39 AM   #8
codedv
Member
 
Registered: Nov 2003
Location: Slough, UK
Distribution: Debian
Posts: 146

Rep: Reputation: 15
Entire data file????

Are you not loading each address into an array called $rawdata, each array item being one address? Or are you loading the entire file into one variable?
 
Old 12-31-2003, 10:44 AM   #9
jhrbek
LQ Newbie
 
Registered: Dec 2003
Distribution: RedHat 8
Posts: 17

Original Poster
Rep: Reputation: 0
I loop through the file, save it to an array ($rawdata). Then I close the file. If you uncomment the print_r($rawdata) you will see all the data is there, in a big one dimensional array...well, probably not as big for you. My data file has around 1,500 lines, so I have a big array. I originally wanted to avoid storing an array altogether and process the data line by line, but problems necessitated an array so I could see the data better.

I tried running trim() on each line, appending a \n to the end of it and then saving it to an array, but that didn't work either. I thought trim would kill any "unknown" characters that might be causing problems, it didn't. I even tried putting commas on the end of each line in the array after I trimmed it and still nothing. (of course modifying the regexp pattern to look for commas instead of \r\n or \n)

Last edited by jhrbek; 12-31-2003 at 10:53 AM.
 
Old 12-31-2003, 11:56 AM   #10
codedv
Member
 
Registered: Nov 2003
Location: Slough, UK
Distribution: Debian
Posts: 146

Rep: Reputation: 15
Looking at your code - you use fgets() to read a single line at a time from the file. You insert the data read by each call into an array item of $rawdata[]. This means that each single element of your array will contain only one line of the file, not an entire address.

You need to have a method of identifying the beginning and end of an address from the file.
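
One possible way to do that without adding delimiters, sketched under the assumption that every address ends with its lot number line "(number)": read the whole file into one string and let preg_match_all() pick out each record. The sample text below stands in for the file contents, the teen line uses a non-capturing group (?: ) so the group numbers differ from the earlier script, and records that do not fit the pattern are simply skipped:

```php
<?php
// Sample text standing in for the file contents; with a real file,
// file_get_contents($filename) would supply this string instead.
$text = "Zeller, Charles & Eve\n453-5432\n2509 Jefferson Road\n(265)\n"
      . "Ziemba, & Darlene\n275-2949\nTeen 345-2332\n2720 Overlook Cr.\n(2078)\n"
      . "Zimmer, Richard\n255-2600\n1528 Trumball Terrace\n(4333)\n";

// One record: name, number, optional teen number, address, lot.
// The m modifier lets ^ match at the start of every line.
$pattern = '/^([^\r\n]+)\r?\n([0-9]{3}-[0-9]{4})\r?\n'
         . '(?:Teen ([0-9]{3}-[0-9]{4})\r?\n)?([^\r\n]+)\r?\n\(([0-9]+)\)/m';

// PREG_SET_ORDER gives one array of captures per matched record.
preg_match_all($pattern, $text, $sets, PREG_SET_ORDER);

$data = array();
foreach ($sets as $m) {
    $data[] = array(
        'name'    => $m[1],
        'number'  => $m[2],
        'teen'    => $m[3],  // empty string when there is no Teen line
        'address' => $m[4],
        'lot'     => $m[5],
    );
}

print_r($data);
```

With the sample text this gives three entries, one per address, each holding all of its fields.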
 
Old 12-31-2003, 11:58 AM   #11
jhrbek
LQ Newbie
 
Registered: Dec 2003
Distribution: RedHat 8
Posts: 17

Original Poster
Rep: Reputation: 0
I tried using commas, but that didn't work. I'll have to think about it some more. Hmm.
 
Old 12-31-2003, 12:00 PM   #12
codedv
Member
 
Registered: Nov 2003
Location: Slough, UK
Distribution: Debian
Posts: 146

Rep: Reputation: 15
It needs to be a character or word which will not appear in the address. Something like:

<end>

or

#

or, even easier, separate each address with two blank lines.
 
Old 12-31-2003, 12:08 PM   #13
jhrbek
LQ Newbie
 
Registered: Dec 2003
Distribution: RedHat 8
Posts: 17

Original Poster
Rep: Reputation: 0
I'm going to try something else, this isn't working. The delimiters you suggested I have already tried.
 
Old 12-31-2003, 12:24 PM   #14
codedv
Member
 
Registered: Nov 2003
Location: Slough, UK
Distribution: Debian
Posts: 146

Rep: Reputation: 15
Ok then. If you are still stuck send me an email with the file attached - www.sccode.com - use the contact me link and I'll see if I can sort it out.
 
Old 12-31-2003, 12:26 PM   #15
jhrbek
LQ Newbie
 
Registered: Dec 2003
Distribution: RedHat 8
Posts: 17

Original Poster
Rep: Reputation: 0
cool, thanks. I'll try to crack it myself for awhile longer but this is going on a week and I'm just feeling really retarded right now.
 
  

