Software for parsing dates and addresses
I've got an interesting challenge.
I get text ads which have a date and an address somewhere in the body. The date could be in any format: 01/01/08 or Sat 5th or Sat Jan 5th or Sat & Sun 5 & 6... you name it. I currently maintain a PHP script which looks for address and date patterns. It's about 80% accurate but must be monitored closely. I'm thinking of redesigning it with some type of AI behind it. Before I start coding, I wanted to check here to see if anyone knows of some Linux software which already does this (or part of it).
How does your current script work? It seems to me that regular expressions might be the tool you need. Search for them online and you'll find all you need.
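To make the regular-expression suggestion concrete, here is a minimal sketch that covers only the four date shapes mentioned in the first post. It is written in Python rather than PHP, and the names `DATE_RE` and `find_dates` are made up for illustration. It assumes English day and month names and will not catch every variant that shows up in real ads.

```python
import re

# Building blocks (assumed English names; full names like "Saturday" also match).
DAY = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)(?:[a-z]*day)?\.?"
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?"
NUM = r"\d{1,2}(?:st|nd|rd|th)?"

DATE_RE = re.compile(
    rf"""
    \b\d{{1,2}}/\d{{1,2}}/\d{{2,4}}\b     # numeric form: 01/01/08
    |
    \b{DAY}(?:\s*&\s*{DAY})?             # one day name, or two joined by "&"
    \s+(?:{MONTH}\s+)?                   # optional month name
    {NUM}(?:\s*&\s*{NUM})?\b             # day number, or two joined by "&"
    """,
    re.IGNORECASE | re.VERBOSE,
)


def find_dates(text):
    """Return every substring of text that looks like one of the date shapes above."""
    return [m.group(0) for m in DATE_RE.finditer(text)]


print(find_dates("Garage sale Sat & Sun 5 & 6, also open 01/01/08"))
```

Keeping the pieces (`DAY`, `MONTH`, `NUM`) separate makes it easier to add a new variant when an ad turns up that the pattern misses.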
The problem is that I keep getting ads with a slight deviation from the pattern, not to mention spelling mistakes. For example, "Orangeville, 57 Broadway" vs "Orangeville., 57th Broadway": I have to include [.,]{1,2} and [t]?[h]?, and also guard against the case where someone has squished the number onto a street name which starts with "th". Hence my question: does anyone know of software which already does this job, so I'm not reinventing the wheel?
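For what it's worth, the deviations described above can be sketched as one tolerant pattern plus a fuzzy lookup for misspelled town names. This is Python, not the PHP script in question, and `ADDRESS_RE`, `parse_address`, `closest_town` and the `KNOWN_TOWNS` list are all hypothetical names for illustration. It assumes a "Town, number street" layout and a known list of towns. The lookahead is one way to stop the ordinal suffix from eating the start of a street name like "Thomas".

```python
import difflib
import re

# Hypothetical reference list; a real one would come from your own data.
KNOWN_TOWNS = ["Orangeville", "Shelburne"]

ADDRESS_RE = re.compile(
    r"""
    \b(?P<town>[a-z]+)                  # town name, e.g. Orangeville
    [.,]{1,2}\s*                        # "," or ".," separator
    (?P<number>\d+)                     # street number
    (?:(?:st|nd|rd|th)(?![a-z]))?       # ordinal suffix, only if no letter follows
    \s*                                 # zero spaces allowed ("57Thomas")
    (?P<street>[a-z]+)                  # first word of the street name
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_address(text):
    """Return a dict with town, number and street, or None if nothing matches."""
    m = ADDRESS_RE.search(text)
    return m.groupdict() if m else None


def closest_town(name):
    """Map a possibly misspelled town to the nearest known town, or None."""
    hits = difflib.get_close_matches(name, KNOWN_TOWNS, n=1, cutoff=0.8)
    return hits[0] if hits else None


print(parse_address("Orangeville., 57th Broadway"))
print(closest_town("Orangevile"))
```

The fuzzy lookup only helps when you have a list to compare against; free-form street names would still need manual checking.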
Now I see what you mean. This is not user input; it's some kind of pattern finding.
jlinkels