LinuxQuestions.org


MikeyCarter 01-05-2008 09:53 AM

Software for parsing dates and addresses
 
I've got an interesting challenge.

I get text ads which have a date and an address somewhere in the body. The date could be in any format: 01/01/08, Sat 5th, Sat Jan 5th, Sat & Sun 5 & 6... you name it.


Currently I maintain a PHP script which looks for address and date patterns. It's about 80% accurate but must be monitored closely. I'm thinking of redesigning it with some type of AI behind it.

Before I start coding, I just wanted to check here to see if anyone knows of some Linux software which does this (or part of it) already.

PatrickNew 01-06-2008 06:58 PM

How does your current script work? Seems to me that regular expressions might be the tool you need. Just google them and you'll find all you need.

MikeyCarter 01-07-2008 09:13 AM

Quote:

Originally Posted by PatrickNew (Post 3013751)
How does your current script work? Seems to me that regular expressions might be the tool you need. Just google them and you'll find all you need.

It currently works with regular expressions: about 60 of them, all ranked. I even check against the current date (i.e. if the ad says Sun, is that this Sunday or last Sunday?). Or take Sun 6: there is a high probability that only one or two dates in a given year match Sun 6th (at least within a range of a few months).
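To illustrate the "Sun 6" check, here is a rough sketch in Python (not the actual PHP script). The function name `resolve_weekday_day` and the 90-day window are assumptions made up for this example. It scans the days around a reference date and returns the closest one whose weekday and day of month both match.

```python
import datetime as dt

# Three-letter weekday abbreviations, indexed the same way as date.weekday()
# (Monday = 0 ... Sunday = 6).
WEEKDAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def resolve_weekday_day(weekday, day, today, window_days=90):
    """Return the date closest to `today` that falls on `weekday` and has
    day-of-month `day`, searching `window_days` either side; None if no match.

    Hypothetical helper: the name and the default window are assumptions.
    Raises ValueError if `weekday` is not a recognisable weekday name.
    """
    target = WEEKDAYS.index(weekday.lower()[:3])
    candidates = []
    for offset in range(-window_days, window_days + 1):
        d = today + dt.timedelta(days=offset)
        if d.day == day and d.weekday() == target:
            candidates.append(d)
    if not candidates:
        return None
    # On a tie, min() keeps the first candidate, i.e. the earlier (past) date.
    return min(candidates, key=lambda d: abs((d - today).days))


# An ad read on Monday 7 Jan 2008 that says "Sun 6"
print(resolve_weekday_day("Sun", 6, dt.date(2008, 1, 7)))
```

A narrower window gives fewer ambiguous candidates but misses ads that are posted far ahead of the event, so the window size would need tuning against real ads.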

The problem is that I always get ads with a slight deviation from the pattern (not to mention spelling mistakes).


(e.g. Orangeville, 57 Broadway vs. Orangeville., 57th Broadway. I have to include [.,]{1,2} and [t]?[h]?, and also filter for cases where someone has squished the number against a street name that starts with th.)
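As a sketch of that kind of tolerant matching, here is one way it could look in Python (again, not the actual PHP script). The names `ADDRESS_RE` and `parse_address` are made up for illustration, and the pattern assumes single-word town and street names. The ordinal suffix is only consumed when a word boundary follows it, so a street name starting with "th" that is squished against the number is left intact.

```python
import re

# Hypothetical pattern for "<town><separators><number>[suffix] <street>".
ADDRESS_RE = re.compile(
    r"""
    (?P<town>[A-Za-z]+)         # town name, e.g. Orangeville
    \s*[.,]{1,2}\s*             # one or two separators: "," or ".,"
    (?P<number>\d+)             # street number
    (?:(?:st|nd|rd|th)\b)?      # ordinal suffix, only if a word ends right after it
    \s*
    (?P<street>[A-Za-z]+)       # street name, e.g. Broadway
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_address(text):
    """Return a dict with town, number and street, or None if nothing matches."""
    match = ADDRESS_RE.search(text)
    return match.groupdict() if match else None


for ad in ("Orangeville, 57 Broadway",
           "Orangeville., 57th Broadway",
           "Orangeville, 57Thornton"):
    print(parse_address(ad))
```

Multi-word names and misspellings are not handled here; those would still need extra patterns or some fuzzy matching on top.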


Hence my question: does anyone know of software that already does this job, so that I'm not reinventing the wheel?

jlinkels 01-07-2008 02:15 PM

--Removed--

Now I see what you mean... this is not user input, this is some kind of pattern finding.

jlinkels


All times are GMT -5.