Programming
This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.
I need some advice, as the title says, with parsing natural text. I don't even know if this is called natural text, but I mean human language. For example, I need to transform:
at nine o'clock wake me up
at 9, wake me up
wake me up at eight thirty
to
hours: 9, min: 0, msg: wake me up
hours: 9, min: 0, msg: wake me up
hours: 8, min: 30, msg: wake me up
I started to do it with regexps, but I don't know enough about them. I found on the internet the following related solutions:
* using a Domain Specific Language (DSL). I don't see my problem as a DSL though.
* Computer Science parsing: making a grammar, parser, lexer, BNF, etc.
* Recursive descent, which AFAIU means nesting regexps in a saner way.
* Studying regexps *as a language*, meaning: knowing what you are using. I know regexps have a bad reputation, but maybe that is because the tutorials on the internet only cover them shallowly.
And that's it. I searched the forum and got no hits. I searched again with "Click to find similar threads" and got 5 threads, but none of them talk about natural language.
I'm looking for advice, as in "how would you approach the problem?".
Quote:
* using a Domain Specific Language (DSL). I don't see my problem as a DSL though.
You essentially want to implement a DSL, but that's a description of your problem, not a technique to solve it.
Quote:
* Recursive descent, which AFAIU means nesting regexps in a saner way.
Recursive descent is a technique for implementing parsers.
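To make that concrete, here is a minimal sketch of the idea: one function (here, one method) per grammar rule, each consuming tokens from left to right. The toy grammar in the docstring, the tiny word tables, and the names `TimeParser` and `parse_time` are all made up for illustration. This particular grammar happens not to need any actual recursion, but the structure is the same.

```python
# Hypothetical, deliberately tiny word tables; extend as needed.
HOURS = {"eight": 8, "nine": 9, "ten": 10}
MINUTES = {"fifteen": 15, "thirty": 30}


class TimeParser:
    """Recursive descent over a token list: one method per grammar rule.

    time   := hour [minute] ["o'clock"]
    hour   := DIGITS | hour word
    minute := DIGITS | minute word
    """

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Look at the current token without consuming it.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def time(self):
        hours = self.hour()
        minutes = self.minute()  # optional rule, yields 0 when absent
        if self.peek() == "o'clock":
            self.pos += 1
        return hours, minutes

    def hour(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            value = int(tok)
        elif tok in HOURS:
            value = HOURS[tok]
        else:
            raise ValueError(f"expected an hour, got {tok!r}")
        self.pos += 1
        return value

    def minute(self):
        tok = self.peek()
        if tok is not None and tok.isdigit():
            value = int(tok)
        elif tok in MINUTES:
            value = MINUTES[tok]
        else:
            return 0  # nothing matched: consume no token
        self.pos += 1
        return value


def parse_time(text):
    # Whitespace tokenizer; a real lexer would also handle punctuation.
    return TimeParser(text.lower().split()).time()
```

For instance, `parse_time("eight thirty")` walks `time` -> `hour` -> `minute` and returns a `(hours, minutes)` tuple.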
It looks like the language you want to parse could be described like this:
Code:
<words>* at <time> <words>*
Since you don't care about the structure of the non-time part of the phrase, you can probably parse this with regular expressions. You'll need to think about all the different ways a time can be expressed. I think natural language processing techniques would be overkill: you're not interested in nouns vs. verbs vs. adverbs, subjects vs. objects...
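A rough sketch of the regex route in Python, just to show the shape of it. The word tables, the function name `parse_reminder`, and the choice to strip the time phrase and keep the rest as the message are my assumptions, and the pattern only covers the three example phrasings plus forms like "at 8:30". It does not handle "half past", am/pm, and so on.

```python
import re

# Hypothetical lookup tables; only a few minute words are listed.
HOURS = {w: i for i, w in enumerate(
    "one two three four five six seven eight nine ten eleven twelve".split(),
    start=1)}
MINUTES = {"fifteen": 15, "thirty": 30, "forty-five": 45}


def _alt(words):
    # Build a regex alternation, longest words first.
    return "|".join(sorted(words, key=len, reverse=True))


# "at" + hour + optional minute + optional "o'clock"
TIME_RE = re.compile(
    r"\bat\s+"
    r"(?P<hour>\d{1,2}|" + _alt(HOURS) + r")"
    r"(?:(?::|\s+)(?P<min>\d{2}|" + _alt(MINUTES) + r"))?"
    r"(?:\s+o'clock)?"
    r"\b",
    re.IGNORECASE,
)


def parse_reminder(text):
    """Return {'hours', 'min', 'msg'} or None if no time phrase is found."""
    m = TIME_RE.search(text)
    if m is None:
        return None
    hour_tok = m.group("hour").lower()
    min_tok = (m.group("min") or "0").lower()
    hours = int(hour_tok) if hour_tok.isdigit() else HOURS[hour_tok]
    minutes = int(min_tok) if min_tok.isdigit() else MINUTES[min_tok]
    if not (0 <= hours <= 23 and 0 <= minutes <= 59):
        return None
    # Everything outside the matched time phrase is the message.
    msg = text[:m.start()] + " " + text[m.end():]
    msg = " ".join(msg.split()).strip(" ,.;:")
    return {"hours": hours, "min": minutes, "msg": msg}


for phrase in ("at nine o'clock wake me up",
               "at 9, wake me up",
               "wake me up at eight thirty"):
    print(parse_reminder(phrase))
```

The time pattern is the only part that needs to grow as you discover new ways people write times; the surrounding words never need to be understood.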
Thanks to everyone!! ntubski, BowCatShot, smallpond, and the rest that I have already thanked. I find NLP overkill for this and, sincerely, I don't want to learn it. I like regexes better, so I will go that way.