LinuxQuestions.org
Old 08-15-2014, 10:02 AM   #1
dvadell
LQ Newbie
 
Registered: Aug 2010
Posts: 6

Rep: Reputation: 0
[Advice] on parsing natural text


Hi everybody,

As the title says, I need some advice on parsing natural text. I don't even know if this is called natural text, but I mean human language. For example, I need to transform:

at nine o'clock wake me up
at 9, wake me up
wake me up at eight thirty

to

hours: 9, min: 0, msg: wake me up
hours: 9, min: 0, msg: wake me up
hours: 8, min: 30, msg: wake me up

I started doing it with regexps, but I don't know enough about them. I found the following related approaches on the internet:

* Using a Domain Specific Language (DSL). I don't see my problem as a DSL, though.
* Computer Science parsing: making a grammar, parser, lexer, BNF, etc.
* Recursive descent, which AFAIU means nesting regexps in a saner way.
* Studying regexps *as a language*, that is, really knowing the tool you are using. I know regexps have a bad reputation, but maybe that's because the tutorials on the internet only cover them shallowly.

And that's it. I searched the forum, with no hits. I searched again with "Click to find similar threads" and got 5 results, none of which talk about natural language.

I'm looking for advice, as in "how would you approach the problem".

Thanks in advance!
-- Diego.
 
Old 08-15-2014, 11:43 AM   #2
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,223

Rep: Reputation: 5320
Well, you could start by searching for the right phrase, which is "natural language processing" or "nlp".

This is not a topic I'm familiar with, but a quick search got me the following YouTube playlist of video lectures from a Coursera course on NLP:

Columbia University - Natural Language Processing

I don't know what your favorite programming language is, but a Google search for "nlp library" got me the following for Python:

Natural Language Toolkit

And searching for that library on pyvideo.org got me the following videos:

Human as a Second Language: Succeeding with the Natural Language Toolkit
HOWTO: Teach Python to Read

I also saw this: Natural language processing: an introduction

Last edited by dugan; 08-15-2014 at 12:06 PM.
 
Old 08-15-2014, 11:53 AM   #3
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,140

Rep: Reputation: 1263
Here's a thing I downloaded a couple of months ago but haven't had a chance to look at yet.

http://honnibal.wordpress.com/2013/1...dency-parsing/
 
Old 08-15-2014, 08:18 PM   #4
BowCatShot
LQ Newbie
 
Registered: Aug 2014
Posts: 15

Rep: Reputation: Disabled
Look up the date command and see if it will do the conversion for you once you've done the parsing. For example:

echo $(date "+%r" --date="9")

gives

09:00:00 AM

echo $(date "+%l %M %S" --date="9")

gives

9 00 00
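If the rest of the program ends up in a scripting language rather than the shell, the same step can be done there. Below is a minimal Python sketch, assuming the parsing stage has already produced the hours and minutes as integers. The function name next_occurrence and its now parameter are made up for this illustration, and it uses naive local time, so it ignores daylight-saving changes.

```python
from datetime import datetime, timedelta


def next_occurrence(hours, minutes, now=None):
    """Return the next datetime at which the clock reads hours:minutes.

    `now` is a hypothetical parameter that exists only to make the
    function easy to test; it defaults to the current local time.
    """
    if now is None:
        now = datetime.now()
    target = now.replace(hour=hours, minute=minutes, second=0, microsecond=0)
    # If that time has already passed today, use tomorrow instead.
    if target <= now:
        target += timedelta(days=1)
    return target


if __name__ == "__main__":
    alarm = next_occurrence(9, 0)
    # Same layout as the second date example above, but zero-padded.
    print(alarm.strftime("%H %M %S"))
```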
 
Old 08-15-2014, 09:27 PM   #5
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
Quote:
Originally Posted by dvadell
* Using a Domain Specific Language (DSL). I don't see my problem as a DSL, though.
You essentially want to implement a DSL, but that's a description of your problem, not a technique to solve it.

Quote:
* Recursive descent, which AFAIU means nesting regexps in a saner way.
Recursive descent is a technique for implementing parsers.
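To make that concrete, here is a rough Python sketch of what a recursive descent parser looks like: one method per grammar rule, each consuming tokens and calling the methods for the rules it is built from. The grammar in the comments, the Parser class, and the tiny HOURS and MINUTES tables are all assumptions made up for this illustration. This particular grammar has no nesting, so nothing actually recurses; the point is only the rule-per-function structure.

```python
HOURS = {"eight": 8, "nine": 9, "ten": 10}   # hypothetical, deliberately tiny
MINUTES = {"fifteen": 15, "thirty": 30}      # hypothetical, deliberately tiny


class Parser:
    def __init__(self, text):
        # Crude tokenizer: lowercase, drop commas, split on whitespace.
        self.tokens = text.lower().replace(",", " ").split()
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    # phrase := words "at" time words
    def phrase(self):
        before = self.words()
        if self.advance() != "at":
            raise ValueError("expected 'at'")
        hours, minutes = self.time()
        after = self.words()
        return hours, minutes, " ".join(before + after)

    # words := any run of tokens up to "at" or the end of input
    def words(self):
        out = []
        while self.peek() not in (None, "at"):
            out.append(self.advance())
        return out

    # time := hour [ minute | "o'clock" ]
    def time(self):
        hours = self.hour()
        minutes = 0
        if self.peek() in MINUTES:
            minutes = MINUTES[self.advance()]
        elif self.peek() == "o'clock":
            self.advance()
        return hours, minutes

    # hour := digits | hour-word
    def hour(self):
        token = self.advance()
        if token is not None and token.isdigit():
            return int(token)
        if token in HOURS:
            return HOURS[token]
        raise ValueError(f"expected an hour, got {token!r}")


if __name__ == "__main__":
    print(Parser("wake me up at eight thirty").phrase())
    # (8, 30, 'wake me up')
```

A message that itself contains "at" after the time (say, "at 9 meet me at home") is not handled by this sketch; the second words rule stops at the second "at".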

It looks like the language you want to parse could be described like this:
Code:
<words>* at <time> <words>*
Since you don't care about the structure of the non-time part of the phrase, you can probably parse this with regular expressions. You'll need to think about all the different ways a time can be expressed. I think natural language processing techniques would be overkill: you're not interested in nouns vs verbs vs adverbs, subjects vs objects...
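As a starting point, the pattern above could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the word tables, the names TIME_RE and parse_reminder, and the output keys (taken from the first post) are choices made for this example. It covers just the time spellings from the three sample phrases plus "H:MM", and it knows nothing about am/pm.

```python
import re

# Hypothetical lookup tables; extend them with whatever spellings you need.
HOUR_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5, "six": 6,
    "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11, "twelve": 12,
}
MINUTE_WORDS = {"fifteen": 15, "thirty": 30, "forty-five": 45}

HOUR = r"(?P<hour>\d{1,2}|" + "|".join(HOUR_WORDS) + ")"
MIN_WORD = "(?P<min_word>" + "|".join(MINUTE_WORDS) + ")"

# "at", an hour, then optionally ":MM", a spelled-out minute, or "o'clock",
# and finally an optional trailing comma.
TIME_RE = re.compile(
    r"\bat\s+" + HOUR
    + r"(?::(?P<min_digits>\d{2})|\s+" + MIN_WORD + r"|\s+o'clock)?\b,?",
    re.IGNORECASE,
)


def parse_reminder(text):
    """Return a dict with hours, min and msg, or None if no time is found."""
    match = TIME_RE.search(text)
    if match is None:
        return None
    hour_token = match.group("hour").lower()
    hours = int(hour_token) if hour_token.isdigit() else HOUR_WORDS[hour_token]
    if match.group("min_digits"):
        minutes = int(match.group("min_digits"))
    elif match.group("min_word"):
        minutes = MINUTE_WORDS[match.group("min_word").lower()]
    else:
        minutes = 0
    if hours > 23 or minutes > 59:
        return None
    # Whatever is left once the time phrase is cut out is the message.
    rest = text[:match.start()] + " " + text[match.end():]
    return {"hours": hours, "min": minutes, "msg": " ".join(rest.split())}


if __name__ == "__main__":
    for phrase in ("at nine o'clock wake me up",
                   "at 9, wake me up",
                   "wake me up at eight thirty"):
        print(parse_reminder(phrase))
    # {'hours': 9, 'min': 0, 'msg': 'wake me up'}
    # {'hours': 9, 'min': 0, 'msg': 'wake me up'}
    # {'hours': 8, 'min': 30, 'msg': 'wake me up'}
```

Keeping the hour and minute spellings in tables means that supporting a new way of writing a time is mostly a matter of adding entries, rather than reworking the regular expression itself.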
 
Old 08-22-2014, 09:11 AM   #6
dvadell
LQ Newbie
 
Registered: Aug 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Thanks to everyone!! ntubski, BowCatShot, smallpond, and the rest whom I have already thanked. I find NLP overkill for this and, sincerely, I don't want to learn it. I like regexes better, so I will go that way.

Cheers,
-- Diego.
 
  

