Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game. |
01-12-2011, 12:41 AM
|
#31
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
Quote:
Originally Posted by nobtiba
3) So I just try the very simple one:
bin/opennlp POSTagger en-pos-maxent.bin
|
Yeah, it's waiting for you to type words as input. Try like this:
Code:
echo "some words to tag" | bin/opennlp POSTagger en-pos-maxent.bin
It should just output something like
Code:
some_XYZ words_XYZ to_XYZ tag_XYZ
but with some other uppercase letters instead of XYZ.
If that does not work, opennlp does not work.
If the en-pos-maxent.bin file is not in the working directory, or opennlp is not in the bin/ subdirectory, you need to adjust the command to include the correct paths to those two files.
You might use the command locate opennlp en-pos-maxent.bin to see where they actually are.
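The same smoke test can also be scripted. What follows is only a minimal Python sketch, not part of the scripts posted in this thread: run_filter is a made-up helper name, and the stand-in command (a tiny Python child that upper-cases its input) merely takes the place of the tagger so that the example runs without opennlp installed.

```python
import subprocess
import sys


def run_filter(command, text):
    """Send text to the command's standard input and return its standard output."""
    completed = subprocess.run(
        command,
        input=text,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command exits nonzero
    )
    return completed.stdout


# Stand-in filter (assumption): upper-cases whatever it reads from stdin.
# With a working install you would pass something like
# ["bin/opennlp", "POSTagger", "en-pos-maxent.bin"] instead.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]

if __name__ == "__main__":
    print(run_filter(STAND_IN, "some words to tag\n"), end="")
```

If the real command returns immediately with empty output, or raises because of a nonzero exit status, the problem is in the opennlp install rather than in the scripts.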
_________________________________________________
Quote:
Originally Posted by nobtiba
I have to find out what is wrong here first (do you have any suggestion to point it out ?). Please bear with me, I am trying hard and get back to you asap.
|
Create a new directory, and save these files from my previous posts: test-opennlp, check.awk, words.tagged, and input.xml.
Then run
Code:
sed -e 's|^\r||g; s|\r$||g; s|\r|\n|g;' -i check.awk input.xml test-opennlp words.tagged
chmod u+x check.awk test-opennlp
to fix any newline issues from Pastebin, and to allow script execution.
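For readers whose sed does not accept the -i option or the \r escape in that form, here is a hedged Python sketch of what the three sed expressions are meant to do to each line. The function name fix_newlines is illustrative only, and the sketch works on a string rather than editing files in place.

```python
import re


def fix_newlines(text):
    """Apply the three substitutions to every newline-delimited line.

    1. drop one carriage return at the start of the line
    2. drop one carriage return at the end of the line
    3. turn any remaining carriage return into a newline
    """
    fixed = []
    for line in text.split("\n"):
        line = re.sub(r"^\r", "", line)
        line = re.sub(r"\r$", "", line)
        fixed.append(line.replace("\r", "\n"))
    return "\n".join(fixed)
```

Reading each file, passing its contents through this function, and writing the result back would give roughly the same effect as the sed command; the chmod step is still needed afterwards.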
If I do the above, and run
Code:
./check.awk input.xml
I get
Code:
<example>
<input>
<id>abcdef</id>
<positive>0.0%</positive>
<neutral>0.0%</neutral>
<negative>0.0%</negative>
</input>
<input>
<id>12345678</id>
<positive>0.0%</positive>
<neutral>0.0%</neutral>
<negative>100.0%</negative>
</input>
<input>
<id>1234</id>
<positive>0.0%</positive>
<neutral>0.0%</neutral>
<negative>0.0%</negative>
</input>
</example>
Do you get the same output? The percentages may vary, but they should usually be like that.
(The test-opennlp just produces random tags, so this is not yet useful.)
If you do, then replace ./test-opennlp with opennlp POSTagger en-pos-maxent.bin in check.awk, and try the same command again. This time it should produce valid output.
Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 02:31 AM.
|
|
|
01-12-2011, 01:42 AM
|
#32
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
Hey Nominal ... thanks for the information on opennlp, makes more sense now.
Once we all agree on the format of the dictionary (his previous one seems to have more details, but I am not sure where it all came from), I am happy to update my script as a solution.
|
|
|
01-12-2011, 02:39 AM
|
#33
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
You're welcome, grail.
I wonder if you might test the one-awk-script-only solution I put in message #29 in this thread?
It's certainly not the most elegant solution, and the code is quite verbose, but I tried to write working understandable "example" code.
Looking at the entire thread, it's quite funny how this thing has evolved!
Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 02:08 AM.
|
|
|
01-12-2011, 03:06 AM
|
#34
|
Member
Registered: Jan 2007
Posts: 40
Original Poster
|
Quote:
Originally Posted by Nominal Animal
Yeah, it's waiting for you to type words as input. Try like this:
Code:
echo "some words to tag" | bin/opennlp POSTagger en-pos-maxent.bin
It should just output something like
Code:
some_XYZ words_XYZ to_XYZ tag_XYZ
but with some other uppercase letters instead of XYZ.
If that does not work, opennlp does not work.
Nominal Animal
|
Yes, I know that normally bin/opennlp POSTagger en-pos-maxent.bin waits for me to type words as input. Now it returns immediately, and I don't know when it went wrong... so opennlp does not work.
Quote:
Originally Posted by Nominal Animal
Create a new directory, and save these files from my previous posts: test-opennlp, check.awk, words.tagged, and input.xml.
Then run
Code:
sed -e 's|^\r||g; s|\r$||g; s|\r|\n|g;' -i check.awk input.xml test-opennlp words.tagged
chmod u+x check.awk test-opennlp
to fix any newline issues from Pastebin, and to allow script execution.
If I do the above, and run
Code:
./check.awk input.xml
I get
....
Do you get the same output? The percentages may vary, but they should usually be like that.
Nominal Animal
|
No, I don't get that output. I got this:
Code:
/usr/bin/awk: syntax error at source line 37 source file ./check.awk
context is
print content >>> |& <<< classifier
/usr/bin/awk: illegal statement at source line 38 source file ./check.awk
/usr/bin/awk: illegal statement at source line 38 source file ./check.awk
What do you think about it?
|
|
|
01-12-2011, 05:18 AM
|
#35
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
@Nominal - so I finally got a local install of opennlp working. Based on the following input file:
Code:
<example>
<input>
<id>abcdef</id>
<content> Taking
family
</content>
</input>
<input>
<id>12345678</id>
<content> friends
way
PureColor
</content>
</input>
<input>
<id>1234</id>
<content>unique </content>
</input>
</example>
My words file is the one I used previously, whereas I created your words.tagged file with the following:
Code:
Taking_VBG positive
friends_NNS positive
PureColor_NN negative
unique_JJ neutral
This gives us the same results for the same words, assuming the scripts work.
Your output:
Code:
<example>
<input>
<id>abcdef</id>
<positive>100.0%</positive>
<neutral>0.0%</neutral>
<negative>0.0%</negative>
</input>
<input>
<id>12345678</id>
<positive>50.0%</positive>
<neutral>0.0%</neutral>
<negative>50.0%</negative>
</input>
<input>
<id>1234</id>
<positive>0.0%</positive>
<neutral>100.0%</neutral>
<negative>0.0%</negative>
</input>
</example>
My output:
Code:
<example>
<result>
<id>abcdef</id>
<negative>negative result 0</negative>
<neutral>neutral result 0</neutral>
<positive>positive result 1</positive>
</result>
<result>
<id>12345678</id>
<negative>negative result 1</negative>
<neutral>neutral result 0</neutral>
<positive>positive result 1</positive>
</result>
<result>
<id>1234</id>
<negative>negative result 0</negative>
<neutral>neutral result 1</neutral>
<positive>positive result 0</positive>
</result>
</example>
Apart from the percentage format, the counts are all equal. Your script alters the original indenting and does not convert the input tag into a result tag, but these are minor changes.
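The two output formats carry the same information: dividing each count by the total number of classified words gives the percentages. A minimal Python sketch of that conversion follows; the name percentages is illustrative, and reporting 0.0% everywhere when no word was classified is an assumption based on the all-zero records shown earlier in the thread.

```python
def percentages(counts):
    """Convert per-category word counts into percentage strings.

    If no word was classified at all, every category is reported as 0.0%.
    """
    total = sum(counts.values())
    scale = 100.0 / total if total > 0 else 0.0
    return {category: "%.1f%%" % (scale * n) for category, n in counts.items()}
```

For the second record above (one positive, no neutral, one negative), this gives 50.0% positive and 50.0% negative.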
One addition I had to make to the opennlp line is to add 2>/dev/null to the end; otherwise opennlp's progress messages are mixed into the output, e.g.:
Code:
Loading POS Tagger model ... done (1.172s)
Average: 1000.0 sent/s
Total: 4 sent
Runtime: 0.004s
<example>
<input>
<id>abcdef</id>
<positive>100.0%</positive>
<neutral>0.0%</neutral>
<negative>0.0%</negative>
</input>
Loading POS Tagger model ... done (1.183s)
Average: 1250.0 sent/s
Total: 5 sent
Runtime: 0.004s
<input>
<id>12345678</id>
<positive>50.0%</positive>
<neutral>0.0%</neutral>
<negative>50.0%</negative>
</input>
Loading POS Tagger model ... done (1.246s)
Average: 666.7 sent/s
Total: 2 sent
Runtime: 0.003s
<input>
<id>1234</id>
<positive>0.0%</positive>
<neutral>100.0%</neutral>
<negative>0.0%</negative>
</input>
</example>
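If the filter is ever launched from a script instead of a shell pipeline, the same separation can be done when starting the process. This is a minimal Python sketch under that assumption; run_quietly and the stand-in CHILD command are hypothetical, with the child imitating a program that writes progress messages to standard error and results to standard output.

```python
import subprocess
import sys

# Stand-in child (assumption): a progress message goes to stderr and the
# actual result goes to stdout.
CHILD = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('Loading model ... done\\n'); "
    "sys.stdout.write('unique_JJ\\n')",
]


def run_quietly(command):
    """Run command, discard its stderr, and return its stdout as text."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as appending 2>/dev/null
        text=True,
        check=True,
    )
    return completed.stdout
```

Discarding standard error also hides genuine error messages, so it may be worth leaving it visible while debugging.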
|
|
|
01-12-2011, 01:52 PM
|
#36
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
@nobtiba: Okay; you'll need to fix opennlp first.. perhaps you could try reinstalling it?
The syntax errors on lines 37 and 38 of check.awk are because Mac awk does not support coprocesses (the |& operator, which is a GNU awk extension).
Switch to using a temporary file, by changing the start of the while loop in check.awk into
Code:
# Process content via opennlp
print content | classifier "> .temporary-content"
close(classifier "> .temporary-content")
while ((getline < ".temporary-content") > 0) {
and adding this after the while loop (just before the "Count known words" comment):
Code:
close(".temporary-content")
system("rm -f .temporary-content")
Does this get you the expected output?
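The temporary-file idea is not specific to awk. As a hedged illustration only, here is the same pattern in Python: the filter's output goes to a file, which is then read back line by line and deleted. filter_via_tempfile and STAND_IN are made-up names, the stand-in command replaces opennlp so the sketch runs anywhere, and tempfile.mkstemp is used instead of the fixed .temporary-content name so that two runs cannot overwrite each other's file.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in filter (assumption): upper-cases whatever it reads from stdin.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def filter_via_tempfile(command, text):
    """Run command with text on stdin, send its output to a temporary file,
    then read the file back line by line and delete it."""
    fd, path = tempfile.mkstemp(prefix="temporary-content-")
    try:
        with os.fdopen(fd, "w") as out:
            subprocess.run(command, input=text, stdout=out, text=True, check=True)
        with open(path) as result:
            return [line.rstrip("\n") for line in result]
    finally:
        os.remove(path)  # always clean up, even if the filter failed
```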
@grail: Great, thanks! That means the scripts work, although testing is needed to find out if they work as designed or not 
At minimum, it's a working start for nobtiba to develop further.
To all readers of this thread:
I'd recommend any further development (other than fixing bugs and simplifying the current scripts) to switch to Python or command-line PHP (or whichever language nobtiba prefers) and a proper XML parser (e.g. minidom, simplexml). If it's a standalone script that calls opennlp directly for each content, in Python and PHP it will have about the same size and complexity as the check.awk script in this thread.
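To give an idea of what the parsing half looks like with a proper XML parser, here is a minimal Python sketch using minidom from the standard library. It only extracts the id and content of each input element; the function names and the SAMPLE text (cut down from the input file posted above) are illustrative, and calling opennlp and counting words are left out.

```python
from xml.dom import Node, minidom


def element_text(parent, tag):
    """Return the stripped text of the first <tag> element under parent."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes:
        return ""
    return "".join(
        child.data
        for child in nodes[0].childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    ).strip()


def read_inputs(xml_text):
    """Return a list of (id, content) pairs, one per <input> element."""
    document = minidom.parseString(xml_text)
    return [
        (element_text(node, "id"), element_text(node, "content"))
        for node in document.getElementsByTagName("input")
    ]


SAMPLE = """<example>
<input>
<id>abcdef</id>
<content> Taking
family
</content>
</input>
<input>
<id>1234</id>
<content>unique </content>
</input>
</example>"""

if __name__ == "__main__":
    for record_id, content in read_inputs(SAMPLE):
        print(record_id, content.split())
```

Unlike a line-oriented awk script, this does not care how the XML is indented or where the line breaks fall inside the content element.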
Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 02:07 AM.
|
|
|
01-12-2011, 06:36 PM
|
#37
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
Quote:
Originally Posted by Nominal Animal
I'd recommend any further development (other than fixing bugs and simplifying the current scripts) to switch to Python or command-line PHP (or whichever language nobtiba prefers) and a proper XML parser (e.g. minidom, simplexml). If it's a standalone script that calls opennlp directly for each content, in Python and PHP it will have about the same size and complexity as the check.awk script in this thread.
|
Nice preempt before Sergei gets here ... LOL
|
|
|
01-13-2011, 12:13 AM
|
#38
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
Nobtiba and grail, I got crazy and rewrote the entire thing as a command-line PHP script (using /usr/bin/php as the interpreter). It uses a proper XML parser, SimpleXML (normally packaged with PHP). It's not bad, less than 250 lines. I tried to make the code readable, but it is still quite .. PHP.
SimpleXML really wants the <?xml version="1.0"?> declaration (in lower case) at the beginning of the XML. The XMLs or the dictionary do not need to be local files; this script accepts URLs as well as file names.
Run this script without arguments or with -h or --help to see the usage. Option --longhelp will output a long and detailed description on its usage and functionality.
Code:
#!/usr/bin/php
<?PHP
$DEFAULT_DICTIONARY = "content-classifier.dict";
$DEFAULT_FILTERCMD = "opennlp POSTagger en-pos-maxent.bin";
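/* getenv() is used because $_ENV is empty unless variables_order includes "E". */
$DICTIONARY = (string) getenv("CONTENT_DICTIONARY");
$FILTERCMD = (string) getenv("CONTENT_FILTER");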
if (strlen($DICTIONARY) < 1)
$DICTIONARY = $DEFAULT_DICTIONARY;
if (strlen($FILTERCMD) < 1)
$FILTERCMD = $DEFAULT_FILTERCMD;
/* Check command line parameter count.
*/
if ($argc < 2 || @$argv[1] == "-h" || @$argv[1] == "--help" || @$argv[1] == "--longhelp") {
fprintf(STDERR, "\n");
fprintf(STDERR, "Usage: %s [ -h | --help ]\n", $argv[0]);
fprintf(STDERR, " %s --longhelp\n", $argv[0]);
fprintf(STDERR, " %s XML-file-or-URL ...\n", $argv[0]);
fprintf(STDERR, "\n");
fprintf(STDERR, "The exit status will be nonzero if any errors are encountered.\n");
fprintf(STDERR, "\n");
if (@$argv[1] == "--longhelp") {
fprintf(STDERR, "This script reads XML content, with one or more\n");
fprintf(STDERR, "\t<input>\n");
fprintf(STDERR, "\t <id>...</id>\n");
fprintf(STDERR, "\t <content>...</content>\n");
fprintf(STDERR, "\t</input>\n");
fprintf(STDERR, "structures. The text in the content element is filtered through\n");
fprintf(STDERR, "an external program. All punctuation is removed after filtering,\n");
fprintf(STDERR, "then each word is looked up in a dictionary, with format\n");
fprintf(STDERR, "\t# This is a comment line\n");
fprintf(STDERR, "\tgoodword positive\n");
fprintf(STDERR, "\totherword neutral\n");
fprintf(STDERR, "\tbadword negative\n");
fprintf(STDERR, "Any other columns in the file are ignored.\n");
fprintf(STDERR, "When word 'WordS_EXT' is compared against the dictionary,\n");
fprintf(STDERR, "'WordS_EXT', 'WordS', 'words_EXT', and 'words' are looked up in order,\n");
fprintf(STDERR, "until a match is found. Words not in the dictionary are ignored.\n");
fprintf(STDERR, "\n");
fprintf(STDERR, "You can define the location of the dictionary file by setting\n");
fprintf(STDERR, "environment variable CONTENT_DICTIONARY to point to it; the default\n");
fprintf(STDERR, "value is '%s'.\n", $DEFAULT_DICTIONARY);
fprintf(STDERR, "\n");
fprintf(STDERR, "You can also define the filtering command by setting environment\n");
fprintf(STDERR, "variable CONTENT_FILTER to the desired command; the default\n");
fprintf(STDERR, "value is '%s'.\n", $DEFAULT_FILTERCMD);
fprintf(STDERR, "\n");
fprintf(STDERR, "The output of this script will contain\n");
fprintf(STDERR, "\t<example>\n");
fprintf(STDERR, "\t <result>\n");
fprintf(STDERR, "\t <id>...</id>\n");
fprintf(STDERR, "\t <positive>0.0%%</positive>\n");
fprintf(STDERR, "\t <neutral>0.0%%</neutral>\n");
fprintf(STDERR, "\t <negative>0.0%%</negative>\n");
fprintf(STDERR, "\t </result>\n");
fprintf(STDERR, "\t</example>\n");
fprintf(STDERR, "for each input structure, with percentages describing the fractions\n");
fprintf(STDERR, "of words found in the dictionary, belonging to each category, in\n");
fprintf(STDERR, "the XML content after filtering.\n");
fprintf(STDERR, "\n");
}
fprintf(STDERR, "This script is in public domain. Use it only at your own risk.\n");
fprintf(STDERR, "\n");
exit(1);
}
/* Generate the dictionary.
*/
$dict = array();
define('POSITIVES', 1); /* Integers for faster access */
define('NEUTRALS', 2);
define('NEGATIVES', 3);
$temp = @file($DICTIONARY);
if ($temp === FALSE) {
fprintf(STDERR, "Cannot read dictionary, '%s'.\n", $DICTIONARY);
exit(1);
}
foreach ($temp as $line) {
$line = preg_replace("/[\t\n\v\f\r ]+/", " ", $line);
list($word, $type) = @explode(" ", trim(substr($line, 0, strcspn($line, "#;"))) . " ");
if (strlen($word) < 1) continue;
switch (strtolower($type)) {
case 'positive': $dict[$word] = POSITIVES; break;
case 'neutral': $dict[$word] = NEUTRALS; break;
case 'negative': $dict[$word] = NEGATIVES; break;
}
}
unset($temp, $line, $word, $type);
/* Exit status is nonzero if any errors.
*/
$status = 0;
$records = 0;
/* Loop processing each XML input file or URL at a time.
*/
for ($arg = 1; $arg < $argc; $arg++) {
/* Load the XML.
*/
$xml = @simplexml_load_file($argv[$arg]);
if ($xml === FALSE) {
fprintf(STDERR, "%s: Cannot load file or URL.\n", $argv[$arg]);
$status |= 1;
continue;
}
/* Loop over input elements.
*/
for ($index = 0; @$xml->input[$index] !== NULL; $index++) {
$node = $xml->input[$index];
$id = trim(@$node->id, "\t\n\v\f\r ");
/* Execute the filter command.
*/
$pipe = array( 0=>NULL, 1=>NULL, 2=>STDERR );
$filter = @proc_open($FILTERCMD, array(
0=>array("pipe", "r"),
1=>array("pipe", "w"),
2=>STDERR ), $pipe);
if ($filter === FALSE) {
fprintf(STDERR, "Error executing filter '%s'.\n", $FILTERCMD);
$status |= 2;
break(2);
}
/* We need nonblocking pipes. (Those would be nice in real life, too.)
*/
stream_set_blocking($pipe[0], 0);
stream_set_write_buffer($pipe[0], 0);
stream_set_blocking($pipe[1], 0);
/* Send $content to the filter, read $result.
*/
$content = trim(@$node->content) . "\n";
$response = "";
while (strlen($content) > 0) {
/* Try to send some more content.
*/
$w = fwrite($pipe[0], $content);
if ($w > 0)
$content = substr($content, $w);
/* See if there is anything to receive.
*/
$response .= fread($pipe[1], 8192);
}
/* We've written everything, so close the end of the pipe.
*/
@fflush($pipe[0]);
@fclose($pipe[0]);
/* Read the rest of the filter's response.
*/
@stream_set_blocking($pipe[1], 1);
while (!feof($pipe[1]))
$response .= fread($pipe[1], 8192);
/* Close the filter.
*/
$error = proc_close($filter);
if ($error !== 0) {
fprintf(STDERR, "Error executing filter '%s'.\n", $FILTERCMD);
$status |= 4;
break(2);
}
/* Clean up the response.
*/
$response = trim(preg_replace('/[\t\n\v\f\r ]+/', ' ', $response));
$response = preg_replace('/[^ \-_0-9A-Za-z\x80-\xFF]+/', '', $response);
$words = @explode(" ", $response);
/* Classify each word.
*/
$count = array( POSITIVES=>0, NEUTRALS=>0, NEGATIVES=>0 );
foreach ($words as $word) {
$base = substr($word, 0, strrpos($word, "_"));
if (array_key_exists($word, $dict))
$count[$dict[$word]]++;
else
if (array_key_exists($base, $dict))
$count[$dict[$base]]++;
else {
/* Try lower case.
*/
$base = strtolower($base);
$word = $base . substr($word, strlen($base));
if (array_key_exists($word, $dict))
$count[$dict[$word]]++;
else
if (array_key_exists($base, $dict))
$count[$dict[$base]]++;
}
}
/* Calculate relative percentages.
*/
$total = $count[POSITIVES] + $count[NEUTRALS] + $count[NEGATIVES];
if ($total > 0) $scale = 100.0 / $total;
else $scale = 0.0;
$positive = sprintf("%.1f%%", $scale * $count[POSITIVES]);
$neutral = sprintf("%.1f%%", $scale * $count[NEUTRALS]);
$negative = sprintf("%.1f%%", $scale * $count[NEGATIVES]);
$unknown = count($words) - $total;
/* Output stage.
* $id: Contents of the id element.
* $positive: Percentage of positively classified words.
* $neutral: Percentage of neutrally classified words.
* $negative: Percentage of negatively classified words.
*/
$records++;
if ($records == 1) {
/* First record, so output the XML header.
*/
echo '<?xml version="1.0" standalone="yes"?>', "\n";
echo "<example>\n";
}
echo "\t<result>\n";
echo "\t\t<id>", $id, "</id>\n";
echo "\t\t<positive>", $positive, "</positive>\n";
echo "\t\t<neutral>", $neutral, "</neutral>\n";
echo "\t\t<negative>", $negative, "</negative>\n";
echo "\t</result>\n";
}
unset($xml);
}
/* If there were any records output, output the XML trailer.
*/
if ($records > 0) {
echo "</example>\n";
}
/* Exit with nonzero status if there were any errors.
*/
exit($status);
?>
If you save the above script as content-classifier.php and allow it to be executed ( chmod a+x content-classifier.php), you can try it against the input.xml, words.tagged and test-opennlp already shown in this thread via
Code:
env CONTENT_DICTIONARY=words.tagged CONTENT_FILTER=./test-opennlp ./content-classifier.php input.xml
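The dictionary format and lookup order described in the --longhelp text can be summarised in a few lines. This is a minimal Python sketch of that logic, not a translation of the PHP script: load_dictionary and classify are illustrative names, and treating a token without an underscore as a bare word is an assumption made here.

```python
def load_dictionary(lines):
    """Parse 'word category' lines; text after '#' or ';' is a comment.

    Extra columns are ignored, as are lines with an unknown category.
    """
    categories = {"positive", "neutral", "negative"}
    dictionary = {}
    for line in lines:
        for marker in "#;":
            line = line.split(marker, 1)[0]
        fields = line.split()
        if len(fields) >= 2 and fields[1].lower() in categories:
            dictionary[fields[0]] = fields[1].lower()
    return dictionary


def classify(token, dictionary):
    """Look up 'Word_TAG', then 'Word', then 'word_TAG', then 'word'.

    Returns the category name, or None if the word is not in the dictionary.
    """
    base, sep, tag = token.rpartition("_")
    if not sep:  # no tag suffix: the whole token is the base word
        base, tag = token, ""
    candidates = [token, base, base.lower() + sep + tag, base.lower()]
    for candidate in candidates:
        if candidate in dictionary:
            return dictionary[candidate]
    return None
```

With a dictionary holding the entry friends positive, the token Friends_NNS is classified as positive through the last of the four lookups.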
Cheers, Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 02:28 AM.
|
|
|
01-13-2011, 01:52 AM
|
#39
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
Well, I am a PHP noob, so I will defer to you that this is all good, Nominal.
It does seem way complicated, even without all the help stuff, compared to our awk solutions, but I am told it is a good tool once you get into it.
|
|
|
01-13-2011, 02:08 AM
|
#40
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
Quote:
Originally Posted by grail
It does seem way complicated
|
Most of the inner loop, from "Execute the filter" up to and including the "Close the filter" code block is just handling the external filter command as a coprocess. It'd be just a couple of lines if it used a temporary file.. I just like to avoid unnecessary temp files.
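For comparison, here is a hedged Python sketch of the same coprocess idea. In Python, the interleaved writing and reading that the PHP loop does by hand is handled by the standard library's Popen.communicate(). The names filter_as_coprocess and ECHO_FILTER are made up for this example, and the stand-in filter simply copies its input to its output.

```python
import subprocess
import sys

# Stand-in filter (assumption): copies stdin to stdout unchanged.
ECHO_FILTER = [
    sys.executable,
    "-c",
    "import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)",
]


def filter_as_coprocess(command, data):
    """Feed bytes to command and collect its output without a temporary file.

    communicate() writes to stdin and reads from stdout concurrently, so
    neither side stalls on a full pipe even when data is larger than the
    pipe buffer.
    """
    process = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    output, _ = process.communicate(data)
    if process.returncode != 0:
        raise RuntimeError("filter exited with status %d" % process.returncode)
    return output
```

Writing all the input first and only then reading the output can deadlock once the content is larger than the pipe buffer, which is why the write and read steps have to be interleaved in one way or another.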
I do prefer Python 3 to PHP, but I don't like to deal with the differences between Python 2 and 3.
Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 03:24 AM.
|
|
|
01-13-2011, 02:31 AM
|
#41
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
Quote:
I don't like to deal with the differences between Python 2 and 3.
|
I hear you on that one <sheesh> I mean get people to move on already ... (even though I know personally how tough it is to be backwards compatible)
|
|
|
01-14-2011, 03:27 AM
|
#42
|
Member
Registered: Jan 2007
Posts: 40
Original Poster
|
Quote:
Originally Posted by Nominal Animal
@nobtiba: Okay; you'll need to fix opennlp first.. perhaps you could try reinstalling it?
The syntax errors on lines 37 and 38 of check.awk are because Mac awk does not support coprocesses (the |& operator, which is a GNU awk extension).
Switch to using a temporary file, by changing the start of the while loop in check.awk into
Code:
# Process content via opennlp
print content | classifier "> .temporary-content"
close(classifier "> .temporary-content")
while ((getline < ".temporary-content") > 0) {
and adding this after the while loop (just before the "Count known words" comment):
Code:
close(".temporary-content")
system("rm -f .temporary-content")
Does this get you the expected output?
Nominal Animal
|
I changed and still got this error:
/usr/bin/awk: syntax error at source line 37 source file ./check.awk
context is
>>> ent" <<<
/usr/bin/awk: illegal statement at source line 38 source file ./check.awk
/usr/bin/awk: illegal statement at source line 38 source file ./check.awk
Either a more substantial change is needed, or awk is really broken, judging by something I found on Google here:
http://www.daemonforums.org/showthread.php?t=4232
Your PHP script is great. However, I already have a PHP version working, so a Java version would be perfect. That is why I asked how to pass a string as input to ./countnew; in the near future I think I will have to rewrite ./countnew in Java.
I hope I can check all the scripts here soon. I am running into so many errors.
|
|
|
01-14-2011, 07:16 AM
|
#43
|
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Arch
Posts: 10,038
|
Unless you have now moved from your Mac to Solaris, I am not sure how this link is of any value.
So what does the following single line do for you:
Code:
awk 'BEGIN{classifier = "bin/opennlp POSTagger en-pos-maxent.bin";print "hello" | classifier "> test_output"}'
Remember that you will need to put in your paths to opennlp and en-pos-maxent.bin
|
|
|
01-14-2011, 11:19 AM
|
#44
|
Senior Member
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
|
Quote:
Originally Posted by nobtiba
I changed and still got this error:
/usr/bin/awk: syntax error at source line 37 source file ./check.awk
context is
>>> ent" <<<
/usr/bin/awk: illegal statement at source line 38 source file ./check.awk
/usr/bin/awk: illegal statement at source line 38 source file ./check.awk
|
Oh, my. I'd say your awk version is too different to mine for us to get the awk version to run properly for you too.
Quote:
Originally Posted by nobtiba
Your PHP script is great. However, I already have a PHP version working, so a Java version would be perfect. That is why I asked how to pass a string as input to ./countnew; in the near future I think I will have to rewrite ./countnew in Java.
|
So the PHP version worked for you too? That's good. In many ways it's a "proper" solution, while the awk script is just a limited version.
Java sounds like a good idea, if you know or want to learn Java. OpenNLP has a POS tagger API you can use directly in your Java program. Although you have a bit of extra work, in loading the POS tagger model, it should work even better than the PHP version. (If you distribute the derivative, you'll need to abide by either the LGPL license, or the Apache License V2.0; see the OpenNLP site and the Apache OpenNLP site.)
Nominal Animal
Last edited by Nominal Animal; 03-21-2011 at 03:14 AM.
|
|
|
01-16-2011, 09:22 PM
|
#45
|
Member
Registered: Jan 2007
Posts: 40
Original Poster
|
Quote:
Originally Posted by grail
Unless you have now moved from your MAC to a SOLARIS I am not sure how this link is of any value??
So what does the following single line do for you:
Code:
awk 'BEGIN{classifier = "bin/opennlp POSTagger en-pos-maxent.bin";print "hello" | classifier "> test_output"}'
Remember that you will need to put in your paths to opennlp and en-pos-maxent.bin
|
It will return this:
awk: syntax error at source line 1
context is
BEGIN{classifier = "bin/opennlp POSTagger en-pos-maxent.bin";print "hello" | classifier "> >>> test_output" <<<
awk: illegal statement at source line 1
(I ran this command line from inside the opennlp folder, so the paths are correct.)
|
|
|
All times are GMT -5. The time now is 08:33 PM.
|