LinuxQuestions.org


ghantauke 03-09-2011 08:16 AM

Where can I get Wall Street Journal Penn Treebank for free LEGALLY?
 
Also the plain corpus if possible.
Thanks in advance.

TB0ne 03-09-2011 04:12 PM

Quote:

Originally Posted by ghantauke (Post 4284198)
Also the plain corpus if possible.
Thanks in advance.

You Google for it, and follow the links....I found much with a quick search, have you tried that?

Also, if you're talking about a piece of commercial software, you should PAY FOR IT...no one here is going to help you steal.

stress_junkie 03-09-2011 04:15 PM

Quote:

Originally Posted by ghantauke (Post 4284198)
Also the plain corpus if possible.
Thanks in advance.

Your other four threads suggest that you already have it in source form.

ghantauke 03-10-2011 08:41 PM

Quote:

Originally Posted by TB0ne (Post 4284745)
You Google for it, and follow the links....I found much with a quick search, have you tried that?

Also, if you're talking about a piece of commercial software, you should PAY FOR IT...no one here is going to help you steal.

1. It's not software.
2. Yes I tried googling it (obviously) and didn't find it for free hence the thread here. Please do some research before you state something as a fact.


Quote:

Originally Posted by stress_junkie (Post 4284749)
Your other four threads suggest that you already have it in source form.

Thanks for the reply.

TB0ne 03-11-2011 08:39 AM

Quote:

Originally Posted by ghantauke (Post 4286318)
1. It's not software.

Oh? Databases and other such things ARE software.

Quote:

2. Yes I tried googling it (obviously) and didn't find it for free hence the thread here. Please do some research before you state something as a fact.

Didn't try too hard, I guess. I get 110,000 hits just by putting in "penn treebank", with the first four links having a lot of what you're looking for. How about YOU doing some research before you state something as fact??

Quote:

Thanks for the reply but here's the problem.
The file I mentioned in that thread is in .crp format which can only be used with tgrep (which is an older version of tgrep2). I tried converting the file with the tgrep2 -p command but it gives me the error "ERROR: Tree 1 doesn't start with (.". Therefore I want the source in .t2c format.

Since you've looked at the documentation and done such extensive research, you've probably come across the exact syntax you need to convert the files.
http://docs.google.com/viewer?a=v&q=...g8A5xsEMMQozPw

According to the tgrep2 man pages, you have to use a combination of tgrep and tgrep2 to convert the files. Did you read/search the pages?

ghantauke 03-11-2011 09:25 AM

Quote:

Originally Posted by TB0ne (Post 4286969)
Oh? Databases and other such things ARE software.

Didn't try too hard, I guess. I get 110,000 hits just by putting in "penn treebank", with the first four links having a lot of what you're looking for. How about YOU doing some research before you state something as fact??

Since you've looked at the documentation and done such extensive research, you've probably come across the exact syntax you need to convert the files.
http://docs.google.com/viewer?a=v&q=...g8A5xsEMMQozPw

According to the tgrep2 man pages, you have to use a combination of tgrep and tgrep2 to convert the files. Did you read/search the pages?

Please do point out any one of those 110,000 links that actually lets you download the Wall Street Journal in Penn Treebank form for free. Having 110,000 useless links and 1 useful link are two completely different things. I'll correct myself. "Please do some 'proper' research" before you state something.

About the documentation, that's good advice, which I appreciate. I have already had a look at it, and the manual says you need to have the tgrep command installed to change the format of a tgrep (.crp) file to a tgrep2 file (.t2c). Unfortunately, I cannot install tgrep on my machine as it's outdated and has a lot of bugs in the installation process, which I have spent days trying to debug to no avail. I have started a thread concerning that, but I gave up debugging it because it's too much trouble. This thread is for the alternative approach.
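
For anyone hitting the same "Tree 1 doesn't start with (" error: the message suggests that tgrep2 -p wants plain-text bracketed trees, each opening with "(". A rough pre-check, as a sketch only, is to look at the first non-whitespace byte of the file before handing it to tgrep2. The function names below are made up for illustration, and the assumption that a .crp file is a compiled (non-text) tgrep corpus comes from the error above, not from the tgrep2 manual.

```python
def starts_with_open_paren(data: bytes) -> bool:
    """Return True if the first non-whitespace byte of data is '('."""
    # bytes.lstrip() with no argument strips leading ASCII whitespace only,
    # so binary headers (e.g. a leading NUL byte) are left in place.
    return data.lstrip().startswith(b"(")


def looks_like_bracketed_trees(path: str, probe_bytes: int = 4096) -> bool:
    """Probe the start of a file; hypothetical helper, not part of tgrep2."""
    # Read in binary mode so a compiled corpus cannot raise a decode error.
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    return starts_with_open_paren(head)
```

If looks_like_bracketed_trees() returns False for a file, it is probably not the plain-text form, which would be consistent with the error quoted earlier in the thread. It only checks the first tree, so a True result is not a guarantee that the whole file is well formed.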

djsmiley2k 03-11-2011 10:00 AM

I find, if you can't get something from 11,000 links for free, it's not meant to be there for free.

ghantauke 03-11-2011 10:23 AM

Quote:

Originally Posted by djsmiley2k (Post 4287049)
I find, if you can't get something from 11,000 links for free, it's not meant to be there for free.

I did manage to get it in .crp format for free, so there's a good chance that it's out there in a different format too. The only question is where exactly. Appreciate your view though.
As for everyone out there who's going to post a reply please try and give a better answer than just "google it" as no one in their right mind would be wasting their time here if they didn't do that already.

wje_lq 03-11-2011 10:24 AM

Quote:

Originally Posted by ghantauke (Post 4287019)
This thread is for the alternative approach.

And the alternative approach would be?
Quote:

Originally Posted by djsmiley2k (Post 4287049)
I find, if you can't get something from 11,000 links for free, it's not meant to be there for free.

Exactly. The "alternative approach" is to steal it. To do that, you don't come here; you put on your hip boots and wallow around in the muck of warez sites and such. This is not such a site.

Oh. And. Be aware that when I searched for Wall Street Journal Penn Treebank, I found that the third entry (with duckduckgo) and the fifth entry (with google) were this question: "Where can I get Wall Street Journal Penn Treebank for free?" And yes, they point right to this thread. If you're going to steal something, you need to learn to be more discreet. Lawsuits, or worse, can cost a little.

Just sayin'.

dugan 03-11-2011 10:25 AM

Everything is "out there for free." That doesn't mean it's legal.

ghantauke 03-11-2011 10:33 AM

Quote:

Originally Posted by dugan (Post 4287077)
Everything is "out there for free." That doesn't mean it's legal.

I did manage to get the .crp file "legally" for free. Just because something is "out there for free" doesn't mean it's illegal.

wje_lq 03-11-2011 10:53 AM

Quote:

Originally Posted by ghantauke (Post 4287083)
I did manage to get the .crp file "legally" for free.

That's quite possible, if you got it from an acquaintance. You might not have committed a crime, but it's almost certain that if you got it this way, your acquaintance has violated the terms of the license under which he got it.

wje_lq 03-11-2011 11:10 AM

Change of title
 
Let the record show that ghantauke has edited the title of the thread. The old title:

Quote:

Where can I get Wall Street Journal Penn Treebank for free?

The new title:

Quote:

Where can I get Wall Street Journal Penn Treebank for free LEGALLY?

It's clear he's becoming a little nervous. His justification for editing his original post (edit time: 8:32AM PST) is "too many people misunderstanding". Changing the title won't help the participants in the thread understand better; it will just change the search results so he's less likely to be caught.

No matter. He's made such a vigorous defense of the legality of what he's doing that now he has me curious. I've sent electronic mail to Daniel Bernard, Digital Product Chief, The Wall Street Journal Digital Network, with a link to this thread. If I hear back from him, I'll convey the results.

szboardstretcher 03-11-2011 11:22 AM

Quote:

Originally Posted by wje_lq (Post 4287117)
Let the record show that ghantauke has edited the title of the thread. The old title:

The new title:

It's clear he's becoming a little nervous. His justification for editing his original post (edit time: 8:32AM PST) is "too many people misunderstanding". Changing the title won't help the participants in the thread understand better; it will just change the search results so he's less likely to be caught.

No matter. He's made such a vigorous defense of the legality of what he's doing that now he has me curious. I've sent electronic mail to Daniel Bernard, Digital Product Chief, The Wall Street Journal Digital Network, with a link to this thread. If I hear back from him, I'll convey the results.

Now that I've read this, I'm hesitant to review anything negatively or complain about the FCC or anything like that because it seems that anything I/we say will be reported back to the party in question by corporate narcs.

I love Microsoft. All hail FCC! Go Riaa.

wje_lq 03-11-2011 11:43 AM

Quote:

Originally Posted by szboardstretcher (Post 4287129)
Now that I've read this, I'm hesitant to review anything negatively or complain about the FCC or anything like that because it seems that anything I/we say will be reported back to the party in question by corporate narcs.

I love Microsoft. All hail FCC! Go Riaa.

Oh, piffle. He wasn't reviewing anything negatively or complaining. He made a request for information about action which would be of dubious legality at best. If he's right, he has nothing to worry about. If he's wrong, then he shouldn't be bull****ting us in the first place. I don't go around playing sheriff, but I hate to be bull****ted. If WSJ even bothers to respond (which they probably won't), we'll find out whether
  1. he has nothing to worry about, or
  2. he's been bull****ting us.

