LinuxQuestions.org
Old 03-09-2011, 08:16 AM   #1
ghantauke
Member
 
Registered: Nov 2010
Posts: 114

Rep: Reputation: 6
Where can I get Wall Street Journal Penn Treebank for free LEGALLY?


Also the plain corpus if possible.
Thanks in advance.

Last edited by ghantauke; 03-11-2011 at 12:15 PM. Reason: too many people misunderstanding (rather accusing me)
 
Old 03-09-2011, 04:12 PM   #2
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,622

Rep: Reputation: 7963
Quote:
Originally Posted by ghantauke
Also the plain corpus if possible.
Thanks in advance.
You Google for it, and follow the links....I found much with a quick search, have you tried that?

Also, if you're talking about a piece of commercial software, you should PAY FOR IT...no one here is going to help you steal.
 
Old 03-09-2011, 04:15 PM   #3
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 335
Quote:
Originally Posted by ghantauke
Also the plain corpus if possible.
Thanks in advance.
Your other four threads suggest that you already have it in source form.
 
Old 03-10-2011, 08:41 PM   #4
ghantauke
Member
 
Registered: Nov 2010
Posts: 114

Original Poster
Rep: Reputation: 6
Quote:
Originally Posted by TB0ne
You Google for it, and follow the links....I found much with a quick search, have you tried that?

Also, if you're talking about a piece of commercial software, you should PAY FOR IT...no one here is going to help you steal.
1. It's not software.
2. Yes, I tried googling it (obviously) and didn't find it for free, hence the thread here. Please do some research before you state something as a fact.


Quote:
Originally Posted by stress_junkie
Your other four threads suggest that you already have it in source form.
Thanks for the reply.

Last edited by ghantauke; 03-11-2011 at 01:35 PM. Reason: Just so people won't get confused with what I actually did with the .crp file.
 
Old 03-11-2011, 08:39 AM   #5
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 26,622

Rep: Reputation: 7963
Quote:
Originally Posted by ghantauke
1. It's not software.
Oh? Databases and other such things ARE software.
Quote:
2. Yes, I tried googling it (obviously) and didn't find it for free, hence the thread here. Please do some research before you state something as a fact.
Didn't try too hard, I guess. I get 110,000 hits just by putting in "penn treebank", with the first four links having a lot of what you're looking for. How about YOU doing some research before you state something as fact??
Quote:
Thanks for the reply, but here's the problem.
The file I mentioned in that thread is in .crp format, which can only be used with tgrep (the older predecessor of tgrep2). I tried converting the file with the tgrep2 -p command, but it gives me the error "ERROR: Tree 1 doesn't start with (.". Therefore I want the source in .t2c format.
Since you've looked at the documentation and done such extensive research, you've probably come across the exact syntax you need to convert the files.
http://docs.google.com/viewer?a=v&q=...g8A5xsEMMQozPw

According to the tgrep2 man pages, you have to use a combination of tgrep and tgrep2 to convert the files. Did you read/search the pages?
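
For what it's worth, the error quoted above suggests that tgrep2 -p wants plain-text trees, each starting with "(", rather than a binary .crp file. Here is a minimal Python sketch, assuming that is the case, for checking whether some text looks like bracketed trees before trying to convert it. The function names and the sample tree are made up for illustration; nothing here comes from the tgrep2 documentation.

```python
def starts_with_paren(text):
    """Return True if the first non-whitespace character is '('."""
    return text.lstrip().startswith("(")


def parens_balanced(text):
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0


def looks_like_bracketed_trees(text):
    """Rough check only: non-empty, starts with '(' and parentheses balance."""
    return bool(text.strip()) and starts_with_paren(text) and parens_balanced(text)


# Hypothetical sample tree, not taken from any real corpus.
sample = "(S (NP (DT The) (NN market)) (VP (VBD fell)))"
print(looks_like_bracketed_trees(sample))
print(looks_like_bracketed_trees("\x00CRP binary data"))
```

To check a real file, you could read it first with open(path, errors="replace").read() and pass the result in. A False result would suggest the file is not plain-text trees, which would be consistent with the error above.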
 
Old 03-11-2011, 09:25 AM   #6
ghantauke
Member
 
Registered: Nov 2010
Posts: 114

Original Poster
Rep: Reputation: 6
Quote:
Originally Posted by TB0ne
Oh? Databases and other such things ARE software.

Didn't try too hard, I guess. I get 110,000 hits just by putting in "penn treebank", with the first four links having a lot of what you're looking for. How about YOU doing some research before you state something as fact??

Since you've looked at the documentation and done such extensive research, you've probably come across the exact syntax you need to convert the files.
http://docs.google.com/viewer?a=v&q=...g8A5xsEMMQozPw

According to the tgrep2 man pages, you have to use a combination of tgrep and tgrep2 to convert the files. Did you read/search the pages?
Please do point out any one of those 110,000 links that actually lets you download the Wall Street Journal in Penn Treebank form for free. Having 110,000 useless links and having 1 useful link are two completely different things. I'll correct myself: "Please do some 'proper' research" before you state something.

About the documentation, that's good advice, which I appreciate. I have already had a look at it, and the manual says you need to have the tgrep command installed to change a tgrep (.crp) file into a tgrep2 (.t2c) file. Unfortunately, I cannot install tgrep on my machine, as it's outdated and its install process has a lot of bugs, which I have spent days trying to debug to no avail. I started a thread about that, but I gave up debugging because it's too much trouble. This thread is for the alternative approach.
 
Old 03-11-2011, 10:00 AM   #7
djsmiley2k
Member
 
Registered: Feb 2005
Location: Coventry, UK
Distribution: Home: Gentoo x86/amd64, Debian ppc. Work: Ubuntu, SuSe, CentOS
Posts: 343
Blog Entries: 1

Rep: Reputation: 72
I find, if you can't get something from 11,000 links for free, it's not meant to be there for free.
 
Old 03-11-2011, 10:23 AM   #8
ghantauke
Member
 
Registered: Nov 2010
Posts: 114

Original Poster
Rep: Reputation: 6
Quote:
Originally Posted by djsmiley2k
I find, if you can't get something from 11,000 links for free, it's not meant to be there for free.
I did manage to get it in .crp format for free, so there's a good chance that it's out there in a different format too. The only question is where exactly. I appreciate your view, though.
As for everyone out there who's going to post a reply, please try to give a better answer than just "google it", as no one in their right mind would be wasting their time here if they hadn't done that already.

Last edited by ghantauke; 03-11-2011 at 10:27 AM.
 
Old 03-11-2011, 10:24 AM   #9
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

Rep: Reputation: 179
Quote:
Originally Posted by ghantauke
This thread is for the alternative approach.
And the alternative approach would be?
Quote:
Originally Posted by djsmiley2k
I find, if you can't get something from 11,000 links for free, it's not meant to be there for free.
Exactly. The "alternative approach" is to steal it. To do that, you don't come here; you put on your hip boots and wallow around in the muck of warez sites and such. This is not such a site.

Oh. And. Be aware that when I searched for Wall Street Journal Penn Treebank, I found that the third entry (with DuckDuckGo) and the fifth (with Google) was this question: "Where can I get Wall Street Journal Penn Treebank for free?" And yes, they point right to this thread. If you're going to steal something, you need to learn to be more discreet. Lawsuits, or worse, can cost a little.

Just sayin'.
 
Old 03-11-2011, 10:25 AM   #10
dugan
LQ Guru
 
Registered: Nov 2003
Location: Canada
Distribution: distro hopper
Posts: 11,219

Rep: Reputation: 5309
Everything is "out there for free." That doesn't mean it's legal.

Last edited by dugan; 03-11-2011 at 10:43 AM. Reason: corrected typo: "its -> "is"
 
Old 03-11-2011, 10:33 AM   #11
ghantauke
Member
 
Registered: Nov 2010
Posts: 114

Original Poster
Rep: Reputation: 6
Quote:
Originally Posted by dugan
Everything its "out there for free." That doesn't mean it's legal.
I did manage to get the .crp file "legally" for free. Just because something is "out there for free" doesn't mean it's illegal.
 
Old 03-11-2011, 10:53 AM   #12
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

Rep: Reputation: 179
Quote:
Originally Posted by ghantauke
I did manage to get the .crp file "legally" for free.
That's quite possible, if you got it from an acquaintance. You might not have committed a crime, but it's almost certain that if you got it this way, your acquaintance has violated the terms of the license under which he got it.
 
Old 03-11-2011, 11:10 AM   #13
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

Rep: Reputation: 179
Change of title

Let the record show that ghantauke has edited the title of the thread. The old title:
Quote:
Where can I get Wall Street Journal Penn Treebank for free?
The new title:
Quote:
Where can I get Wall Street Journal Penn Treebank for free LEGALLY?
It's clear he's becoming a little nervous. His justification for editing his original post (edit time: 8:32AM PST) is "too many people misunderstanding". Changing the title won't help the participants in the thread understand better; it will just change the search results so he's less likely to be caught.

No matter. He's made such a vigorous defense of the legality of what he's doing that now he has me curious. I've sent electronic mail to Daniel Bernard, Digital Product Chief, The Wall Street Journal Digital Network, with a link to this thread. If I hear back from him, I'll convey the results.
 
Old 03-11-2011, 11:22 AM   #14
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278

Rep: Reputation: 1694
Quote:
Originally Posted by wje_lq
Let the record show that ghantauke has edited the title of the thread. The old title:

The new title:

It's clear he's becoming a little nervous. His justification for editing his original post (edit time: 8:32AM PST) is "too many people misunderstanding". Changing the title won't help the participants in the thread understand better; it will just change the search results so he's less likely to be caught.

No matter. He's made such a vigorous defense of the legality of what he's doing that now he has me curious. I've sent electronic mail to Daniel Bernard, Digital Product Chief, The Wall Street Journal Digital Network, with a link to this thread. If I hear back from him, I'll convey the results.
Now that I've read this, I'm hesitant to review anything negatively or complain about the FCC or anything like that, because it seems that anything I/we say will be reported back to the party in question by corporate narcs.

I love Microsoft. All hail FCC! Go RIAA.
 
Old 03-11-2011, 11:43 AM   #15
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

Rep: Reputation: 179
Quote:
Originally Posted by szboardstretcher
Now that I've read this, I'm hesitant to review anything negatively or complain about the FCC or anything like that, because it seems that anything I/we say will be reported back to the party in question by corporate narcs.

I love Microsoft. All hail FCC! Go RIAA.
Oh, piffle. He wasn't reviewing anything negatively or complaining. He made a request for information about action which would be of dubious legality at best. If he's right, he has nothing to worry about. If he's wrong, then he shouldn't be bull****ting us in the first place. I don't go around playing sheriff, but I hate to be bull****ted. If WSJ even bothers to respond (which they probably won't), we'll find out whether
  1. he has nothing to worry about, or
  2. he's been bull****ting us.
 
  

