LinuxQuestions.org
Old 04-12-2008, 06:45 PM   #1
jimbo1708
Member
 
Registered: Jan 2007
Location: Pennsylvania
Distribution: Ubuntu 8.10 Server/9.04 Desktop, openSUSE 11.1
Posts: 154

Rep: Reputation: 31
removing the best qualities of a pdf


Hello,

I am looking to make a PDF file generated by LaTeX harder for a computer to parse into plain text. One of my professors publishes all of our work to the internet, and I don't want my papers to be "searchable", but I don't want them to be unreadable either. Any suggestions? The paper must be a PDF in the end, and any process it goes through must do a decent job of preserving the quality of both text and images.

- Jim Bryan
 
Old 04-12-2008, 07:06 PM   #2
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 65
Forgive me if I misunderstand what you want. You have at least one "don't" which should be a "do" in the original post. I am making what at the moment seems like a reasonable assumption about what you mean. Probably I'm totally wrong. :-)

You can make a PDF with images of text instead of actual text, although it won't look anything like as good on a decent display. Also, wanting to publish but wanting your document to not be searchable...? If you think images of text are not searchable, you might be right for another few months or even a year or two, but omnipresent OCR is on the way, so I wouldn't recommend counting on your images of text not getting indexed in the near future.

If you don't want people to be able to search your PDFs, don't publish them. Just publish an abstract and keep the PDFs away from search engines.
 
Old 04-12-2008, 07:09 PM   #3
dasy2k1
Member
 
Registered: Oct 2005
Location: 127.0.0.1
Distribution: Manjaro
Posts: 963

Rep: Reputation: 36
Put an appropriate robots.txt file in the directory with the PDFs in.
That will stop most search engines indexing them.
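
As a rough sketch of what such a file might contain, and how a well-behaved crawler would interpret it, here is a small Python check using the standard library's robots.txt parser. The /papers/ path and the example.edu host are hypothetical, not from this thread. Crawlers normally fetch robots.txt from the root of the site rather than from a subdirectory, and the file is only advisory, so this is an illustration and not a guarantee.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: assumes the PDFs are served from /papers/ on the site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /papers/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if a crawler obeying robots_txt may fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.edu is a placeholder host used only for illustration.
    for url in ("https://example.edu/papers/essay.pdf",
                "https://example.edu/index.html"):
        print(url, is_allowed(ROBOTS_TXT, url))
```

This only helps if whoever runs the web server is willing to install the file.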
 
Old 04-12-2008, 07:13 PM   #4
jimbo1708
Member
 
Registered: Jan 2007
Location: Pennsylvania
Distribution: Ubuntu 8.10 Server/9.04 Desktop, openSUSE 11.1
Posts: 154

Original Poster
Rep: Reputation: 31
Thank you for your reply. That's the right track, except that I am not that worried about the security of my document. I don't expect my PDF to be completely impossible to parse into plain text; I just want someone to have to go an extra step to do it.

On another side note, I want the images to stay as images. I didn't expect them to be translated into ASCII text or anything. Thank you.

- Jim
 
Old 04-12-2008, 07:15 PM   #5
jimbo1708
Member
 
Registered: Jan 2007
Location: Pennsylvania
Distribution: Ubuntu 8.10 Server/9.04 Desktop, openSUSE 11.1
Posts: 154

Original Poster
Rep: Reputation: 31
dasy2k1,

I don't have access to the web server; I can only control the PDF I submit.

- Jim
 
Old 04-12-2008, 08:01 PM   #6
beadyallen
Member
 
Registered: Mar 2008
Location: UK
Distribution: Fedora, Gentoo
Posts: 209

Rep: Reputation: 36
Well, one way to do it is with ImageMagick's 'convert'. You could take the original PDF and batch-convert each page into a PNG, either directly or via PostScript (I dunno if convert will handle PDFs directly; you may need pdf2ps). Then just do the reverse. As long as you take it through a bitmap-style file format, you'll lose the ASCII data. However, it will either make the file huge or look bad when you zoom in (depends on the resolution). Why are you wanting to do this anyway? Surely you want people to be able to search your documents.
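
For illustration, the round trip described above could be sketched like this in Python. The script only builds the two command lines and does not run them. The 150 dpi density, the page-%03d.png naming pattern, the file names, and the function names are all assumptions made for the example. convert relies on Ghostscript to read PDFs, and newer ImageMagick releases ship the tool as magick.

```python
import shlex


def pdf_to_pngs_cmd(src_pdf, dpi=150, pattern="page-%03d.png"):
    """Build the command that rasterizes each PDF page to a numbered PNG.

    -density sets the rendering resolution; higher values look sharper
    but make the final file larger.
    """
    return ["convert", "-density", str(dpi), src_pdf, pattern]


def pngs_to_pdf_cmd(png_files, out_pdf):
    """Build the command that stitches the page images back into one PDF.

    Zero-padded names (page-000.png, page-001.png, ...) sort in page order.
    """
    return ["convert", *sorted(png_files), out_pdf]


if __name__ == "__main__":
    # Hypothetical file names, shown only to print the resulting commands.
    print(shlex.join(pdf_to_pngs_cmd("paper.pdf")))
    print(shlex.join(pngs_to_pdf_cmd(["page-000.png", "page-001.png"],
                                     "paper-image-only.pdf")))
```

To actually run the steps, each list could be passed to subprocess.run(cmd, check=True), with glob.glob("page-*.png") supplying the page files for the second step.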
 
Old 04-12-2008, 08:17 PM   #7
SqdnGuns
Senior Member
 
Registered: Aug 2005
Location: Pensacola, FL
Distribution: Slackware64® Current & Arch
Posts: 1,092

Rep: Reputation: 174
Quote:
Originally Posted by beadyallen
Why are you wanting to do this anyway? Surely you want people to be able to search your documents.
To make it hard for plagiarism-checking software? Big unis use it now...
 
Old 04-12-2008, 11:12 PM   #8
jimbo1708
Member
 
Registered: Jan 2007
Location: Pennsylvania
Distribution: Ubuntu 8.10 Server/9.04 Desktop, openSUSE 11.1
Posts: 154

Original Poster
Rep: Reputation: 31
Actually, it is to prevent the plagiarism-checking software from archiving my document. I don't mind that it is readable by people, but I don't like that it is publicly searchable by the bots.

I've used convert a lot before, I will try that. The problem is that TeX doesn't allow you to just print certain pages out to file.

- Jim
 
Old 04-12-2008, 11:21 PM   #9
SqdnGuns
Senior Member
 
Registered: Aug 2005
Location: Pensacola, FL
Distribution: Slackware64® Current & Arch
Posts: 1,092

Rep: Reputation: 174
Quote:
Originally Posted by jimbo1708
Actually, it is to prevent the plagiarism-checking software from archiving my document. I don't mind that it is readable by people, but I don't like that it is publicly searchable by the bots.

I've used convert a lot before, I will try that. The problem is that TeX doesn't allow you to just print certain pages out to file.

- Jim
Dayummmm, I'm good......Good Luck.
 
  


All times are GMT -5. The time now is 09:20 AM.
