Linux - Newbie (LinuxQuestions.org): this forum is for members who are new to Linux.
If I use curl to download the HTML of a website, would the site's owners think I was trying to hack them?
As an example, say I decided to build a MySQL database program that records daily rainfall in a specific area of the country. I would use curl to download the page daily, then use gawk to find the amount of rainfall in the HTML file, and then import that value into a MySQL database.
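A minimal sketch of that daily job, in case it helps. Everything site-specific here is hypothetical: it assumes the page marks the value as <span id="rainfall">0.25</span>, that a MySQL database named weather has a table daily_rainfall(day DATE, amount DECIMAL(6,2)), and that credentials live in ~/.my.cnf. The awk program is plain POSIX awk, so gawk runs it unchanged.

```shell
#!/usr/bin/env bash
# Sketch of the daily rainfall job: fetch page -> extract number -> store in MySQL.
# HYPOTHETICAL assumptions (adjust to the real site and schema):
#   * the page marks the value as <span id="rainfall">0.25</span>
#   * database "weather" has table daily_rainfall(day DATE, amount DECIMAL(6,2))
#   * MySQL credentials come from ~/.my.cnf, so none appear on the command line

# Read HTML on stdin and print the first rainfall number found (prints nothing if absent).
extract_rainfall() {
  awk 'match($0, /id="rainfall">[0-9.]+</) {
         value = substr($0, RSTART, RLENGTH)   # e.g. id="rainfall">0.25<
         sub(/^id="rainfall">/, "", value)     # drop the leading attribute text
         sub(/<$/, "", value)                  # drop the trailing "<"
         print value
         exit
       }'
}

# Download the page, extract today's value and insert it as a new row.
record_today() {
  local url=$1 amount
  amount=$(curl -fsS "$url" | extract_rainfall)
  # Only a plain decimal number is allowed through to the SQL statement.
  if [[ ! $amount =~ ^[0-9]+(\.[0-9]+)?$ ]]; then
    echo "no rainfall value found at $url" >&2
    return 1
  fi
  mysql weather -e "INSERT INTO daily_rainfall (day, amount) VALUES (CURDATE(), $amount);"
}
```

The file only defines functions; a small script that sources it and calls record_today with the page's URL could then be scheduled once a day from cron. Checking the extracted value with a regex before building the INSERT keeps unexpected page content out of the SQL.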
Or would the owners of the website even know I was using curl?
As I understand it, curl downloads the source from a URL. Since a browser (or wget) does the same thing, how would the server know, or its owners care, which tool is being used? Hacking is when you try to access information on the server that isn't intended to be downloaded, and that naturally gets noticed (although not always, unfortunately!)
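In practice the server usually can tell: by default curl identifies itself in the User-Agent request header as curl/ followed by its version number. This small sketch prints the string your copy would send, assuming the first line of `curl --version` starts with "curl <version>", as current releases do.

```shell
#!/usr/bin/env bash
# Print the User-Agent string this curl build sends when no -A option is given.
# curl's default is "curl/" plus its version, which is the second field
# on the first line of `curl --version`.
default_curl_ua() {
  curl --version | awk 'NR == 1 { print "curl/" $2; exit }'
}

default_curl_ua
```

Adding -v to a real request shows the header as it actually goes out, on a line beginning with "> User-Agent:".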
Originally Posted by teckk
Read the man page, and report yourself as a browser:
curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0" -o report.html http://www...com
I would be tempted to do something like this. Server logs do record the user-agent string*, and while web scraping is common and accepted practice, I think some people may see repeated requests from curl as some kind of attempted hack.
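To see it from the site owner's side, here is a sketch that tallies requests per user agent in an access log. It assumes the "combined" log format (nginx's default, and widely used with Apache), where the user agent is the last double-quoted field; splitting each line on double quotes puts it in field 6.

```shell
#!/usr/bin/env bash
# Count requests per User-Agent in a "combined"-format access log read from stdin.
# With '"' as the field separator, field 6 is the user-agent string.
# Output: count<TAB>user-agent, busiest agent first.
count_user_agents() {
  awk -F'"' 'NF >= 7 { count[$6]++ }
             END { for (ua in count) print count[ua] "\t" ua }' |
    sort -rn
}
```

Usage would be something like count_user_agents < access.log (the log's location depends on the server setup). A curl/... entry that turns up at the same time every day is the kind of line an administrator skimming this list is likely to notice.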
*I just have to drop in my story that, at an old place of work, I used to connect to our web-based Outlook solution with the user agent "Hey Steve :-)" as the guy who was running the pilot used to check the logs to see which browsers and OSs people were using.