Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
If I use curl to copy the HTML from a website, would the website's owners think I was trying to hack them?
As an example, say I decided to build a MySQL database program that records daily rainfall in a specific area of the country. I would use curl to download the page daily, then use gawk to find the amount of rainfall in the HTML file, and then import that value into a MySQL database.
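A minimal sketch of the extract-and-load step, assuming (hypothetically) that the saved page marks the figure as <td id="rainfall">0.45</td>; the real page's markup will differ, so the pattern has to be adapted. It uses only POSIX awk features, so gawk runs it unchanged, and it prints an SQL INSERT statement instead of talking to MySQL directly. The table name rainfall and its columns day and amount are also hypothetical.

```shell
# Sketch only: the id="rainfall" marker and the table/column names below
# are hypothetical placeholders; adapt them to the real page and schema.

# Print the first number that follows id="rainfall"> in an HTML file.
extract_rainfall() {
    awk '
        match($0, /id="rainfall">[0-9.]+/) {
            # Keep only the matched text, then drop everything up to the ">".
            s = substr($0, RSTART, RLENGTH)
            sub(/.*>/, "", s)
            print s
            exit
        }
    ' "$1"
}

# Build an SQL INSERT statement that the mysql client can read on stdin.
make_insert() {
    printf "INSERT INTO rainfall (day, amount) VALUES ('%s', %s);\n" "$1" "$2"
}
```

A daily run might then be value=$(extract_rainfall page.html) followed by make_insert "$(date +%F)" "$value" | mysql weather, where page.html is the file curl saved and weather is a hypothetical database name. If the pattern does not match, extract_rainfall prints nothing, so check that $value is non-empty before sending the INSERT.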
Or would the owners of the website even know I was using curl?
As I understand it, curl downloads the page source from a URL. A browser or wget does the same thing, so how would the server know (or its owners care) which tool is being used? Hacking is when you try to access information on the server that isn't intended to be downloaded, and that naturally gets noticed (although not always, unfortunately!).
Quote:
Originally Posted by teckk
Read the man page and report yourself as a browser.
Code:
# -A sets the User-Agent header sent to the server; -o saves the page to report.html
curl -A "Mozilla/5.0 Firefox/28.0" http://www...com -o report.html
I would be tempted to do something like this. Server logs do record the user agent string* (by default curl identifies itself as curl/<version>), and while web scraping is common and accepted practice, I think some people may see repeated requests from curl as some kind of attempted hack.
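To illustrate what the site owner sees: in the common Apache/nginx "combined" log format, the User-Agent is the last double-quoted field on each line. A minimal sketch, assuming that log format, that counts requests whose user agent starts with curl/ (curl's default). A request sent with a custom -A string, as in teckk's example, would not be counted.

```shell
# Count requests whose User-Agent field starts with "curl/".
# Assumes the "combined" log format, where the User-Agent is the last
# double-quoted field; splitting on '"' makes it field NF-1.
count_curl_requests() {
    awk -F'"' '$(NF-1) ~ /^curl\// { n++ } END { print n + 0 }' "$1"
}
```

For example, count_curl_requests /var/log/apache2/access.log on a Debian Apache server; the path depends on the server's configuration.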
*I just have to drop in my story: at an old place of work, I used to connect to our web-based Outlook solution with the user agent "Hey Steve :-)", because the guy running the pilot checked the logs to see which browsers and OSs people were using.
Many thanks for the information, folks. I will check out the curl man page and mark this thread as solved. I am having fun learning gawk and MySQL.