LinuxQuestions.org

l33y 08-30-2015 10:08 AM

curl question regarding security
 
If I use curl to download the HTML of a web page, would the website's owners think I was trying to hack them?

As an example, say I decided to build a MySQL database program that records daily rainfall in a specific area of the country. I would use curl to download the page daily, then use gawk to find the amount of rainfall in the HTML file, then import this value into a MySQL database.
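A minimal sketch of that daily job, under loudly hypothetical assumptions: the page shows the value as text like `Rainfall: 12.5 mm` (a made-up marker; a real page needs its own pattern), and there is a MySQL database named `weather` with a table `rainfall(day, mm)`, with credentials in `~/.my.cnf`. The awk program uses only POSIX awk features, so it runs unchanged under gawk.

```shell
#!/bin/sh
# Sketch of the daily rainfall job: curl -> awk -> mysql.
# HYPOTHETICAL: the "Rainfall: <number>" marker, the database "weather",
# and the table rainfall(day DATE, mm DECIMAL(6,1)) are illustrative only.

# Read HTML on stdin and print the first number that follows "Rainfall:".
# Plain POSIX awk, so it also runs under gawk.
extract_rainfall() {
  awk 'match($0, /Rainfall: *[0-9]+(\.[0-9]+)?/) {
         s = substr($0, RSTART, RLENGTH)
         sub(/Rainfall: */, "", s)
         print s
         exit
       }'
}

# Build the SQL statement for one day's reading.
# $1 = date (YYYY-MM-DD), $2 = rainfall value (already matched as a number).
build_insert() {
  printf "INSERT INTO rainfall (day, mm) VALUES ('%s', %s);\n" "$1" "$2"
}

# Fetch the page, extract the value and pipe the INSERT into the mysql client.
# -f: fail on HTTP errors, -sS: no progress meter but still show errors.
daily_run() {
  value=$(curl -fsS "$1" | extract_rainfall)
  if [ -z "$value" ]; then
    echo "no rainfall value found at $1" >&2
    return 1
  fi
  build_insert "$(date +%F)" "$value" | mysql weather
}

# Only run when given a URL, so the functions can be sourced and tested alone.
if [ "$#" -gt 0 ]; then
  daily_run "$1"
fi
```

To run it once a day, a crontab entry such as `0 7 * * * sh /path/to/rainfall.sh <url>` would fire at 07:00 (script path hypothetical). Because the regex only accepts digits and a decimal point, the value pasted into the SQL cannot carry stray quotes from the page.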

Or would the owners of the website even know I was using curl?

Thanks in advance

DavidMcCann 08-30-2015 10:47 AM

As I understand it, curl downloads the source from a URL. Since that's what a browser (or wget) does, how would the server know (or its owners care) which tool is being used? Hacking is when you try to access information on the server that isn't intended to be downloaded, and that naturally gets noticed (although not always, unfortunately!)

teckk 08-30-2015 04:14 PM

Read the man page and report yourself as a browser.
Code:

curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0" http://www...com -o report.html
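If this command ends up inside the daily rainfall job, a slightly more defensive wrapper may help. This is a sketch, not teckk's own code: `-f` makes curl return a non-zero exit status on HTTP errors (so an error page is not saved as if it were data), and `-sS` hides the progress meter while still printing errors. The function name `fetch_as_browser` and the Firefox 28 user-agent string are only examples.

```shell
#!/bin/sh
# Hypothetical helper: download a URL to a file while presenting a
# browser-like User-Agent (example Firefox 28 string, not a requirement).
UA='Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0'

# $1 = URL, $2 = output file.
# -A sets the User-Agent header, -o writes the body to $2,
# -f returns an error on HTTP failures, -sS is quiet but still reports errors.
fetch_as_browser() {
  curl -fsS -A "$UA" -o "$2" "$1"
}

# Run only when called with exactly a URL and an output file.
if [ "$#" -eq 2 ]; then
  fetch_as_browser "$1" "$2"
fi
```

Usage would be something like `sh fetch.sh <url> report.html`; a cron job can then check the exit status before handing the file to gawk.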

273 08-30-2015 04:19 PM

Quote:

Originally Posted by teckk (Post 5413350)
Read the man page and report yourself as a browser.
Code:

curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0" http://www...com -o report.html

I would be tempted to do something like this. Server logs do record the user agent string*, and while web scraping is common and accepted practice, I think some people may see repeated use of curl as some kind of attempted hack.

*I just have to drop in my story: at an old place of work, I used to connect to our web-based Outlook solution with the user agent "Hey Steve :-)", because the guy running the pilot used to check the logs to see which browsers and OSs people were using.

Habitual 08-31-2015 08:54 AM

Quote:

Originally Posted by l33y (Post 5413238)
Or would the owners of the website even know I was using curl?

Yes, they would.
If they review access.log, that is.

Code:

My_host_Name - - [31/Aug/2015:09:53:38 -0400] "GET / HTTP/1.1" 200 53925 "-" "curl/7.35.0"
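On the site owner's side, spotting curl is a one-liner. A minimal sketch, assuming the Apache/nginx "combined" log format shown above, where the user agent is the last double-quoted field; splitting each line on `"` makes it field 6. The function names are made up for illustration, and a user agent containing an escaped quote (`\"`) would throw the split off.

```shell
#!/bin/sh
# Summarise a "combined" format access log by user agent.
# Splitting on '"' puts the user agent in field 6 of each line.

# Print the user agent of each request read from stdin.
user_agents() {
  awk -F'"' '{ print $6 }'
}

# Count requests per user agent, busiest first.
ua_counts() {
  awk -F'"' '{ n[$6]++ } END { for (ua in n) print n[ua], ua }' | sort -rn
}

# Run against a log file given on the command line.
if [ "$#" -gt 0 ]; then
  ua_counts < "$1"
fi
```

Pointed at the log (for example `/var/log/apache2/access.log` on Debian-style systems; the path varies by distro), a daily curl job shows up as a steadily growing `curl/7.35.0` count unless `-A` is used.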

l33y 08-31-2015 10:56 PM

Many thanks for the information, folks. I will check out the curl man page and mark this thread as solved. I am having fun learning gawk and MySQL.


All times are GMT -5.