[SOLVED] curl question regarding security

l33y · 08-30-2015, 10:08 AM

If I use curl to copy the HTML from a website, would the owners of the website think I was trying to hack them?

As an example, say I decided to build a MySQL database program that records daily rainfall in a specific area of the country. I would use curl to download the page daily, then use gawk to find the amount of rainfall in the HTML file, and then import this value into a MySQL database.
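A minimal sketch of that pipeline in shell, assuming the page marks the figure up as <td id="rainfall">0.42</td>. That markup is hypothetical, so the regex has to be adapted to the real page source. The function names extract_rainfall and record_rainfall, the database weather and the table rainfall(day, inches) are also made up for illustration, and the MySQL credentials are assumed to come from ~/.my.cnf.

```shell
#!/bin/sh
# Sketch only: assumes the value appears as <td id="rainfall">0.42</td>
# (hypothetical markup; check the real page source and adjust the regex).

# Print the first rainfall figure found in HTML read from stdin.
# Uses only POSIX awk features, so it runs unchanged under gawk.
extract_rainfall() {
    awk '
        match($0, /id="rainfall">[0-9.]+</) {
            # Skip the 14-character prefix id="rainfall"> and drop the trailing <
            print substr($0, RSTART + 14, RLENGTH - 15)
            exit
        }
    '
}

# Hypothetical daily job: download the page, extract the value, store it.
# Database "weather" and table rainfall(day DATE, inches DECIMAL) are assumptions;
# mysql reads the login details from ~/.my.cnf.
record_rainfall() {
    url=$1
    # -f: fail on HTTP errors, -sS: quiet except for real errors
    value=$(curl -fsS "$url" | extract_rainfall)
    [ -n "$value" ] || return 1
    mysql -e "INSERT INTO rainfall (day, inches) VALUES (CURDATE(), $value)" weather
}
```

Because the regex only accepts digits and dots, the value spliced into the SQL cannot carry quotes or semicolons. Calling record_rainfall once a day (for example from cron) keeps the load on the site to a single page fetch.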

Or would the owners of the website even know I was using curl?

Thanks in advance

DavidMcCann · 08-30-2015, 10:47 AM

As I understand it, curl downloads the source from a URL. That is exactly what a browser or wget does, so how would the server know (or its owners care) which tool is being used? Hacking is when you try to access information on the server that isn't intended to be downloaded, and that naturally gets noticed (although not always, unfortunately!)

teckk · 08-30-2015, 04:14 PM

Read the man page and report yourself as a browser with the -A (--user-agent) option.

Code:

curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0" http://www...com -o report.html

273 · 08-30-2015, 04:19 PM

Quote:

Originally Posted by teckk

Read the man page and report yourself as a browser with the -A (--user-agent) option.

Code:

curl -A "Mozilla/5.0 (X11; Linux x86_64; rv:28.0) Gecko/20100101 Firefox/28.0" http://www...com -o report.html

I would be tempted to do something like this. Logs do show the user agent string*, and while web scraping is common and accepted practice, I think some people may see repeated use of curl as some kind of attempted hack.

*I just have to drop in my story: at an old place of work, I used to connect to our web-based Outlook solution with the user agent "Hey Steve :-)" because the guy running the pilot checked the logs to see which browsers and OSs people were using.

Habitual · 08-31-2015, 08:54 AM

Quote:

Originally Posted by l33y

Or would the owners of the website even know I was using curl?

Yes, they would, if they review access.log, that is.
By default curl identifies itself in the user agent field:

Code:

My_host_Name - - [31/Aug/2015:09:53:38 -0400] "GET / HTTP/1.1" 200 53925 "-" "curl/7.35.0"
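To see what a site owner would see, here is a small sketch that pulls the user agent out of each log line and counts requests per agent. It assumes the "combined" log format that the line above appears to use (Apache's combined format, also nginx's default). Splitting on double quotes makes the user agent the sixth field; a user agent that itself contains an escaped quote would throw this off. The function name user_agents is just illustrative.

```shell
#!/bin/sh
# Count requests per user agent in combined-format access logs.
# Reads the files given as arguments, or stdin if there are none.
user_agents() {
    # With " as the field separator, $6 is the user agent string
    awk -F'"' 'NF >= 6 { print $6 }' "$@" | sort | uniq -c | sort -rn
}
```

Running user_agents access.log would list curl/7.35.0 with its request count alongside the browser strings, which is how a daily curl job stands out unless it sends a browser-like agent with -A.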

l33y · 08-31-2015, 10:56 PM

Many thanks for the information, folks. I will check out the man page for curl, and mark this thread as solved. I am having fun learning gawk and mysql.