Entire OS checksum

softwarelabus · 06-13-2011, 02:37 PM

Hi,

Sorry if this is a trivial question. It's been about a decade since I was an admin on Linux. I wanted to write a simple script to scan the entire Linux server and obtain file checksums, perhaps one checksum for each major root folder. This will be a web & email server. The goal is to detect whether the OS has ever been hacked. Any recommendations as to which folders and file types should be included in the checksum scans?
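A minimal Python sketch of the kind of scan being asked about, assuming a per-file SHA-256 digest is stored rather than one checksum per root folder, so a later comparison can point at the exact file that changed. `file_sha256` and `build_manifest` are hypothetical helper names, not part of any existing tool.

```python
import hashlib
import os


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(roots):
    """Map each regular file under the given roots to its SHA-256 digest.

    Unreadable files are recorded as None so they still show up in a diff.
    """
    manifest = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Only hash regular files; skip symlinks, sockets, device nodes.
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                try:
                    manifest[path] = file_sha256(path)
                except OSError:
                    manifest[path] = None
    return manifest
```

Usage would be something like `build_manifest(["/bin", "/sbin", "/etc"])`, with the result saved somewhere off the server; that particular list of folders is only an example, since which folders to include is exactly the open question here.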

One possibility is to double the checksums. For example, checksum #1 would consist of checksums for the 1st half of each file, while checksum #2 would consist of checksums for the 2nd half of each file. IOW, checksum #1 would sum the total bytes of the first half of each file. The reasoning behind this is just in case a hacker decides to add some useless bytes to a file so that the file checksum does not change. Splitting the checksum per file would detect such attempts. That's just a basic outline. I would not do an exact 50% - 50% split. One day it might be 30% - 70%, the next day it could be 51% - 49%, etc.
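A small sketch of the split idea as described, assuming the "checksum" is a plain additive sum of byte values. `byte_sum` and `split_sums` are hypothetical names for illustration.

```python
def byte_sum(data):
    """The additive 'sum the total bytes' checksum (not a cryptographic hash)."""
    return sum(data)


def split_sums(data, ratio):
    """Byte sums of the first `ratio` of the file and of the remainder, kept separately."""
    cut = int(len(data) * ratio)
    return byte_sum(data[:cut]), byte_sum(data[cut:])


original = b"AAAABBBB"
moved = b"@AAABBBC"  # one byte lowered in the 1st half, one raised in the 2nd half
local = b"@BAABBBB"  # the same kind of trade, but both bytes inside the 1st half

print(byte_sum(original), byte_sum(moved), byte_sum(local))  # 524 524 524
print(split_sums(original, 0.5), split_sums(moved, 0.5), split_sums(local, 0.5))
# (260, 264) (259, 265) (260, 264)
```

The split catches a compensating edit that straddles the cut point (`moved`), but not one made entirely on one side of it (`local`). Both weaknesses come from using an additive sum in the first place.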

Thanks for any input and help.

win32sux · 06-13-2011, 02:55 PM

Why not take the time to set up AIDE or Tripwire? They'll let you use multiple hash functions, plus they'll look for many more inconsistencies which are symptomatic of a compromise (and cannot be found via simple checksumming). Besides, they're designed for this sort of thing.
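To illustrate what "more than checksums" can mean, here is a rough Python sketch that records ownership, permissions, size, inode and modification time alongside the content hash. It is an assumption-laden illustration of the idea, not AIDE's or Tripwire's actual database format or attribute set; `snapshot` and `changed_attributes` are hypothetical names.

```python
import hashlib
import os
import stat


def snapshot(path):
    """Record attributes an integrity checker typically compares, not just content."""
    st = os.lstat(path)
    record = {
        "mode": stat.filemode(st.st_mode),  # e.g. a new setuid bit shows up as '-rwsr-xr-x'
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "inode": st.st_ino,  # a replaced file usually gets a new inode
        "mtime": st.st_mtime,
    }
    if stat.S_ISREG(st.st_mode):
        with open(path, "rb") as f:
            record["sha256"] = hashlib.sha256(f.read()).hexdigest()
    return record


def changed_attributes(old, new):
    """Sorted names of attributes whose values differ between two snapshots."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))
```

A `chmod` that makes a file executable leaves its content, and therefore its checksum, untouched, yet `changed_attributes` reports `["mode"]`.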

softwarelabus · 06-13-2011, 03:01 PM

Okay, I'll look into those. That could be used in conjunction with my own little script. I have no idea who is behind AIDE or every individual that worked on the code. I'm sure it's trustworthy.

As for my script, another idea is to upload it every time before using it just to make sure it was not edited. ... Just thinking here.

win32sux · 06-13-2011, 03:08 PM

Quote:

Originally Posted by softwarelabus

Okay, I'll look into those. That could be used in conjunction with my own little script.

Okay.

Quote:

I have no idea who is behind AIDE or every individual that worked on the code. I'm sure it's trustworthy.

You don't know the individuals who made the software you were using when you posted this either (Microsoft Windows). At least with AIDE and Tripwire you have access to the source code, which you can audit yourself (or pay someone else to do it). But yeah, if it's feasible for you to assume that there is evil code in these products (or any other), then by all means please do so and take appropriate measures.

Quote:

As for my script, another idea is to upload it every time before using it just to make sure it was not edited. ... Just thinking here.

What makes you think it can't be edited as soon as it's uploaded? HIDS authors have taken all kinds of similar issues into consideration. Perhaps reading about the different ways to deploy HIDS will give you some more ideas to work with, as well as a better understanding of the assorted risk and convenience levels which different deployment techniques represent. For example, running HIDS on the monitored host OS vs. running it from a separate OS; using read-only media; etc.

softwarelabus · 06-13-2011, 03:28 PM

Like I said, my script would be uploaded and immediately executed every time it's used. It might take a few seconds to upload & execute it. The script could generate live feedback over HTTPS as it scans from folder to folder. Can a hacker analyze my custom script and edit it within 2 seconds? One massive advantage of custom private code over popular open-sourced code is that hackers have never analyzed it. Maybe my script is not foolproof. If not, then I welcome anyone to point out the vulnerability so it can be improved.

softwarelabus · 06-13-2011, 03:37 PM

Quote:

Originally Posted by win32sux

[snip]
But yeah, if it's feasible for you to assume that there is evil code in these products (or any other), then by all means please do so and take appropriate measures.

Indeed, 1st rule, never underestimate a hacker. With all the hackers in the world, who knows if one is on board that open-sourced security project.

unSpawn · 06-13-2011, 06:28 PM

Quote:

Originally Posted by softwarelabus

One massive advantage of custom private code over popular open-sourced code is that hackers have never analyzed it. Maybe my script is not foolproof. If not, then I welcome anyone to point out the vulnerability so it can be improved.

If you want to know if the OS is compromised then, with all due respect, you should not focus on a scripted kludge but change your scope. When you choose the right enterprise or long-term support distribution, and when a host is properly hardened and auditing takes place regularly, you have a much better basis for early warning and detection (as malicious activity is often preceded by reconnaissance and failed attempts). Feel free to ask more specific questions, but unless your search-engine-fu is weak it would be efficient to first look at the standard hardening practices promoted by SANS, the Center for Internet Security (CIS) and, say, OWASP.

win32sux · 06-13-2011, 06:35 PM

Quote:

Originally Posted by softwarelabus

Like I said, my script would be uploaded and immediately executed every time it's used. It might take a few seconds to upload & execute it. The script could generate live feedback over HTTPS as it scans from folder to folder. Can a hacker analyze my custom script and edit it within 2 seconds? One massive advantage of custom private code over popular open-sourced code is that hackers have never analyzed it. Maybe my script is not foolproof. If not, then I welcome anyone to point out the vulnerability so it can be improved.

Don't get me wrong: I do commend you for taking the initiative to do some basic integrity checking on your own OS. It's just that I become a bit hesitant whenever I see anyone try to re-invent the wheel. Granted, you've said that you'll be using your own script in addition to a proper HIDS (which takes into account much more than just checksums), so that's not so bad. With that in mind, I do hope you take unSpawn's great advice, because it's the type that will actually help make your box less likely to be compromised in the first place.

As for your "custom private code" argument, I respectfully disagree. It sounds to me like pure security through obscurity, which has been proven time and time again to be an extremely ineffective model. Besides, if your box gets owned, there's nothing stopping the bad guys from copying your script when you upload it, then studying it and taking the necessary measures for it to feed you fake data via your HTTPS connection (or whatever). Again, this sort of vulnerability is not intrinsically specific to "custom private code" solutions, but the accompanying false sense of security can be quite dangerous in and of itself.

Quote:

Originally Posted by softwarelabus

Indeed, 1st rule, never underestimate a hacker. With all the hackers in the world, who knows if one is on board that open-sourced security project.

Who knows if there's one on board any project (be it closed or open source)?

That's one reason why the freedom for anyone to audit source code is so important.

softwarelabus · 06-14-2011, 07:04 AM

Quote:

Originally Posted by win32sux

Don't get me wrong: I do commend you for taking the initiative to do some basic integrity checking on your own OS. It's just that I become a bit hesitant whenever I see anyone try to re-invent the wheel. Granted, you've said that you'll be using your own script in addition to a proper HIDS (which takes into account much more than just checksums), so that's not so bad. With that in mind, I do hope you take unSpawn's great advice, because it's the type that will actually help make your box less likely to be compromised in the first place.

I think we're in agreement that a custom script that scans/reads (no writing) files obtaining numerous checksums will add security.

Quote:

Originally Posted by win32sux

As for your "custom private code" argument, I respectfully disagree. It sounds to me like pure security through obscurity, which has been proven time and time again to be an extremely ineffective model.

It's your right to disagree, but again, by all means feel free to point out the vulnerability in my script idea. As stated, open-source has a major disadvantage in that hackers have access to the code for analysis.

Quote:

Originally Posted by win32sux

Besides, if your box gets owned, there's nothing stopping the bad guys from copying your script when you upload it, then studying it and taking the necessary measures for it to feed you fake data via your HTTPS connection (or whatever).

One major advantage my script has is that it will be *uploaded* every time. It's actually easy to write some code on my end that modifies my script before it's uploaded, so that I would expect a specific message back from it. That means the hackers must analyze my *custom* script on the fly within a few seconds after it's uploaded. A task that IMO is impossible.
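A minimal Python sketch of the per-upload check described above, assuming the uploaded script's only job here is to echo back a fresh random token. `SCRIPT_TEMPLATE`, `prepare_upload`, `reply_is_expected` and the `scan-ok` message format are hypothetical illustrations, not the poster's actual script.

```python
import secrets

# Hypothetical template: the uploaded script reports back a token baked in at upload time.
SCRIPT_TEMPLATE = 'TOKEN = "{token}"\nprint("scan-ok " + TOKEN)\n'


def prepare_upload():
    """Embed a fresh random token in the script just before each upload."""
    token = secrets.token_hex(16)
    return token, SCRIPT_TEMPLATE.format(token=token)


def reply_is_expected(token, reply):
    """Compare the reply against the one message expected for this upload."""
    return secrets.compare_digest(reply.strip(), "scan-ok " + token)
```

The token travels inside the uploaded script, so anything already running on a compromised host can read it as it arrives and produce the expected reply. That is the weakness win32sux raises above.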

Quote:

Originally Posted by win32sux

Again, this sort of vulnerability is not intrinsically specific to "custom private code" solutions, but the accompanying false sense of security can be quite dangerous in and of itself.

The point I'm making is that custom code is a major advantage over open-source. I think hackers love open-source.

Quote:

Originally Posted by win32sux

Who knows if there's one on board any project (be it closed or open source)?

One major difference. Open source means the code's open to nearly 7 billion people. Closed source means it's open to maybe half a dozen at most, and in my case it means just one person, me!

Quote:

Originally Posted by win32sux

That's one reason why the freedom for anyone to audit source code is so important.

The problem with that is: who in their right mind has ever even thought to analyze the massive amounts of code in an open-sourced project? I wouldn't. The task becomes exponentially more difficult with code size. And besides, a hacker's not going to write code that flashes, "Here I am!" It's possible to write code that accomplishes two tasks at once, where one task is impossible to analyze and simulate in your mind.

Don't get me wrong. I'm not trying to argue. I just think it's important to point out that the addition of custom code is a huge aid to security.

softwarelabus · 06-14-2011, 07:06 AM

A bit puzzling: nobody wants to answer my question in the 1st post, only to deter people from writing custom security code and to have them accept only open-sourced code??? Hmmm

Hangdog42 · 06-14-2011, 07:32 AM

Quote:

Originally Posted by softwarelabus

A bit puzzling: nobody wants to answer my question in the 1st post, only to deter people from writing custom security code and to have them accept only open-sourced code??? Hmmm

I think your question was largely answered. But to be more direct, I think your double checksum idea probably isn't necessary. Any algorithm worth its weight in warm spit isn't going to generate the same result from changed files of the same size, so your worry about "byte padding" is largely misplaced. If you use something like md5 or sha1, internal changes to the file will generate different results. Furthermore, in order for your idea to be valid, you would have to store results for all of the file splits you're proposing. After all, you need clean results in order to know if things have changed. Sounds like a lot of redundant work to me.
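A quick Python check of the point about md5 and sha1, assuming a one-line edit that keeps the file exactly the same length. The sample strings are made up for illustration. `hashlib.new` accepts the algorithm name as a string.

```python
import hashlib


def digests(data):
    """MD5, SHA-1 and SHA-256 digests of the same bytes."""
    return {name: hashlib.new(name, data).hexdigest() for name in ("md5", "sha1", "sha256")}


original = b"AllowOverride None\n"
tampered = b"AllowOverride All \n"  # different content, same length

assert len(original) == len(tampered)
before, after = digests(original), digests(tampered)
for name in before:
    print(name, before[name] == after[name])  # each line reports False
```

Keeping the size the same buys the attacker nothing against a cryptographic hash. One caveat from later years: MD5 and SHA-1 now have practical collision attacks, so SHA-256 is the safer default for a new baseline.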

Quote:

Originally Posted by softwarelabus

Don't get me wrong. I'm not trying to argue. I just think it's important to point out that the addition of custom code is a huge aid to security.

I would strongly urge you to review the HBGary fiasco. Here is a case where proprietary, closed source code known to very few individuals resulted in HBGary being ridiculously vulnerable to attack by Anonymous. This is a security company that got their ass handed to them, and the main reason they got demolished was that their closed-source proprietary code was so poorly written. Nobody reviewed it, even a little bit. If HBGary had used an open source CMS, they wouldn't have had half the trouble they did.

So no, custom code does not equal security. Ever. Secure code equals security, regardless of degree of openness.

softwarelabus · 06-14-2011, 07:36 AM

Quote:

Originally Posted by win32sux

[snip]
It's just that I become a bit hesitant whenever I see anyone try to re-invent the wheel.
[snip]

No offense, but when even companies such as the almighty Google, which has the funds to buy the best, are still getting hacked to this day, I would say that Internet security is not a wheel yet. More like a square. ;-)

softwarelabus · 06-14-2011, 07:46 AM

Quote:

Originally Posted by Hangdog42

I think your question was largely answered.

The question was which folders are best to scan? No answers yet.

Quote:

Originally Posted by Hangdog42

But to be more direct, I think your double checksum idea probably isn't necessary. Any algorithm worth its weight in warm spit isn't going to generate the same result from changed files of the same size, so your worry about "byte padding" is largely misplaced. If you use something like md5 or sha1, internal changes to the file will generate different results. Furthermore, in order for your idea to be valid, you would have to store results for all of the file splits you're proposing. After all, you need clean results in order to know if things have changed. Sounds like a lot of redundant work to me.

I don't get it. Doesn't your paragraph above support what I'm saying? It sounds like what you're saying is that it's difficult for a hacker to modify files without changing the checksum, and that my multiple checksum idea is only a waste. Again, first rule, never underestimate a hacker. In fact, this thread could be riddled with hackers.

Quote:

Originally Posted by Hangdog42

I would strongly urge you to review the HBGary fiasco. Here is a case where proprietary, closed source code known to very few individuals resulted in HBGary being ridiculously vulnerable to attack by Anonymous. This is a security company that got their ass handed to them, and the main reason they got demolished was that their closed-source proprietary code was so poorly written. Nobody reviewed it, even a little bit. If HBGary had used an open source CMS, they wouldn't have had half the trouble they did.

So no, custom code does not equal security. Ever. Secure code equals security, regardless of degree of openness.

Haha, I fail to see your logic. As an example, if I can find a bad academic scientist or one that made a single major mistake, then by your reasoning it means academic science itself is bad. Custom undisclosed code is a major advantage in that it prevents hackers from analyzing the code.

Hangdog42 · 06-14-2011, 08:05 AM

Quote:

Originally Posted by softwarelabus

The question was which folders are best to scan? No answers yet.

I think that's because the answer depends on many factors specific to your installation. The knee-jerk answer is everything, but that is likely to generate a lot of noise (a poor signal-to-noise ratio). What is your tolerance for investigating every time something changes? If you exclude folders that change a lot (like /tmp), you'll reduce the noise, but you'll also give the crackers a place to hide stuff. We can't answer these sorts of questions for you because the answer depends entirely on your own needs and tolerance for risk.
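A minimal sketch of the trade-off, assuming the scan walks from `/` and prunes a list of volatile or virtual trees. The `EXCLUDE` list is a hypothetical example, not a recommendation; each entry cuts noise and also marks out a place nothing is checked.

```python
import os

# Hypothetical exclusion list: trees that change constantly (or are virtual).
# Every entry removes noise, but is also a place an intruder could hide files.
EXCLUDE = ("/tmp", "/var/tmp", "/var/log", "/proc", "/sys", "/dev", "/run")


def is_excluded(path, exclude=EXCLUDE):
    """True if `path` is an excluded directory or lies underneath one."""
    path = os.path.normpath(path)
    return any(path == ex or path.startswith(ex.rstrip("/") + "/") for ex in exclude)


def walk_files(root, exclude=EXCLUDE):
    """Yield file paths under `root`, never descending into excluded trees."""
    if is_excluded(root, exclude):
        return
    for dirpath, dirnames, filenames in os.walk(root):
        # Editing dirnames in place tells os.walk not to enter those subdirectories.
        dirnames[:] = [d for d in dirnames if not is_excluded(os.path.join(dirpath, d), exclude)]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

The prefix test adds a trailing slash before comparing, so excluding `/tmp` does not accidentally exclude a sibling such as `/tmpfoo`.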

Quote:

Originally Posted by softwarelabus

I don't get it. Doesn't your paragraph above support what I'm saying? It sounds like what you're saying is that it's difficult for a hacker to modify files without changing the checksum, and that my multiple checksum idea is only a waste. Again, first rule, never underestimate a hacker.

There is a difference between underestimating the crackers and granting them superhuman abilities. So let's review your original post:

Quote:

Originally Posted by softwarelabus

One possibility is to double the checksums. For example, checksum #1 would consist of checksums for the 1st half of each file, while checksum #2 would consist of checksums for the 2nd half of each file. IOW, checksum #1 would sum the total bytes of the first half of each file. The reasoning behind this is just in case a hacker decides to add some useless bytes to a file so that the file checksum does not change. Splitting the checksum per file would detect such attempts. That's just a basic outline. I would not do an exact 50% - 50% split. One day it might be 30% - 70%, the next day it could be 51% - 49%, etc.

Here you are talking about splitting the files and doing checksums on the separate bits because you are afraid a cracker might alter a file and pad out the bytes to make it the same size, and therefore evade the checksum scans. What I am saying is that if you use a decent algorithm to start, splitting the files is completely and totally unnecessary. ANY change to the file will be detected by a whole-file checksum.

Furthermore, for every split you propose (50-50, 70-30, 51-49), you will have to have a pre-existing set of data for comparison. That means that you can't just randomly make up a new percentage on the fly because you never would know if the checksums were valid or not. So essentially, you're creating a ton of additional work for zero security gain.

So no, I'm not agreeing with you at all. I'm saying your multiple checksum idea is a waste of time and effort compared to a single checksum.

Quote:

Originally Posted by softwarelabus

Haha, I fail to see your logic. As an example, if I can find a bad academic scientist, then by your reasoning it means academic science itself is bad. Custom undisclosed code is a major advantage in that it prevents hackers from analyzing the code.

Then maybe you ought to try thinking harder. Your assertion throughout this thread is that undisclosed code is secure code. I'm saying that is a dangerous and completely incorrect assertion. The HBGary example proves that undisclosed code can have very, very large security holes that even script kiddies can exploit. Any code that is exposed to users in any manner is potentially vulnerable. People will try all sorts of crap on it. And they don't need to see the code to make that job easier. Normal attack vectors like SQL injection or buffer overflows require absolutely no knowledge of the underlying code. They just require some sort of input form. Other vectors, like man-in-the-middle, only require that communications occur over a network. Again, no knowledge of underlying code needed.

softwarelabus · 06-14-2011, 08:49 AM

Quote:

Originally Posted by Hangdog42

I think that's because the answer depends on many factors specific to your installation. The knee-jerk answer is everything, but that is likely to generate a lot of noise (a poor signal-to-noise ratio). What is your tolerance for investigating every time something changes? If you exclude folders that change a lot (like /tmp), you'll reduce the noise, but you'll also give the crackers a place to hide stuff. We can't answer these sorts of questions for you because the answer depends entirely on your own needs and tolerance for risk.

Okay, but I think the only acceptable result is zero noise. Files that change a lot, such as log files, wouldn't be scanned. Indeed, a hacker could replace a log with a binary and I wouldn't see it, but a main goal is to keep the web server from being hacked, and that includes the PHP & Python files and the Apache binaries. The script will scan such files.
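A rough sketch of the file-selection policy described above, assuming "web code" means `.php` and `.py` files and "Apache bin files" can be approximated as regular files with an execute bit. The suffix lists and `should_scan` are hypothetical.

```python
import os
import stat

# Hypothetical policy following the plan above.
WATCHED_SUFFIXES = (".php", ".py")  # web application code
SKIPPED_SUFFIXES = (".log",)        # logs change constantly, so they are left out


def should_scan(path):
    """Decide whether one file belongs in the checksum scan."""
    if path.endswith(SKIPPED_SUFFIXES):
        return False  # a binary renamed to *.log slips through here
    if path.endswith(WATCHED_SUFFIXES):
        return True
    st = os.lstat(path)
    # Regular files with any execute bit set, e.g. the Apache binaries.
    return stat.S_ISREG(st.st_mode) and bool(st.st_mode & 0o111)
```

The first rule is the blind spot mentioned above: an executable named `access.log` is never examined.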

Quote:

Originally Posted by Hangdog42

There is a difference between underestimating the crackers and granting them superhuman abilities. So let's review your original post:

When Google email gets hacked, it seems superhuman to a lot of people, lol.

Quote:

Originally Posted by Hangdog42

Here you are talking about splitting the files and doing checksums on the separate bits because you are afraid a cracker might alter a file and pad out the bytes to make it the same size, and therefore evade the checksum scans. What I am saying is that if you use a decent algorithm to start, splitting the files is completely and totally unnecessary. ANY change to the file will be detected by a whole-file checksum.

Sorry. The hacker would pad the file with code, and strip unnecessary bytes to obtain the same net checksum. Very easy!
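For an additive byte-sum checksum this really is easy. Here is a minimal Python sketch that assumes the checksum is the plain sum of byte values and that the injected payload sums to less than the original. `forge_same_sum` is a hypothetical name, and the file contents are made up.

```python
import hashlib


def byte_sum(data):
    """Additive checksum: the total of all byte values."""
    return sum(data)


def forge_same_sum(original, payload):
    """Append filler bytes so `payload` ends up with the same byte sum as `original`."""
    deficit = byte_sum(original) - byte_sum(payload)
    if deficit < 0:
        raise ValueError("payload outweighs the original; strip some bytes first")
    full, rest = divmod(deficit, 255)
    return payload + b"\xff" * full + (bytes([rest]) if rest else b"")


original = b"#!/bin/sh\necho 'nightly backup'\n" * 4
forged = forge_same_sum(original, b"#!/bin/sh\nexec /tmp/.x\n")

print(byte_sum(original) == byte_sum(forged))  # the additive check is fooled
print(hashlib.sha256(original).digest() == hashlib.sha256(forged).digest())  # SHA-256 is not
```

The same padding trick does not work against SHA-256, because finding filler bytes that recreate a given digest is computationally infeasible.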

Quote:

Originally Posted by Hangdog42

Furthermore, for every split you propose (50-50, 70-30, 51-49), you will have to have a pre-existing set of data for comparison. That means that you can't just randomly make up a new percentage on the fly because you never would know if the checksums were valid or not. So essentially, you're creating a ton of additional work for zero security gain. So no, I'm not agreeing with you at all. I'm saying your multiple checksum idea is a waste of time and effort compared to a single checksum.

Not a waste of time. Hackers can edit a file without changing the entire file's checksum. So the script does two things. First, it verifies the previous scan; obviously it would remember what the previous scan's split percentages were, e.g., (50-50, 70-30, 51-49). That's easy enough. Second, it records a new scan.
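A minimal sketch of that verify-then-record cycle for a single file. One swap is worth flagging: each part is hashed with SHA-256 instead of a byte sum. The record layout, the 0.3 to 0.7 range and `verify_then_record` are hypothetical.

```python
import hashlib
import random


def split_digests(data, ratio):
    """SHA-256 of the first `ratio` of the data and of the remainder."""
    cut = int(len(data) * ratio)
    return (hashlib.sha256(data[:cut]).hexdigest(),
            hashlib.sha256(data[cut:]).hexdigest())


def verify_then_record(data, previous, rng=random):
    """Check `data` against the previous record, then return a fresh record at a new ratio.

    `previous` is {"ratio": float, "digests": (hex, hex)}, or None on the first run.
    """
    ok = previous is None or split_digests(data, previous["ratio"]) == tuple(previous["digests"])
    ratio = rng.uniform(0.3, 0.7)  # a new split point for the next scan
    return ok, {"ratio": ratio, "digests": split_digests(data, ratio)}
```

Each stored ratio has to live wherever the digests live, somewhere the attacker cannot rewrite. With a cryptographic hash on each part, any change alters at least one part's digest. A single whole-file digest already has that property.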

Quote:

Originally Posted by Hangdog42

Then maybe you ought to try thinking harder. Your assertion throughout this thread is that undisclosed code is secure code. I'm saying that is a dangerous and completely incorrect assertion. The HBGary example proves that undisclosed code can have very, very large security holes that even script kiddies can exploit.

I never said it's "secure code." I have essentially said that customized undisclosed code has the potential of being *more* secure. Also, I fail to see your point except that you're saying nothing in life is guaranteed. Agreed, lol!

As for being dangerous, come on, adding security is safer, not more dangerous. Does it seem like it would make me feel secure & at ease? Come on man, I'm paranoid! Haha

Quote:

Originally Posted by Hangdog42

Any code that is exposed to users in any manner is potentially vulnerable. People will try all sorts of crap on it. And they don't need to see the code to make that job easier.

Hmmm, so now you're saying that if a hacker can see the source code, that it does not help them find vulnerabilities. What school of logic is that from? You might want to rethink that one, my friend.

Quote:

Originally Posted by Hangdog42

Normal attack vectors like SQL injection or buffer overflows require absolutely no knowledge of the underlying code. They just require some sort of input form. Other vectors, like man-in-the-middle, only require that communications occur over a network. Again, no knowledge of underlying code needed.

I agree, that's a common and well known attack that's preventable by using SQL prepared statements.

It's the uncommon attacks that I'm concerned about. Attacks that hackers don't want the public to know about.
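For reference, the prepared-statement defense mentioned above looks roughly like this, sketched in Python's built-in `sqlite3` module for consistency with the other examples (on a PHP stack the counterpart would be PDO prepared statements). The table and `find_user` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", "alice@example.com"))


def find_user(conn, name):
    # The ? placeholder sends `name` to the database as data, never as SQL text.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()
```

A classic injection string such as `x' OR '1'='1` is then just an odd user name that matches nothing.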