Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
I'm running a 64-core Linux server in my department that we use as a cluster. I have a question about controlling user behavior on the hardware.
We have certain students who try to bypass our job queue by *not* submitting their jobs to it and simply running them on the server under their user accounts.
So instead of "qsub script_to_launch_job.sh" they just do: "job.sh"
The problem is that these students are circumventing the queue and hogging resources, ignoring the policy that other, responsible users have agreed to abide by.
The students in question have been talked to, and they continue to ignore my requests. Talking to their advisers has not helped, since they seem to encourage fast results at any cost to further their research (my impression, and of course I could be wrong).
Does anyone have a way of preventing/detecting such behavior and nipping it in the bud?
Obviously I could sit at my desk and monitor user activity all day but I have other things to do.
Ideally I would like some way to kill their jobs after X number of minutes and then send them an email warning to use the submission queue instead.
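A cron-driven watchdog can do roughly that. Here is a minimal sketch; the time limit, the exempt-user list, and the mail wording are all illustrative assumptions, and it makes the crude assumption that any long-running process not owned by an exempt user was launched outside the queue (a real version would also check the process ancestry):

```shell
#!/bin/sh
# Sketch: watchdog for killing interactive jobs past a time limit and
# mailing the owner. LIMIT, EXEMPT, and the mail text are illustrative
# assumptions, not site policy.
LIMIT=1800                 # seconds an interactive job may run
EXEMPT="root daemon"       # users whose processes are never touched

# should_kill USER ETIMES -> success (0) if the process is over the limit
should_kill() {
    case " $EXEMPT " in *" $1 "*) return 1 ;; esac
    [ "$2" -gt "$LIMIT" ]
}

# scan_and_kill: run this from cron every few minutes.
# Uses ps's etimes (elapsed seconds) column, available in procps-ng.
scan_and_kill() {
    ps -eo pid=,user=,etimes= | while read -r pid user etime; do
        if should_kill "$user" "$etime"; then
            kill "$pid" 2>/dev/null
            printf 'Process %s ran %ss outside the queue; please use qsub.\n' \
                   "$pid" "$etime" |
                mail -s "Job killed on cluster" "$user"
        fi
    done
}
```

The decision logic lives in `should_kill` so you can tune (or test) the policy separately from the part that actually kills and mails.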
I would suggest you sharply limit the resources they can get when bypassing the queue. What about cgroups? That's how they do it at my university, by the way.
That would be a fairly complete solution.
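A rough sketch of the cgroups idea, using the cgroup v2 filesystem (the paths and controller files differ on older v1 kernels, and the group name and 2-core cap are illustrative, not a recommendation):

```shell
# Run as root. Create a group for interactively-launched work.
mkdir -p /sys/fs/cgroup/interactive
# Enable the cpu controller for child groups.
echo "+cpu" > /sys/fs/cgroup/cgroup.subtree_control
# Cap the group at ~2 of the 64 cores: 200ms of CPU per 100ms period.
echo "200000 100000" > /sys/fs/cgroup/interactive/cpu.max
# Move a login shell (and thus all its children) into the group;
# SHELL_PID is whatever PID you want to confine.
echo "$SHELL_PID" > /sys/fs/cgroup/interactive/cgroup.procs
```

Queue-submitted jobs stay outside that group, so only the interactive back door is throttled.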
Hello.
Well, the cgroups feature is a good way to go. You could also use AppArmor, which may be a bit less involved, since it focuses on applications rather than system-level resources.
AppArmor is a mandatory access control (MAC) system; the profile for a specific executable is inherited by all of its children. Set the user's login shell to /bin/bash-user, attach an AppArmor profile to that shell, and the only way they're getting out of that permission jail is through a kernel exploit. Even if they somehow find the root password, the profile will still apply after they su up, since su and the shell it spawns are children of /bin/bash-user. However, if you allow root to log in over the network... shame on you, and you deserve to have users sidestep your security.
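For concreteness, here is a skeleton of what such a profile might look like. /bin/bash-user is assumed to be a renamed copy of bash, and every path and rule below is an illustrative placeholder, not a working policy:

```shell
# Sketch: install a skeleton AppArmor profile for the restricted shell.
# All rules are illustrative assumptions; build the real list from the
# audit log as described below.
cat > /etc/apparmor.d/bin.bash-user <<'EOF'
#include <tunables/global>

/bin/bash-user {
  #include <abstractions/base>

  /bin/bash-user            mr,    # the shell itself
  /usr/bin/qsub             Px,    # queue submission, under its own profile
  owner @{HOME}/**          rw,    # the user's own files
  # anything not listed here is denied once the profile is enforced
}
EOF
```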
The hard part is building the AppArmor profile, but there are ways to get it built fairly easily these days. I'm judging by openSUSE, but I'm confident that other distros have such tools too. The steps are:
1. Create an AppArmor profile for the restricted shell and set it to audit (complain) mode.
2. Set the user's login shell to /bin/bash-user.
3. Log in as the user and perform the tasks you WANT the user to be able to do.
4. Use the audit log to build out the AppArmor profile.
5. Set the profile to enforce mode.
After that, use that profile template and apply it to other users.
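On openSUSE (and most distros that ship apparmor-utils) those steps map onto the aa-* helpers roughly as follows; the shell path and username are assumptions:

```shell
# 1. Generate a skeleton profile and put it in complain (audit) mode.
aa-autodep /bin/bash-user
aa-complain /bin/bash-user
# 2. Point the account at the restricted shell.
usermod -s /bin/bash-user student1
# 3. (Log in as the user and exercise only the allowed tasks.)
# 4. Turn the resulting audit-log entries into profile rules,
#    interactively approving each one.
aa-logprof
# 5. Switch the finished profile to enforce mode.
aa-enforce /bin/bash-user
```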
But, I'd go much lower-tech. Since the students are concerned with time, and doing things in a hurry, if you find one doing this, kill the job, and lock out their account for 24/48 hours. Let the student cry and whine to the advisor...who will then ask why the user is locked out. You tell them for violating system policy, after repeated warnings. Late assignments/bad grades for lazy users = users who won't want to risk it, and will start to play nicely, especially after making an example of a few of them.
Come on guys, they are students, and in my opinion all multiuser servers _should_ limit users' resources.
True, but the students are there to LEARN. Putting limits on things like this now (which may not exist in the 'real world'), won't teach them anything. cgroups is a good idea, and so is AppArmor. But all that's being done is the administrator is the one who is bearing the burden, while the students do as they wish. Some will even make a game of it, and try to figure out a way around things.
It's much more effective to confront them. Tell them you KNOW it's them, and make THEM suffer the consequences of their actions. That's the real world. I work in bank/financial institutions often, and on the internal network, things are fairly open. Why? Because data security and auditing are ALWAYS watching, and if you try anything, you'll be bounced out the door IMMEDIATELY. As a result, things like what's described here are VERY rare.
If you were a student, would tech solutions deter you unless they were backed up by disciplinary action?
Unfortunately, the OP sounds like he doesn't have that power.
He could make a formal complaint in writing to the relevant Depts, but who knows if that would work.
In addition to the tech suggestions above, there are always the options in limits.conf (/etc/security/limits.conf on RHEL) and/or the shell's ulimit builtin: http://linux.die.net/man/1/bash
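For example, limits.conf entries enforced through pam_limits might look like this; the group name and numbers are purely illustrative:

```shell
# Append illustrative limits for a "students" group (requires
# pam_limits in the login PAM stack, which is the RHEL default).
cat >> /etc/security/limits.conf <<'EOF'
@students   hard   cpu     30    # max CPU time per process, in minutes
@students   hard   nproc   50    # max simultaneous processes
EOF
# Rough per-shell equivalent via bash's ulimit builtin (CPU seconds):
ulimit -t 1800
```

The cpu limit kills any single process that burns more than the given CPU minutes, regardless of how it was launched.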
This is true. I don't have many options. I'm in a biological research field and these students aren't computer programmers. The students I'm dealing with are earning their Ph.D. in Biochemistry (I think, I've never asked them about their career paths). They view my requests as a nuisance that can be ignored.
The class project solution sounded like a good idea, but they'd never do it and I am powerless to make them (biochem remember?).
I will take a look at /etc/security/limits.conf since the server in question is a RHEL box.
Thanks for the other posts guys.
I really appreciate all of your suggestions.
What does your boss say about this?
Surely you're in a position to do as mentioned above and ban offenders for a period of time? If not, why do you care how much CPU time they use, if there are no rules defining it?
Personally, in your position I'd tell my boss and otherwise ignore the abuse of the system.
My boss prefers not to make waves. He agrees the students should obey the established rules but he's not willing to do anything about it since it doesn't impact him directly.
I care only because I am responsible for the shared resource and with these guys circumventing the established policy I have other users complaining. None of them are powerful enough to do much about it either. I was hoping there was a good way to enforce the departmental policy.
I am the only Linux admin in the department, and no one else here gets how this shared resource works (we are new at bioinformatics). The -70°C freezer, in contrast, everyone understands: that's a shared resource everyone respects. But this computer business is just too much for them, I guess.
You say you're the only one who understands how this shared resource works; in that case, you're in the driver's seat. And I heartily applaud your dedication, too... if the users could only understand how things like this affect systems (and OTHER users), things would be so much better, on pretty much ANY system.
A convincing-sounding reason as to why an account got locked for a few days, and why you CAN'T unlock it can go totally unchallenged. If you're working with science-types, they will understand cause and effect. Yes, it sounds unethical, but like you said, you're responsible, and are trying to do your job. If your boss is gutless, then remove them from the equation, and put the blame on technology. Something like:
Code:
<USER TROLL MODE>
"The recent upgrade to the PAM authentication module on the cluster is causing incorrectly submitted jobs to (oftentimes) abort.
A side effect is that the user account related to that job will become locked. This means we'll have to reset the entire PAM
database to re-activate that user. In order to do this safely (so as not to affect OTHER users), that account has to remain locked
for the duration of the rebuild... this can sometimes take up to 7 days, but may take as little as 24 hours, depending on the user
ID, position in the database, and the system administrator's workload.
To make sure this doesn't happen to you, only submit your jobs using the qsub job scheduler, which does not suffer from that bug.
Thank you."
</USER TROLL MODE>