[SOLVED] Linux GPL Serialized Multi-user Batch Queue?

Bryan88 · 11-02-2011, 06:22 PM

Does anyone know of a software solution (hopefully GPL) for a simple batch queueing system for Linux?

We have a multi-user server where users submit command-line script jobs that take varying amounts of time to run (seconds to days). The licensing of a commercial piece of software on the server only allows a fixed number of simultaneous jobs across all users (currently 3). Is there a simple queueing system, similar to the old mainframe "batch" queue, that will centrally queue user jobs, serialize the submitted jobs, and then maintain a fixed number of simultaneous jobs in FIFO order?
The requirements (pretty easy):
GPL License, accepts jobs from multiple users, simple queue viewing and control by users (like a print queue), user permission and working directory retention (like 'at') and execution of up to a fixed number of parallel scripts/jobs.

Has anyone seen or crafted a piece of software like this? Kind of a cross between a printer queue and 'at'.

jthill · 11-02-2011, 09:45 PM

'man batch'. You already have it installed, all but certain. It launches new jobs so long as the load level's manageable, you can tune it.

Bryan88 · 11-03-2011, 10:41 PM

Yes, 'batch' is installed, but it doesn't do what I need, unless there is an undocumented option. Batch looks at load levels before launching a job.

I have plenty of processor cores and memory, but only have a limited number of simultaneous licenses available for a piece of software. Failure to have a license free when launching a job causes the job to fail. This is the reason for controlling the maximum number of simultaneously running queue jobs. The rest of the software on the server has no such license restriction and wouldn't be run from this queue.

Since there are multiple users with varying job lengths, 'at' or 'cron' are not useful. I looked into 'torque' and 'nqs', but these are hugely overkill. This is running on a single, large, multi-core server, not a distributed cluster. I need something analogous to a printer queue feeding a small printer pool, but instead of printers, they would be command line shells.

Any help with prebuilt software or links to people altering queues for a similar purpose would be great.

jthill · 11-04-2011, 01:49 AM

[hang on, sorry, the answer that was here passed a smoketest but not a real test]

Okay, I thought I could make bash do it but that'll take a better man than me. So what I have instead is a set of brutally simple C programs: one to turn in a ticket for someone to use, one to get a ticket as soon as one's available, and one to collect and distribute tickets. Use them with a simple script that gets a ticket, runs a command and returns the ticket (also included below).

I have tested this, corner cases and all, it's usable as-is if primitive suits your style.

Code:

// llr.c, connect to the ticket-return socket so someone else can use our ticket
#include <stdlib.h>
#include <string.h>     // strlen
#include <unistd.h>     // write, close
#include <sys/socket.h>
#include <sys/un.h>

struct sockaddr_un retr = { AF_UNIX, "ticket_return_socket" };
int main(int c, char **v)
{
        int returnfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
        if ( returnfd < 0 ) exit(1);
        if ( connect(returnfd, (void*)&retr, sizeof retr) < 0 ) exit(4);
        const char *msg = c > 1 ? v[1] : "here!";  // default message when no argument is given
        write(returnfd,msg,strlen(msg));
        close(returnfd);
        return 0;
}

Code:

// llg.c, get a free ticket as soon as the server has one to hand out
#include <stdlib.h>
#include <unistd.h>     // read, close
#include <sys/socket.h>
#include <sys/un.h>

struct sockaddr_un grant = { AF_UNIX, "ticket_grant_socket" };
int main(int c, char **v)
{
        int grantfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
        if ( grantfd < 0 ) exit(1);
        if ( connect(grantfd, (void*)&grant, sizeof grant) < 0 ) exit(4);
        char buf[64];
        int rc = read(grantfd,buf,sizeof buf);
        close(grantfd);
        return rc<0;
}

Code:

// lla.c, collect tickets from clients connecting to the return socket 
// and hand them out to clients connecting to the grant socket.  
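#include <stdlib.h>
#include <stdio.h>
#include <string.h>     // memcmp
#include <unistd.h>     // read, write, close, unlink
#include <sys/stat.h>   // chmod
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/epoll.h>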

struct sockaddr_un grant = { AF_UNIX, "ticket_grant_socket" };
struct sockaddr_un retr = { AF_UNIX, "ticket_return_socket" };

int awaiting[1024] = { 0 };  // open grant request socket list
int awaita = 0, awaitz = 0;  // first used, first following unused, if = then no used.

int main(int n, char **a)
{
        int tickets = 0;

        int grantfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);
        int returnfd = socket(AF_UNIX, SOCK_SEQPACKET, 0);

        if ( grantfd < 0 || returnfd < 0 )
                exit(1);

        if ( bind(grantfd, (void*)&grant, sizeof grant) < 0 ) exit(2);
        if ( bind(returnfd, (void*)&retr, sizeof retr) < 0 ) exit(3);

        chmod(grant.sun_path,0777);
        chmod(retr.sun_path,0777);

        if ( listen(grantfd, 50) < 0 ) exit(4);
        if ( listen(returnfd, 50) < 0 ) exit(5);

        int epollfd = epoll_create1(0); if ( epollfd < 0 ) exit(6);
        struct epoll_event ev;
        ev.events = EPOLLIN;
        ev.data.fd = grantfd;
        if ( epoll_ctl(epollfd, EPOLL_CTL_ADD, grantfd, &ev ) == -1 )
                exit(7);
        ev.events = EPOLLIN;
        ev.data.fd = returnfd;
        if ( epoll_ctl(epollfd, EPOLL_CTL_ADD, returnfd, &ev ) == -1 )
                exit(8);

        while (1) {
                if ( epoll_wait(epollfd, &ev, 1, -1) == -1 )
                        exit(9);
                if ( ev.data.fd == grantfd ) {
                        awaiting[awaitz++] = accept(grantfd,0,0);
                } else
                if ( ev.data.fd == returnfd ) {
                        ev.data.fd = accept(returnfd,0,0);
                        epoll_ctl(epollfd,EPOLL_CTL_ADD,ev.data.fd,&ev);
                } else { // must be a line from someone on a return socket
                        char retcmd[32];
                        retcmd[read(ev.data.fd,retcmd,sizeof retcmd)]=0;
                        close(ev.data.fd);
                        if (!memcmp(retcmd,"kill",4)) break;
                        ++tickets;
                }
                while ( awaitz > awaita && tickets > 0 ) {
                        write(awaiting[awaita],"ok",2);
                        close(awaiting[awaita]);
                        ++awaita, --tickets;
                }
                if ( awaita == awaitz ) 
                        awaita = awaitz = 0;
        }

getout:
        close(grantfd);
        close(returnfd);
        unlink(grant.sun_path);
        unlink(retr.sun_path);
        return 0;
}

Code:

#!/bin/sh
# with_ticket script: get a ticket, run a command, return the ticket.
if llg; then 
        trap 'llr' 0 1 2 3 15
        "$@"
fi

This is probably taking simplicity to a fault, but it sure is simple, and it works. The "ticket_grant_socket" and "ticket_return_socket" strings are actual pathnames, so you could e.g. make a /var/tmp/foo_licenses directory accessible only to your chosen few, put "/var/tmp/foo_licenses/" in front of each of those strings, and you'd be done with that part of it.
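For the directory-prefix idea, here is a minimal sketch of building the socket address at run time rather than editing the string literals. The helper name make_ticket_addr is hypothetical and not part of the programs above; the one thing it adds is a length check, since sun_path is a small fixed-size buffer and a long directory name would otherwise be truncated silently.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>

/* Hypothetical helper: fill addr with "<dir>/<name>".
 * Returns 0 on success, -1 if the combined path does not fit in sun_path. */
int make_ticket_addr(struct sockaddr_un *addr, const char *dir, const char *name)
{
        assert(addr && dir && name);
        memset(addr, 0, sizeof *addr);
        addr->sun_family = AF_UNIX;
        int n = snprintf(addr->sun_path, sizeof addr->sun_path, "%s/%s", dir, name);
        /* snprintf reports the length it wanted to write; reject truncation */
        return (n < 0 || (size_t)n >= sizeof addr->sun_path) ? -1 : 0;
}
```

Each program would call it once per socket, e.g. make_ticket_addr(&grant, "/var/tmp/foo_licenses", "ticket_grant_socket"), before bind() or connect(). Also worth noting from the code as posted: lla starts with zero tickets, so after starting it you hand in one ticket per license by running llr once for each (three times for three licenses), and "llr kill" tells the server to shut down.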

jthill · 11-04-2011, 03:52 PM

... and here's a patch to handle people cancelling a job already waiting for a ticket

Code:

diff --git a/lla.c b/lla.c
index ebc9484..428fe46 100644
--- a/lla.c
+++ b/lla.c
@@ -59,9 +59,10 @@ int main(int n, char **a)
                         ++tickets;
                 }
                 while ( awaitz > awaita && tickets > 0 ) {
-                        write(awaiting[awaita],"ok",2);
-                        close(awaiting[awaita]);
-                        ++awaita, --tickets;
+                       int thisone = awaita++;
+                       if ( write(awaiting[thisone],"ok",2) > 0
+                         && close(awaiting[thisone]) == 0 )
+                               --tickets;
                 }
                 if ( awaita == awaitz ) 
                         awaita = awaitz = 0;
-- 
1.7.7
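The idea behind the patch can be tried in isolation. This is a sketch under a few assumptions, not the patched lla.c: grant_ticket and make_waiter are hypothetical names, send() with MSG_NOSIGNAL is swapped in for write() so a vanished peer can only produce an error return, and the descriptor is closed on both paths (the patch as posted skips close() when the write fails).

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Try to hand one ticket to the waiter on fd.
 * Returns 1 if delivered (the caller should then decrement its ticket count),
 * 0 if the waiter has gone away and the ticket is still free.
 * The descriptor is closed either way. */
int grant_ticket(int fd)
{
        ssize_t n = send(fd, "ok", 2, MSG_NOSIGNAL);
        close(fd);
        return n == 2;
}

/* Hypothetical helper for experimenting: creates a connected SOCK_SEQPACKET
 * pair, returns the "server" end and stores the "client" end in *client.
 * Returns -1 on failure. */
int make_waiter(int *client)
{
        int sv[2];
        assert(client);
        if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, sv) < 0)
                return -1;
        *client = sv[1];
        return sv[0];
}
```

In the server loop this would read roughly as "if (grant_ticket(awaiting[awaita++])) --tickets;", so a cancelled job just gets skipped and the next waiter in line receives the ticket.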

Bryan88 · 11-04-2011, 04:54 PM

Wow, awesome. Thanks for your help. I was looking for pointers and now I have custom code.

jthill · 11-05-2011, 09:07 PM

I left some "won't happen"s in there and they just will not stop bugging me. So:

Code:

diff --git a/lla.c b/lla.c
index 428fe46..96c4345 100644
--- a/lla.c
+++ b/lla.c
@@ -10,5 +10,6 @@ struct sockaddr_un grant = { AF_UNIX, "ticket_grant_socket" };
 struct sockaddr_un retr = { AF_UNIX, "ticket_return_socket" };
 
-int awaiting[1024] = { 0 };  // open grant request socket list
+#define CAP 1024
+int awaiting[2*CAP] = { 0 };  // open grant request socket list
 int awaita = 0, awaitz = 0;  // first used, first following unused, if = then no used.
 
@@ -54,5 +55,5 @@ int main(int n, char **a)
                 } else { // must be a line from someone on a return socket
                         char retcmd[32];
-                        retcmd[read(ev.data.fd,retcmd,sizeof retcmd)]=0;
+                        retcmd[read(ev.data.fd,retcmd,sizeof retcmd-1)]=0;
                         close(ev.data.fd);
                         if (!memcmp(retcmd,"kill",4)) break;
@@ -67,4 +68,12 @@ int main(int n, char **a)
                 if ( awaita == awaitz ) 
                         awaita = awaitz = 0;
+
+                if ( awaitz == sizeof awaiting / sizeof *awaiting ) {
+                        memmove( awaiting, awaiting+awaita, (awaitz-awaita) * sizeof *awaiting );
+                        awaitz -= awaita;
+                        awaita = 0;
+                        if ( awaitz > CAP)
+                               exit(99);
+                }
         }
 
diff --git a/llg.c b/llg.c
index ccb40c0..56fe025 100644
--- a/llg.c
+++ b/llg.c
@@ -13,4 +13,4 @@ int main(int c, char **v)
         int rc = read(grantfd,buf,sizeof buf);
         close(grantfd);
-        return rc<0;
+        return rc<=0;
 }
diff --git a/with_license b/with_license
index 7804e69..9916ef5 100755
--- a/with_license
+++ b/with_license
@@ -4,3 +4,5 @@ if llg; then
         trap 'llr' 0 1 2 3 15
         "$@"
+else
+       echo "Ticket server shutdown"
 fi

While I was at it I had it handle premature shutdowns more gracefully, but the main thing is that it now handles the case where you've perpetually got jobs waiting for a ticket. It still thinks having more than a thousand jobs waiting at a time is ridiculous; not handling an infinite waitlist is intentional.
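The waitlist handling is easier to see pulled out of the event loop. What follows is a sketch of the same slide-down idea, not the patch itself: the struct and function names are hypothetical, CAP is shrunk to 4 so the wraparound is easy to exercise, and a full list is reported with -1 where the patch calls exit(99).

```c
#include <assert.h>
#include <string.h>

#define CAP 4   /* illustration only; the patch uses 1024 */

struct waitlist {
        int fds[2 * CAP];   /* waiting descriptors live in fds[head..tail) */
        int head, tail;
};

/* Append fd in FIFO order.
 * Returns 0 on success, -1 if CAP waiters are already queued. */
int waitlist_push(struct waitlist *w, int fd)
{
        assert(w && w->head <= w->tail);
        if (w->tail == 2 * CAP) {   /* ran off the end: slide live entries to the front */
                int live = w->tail - w->head;
                memmove(w->fds, w->fds + w->head, live * sizeof w->fds[0]);
                w->head = 0;
                w->tail = live;
        }
        if (w->tail - w->head >= CAP)
                return -1;
        w->fds[w->tail++] = fd;
        return 0;
}

/* Remove and return the oldest waiter, or -1 if nobody is waiting. */
int waitlist_pop(struct waitlist *w)
{
        assert(w && w->head <= w->tail);
        if (w->head == w->tail)
                return -1;
        return w->fds[w->head++];
}
```

Because the array is twice CAP, the memmove only happens once per CAP or more pushes, and the queue keeps working indefinitely even if it never drains to empty.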

SysKoll · 03-16-2012, 08:16 AM

Hello,

The program you are looking for is Lluis Batlle i Rossell's Task Spooler for Linux (ts). It's GPL'd. Grab it here: http://vicerveza.homeunix.net/~viric/soft/ts/. It has one or more job runners that process entries from a batch queue. Easy to compile.

Usage (from 'ts -h'):

Code:

usage: ./ts [action] [-ngfmd] [-L <lab>] [cmd...]

Env vars:
  TS_SOCKET  the path to the unix socket used by the ts command.
  TS_MAILTO  where to mail the result (on -m). Local user by default.
  TS_MAXFINISHED  maximum finished jobs in the queue.
  TS_ONFINISH  binary called on job end (passes jobid, error, outfile, command).
  TS_ENV  command called on enqueue. Its output determines the job information.
  TS_SAVELIST  filename which will store the list, if the server dies.
  TS_SLOTS   amount of jobs which can run at once, read on server start.

Actions:
  -K       kill the task spooler server
  -C       clear the list of finished jobs
  -l       show the job list (default action)
  -S [num] set the number of max simultanious jobs of the server.
  -t [id]  tail -f the output of the job. Last run if not specified.
  -c [id]  cat the output of the job. Last run if not specified.
  -p [id]  show the pid of the job. Last run if not specified.
  -o [id]  show the output file. Of last job run, if not specified.
  -i [id]  show job information. Of last job run, if not specified.
  -s [id]  show the job state. Of the last added, if not specified.
  -r [id]  remove a job. The last added, if not specified.
  -w [id]  wait for a job. The last added, if not specified.
  -u [id]  put that job first. The last added, if not specified.
  -U <id-id>  swap two jobs in the queue.
  -h       show this help
  -V       show the program version

Options adding jobs:
  -n       don't store the output of the command.
  -g       gzip the stored output (if not -n).
  -f       don't fork into background.
  -m       send the output by e-mail (uses sendmail).
  -d       the job will be run only if the job before ends well
  -L <lab> name this task with a label, to be distinguished on listing.