We're currently setting up a server to do some heavy lifting (ETL) after another process has finished within the business. At the moment we're firing off jobs either via scheduled cron jobs or remote execution (via ssh). Earlier this week we hit an issue with too many jobs running side by side on the system, which brought all the jobs to a snail's pace as they fought for CPU time.
I've been looking for a batch scheduler: a system where we can insert jobs into a run queue and the system will process them one by one. Can anyone advise a program/system to do this? Low cost / FOSS would be appreciated due to the shoestring nature of this project.
I'd set up some kind of queueing service. A quick Google on "ready to use" stuff shows this:
Depending on your needs you could simply
Actually there's more to it: you could have requirements that call for a priority queue, which brings up problems like starving jobs and the like, but it's not that hard to get something up and running quite fast.
If lpd, as suggested by womble, works for you, I'd take that. Having such a system maintained by a larger community is of course better than creating your own bugs for problems others have already solved :)
Also, a queuing service has the advantage of decoupling the resources from the actual number crunching. By making the jobs available over a network connection you can simply throw hardware at a (possible) scaling problem and get nearly endless scalability.
Two solutions spring to mind:

- `xargs -P` to control the maximum number of parallel processes at one time
- `make -j`

They are both summarised in this SO thread in more detail.
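As a minimal sketch of the `xargs -P` route (the `echo` commands stand in for real ETL jobs; in practice you'd feed it a file with one command per line):

```shell
# Feed one shell command per line to xargs.
# -d '\n' : one argument per line (GNU xargs)
# -n 1    : hand one command to each sh invocation
# -P 2    : run at most two commands in parallel
printf '%s\n' 'echo job1' 'echo job2' 'echo job3' \
  | xargs -P 2 -n 1 -d '\n' sh -c
```

Setting `-P 1` gives you a strictly sequential queue, which matches your "one by one" requirement.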
There is a possibility that these may not be applicable to the structure of your scripting.
A heavyweight solution to your problem is to use something like Sun Grid Engine (SGE). SGE is distributed resource management software that allows the resources within a cluster/machine (CPU time, software, licenses, etc.) to be utilized effectively.
Here is a small tutorial on how to use SGE.
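As a rough sketch (the script name `run_etl.sh` and job name are made up for illustration; this needs a working SGE install, so it's not runnable standalone), submitting work to SGE looks like:

```shell
# Submit a job script; the scheduler dispatches it when a slot is free.
# -cwd : run in the current directory
# -j y : merge stderr into stdout
# -o   : write the job's output here
qsub -cwd -j y -o etl.log run_etl.sh

# Serialise jobs by making one wait on another:
qsub -hold_jid previous_job run_etl.sh
```

The `-hold_jid` option is what gives you one-at-a-time behaviour without hand-rolling a queue.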
You could check out some of the batch systems used for scheduling jobs on clusters, which have the option to monitor resource usage and declare a system too loaded to dispatch more workload to it. You could also easily configure them to run only one job at a time, but for that you may be better off with something less complex than a full-fledged batch scheduler (in the spirit of keeping things simple).
As for freely available batch/scheduling systems, the two that spring to mind are OpenPBS/Torque and SGE.
Edited to add: If you're ever going to add more processing capacity in the future in the form of more boxes, a batch/scheduling system like Torque/OpenPBS/SGE may be good choices as they're basically built to manage compute resources and distribute workloads to them.
You can always use lpd -- yeah, old school, but it's really a generalised batch processing control system masquerading as a print server.
From `man batch`: I think this might be what you're looking for. It's part of Debian's `at` package.

wava: a memory-aware scheduler that lets you enqueue batch jobs (submitted with a maximum physical memory usage promise) to be executed when enough physical memory (RSS) is available on the system.

We used Control-M for this exact reason with ETLs and such (but a few years back now). Sure, it's not free or open source, but it had very good flexibility in terms of batch processing (a la if-this-then-that execution flows).
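As a sketch of the `batch` route mentioned above (the script path is made up; this needs the `at` package and a running `atd` daemon, so it's not runnable standalone):

```shell
# batch reads commands from stdin and runs them when the system
# load average drops below a threshold (1.5 by default), so queued
# jobs don't pile onto an already-busy machine.
echo '/opt/etl/run_transform.sh' | batch

# List what's still queued:
atq
```

Because `batch` defers on load rather than strictly serialising, it directly addresses the "jobs fighting for CPU time" problem in the question.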
A shell script called by cron could easily do this; it would process the queue line by line.
I would use Torque, which is an updated version of the FOSS OpenPBS.