An Introduction to Workload Management

In this video, we sit down to discuss workload management: what it is, what tools are available, and the unique features of the YellowDog Workload Manager.

So, workload management is all about executing a series of jobs, or work, on a set of servers. Workload management tools do this, and they can also resubmit work when tasks fail and launch work in a chosen sequence, for example when one piece of work depends on another that has to complete first.
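To make the two ideas above concrete, here is a minimal sketch of running work in dependency order and resubmitting failed tasks. It is an illustration only, not how any particular workload manager is implemented: the function `run_workload`, the `max_attempts` limit and the task names are all hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_workload(tasks, dependencies, max_attempts=3):
    """Run tasks in dependency order, resubmitting any task that fails.

    tasks:        dict of task name -> zero-argument callable (hypothetical shape)
    dependencies: dict of task name -> set of task names it depends on
    Returns a dict of task name -> number of attempts that were needed.
    """
    graph = {name: dependencies.get(name, set()) for name in tasks}
    attempts = {}
    # static_order() yields each task only after everything it depends on.
    for name in TopologicalSorter(graph).static_order():
        for attempt in range(1, max_attempts + 1):
            attempts[name] = attempt
            try:
                tasks[name]()
                break  # success, move on to the next task
            except Exception as exc:
                if attempt == max_attempts:
                    raise RuntimeError(
                        f"task {name!r} failed after {max_attempts} attempts"
                    ) from exc
                # otherwise fall through and resubmit the task
    return attempts


# Hypothetical three-step workload: "simulate" fails once, then succeeds.
state = {"simulate_calls": 0}


def simulate():
    state["simulate_calls"] += 1
    if state["simulate_calls"] == 1:
        raise RuntimeError("transient failure")


example_attempts = run_workload(
    tasks={"report": lambda: None, "simulate": simulate, "prepare": lambda: None},
    dependencies={"simulate": {"prepare"}, "report": {"simulate"}},
)
```

In this sketch the tasks run one at a time on a single machine; a real tool would dispatch independent tasks to many servers in parallel, but the ordering and resubmission logic follows the same pattern.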

Some of the grids that are launched could have thousands of nodes on them, so it’s really important to have a tool that has a central console for the management of the various workstreams – especially if you are executing multiple workloads simultaneously.

Another thing that you need to consider is monitoring the execution itself. In this context, execution on servers falls into two categories:

‘Compute Farm’ type operations, where there is a loose confederation of machines that are available for workload execution.
Or HPC clusters – High Performance Compute clusters – where you’re looking to combine compute across multiple machines.

Where Are These Servers Located?

There are a number of potential server locations, including:

On-premise.
In private data centres.
In the public cloud.

What Workload Management Tools Are Available?

To give an idea of the workload management tools out there, here is a selection:

Sun Grid Engine, which was forked by a company called Univa.
IBM LSF and IBM Spectrum Symphony.
TIBCO’s DataSynapse.

Alongside these, you also have workload management solutions that come from the cloud providers themselves, for example:

AWS Batch.
Azure CycleCloud.

There are also some open-source solutions available – in particular, Slurm is very, very popular.

All of the above are great options. Some of them are more attuned to working on-premise and some are more attuned to working in the cloud.

What Makes YellowDog’s Workload Manager Different?

Ultimately, YellowDog’s Workload Manager is cloud native – it has been built to work in the cloud.

One of the key things, when working in the cloud, is recognising that the environment is completely different. For example, when you execute workloads on-premise, your machines tend to be up and running, you have a very stable network, and the grid is available to you when you start.

In the cloud, by contrast, the environment and the capabilities are different. If you’re using things like Spot or pre-emptible machine types, there’s a possibility you might not get them, or they might be taken away when you’re in the middle of executing a workload.

So, one of the things that the YellowDog Workload Manager does is assume failure and handle it automatically by resubmitting jobs. It also asks for more compute when it’s required.
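As a rough illustration of what "assume failure" can mean in practice, the sketch below puts the tasks of pre-empted machines back on the queue and works out how many extra machines to request. It is not YellowDog's implementation: the function names, the data shapes and the `tasks_per_worker` sizing rule are all assumptions made for the example.

```python
import math
from collections import deque


def requeue_preempted(pending, running, lost_workers):
    """Put the tasks of lost workers back at the front of the pending queue.

    pending:      deque of task names waiting to run
    running:      dict of worker id -> task name currently executing
    lost_workers: iterable of worker ids that were pre-empted or failed
    """
    for worker in lost_workers:
        task = running.pop(worker, None)
        if task is not None:
            pending.appendleft(task)  # resubmit ahead of untouched work


def extra_workers_needed(pending_count, live_workers, tasks_per_worker=4):
    """Return how many more workers to request for the current backlog.

    Hypothetical rule: aim for one worker per `tasks_per_worker` pending tasks.
    """
    target = math.ceil(pending_count / tasks_per_worker)
    return max(0, target - live_workers)


# Example: worker "w2" is taken away while running task "t2".
pending = deque(["t3"])
running = {"w1": "t1", "w2": "t2"}
requeue_preempted(pending, running, ["w2"])
to_request = extra_workers_needed(len(pending), live_workers=len(running))
```

The point of the design is that losing a machine is treated as a normal event: the work goes back on the queue and the shortfall in compute is recalculated, with no manual intervention.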

The other thing the YellowDog Workload Manager does is work across multiple machine types and regions. So, you can have, for example, workloads being executed on servers across the UK, France and Germany, which would be shown as a single cluster to the YellowDog Workload Manager.

In addition to this, the Workload Manager can also combine resources from multiple cloud providers, and again, show this as one cluster.

Finally, the YellowDog Workload Manager uses what we would call a ‘pull’ model, where the workers contact the Scheduler for work, rather than the other way around. When the workers come up, they alert the Scheduler, saying, “Hey, look, I’m here, I’m available for work.” The Scheduler then hands work to them accordingly.
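The pull model can be sketched in a few lines. This is a simplified, single-process illustration using threads as stand-in workers, not the YellowDog API: the `Scheduler` class, its `request_work` method and the squaring "task" are hypothetical names chosen for the example.

```python
import queue
import threading


class Scheduler:
    """Holds pending tasks and hands one out only when a worker asks."""

    def __init__(self, tasks):
        self._pending = queue.Queue()
        for task in tasks:
            self._pending.put(task)

    def request_work(self, worker_id):
        """Called by a worker; returns a task, or None if nothing is left."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None


def worker_loop(worker_id, scheduler, results):
    """A worker announces itself by asking for work, and repeats until done."""
    while True:
        task = scheduler.request_work(worker_id)
        if task is None:
            return
        # Stand-in for real work: square the number.
        results.put((worker_id, task * task))


def run(tasks, n_workers=3):
    scheduler = Scheduler(tasks)
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker_loop, args=(f"worker-{i}", scheduler, results))
        for i in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return [results.get() for _ in range(results.qsize())]


squares = run(range(10))
```

Because the Scheduler never needs to know in advance which workers exist, workers can appear or disappear at any time, which suits cloud machines that may be pre-empted.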
