Pushing the Limits of Possibility – A Multiple Opinion Piece

In this opinion piece, we explore the topic of ‘pushing boundaries’ from a number of different angles, each one an amalgam of our collective experience at YellowDog and the insights gained from working with clients at the cutting edge of HPC.

Executive Summary

When it comes to HPC workloads today, ambitions may be limitless, but questions remain around what’s actually possible. HPC clusters are generally expensive to build and maintain, and they require a level of expertise that puts them beyond the reach of most organisations. This is why they have long been the preserve of governments, academic institutions and large organisations that can afford to buy and run them. Yet even here the ‘limits of possibility’ are constrained from the outset by the need to define and deploy a rigid multi-year contract that typically remains static until the agreement ends. Such an arrangement cannot incorporate new processor architectures outside of future budget or maintenance cycles.

Nor is finance the only factor. Usage matters too, because the way a cluster is put to work can also place limits on what’s possible. The value of any HPC deployment obviously comes from it being utilised, but the way that utilisation is managed can be a source of ongoing concerns:

In high-demand clusters, usage is constant, which, while great for maximising ROI, also forces projects to be prioritised, with ‘less strategic’ projects struggling to get a look in.
Then there is capacity reserved for ‘burst workloads’, which gives organisations more flexibility in accessing resources but can sit idle between bursts.

The challenge of expanding HPC accessibility is made more pressing by the growing number of ‘non-traditional users’ (start-up companies, limited-duration project teams, temporary government projects etc.) who also want access to raw processing power. Many of these organisations lack the financial backing needed to build out a comprehensive HPC infrastructure or to sustain ongoing operational expenditure. These factors further restrict both their ‘limits of possibility’ and their freedom to imagine, unless they’re prepared to take a step back and ask:

Why can’t different teams, departments, and business units access all the processing power they need on demand?
Why can’t they push out a request (for defined processing resource) and quickly access the precise machine allocation needed to complete the job?
And why can’t they scale this demand up and down at speed to optimise cost and management time?

Answering these questions will, of course, involve using the cloud to access HPC clusters where and when they’re needed. In attempting this, however, many organisations have tried to replicate their on-premises philosophy for provisioning HPC resources in a public or hybrid cloud environment. The problem is that continuing to think in terms of limited resource accessibility and prioritisation, or of carefully orchestrating burst workloads, only displaces the same ‘old’ problems. Worse still, if the move does not include the right management tool, costs can climb rapidly and at a surprising rate.

While this outcome is predictable, it doesn’t have to happen. Instead, it is possible to give users the exact processing power they need at the exact time they need it, and then to discard it the moment a job is complete. This is the world of intelligent provisioning: VERY intelligent provisioning, which enables the most dynamic scaling possible while letting both users and IT manage the process seamlessly and with confidence, irrespective of the size of the job or organisation. It is this new reality that our multi-cloud workload management solution was designed to empower, enabling organisations to:

Achieve the full financial advantages of moving HPC requirements from a CapEx to an OpEx model.
Scale machines, including very large numbers of cores when needed, at a lightning-fast pace, then take them down with equal speed.
Run simultaneous workloads from a single control point, while maintaining a real-time view of exactly what’s running and the progress of each job.

What this all means is that anyone who needs access to HPC resources, whatever the size and complexity of their requirements, can get it quickly, easily, and in the most cost-effective way possible. Rather than working through a priority list of jobs, individual teams can simultaneously access the precise HPC setup they require, without having to wait for or negotiate capacity. They’re also instantly aware of the costs involved, so they can dynamically set budgets at a departmental level and focus on outcomes rather than operational complexity. What we know for certain is that powerful HPC capabilities are already available in the cloud. The task now is to access them in the most flexible and cost-effective way possible.

To read the rest of our opinion piece, click here.
