Workload Prediction and Grid Efficiency Case Study

Learn how YellowDog helped a Global Risk and Analytics company improve its workload prediction and grid efficiency, to drive revenue and lower costs.

Industry: Digital Transformation in Financial Services

Learn how YellowDog helped this Global Risk and Analytics company improve the utilisation of its infrastructure with intelligent workload prediction and enhanced scheduling capabilities.

Background

The Marketing Services Team at this Global Risk and Analytics company supports some of the largest US banks and financial institutions. It provides highly targeted data sets for a range of purposes, including marketing, risk, portfolio and credit analyses.

Although the company’s existing infrastructure is substantial, the growing volume of data generated by the need to tailor services to individual customers is pushing it to its limits.

The company was looking for a solution that would maximise the efficiency of its data centres and give internal teams accurate predictions of when batch workloads would complete. This would help ensure that customer SLAs were met and expectations were managed, ultimately driving revenue and lowering support costs. These capabilities would also help retain valuable customers in a competitive market and reduce capital expenditure by removing the need to invest in more hardware to cover periods of high demand.

The Problems

Job Prediction

Every month, this company runs thousands of reports on behalf of its customers, each with its own deadline. To gain confidence that the reports would finish on schedule, the company based its predictions on historical run times and experience.

However, it never reached a level of confidence at which the predictions could be fully relied upon. This is a common challenge when infrastructure is shared across a large business. Despite developing several systems internally over the past decade, the company still encountered demand spikes that could not be forecast. As a result, staff frequently had to monitor the queue and intervene manually to reprioritise workloads and meet customer requirements.

Getting More For Less

The team knew that job queue optimisation in their existing scheduler was good; however, the queue strategy was traditional, and moving workloads up the queue for key clients depended on requests from sales teams.

The company wanted to add more intelligence and predictability to increase production levels and shorten queue times. This would enable more workloads or iterations to be run on the same hardware, bursting to the cloud only when needed.

The Solution

The team provided 13 months of usage data from their existing scheduler logs so that YellowDog could train its machine learning models. YellowDog developed an application to automate the extraction of this data, so that the models could be retrained regularly and the prediction engine could maintain its accuracy.

YellowDog delivered:

Enhanced forward prediction: YellowDog ran three machine learning models over an 8-week period to predict both the end-to-end run time of workloads (Wall Time) and the time they were actively running on hardware (CPU Time). The evaluation, although limited to 13 months of data, examined seasonal trends and how the infrastructure behaved on a weekly and monthly basis.
Improved workload scheduling: the YellowDog Scheduler Service was integrated with the company’s existing scheduling tools via an API to issue commands and manage the workload queue. This additional level of intelligence removed the need for manual intervention and increased both the efficiency and the number of batch processes that could be run on the company’s infrastructure.
Visibility of workloads using dashboards: information on predicted run times and ongoing performance was exposed via the API so that it could be visualised in third-party and proprietary sales dashboards. By reusing existing interfaces, the company could reduce development time and accelerate time to market.

“YellowDog have created functionality that we have wanted for over a decade. This can transform how we run our infrastructure.”

VP and Global Head of Solution Architecture, Leading Global Risk & Analytics Business

Results

Prediction

Training the YellowDog prediction models on the supplied data delivered results that exceeded all expectations. The models provided predictions with 96% confidence for CPU Time and 73% confidence for Wall Time. This level of confidence enabled more intelligent prioritisation, improving client satisfaction and optimising hardware utilisation.

Intelligent Scheduling

The prediction data generated by YellowDog was used to manage the existing scheduler’s queue intelligently, significantly reducing waiting times. Furthermore, automated workload reprioritisation removed the need for manual intervention, allowing staff to concentrate on higher-value tasks. Because this service works alongside the existing scheduler, there is no need to replace or alter the current setup.

Improving the efficiency of the compute grid and delivering enhanced performance allows thousands of extra workloads to be run, driving revenue from a fixed cost base and improving the profitability of each workload.
