Idaho National Lab Case Study

Astronomical increase in compute for the same cloud spend in a DOE demonstration
Demonstrating what Infrastructure Optimizer can do for a high-performance application.
Introduction
By Taylor Reo, Ashbocker, Nathan Lee, Woodruff, and Christopher S Ritter. Originally posted on osti.gov.
Technical Report – https://inldigitallibrary.inl.gov/sites/sti/sti/Sort_59631.pdf
Exostellar demonstrated its Infrastructure Optimizer technology to the U.S. Department of Energy (DOE) using Idaho National Laboratory's MASTODON application.
Exostellar has developed technologies that dynamically place idle or over-provisioned application containers and virtual machines (VMs) on deeply discounted spot capacity. Infrastructure Optimizer launches application containers on discounted spot VMs and moves them between these instances based on availability and price. Although the cloud provider can reclaim spot instances at any time on very short notice, Infrastructure Optimizer makes them usable for critical applications, with reliability comparable to regular on-demand instances and cost reductions upwards of 70%.
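To make the placement idea concrete, here is a minimal Python sketch, not Exostellar's implementation or API: a toy scheduler keeps a single container on the cheapest available spot offer and re-places it when the provider reclaims that instance. The class names, instance names, and prices are hypothetical, and the checkpoint/migration step that a real system would perform is only marked by a comment.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SpotOffer:
    """Hypothetical spot-capacity offer: instance name, hourly price, availability."""
    name: str
    hourly_price: float
    available: bool = True


@dataclass
class Scheduler:
    """Toy scheduler that keeps one container on the cheapest available offer."""
    offers: list[SpotOffer]
    current: SpotOffer | None = None
    history: list[str] = field(default_factory=list)  # instances the container ran on

    def cheapest_available(self) -> SpotOffer:
        candidates = [o for o in self.offers if o.available]
        if not candidates:
            raise RuntimeError("no spot capacity available")
        return min(candidates, key=lambda o: o.hourly_price)

    def place(self) -> SpotOffer:
        target = self.cheapest_available()
        if target is not self.current:
            # A real system would checkpoint and migrate the container here.
            self.history.append(target.name)
            self.current = target
        return target

    def on_reclaim(self, name: str) -> SpotOffer:
        # The provider reclaims an instance: mark it unavailable, then re-place.
        for offer in self.offers:
            if offer.name == name:
                offer.available = False
        return self.place()


# Hypothetical offers and prices, for illustration only.
sched = Scheduler([SpotOffer("spot-a", 0.50), SpotOffer("spot-b", 0.45), SpotOffer("spot-c", 0.60)])
print(sched.place().name)               # spot-b
print(sched.on_reclaim("spot-b").name)  # spot-a
```

The sketch chooses purely on price among available offers; a production scheduler would presumably also weigh interruption likelihood, capacity fit, and migration cost.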
Partnership
In partnership with Idaho National Laboratory and the U.S. Department of Energy.
The Application
Exostellar demonstrated this technology with Idaho National Laboratory's MASTODON application, a multiphysics environment for typical high-performance computing (HPC) simulations in structural dynamics, seismic analysis, and risk assessment. MASTODON was packaged as a Docker container, deployed on Amazon ECS, and managed through a custom Scale-Out Compute on AWS (SOCA) implementation.
Experiment and Results
The experiment used a master node on an m5.large instance and a worker node on an m5.8xlarge instance, both on AWS. The worker node has 32 vCPUs and 128 GiB of memory. Both the Infrastructure Optimizer and Infrastructure Optimizer Migrate strategies were tested; the Migrate strategy differs in that it targets hybrid cloud environments.
The experiment consisted of three trials, each running a MASTODON workload of a different length:
- 1-hour workload: Infrastructure Optimizer achieved a 70% cost reduction in exchange for only a 6% performance overhead. Infrastructure Optimizer Migrate produced similar results: a 69% cost reduction with only a 7% overhead.
- 4-hour workload: Infrastructure Optimizer achieved a 73% cost reduction with a 4.2% performance improvement. Infrastructure Optimizer Migrate produced similar results: a 72% cost reduction and a 3.5% performance improvement.
- 16-hour workload: Infrastructure Optimizer achieved a 73% cost reduction with a 2.8% performance improvement. Infrastructure Optimizer Migrate produced similar results: a 72% cost reduction and a 2.7% performance improvement.
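The cost and overhead figures can be related with a short calculation. Assuming, as a simplification the report does not state, that job cost equals hourly instance price times wall-clock runtime (ignoring storage, transfer, and other charges), a 70% cost reduction at a 6% runtime overhead implies an effective hourly rate of roughly 28% of on-demand. The helper name below is hypothetical.

```python
def implied_price_ratio(cost_reduction: float, runtime_overhead: float) -> float:
    """Effective spot-to-on-demand hourly price ratio implied by a total cost
    reduction and a runtime overhead, assuming cost = hourly price x runtime."""
    cost_ratio = 1.0 - cost_reduction       # 70% cheaper -> 0.30 of baseline cost
    runtime_ratio = 1.0 + runtime_overhead  # 6% slower -> 1.06 x baseline runtime
    return cost_ratio / runtime_ratio


# 1-hour Infrastructure Optimizer trial: 70% cost reduction, 6% overhead.
print(f"{implied_price_ratio(0.70, 0.06):.3f}")  # 0.283
```

Under the same assumption, a runtime improvement lowers total cost further, which is consistent with the longer trials showing both a larger cost reduction and a performance gain.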
Key Metrics
- 73% Cost Reduction
- 4.3% Performance Enhancement
- ZERO Downtime
Additional Developments
This research also extended the Infrastructure Optimizer platform to support compute-intensive workloads such as image processing and machine learning by adding peripheral component interconnect (PCI) pass-through to graphics processing unit (GPU)-based instances. This lets large, compute-intensive Department of Energy applications run reliably and without interruption while taking advantage of the spot market's significant discounts.
Exostellar also developed X-Consolidate, which reduces idle resource waste in the cloud by packing idle containers onto a small number of VMs during idle periods. This minimizes the number of active machines and lowers the cost of keeping services online. When the workload increases, X-Consolidate relocates containers onto different VMs without service interruption.
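Packing idle containers onto as few VMs as possible is a form of bin packing. The source does not describe X-Consolidate's algorithm, so the following minimal Python sketch substitutes first-fit decreasing, a standard bin-packing heuristic, assuming each idle container has a single scalar demand (for example vCPUs) and all VMs have the same capacity. The function name, container names, and loads are hypothetical.

```python
def consolidate(container_loads: dict[str, float], vm_capacity: float) -> list[list[str]]:
    """Pack containers onto few VMs using first-fit decreasing.

    container_loads maps a container name to its demand; vm_capacity is the
    capacity of one VM in the same units. Returns the container names placed
    on each VM that stays active.
    """
    vms: list[list[str]] = []
    free: list[float] = []  # remaining capacity of each VM, parallel to `vms`
    # Place the largest containers first; this tends to need fewer VMs.
    for name, load in sorted(container_loads.items(), key=lambda kv: kv[1], reverse=True):
        if load > vm_capacity:
            raise ValueError(f"{name} does not fit on a single VM")
        for i, remaining in enumerate(free):
            if load <= remaining:
                vms[i].append(name)
                free[i] -= load
                break
        else:
            # No active VM has room: bring up another one.
            vms.append([name])
            free.append(vm_capacity - load)
    return vms


# Hypothetical idle-period loads, in vCPUs, on VMs with 2 vCPUs each.
plan = consolidate({"api": 0.5, "db": 1.0, "cache": 0.25, "worker": 0.5}, vm_capacity=2.0)
print(plan)  # [['db', 'api', 'worker'], ['cache']]
```

Here four containers end up on two VMs. First-fit decreasing is a heuristic and does not always find the minimum number of VMs; a real consolidator would also have to account for memory and the cost of each relocation.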
Next Steps
Next steps include:
- Expanding the supported cloud environments to include AWS GovCloud and Microsoft Azure.
- Evaluating support for interactive workloads, where the optimizations could be significant.
- Evaluating the feasibility of GPU-based migration, which would let accelerator workloads benefit from the cloud optimizations that Exostellar provides.