ICE ClusterWare Solution Brief

ICE ClusterWare Features

Rapid image-based provisioning for fast cluster boot times

Comprehensive performance, status, and health monitoring of all cluster components

Heterogeneous OS and hardware support

Cluster configurations and logical resource isolation to enable multi-tenancy

Web GUI and a rich set of CLIs driven by REST APIs for administrator management

Dashboards for cluster observability, visualization, and analytics

High interoperability to integrate software stacks of choice

Ansible-based orchestration and automation for DevOps and infrastructure-as-code

Support for a variety of industry-standard batch or container-based workload managers

HA configuration enabling any head node to serve any compute node

Node management for Nvidia Cumulus Linux and Edge-core SONiC switches

Support for strictest security protocols required in secure, controlled environments

ICE ClusterWare™

AI and HPC Cluster Management Software

ICE ClusterWare™ is a hardware-agnostic, intelligent cluster management platform that transforms bare-metal hardware, networking, and software resources into reliable high-performance infrastructure. It simplifies deployment and management, delivers real-time health monitoring, and, with ClusterWare AIM services, enables peak performance for enterprise-scale AI and HPC workloads.

ICE ClusterWare provides comprehensive management functionality, including node provisioning, boot and runtime management, real-time monitoring, automated image deployment, and resilient data storage. It also serves as a platform for additional software and schedulers. With these capabilities, customers can seamlessly scale operations, automate best practices, and optimize AI and HPC cluster performance with confidence.

The ICE ClusterWare web console features a node display, allowing admins to quickly gain a status overview. Admins can drill down into nodes to see the associated attributes and other details.

ICE ClusterWare ships with Grafana for powerful, customizable health monitoring and alerting. Standard cluster-wide and node dashboards are provided.

Tame the Complexity of AI and HPC Infrastructure at Scale

AI is advancing rapidly, and companies are challenged to adopt, manage, and optimize their AI infrastructure. Unlike traditional HPC architectures, AI clusters present unique challenges. Job requirements, admin experience, cluster size, and security needs add layers of complexity. ICE ClusterWare’s intuitive tools simplify the deployment and management of thousands of nodes, streamline administration, and optimize compute resources for experienced and new admins alike.

Scalable Deployment and Management for any Environment

Penguin Solutions ICE ClusterWare simplifies cluster deployment and management for administrators and system architects with the following key features:

Rapid, image-based provisioning – Minimizes steps and reduces errors when adding new nodes

Flexible management options – Intuitive web-based GUI, powerful command-line interface (CLI), and RESTful API for seamless control

Advanced resource allocation – Logical resource provisioning, dynamic workload partitioning, and multi-tenancy governance to ensure secure, efficient cluster usage

Scheduler-agnostic integration – Natively supports Slurm, Torque, OpenPBS, Kubernetes, and other workload managers

Containerization for IT efficiency – Enables easy integration with IT management tools and accelerates workload deployment

Seamless CI/CD pipeline integration – Automates image management, software deployment, and hardware driver updates using Git-based continuous integration/continuous delivery (CI/CD)

Stateless compute architecture – Supports diskless booting, ensuring immutable configurations, high reliability, and effortless scalability

Robust Monitoring and Health Management

Continuous monitoring and health management provide administrators with the visibility and status required for ongoing cluster optimization, development, and resilience. Features include:

Real-time performance monitoring – Customizable alerts and notifications via Grafana, email, and webhooks to proactively address issues

Centralized logging and auditing – Comprehensive system logs, authentication tracking, and RBAC activity monitoring for enhanced security and compliance

Comprehensive performance metrics – Monitors compute nodes, networking, and GPU/CPU utilization, providing deep insights into cluster health

Automated health management – Advanced error detection with detailed per-node metrics, enabling proactive issue resolution and self-healing capabilities

Highly Secure Operations

ICE ClusterWare supports strict security protocols and restricted-environment requirements, maintaining high security standards through features such as intranode encryption. ClusterWare security includes support for:

SELinux enforcement – Supports both Targeted and Multi-Level Security (MLS) policies for granular access control

FIPS 140-2 compliance – Ensures cryptographic security in sensitive computing environments

Security Technical Implementation Guides (STIGs) – Adheres to rigorous security configurations for hardened deployments

Air-gapped deployment support – Enables fully isolated, offline environments for maximum security

Trusted Platform Module (TPM) encryption – Enhances disk security with LUKS-based encryption

About Penguin Solutions

The most exciting technological advancements are also the most challenging for companies to adopt. At Penguin Solutions®, we support our customers in achieving their ambitions across our computing, memory, and LED lines of business. With our expert skills, experience, and partnerships, we turn our customers’ most complex challenges into compelling opportunities.

For more information, visit https://www.penguinsolutions.com.

Learn More

Sign Up for a Demo or Evaluation of ClusterWare

Reach out to us at sales@penguincomputing.com

Visit www.penguinsolutions.com to learn more about our software.

© 2025 Penguin Solutions, Inc. All rights reserved. Penguin Computing and Relion are trademarks or registered trademarks of Penguin Solutions. All other product names, trademarks and registered trademarks are the property of their respective owners. All company, product and service names used in this document are for identification purposes only. Use of these names, trademarks and brands does not imply endorsement.
