Datasheet
ICE ClusterWare™
AI and HPC Cluster Management Software

Overview
Penguin Solutions ICE ClusterWare™ software is a hardware-agnostic, intelligent cluster management platform that transforms bare-metal hardware, networking, and software resources into reliable high-performance infrastructure. It simplifies deployment and management, delivers real-time health monitoring and advanced performance optimization, and enables administrators to allocate and provision secure computing resources for multiple user communities within the cluster.
With these capabilities, customers can seamlessly scale operations, automate best practices, and optimize AI and HPC cluster performance with confidence.

Key Features
Rapid Deployment
• High-speed image-based provisioning for fast cluster boot times
• Heterogeneous OS and hardware support
• Ansible-based orchestration and automation for DevOps and infrastructure-as-code
• High availability configuration enabling any head node to serve any compute node
Health Management and Performance Assurance
• Comprehensive performance, status, and health monitoring of cluster components
• Advanced performance optimization with anomaly detection and automated remediation
Secure Resource Sharing
• Network-isolated multi-tenancy for robust workload and data security
Security Standards and Governance
• Support for the strictest security protocols required in secure, controlled environments

The ICE ClusterWare user interface (UI) includes a node display to provide administrators with a clear, at-a-glance view of system status. Admins can also drill down into nodes to see the associated attributes and other details.

ICE ClusterWare comes with Grafana to provide powerful, customizable dashboards for cluster-wide health monitoring and alerting.
Achieve operational excellence at every stage of your AI journey
As organizations scale AI deployments from pilots to enterprise-wide production, infrastructure requirements evolve. Production environments demand peak cluster performance and secure, multi-tenant resource sharing across multiple user groups or customers. Whether you’re starting an AI initiative or scaling across the enterprise, ICE ClusterWare simplifies deployment and management, streamlines administration, and optimizes compute resources for peak efficiency.
Built on operational intelligence from billions of GPU runtime hours, ICE ClusterWare enables teams to optimize AI infrastructure and sustain peak cluster performance—elevating operations from infrastructure management to operational excellence.
Accelerate cluster deployment
Penguin Solutions ICE ClusterWare simplifies cluster deployment, management, and performance optimization for administrators through the following key features:
Rapid, image-based provisioning
Minimizes steps, reduces errors, and dramatically cuts new node provisioning time
Hardware-agnostic
Heterogeneous OS and hardware support with no vendor lock-in or open-source complexity
Flexible management options
Intuitive web-based GUI, powerful command-line interface (CLI), and RESTful API for seamless control
Highly secure resource sharing
Network-isolated multi-tenancy for robust workload and data security, regulatory governance and compliance, and advanced resource allocation
Scheduler-agnostic integration
Supports Slurm, OpenPBS, Kubernetes, and other workload managers
Containerization for IT efficiency
Enables easy integration with IT management tools and accelerates workload deployment
Seamless CI/CD pipeline integration
Automates image management, software deployment, and hardware driver updates using Git-based continuous integration/continuous delivery (CI/CD)
Stateless compute architecture
Supports diskless booting, ensuring immutable configurations, high reliability, and effortless scalability
Optimize cluster performance
Continuous monitoring and health management, combined with advanced performance optimization, provide administrators with the tools required for ongoing cluster optimization, development, and resilience. Key features include:
Real-time performance monitoring
Customizable alerts and notifications via Grafana, email, and webhooks to proactively address issues
Comprehensive performance metrics
Monitors compute nodes, networking, and GPU/CPU utilization, providing deep insights into cluster health
Centralized logging and auditing
Comprehensive system logs, authentication tracking, and RBAC activity monitoring for enhanced security and compliance
Advanced performance optimization
Intelligent automation that proactively identifies and resolves hidden issues that degrade cluster performance
ICE ClusterWare supports strict security protocols and restricted environment requirements to maintain high security standards. Its security features include support for:
SELinux enforcement
Supports both Targeted and Multi-Level Security (MLS) policies for granular access control
Security Technical Implementation Guides (STIGs)
Adheres to rigorous security configurations for hardened deployments
FIPS 140-2 compliance
Ensures cryptographic security in sensitive computing environments
Trusted Platform Module (TPM) encryption
Enhances disk security with LUKS-based encryption
Air-gapped deployment support
Enables fully isolated, offline environments for maximum security

Contact Us
For sales queries, please contact sales@penguinsolutions.com
To learn more about ICE ClusterWare™ and other Penguin Solutions products, please visit: www.penguinsolutions.com