Nextflow on OCI: Hybrid x86, Arm & GPU Pipeline for Cost-Performance

by Rachel Kim – Technology Editor

Oracle Cloud Infrastructure (OCI) now supports a single Nextflow pipeline capable of utilizing x86, Arm, and GPU processors concurrently, a configuration designed to optimize cost and performance for complex computational tasks. The capability, detailed in a recent Oracle blog post, leverages label-based routing to direct each stage of a workflow to the most appropriate OCI compute shape.
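Label-based routing in Nextflow typically works by tagging processes with labels and mapping those labels to compute targets in the configuration. The following is a minimal sketch, not Oracle's published configuration: it assumes an IaC-provisioned cluster with one scheduler partition per shape family, and the shape names in the comments are illustrative examples of OCI offerings.

```groovy
// nextflow.config — hypothetical label-to-shape mapping; partition and
// shape names are assumptions, not taken from the Oracle blog post
process {
    executor = 'slurm'   // assumes a cluster scheduler fronting the OCI shapes

    withLabel: 'x86' {
        queue = 'x86-partition'   // e.g. nodes on an x86 Flex shape
        cpus  = 4
    }
    withLabel: 'arm' {
        queue = 'arm-partition'   // e.g. Ampere Arm-based Flex nodes
        cpus  = 8
    }
    withLabel: 'gpu' {
        queue = 'gpu-partition'   // e.g. GPU-equipped shapes
        clusterOptions = '--gres=gpu:1'
    }
}
```

A pipeline process then opts into a target simply by declaring `label 'arm'` (or `'x86'`, `'gpu'`) in its definition, which is what lets one workflow span all three processor types.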

The system dynamically provisions resources using Infrastructure as Code (IaC), streamlining deployment and scaling. Artifacts generated during pipeline execution are stored in OCI Object Storage. A demonstration pipeline detailed by Oracle involves initial quality control steps performed on x86 processors, followed by a fanout stage utilizing Arm-based compute, and concluding with GPU-accelerated consolidation.
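One common way to land pipeline artifacts in OCI Object Storage is through its S3-compatible endpoint, which Nextflow can address with its built-in S3 support. The sketch below is an assumption about how such a setup might look, not the configuration from Oracle's demonstration; the bucket name and credential placeholders are hypothetical.

```groovy
// nextflow.config — hypothetical: route the Nextflow work directory to
// OCI Object Storage via its S3 compatibility layer
workDir = 's3://my-nextflow-bucket/work'   // placeholder bucket name

aws {
    accessKey = '<customer-secret-key-id>'   // OCI "Customer Secret Key" credentials
    secretKey = '<customer-secret-key>'
    client {
        // OCI's S3-compatible endpoint pattern; fill in your tenancy
        // namespace and region
        endpoint = 'https://<namespace>.compat.objectstorage.<region>.oraclecloud.com'
        s3PathStyleAccess = true
    }
}
```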

This heterogeneous computing approach addresses a challenge highlighted by Nextflow users, particularly those employing services like AWS Batch. A recent issue raised on the Nextflow GitHub repository (#5570) detailed limitations in splitting workloads across multiple GPUs within a single instance. Users reported that submitting multiple GPU-enabled tasks to Batch could result in all tasks attempting to utilize all GPUs simultaneously, leading to collisions and memory issues. Current workarounds include using specific machine sizes limited to a single GPU, employing environment variables to assign tasks to individual GPUs, or restricting the number of concurrent tasks to one.
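The environment-variable workaround usually means pinning each task to one GPU via `CUDA_VISIBLE_DEVICES`. A rough sketch of that pattern, assuming four GPUs per instance and a placeholder `run_inference` command, might look like this (it is only reliable if concurrent tasks actually share the instance, which is part of what the issue reports is hard to guarantee on Batch):

```groovy
// Hypothetical workaround sketch — not an official fix for issue #5570
process gpuStep {
    label 'gpu'
    maxForks 4   // never run more concurrent tasks than GPUs per instance

    input:
    path chunk

    script:
    """
    # Derive a GPU index from the 1-based task index (assumes 4 GPUs)
    export CUDA_VISIBLE_DEVICES=\$(( (${task.index} - 1) % 4 ))
    run_inference --input ${chunk}   # placeholder for the real GPU command
    """
}
```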

The Nextflow issue suggests a need for a mechanism to dedicate specific GPU ranges to tasks, allowing for more granular control over resource allocation. While the discussion explored potential implementations, such as process arrays or executor-specific solutions, the core issue remains the lack of a native Nextflow variable to facilitate this assignment.

Implementing GPU acceleration within Nextflow requires specific configuration parameters depending on the container engine used. Docker requires the `--gpus all` flag, while Singularity uses the `--nv` flag to access the host's GPU drivers, according to a recent article on Medium. Nextflow itself allows the container engine to be specified at runtime.
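In Nextflow these flags are typically passed through the `runOptions` setting for each container engine, and profiles let the engine be selected at launch time. A minimal sketch:

```groovy
// nextflow.config — expose GPUs to containers; select the engine at
// runtime with: nextflow run main.nf -profile docker   (or -profile singularity)
profiles {
    docker {
        docker.enabled    = true
        docker.runOptions = '--gpus all'   // pass all host GPUs into the container
    }
    singularity {
        singularity.enabled    = true
        singularity.runOptions = '--nv'    // bind the host's NVIDIA driver stack
    }
}
```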

OCI’s support for this mixed-processor approach is bolstered by bare metal compute instances, low-latency RDMA cluster networking, and GPU acceleration, creating an environment that aims to replicate the performance characteristics of on-premises infrastructure, as demonstrated in a recent YouTube presentation.
