Public and private sectors join forces to solve HPC software problem

Software deployment in high performance computing is becoming more fragmented as organizations choose tools within their own walled-garden environments.

But a new organization formed under the Linux Foundation could bring some order to the chaos.

The nonprofit Linux Foundation announced at Supercomputing 2023 its intention to create the High Performance Software Foundation (HPSF), which will encourage the development and sharing of software tools for massive computing resources.

Public-private participation should also promote software innovation through collaboration.

U.S. national laboratories that are part of the Department of Energy’s Exascale Computing Project are joining the project and will make contributions, said Lori Diachin, project director at the DOE.

Private sector members include Intel, Kitware and Nvidia, all major players in the HPC market.

HPC began in the 1940s and has fragmented over time because security concerns limited access to computing resources, researchers from Lawrence Livermore National Laboratory and the National Center for Supercomputing Applications said in a research paper published this year.

Government organizations working on applications related to national security developed their systems to restrict access and protect sensitive information, the researchers said.

But things started to change with developer social sites like GitHub and GitLab as coders started sharing resources.

HPC is also shifting toward accelerated computing, which adds more software layers to the development process. Developers download code from archives but rely on firewalls to ensure the programs are safe to use.

The national labs use a continuous integration model in which code proposals are tested before being added to the development cycle. Labs also typically own proprietary applications that they don’t want to share.

Labs already provide many open source tools for HPC. However, development environments are becoming more complicated as various accelerators are added to HPC systems.

For example, the upcoming Jupiter exascale supercomputer in Europe may include quantum systems along with Nvidia GPU accelerators. There are tools to seamlessly decompose the code for execution across the processors, but they add new layers of separate libraries and compilers.

Nvidia’s GPUs require the company’s proprietary but free CUDA tools and compilers to create binaries that take advantage of their full computing power.

The typical HPC software development cycle starts with the application, which is then moved to libraries and decomposed to the infrastructure layer (such as Docker). It then goes to the compiler/toolchain (LLVM, GCC or oneAPI) and OS (Linux) and finally reaches the hardware systems and accelerators, which could include GPUs or FPGAs.

The ARES multi-physics codebase at Lawrence Livermore National Lab (LLNL) has 31 internal proprietary packages, plus 13 open source packages developed at LLNL. These rely on 72 external open source software packages.

The added layers of hardware, compilers, and other tools create a matrix of complicated software dependencies that is difficult to audit. These dependencies can introduce many vulnerabilities that national labs want to shut out, primarily to maintain code integrity and protect system access from malicious code.
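
To give a rough sense of how quickly such a matrix grows, here is a minimal Python sketch. It is not drawn from the researchers' paper or from any lab's tooling: the package names, the dependency graph, and the lists of compilers and accelerator targets are all hypothetical, chosen only to illustrate the idea that every package pulled in by an application may need to be checked against every toolchain and hardware combination.

```python
from itertools import product

# Hypothetical dependency graph: package -> direct dependencies.
# These names are illustrative only; they are not real ARES packages.
DEPENDENCIES = {
    "app": ["mesh-lib", "solver-lib"],
    "mesh-lib": ["io-lib", "mpi"],
    "solver-lib": ["blas", "mpi"],
    "io-lib": ["compression"],
    "mpi": [],
    "blas": [],
    "compression": [],
}

# Hypothetical toolchains and hardware targets (assumed for illustration).
COMPILERS = ["gcc", "llvm", "oneapi"]
ACCELERATORS = ["none", "gpu", "fpga"]


def transitive_dependencies(package, graph):
    """Return every package reachable from `package` through its dependencies."""
    seen = set()
    stack = list(graph.get(package, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


def build_matrix(package, graph, compilers, accelerators):
    """List every (package, compiler, accelerator) combination to audit."""
    packages = sorted(transitive_dependencies(package, graph) | {package})
    return list(product(packages, compilers, accelerators))


if __name__ == "__main__":
    deps = transitive_dependencies("app", DEPENDENCIES)
    matrix = build_matrix("app", DEPENDENCIES, COMPILERS, ACCELERATORS)
    print(f"{len(deps)} transitive dependencies")
    print(f"{len(matrix)} package/compiler/accelerator combinations")
```

Even in this toy case, an application with two direct dependencies ends up with six packages beneath it, and three compilers and three hardware targets turn seven packages into 63 combinations. A real codebase with more than 100 packages would face a far larger matrix.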

“Technical, security, and policy issues make it extremely difficult to integrate externally developed open source software with internal applications and machines. Although many HPC software projects are developed in the open, they must run on closed HPC resources, and this is increasingly difficult to ensure that the vast majority of modern open source applications will run reliably on HPC systems,” the researchers from LLNL and NCSA said.

HPSF seeks to solve this problem by creating a common, stable open source software environment that can be used reliably across HPC systems.

HPSF will officially form in May 2024. The open software packages that will be part of the project include the popular package manager Spack, Kokkos, AMReX, WarpX, Trilinos, Apptainer, VTK-m, HPCToolkit, and E4S, the Extreme-scale Scientific Software Stack.

HPSF aims to standardize the open software stack, providing an easier path to deploying software packages.