In 2012 I transitioned from storage to high-performance computing (HPC). Today I work with many clients who need to move from file servers and Network Attached Storage (NAS) to HPC to store and analyze acquired data effectively. In this HPC meets AI series I will introduce basic HPC concepts and show how HPC can enable and improve data-intensive science.
As of 2019, the largest supercomputers in the world comprise thousands of compute nodes and hundreds of petabytes of file storage. They are architected, operated and used by teams with long and deep expertise in HPC.
Nowadays data-intensive science requires HPC resources to be integrated into scientific experiments and engineering processes. Cameras, genome sequencers, super microscopes and other sensors generate huge amounts of data that need to be stored and analyzed to distill insight from the acquired data. While the largest supercomputers in the world are transforming into superfacilities, many organizations need to deploy their first HPC system to enable cutting-edge science or engineering. Hence the advent of data-intensive science forces IT architects, system administrators and scientists who have no experience with HPC to architect, operate and use HPC systems.
For many use cases a small or medium-size HPC system is enough. Small HPC systems comprise up to a few tens of compute nodes and up to a few petabytes of file storage. Medium-size HPC systems comprise up to a few hundred compute nodes and up to a few tens of petabytes of file storage. Small and medium-size HPC systems are not rocket science, but they can become problematic if your business depends on them while you have no experience with HPC.
When I started with HPC I was not aware that there are elementary HPC concepts, techniques and best practices that I needed to know. I never looked for educational material on HPC basics, because I simply was not aware that HPC is a specialization in computer science that needs to be learned like any other specialization such as programming, databases or large-scale web applications.
In other words: my approach to HPC was naïve, because I was not aware that I needed to learn HPC. Fortunately, I work with many experts who have been in HPC for decades and who took, and still take, the time to point out my misconceptions and to answer all my questions.
In the meantime I have become aware of the HPC for Dummies guide, which is available for free download. The guide is a little dated but still a good introduction to HPC. Please leave a comment if you know of any other good introduction to HPC.
Having become familiar with HPC, I now advise customers who are transitioning to HPC and struggling because, like me, they were not aware that they need to learn HPC basics and follow HPC best practices. Therefore, I decided to start this HPC meets AI series to share some of my learnings and help IT professionals transition into HPC. I intend to look at HPC from the perspective of a seasoned IT architect or system administrator who needs to transition to HPC to enable data-intensive science.
Let’s get started. In the next post I will set the context by explaining data-intensive science in more detail.