As organizations that can’t employ public clouds to drive artificial intelligence (AI) applications start looking toward supercomputers based on graphics processing units (GPUs), many of them are encountering I/O issues that their legacy storage systems are unable to address. With that issue in mind, NVIDIA and NetApp have partnered to create the NetApp ONTAP AI architecture, based on NVIDIA DGX supercomputers and NetApp AFF A800 all-flash storage.
Octavian Tanase, senior vice president for ONTAP at NetApp, says the NetApp approach makes it possible for all-flash storage systems based on NVMe interfaces and connected to NVIDIA supercomputers to pull data from the cloud when needed, allowing an AI model to run four times faster than on competing storage systems. NVIDIA has established a technology lead over rivals such as Intel by focusing on deep learning algorithms, which are based on neural networks and are employed to run more complex AI applications. Intel, in contrast, dominates when it comes to processing machine learning algorithms.
Tanase says NetApp views AI as a major opportunity because AI applications must be trained on massive amounts of data. That requirement creates significant I/O challenges that are best addressed by network-attached storage systems capable of handling terabytes of data at high speeds. In addition, Tanase notes that NetApp has embraced NVMe over Fabrics to drive storage performance across multiple systems connected to the same network.
That approach means data scientists don’t have to spend nearly as much time configuring IT infrastructure to run advanced AI models, adds Tanase.
“NetApp ONTAP AI is a lot more than just a reference architecture,” says Tanase.
For example, Tanase says NetApp and NVIDIA are also working on developing blueprints that organizations can follow to set up the data pipelines needed to train and drive AI applications.
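The article doesn’t detail what those blueprints contain, but a data pipeline of the kind described typically streams training data from a network-attached volume to the GPUs fast enough that compute never sits idle waiting on I/O. The following is a minimal, hypothetical sketch in PyTorch; the mount point /mnt/ontap/train, the file layout, and the placeholder model are illustrative assumptions, not part of the NetApp/NVIDIA blueprints.

```python
# Hypothetical sketch: feeding a GPU training loop from an NFS-mounted
# all-flash volume. Paths, file layout, and model are assumptions for
# illustration only.
import os
import torch
from torch.utils.data import Dataset, DataLoader

class FlatTensorDataset(Dataset):
    """Loads pre-serialized (input, label) tensor pairs from a mounted volume."""
    def __init__(self, root):
        self.paths = sorted(
            os.path.join(root, f) for f in os.listdir(root) if f.endswith(".pt")
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Each .pt file is assumed to hold a dict with "x" and "y" tensors.
        sample = torch.load(self.paths[idx])
        return sample["x"], sample["y"]

def main():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dataset = FlatTensorDataset("/mnt/ontap/train")  # assumed NFS mount point
    # Multiple worker processes keep fast storage busy so the GPU is never
    # starved for data -- the I/O bottleneck the article describes.
    loader = DataLoader(dataset, batch_size=64, num_workers=8, pin_memory=True)

    model = torch.nn.Linear(1024, 10).to(device)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    for x, y in loader:
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    main()
```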
As organizations continue to experiment with AI, it’s clear that the data scientists driving these projects are looking to internal IT departments to take on more of the heavy lifting when it comes to managing the IT infrastructure behind these applications. In many cases, that means managing external cloud services, but over time AI models will be built and deployed in on-premises environments as well. It should already be obvious to all involved, however, that legacy IT infrastructure is in no way up to the AI task at hand.