GPUs became the hardware of choice for deep learning largely by coincidence. The chips were originally designed to render graphics quickly in applications such as video games. Unlike CPUs, which have four to eight complex cores for handling a wide variety of computations, GPUs have hundreds of simple cores that can perform only specific operations. But those cores can tackle their operations simultaneously rather than one after another, shrinking the time it takes to complete an intensive computation.
It didn’t take long for the AI research community to realize that this massive parallelization also makes GPUs great for deep learning. Like graphics rendering, deep learning involves simple mathematical calculations performed hundreds of thousands of times. In 2011, in a collaboration with chipmaker Nvidia, Google found that a computer vision model it had trained on 2,000 CPUs to distinguish cats from people could achieve the same performance when trained on only 12 GPUs. GPUs became the de facto chip for model training and for inferencing, the computational process that happens when a trained model is used for the tasks it was trained for.
But GPUs aren’t perfect for deep learning either. For one thing, they cannot function as a standalone chip. Because they are limited in the types of operations they can perform, they must be attached to CPUs to handle everything else. GPUs also have a limited amount of cache memory, the data storage area nearest a chip’s processors. This means the bulk of the data is stored off-chip and must be retrieved when it is time for processing. That back-and-forth data flow becomes a bottleneck, capping the speed at which GPUs can run deep-learning algorithms.
In recent years, dozens of companies have cropped up to design AI chips that circumvent these problems. The trouble is, the more specialized the hardware, the more expensive it becomes.
Neural Magic intends to buck this trend. Instead of tinkering with the hardware, the company modified the software. It redesigned deep-learning algorithms to run more efficiently on a CPU by making use of the chip’s large available memory and complex cores. While the approach gives up the speed a GPU gains through parallelization, it reportedly wins back about the same amount of time by eliminating the need to ferry data on and off the chip. The algorithms can run on CPUs “at GPU speeds,” the company says, but at a fraction of the cost. “It sounds like what they have done is figured out a way to take advantage of the memory of the CPU in a way that people haven’t before,” Thompson says.
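For readers curious what “taking advantage of the memory of the CPU” can look like in code, here is a minimal sketch of a generic, textbook technique called loop tiling, or cache blocking. It is not Neural Magic’s method, which the company has not detailed here; the function names and the default tile size of 32 are illustrative assumptions. The idea is to work on small blocks of a matrix multiplication at a time, so the same numbers are reused many times while they would still sit in fast on-chip memory instead of being fetched repeatedly from slower memory.

```python
def matmul_naive(a, b):
    """Plain triple-loop matrix multiply: C = A x B (lists of lists)."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c


def matmul_tiled(a, b, tile=32):
    """Same product, computed block by block (loop tiling / cache blocking).

    The tile size is an illustrative assumption; in compiled code it would be
    chosen so that one block each of A, B and C fits in the CPU's cache.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # Everything below touches only one tile of A, B and C, so
                # those values are reused many times before moving on.
                for i in range(i0, min(i0 + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip, row_b = row_a[p], b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    print(matmul_tiled(a, b))  # [[19.0, 22.0], [43.0, 50.0]]
```

Both functions compute the same result; only the order in which data is visited changes. In pure Python the cache benefit will not be visible, because interpreter overhead dominates. The payoff appears in compiled code, where optimized linear-algebra libraries apply this blocking idea.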
Neural Magic believes there may be a few reasons why no one took this approach before. First, it’s counterintuitive. The idea that deep learning needs specialized hardware is so entrenched that other approaches are easily overlooked. Second, applying AI in industry is still relatively new, and companies are only beginning to look for easier ways to deploy deep-learning algorithms. But whether demand is deep enough for Neural Magic to take off is still unclear. The firm has been beta-testing its product with around 10 companies, only a sliver of the broader AI industry.
Neural Magic currently offers its technique for inferencing tasks in computer vision. Clients must still train their models on specialized hardware but can then use Neural Magic’s software to convert the trained model into a CPU-compatible format. One client, a large manufacturer of microscopy equipment, is now trialing this approach to add on-device AI capabilities to its microscopes, says Shavit. Because the microscopes already come with a CPU, they won’t need any additional hardware. By contrast, using a GPU-based deep-learning model would require the equipment to be bulkier and more power-hungry.
Another client wants to use Neural Magic to process security camera footage. That would enable it to monitor the traffic in and out of a building using computers already available on site; otherwise it might have to send the footage to the cloud, which could introduce privacy issues, or acquire special hardware for every building it monitors.
Shavit says inferencing is only the beginning. Neural Magic plans to expand its offerings to help companies train their AI models on CPUs as well. “We believe 10 to 20 years from now, CPUs will be the actual fabric for running machine-learning algorithms,” he says.
Thompson isn’t so sure. “The economics have really changed around chip production, and that is going to lead to a lot more specialization,” he says. And while Neural Magic’s technique gets more performance out of existing hardware, he argues, fundamental hardware advances will still be the only way to keep driving computing forward. “This sounds like a really good way to improve performance in neural networks,” he says. “But we want to improve not just neural networks but also computing overall.”