Imagination Technologies has launched a scalable neural network accelerator IP core optimised for automotive and autonomous systems but also aimed at industrial designs.
The Series4 Neural Network Accelerator (NNA) core has been optimised for the YOLOv3 neural network and for processing large, rectangular images, rather than designed as a general-purpose execution unit.
It is aimed at developers of system-on-chip devices for sensor fusion in high-performance autonomous vehicles such as robotaxis, last-mile delivery vehicles and automated street sweepers.
The NNA core achieves 12.5TOPS of performance through 4096 multiply-accumulate (MAC) units in 1mm² on a 5nm process technology, all connected by a 256 network on chip (NOC). That is over 20x faster than an embedded GPU and 1000x faster than an embedded CPU for AI inference, says the company.
Up to 8 cores can be combined in a low-latency cluster with 100TOPS, while multiple clusters can be placed on chip for even higher performance for Level 3 and Level 4 autonomous operation. It has been designed as part of an ISO 26262 automotive safety process.
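The headline figures can be cross-checked with simple arithmetic. The sketch below assumes the common convention that one MAC counts as two operations (a multiply and an add). The company does not state that convention or a clock speed, so the implied clock is an inference, not a published specification.

```python
# Figures quoted in the article.
MACS_PER_CORE = 4096
TOPS_PER_CORE = 12.5
CORES_PER_CLUSTER = 8

# Assumption (not stated by the company): one MAC = 2 operations.
OPS_PER_MAC = 2

# Clock rate implied by 12.5 TOPS from 4096 MACs under that assumption.
implied_clock_hz = TOPS_PER_CORE * 1e12 / (MACS_PER_CORE * OPS_PER_MAC)

# Peak throughput of a full 8-core cluster.
cluster_tops = TOPS_PER_CORE * CORES_PER_CLUSTER

print(f"Implied clock: {implied_clock_hz / 1e9:.2f} GHz")  # about 1.53 GHz
print(f"Cluster peak:  {cluster_tops:.0f} TOPS")           # 100 TOPS
```

The cluster figure matches the 100TOPS quoted for eight cores, which suggests it is a straightforward sum of per-core peak throughput.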
“We have already licensed one of these cores into a system on chip design,” said Andrew Grant, Senior Director for Artificial Intelligence at Imagination Technologies.
The core also uses a technique called Tensor Tiling that reduces bandwidth by up to 90 percent by splitting input data tensors into multiple tiles for efficient data processing. This exploits local data dependencies to keep intermediate data in on-chip memory.
“It’s a tiling algorithm that allows you to group the network layers, looking at the workloads and using the on-chip SRAM tightly coupled to segment the workloads and adjust for the maximum workload,” said Grant.
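The general idea behind tiling across grouped layers can be illustrated with a toy example. The sketch below is not Imagination's algorithm: it is a minimal one-dimensional illustration with hypothetical function names, in which two convolution layers are run tile by tile so that the intermediate result for each tile is small enough to stay in a local buffer instead of being written out in full.

```python
def conv1d_valid(x, k):
    """1-D 'valid' convolution (no padding): output length is len(x) - len(k) + 1."""
    n = len(k)
    return [sum(x[i + j] * k[j] for j in range(n)) for i in range(len(x) - n + 1)]

def two_layers_full(x, k1, k2):
    """Layer by layer: the whole intermediate tensor exists at once."""
    return conv1d_valid(conv1d_valid(x, k1), k2)

def two_layers_tiled(x, k1, k2, tile):
    """Both layers run per tile; only a tile-sized intermediate is ever held."""
    # Extra input samples each tile needs because of the two kernels' reach.
    halo = (len(k1) - 1) + (len(k2) - 1)
    out_len = len(x) - halo
    out = []
    for start in range(0, out_len, tile):
        stop = min(start + tile, out_len)
        x_tile = x[start:stop + halo]      # input slice for outputs start..stop-1
        mid = conv1d_valid(x_tile, k1)     # small intermediate, stands in for on-chip SRAM
        out.extend(conv1d_valid(mid, k2))  # final outputs for this tile
    return out

x = list(range(20))
k1, k2 = [1, 2, 1], [1, 0, -1]
assert two_layers_tiled(x, k1, k2, tile=4) == two_layers_full(x, k1, k2)
```

Because each output depends only on a small neighbourhood of the input, the tiled version produces the same result as the layer-by-layer version while the intermediate data never exceeds roughly one tile in size. The cost is that neighbouring tiles re-read a few overlapping input samples.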
For higher performance than 100TOPS a chip can use multiple clusters linked via the AXI bus. “You need to minimise the traffic between clusters so it’s more of a system design. When you go to 600TOPS you have to work with the customer to coordinate all