Fastest AI inference chip for edge systems announced

October 20, 2020 // By Rich Pell
Embedded FPGA IP and software company Flex Logix has announced working silicon of what it claims is the fastest and most efficient AI edge inference chip.

The InferX X1 inference chip accelerates neural network models such as object detection and recognition for robotics, industrial automation, medical imaging, gene sequencing, bank security, retail analytics, autonomous vehicles, aerospace and more. It runs YOLOv3 object detection and recognition 30% faster than Nvidia's Jetson Xavier, and runs other real-world customer models up to ten times faster, according to the company.

Many customers plan to use YOLOv3 in their robotics, bank security, and retail analytics products because it is the highest-accuracy object detection and recognition algorithm, says the company. Other customers have developed custom models for a range of applications where they need more throughput at lower cost. The company says it has benchmarked models for these applications and demonstrated to these customers that InferX X1 provides the required throughput at a lower cost.

"Customers with existing edge inference systems are asking for more inference performance at better prices so they can implement neural networks in higher volume applications," says Geoff Tate, CEO and co-founder of Flex Logix. "InferX X1 meets their needs with both higher performance and lower prices. InferX X1 delivers a 10-to-100 times improvement in inference price/performance versus the current industry leader."

The InferX X1 silicon area is 54 mm² - one fifth the size of a penny. Its high-volume price, says the company, is as much as 10 times lower than Nvidia's Xavier NX, enabling high-quality, high-performance AI inference to be implemented for the first time in mass-market products selling in the millions of units.

Technology details and specifications include the following:

  • High MAC utilization (up to 70% for large models/images) translates into less silicon area and cost
  • 1-Dimensional Tensor Processors (1D TPUs) are a 1D systolic array
    • 64 byte input tensor
    • 64 INT8 MACs
    • 32 BF16 MACs
    • 64 byte x 256 byte weight matrix
    • One-dimensional systolic array produces an output tensor every 64 cycles using 4096 MACs
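
The arithmetic behind that last figure can be sketched in software. The following is a minimal, hypothetical model (not Flex Logix's implementation) of a 1D systolic-style MAC array: it assumes 64 INT8 lanes that each perform one multiply-accumulate per cycle, so streaming a 64-byte input tensor through for 64 cycles performs 64 × 64 = 4096 MAC operations per output tensor. The 64×64 weight slice used here is an illustrative simplification of the 64-byte × 256-byte weight matrix named above.

```python
LANES = 64   # assumed: INT8 MAC units in one 1D TPU
CYCLES = 64  # assumed: cycles to produce one output tensor

def tpu_1d(inputs, weights):
    """Model one pass of a 1D systolic MAC array.

    inputs:  64 INT8 values, streamed one per cycle
    weights: 64 (cycles) x 64 (lanes) INT8 matrix
    Returns (output tensor of 64 accumulators, MAC-operation count).
    """
    assert len(inputs) == CYCLES
    acc = [0] * LANES
    macs = 0
    for cycle in range(CYCLES):      # one input byte enters per cycle
        x = inputs[cycle]
        for lane in range(LANES):    # all lanes fire in parallel in hardware
            acc[lane] += x * weights[cycle][lane]
            macs += 1
    return acc, macs

inputs = [1] * CYCLES
weights = [[2] * LANES for _ in range(CYCLES)]
out, macs = tpu_1d(inputs, weights)
# macs == 4096: one output tensor per 64 cycles at 64 MACs/cycle
```

In hardware the inner loop is spatial rather than sequential - all 64 lanes compute simultaneously each cycle - which is why high MAC utilization translates directly into throughput per unit of silicon.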
