Eta Compute calls its device “the world’s most energy-efficient edge AI processor” and is targeting it squarely at the AIoT (artificial intelligence in internet of things devices). Typical applications include sensor fusion, sound classification, image classification and person detection performed on the device itself, without sending data to the cloud, to minimize power spent on wireless transmission. But given the limited power budgets of these IoT endpoints, the chip’s power consumption really has to be less than a milliwatt to make sense, Tewksbury said.

“By virtue of the fact that we have a hundred to a thousand times greater energy efficiency [than competitors], we can do a hundred to a thousand times more inferences for a given battery life, or for the same level of functionality we can extend the battery life by that same factor,” Tewksbury said.
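The arithmetic behind that claim is straightforward: for a fixed battery capacity, the number of inferences scales inversely with energy per inference. A minimal sketch, with hypothetical battery and energy figures (the 100x factor is from the quote; the absolute numbers are invented for illustration):

```python
# Illustrative arithmetic only: if a chip uses K times less energy per
# inference, a fixed battery yields K times more inferences, or the same
# workload runs K times longer. Capacity and energy numbers are hypothetical.

def inferences_per_battery(battery_mj: float, energy_per_inference_uj: float) -> float:
    """Inferences obtainable from a battery of the given capacity (millijoules)."""
    return battery_mj * 1000.0 / energy_per_inference_uj

baseline = inferences_per_battery(battery_mj=1000.0, energy_per_inference_uj=100.0)
efficient = inferences_per_battery(battery_mj=1000.0, energy_per_inference_uj=1.0)

print(efficient / baseline)  # 100.0 — 100x less energy, 100x more inferences
```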

Secret Sauce
How does Eta Compute achieve this level of power consumption with off-the-shelf cores? There are three key ingredients to the company’s secret sauce.

The first is a proprietary voltage and frequency scaling technique on which Eta Compute holds seven patents, with eight more pending. Continuous voltage and frequency scaling (CVFS) allows the voltage and clock frequency of both the DSP and the MCU core to be adjusted to match the variable workloads of IoT devices.

“The internal supply voltage [can be adjusted] commensurate with that clock rate. So when the clock rate is low, we can reduce the voltage all the way down to the minimum required to sustain that clock rate, and when frequency goes up, we increase the voltage. Since power goes as voltage squared, we get an enormous reduction in the power consumption,” Tewksbury said.
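The quadratic relationship Tewksbury invokes comes from the classic switching-power estimate P = C·V²·f. A quick worked example, using the 0.54 V to 1.2 V supply range Tewksbury quotes (the effective capacitance and frequency values are placeholders):

```python
# Dynamic CMOS power scales roughly as P = C_eff * V^2 * f. Holding
# capacitance and frequency fixed, dropping the supply from 1.2 V to 0.54 V
# cuts the V^2 term to (0.54/1.2)^2 ~= 0.20 — about a 5x power reduction
# before any frequency savings are even counted.

def dynamic_power(c_eff: float, v: float, f: float) -> float:
    """Classic switching-power estimate: P = C_eff * V^2 * f."""
    return c_eff * v * v * f

p_high = dynamic_power(c_eff=1.0, v=1.2, f=1.0)
p_low = dynamic_power(c_eff=1.0, v=0.54, f=1.0)

print(p_low / p_high)  # 0.2025
```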

Eta Compute block diagram
Eta Compute is using a combination of an Arm Cortex-M3 and an NXP CoolFlux DSP in its ultra-low power AI processor (Image: Eta Compute)

Traditional dynamic voltage and frequency scaling methods work by changing the state of a PLL (phase-locked loop), which takes time. Eta Compute’s CVFS technique needs no PLL, since the clock frequency is determined internally via a self-timed architecture.

“Since we don’t have PLLs… we can do this very quickly and continuously, both in terms of time as well as in voltage. So every single clock cycle we’re monitoring the workload and adjusting that clock in such a way that we minimize the energy per inference,” Tewksbury said. “We’re also continuously changing that voltage, so that it’s not just a discrete number of voltages as some of our competitors have, but it can change anywhere from 0.54V all the way up to 1.2V in a continuous manner.”
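The core idea, selecting the lowest supply voltage that can sustain whatever clock rate the workload demands, can be sketched in a few lines. This is a toy model: the 0.54–1.2 V rails are from the quote above, but the frequency range and the linear voltage-frequency relationship are invented for illustration and are not Eta Compute’s actual control law.

```python
# Toy sketch of the CVFS principle: for each workload-driven clock rate,
# pick the minimum supply voltage assumed sufficient to sustain it.
# Frequency range and the linear V(f) model are hypothetical.

V_MIN, V_MAX = 0.54, 1.2      # volts, per the article
F_MIN, F_MAX = 10e6, 100e6    # hertz, invented for illustration

def voltage_for_frequency(f_hz: float) -> float:
    """Lowest voltage assumed sufficient for f_hz, interpolated linearly."""
    f = min(max(f_hz, F_MIN), F_MAX)          # clamp to the supported range
    frac = (f - F_MIN) / (F_MAX - F_MIN)
    return V_MIN + frac * (V_MAX - V_MIN)

# Light workload -> low clock -> the supply drops toward the 0.54 V floor.
print(round(voltage_for_frequency(10e6), 2))   # 0.54
print(round(voltage_for_frequency(100e6), 2))  # 1.2
```

The point of doing this continuously, rather than stepping between a few discrete operating points, is that the voltage never sits higher than the instantaneous workload requires.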

Another key ingredient is the chip’s hybrid dual-core architecture, which combines an Arm Cortex-M3 MCU core with an NXP CoolFlux DSP core. The CVFS technique is applied to both cores independently: each can run at its own voltage and frequency to minimize the energy used.

Either (or both) cores can be used for the AI/ML workload, said Tewksbury, pointing out that workloads such as signal conditioning and feature extraction are better suited to the DSP. Workloads are allocated between the cores by software.
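Software-directed placement of that kind might look like the following sketch. The stage names and routing table are hypothetical and are not Eta Compute’s actual API; the split simply reflects Tewksbury’s point that signal conditioning and feature extraction suit the DSP.

```python
# Hypothetical sketch of software-directed workload placement between the
# two cores: DSP-friendly stages go to the DSP core, control flow and
# general inference glue to the Cortex-M3. Names are illustrative only.

DSP_STAGES = {"signal_conditioning", "feature_extraction", "fft"}

def assign_core(stage: str) -> str:
    """Route a pipeline stage to the core it is better suited to."""
    return "dsp" if stage in DSP_STAGES else "mcu"

pipeline = ["signal_conditioning", "feature_extraction", "classify", "report"]
print([(stage, assign_core(stage)) for stage in pipeline])
```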

The final ingredient in Eta Compute’s secret sauce is application-specific optimization of neural networks, which the company says can improve power efficiency by an order of magnitude compared with a design produced by the standard TensorFlow framework.
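One common optimization in this family is post-training quantization: shrinking 32-bit float weights to 8-bit integers so memory traffic and multiply-accumulate energy drop sharply. The sketch below shows the generic symmetric int8 scheme; it is not Eta Compute’s proprietary flow, and the weight values are made up.

```python
# Generic symmetric int8 quantization: w_q = round(w / scale), where
# scale = max|w| / 127. Not Eta Compute's flow; purely illustrative.

def quantize_int8(weights):
    """Quantize float weights to int8 codes; returns (codes, scale)."""
    scale = max(abs(w) for w in weights) / 127.0
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]

w = [0.02, -0.5, 0.31, 0.127]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# Reconstruction error stays below one quantization step.
print(max(abs(a - b) for a, b in zip(w, w_hat)))
```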

Product Iteration
Eta Compute, founded in 2015, employs 35 people in the US and India and has raised $19 million in funding to date.

The ECM3532 is Eta Compute’s first production product. Its forerunner, the ECM3531, was available only as engineering samples; it used the same cores, but both the SRAM and the flash have been increased in the new version. The ECM3531 also applied CVFS only to the microcontroller core, whereas in the ECM3532 Eta Compute has extended the technique to both the microcontroller and the DSP.

Samples of the ECM3532 are available now and mass production is expected to start in Q2 2020.