Author: Tronserve admin
Tuesday 27th July 2021 02:40 PM
New Optimization Chip Tackles Machine Learning
Engineers at Georgia Tech say they've come up with a programmable prototype chip that efficiently solves a large class of optimization problems, including those needed for neural network training, 5G network routing, and MRI image reconstruction. The chip’s architecture embodies a specific algorithm that breaks up one huge problem into many small problems, works on the subproblems, and shares the results. It does this over and over until it arrives at the best answer. Compared to a GPU running the algorithm, the prototype chip, called OPTIMO, is 4.77 times as power efficient and 4.18 times as fast.
The training of machine learning systems and a wide variety of other data-intensive work can be cast as a type of mathematical problem called constrained optimization. In it, you are trying to minimize the value of a function under some constraints, explains Georgia Tech professor Arijit Raychowdhury. For example, training a neural net could involve seeking the lowest error rate under the constraint of the size of the neural network.
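As a rough illustration of what "minimizing a function under some constraints" means, a constrained optimization problem is commonly written in the generic textbook form below. The symbols (x for the variables, f for the function being minimized, g_i and h_j for the constraints) are assumptions chosen for illustration and do not come from the Georgia Tech work.

```latex
\begin{aligned}
\min_{x} \quad & f(x) \\
\text{subject to} \quad & g_i(x) \le 0, \quad i = 1, \dots, m, \\
& h_j(x) = 0, \quad j = 1, \dots, p.
\end{aligned}
```

In the neural net example, f would stand for the error rate and one of the g_i would express the limit on the size of the network.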
“If you can accelerate [constrained optimization] using smart architecture and energy-efficient design, you will be able to accelerate a large class of signal processing and machine learning problems,” says Raychowdhury. A 1980s-era algorithm called the alternating direction method of multipliers, or ADMM, turned out to be the solution. The algorithm solves massive optimization problems by breaking them up and then reaching a solution over several iterations.
“If you want to solve a large problem with a lot of data, say one million data points with one million variables, ADMM permits you to break it up into smaller subproblems,” he says. “You can cut it down into 1,000 variables with 1,000 data points.” Each subproblem is solved, and the results are combined in a “consensus” step with the other subproblems to reach an interim solution. With that interim solution now fed back into the subproblems, the process is repeated over and over until the algorithm arrives at the optimal solution.
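The solve, consensus, repeat loop described above can be sketched in a few lines of Python. This is a minimal illustration of the standard consensus form of ADMM on a toy problem (fitting a single slope x to data split across four chunks), not the algorithm as implemented on OPTIMO. The function name `consensus_admm`, the penalty parameter `rho`, the iteration count, and the synthetic data are all assumptions made for the example.

```python
def consensus_admm(chunks, rho=1.0, iterations=100):
    """Estimate the scalar x minimizing 0.5 * sum((a*x - b)**2) over all (a, b).

    `chunks` is a list of lists of (a, b) pairs; each inner list plays the
    role of one subproblem's local data. Hypothetical helper for illustration.
    """
    n = len(chunks)
    # Per-chunk sums that each subproblem needs for its closed-form update.
    sum_aa = [sum(a * a for a, _ in chunk) for chunk in chunks]
    sum_ab = [sum(a * b for a, b in chunk) for chunk in chunks]

    z = 0.0        # shared interim ("consensus") solution
    u = [0.0] * n  # scaled dual variable, one per subproblem

    for _ in range(iterations):
        # Local step: each subproblem minimizes its own squared error plus
        # a penalty (rho / 2) * (x - z + u)**2 that pulls it toward z.
        x = [(sum_ab[i] + rho * (z - u[i])) / (sum_aa[i] + rho) for i in range(n)]
        # Consensus step: gather the local results into a new interim solution.
        z = sum(x[i] + u[i] for i in range(n)) / n
        # Dual step: record how far each subproblem still is from consensus.
        u = [u[i] + x[i] - z for i in range(n)]
    return z


# Synthetic data (an assumption): b is roughly 3 * a with small offsets.
points = []
for k in range(20):
    a = 1.0 + 0.25 * (k % 5)
    b = 3.0 * a + 0.1 * ((k % 3) - 1)
    points.append((a, b))

# Split the 20 points into 4 chunks of 5, one per subproblem.
chunks = [points[i:i + 5] for i in range(0, 20, 5)]

estimate = consensus_admm(chunks)
# Reference answer from solving the whole problem at once.
direct = sum(a * b for a, b in points) / sum(a * a for a, _ in points)
print(estimate, direct)
```

Each subproblem only ever touches its own chunk of data plus the shared value `z`, which is why the method suits hardware where memory sits close to each core. On this toy problem the consensus value should agree with the direct least-squares answer to within floating-point tolerance.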
On a typical CPU or GPU, ADMM is limited because it requires moving a great deal of data. So instead the Georgia Tech group developed a system with a “near-memory” architecture.
“The ADMM framework as a method of solving optimization problems maps easily to a many-core architecture where you have memory and logic in close proximity with some communications channels in between these cores,” says Raychowdhury.
The test chip consisted of a grid of 49 “optimization processing units,” cores designed to perform ADMM and containing their own high-bandwidth memory. The units were connected to each other in a way that speeds ADMM. Portions of the data are distributed to each unit, and the units set about solving their individual subproblems. Their results are then gathered, and the data is adjusted and sent back to the optimization units to perform the next iteration. The network that connects the 49 units is specifically designed to speed this gather and scatter process.
The Georgia Tech team, which included graduate student Muya Chang and professor Justin Romberg, showcased OPTIMO at the IEEE Custom Integrated Circuits Conference last month in Austin, Tex.
The chip could be scaled up by adding more cores to do its work in the cloud, or shrunk down to solve problems closer to the edge of the Internet, Raychowdhury says. The primary constraint in optimizing the number of cores in the prototype, he jokes, was his graduate students’ time.
Source: IEEE Spectrum