Light-based machine-learning system could yield more powerful, efficient large language models | MIT News

ChatGPT has made headlines around the world for its ability to write papers, emails, and computer code based on a few prompts from users. Now, an MIT-led team reports a system that could lead to machine learning programs orders of magnitude more powerful than the one behind ChatGPT. The system they developed could also use orders of magnitude less energy than the most advanced supercomputers behind today’s machine learning models.

In the July 17 issue of Nature Photonics, the researchers report the first experimental demonstration of a new system that uses hundreds of micron-scale lasers to perform calculations based on the movement of light rather than the movement of electrons. The team reports that the new system is more than 100 times more energy efficient and more than 25 times more computationally dense (a measure of a system’s capabilities) than state-of-the-art digital computers for machine learning.

Toward the future

In the paper, the team also notes that the approach leaves room for several more orders of magnitude of future improvement. As a result, the authors continue, the technology “opens a pathway for large-scale optoelectronic processors to accelerate machine learning tasks from data centers to distributed edge devices.” In other words, phones and other small devices may become able to run programs that can currently only be computed in large data centers.

Furthermore, because the system’s components can be created using manufacturing processes already in use today, “we expect it could be scaled up for commercial use within a few years. For example, the laser arrays involved are widely used in mobile phone facial recognition and data communications,” said first author Zaijun Chen, now an assistant professor at the University of Southern California.

“The scale of ChatGPT is limited by the capabilities of today’s supercomputers,” said Dirk Englund, an associate professor in MIT’s Department of Electrical Engineering and Computer Science and leader of the work. “It is not economically feasible to train much larger models. Our new technology could make it possible to reach machine learning models that would otherwise be unattainable in the near future.”

He continued, “We don’t know what the next generation of ChatGPT would be capable of if it were 100 times more powerful, but that is the regime of discovery this technology could allow.” Englund is also head of MIT’s Quantum Photonics Laboratory and is affiliated with the RLE and the Materials Research Laboratory.

A drumbeat of progress

The current work is the latest in a drumbeat of progress by Englund and many of the same colleagues over the past few years. For example, in 2019 the Englund team reported the theoretical work that led to the current demonstration. The first author of that paper, Ryan Hamerly, now of RLE and NTT Research Inc., is also a co-author of the current paper.

Additional co-authors of the current Nature Photonics paper include Alexander Sludds, Ronald Davis, Ian Christen, Liane Bernstein, and Lamia Ateshian, all of RLE; and Tobias Heuser, Niels Heermeier, James A. Lott, and Stephan Reitzenstein of TU Berlin.

The deep neural network (DNN) behind ChatGPT is based on a massive machine learning model that simulates how the brain processes information. However, even as the field of machine learning continues to advance, the digital technology behind today’s DNNs is reaching its limits. These networks also require large amounts of energy and are largely confined to large data centers, which is driving the development of new computing paradigms.

Using light instead of electrons to run DNN calculations has the potential to break through current bottlenecks. For example, calculations using optics may consume less energy than calculations based on electrons. Additionally, Chen said, with optics, “you can have much greater bandwidth,” or computational density. Light can convey more information in a smaller area.

But current optical neural networks (ONNs) face huge challenges. For example, they use a lot of energy because they are inefficient at converting incoming data based on electrical energy into light. Furthermore, the components involved are bulky and take up a lot of space. While ONNs are very good at linear computations (such as addition), they are not good at nonlinear computations (such as multiplication and “if” statements).
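
To make that division of labor concrete, the following is a minimal sketch in Python of a single neural-network layer split into a linear stage (the kind of weighted-sum operation an optical processor could in principle accelerate) and a nonlinear activation stage (the kind of operation typically left to electronics). This is not the paper’s implementation; the layer sizes and function names are illustrative assumptions.

# Hypothetical sketch (not the paper's architecture): one DNN layer split into
# a linear stage and a nonlinear stage.
import numpy as np

rng = np.random.default_rng(0)

def linear_stage(weights, inputs):
    # Weighted sum of the inputs. In an optical neural network, this is the
    # kind of operation that could be carried out with modulated light rather
    # than digital arithmetic.
    return weights @ inputs

def nonlinear_stage(z):
    # ReLU activation, standing in for the nonlinear step that optics handles
    # poorly and that is typically performed electronically.
    return np.maximum(z, 0.0)

W = rng.normal(size=(4, 3))   # illustrative layer: 3 inputs, 4 outputs
x = rng.normal(size=3)
y = nonlinear_stage(linear_stage(W, x))
print(y)

The appeal of such a split is that the linear stage accounts for the bulk of the arithmetic in large models, so moving it into the optical domain is where the energy and bandwidth gains described above would come from.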

In the current work, the researchers introduce a compact architecture that, for the first time, solves all of these challenges, along with two more, simultaneously. The architecture is based on state-of-the-art arrays of vertical-cavity surface-emitting lasers (VCSELs), a relatively new technology used in applications such as lidar remote sensing and laser printing. The particular VCSELs reported in the Nature Photonics paper were developed by the Reitzenstein group at the Technical University of Berlin. “This is a collaborative project, and it would not have been possible without them,” said Hamerly.

Logan Wright, an assistant professor at Yale University who was not involved in the current study, commented: “The work by Zaijun Chen and colleagues is encouraging to me, and likely to many other researchers in this field, suggesting that systems based on modulated VCSEL arrays may be a viable way to implement large-scale, high-speed optical neural networks. Of course, the current state of the art is still far from the scale and cost required for practically useful devices, but I am optimistic about what can be achieved in the next few years, especially given the potential these systems have to accelerate very large-scale, very expensive AI systems such as those used in popular text “GPT” systems such as ChatGPT.”

Chen, Hamerly, and Englund have filed for a patent on this work, which was sponsored by the U.S. Army Research Office, NTT Research, the U.S. National Defense Science and Engineering Graduate Fellowship Program, the U.S. National Science Foundation, the Natural Sciences and Engineering Research Council of Canada, and the Volkswagen Foundation.
