Researchers upend AI status quo by eliminating matrix multiplication in LLMs

June 25, 2024

Illustration of a brain inside a light bulb.

Researchers claim to have developed a new way to run AI language models more efficiently by eliminating matrix multiplication from the process. This fundamentally redesigns neural network operations that are currently accelerated by GPU chips. The findings, detailed in a recent preprint paper from researchers at the University of California Santa Cruz, UC Davis, LuxiTech, and Soochow University, could have deep implications for the environmental impact and operational costs of AI systems.

Matrix multiplication (often abbreviated to “MatMul”) is at the center of most neural network computational tasks today, and GPUs are particularly good at executing the math quickly because they can perform large numbers of multiplication operations in parallel. That ability briefly made Nvidia the most valuable company in the world last week; the company currently holds an estimated 98 percent market share for data center GPUs, which are commonly used to power AI systems like ChatGPT and Google Gemini.
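To make the term concrete, here is a minimal Python sketch of a dense matrix-vector product, the basic operation a neural network layer applies to its input. The weight and input values are made up for illustration, and the function name `matvec` is hypothetical; GPUs perform the same multiply-and-add steps, but thousands at a time in parallel rather than in a Python loop.

```python
def matvec(weights, x):
    """Dense matrix-vector product: each output is a sum of element-wise products.

    Returns the output vector and the number of multiplications performed.
    """
    out = []
    mults = 0
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            acc += w * xi  # one multiply-accumulate per weight
            mults += 1
        out.append(acc)
    return out, mults


# Illustrative 2x3 weight matrix and 3-element input (made-up values).
W = [[0.5, -1.0, 2.0],
     [1.5, 0.25, -0.75]]
x = [1.0, 2.0, 4.0]

y, n_mults = matvec(W, x)
print(y, n_mults)  # [6.5, -1.0] 6
```

Even this tiny layer needs one multiplication per weight; a model with billions of parameters needs billions of them for every token it generates.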

In the new paper, titled “Scalable MatMul-free Language Modeling,” the researchers describe creating a custom 2.7 billion parameter model without using MatMul that delivers performance similar to conventional large language models (LLMs). They also demonstrate running a 1.3 billion parameter model at 23.8 tokens per second on a GPU that was accelerated by a custom-programmed FPGA chip that uses about 13 watts of power (not counting the GPU’s power draw). The implication is that a more efficient FPGA “paves the way for the development of more efficient and hardware-friendly architectures,” they write.

The paper doesn’t provide power estimates for conventional LLMs, but this post from UC Santa Cruz estimates about 700 watts for a conventional model. However, in our experience, you can run a 2.7B parameter version of Llama 2 competently on a home PC with an RTX 3060 (which uses about 200 watts at peak) powered by a 500-watt power supply. So, if you could theoretically run an LLM entirely in only 13 watts on an FPGA (without a GPU), that would be a 38-fold decrease in power usage compared with that 500-watt power supply.
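The 38-fold figure is a simple ratio of power draws. The quick Python check below uses only the wattages quoted above (the constant names and the `reduction` helper are illustrative). Dividing the 500-watt power supply rating by 13 watts gives about 38.5. Against the RTX 3060’s roughly 200-watt peak, the ratio would be about 15. These are back-of-the-envelope comparisons of rated or peak power, not measured energy use per token.

```python
# Power figures quoted in the article, in watts (names are illustrative).
FPGA_WATTS = 13             # researchers' FPGA setup, GPU power excluded
RTX_3060_PEAK_WATTS = 200   # approximate peak draw of the example GPU
PSU_WATTS = 500             # power supply rating of the example home PC
UCSC_ESTIMATE_WATTS = 700   # UC Santa Cruz estimate for a conventional model


def reduction(baseline_watts, new_watts=FPGA_WATTS):
    """How many times less power new_watts is than baseline_watts."""
    return baseline_watts / new_watts


for label, watts in [("500 W power supply", PSU_WATTS),
                     ("RTX 3060 peak", RTX_3060_PEAK_WATTS),
                     ("UCSC estimate", UCSC_ESTIMATE_WATTS)]:
    print(f"{label}: {reduction(watts):.1f}x")
# 500 W power supply: 38.5x
# RTX 3060 peak: 15.4x
# UCSC estimate: 53.8x
```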

The technique has not yet been peer-reviewed, but the researchers (Rui-Jie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, Peng Zhou, and Jason Eshraghian) claim that their work challenges the prevailing paradigm that matrix multiplication operations are indispensable for building high-performing language models. They argue that their approach could make large language models more accessible, efficient, and sustainable, particularly for deployment on resource-constrained hardware like smartphones.

Taking away matrix math

In the paper, the researchers cite BitNet (the so-called “1-bit” transformer technique that made the rounds as a preprint in October) as an important precursor to their work. According to the authors, BitNet demonstrated the viability of using binary and ternary weights in language models, successfully scaling up to 3 billion parameters while maintaining competitive performance.
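Ternary weights matter because they remove the multiplications from a layer. If every weight is -1, 0, or +1, each input is simply added, subtracted, or skipped. The minimal Python sketch below illustrates that general idea. It is not code from BitNet or the MatMul-free paper, the function name `ternary_matvec` is hypothetical, and it omits how real models obtain ternary weights during training.

```python
def ternary_matvec(weights, x):
    """Matrix-vector product for weights restricted to {-1, 0, +1}.

    Each weight either adds the input, subtracts it, or skips it,
    so the loop performs no multiplications at all.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            elif w != 0:
                raise ValueError(f"non-ternary weight: {w}")
        out.append(acc)
    return out


# Illustrative ternary weights and input (made-up values).
W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, 4.0]
print(ternary_matvec(W, x))  # [-3.5, 5.5]
```

The result matches an ordinary multiply-and-add product over the same values. Hardware only needs adders, which are cheaper in chip area and energy than multipliers.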

However, they note that BitNet still relied on matrix multiplications in its self-attention mechanism. These limitations of BitNet motivated the current study, pushing the researchers to develop a fully “MatMul-free” architecture that could maintain performance while eliminating matrix multiplications even in the attention mechanism.
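One way to see why attention resists this trick: in standard self-attention, the scores come from multiplying a query matrix Q by the transpose of a key matrix K. Both are activations computed from the input at run time, not fixed weights, so making the weights ternary does not turn this step into additions. The Python sketch below uses small made-up Q and K values and hypothetical helper names. It omits the scaling and softmax steps of real attention and is not the paper’s or BitNet’s code.

```python
def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


# Queries and keys are activations derived from the input, so they are
# arbitrary real numbers even if the projection weights were ternary.
Q = [[0.5, -1.0],
     [2.0, 0.25]]
K = [[1.0, 0.5],
     [-0.5, 2.0]]

scores = matmul(Q, transpose(K))  # attention scores: Q times K transposed
print(scores)  # [[0.0, -2.25], [2.125, -0.5]]
```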
