How LLMs on the Edge Could Help Solve the AI Data Center Problem
Locally run AI systems, known as LLMs on the edge, could help ease the strain on data centers, but it may take some time before this approach goes mainstream.
September 18, 2024
There has been plenty of coverage of the strain AI places on data center power. One way to ease that strain is ‘LLMs on the edge’, an approach that enables AI systems to run natively on PCs, tablets, laptops, and smartphones.
The obvious benefits of LLMs on the edge include lower LLM training costs, reduced latency when querying the LLM, enhanced user privacy, and improved reliability.
By reducing the processing power needed in centralized facilities, LLMs on the edge could ease the pressure on data centers and potentially eliminate the need for multi-gigawatt-scale AI factories. But is this approach really feasible?
With growing discussions around moving the LLMs that underpin generative AI to the edge, we take a closer look at whether this shift can truly reduce the data center strain.
Smartphones Lead the Way in Edge AI
Michael Azoff, chief analyst for Omdia’s cloud and data center research practice, says the AI-on-the-edge use case that is moving the fastest is lightweight LLMs on smartphones.
Huawei has developed its Pangu 5.0 LLM in different sizes, and the smallest version has been integrated into its smartphone operating system, HarmonyOS. Devices running this include the Huawei Mate 30 Pro 5G.
Samsung, meanwhile, has developed its Gauss LLM, which powers Samsung Galaxy AI on the company’s flagship Galaxy S24 smartphone. Its AI features include live translation, voice-to-text conversion, note summarization, Circle to Search, and photo and message assistance.
Samsung has also moved into mass production of its LPDDR5X DRAM semiconductors. These 12-nanometer-class chips process memory workloads directly on the device, enabling the phone’s operating system to work with storage more quickly and handle AI workloads more efficiently.
Overall, smartphone manufacturers are working hard to make LLMs smaller. Instead of GPT-3’s 175 billion parameters, they are aiming for models of around two billion parameters.
Intel and AMD are involved in AI at the edge, too. AMD is working on notebook chips capable of running 30 billion-parameter LLMs locally at speed. Similarly, Intel has assembled a partner ecosystem that is hard at work developing the AI PC. These AI-enabled devices may be pricier than regular models. But the markup may not be as high as expected, and it is likely to come down sharply as adoption ramps up.
“The expensive part of AI at the edge is mostly on the training,” Azoff told Data Center Knowledge. “A trained model used in inference mode does not need expensive equipment to run.”
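This is what makes on-device use plausible: once a compact model has been trained, generating responses requires only modest hardware. Below is a minimal sketch of local, CPU-only inference, assuming a roughly 2-billion-parameter open-weight model has already been downloaded to the device; the model path and prompt are illustrative, and the Hugging Face Transformers library is used as one common way to do this.

```python
# Minimal sketch: querying a small LLM entirely on the local device (no cloud calls).
# Assumes a ~2B-parameter open-weight checkpoint has been downloaded to MODEL_PATH.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

MODEL_PATH = "path/to/local-2b-instruct-model"  # hypothetical local checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    torch_dtype=torch.bfloat16,  # lower precision cuts memory use; supported on recent CPUs
    device_map="cpu",            # inference runs on the laptop or phone CPU, no GPU required
)

prompt = "Summarize these meeting notes in three bullet points: ..."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Nothing in the query or the response leaves the device, which is also the privacy argument explored later in this article.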
He believes early deployments are likely to be for scenarios where errors and ‘hallucinations’ don't matter so much, and where there is unlikely to be much risk of reputational damage.
Examples include enhanced recommendation engines, AI-powered internet search, and the creation of illustrations or designs. In these cases, the onus is on users to spot suspect responses or poorly rendered images and designs.
Data Center Implications for LLMs on the Edge
With data centers preparing for a massive ramp-up in density and power needs to support the growth of AI, what might the LLMs on the edge trend mean for digital infrastructure facilities?
In the foreseeable future, models running on the edge will continue to be trained in the data center. Thus, the heavy traffic currently hitting data centers from AI is unlikely to wane in the short term. But the models being trained within data centers are already changing. Yes, the massive ones from the likes of OpenAI, Google, and Amazon will continue. But smaller, more focused LLMs are in the ascendancy.
“By 2027, more than 50% of the GenAI models that enterprises use will be specific to either an industry or business function – up from approximately 1% in 2023,” Arun Chandrasekaran, an analyst at Gartner, told Data Center Knowledge. “Domain models can be smaller, less computationally intensive, and lower the hallucination risks associated with general-purpose models.”
The development work being done to reduce the size and processing intensity of GenAI will spill over into even more efficient edge LLMs that can run on a range of devices. Once edge LLMs gain momentum, they promise to reduce the amount of AI processing that needs to be done in a centralized data center. It is all a matter of scale.
For now, LLM training largely dominates GenAI as the models are still being created or refined. But imagine hundreds of millions of users querying LLMs from their smartphones and PCs, with every request routed to large data centers for processing. At scale, that amount of traffic could overwhelm data centers. Thus, the value of LLMs on the edge may not be fully realized until they enter the mainstream.
LLMs on the Edge: Security and Privacy
Anyone interacting with an LLM in the cloud potentially exposes their organization to privacy questions and the risk of a cybersecurity breach.
As more queries and prompts are being done outside the enterprise, there are going to be questions about who has access to that data. After all, users are asking AI systems all sorts of questions about their health, finances, and businesses.
In doing so, users often enter personally identifiable information (PII), sensitive healthcare data, customer information, or even corporate secrets.
The move toward smaller LLMs that can be contained within the enterprise data center – rather than running in the cloud – or run on local devices is a way to sidestep many of the security and privacy concerns raised by broad use of LLMs such as ChatGPT.
“Security and privacy on the edge are really important if you are using AI as your personal assistant, and you're going to be dealing with confidential information, sensitive information that you don't want to be made public,” said Azoff.
Timeline for Edge LLMs
LLMs on the edge won’t arrive overnight, except in a few specialized use cases. But the edge trend appears unstoppable.
Forrester’s Infrastructure Hardware Survey found that 67% of infrastructure hardware decision-makers had adopted edge intelligence or were in the process of doing so. About one in three companies also plan to collect and analyze data with AI in edge environments to give employees faster, higher-value insights.
“Enterprises want to collect relevant input from mobile, IoT, and other devices to provide customers with relevant use-case-driven insights when they request them or need greater value,” said Michele Goetz, a business insights analyst at Forrester Research.
“We should see edge LLMs running on smartphones and laptops in large numbers within two to three years.”
Pruning models down to a more manageable number of parameters is one obvious way to make them feasible on the edge. Developers are also shifting GenAI models from the GPU to the CPU, reducing the processing footprint, and building common standards for model compilation.
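As a concrete illustration of the pruning step mentioned above, the sketch below applies magnitude-based pruning to a single linear layer using PyTorch’s built-in pruning utilities. The layer is a stand-in for one projection inside a transformer block; real pipelines prune across the whole network and then fine-tune to recover accuracy, and the 50% sparsity target here is arbitrary.

```python
# Minimal sketch: magnitude pruning of one layer with PyTorch's pruning utilities.
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(4096, 4096)  # stand-in for a single projection in an LLM block

# Zero out the 50% of weights with the smallest absolute value (L1 magnitude).
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Bake the mask into the weights so the sparse tensor can be exported or compressed.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Layer sparsity after pruning: {sparsity:.0%}")
```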
As well as the smartphone applications noted above, the use cases that lead the way will be those that are achievable despite limited connectivity and bandwidth, according to Goetz.
Field engineering and operations in industries such as utilities, mining, and transportation maintenance are already oriented around personal devices and ready for LLM augmentation. Because such edge LLM applications carry clear business value, paying more for an LLM-capable field device or phone is expected to be less of an issue.
Widespread consumer and business use of LLMs on the edge will have to wait until hardware prices come down as adoption ramps up. For example, Apple Vision Pro is mainly deployed in business solutions where the price tag can be justified.
Other use cases on the near horizon include telecom and network management, smart buildings, and factory automation. More advanced use cases for LLMs on the edge – such as immersive retail and autonomous vehicles – will have to wait five years or more, according to Goetz.
“Before we can see LLMs on personal devices flourish, there will be a growth in specialized LLMs for specific industries and business processes,” the analyst said.
“Once these are developed, it is easier to scale them out for adoption because you aren’t training and tuning a model, shrinking it, and deploying it all at the same time.”