Brands Should Let Us Restrict AI Processing To On-device Only


Traditional artificial intelligence processing needs the cloud. Mobile devices simply don’t have the computational muscle to handle it locally. But on-device AI processing is an untapped game changer. With 73% of US companies using AI, according to PwC, keeping processing local matters for data security and control over sensitive business information. But not just for business. Your everyday tasks matter too.

Think about it: your phone and other edge devices running models in the class of GPT-4 without a network connection, with data that never leaves the device for a remote server.

Traditional vs local large language models 

Traditional large language models (LLMs) like GPT-3 and GPT-4 are hosted on cloud servers, and that involves too many round trips. First, your device collects data and sends it over the internet to the remote server. Then the server runs the language model and processes the data. The processed results finally come back to your device, again over the internet. The obvious problems with this system are the delay, the dependence on an internet connection, and the weaker security.
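To make that round trip concrete, here is a minimal Python sketch, assuming a hypothetical `api.example.com` endpoint that accepts a JSON prompt; any hosted LLM API follows roughly the same shape.

```python
import time
import requests

# Hypothetical cloud endpoint; real hosted LLM APIs follow the same round-trip shape.
API_URL = "https://api.example.com/v1/completions"

def cloud_inference(prompt: str) -> str:
    """Send the prompt to a remote server and wait for the processed result."""
    start = time.perf_counter()
    response = requests.post(
        API_URL,
        json={"prompt": prompt},  # 1. data leaves the device over the internet
        timeout=30,               # fails outright if connectivity drops
    )
    response.raise_for_status()
    result = response.json()["text"]  # 2. the server runs the model, 3. results come back
    print(f"Round trip took {time.perf_counter() - start:.2f}s")
    return result
```

Everything after the `requests.post` call depends on the network and on the provider’s infrastructure, which is exactly where latency and interception risk creep in.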

Person using the Circle to Search feature on a Google Pixel phone
Image: Google

I think we’ve seen enough of Google’s Live Translate feature to know why delay isn’t good for anyone in a fast-paced tech world. Any language translator should be able to provide real-time translation during a conversation. If it relies on cloud processing, then God forbid you have network connectivity issues or there’s high latency. The flow of the conversation can easily take an awkward turn. Plus, there’s way too much time for malicious entities to intercept your data or attack the cloud server itself — and you can’t trust that the providers have the best security protocols in place. 

In contrast, local language models are all about on-device processing. Everything happens directly on the smartphone or PC. The AI uses the device’s own resources to handle data processing, which includes hardware like the Central Processing Unit (CPU), Graphics Processing Unit (GPU), and Neural Processing Unit (NPU). Frankly, 100% on-device processing hasn’t happened yet. What we have now is a hybrid approach where models like Google’s Gemini Nano combine on-device processing with the occasional cloud interaction to give you real-time updates. Initially, even the Pixel 8 phone couldn’t run it, but the model has since been optimized to improve its on-device capabilities.
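As a rough illustration of that hybrid approach, here is a simplified Python sketch; the token limit, helper functions, and routing rule are assumptions for the example, not how Gemini Nano actually decides.

```python
ON_DEVICE_TOKEN_LIMIT = 512  # assumed capacity of a small local model

def run_on_device(prompt: str) -> str:
    # Placeholder for a local, NPU-accelerated model (e.g. a quantized small LLM).
    return f"[local] handled {len(prompt.split())} words on the device"

def run_in_cloud(prompt: str) -> str:
    # Placeholder for a remote API call, used only when the local model can't cope.
    return "[cloud] full response from the server"

def generate(prompt: str, online: bool) -> str:
    """Route short requests to the device; fall back to the cloud for long ones."""
    if len(prompt.split()) <= ON_DEVICE_TOKEN_LIMIT:
        return run_on_device(prompt)  # data never leaves the device
    if online:
        return run_in_cloud(prompt)   # the occasional cloud interaction
    raise RuntimeError("Request too large for the local model and no connectivity")
```

The point of the sketch is the trade-off: anything that stays in the first branch is private and independent of the network, while anything that falls through inherits all the cloud problems described above.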

Why are we not seeing more LLMs on-device?

It takes a powerful device to handle local LLMs. These models often use complex calculations and large amounts of data, which can strain mobile device resources. While there are serious advancements in mobile hardware, they haven’t yet reached the point where they can match the processing capabilities of powerful cloud servers.  


For example, GSMA states that any AI mobile device is required to have a minimum processing power, measured in Tera Operations Per Second (TOPS), for both int8 and float16 data types. Basically, the device should handle at least one trillion 8-bit integer operations per second and at least half a trillion 16-bit floating-point operations per second.

Integer operations involve calculations with whole numbers, like adding 5 and 3 or multiplying 2 and 6. They are key for many AI tasks, especially in image processing, where pixel values are represented as integers. Floating-point operations, meanwhile, involve calculations with numbers that include decimal parts, such as dividing 10 by 3 or calculating the square root of 2.

AI needs them for tasks requiring higher precision, such as training complex neural networks and running simulations. For AI mobile devices, both types of operations are important: integer operations handle tasks that don’t need high precision, while floating-point operations are necessary for tasks requiring greater accuracy. Basically, on-device AI processing demands more raw compute than most of today’s devices can comfortably provide.
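Here is a quick NumPy illustration of the two kinds of math (nothing assumed beyond NumPy itself): the int8 results are exact whole numbers, while the float16 results are approximations rounded to 16 bits.

```python
import numpy as np

# Integer operations: exact whole-number math, like the pixel values in an image.
pixels = np.array([5, 3, 2, 6], dtype=np.int8)
print(pixels[0] + pixels[1])   # 8  -> exact
print(pixels[2] * pixels[3])   # 12 -> exact

# Floating-point operations: decimal math with limited precision at 16 bits.
print(np.float16(10) / np.float16(3))   # ~3.334, slightly rounded from 10/3
print(np.sqrt(np.float16(2)))           # ~1.414

# 1 TOPS means a trillion (1e12) operations of this kind every second.
```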

Phones can’t run LLMs well yet, but chips are getting better

Samsung presentation at CES 2024 with "Artificial Intelligence" showing on a display
Image: Samsung

Currently, most high-end smartphones can’t even run large language models (LLMs) entirely on-device. They are better suited for handling specific AI tasks like image processing, facial recognition, and basic natural language processing using smaller, optimized models.

To get big language models to run on mobile devices, you have to make the models smaller with quantization and pruning. Quantization uses fewer bits to represent numbers in the model and reduces its memory use to speed up computations. On the other hand, pruning removes less important connections to simplify the model while keeping its main functions intact. Connections are the links between neurons (basic units that process information), each with a weight. 
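As a concrete sketch of both techniques, this is what pruning and dynamic int8 quantization look like in PyTorch on a toy two-layer network standing in for a much larger model; the layer sizes and the 30% pruning ratio are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy model standing in for one slice of a much larger language model.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 128))

# Pruning: zero out the 30% of weights with the smallest magnitude in each layer,
# removing the least important connections while keeping the structure intact.
for module in model:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # bake the zeros into the weight tensor

# Quantization: store weights as 8-bit integers instead of 32-bit floats,
# cutting weight memory by roughly 4x and suiting integer-friendly mobile hardware.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(quantized(torch.randn(1, 512)).shape)  # torch.Size([1, 128])
```

In a real deployment, the pruning ratio and quantization scheme are tuned so the shrunken model loses as little accuracy as possible.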

On the bright side, chipsets are improving, and top manufacturers like Qualcomm are working hard to push more of them into the market. The Snapdragon 845 came out in 2017 and was a high-end chip at the time. It uses a Hexagon 685 Digital Signal Processor (DSP), a specialized microprocessor that handles audio, video, and communication processing tasks, alongside the Adreno 630 GPU and Kryo 385 CPU, delivering two to three times faster AI processing than previous models.


The Hexagon DSP now accelerates neural networks and AI inference. More recently, Qualcomm has been working on a new Snapdragon SoC with Oryon cores that is meant to improve on-device AI capabilities. The company is also optimizing large AI models to fit mobile devices, with advancements in quantization.

On-device AI will make things faster, cheaper, and more efficient

As technology keeps advancing, on-device AI will get even better. Right now, we’re using a mix of local and cloud processing, but soon we’ll be able to run more complex models right on our phones and other devices. Beyond your mobile devices, expect offline functionality for education and areas with poor or no connectivity. You’ll also see faster response times for real-time apps and cost reduction. 

Developing and operating large language models (LLMs) is extremely costly, especially for anyone offering AI services to many users; it often exceeds $100 million. Hosting and serving these models require extensive cloud infrastructure to handle the computational load and ensure fast response times, adding to the overall cost. Companies and developers will save money when they reduce their reliance on this cloud infrastructure for AI data processing and storage. As a user, you’ll save on data costs in the long run, since less data needs to be transferred to and from the cloud.




