Gemma 4 on Arm brings fast, privacy-preserving, power-efficient AI directly onto Android devices, helping developers deliver richer real-time app experiences to billions of users without relying on the cloud.
Real-time assistance, seamless communication, and greater personalization are now baseline expectations for billions of smartphone users worldwide. Highly capable on-device AI that operates within the power envelope of modern smartphones is essential to delivering instant, intelligent experiences at scale, while unlocking AI’s future potential.
Google’s launch of Gemma 4 accelerates the ongoing shift to on-device AI, enabling developers to seamlessly access optimized performance and bring increasingly capable AI experiences directly into the apps people use every day. Unlocking these benefits at global smartphone scale depends on the underlying compute foundation, with one constant that is ubiquitous across the entire Android ecosystem: Arm.
What’s new for Gemma 4
Gemma 4 further advances on-device AI by delivering improved performance and efficiency, while expanding support for the kinds of multimodal experiences that matter most on Arm-based devices, including reasoning, agentic workflows, and vision- and audio-enabled use cases. With enhanced capabilities across text, audio, and images, broader language support, and a foundation for real-time assistive experiences, it enables more responsive, context-aware interactions directly on-device without increasing memory footprint.
Exploring Gemma 4 performance on Arm CPUs
In early Arm engineering tests, SME2 shows promising performance gains for running Gemma 4 E2B (effective 2-billion-parameter) workloads. Initial tests on this model demonstrate an average 5.5x speedup in prefill (processing user input) and up to 1.6x faster decode (generating responses), highlighting the potential of Armv9 CPU innovations for on-device AI workloads. These engineering tests include upcoming patches to Google XNNPACK and Arm KleidiAI.
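To see how phase-level speedups like these translate into end-to-end latency, the back-of-the-envelope sketch below combines the reported 5.5x prefill and 1.6x decode figures. The token counts and the uniform per-token baseline cost are hypothetical assumptions chosen for illustration, not measured Gemma 4 numbers.

```python
# Illustrative arithmetic only: combines the reported average 5.5x prefill
# speedup and the up-to-1.6x decode speedup into one end-to-end estimate.
PREFILL_SPEEDUP = 5.5   # reported average prefill gain
DECODE_SPEEDUP = 1.6    # reported decode gain (upper bound)

# Hypothetical workload: a 256-token prompt, a 128-token response, and an
# assumed flat 10 ms/token baseline cost for both phases (not measured).
prompt_tokens, response_tokens, ms_per_token = 256, 128, 10.0

baseline_ms = (prompt_tokens + response_tokens) * ms_per_token
sme2_ms = (prompt_tokens * ms_per_token / PREFILL_SPEEDUP
           + response_tokens * ms_per_token / DECODE_SPEEDUP)

print(f"baseline: {baseline_ms:.0f} ms, with SME2: {sme2_ms:.0f} ms, "
      f"overall speedup: {baseline_ms / sme2_ms:.1f}x")
```

Because time-to-first-token is dominated by prefill, the larger prefill speedup matters most for perceived responsiveness; the overall gain for any real workload depends on the actual prompt-to-response token ratio.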
As an early example of what is possible with these improvements, Envision, an accessibility-focused app for blind and low-vision users, prototyped delivering more of its experience locally. Historically, Envision’s scene interpretation relied on cloud connectivity. In this prototype, Gemma 4 ran locally on Arm CPUs with SME2 capabilities, enabling users to capture a photo and receive a detailed scene description directly on-device, without requiring a network connection or sending sensitive data off-device.
These explorations on Arm CPUs highlight the broader flexibility of the Arm compute platform and the potential for continued innovation across CPU and heterogeneous compute pathways.
The result is lower latency, stronger privacy, and more consistent user experiences regardless of connectivity conditions. This shift from cloud dependency to local inference is critical for mobile applications. It has the potential to reduce infrastructure costs for developers, improve reliability for users, and unlock new categories of real-time applications.
“Envision is excited to work with Arm and Google to bring powerful accessibility experiences directly onto smartphones. Running visual understanding models like Gemma 4 on-device on SME2-enabled Arm CPUs opens the door to reliable, low-latency scene description and visual Q&A for blind and low-vision users. For our community, the ability to access these capabilities offline is incredibly meaningful because it ensures the technology works wherever they are, while also improving privacy by keeping more processing on the device itself.” – Karthik Mahadevan, CEO, Envision
Envision is an early example of what’s possible when Gemma 4 meets the Arm compute platform at mobile scale. As more developers integrate Gemma 4, on-device AI will increasingly become the default architecture rather than the exception.
Why Arm matters for on-device AI at Android scale
The Armv9 architecture is the most secure, pervasive, and advanced ISA ever. Arm Scalable Matrix Extension 2 (SME2) – a set of advanced CPU instructions in the Armv9 architecture – is a key technology, as it accelerates matrix-heavy AI workloads within the power envelope of smartphones. Already built into the Arm C1 CPUs that power the latest Android smartphones, SME2 unlocks higher sustained performance and improved efficiency.
Through Arm KleidiAI – Arm’s software acceleration layer integrated into leading runtime libraries, like Google’s XNNPACK, and frameworks, like Google LiteRT and MediaPipe – the benefits of SME2 are readily accessible to mobile developers with no changes required to existing code, models, or deployment pipelines. As a result, developers automatically get out-of-the-box performance optimizations simply by targeting Arm-based Android devices with SME2-enabled CPUs.
In practice, these software-level gains translate directly into better on-device experiences. Users benefit from faster responses, smoother sustained interactions, and more reliable on-device AI, all while maintaining battery life and thermal stability, even as models grow more capable.
“Delivering Gemma 4 efficiently across the Android ecosystem requires deep collaboration across hardware and software. Our work with Arm reflects a shared commitment to advancing on-device AI, combining the benefits of the Armv9 architecture and built-in acceleration technologies, like SME2, with the Android operating system to unlock greater performance and efficiency at scale. Together, we’re making it easier for developers to bring fast, responsive, and privacy-preserving AI experiences to our users, without needing to modify their existing applications.” – Sandeep Patil, Engineering Director, Android
Arm and Google: Building the future of on-device AI together
As more applications move AI on-device, Arm and Google are committed to supporting developers with accessible performance optimizations and clear guidance that help Gemma 4 accelerate application experiences across all Arm-based mobile devices.
The future of mobile AI will not be defined solely by larger models, but by how efficiently, securely, and pervasively they run at scale across the Android ecosystem. Through this collaboration, the benefits of on-device AI will be felt by billions of Android smartphone users worldwide.
