Apple has released a set of optimizations to Core ML that enable running the Stable Diffusion text-to-image model on Apple Silicon-powered devices running iOS 16.2 or macOS 13.1 and later.
Core ML Stable Diffusion, as Apple named it, consists of a Python command-line tool, python_coreml_stable_diffusion, which converts Stable Diffusion PyTorch models to Core ML, and a Swift package that developers can use in their apps to easily add image generation capabilities.
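As a rough sketch of the conversion step, a command along the following lines invokes the converter once for each of the model’s sub-networks; the exact flags and output layout are documented in Apple’s ml-stable-diffusion repository and may differ from this sketch, and the output directory is a placeholder:

python -m python_coreml_stable_diffusion.torch2coreml --convert-unet --convert-text-encoder --convert-vae-decoder --convert-safety-checker -o <output-directory>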
Once you have used the CLI to convert the version of Stable Diffusion you would like to use to the Core ML format, generating an image from a given prompt in a Swift app is as easy as:
import StableDiffusion
...
let pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL)
let image = try pipeline.generateImages(prompt: prompt, seed: seed).first
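For context, here is a slightly fuller sketch of how the snippet above might be used end to end, saving the generated image to disk. The resource path, prompt, seed, and output path are placeholders, the PNG-writing step via ImageIO is an illustrative addition, and the exact generateImages signature may differ across versions of the Swift package:

import Foundation
import CoreGraphics
import ImageIO
import UniformTypeIdentifiers
import StableDiffusion

do {
    // Directory containing the Core ML resources produced by the conversion step (placeholder path).
    let resourceURL = URL(fileURLWithPath: "/path/to/Resources")
    let pipeline = try StableDiffusionPipeline(resourcesAt: resourceURL)

    // Placeholder prompt and seed; generateImages returns one optional CGImage per requested image.
    let prompt = "an astronaut riding a horse on Mars"
    let seed = 93
    guard let image = try pipeline.generateImages(prompt: prompt, seed: seed).first ?? nil else {
        fatalError("Image generation failed")
    }

    // Write the resulting CGImage to disk as a PNG using ImageIO.
    let outputURL = URL(fileURLWithPath: "output.png")
    if let destination = CGImageDestinationCreateWithURL(outputURL as CFURL, UTType.png.identifier as CFString, 1, nil) {
        CGImageDestinationAddImage(destination, image, nil)
        CGImageDestinationFinalize(destination)
    }
} catch {
    print("Stable Diffusion pipeline failed: \(error)")
}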
With its Core ML Stable Diffusion toolkit, Apple is highlighting the benefits, for both users and developers, of deploying an inference model on-device rather than on a server:
First, the privacy of the end user is protected because any data the user provided as input to the model stays on the user’s device. Second, after initial download, users don’t require an internet connection to use the model. Finally, locally deploying this model enables developers to reduce or eliminate their server-related costs.
The key factor enabling on-device deployment is speed, Apple says. For this reason, they developed an approach to optimize the Stable Diffusion model, composed of four different neural networks totaling about 1.275 billion parameters, so that it runs efficiently on the Apple Neural Engine available in Apple Silicon.
While Apple hasn’t yet provided any performance data about the improvements brought by Core ML Stable Diffusion, they found that the popular Hugging Face DistilBERT model worked out of the box with a 10x speed improvement and a 14x reduction in memory consumption. It is worth noting that Stable Diffusion is significantly more complex than DistilBERT, which in Apple’s view makes the case for on-device inference optimizations even more compelling as a way to take advantage of the increasingly complex models the community is producing.
According to Apple, Stable Diffusion has enabled the creation of “unprecedented visual content with as little as a text prompt” and has attracted a lot of interest from the community of artists, developers, and hobbyists. In addition, Apple sees a growing effort to use Stable Diffusion for image editing, in-painting, out-painting, super-resolution, style transfer, and other conceivable applications.
Based on a latent diffusion model developed by the CompVis group at LMU Munich, Stable Diffusion was released through a collaboration of Stability AI, CompVis LMU, and Runway, with support from EleutherAI and LAION. Its code and model weights have been released publicly, and you can easily try it out at Hugging Face or using DreamStudio AI.