Just over a year ago, I started hearing about Stable Diffusion and Midjourney and their ability to create images from scratch. Just string a few words together and a generative AI model running on a server will turn those written words into graphic images. Magic.
From then on, everything moved incredibly fast. Suddenly, I was standing in the middle of MediaTek's booth at MWC, looking at an Android phone running the Dimensity 9300 chipset and generating AI images on the fly.
The model generated and refined the image in real time with each letter I entered.
Every letter and word I typed triggered the Stable Diffusion model and changed the image to more accurately match my description. In real time. Zero latency, zero waiting, zero servers. Everything ran locally and offline. I was dumbfounded.
Just last year, Qualcomm proudly demonstrated (also at MWC) a Stable Diffusion model that could generate AI images on-device in 15 seconds. At the time, we found this impressive, especially compared to Midjourney's slower, server-dependent implementation.
But now that I've seen live generation in action, those 15 seconds feel like a real lag. What a difference 12 months makes!
Now that I’ve seen real-time AI generation in action, anything else feels like lag.
The Dimensity 9300 is built from the ground up to handle more on-device AI capabilities, so this isn't the only demo MediaTek is touting. However, the other features are less impressive and compelling: local AI summarization, generative photo expansion, and Magic Eraser-style photo manipulation. Most of these have become commonplace by now, with Google and Samsung boasting them in their Pixel software and Galaxy AI suite, respectively.
Then there's the local video generation model, which creates an image and animates it into a series of GIF-like frames to produce a short video. I tried it several times; it took over 50 seconds and wasn't always accurate, so as you can imagine, it didn't grab my attention the way the live image model did.
MediaTek also showed off a real-time AI avatar maker that uses the camera to capture live footage of a person and animate it in a variety of styles. The animation runs a second or two behind the subject's actual movements, so it's not terribly laggy, but the resulting image reminds me of the early days of DALL-E. Again, this runs locally and offline, which explains the rough edges. It's still impressive technology, but it doesn't wow the way the real-time image generation model does.
As you can tell by now, I really enjoyed that first demo. It feels like the technology has finally arrived. The fact that you can do this locally, without the added server costs and privacy concerns of sending requests online, makes it far more practical to me.