Apple model combines vision understanding and image generation »

Apple researchers have published a study on a new model, "Manzano," that is described as improving quality and performance over current versions.

January 14, 2026
5:00 pm

From Marcus Mendes on 9to5Mac:

In the study titled MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer, a team of nearly 30 Apple researchers details a novel unified approach that enables both image understanding and text-to-image generation in a single multimodal model.

Mendes gives a detailed explanation of the paper’s results – Apple’s image generation is “comparable” to GPT-4o in some tests, for example:

As a result of this approach, “Manzano handles counterintuitive, physics-defying prompts (e.g., ‘The bird is flying below the elephant’) comparably to GPT-4o and Nano Banana,” the researchers say.

We can see here that Apple’s models are a year behind where the company wants them to be – but potentially catching up thanks to new research.

View the original.

Apple Intelligence, ImageGen, Linked, Models
