Apple model combines vision understanding and image generation »

Apple researchers have published a study on "Manzano," a new model that shows improved quality and performance over Apple's current models.

From Marcus Mendes on 9to5Mac:

In the study titled MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer, a team of nearly 30 Apple researchers details a novel unified approach that enables both image understanding and text-to-image generation in a single multimodal model.

Mendes gives a detailed explanation of the paper’s results – for example, Apple’s image generation is “comparable” to GPT-4o in some tests:

As a result of this approach, “Manzano handles counterintuitive, physics-defying prompts (e.g., ‘The bird is flying below the elephant’) comparably to GPT-4o and Nano Banana,” the researchers say.

It seems Apple’s models are roughly a year behind where the company wants them to be – but potentially catching up thanks to new research.

View the original.
