Apple model combines vision understanding and image generation »

Apple researchers have published a study on a new model, "Manzano," that shows improved quality and performance compared with current models.

From Marcus Mendes on 9to5Mac:

In the study titled MANZANO: A Simple and Scalable Unified Multimodal Model with a Hybrid Vision Tokenizer, a team of nearly 30 Apple researchers details a novel unified approach that enables both image understanding and text-to-image generation in a single multimodal model.

Mendes gives a detailed explanation of the paper’s results – Apple’s image generation is “comparable” to GPT-4o in some tests, for example:

As a result of this approach, “Manzano handles counterintuitive, physics-defying prompts (e.g., ‘The bird is flying below the elephant’) comparably to GPT-4o and Nano Banana,” the researchers say.

We can see here that Apple's models are a year behind where the company wants them to be – but potentially catching up thanks to new research.

View the original.
