Apple Quietly Reveals MM1, a Multimodal LLM


Researchers from Apple quietly published a paper describing the company’s work on MM1, a family of multimodal LLMs (large language models) designed for captioning images, answering visual questions, and natural language inference. The paper suggests that Apple, which had remained silent on AI while the rest of the industry seized on it as the next wave, has made real advances and could soon play a major role.

“In this work, we discuss building performant Multimodal Large Language Models (MLLMs),” the description of MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training on arxiv.org reads. “We demonstrate that for large-scale multimodal pre-training using a careful mix of image-caption, interleaved image-text, and text-only data is crucial for achieving state-of-the-art few-shot results across multiple benchmarks, compared to other published pre-training results.”
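The "careful mix" the researchers describe means drawing pre-training batches from several data sources at chosen ratios rather than from a single corpus. The sketch below illustrates that idea only; the source names, mixing weights, and record formats are hypothetical, since the paper's exact pipeline and ratios are not reproduced here.

```python
import random

# Hypothetical stand-ins for the three data types the MM1 paper names.
# The example records and weights below are illustrative, not Apple's.
DATA_SOURCES = {
    "image_caption": ["<img_001> A dog on a beach.", "<img_002> A red bicycle."],
    "interleaved_image_text": ["Intro text <img_003> then more text <img_004>."],
    "text_only": ["A plain text document.", "Another text-only document."],
}

# Assumed mixing ratios; the paper's point is that the mix itself matters.
MIX_WEIGHTS = {
    "image_caption": 0.45,
    "interleaved_image_text": 0.45,
    "text_only": 0.10,
}

def sample_batch(batch_size: int, seed: int = 0):
    """Draw a pre-training batch, picking each record's source by weight."""
    rng = random.Random(seed)
    names = list(MIX_WEIGHTS)
    weights = [MIX_WEIGHTS[n] for n in names]
    batch = []
    for _ in range(batch_size):
        source = rng.choices(names, weights=weights, k=1)[0]
        batch.append((source, rng.choice(DATA_SOURCES[source])))
    return batch

batch = sample_batch(8)
```

In a real training run each entry would be a tokenized sequence rather than a string, but the weighted-sampling structure is the same.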


The paper describes MM1 as a family of multimodal models with up to 30 billion parameters that “achieve competitive performance after supervised fine-tuning on a range of established multimodal benchmarks.” As the Apple researchers put it, MLLMs (multimodal large language models) have emerged as “the next frontier in foundation models” after traditional LLMs, and they “achieve superior capabilities.”

The Apple researchers believe they’ve made a breakthrough when it comes to training models with both images and text, and that these findings will help others trying to scale these models to ever-larger sets of data with better performance and reliability. Of course, for now, all we have to go on is the paper, as MM1 is not available for testing.

And it may never be: Apple is rumored to be working on an LLM framework code-named “Ajax” as part of a $1 billion AI R&D push. And the firm allegedly acquired the DarwinAI startup earlier this year to help goose those efforts.

“We view AI and machine learning as fundamental technologies, and they’re integral to virtually every product that we ship,” Apple CEO Tim Cook said during a post-earnings conference call in February after a year of silence on the topic. “We’re excited to share the details of our ongoing work in that space later this year.”

Since then, the company has also highlighted the AI prowess of its recently announced MacBook Air M3 refresh. But the big push will likely come in June, when Apple is expected to host the next edition of its annual WWDC developer show. It’s reasonable to expect that event to focus on AI, as the upcoming Google (I/O) and Microsoft (Build) developer shows likely will.




Thurrott © 2024 Thurrott LLC