Build 2024: Phi-3-Vision Brings Multimodality to Microsoft’s Open SLM Family

(Image: Microsoft Phi open models)

At its Build 2024 developer show, Microsoft announced that its first multimodal small language model (SLM), Phi-3-vision, is now available in preview.

“We are excited to add new models to the Phi-3 family of small, open models developed by Microsoft,” Microsoft corporate vice president Misha Bilenko writes in the announcement post. “We are introducing Phi-3-vision, a multimodal model that brings together language and vision capabilities.”

Microsoft introduced the first three members of its Phi-3 family of SLMs via an elaborate PR campaign back in April. As with other SLMs, the Phi-3 models are designed to run locally on devices like smartphones and PCs, and they deliver performance and capabilities that increasingly rival the previous generation of cloud-based large language models (LLMs). But they work offline, are less expensive to operate, and offer privacy benefits.

The first three Phi-3 models are Phi-3-mini, Phi-3-small, and Phi-3-medium, and each arrived with various advantages over competing models, though these comparisons shift almost week-to-week these days. To date, Phi-3-mini has been the most interesting in many ways because of its unique combination of small size and capabilities. But with Phi-3-vision, Microsoft is taking its SLMs to new heights.

Phi-3-vision is the first multimodal model in the Phi-3 family, meaning it accepts more than one type of input, in this case both text and images. It can reason over real-world images and extract and reason over text within images, and it has been optimized to understand charts and diagrams, from which it can generate insights and answer questions.

As with the other Phi-3 models, developers can get started most easily using the Azure AI Playground. But you can also build and customize Phi-3-vision and the other Phi-3 models using Azure AI Studio. You can learn more about Phi-3 on the Phi-3 Open Models website.
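
Because the weights are published openly, developers can also experiment outside of Azure. The sketch below shows, purely as an illustration, how you might prompt Phi-3-vision with a chart image using the Hugging Face Transformers library; the checkpoint name (microsoft/Phi-3-vision-128k-instruct), the <|image_1|> placeholder convention, and the image URL are assumptions drawn from the public model card, not details from Microsoft's announcement.

```python
# Minimal sketch: asking Phi-3-vision about a chart image via Hugging Face Transformers.
# The checkpoint id, prompt format, and image URL are assumptions, not from this article.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed public checkpoint name

# trust_remote_code is needed because the model ships its own processor/model code
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load any chart or diagram image; this URL is a placeholder
image_url = "https://example.com/sales-chart.png"
image = Image.open(requests.get(image_url, stream=True).raw)

# The <|image_1|> tag marks where the image belongs in the conversation
messages = [
    {"role": "user", "content": "<|image_1|>\nSummarize the key trend shown in this chart."}
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)

# Strip the prompt tokens and decode only the newly generated answer
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Again, this is only a sketch: the Azure AI Playground requires no code at all, and a model deployed through Azure AI Studio would typically be called over its hosted endpoint instead of run locally.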
