Build 2024: Phi-3-Vision Brings Multimodality to Microsoft’s Open SLM Family

(Image: Microsoft Phi open models)

At its Build 2024 developer show, Microsoft announced that its first multimodal small language model (SLM), Phi-3-vision, is now available in preview.

“We are excited to add new models to the Phi-3 family of small, open models developed by Microsoft,” Microsoft corporate vice president Misha Bilenko writes in the announcement post. “We are introducing Phi-3-vision, a multimodal model that brings together language and vision capabilities.”

Microsoft introduced the first three members of its Phi-3 family of SLMs via an elaborate PR campaign back in April. As with other SLMs, the Phi-3 models are designed to run locally on devices like smartphones and PCs, and they deliver performance and capabilities that increasingly rival the previous generation of cloud-based large language models (LLMs). But they work offline, are less expensive to operate, and offer privacy benefits.

The first three Phi-3 models are Phi-3-mini, Phi-3-small, and Phi-3-medium, and each arrived with various advantages over competing models, though these comparisons shift almost week-to-week these days. To date, Phi-3-mini has been the most interesting in many ways because of its unique combination of small size and capabilities. But with Phi-3-vision, Microsoft is taking its SLMs to new heights.

Phi-3-vision is the first multimodal model in the Phi-3 family, meaning it accepts more than one type of input, in this case both text and images. It can reason over real-world images and extract and reason over text within images, and it has been optimized to understand charts and diagrams, from which it can generate insights and answer questions.

As with the other Phi-3 models, developers can get started most easily using the Azure AI Playground. But you can also build and customize Phi-3-vision and the other Phi-3 models using Azure AI Studio. You can learn more about Phi-3 on the Phi-3 Open Models website.
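
Because the weights are published openly, developers can also experiment outside of Azure. The sketch below shows, purely as an illustration, how you might prompt Phi-3-vision with a chart image using the Hugging Face Transformers library; the checkpoint name (microsoft/Phi-3-vision-128k-instruct), the <|image_1|> placeholder convention, and the image URL are assumptions drawn from the public model card, not details from Microsoft's announcement.

```python
# Minimal sketch: asking Phi-3-vision about a chart image via Hugging Face Transformers.
# The checkpoint id, prompt format, and image URL are assumptions, not from this article.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed public checkpoint name

# trust_remote_code is needed because the model ships its own processor/model code
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load any chart or diagram image; this URL is a placeholder
image_url = "https://example.com/sales-chart.png"
image = Image.open(requests.get(image_url, stream=True).raw)

# The <|image_1|> tag marks where the image belongs in the conversation
messages = [
    {"role": "user", "content": "<|image_1|>\nSummarize the key trend shown in this chart."}
]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)

# Strip the prompt tokens and decode only the newly generated answer
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```

Again, this is only a sketch: the Azure AI Playground requires no code at all, and a model deployed through Azure AI Studio would typically be called over its hosted endpoint instead of run locally.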
