Tip: Use Word for the Web to Transcribe Audio and Video

Paul Thurrott
Jun 02, 2022

Word for the web offers a useful feature not found in the desktop version: it can transcribe the speech in audio and video files. This is a Microsoft 365 feature that my wife uses regularly for the Zoom-based interviews she performs each week, and I’ve used it recently to transcribe Microsoft event videos for which YouTube doesn’t have transcriptions.

Side-tip: to see if a YouTube video offers a transcription, click the “…” link at the end of the line of icons below the video’s title and description. If there’s a “Show transcript” option, you’re good to go.

To transcribe an audio or video file, open Word on the web (office.com > Word in a web browser) and create a new, blank document. Then, in the simplified ribbon, find the drop-down control next to the Dictate icon (which looks like a microphone). There, you will find a Transcribe item.

When you click that, the Transcribe pane opens. There are two basic choices here: Upload audio and Start recording. I use Upload audio to transcribe audio and (despite the name) video files. Word for the web needs to upload the file first; uploaded files are stored in a Transcribed Files folder that appears in the root of your OneDrive, and you can delete them later.

Then the file is transcribed. How long the upload and the transcription each take depends on the size and length of the recording.

When it’s done, the transcription will appear in the Transcribe pane.

You can edit the names of the speakers if you’d like. You can also replay the audio in the Transcribe pane, which shows your current location in the transcription as you skip around.

From here, you can add the transcription to the new document. This can be done with just the text, with the speakers labeled, with timestamps, or with speakers and timestamps.

There are a few limitations with this feature: it requires a Microsoft 365 subscription, as noted. It only works with US English. And there is a monthly limit of 300 minutes that I don’t see documented anywhere; my wife hits that almost every month and so she’ll send me the recorded Zoom files so I can transcribe them for her. But it’s an incredibly useful feature.
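If you juggle several recordings a month like we do, it can help to check them against the 300-minute limit before you start uploading. Here is a minimal Python sketch of that arithmetic. It is only an illustration: the function name plan_uploads is made up for this example, and it assumes that a recording's full length counts against the limit and that a recording that doesn't fit is skipped entirely rather than partially transcribed.

```python
MONTHLY_LIMIT_MIN = 300  # the monthly cap mentioned above


def plan_uploads(durations_min, used_min=0, limit_min=MONTHLY_LIMIT_MIN):
    """Sort recordings into those that fit this month's allowance and those that don't.

    Hypothetical helper for illustration only. Recordings are considered in
    the order given; one that doesn't fit is deferred, and a later, shorter
    recording may still fit in the minutes that remain.
    """
    remaining = limit_min - used_min
    fits, deferred = [], []
    for minutes in durations_min:
        if minutes <= remaining:
            fits.append(minutes)
            remaining -= minutes
        else:
            deferred.append(minutes)
    return fits, deferred, remaining


if __name__ == "__main__":
    # Five interview recordings, lengths in minutes (made-up numbers).
    fits, deferred, remaining = plan_uploads([60, 45, 90, 120, 30])
    print("Transcribe now:", fits)
    print("Hand off or wait until next month:", deferred)
    print("Minutes left:", remaining)
```

Anything that lands in the deferred list is a candidate for waiting until next month or handing off to someone else with a Microsoft 365 subscription.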

Tagged with

Word for web

Conversation 5 comments

ggolcher
Premium Member
02 June, 2022 - 9:20 am

This is an amazing tip! We’ll use it immediately to transcribe user research calls. Thank you!

Daekar

02 June, 2022 - 9:31 am

I know exactly how I am going to use this, can’t wait to see how well it works. Thanks for the tip! 

andreluis77x

02 June, 2022 - 1:48 pm

It worked with Brazilian Portuguese too.

Jim Lewis

02 June, 2022 - 6:27 pm

It works great. I’ve been using it since January this year to transcribe the recorded audio from neighborhood Board meetings as Board Secretary. How well it works really depends on the clarity of the audio and each speaker’s style, as is true for voice recognition in general. It does a pretty good job of identifying different speakers. But it sometimes gets the identification wrong and may attach the beginning or end of one person’s utterance to the previous or next speaker’s transcription snippet. So, careful checking is essential. The interface, though, is a bit wonky.
The transcription also breaks any speaker’s continuous utterances down into shorter phrases and puts a period at the end of each phrase, capitalizing the beginning of the next phrase, even though the whole set of utterances might be part of one sentence. So, to make a readable text, one has to go through and decide whether to remove those periods and capitals to make one smooth, logical bit of speech. Basically, what you have to deal with was summed up years ago by a famous little book on punctuation and grammar, Eats, Shoots & Leaves, by Lynne Truss. Just as a comma in the wrong place can be disastrous, a single bad word choice can make for an embarrassing transcription, too, so careful checking is essential!
The transcription service also shows no signs of learning from one’s corrections. To really make it a must-have service, Microsoft ought to make the AI backend learn from one’s edits over time. The same speakers attend all our Zoom meetings, and the degree of voice recognition hasn’t improved in roughly the past half year. Someone uses a laptop microphone and slurs his very casual speech: not so great a result. I use a Blue Yeti X microphone, was "coerced" into clearly enunciating the King’s English growing up, and my voice is pretty accurately transcribed. So, what goes in comes out in the transcription.
The amount of time left in the 300-minute allowance for a month usually shows up when you first fire up the voice transcription feature to upload an .MP3 audio file. I’ve uploaded up to 75 minutes of audio at one time (~90 MB), but the transcription service can be painfully slow at times, so I usually split the audio into 30-minute segments and do them one by one. I use Magix’s Sound Forge Audio Studio 15 (a castoff of Sony’s) to process the recording first to increase audio peak heights, as my Sony ICD-PX470 recorder hooked up to a Sony ECM-R300 table mic with noise cancellation turned on doesn’t produce a very strong signal. I also convert the would-be stereo recording to monaural before transcribing.
If you have any extended periods of silence in the recording, you want to edit them out before sending the .MP3 to the Word for the web service, as the timestamps will reflect the inclusion of that silence. You can’t edit the silence out of the recording after transcription without breaking the correspondence between the audio time and the transcript timestamps (don’t ask me how I know this!).

bluvg

02 June, 2022 - 10:05 pm

Agreed, this is a great feature, but wish it didn’t have the 300 min/mth limit. To be fair, though, it is documented here: https://support.microsoft.com/en-us/office/transcribe-your-recordings-7fc2efec-245e-45f0-b053-2a97531ecf57

About author

Paul Thurrott

Paul Thurrott is an award-winning technology journalist and blogger with over 25 years of industry experience and the author of 30 books. He is the owner of Thurrott.com and the host of three tech podcasts: Windows Weekly with Leo Laporte and Richard Campbell, Hands-On Windows, and First Ring Daily with Brad Sams. He was formerly the senior technology analyst at Windows IT Pro and the creator of the SuperSite for Windows from 1999 to 2014 and the Major Domo of Thurrott.com while at BWW Media Group from 2015 to 2023. You can reach Paul via email, Twitter or Mastodon.
