Tip: Use Word for the Web to Transcribe Audio and Video

Word for the web offers a useful feature not found in the desktop version: it can transcribe the speech in audio and video files. This is a Microsoft 365 feature that my wife uses regularly for the Zoom-based interviews she conducts each week, and I’ve used it recently to transcribe Microsoft event videos for which YouTube doesn’t offer a transcription.

Side-tip: to see if a YouTube video offers a transcription, click the “…” link at the end of the line of icons below the video’s title and description. If there’s a “Show transcript” option, you’re good to go.

To transcribe an audio or video file, open Word on the web (office.com > Word in a web browser) and create a new, blank document. Then, in the simplified ribbon, find the drop-down control next to the Dictate icon (which looks like a microphone). There, you will find a Transcribe item.

When you click that, the Transcribe pane opens. There are two basic choices here: Upload audio and Start recording. I use Upload audio to transcribe audio and (despite the name) video files. Word for the web needs to upload the file first; uploaded files are stored in a Transcribed Files folder that appears in the root of OneDrive, and you can delete them later.

Then the file is transcribed. How long the upload and the transcription each take depends on the size and length of the recording.

When it’s done, the transcription will appear in the Transcribe pane.

You can edit the names of the speakers if you’d like. You can also replay the audio in the Transcribe pane, which indicates the current location in the transcription as you skip around.

From here, you can add the transcription to the new document. This can be done with just the text, with the speakers labeled, with timestamps, or with speakers and timestamps.

There are a few limitations with this feature: it requires a Microsoft 365 subscription, as noted. It only works with US English. And there is a monthly limit of 300 minutes that I don’t see documented anywhere; my wife hits that almost every month and so she’ll send me the recorded Zoom files so I can transcribe them for her. But it’s an incredibly useful feature.
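
If you share that monthly allowance the way we do, it can help to work out ahead of time which recordings will fit. What follows is a minimal Python sketch of that arithmetic, not anything Word for the web provides: the function name `plan_uploads` and its parameters are hypothetical, and the 300-minute figure is simply the limit mentioned above, so adjust it if your account shows something different.

```python
# Limit cited in the article; treat it as an assumption, not an official constant.
MONTHLY_LIMIT_MINUTES = 300


def plan_uploads(durations_minutes, used_minutes=0, limit=MONTHLY_LIMIT_MINUTES):
    """Sort recordings into those that fit this month's allowance and those that don't.

    Recordings are considered in the order given. A recording that does not
    fit is deferred, and later, shorter recordings may still fit.
    Returns (fits, deferred, remaining_minutes).
    """
    remaining = max(0, limit - used_minutes)
    fits, deferred = [], []
    for minutes in durations_minutes:
        if minutes <= remaining:
            fits.append(minutes)
            remaining -= minutes
        else:
            deferred.append(minutes)
    return fits, deferred, remaining


if __name__ == "__main__":
    # Four hypothetical interview recordings, with 200 minutes already used.
    fits, deferred, remaining = plan_uploads([60, 90, 120, 45], used_minutes=200)
    print(fits, deferred, remaining)  # [60] [90, 120, 45] 40
```

Anything in the deferred list is a candidate for handing off to a second Microsoft 365 account, which is effectively what my wife and I do by hand.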

Conversation 5 comments

  • ggolcher

    Premium Member
    02 June, 2022 - 9:20 am

    This is an amazing tip! We’ll use it immediately to transcribe user research calls.

    Thank you!

  • Daekar

    02 June, 2022 - 9:31 am

    I know exactly how I am going to use this, can’t wait to see how well it works. Thanks for the tip!

  • andreluis77x

    02 June, 2022 - 1:48 pm

    It worked with Brazilian Portuguese too.

  • Jim Lewis

    02 June, 2022 - 6:27 pm

    It works great. I’ve been using it since January this year to transcribe the recorded audio from neighborhood Board meetings as Board Secretary. How well it works really depends on the clarity of the audio and any speaker’s style, as is true for voice recognition in general. It does a pretty good job of identifying different speakers. But it sometimes screws up in identification and may include the beginning or end of one person’s utterance as belonging to the previous speaker’s or next speaker’s transcription snippet. So, careful checking is essential.

    The interface, though, is a bit wonky. The transcription also breaks any speaker’s continuous utterances down into shorter phrases and puts a period at the end of each phrase, capitalizing the beginning of the next phrase, even though the whole set of utterances might be part of one sentence. So, to make a readable text, one has to go through and decide whether one needs to remove those periods and capitals to make one smooth, logical bit of speech. Basically, what you have to deal with was summed up years ago by a famous little book on punctuation and grammar, Eats, Shoots & Leaves, by Lynne Truss. Just like a comma in the wrong place can be disastrous, a single bad word choice can make for an embarrassing transcription, too, so careful checking is essential!!!

    The transcription service also shows no signs of learning from one’s corrections. To really make it a must-have service, Microsoft ought to make the AI backend learn from one’s edits over time. The same speakers attend all our Zoom meetings, and the degree of voice recognition hasn’t improved in roughly the past half year. Someone uses a laptop microphone and slurs his very casual speech: not so great a result. I use a Blue Yeti X microphone, was "coerced" into clearly enunciating the King’s English growing up, and my voice is pretty accurately transcribed. So, what goes in comes out in the transcription.

    The amount of time left in the 300-minute allowance for a month usually shows up when you first fire up the voice transcription feature to upload an .MP3 audio file. I’ve uploaded up to 75 minutes of audio at one time (~90 MB) but the transcription service can be painfully slow at times, so I usually split the audio into 30-minute segments to do one-by-one. I use Magix’s Sound Forge Audio Studio 15 (a castoff of Sony’s) to process the recording first to increase audio peak heights, as my Sony ICD-PX470 recorder hooked up to a Sony ECM-R300 table mic with noise cancellation turned on doesn’t produce a very strong signal. I also convert the would-be stereo recording to monaural before transcribing. If you have any extended periods of silence in the recording, you want to edit them out before sending the .MP3 to the Word for Web service, as the timestamps will reflect the inclusion of that silence and you can’t edit the silence out of the recording after transcription without screwing up the correspondence of the audio time to the transcript timestamps (don’t ask me how I know this!).

  • bluvg

    02 June, 2022 - 10:05 pm

    Agreed, this is a great feature, but wish it didn’t have the 300 min/mth limit. To be fair, though, it is documented here: https://support.microsoft.com/en-us/office/transcribe-your-recordings-7fc2efec-245e-45f0-b053-2a97531ecf57

Thurrott © 2024 Thurrott LLC