
Removing Ums and Ahs from Podcasts: When, How, and When to Leave Them

PodRewind Team

TL;DR: Don't remove every um and ah. Speech stripped of every filler sounds robotic and is exhausting to hear. Remove filler words that cluster together, disrupt flow, or become distracting patterns. Leave isolated fillers that give listeners processing time. Automated tools work for heavy cleanup; manual editing gives the most natural results.




The Filler Word Dilemma

Filler words feel like the enemy of polished podcasts. The temptation is to eliminate them all.

Here's the thing: completely filler-free speech sounds unnatural, even unsettling. Listeners subconsciously expect occasional verbal pauses. When they're all gone, something feels wrong—even if listeners can't identify what.

Why We Use Filler Words

Filler words serve communication purposes:

Cognitive processing: "Um" signals the speaker is thinking. Listeners' brains use this pause to process what was just said.

Turn-holding: "Uh" indicates the speaker isn't finished. Without it, listeners might interrupt.

Hedging: Fillers soften statements, making speakers sound less aggressive or less absolute.

Natural rhythm: Speech has a cadence. Removing all pauses flattens that rhythm.

The Over-Edited Problem

Aggressive filler removal creates new problems:

Machine-gun speech: Ideas hit listeners without breaks, overwhelming comprehension.

Lost context: Fillers often signal topic transitions or emphasis shifts.

Uncanny valley: The speech pattern feels "off" for no identifiable reason.

Listener fatigue: No breathing room makes extended listening exhausting.

Finding the Balance

The goal: remove distracting fillers while maintaining natural speech patterns.

Remove: Clusters, patterns, extended ums that disrupt flow

Keep: Isolated fillers, transition pauses, thinking moments that aid comprehension


When to Remove Filler Words

Target filler words that actively harm the listening experience.

Clusters of Fillers

Multiple filler words in sequence disrupt comprehension.

Problem example: "So, um, you know, like, basically, um..."

Why it's distracting: The listener is waiting for content, getting only verbal noise.

Action: Remove the cluster, leaving one natural pause or a single filler if it serves as transition.
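
If you work from a word-level transcript, spotting clusters can be roughed out in a few lines. The sketch below is only an illustration: the FILLERS set and the find_filler_clusters helper are hypothetical names, and it assumes the transcript is already split into tokens with "you know" kept as a single token.

```python
# Hypothetical filler list; adjust per speaker. Words such as "like" and "so"
# are often real content, so treat every hit as a candidate to audition.
FILLERS = {"um", "uh", "like", "so", "basically", "you know"}


def find_filler_clusters(tokens, fillers=FILLERS, min_run=2):
    """Return (start, end) token index pairs (end exclusive) for every run
    of at least `min_run` consecutive filler tokens."""
    clusters = []
    run_start = None
    for i, tok in enumerate(tokens):
        is_filler = tok.lower().strip(" ,.!?") in fillers
        if is_filler:
            if run_start is None:
                run_start = i  # a new run begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                clusters.append((run_start, i))
            run_start = None
    # Close a run that reaches the end of the transcript.
    if run_start is not None and len(tokens) - run_start >= min_run:
        clusters.append((run_start, len(tokens)))
    return clusters


tokens = ["So,", "um,", "you know,", "like,", "basically,", "um",
          "we", "launched", "um", "today"]
print(find_filler_clusters(tokens))
```

With these tokens, the opening run of six fillers is reported as one cluster, while the isolated "um" later in the sentence is left alone.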

Recurring Patterns

Speakers often have signature filler patterns that become noticeable over time.

Common patterns:

  • "You know" every third sentence
  • "Um" at the start of every response
  • "Like" before every adjective
  • "So" beginning every new thought

Why patterns are distracting: Once listeners notice the pattern, they can't unhear it. It becomes a distracting verbal tic.

Action: Remove enough instances that the pattern breaks. You don't need to eliminate every occurrence—just reduce frequency until it's no longer noticeable.
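
One way to think about "breaking the pattern" is as a spacing problem: keep an occurrence only if enough time has passed since the last one you kept. The sketch below assumes you have timestamps (in seconds) for one recurring filler. The thin_pattern name and the 30-second default are illustrative assumptions, not a recommended standard.

```python
def thin_pattern(times, min_spacing=30.0):
    """Split filler timestamps into (keep, remove) so that kept
    occurrences are at least `min_spacing` seconds apart."""
    keep, remove = [], []
    last_kept = None
    for t in sorted(times):
        if last_kept is None or t - last_kept >= min_spacing:
            keep.append(t)
            last_kept = t
        else:
            remove.append(t)  # too close to the previous kept instance
    return keep, remove


keep, remove = thin_pattern([5, 12, 20, 40, 45, 90])
print(keep)    # occurrences left in the edit
print(remove)  # candidates to cut
```

The output is a list of candidates, not a verdict. Listen to each one before cutting, since some of the flagged instances may still be worth keeping.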

Extended Filler Sounds

A brief "um" feels natural. A three-second "uuuuuum" draws attention.

Why long fillers are distracting: They break the rhythm and suggest the speaker is lost or unprepared.

Action: Either remove entirely or shorten to typical filler length (under one second).

Fillers Before Important Points

Filler words just before key information weaken impact.

Problem example: "The most important thing is, um, uh, you know, commitment."

Why it's problematic: The sentence builds anticipation, then deflates it with filler before the payoff.

Action: Tighten so the key point lands with impact.

Failed Sentence Starts

Speakers sometimes start sentences, realize they're going wrong, and restart.

Problem example: "I think the, um, the thing is that, well actually, what I mean is..."

Why it's distracting: Listeners have to untangle what the speaker is actually saying.

Action: Remove false starts, keep the successful version.


When to Keep Filler Words

Some fillers serve the communication and should stay.

Isolated Fillers Between Ideas

A single "um" between thoughts gives listeners processing time.

Example: "We launched the feature in March. Um, the response was immediate."

Why to keep it: The pause separates two distinct thoughts. Removing it would create jarring immediacy.

Fillers Indicating Thought

When speakers pause to think, the filler signals that process.

Example: "What's the most important lesson? Um... probably persistence."

Why to keep it: The "um" and pause show genuine reflection, making the answer feel more authentic than an instant response.

Fillers in Emotional Moments

Hesitation during emotional content signals authenticity.

Example: "When she told me she was sick, I, um, I didn't know what to say."

Why to keep it: The filler conveys emotional weight. Removing it makes the delivery feel rehearsed.

Conversational Turn Signals

Fillers that hold the floor during conversation belong to natural dialogue.

Example during crosstalk: "Well, uh, that's not exactly—" [other speaker interjects]

Why to keep it: The filler is part of the conversational dance. Removing it creates unnatural precision.

Guest Comfort Indicators

Especially with nervous guests, some fillers signal relaxation progression.

Pattern: More fillers early in the interview, fewer as the guest warms up.

Why to keep some: A completely filler-free guest from the start sounds like they're reading a script. Natural improvement through the conversation is authentic.


Manual Filler Word Removal

Manual editing gives the most control and natural results.

Basic Technique

  1. Identify the filler in the waveform (short pause, low-energy sound)
  2. Select the filler and part of the surrounding silence
  3. Delete and close the gap (ripple delete)
  4. Listen to the result in context
  5. Undo if it sounds unnatural

Selecting Cleanly

Poor selections create audible problems.

Include in your selection:

  • The filler sound itself
  • Breath before the filler (if present)
  • Some of the silence after the filler

Keep the selection clear of adjacent words; cutting into them is audible.

Finding selection points:

  • Look for zero crossings (where waveform crosses the center line)
  • Find natural silence gaps
  • Zoom in far enough to see individual waveform cycles
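
Most editors can snap a selection to zero crossings for you. For readers curious about what that snap does, here is a minimal sketch. It assumes the audio is available as a plain list of float samples, and nearest_zero_crossing is a hypothetical helper, not a function from any DAW or library.

```python
def nearest_zero_crossing(samples, index):
    """Return the sample index closest to `index` at which the waveform
    changes sign between samples[i - 1] and samples[i].
    Falls back to `index` if the signal never crosses zero."""

    def crosses(i):
        if not 0 < i < len(samples):
            return False
        prev, cur = samples[i - 1], samples[i]
        return (prev <= 0 < cur) or (prev >= 0 > cur)

    # Search outward from the requested position, one sample at a time.
    for offset in range(len(samples)):
        for i in (index - offset, index + offset):
            if crosses(i):
                return i
    return index


wave = [0.5, 0.3, 0.1, -0.2, -0.4, -0.1, 0.2, 0.6]
print(nearest_zero_crossing(wave, 4))
```

Moving a cut point to the returned index means the edit lands where the waveform is at or near the center line, which reduces the chance of a click.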

Preserving Natural Gaps

Don't eliminate all space where fillers were removed.

The mistake: Selecting filler plus all surrounding silence, leaving no gap.

The result: Words run together unnaturally.

Better approach: Leave 0.3-0.5 seconds of silence where the filler was. Speech needs breathing room.
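
The arithmetic behind that gap can be sketched as follows. This assumes you know four timestamps in seconds: where the previous word ends, where the filler starts and ends, and where the next word starts. The plan_cut helper and its 0.4-second default (the middle of the range above) are illustrative assumptions.

```python
def plan_cut(prev_word_end, filler_start, filler_end, next_word_start,
             target_gap=0.4):
    """Return (cut_start, cut_end) that removes the filler while leaving
    at most `target_gap` seconds of existing silence between the words."""
    silence_before = filler_start - prev_word_end
    silence_after = next_word_start - filler_end

    # Keep silence from before the filler first, then top up from after it.
    keep_before = min(silence_before, target_gap)
    keep_after = min(silence_after, target_gap - keep_before)

    cut_start = prev_word_end + keep_before
    cut_end = next_word_start - keep_after
    return cut_start, cut_end


start, end = plan_cut(10.0, 10.2, 10.6, 11.0)
print(round(start, 3), round(end, 3))
```

In this example the cut runs from roughly 10.2 to 10.8 seconds, so about 0.4 seconds of the original room tone remains between the two words. If there is less silence available than the target, the sketch simply keeps all of it rather than inventing more.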

Workflow for Efficient Manual Editing

Don't hunt for every filler. Instead:

  1. Play through at normal speed
  2. Note (or mark) only the distracting fillers
  3. Return and edit only those
  4. Leave unobtrusive fillers alone

This is faster than stopping for every "um" and produces more natural results.

Using Markers

Most DAWs support markers—use them to tag fillers for batch editing.

Marker workflow:

  1. Play through episode once, adding markers at problem fillers
  2. Return to markers and edit each
  3. Clear markers as you complete edits

This separates identification (listening) from editing (technical work), which makes both faster. This systematic approach fits well within a broader editing workflow.


Automated Filler Removal Tools

Software can identify and remove filler words automatically.

How Automated Tools Work

Tools analyze audio for:

  • Specific phonetic patterns ("um," "uh," "like")
  • Low-energy sections matching filler characteristics
  • Silence patterns typical of filler word placement

They then remove or silence the identified sections.

Commonly used tools include:

Descript: Transcription-based editing with filler word identification and one-click removal.

Adobe Podcast: Browser-based tool with "Enhance Speech" including filler removal.

Auphonic: Automatic post-production including filler reduction.

CapCut: Video editor with audio filler word removal.

DAW plugins: Various plugins detect and remove or reduce fillers in traditional DAWs.

Setting Automated Tool Sensitivity

Most tools have sensitivity or aggressiveness settings.

High sensitivity: Catches more fillers but may remove non-filler content

Low sensitivity: Misses some fillers but fewer false positives

Recommendation: Start with medium-low sensitivity. You can always remove remaining fillers manually—it's harder to restore content that was incorrectly removed.
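
Conceptually, a sensitivity setting behaves like a confidence threshold. The sketch below is a simplified model, not how any specific product works: it assumes each detection carries a confidence score between 0 and 1 (many tools expose only a slider), and split_detections is a hypothetical helper. A lower sensitivity corresponds to a higher threshold here.

```python
def split_detections(detections, threshold=0.8):
    """Separate detections into those safe to remove automatically and
    those that should be reviewed by ear.

    Each detection is assumed to be a dict with a "confidence" key.
    """
    auto_remove = [d for d in detections if d["confidence"] >= threshold]
    needs_review = [d for d in detections if d["confidence"] < threshold]
    return auto_remove, needs_review


detections = [
    {"word": "um", "time": 12.4, "confidence": 0.95},
    {"word": "hmm", "time": 48.1, "confidence": 0.60},
]
auto_remove, needs_review = split_detections(detections)
print([d["word"] for d in auto_remove])
print([d["word"] for d in needs_review])
```

Raising the threshold moves more detections into the review pile, which matches the advice above: it is easier to cut a missed filler later than to restore a word removed by mistake.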

Pros and Cons of Automation

Advantages:

  • Much faster than manual editing
  • Consistent application across episodes
  • Good for heavy filler use
  • Catches fillers you might miss

Disadvantages:

  • May remove legitimate words
  • Can create unnatural gaps or transitions
  • Less control over which fillers stay
  • May process entire words that sound like fillers

Best Use of Automation

Use automated tools for initial cleanup, then listen through and:

  • Remove remaining obvious fillers manually
  • Restore any incorrectly removed content
  • Adjust gaps for natural flow

Automation as a first pass plus manual refinement gives efficient, natural results.


Creating Natural-Sounding Results

Technical removal is only half the challenge. Results need to sound human.

Listening in Context

Always evaluate edits in context, not isolation.

Test method:

  • Start playback 5-10 seconds before the edit
  • Play through the edited section
  • Continue 5-10 seconds after
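
If you script your review pass, the audition window is simple to compute. This is a small sketch under stated assumptions: times are in seconds, preview_window is a hypothetical helper, and the 7.5-second default padding is just the midpoint of the 5-10 second range above.

```python
def preview_window(edit_start, edit_end, duration, pad=7.5):
    """Return (play_from, play_to) covering the edit plus `pad` seconds
    on each side, clamped to the bounds of the recording."""
    play_from = max(0.0, edit_start - pad)
    play_to = min(duration, edit_end + pad)
    return play_from, play_to


print(preview_window(3.0, 3.5, 60.0, pad=5.0))
```

The clamping matters for edits near the start or end of an episode, where a fixed offset would otherwise point outside the file.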

Edits that sound fine in isolation sometimes create flow problems in context.

Maintaining Rhythm

Speech has natural cadence. Edits should preserve it.

Signs rhythm is broken:

  • Words feel rushed together
  • Pauses feel artificially uniform
  • Energy doesn't match surrounding speech

Fix: Adjust gap length after edits. Different thought transitions need different pause lengths.

Matching Energy

Filler words often correlate with energy changes.

Problem: Removing filler that bridged energy levels creates an abrupt shift.

Solution: If removing a filler creates a jarring energy jump, either:

  • Keep the filler
  • Add a short crossfade or silence to smooth the transition
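
Your editor applies crossfades for you, but the idea is easy to show. The sketch below performs a simple linear crossfade between two clips held as lists of float samples. The crossfade function is a hypothetical illustration, and real editors often use other fade curves, such as equal-power.

```python
def crossfade(a, b, overlap):
    """Join clips `a` and `b`, blending the last `overlap` samples of `a`
    with the first `overlap` samples of `b` using linear weights."""
    overlap = max(0, min(overlap, len(a), len(b)))
    head = a[:len(a) - overlap]
    tail_of_a = a[len(a) - overlap:]
    mixed = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # weight of clip b rises toward 1
        mixed.append(tail_of_a[i] * (1 - w) + b[i] * w)
    return head + mixed + b[overlap:]


joined = crossfade([1.0] * 4, [0.0] * 4, overlap=2)
print(len(joined))
```

The result is shorter than the two clips combined by the overlap length. At a 48 kHz sample rate, a 10 ms crossfade corresponds to 480 samples.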

The Less-Is-More Principle

When in doubt, remove fewer fillers.

Conservative approach:

  • Remove clusters and obvious problems
  • Leave isolated fillers
  • Keep anything that sounds worse when removed

Listeners accept occasional fillers much more readily than they accept obviously over-edited audio.


FAQ

How many fillers should I remove per minute of audio?

There's no specific number—judge by how distracting they are, not quantity. A speaker using two "ums" that form a pattern might need both reduced to break the pattern. Another speaker using ten isolated fillers might not need any removed. Focus on removing distractions, not hitting a removal quota.

Should I remove "you know" and "like" the same way as "um"?

Mostly, yes. Remove clusters and patterns; keep isolated instances that aid natural flow. But "you know" and "like" often serve conversational purposes (requesting agreement, softening statements) that pure filler sounds like "um" don't, so be even more conservative with phrases that carry some meaning.

Will listeners notice if I don't remove filler words?

Most listeners don't consciously notice moderate filler use unless you've trained them to expect polished speech. Podcast audiences are generally more tolerant of natural speech patterns than, say, audiobook listeners. Only chronic heavy filler use becomes noticeably distracting to typical listeners.

How do I remove fillers without making edits audible?

Cut at zero crossings, maintain appropriate gap lengths, use short crossfades when needed, and always check edits in context. Most audible edits come from cutting into words, removing too much silence, or creating jarring energy transitions. If an edit sounds obvious, undo and try a different approach.

Should I mention filler words to guests before recording?

Generally, no. Making guests self-conscious about fillers usually increases filler use as they monitor their own speech. If a guest uses fillers very heavily, you might gently mention that you'll clean up the audio, which can help them relax knowing the recording will be edited.



Ready to Polish Your Podcast Speech?

Filler word editing walks a line between polished and natural. Remove what distracts; keep what serves comprehension. The goal is speech that sounds human and unhurried, not robotic perfection.

Your thoughtfully edited episodes deserve permanent preservation. Transcription transforms natural-sounding audio into searchable archives where every carefully preserved word becomes findable.

Try PodRewind free and make every well-edited episode part of a searchable archive.

