🎵 Acoustics · Cognitive Science
📅 March 2026 · ⏱ 11 min · 🟡 Intermediate

Psychoacoustics: How We Hear Sound

A 440 Hz vibration moves through the air, enters your ear canal, vibrates the eardrum, moves three tiny bones, shifts fluid in a coiled tube, bends 15,000 hair cells, and triggers a neural code that your brain interprets as "the note A". Psychoacoustics is the science connecting this physical process to subjective perception — and its findings power everything from MP3 compression to concert hall design.

1. The Ear as Spectrum Analyser

The cochlea performs a biological Fourier-like frequency analysis. Its basilar membrane is a tapered structure: wide and flexible at the apex (responds to low frequencies), narrow and stiff at the base (responds to high frequencies):

Tonotopic map (characteristic frequencies along the basilar membrane):
  Base:   ~20,000 Hz
  Middle: ~1,000 Hz
  Apex:   ~20 Hz
  Approximately log-spaced: each octave takes equal membrane length.
  Total membrane: ~35 mm → ~10 octaves → ~3.5 mm per octave

Inner hair cells (IHC): primary auditory receptors (~3,500 per cochlea)
  Deflection of stereocilia → tip links pull open transduction channels (K⁺ and Ca²⁺ enter)
  → receptor potential → glutamate release → spiral ganglion neuron firing
  Each IHC contacts ~10-15 afferent fibres → high-fidelity encoding

Outer hair cells (OHC): amplifier cells (~12,000 per cochlea)
  Prestin protein in lateral wall → electromotility (at rates up to ~70 kHz)
  Active process → amplifies basilar membrane motion by ~40 dB (100× in amplitude)
  Usually damaged first in noise-induced hearing loss → sensitivity and frequency resolution decline together
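The log-spaced map and the 40 dB figure above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the idealised model from the text (exactly 3.5 mm per octave, 20 Hz at the apex); the function names are illustrative, and a real cochlea deviates from a pure logarithmic layout, especially at low frequencies.

```python
import math

MM_PER_OCTAVE = 3.5   # from the text: ~35 mm spread over ~10 octaves
APEX_FREQ_HZ = 20.0   # characteristic frequency assumed at the apex


def distance_from_apex_mm(freq_hz: float) -> float:
    """Approximate place of a frequency on the membrane (idealised log model)."""
    return MM_PER_OCTAVE * math.log2(freq_hz / APEX_FREQ_HZ)


def db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude ratio."""
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    for f in (20, 1_000, 20_000):
        print(f"{f:>6} Hz -> {distance_from_apex_mm(f):5.1f} mm from apex")
    print(f"40 dB gain = {db_to_amplitude_ratio(40):.0f}x amplitude")
```

Under these assumptions 1 kHz lands a little past the midpoint of the membrane and 20 kHz lands close to the 35 mm base, consistent with the map above.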

2. Loudness Perception

Sound pressure level (dB SPL):
  L_p = 20 · log₁₀(p / p_ref)
  where p_ref = 20 μPa (approximate threshold of hearing at 1 kHz)

Key levels:
  0 dB SPL:   threshold of hearing at 1 kHz
  20 dB SPL:  whisper
  60 dB SPL:  normal conversation
  85 dB SPL:  hearing damage with prolonged exposure (NIHL)
  120 dB SPL: threshold of pain
  194 dB SPL: theoretical maximum for an undistorted sound wave in air (pressure amplitude = atmospheric pressure)

Equal-loudness contours (Fletcher-Munson, 1933; revised contours standardised as ISO 226):
  At 1 kHz: loudness level (phon) = dB SPL by definition.
  At other frequencies: a different SPL is needed to achieve the same perceived loudness.
  At 1 kHz (threshold): 0 dB SPL = 0 phon
  At 100 Hz (threshold): roughly 25 dB SPL required (ISO 226:2003)
  → We're much less sensitive to low frequencies at low volumes.
  Practical consequence: bass boosting at low listening levels ("loudness" button on amplifiers) compensates for reduced sensitivity at low frequencies.

Sone scale (perceived loudness magnitude):
  1 sone = loudness of a 1 kHz tone at 40 dB SPL (40 phon)
  Each +10 phon doubles perceived loudness (sones double)
  Approximate: S = 2^((P-40)/10) where P = loudness level in phons (reasonable above ~40 phon)
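The two formulas above translate directly into code. The sketch below is illustrative only: the function names are made up for this example, and the phon-to-sone rule is the approximation quoted in the text rather than a full loudness model.

```python
import math

P_REF_PA = 20e-6  # reference pressure from the text: 20 µPa


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB SPL for an RMS pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF_PA)


def phons_to_sones(phons: float) -> float:
    """Approximate loudness in sones; the rule is usually applied above ~40 phon."""
    return 2 ** ((phons - 40) / 10)


if __name__ == "__main__":
    # One standard atmosphere (101,325 Pa) as pressure amplitude gives the ~194 dB ceiling.
    print(f"1 atm -> {spl_db(101_325):.1f} dB SPL")
    for p in (40, 50, 60, 70):
        print(f"{p} phon -> {phons_to_sones(p):.0f} sone(s)")
```

Going from 40 to 70 phon is three doublings, so a 70 phon sound is judged about eight times as loud as a 40 phon sound even though its sound pressure is roughly 32 times higher.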

3. Pitch Perception

Pitch is the perceptual correlate of fundamental frequency — yet the relationship is not simple:

Mel scale (pitch perception is compressive and non-linear):
  Mels approximate equal perceived pitch intervals.
  m = 2595 · log₁₀(1 + f/700)    (f in Hz)
  Musical pitch is logarithmic in frequency (each piano octave = 2× frequency).

JND (just noticeable difference) in frequency:
  At 1 kHz: Δf ≈ 3 Hz (0.3%)
  At 8 kHz: Δf ≈ 40 Hz (0.5%)
  Trained musicians: ~1–2 cents (1 cent = 1/100th of a semitone ≈ 0.06% in frequency, about 0.6 Hz at 1 kHz)
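A short sketch of the mel formula and the cent unit may make the numbers above easier to compare. The inverse `mel_to_hz` is not given in the text; it is simply the algebraic inversion of the formula above, and all function names here are illustrative.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Mel value for a frequency in Hz, using the formula quoted above."""
    return 2595 * math.log10(1 + freq_hz / 700)


def mel_to_hz(mel: float) -> float:
    """Algebraic inverse of hz_to_mel (derived here, not from the text)."""
    return 700 * (10 ** (mel / 2595) - 1)


def cents_between(f1_hz: float, f2_hz: float) -> float:
    """Interval from f1 to f2 in cents (1200 cents per octave)."""
    return 1200 * math.log2(f2_hz / f1_hz)


if __name__ == "__main__":
    print(f"1000 Hz -> {hz_to_mel(1000):.0f} mel")
    print(f"440 -> 880 Hz = {cents_between(440, 880):.0f} cents")
    print(f"3 Hz JND at 1 kHz = {cents_between(1000, 1003):.1f} cents")
```

Expressed in cents, the 3 Hz difference limen at 1 kHz is about 5 cents, which puts the 1 to 2 cent figure for trained musicians in context.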

4. Critical Bands and Masking

Critical bandwidth (CBW):
  The frequency range over which masking and certain perceptual grouping effects operate.
  Related to the integrating bandwidth of the basilar membrane filter.

Bark scale (Zwicker 1961): 24 critical bands across the audible range.
  Each critical band spans:
    ~100 Hz at low frequencies (below 500 Hz)
    ~20% of centre frequency at higher frequencies
  Bark formula: z(f) = 13·arctan(0.76·f) + 3.5·arctan((f/7.5)²)    (f in kHz)

Simultaneous masking:
  A masker tone at frequency f_m masks (makes inaudible) nearby tones.
  Masking is most effective for tones WITHIN the same critical band.
  "Upward spread of masking": lower-frequency tones mask higher ones more easily than the reverse (asymmetric spreading).

Temporal masking:
  Pre-masking (backward): a masker that starts AFTER the target can still mask it.
    Duration: ~20 ms before masker onset (usually attributed to faster neural processing of the louder masker)
  Post-masking (forward): after the masker stops, residual masking persists for ~100-200 ms

Application: MP3 / AAC psychoacoustic compression
  The perceptual model identifies tones and noise that fall below the masking threshold.
  These components are inaudible → they can be encoded with fewer bits.
  A typical 128 kbps MP3 achieves a compression ratio of ~11:1 relative to CD audio with limited perceptible quality loss (the savings come from psychoacoustically inaudible content).
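The Bark formula and the compression figure can both be reproduced numerically. This is a rough sketch: the function names are illustrative, and the compression ratio assumes the uncompressed source is CD-quality PCM (44.1 kHz, 16 bit, stereo), which the text does not state explicitly.

```python
import math


def hz_to_bark(freq_hz: float) -> float:
    """Critical-band rate in Bark, using the formula above (converted from Hz to kHz)."""
    f_khz = freq_hz / 1000.0
    return 13 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)


def compression_ratio(encoded_kbps: float,
                      sample_rate_hz: int = 44_100,
                      bits_per_sample: int = 16,
                      channels: int = 2) -> float:
    """Ratio of uncompressed PCM bitrate to encoded bitrate (CD audio assumed by default)."""
    pcm_kbps = sample_rate_hz * bits_per_sample * channels / 1000.0
    return pcm_kbps / encoded_kbps


if __name__ == "__main__":
    for f in (100, 1_000, 4_000, 20_000):
        print(f"{f:>6} Hz -> {hz_to_bark(f):5.2f} Bark")
    print(f"128 kbps vs CD audio: {compression_ratio(128):.1f}:1")
```

Two tones whose Bark values differ by less than about 1 fall roughly within one critical band, which is where simultaneous masking is strongest.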

5. Binaural Hearing and Localisation

Two ears provide multiple acoustic cues for sound localisation in three dimensions:

"Cone of confusion": Points equidistant on an imaginary cone around the interaural axis all produce the same ITD and ILD — the "cone of confusion". The pinna's spectral cues resolve this ambiguity. Without functioning pinna (or with plugged ears), front-back confusion and elevation errors increase dramatically. This is why in-ear vs over-ear headphones differ in spatial audio quality.

6. The Cocktail Party Effect

At a noisy party with many conversations, you can focus on and follow one speaker while filtering out others — even when the acoustics favour no individual voice in isolation. This remarkable ability involves multiple perceptual mechanisms:

7. Auditory Illusions