Psychoacoustics: How We Hear Sound
A 440 Hz vibration travels through the air, enters your ear canal, vibrates the eardrum, moves three tiny bones, shifts fluid in a coiled tube, deflects hair cells (the cochlea holds roughly 15,000 of them), and triggers a neural code that your brain interprets as "the note A". Psychoacoustics is the science connecting this physical process to subjective perception, and its findings power everything from MP3 compression to concert hall design.
1. The Ear as Spectrum Analyser
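The cochlea performs a biological, Fourier-like frequency analysis. Its basilar membrane is tapered: narrow and stiff at the base, where it responds best to high frequencies, and wide and flexible at the apex, where it responds best to low frequencies. Each frequency therefore produces its peak vibration at its own place along the membrane (tonotopy).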
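The place-to-frequency layout described above can be approximated with the Greenwood function, a commonly used empirical fit. The sketch below is illustrative only: the constants are the values usually quoted for the human cochlea, and the function names are hypothetical, not part of the original text.

```python
import math

# Greenwood-function constants commonly quoted for the human cochlea
# (assumed here for illustration).
A_HZ = 165.4   # scaling constant, in Hz
ALPHA = 2.1    # slope, with position given as a fraction of membrane length
K = 0.88       # low-frequency correction term


def place_to_frequency(x):
    """Characteristic frequency in Hz at relative position x (0 = apex, 1 = base)."""
    return A_HZ * (10 ** (ALPHA * x) - K)


def frequency_to_place(freq_hz):
    """Inverse map: relative position (0 = apex, 1 = base) for a frequency in Hz."""
    return math.log10(freq_hz / A_HZ + K) / ALPHA


if __name__ == "__main__":
    for f in (100, 440, 1000, 4000, 10000):
        x = frequency_to_place(f)
        print(f"{f:>6} Hz peaks about {x:.2f} of the way from apex to base")
```

Under these constants the map runs from roughly 20 Hz at the apex to roughly 20 kHz at the base, matching the usual range of human hearing, and equal frequency ratios take up roughly equal lengths of membrane over most of that range.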
2. Loudness Perception
3. Pitch Perception
Pitch is the perceptual correlate of fundamental frequency — yet the relationship is not simple:
- Place theory (von Helmholtz, 1863): Pitch is determined by where on the basilar membrane the maximum vibration occurs. Explains frequency discrimination well at high frequencies (>5 kHz), but place alone predicts far worse discrimination at low frequencies than listeners actually achieve.
- Temporal (timing) theory: For low frequencies (below about 4–5 kHz), auditory nerve firing is phase-locked to the stimulus, with spikes occurring preferentially at certain phases of the waveform. The brain reads the inter-spike interval pattern → extracts the period → determines pitch. Explains the "missing fundamental" illusion (a complex tone with f₀ removed but its harmonics present is still heard at the pitch of f₀).
- Modern synthesis: Duplex theory — both place and timing contribute. Low frequencies: timing dominant. High frequencies: place dominant. Middle frequencies: both contribute.
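The timing account above can be caricatured with an autocorrelation: a periodic signal correlates most strongly with itself when shifted by one full period, which is loosely analogous to reading the period off inter-spike intervals. The sketch below is a signal-processing illustration under that assumption, not a model of actual neural processing; the function names, sample rate, and search range are hypothetical choices.

```python
import math


def harmonic_complex(freqs_hz, duration_s, sample_rate):
    """Sum of equal-amplitude sine waves at the given frequencies."""
    n = round(duration_s * sample_rate)
    return [
        sum(math.sin(2 * math.pi * f * i / sample_rate) for f in freqs_hz)
        for i in range(n)
    ]


def estimate_pitch(signal, sample_rate, f_min=60.0, f_max=1000.0):
    """Estimate pitch in Hz from the lag with the largest autocorrelation."""
    min_lag = int(sample_rate / f_max)   # shortest period considered
    max_lag = int(sample_rate / f_min)   # longest period considered
    window = len(signal) - max_lag       # samples available at every lag
    if window <= 0:
        raise ValueError("signal too short for the requested f_min")
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(signal[i] * signal[i + lag] for i in range(window))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 8000
    # Harmonics of 100 Hz with the 100 Hz component itself left out.
    tone = harmonic_complex([200, 300, 400, 500], duration_s=0.1, sample_rate=sr)
    print(f"estimated pitch: {estimate_pitch(tone, sr):.1f} Hz")
```

Although the signal contains no energy at 100 Hz, its waveform still repeats every 10 ms, so the estimate lands at about 100 Hz. A place-only analysis that looked for the lowest spectral peak would report 200 Hz instead.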
4. Critical Bands and Masking
5. Binaural Hearing and Localisation
Two ears provide multiple acoustic cues for sound localisation in three dimensions:
- Interaural Time Difference (ITD): A sound from the right arrives ~700 μs earlier at the right ear than the left (for 90° azimuth). The auditory brainstem (superior olivary complex, Jeffress delay-line model) detects ITDs as small as 10–20 μs. Dominant cue for azimuth at low frequencies (<1500 Hz).
- Interaural Level Difference (ILD): The head shadows higher frequencies → amplitude difference between ears. Dominant cue for azimuth at high frequencies (>2000 Hz).
- Head-Related Transfer Function (HRTF): The pinna (outer ear) acts as a direction-dependent filter. Spectral coloration from the pinna provides elevation cues and front/back disambiguation. Personalised HRTFs enable convincing 3D audio (spatial audio in headphones, Apple AirPods Spatial Audio).
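The ITD figure quoted above can be checked with Woodworth's spherical-head approximation, ITD = (a / c)(θ + sin θ), where a is the head radius, c the speed of sound, and θ the azimuth in radians. This is a rough sketch: the head radius and speed of sound below are assumed typical values, the formula treats the head as a rigid sphere, and it is intended for azimuths between 0° and 90°.

```python
import math

SPEED_OF_SOUND_M_S = 343.0   # assumed: air at about 20 °C
HEAD_RADIUS_M = 0.0875       # assumed: typical adult head radius


def woodworth_itd(azimuth_deg, head_radius_m=HEAD_RADIUS_M,
                  speed_m_s=SPEED_OF_SOUND_M_S):
    """Interaural time difference in seconds for a distant source.

    azimuth_deg is measured from straight ahead (0) towards one ear (90).
    """
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_m_s) * (theta + math.sin(theta))


if __name__ == "__main__":
    for az in (0, 15, 30, 60, 90):
        print(f"azimuth {az:>2}°: ITD ≈ {woodworth_itd(az) * 1e6:.0f} μs")
```

With these assumed values the model gives roughly 650 μs at 90°, in line with the ~700 μs quoted above; a larger head radius pushes the maximum higher.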
6. The Cocktail Party Effect
At a noisy party with many conversations, you can focus on and follow one speaker while filtering out the others, even when no single voice stands out acoustically. This remarkable ability involves multiple perceptual mechanisms:
- Spatial attention: Binaural cues (ITD/ILD) segregate sources by location. Sounds from different directions activate different neural populations → attention can select by spatial stream.
- Auditory scene analysis (ASA): Bregman (1990) described how sounds are grouped into perceptual "streams" (auditory stream segregation) based on frequency proximity, temporal coherence, timbre similarity, and spatial origin. Once a target stream is formed, top-down attention attenuates competing streams.
- Top-down prediction: Linguistic knowledge, expected prosody, and semantic context provide strong predictions that enhance target detection in noise (noise-filled gaps are perceptually completed using context — "phonemic restoration").
- Neural mechanisms: Attention selectively enhances auditory cortex responses to the attended sound (reported as roughly a 10 dB equivalent SNR improvement). Top-down control is attributed to a frontoparietal network that includes the frontal eye fields and parietal cortex; corticofugal projections from auditory cortex descend to subcortical stages such as the medial geniculate body.
7. Auditory Illusions
- Shepard tone: A superposition of tones spaced an octave apart, each weighted by a fixed bell-shaped amplitude envelope over frequency. As all the tones slowly rise, those nearing the top of the range fade to inaudibility while new tones fade in at the bottom. Perceptual result: the pitch appears to rise endlessly. The score of Christopher Nolan's "Dunkirk" uses it to create unending tension.
- Missing fundamental: A complex tone with components at 200, 300, 400 and 500 Hz (harmonics of 100 Hz, with 100 Hz itself absent). Perceived pitch: 100 Hz. The auditory system infers the fundamental from the harmonic pattern, not from direct stimulation at 100 Hz. This is why telephone speech band-limited to 300–3400 Hz still conveys voice pitch: the harmonics carry it.
- Tritone paradox (Deutsch): Two Shepard tones played in succession, a tritone (½ octave) apart. Some listeners hear the pair as ascending, others as descending. The percept depends on the pitch classes involved and correlates with the language or dialect the listener grew up with, suggesting that speech experience shapes pitch perception.
- Haas effect / Precedence effect: When the same sound arrives from two directions and one copy is delayed by about 1–40 ms, listeners hear a single sound located at the first arrival, even if the delayed copy is up to ~10 dB louder. Exploited in PA systems: fill speakers are delayed so that the sound still appears to come from the stage while they reinforce volume.
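The Shepard tone in the list above can be sketched in a few lines: octave-spaced sine components under a fixed bell-shaped envelope over log-frequency. This is a minimal illustration, not a production synthesiser; the lowest frequency, number of octaves, envelope width, and function names are all assumptions chosen for the example.

```python
import math


def shepard_components(base_hz, n_octaves=8, f_min=20.0, sigma_octaves=1.5):
    """Return (frequency_hz, amplitude) pairs for one Shepard tone.

    Only the position of base_hz within its octave matters, so 440 Hz and
    880 Hz produce the same set of components.
    """
    centre = n_octaves / 2.0                       # envelope peak, in octaves above f_min
    position = math.log2(base_hz / f_min) % 1.0    # fold into a single octave
    components = []
    for k in range(n_octaves):
        octaves_up = position + k
        freq = f_min * 2.0 ** octaves_up
        # Fixed Gaussian envelope: components fade out towards both extremes.
        amp = math.exp(-0.5 * ((octaves_up - centre) / sigma_octaves) ** 2)
        components.append((freq, amp))
    return components


def render(components, duration_s=0.5, sample_rate=44100):
    """Render the components to samples normalised to stay within [-1, 1]."""
    n = round(duration_s * sample_rate)
    norm = sum(amp for _, amp in components)
    return [
        sum(amp * math.sin(2 * math.pi * freq * i / sample_rate)
            for freq, amp in components) / norm
        for i in range(n)
    ]


if __name__ == "__main__":
    for freq, amp in shepard_components(440.0):
        print(f"{freq:8.1f} Hz  amplitude {amp:.3f}")
```

Stepping base_hz upward through one octave and rendering each step returns the sound to its starting point, because the component set after a full octave is identical to the initial one. That circularity is what produces the impression of an endless rise.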