How the Visual Cortex Works — Hubel-Wiesel, V1–V5 Hierarchy, and Neural Coding

About 30% of the human cerebral cortex is dedicated to vision. The visual system extracts edges, motion, color, depth, and object identity from the noisy, low-quality image that arrives through a 130-million-photoreceptor retina. This article traces the path from retinal ganglion cells to high-level visual areas, covering the Nobel Prize-winning work of Hubel and Wiesel, the functional organization of V1, the two processing streams, and what we know about how the brain codes visual information.

1. From Retina to Primary Visual Cortex

Visual signals leave the retina along the optic nerve. At the optic chiasm, fibers from the nasal half of each retina cross to the opposite side, so signals from the left visual field converge on the right hemisphere, and vice versa. Because of this partial crossing, each cerebral hemisphere processes the contralateral visual field.

The first central relay is the lateral geniculate nucleus (LGN) of the thalamus. In primates the LGN has six main layers:

Magnocellular (M) layers 1–2: Large cells, fast conduction, sensitive to luminance contrast and motion. Project mainly to the dorsal stream.
Parvocellular (P) layers 3–6: Smaller cells, slower, sensitive to color and fine spatial detail. Project mainly to the ventral stream.
Koniocellular (K) sublayers: Between M/P layers, involved in color (blue-yellow chromatic signals, S-cone input).

Retinotopic mapping: The spatial layout of the retina is preserved throughout the visual hierarchy. Adjacent retinal positions project to adjacent cortical positions, so each area contains a "map" of the visual field. Near the fovea (high acuity), the cortical magnification factor is ~6 mm of cortex per degree of visual angle; in the periphery it drops to ~0.5 mm/degree.
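The fall-off in cortical magnification can be made concrete with a short calculation. This sketch assumes the inverse-linear form M(E) = A / (E + E₂) with A = 17.3 mm and E₂ = 0.75°, a widely used fit for human V1 (Horton and Hoyt, 1991); the constants and helper names are assumptions for illustration, not values taken from this article. Integrating M(E) from the foveal center gives the distance along V1 to a given eccentricity.

```python
import numpy as np

# Assumed inverse-linear fit for human V1: M(E) = A / (E + E2)
A_MM = 17.3     # mm (assumed constant)
E2_DEG = 0.75   # deg (assumed constant)

def magnification(ecc_deg):
    """Linear cortical magnification, mm of V1 per degree, at eccentricity ecc_deg."""
    return A_MM / (ecc_deg + E2_DEG)

def cortical_distance(ecc_deg):
    """mm of V1 from the foveal center to ecc_deg: the integral of M(E) dE from 0."""
    return A_MM * np.log((ecc_deg + E2_DEG) / E2_DEG)

for ecc in (0.0, 2.0, 10.0, 40.0):
    print(f"E = {ecc:4.1f} deg   M = {magnification(ecc):5.2f} mm/deg   "
          f"distance from fovea = {cortical_distance(ecc):5.1f} mm")
```

Under this fit M is about 6.3 mm/degree at 2° eccentricity and about 0.42 mm/degree at 40°, in line with the figures quoted above; at the very center of the fovea it exceeds 20 mm/degree.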

2. V1 — Primary Visual Cortex

V1 (primary visual cortex, Brodmann area 17) in the occipital lobe is the main cortical target of LGN output. It contains the most precise retinotopic map and is the largest single visual area in humans (~25 cm² per hemisphere). V1 is organized in six layers:

Layer 4C: Main input layer from LGN
├── Layer 4Cα: Receives M-cell (magno) input
└── Layer 4Cβ: Receives P-cell (parvo) input
Layer 4B: Receives from 4Cα; projects to MT/V5 and the thick stripes of V2 (M-pathway output)
Layer 2/3: Horizontal intra-V1 connections; feedforward output → V2, V3, V4
Layer 5: Output to pulvinar and superior colliculus (SC) (subcortical / oculomotor)
Layer 6: Feedback output back to LGN (thalamic modulation)

Functional Columns

V1 neurons are organized into vertical columns (~50 µm diameter) whose cells share functional properties. Three types of module tile the cortical surface:

Orientation columns: Cells within a column respond preferentially to edges of the same orientation (e.g., 45°, 90°, or 135°). Orientation preference rotates smoothly across the cortex, completing a full 180° cycle over ~750 µm (one orientation "hypercolumn").
Ocular dominance columns: Alternating ~500 µm wide bands driven preferentially by the left or right eye. Input from the two eyes is segregated in layer 4C, but many neurons outside layer 4C are binocular.
Cytochrome oxidase blobs: Round patches ~200–300 µm across in layers 2/3, rich in mitochondria; they contain many color-opponent (including double-opponent) cells and receive direct K-cell input.

3. Hubel and Wiesel's Nobel Discoveries (1981)

From the late 1950s through the 1970s, David Hubel and Torsten Wiesel recorded single neurons in the primary visual cortex of cats and monkeys with microelectrodes. Their discoveries reshaped neuroscience:

Simple Cells

Simple cells have elongated receptive fields subdivided into distinct ON and OFF subregions. They respond maximally to an oriented edge or bar at a specific position and orientation in the visual field. To a first approximation, their response can be modeled as a linear spatiotemporal filter whose spatial profile is a Gabor function:

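Simple cell response model (Gabor filter):

$$f(x, y) = \exp\!\left(-\frac{x'^2}{2\sigma_x^2} - \frac{y'^2}{2\sigma_y^2}\right)\cos\left(2\pi f x' + \varphi\right)$$

where x' and y' are the image coordinates rotated by θ:

$$x' = x\cos\theta + y\sin\theta, \qquad y' = -x\sin\theta + y\cos\theta$$

θ: preferred orientation (the ON and OFF stripes run perpendicular to the x' axis)
f: preferred spatial frequency
φ: spatial phase (ON-center at φ = 0, OFF-center at φ = π)
σ_x, σ_y: widths of the Gaussian envelope

Responses are roughly linear with contrast, and many simple cells are approximately separable in space and time (spatiotemporal separability); direction-selective simple cells are the main exception.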
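The Gabor model can be probed numerically. This minimal NumPy sketch assumes pixel units instead of degrees, an isotropic envelope, and half-wave rectification as the output nonlinearity; the helper names (`gabor`, `grating`, `simple_cell`) and parameter values are hypothetical. It presents one model cell with full-field gratings of different orientation, phase, and contrast.

```python
import numpy as np

def gabor(size, sigma_x, sigma_y, theta, freq, phase):
    """Gabor receptive field f(x, y) sampled on a size x size pixel grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)       # x' (rotated coordinate)
    yr = -x * np.sin(theta) + y * np.cos(theta)      # y' (rotated coordinate)
    envelope = np.exp(-(xr**2 / (2 * sigma_x**2) + yr**2 / (2 * sigma_y**2)))
    return envelope * np.cos(2 * np.pi * freq * xr + phase)

def grating(size, theta, freq, phase=0.0, contrast=1.0):
    """Full-field sinusoidal grating whose luminance varies along direction theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    return contrast * np.cos(2 * np.pi * freq * (x * np.cos(theta) + y * np.sin(theta)) + phase)

def simple_cell(rf, image):
    """Linear stage (weighted sum over the receptive field), then half-wave rectification."""
    return max(0.0, float(np.sum(rf * image)))

SIZE, FREQ = 41, 0.1                                  # 41 x 41 pixels, 0.1 cycles/pixel
rf = gabor(SIZE, 6.0, 6.0, theta=0.0, freq=FREQ, phase=0.0)   # even-symmetric, ON-center

orientations = np.deg2rad(np.arange(0, 180, 15))
tuning = [simple_cell(rf, grating(SIZE, th, FREQ)) for th in orientations]
print("preferred orientation (deg):", np.rad2deg(orientations[int(np.argmax(tuning))]))
print("phase-reversed grating:", simple_cell(rf, grating(SIZE, 0.0, FREQ, phase=np.pi)))
print("half-contrast / full-contrast:",
      simple_cell(rf, grating(SIZE, 0.0, FREQ, contrast=0.5)) / tuning[0])
```

The strongest response occurs at the filter's own orientation, the orthogonal grating barely drives the cell, and reversing the grating's phase silences it entirely: the phase sensitivity that distinguishes simple from complex cells.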

Complex Cells

Complex cells respond to oriented edges anywhere within their (larger) receptive field: they are PHASE INVARIANT. Moving an oriented edge across the receptive field produces sustained firing; the cell doesn't distinguish whether the edge falls in an ON or an OFF region. They are commonly modeled as the sum of squared outputs of two simple-cell-like filters whose spatial phases differ by 90° (a quadrature pair):

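Complex cell energy model:

$$R = S_{\text{even}}^2 + S_{\text{odd}}^2$$

where S_even and S_odd are the linear responses of two simple-cell filters 90° out of phase (for example φ = 0 and φ = π/2 in the Gabor model). This gives orientation selectivity without spatial-phase sensitivity, like a local edge detector that is robust to the exact position of the edge.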
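The energy model can be checked with a toy simulation. This sketch assumes Gabor filters with an isotropic Gaussian envelope, pixel units, and hypothetical parameter values and helper names; it drifts a grating through one full cycle of spatial phase and compares a model complex cell with a rectified simple cell built from the even filter alone.

```python
import numpy as np

def coords(size):
    """(y, x) pixel coordinate grids centered on zero."""
    half = size // 2
    return np.mgrid[-half:half + 1, -half:half + 1].astype(float)

def gabor(size, sigma, theta, freq, phase):
    """Gabor filter with an isotropic Gaussian envelope."""
    y, x = coords(size)
    xr = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * freq * xr + phase)

def grating(size, theta, freq, phase):
    y, x = coords(size)
    return np.cos(2 * np.pi * freq * (x * np.cos(theta) + y * np.sin(theta)) + phase)

SIZE, SIGMA, FREQ = 41, 6.0, 0.1                             # hypothetical parameters
even_rf = gabor(SIZE, SIGMA, 0.0, FREQ, phase=0.0)           # S_even filter
odd_rf = gabor(SIZE, SIGMA, 0.0, FREQ, phase=np.pi / 2)      # S_odd filter, 90 deg out of phase

def complex_cell(image):
    """Energy model: R = S_even**2 + S_odd**2."""
    s_even = float(np.sum(even_rf * image))
    s_odd = float(np.sum(odd_rf * image))
    return s_even**2 + s_odd**2

def simple_cell(image):
    """Half-wave rectified output of the even filter alone, for comparison."""
    return max(0.0, float(np.sum(even_rf * image)))

phases = np.linspace(0.0, 2 * np.pi, 16, endpoint=False)
complex_resp = np.array([complex_cell(grating(SIZE, 0.0, FREQ, p)) for p in phases])
simple_resp = np.array([simple_cell(grating(SIZE, 0.0, FREQ, p)) for p in phases])
spread = (complex_resp.max() - complex_resp.min()) / complex_resp.mean()
print("complex cell, relative spread over phase:", spread)
print("simple cell, min and max over phase:", simple_resp.min(), simple_resp.max())
```

The complex-cell response stays nearly constant as the grating shifts, while the simple cell swings between zero and its maximum; an orthogonal grating drives neither filter, so orientation selectivity survives the squaring.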

Critical Period and Plasticity

Hubel and Wiesel discovered that depriving one eye of kittens (monocular deprivation) during a critical period (roughly postnatal weeks 3–8) causes a long-lasting, largely permanent loss of V1 responses to the closed eye, accompanied by structural shrinkage of that eye's ocular dominance columns. This revealed experience-dependent plasticity in early visual development and helps explain why amblyopia (lazy eye) responds best to treatment in early childhood.

4. V2, V3, V4, MT — The Hierarchy

Area	Location	Key Functions	Notable Properties
V1	Occipital pole (calcarine sulcus)	Edges, orientation, spatial frequency, binocularity	Finest retinotopy; main target of LGN input
V2	Adjacent to V1 (lunate sulcus)	Complex edges, color, disparity (depth), illusory contours	Stripe architecture (thick/thin/pale); feedback to V1
V3/V3A	Above/below V2	Complex shapes, global form	Large receptive fields; less well-studied
V4/hV4	Ventral occipital	Color, form, recognition	Color constancy; V4 lesions → achromatopsia
MT/V5	Posterior superior temporal sulcus	Motion direction, speed, optic flow	Directionally selective; MT lesions → akinetopsia
MST	Adjacent to MT	Complex optic flow, heading, smooth pursuit	Very large receptive fields, often covering much of the visual field

Receptive field size scaling: Receptive fields grow with each successive area (and, within each area, with eccentricity). Near the fovea, V1 cells have receptive fields of ~0.1–1° of visual angle; MT cells integrate over 10–30°; high-level inferotemporal neurons can have receptive fields covering much of the visual field.

5. The Two Visual Streams

Ungerleider and Mishkin (1982) proposed that visual signals bifurcate after V1 into two parallel processing pathways:

Ventral stream ("WHAT" pathway):
V1 → V2 → V4 → Inferior temporal cortex (IT) → Ventral prefrontal cortex
Function: Object recognition, face perception, color, fine detail
Lesion effects: Prosopagnosia (face blindness), visual agnosia, achromatopsia
Primary input: P-cell (parvocellular) LGN → V1 blob/interblob regions
Processing bias: High spatial frequency, low temporal frequency, central visual field

Dorsal stream ("WHERE/HOW" pathway):
V1 → V2 → MT/V5 → MST → Posterior parietal cortex → Dorsal prefrontal cortex
Function: Spatial location, motion, visuomotor control, action
Lesion effects: Optic ataxia (misguided reaching), akinetopsia (motion blindness)
Primary input: M-cell (magnocellular) LGN → V1 layer 4Cα → layer 4B
Processing bias: Low spatial frequency, high temporal frequency, peripheral visual field

Modern refinement (Milner & Goodale, 1995):
Dorsal = "vision for action" (online motor control from parietal cortex)
Ventral = "vision for perception" (conscious recognition in IT)
Evidence: Patient DF, after severe ventral-stream damage, cannot consciously report an object's size, orientation, or shape, yet grasps it accurately, scaling and orienting her hand to fit it (an intact dorsal stream).

6. Spatial Frequency Channels

The visual system decomposes images into multiple spatial frequency bands (cycles per degree of visual angle), analogous to a local (windowed) Fourier decomposition, since each channel covers only a limited patch of the visual field:

High spatial frequency (>8 cpd): Fine detail, letter features, texture
Medium spatial frequency (~2–8 cpd): Faces, global shapes
Low spatial frequency (<2 cpd): Overall form, expression, emotion

V1 cells tuned to spatial frequency: typical bandwidth = 1–1.5 octaves
Peak frequency: ~2 cpd in fovea, decreasing in periphery

Psychophysical adaptation experiments (Blakemore & Campbell, 1969): Adapting to a grating at one spatial frequency elevates the detection threshold only for nearby frequencies (within about ±1 octave), evidence for independent, band-limited channels.

Practical implication: Faces can be recognized largely from coarse, low spatial frequency information (overall shape, eyebrow height, jaw line). Fine detail (pores, wrinkles) is processed separately and used for discrimination within a known identity.

Face recognition: Mooney two-tone faces (black/white blobs) reveal low-SF global processing. Hybrid images (Schyns & Oliva, 1999) overlay two images filtered into different SF bands; the low-SF image dominates when viewed from far away, the high-SF image up close.
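The channel idea, and the hybrid-image trick, can be sketched with an FFT. This NumPy example is a simplification: it assumes hard-edged one-octave annuli (real V1 channels overlap smoothly and are about 1–1.5 octaves wide) and uses random noise in place of photographs; the helper names are hypothetical. Frequencies are in cycles per image rather than cycles per degree, because the conversion depends on viewing distance, which is exactly what changes the percept of a hybrid image.

```python
import numpy as np

def band_masks(shape, n_bands):
    """Hard-edged annular masks in the 2-D frequency plane: a low-pass residual
    plus n_bands one-octave bands. Together they cover every frequency exactly once."""
    fy = np.fft.fftfreq(shape[0]) * shape[0]       # vertical frequency, cycles per image
    fx = np.fft.fftfreq(shape[1]) * shape[1]       # horizontal frequency, cycles per image
    radius = np.hypot(fy[:, None], fx[None, :])     # radial frequency of each coefficient
    top = radius.max() + 1.0                        # upper edge just above the highest frequency
    edges = [0.0] + [top / 2**k for k in range(n_bands, -1, -1)]   # octave-spaced edges
    return [(radius >= lo) & (radius < hi) for lo, hi in zip(edges[:-1], edges[1:])]

def decompose(image, masks):
    """Filter the image through each mask; returns one real image per band."""
    spectrum = np.fft.fft2(image)
    return [np.real(np.fft.ifft2(spectrum * m)) for m in masks]

rng = np.random.default_rng(0)
img_a = rng.standard_normal((64, 64))               # stand-ins for two photographs
img_b = rng.standard_normal((64, 64))
masks = band_masks(img_a.shape, n_bands=4)          # residual + 4 octave bands = 5 masks
bands_a = decompose(img_a, masks)
bands_b = decompose(img_b, masks)

# Hybrid image: coarse structure (lowest two masks) from A, fine detail from B.
hybrid = bands_a[0] + bands_a[1] + sum(bands_b[2:])
print("bands reconstruct A:", np.allclose(sum(bands_a), img_a))
```

Because the masks tile the frequency plane, the bands sum back to the original image, and re-filtering the hybrid recovers A in the lowest band and B in the highest.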

7. Neural Coding and Sparse Representations

A key question: How does a population of neurons represent visual information?

Rate Coding vs. Temporal Coding

Rate coding: Information is carried by the mean firing rate over windows of roughly 100–200 ms. Simple and robust; most V1 tuning properties are characterized this way.
Temporal coding: Information is carried by precise spike timing, spike synchrony, or oscillation phase. It has been proposed to matter in some higher areas, notably in the debated hypothesis that synchrony binds features across cortex.
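Rate coding is easy to illustrate with a simulated neuron. This minimal sketch assumes an idealized homogeneous Poisson neuron firing at a hypothetical 40 Hz; it reads out the rate as spike count divided by window length and shows how the trial-to-trial spread of that estimate shrinks as the counting window grows.

```python
import numpy as np

def poisson_spike_train(rate_hz, duration_s, rng):
    """Spike times of a homogeneous Poisson neuron (exponential inter-spike intervals)."""
    spikes = []
    t = rng.exponential(1.0 / rate_hz)
    while t < duration_s:
        spikes.append(t)
        t += rng.exponential(1.0 / rate_hz)
    return np.array(spikes)

def rate_estimate(spike_times, window_s):
    """Rate-code read-out: number of spikes in [0, window_s) divided by the window length."""
    return np.count_nonzero(spike_times < window_s) / window_s

rng = np.random.default_rng(1)
TRUE_RATE = 40.0                                     # Hz, hypothetical neuron
trials = [poisson_spike_train(TRUE_RATE, 0.2, rng) for _ in range(2000)]

means, sds = {}, {}
for window in (0.05, 0.1, 0.2):
    est = np.array([rate_estimate(s, window) for s in trials])
    means[window], sds[window] = est.mean(), est.std()
    print(f"{window * 1000:.0f} ms window: mean {means[window]:.1f} Hz, s.d. {sds[window]:.1f} Hz")
```

For a Poisson neuron the standard deviation of the estimate is √(rate / window): about 28 Hz for a 50 ms window versus about 14 Hz for 200 ms at 40 Hz, which illustrates the speed/accuracy trade-off of a pure rate code.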

Sparse Coding

Olshausen and Field (1996) showed that when a linear generative model is trained to represent natural image patches with as few active units as possible, the learned basis functions spontaneously become Gabor-like: oriented, localized, and bandpass. This sparse coding hypothesis suggests V1 evolved to represent natural images efficiently:

Sparse coding objective (minimized over the activities aᵢ and the basis functions φᵢ):

$$E = \left\| I - \sum_i a_i \phi_i \right\|^2 + \lambda \sum_i |a_i|$$

where I = image patch, φᵢ = basis functions (to be learned), aᵢ = activities, λ = weight of the sparseness penalty (an L1 norm).

Solution: trained on natural images, the basis functions φᵢ become oriented, localized Gabor-like filters, matching the receptive fields of real V1 simple cells.
Sparseness in cortex: only a small fraction of V1 neurons, roughly 5–10% by some estimates, fire actively at any moment.
Benefit: Fewer active units → lower metabolic cost, less noise in the downstream read-out, and a decorrelated representation across neurons.
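The objective above can be minimized over the activities aᵢ in a few lines of NumPy. This sketch covers only that inference step, using ISTA (iterative shrinkage-thresholding, a standard solver for L1-penalized least squares, not necessarily the procedure Olshausen and Field used). The basis φ is random rather than learned from natural images, so it will not produce Gabor-like functions; the patch size, λ, and helper names are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * |x|: shrink toward zero, clip small values to exactly 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def sparse_code(patch, phi, lam, n_iter=2000):
    """ISTA for min_a ||patch - phi @ a||^2 + lam * ||a||_1, with the basis phi held fixed."""
    step = 1.0 / (2.0 * np.linalg.norm(phi, 2) ** 2)   # 1 / Lipschitz constant of the gradient
    a = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        grad = -2.0 * phi.T @ (patch - phi @ a)        # gradient of the reconstruction term
        a = soft_threshold(a - step * grad, step * lam)
    return a

def objective(patch, phi, a, lam):
    return np.sum((patch - phi @ a) ** 2) + lam * np.sum(np.abs(a))

rng = np.random.default_rng(0)
n_pixels, n_basis = 64, 128                          # e.g. an 8 x 8 patch, 2x overcomplete basis
phi = rng.standard_normal((n_pixels, n_basis))
phi /= np.linalg.norm(phi, axis=0)                   # unit-norm basis functions
a_true = np.zeros(n_basis)
active = rng.choice(n_basis, size=5, replace=False)
a_true[active] = rng.uniform(1.0, 2.0, size=5) * rng.choice([-1.0, 1.0], size=5)
patch = phi @ a_true                                 # synthetic patch built from 5 basis functions

a_sparse = sparse_code(patch, phi, lam=0.1)
a_dense = np.linalg.pinv(phi) @ patch                # minimum-norm least squares, no penalty
print("active units, sparse code:", np.count_nonzero(a_sparse))
print("active units, least squares:", np.count_nonzero(np.abs(a_dense) > 1e-6))
```

Comparing with the minimum-norm least-squares solution, which spreads the patch across nearly every unit, shows what the λ Σ|aᵢ| term buys: the same patch is explained, with small error, by far fewer active units.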

fMRI and Population Receptive Fields

fMRI measures the BOLD (Blood Oxygenation Level Dependent) signal, a hemodynamic proxy for local neural activity that lags it by several seconds (the response to a brief stimulus peaks roughly 4–6 s after onset). Retinotopic mapping with phase-encoded stimuli (a rotating wedge and an expanding ring) reveals V1–V4 and MT in individual subjects. Population receptive field (pRF) models estimate, for each cortical voxel, the visual-field location and size of the region that best drives it, revealing systematic cortical magnification and a steady increase in pRF size along the visual hierarchy.
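A minimal pRF sketch, in the spirit of the Gaussian pRF model: it assumes a circular 2-D Gaussian pRF, sweeping-bar apertures (a common pRF stimulus, used here instead of wedges and rings) on a coarse 1° grid, a hypothetical gamma-shaped hemodynamic response sampled once per second, and a brute-force grid search scored by correlation. All names and values are illustrative; real analyses add noise models, finer searches, and nonlinear optimization.

```python
import numpy as np

FIELD = np.arange(-10, 11, dtype=float)        # visual-field samples, 1 deg apart (hypothetical)
X, Y = np.meshgrid(FIELD, FIELD)               # X varies across columns, Y down the rows

def bar_stimulus(width=3):
    """Binary apertures: a horizontal bar sweeping down, then a vertical bar sweeping right."""
    n = len(FIELD)
    frames = []
    for i in range(n - width + 1):
        frame = np.zeros((n, n))
        frame[i:i + width, :] = 1.0
        frames.append(frame)
    for j in range(n - width + 1):
        frame = np.zeros((n, n))
        frame[:, j:j + width] = 1.0
        frames.append(frame)
    return np.array(frames)                    # shape (time, rows, cols)

def hrf(length=20):
    """Hypothetical gamma-shaped hemodynamic response, one sample per second, peaking at 5 s."""
    t = np.arange(length, dtype=float)
    h = t**5 * np.exp(-t)
    return h / h.sum()

def predict(stim, x0, y0, sigma, kernel):
    """Gaussian pRF model: stimulus/pRF overlap per frame, convolved with the HRF."""
    prf = np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * sigma**2))
    neural = np.tensordot(stim, prf, axes=([1, 2], [0, 1]))   # one value per frame
    return np.convolve(neural, kernel)[: len(neural)]

def fit_prf(bold, stim, kernel, xs, ys, sigmas):
    """Brute-force grid search; keeps the candidate best correlated with the data."""
    best, best_r = None, -np.inf
    for x0 in xs:
        for y0 in ys:
            for s in sigmas:
                r = np.corrcoef(predict(stim, x0, y0, s, kernel), bold)[0, 1]
                if r > best_r:
                    best, best_r = (x0, y0, s), r
    return best, best_r

stim, kernel = bar_stimulus(), hrf()
bold = predict(stim, 2.0, -4.0, 2.0, kernel)   # noiseless synthetic voxel
grid = np.arange(-8.0, 9.0, 2.0)
(x_hat, y_hat, s_hat), r = fit_prf(bold, stim, kernel, grid, grid, [1.0, 2.0, 3.0, 4.0])
print(f"pRF center ({x_hat:.0f}, {y_hat:.0f}) deg, size {s_hat:.0f} deg, r = {r:.3f}")
```

With noiseless data and the true parameters on the grid, the search recovers the generating pRF (x0 = 2, y0 = −4, σ = 2, with y increasing down the rows); the horizontal sweep pins down the vertical position, the vertical sweep the horizontal position, and the width of each response the pRF size.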
