How DNA Works
Coiled inside almost every one of your roughly 37 trillion cells is about 2 metres of DNA, a molecule that carries the complete instructions for building and running a human being. Written in an alphabet of just four chemical letters, it is among the most information-dense storage media known.
Structure of DNA
DNA (deoxyribonucleic acid) is a polymer — a long chain of repeating units called nucleotides. Each nucleotide consists of three parts:
- A deoxyribose sugar (5-carbon)
- A phosphate group
- One of four nitrogenous bases: Adenine (A), Thymine (T), Guanine (G), Cytosine (C)
Nucleotides link together via phosphodiester bonds between the phosphate of one and the sugar of the next, forming the "backbone" of one strand. Two antiparallel strands coil around each other, held together by hydrogen bonds between the bases. This is the famous double helix, first described by Watson and Crick in 1953 based on Rosalind Franklin's X-ray crystallography data.
The Base-Pairing Rules
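The two DNA strands are complementary: the sequence of one strand determines the sequence of the other, via strict base-pairing rules first deduced from Erwin Chargaff's measurements in 1950:
- Adenine (A) pairs with Thymine (T), via two hydrogen bonds
- Guanine (G) pairs with Cytosine (C), via three hydrogen bonds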
Because the base-pairing rules are so rigid, knowing the sequence of one strand tells you the sequence of the other exactly. This property is what makes DNA replication and information transfer possible.
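As a minimal sketch of this idea, the snippet below derives the partner strand from a given strand, assuming the standard pairs (A with T, G with C). The function names `complement` and `reverse_complement` are illustrative choices, not part of any particular library.

```python
# Watson-Crick pairing: A with T, G with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the base-by-base partner, in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction.

    The two strands are antiparallel, so the partner is reversed.
    """
    return complement(strand)[::-1]


print(complement("ATGCCG"))          # TACGGC
print(reverse_complement("ATGCCG"))  # CGGCAT
```

Applying `reverse_complement` twice returns the original sequence, which is the sense in which each strand fully determines the other.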
DNA Replication
Before a cell divides, it must copy all of its DNA so each daughter cell gets a complete genome. DNA replication is semi-conservative: each new double helix consists of one original strand and one newly synthesised strand.
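The semi-conservative pattern can be illustrated with a toy model. This is only a sketch: the `Strand` class and its `origin` label are hypothetical bookkeeping devices, and strand direction and the replication machinery are ignored.

```python
from dataclasses import dataclass

PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


@dataclass(frozen=True)
class Strand:
    seq: str
    origin: str  # "parental" or "new" (label used only for illustration)


def pair_with(template: Strand) -> Strand:
    """Synthesise a new strand complementary to the template."""
    return Strand("".join(PAIRS[b] for b in template.seq), "new")


def replicate(helix):
    """Split a (top, bottom) helix and rebuild a partner for each strand."""
    top, bottom = helix
    daughter_1 = (top, pair_with(top))        # old top + new bottom
    daughter_2 = (pair_with(bottom), bottom)  # new top + old bottom
    return daughter_1, daughter_2


parent = (Strand("ATGC", "parental"), Strand("TACG", "parental"))
for daughter in replicate(parent):
    print([(s.seq, s.origin) for s in daughter])
```

Each daughter helix carries the same sequence as the parent, and each contains exactly one strand labelled "parental" and one labelled "new".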
Transcription — DNA to mRNA
Cells don't use DNA directly to make proteins. Instead, the relevant section of DNA is first copied into a single-stranded messenger RNA (mRNA) molecule — a process called transcription, performed by RNA polymerase.
Chemically, RNA differs from DNA in two ways: it uses ribose (not deoxyribose) as its sugar, and instead of Thymine (T) it has Uracil (U), which pairs with Adenine.
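A minimal sketch of transcription as a string operation follows. It assumes the template strand is written 3'->5' so that the resulting mRNA reads 5'->3'; the function name `transcribe` is an illustrative choice.

```python
# Template DNA base -> mRNA base. Note that A in the template pairs with U.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template: str) -> str:
    """Build the mRNA complementary to a template strand given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


template = "TACGGC"      # template strand, written 3'->5'
coding = "ATGCCG"        # its partner (the coding strand), written 5'->3'

print(transcribe(template))      # AUGCCG
print(coding.replace("T", "U"))  # AUGCCG
```

The second line shows a common shortcut: the mRNA has the same sequence as the coding strand, with every T replaced by U.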
In eukaryotes (animals, plants, fungi), the pre-mRNA is processed in the nucleus: introns (non-coding segments) are spliced out, a 5'-cap and poly-A tail are added, and the mature mRNA exits to the cytoplasm.
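The splicing and tailing steps can be sketched on strings as well. This is a toy under stated assumptions: the intron coordinates are supplied by hand as hypothetical half-open index pairs, the tail length is arbitrary, and the 5'-cap (a chemical modification) is not modelled.

```python
from __future__ import annotations


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove each intron, given as a half-open (start, end) index pair."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # keep the exon before the intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # keep the final exon
    return "".join(exons)


def add_poly_a_tail(mrna: str, length: int = 10) -> str:
    """Append a run of adenines (illustrative length only)."""
    return mrna + "A" * length


pre_mrna = "AUGGCC" + "GUAAGUUUAG" + "GAAUAA"  # exon + intron + exon
mature = add_poly_a_tail(splice(pre_mrna, [(6, 16)]))
print(mature)  # AUGGCCGAAUAAAAAAAAAAAA
```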
Translation — mRNA to Protein
Ribosomes read the mRNA strand in groups of three nucleotides called codons. Each codon specifies one amino acid, or a start or stop signal. This mapping is the genetic code.
| Codon (mRNA) | Amino acid | Note |
|---|---|---|
| AUG | Methionine | Start codon — translation begins here |
| UUU / UUC | Phenylalanine | |
| GAA / GAG | Glutamic acid | |
| GGU / GGC / GGA / GGG | Glycine | Four synonymous codons |
| CCU / CCC / CCA / CCG | Proline | |
| UAA / UAG / UGA | (stop) | Terminates translation |
Transfer RNA (tRNA) molecules carry the correct amino acid and have an anticodon that base-pairs with the mRNA codon. The ribosome catalyses the formation of peptide bonds between successive amino acids, building the polypeptide chain which then folds into a functional protein.
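The codon-by-codon reading described above can be sketched as a lookup loop. The table here contains only the codons listed in this section, not the full 64-codon genetic code, and codons missing from it are reported as "???". The three-letter amino acid abbreviations and the function name `translate` are illustrative choices.

```python
# Partial genetic code: only the codons shown in the table above.
CODON_TABLE = {
    "AUG": "Met",
    "UUU": "Phe", "UUC": "Phe",
    "GAA": "Glu", "GAG": "Glu",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "CCU": "Pro", "CCC": "Pro", "CCA": "Pro", "CCG": "Pro",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def translate(mrna: str) -> list:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "???")
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide


print(translate("GGAUGUUUGAAGGCCCAUAAGG"))
# ['Met', 'Phe', 'Glu', 'Gly', 'Pro']
```

The two leading bases are skipped because reading starts at AUG, and the trailing bases after UAA are never read.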
Genes and the Genome
A gene is a sequence of DNA that encodes a functional molecule — usually a protein, sometimes a functional RNA. The complete set of DNA in an organism is its genome.
Key numbers for the human genome:
- 3.2 billion base pairs (per haploid set)
- ~20,000 protein-coding genes — only ~1.5% of the total DNA
- ~48% transposable elements (mobile DNA sequences)
- ~8% regulatory and structural non-coding RNA genes
- The remaining ~40%: introns, repetitive sequences, and regions with still-unclear functions
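A few back-of-the-envelope figures follow from these numbers. The sketch below assumes a rise of about 0.34 nm per base pair (the usual value for the common B-form helix, not stated in the text) and 2 bits per base, since four letters need log2(4) bits.

```python
import math

BASE_PAIRS_HAPLOID = 3.2e9     # from the list above
PROTEIN_CODING_GENES = 20_000  # approximate, from the list above
CODING_FRACTION = 0.015        # ~1.5%, from the list above
RISE_PER_BP_NM = 0.34          # assumption: B-form DNA, not from the text


def dna_length_m(base_pairs: float) -> float:
    """Stretched-out length of a DNA molecule in metres."""
    return base_pairs * RISE_PER_BP_NM * 1e-9


def storage_megabytes(base_pairs: float) -> float:
    """Raw information content at log2(4) = 2 bits per base."""
    return base_pairs * math.log2(4) / 8 / 1e6


def coding_bp_per_gene() -> float:
    """Average protein-coding sequence per gene."""
    return BASE_PAIRS_HAPLOID * CODING_FRACTION / PROTEIN_CODING_GENES


print(f"haploid length: {dna_length_m(BASE_PAIRS_HAPLOID):.2f} m")
print(f"diploid length: {dna_length_m(2 * BASE_PAIRS_HAPLOID):.2f} m")
print(f"haploid storage: {storage_megabytes(BASE_PAIRS_HAPLOID):.0f} MB")
print(f"coding bp per gene: {coding_bp_per_gene():.0f}")
```

Under these assumptions one haploid set is about 1.1 m long and a diploid nucleus holds about 2.2 m, consistent with the "about 2 metres" quoted in the introduction. The haploid sequence amounts to roughly 800 MB, and the average gene has roughly 2,400 coding base pairs.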
Mutations
A mutation is a permanent change in the DNA sequence. Mutations are the raw material of evolution — without them, all life would be genetically identical. Most are neutral; some are harmful; rare ones are beneficial.
Types of mutation
- Substitution: One base swapped for another. A synonymous substitution may not change the amino acid (due to codon redundancy). A missense substitution changes the amino acid. A nonsense substitution creates a premature stop codon.
- Insertion / Deletion (indel): One or more bases added or removed. If the number of bases is not a multiple of 3, the indel causes a frameshift mutation that scrambles all downstream codons, which is usually catastrophic.
- Chromosomal rearrangements: Large segments duplicated, inverted, or moved to a different chromosome.
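The substitution classes and the frameshift effect can be sketched at the codon level. The table below is partial (only codons from the translation section), the original codon is assumed to encode an amino acid, and the function names are illustrative.

```python
# Partial genetic code, restricted to codons used in this example.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe",
    "GAA": "Glu", "GAG": "Glu",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def classify_substitution(before: str, after: str) -> str:
    """Classify a single-codon change (original codon assumed non-stop)."""
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"  # same amino acid, thanks to codon redundancy
    if CODON_TABLE[after] == "STOP":
        return "nonsense"    # premature stop codon
    return "missense"        # different amino acid


def read_codons(mrna: str) -> list:
    """Split into complete triplets, dropping any incomplete tail."""
    return [mrna[i:i + 3] for i in range(0, len(mrna) - 2, 3)]


print(classify_substitution("GAA", "GAG"))  # synonymous
print(classify_substitution("GAA", "GGA"))  # missense
print(classify_substitution("GAA", "UAA"))  # nonsense

original = "AUGUUUGAAGGC"
deleted = original[:3] + original[4:]  # delete one base after the start codon
print(read_codons(original))  # ['AUG', 'UUU', 'GAA', 'GGC']
print(read_codons(deleted))   # ['AUG', 'UUG', 'AAG']
```

After the single-base deletion, every codon downstream of the deletion point is read differently, which is the frameshift described above.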
CRISPR — Editing the Code
CRISPR-Cas9 (Clustered Regularly Interspaced Short Palindromic Repeats) is a molecular tool repurposed from bacterial immune systems that allows scientists to cut DNA at a precise location and edit the sequence. Jennifer Doudna and Emmanuelle Charpentier were awarded the 2020 Nobel Prize in Chemistry for its development as a gene-editing tool.
A guide RNA (gRNA) is designed to match the target DNA sequence. Guided by the gRNA, the Cas9 protein finds the matching sequence in the genome and cuts both strands of the double helix. The cell then repairs the break using one of two pathways:
- NHEJ (non-homologous end joining): error-prone repair that typically disrupts (knocks out) the gene.
- HDR (homology-directed repair): if a repair template is supplied, the gene can be precisely corrected or replaced.
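The targeting and repair steps can be sketched as string operations. This toy relies on details not given in the text: it assumes the commonly used Cas9 from Streptococcus pyogenes, which requires an "NGG" motif (the PAM) immediately after the target and cuts 3 bases upstream of it. It searches one strand only, writes the guide in DNA letters, and models repair crudely. All function names are hypothetical.

```python
from typing import Optional


def find_cut_site(genome: str, guide: str) -> Optional[int]:
    """Return the index of the cut (between index-1 and index), or None.

    Assumption: the target must be followed by an NGG PAM, and the cut
    falls 3 bases upstream of that PAM.
    """
    start = genome.find(guide)
    while start != -1:
        pam = genome[start + len(guide): start + len(guide) + 3]
        if len(pam) == 3 and pam[1:] == "GG":
            return start + len(guide) - 3
        start = genome.find(guide, start + 1)
    return None


def nhej_knockout(genome: str, cut: int, deleted: int = 1) -> str:
    """Toy NHEJ: lose a few bases at the break, often shifting the frame."""
    return genome[:cut] + genome[cut + deleted:]


def hdr_repair(genome: str, cut: int, template: str, window: int = 3) -> str:
    """Toy HDR: overwrite `window` bases either side of the break."""
    return genome[:cut - window] + template + genome[cut + window:]


guide = "GATTACAGATTACAGATTAC"            # 20-base target, illustrative
genome = "CCC" + guide + "TGG" + "AAA"    # target followed by a TGG PAM
cut = find_cut_site(genome, guide)
print(cut)                                # 20
print(nhej_knockout(genome, cut))
print(hdr_repair(genome, cut, "AAAAAA"))
```

A deletion whose length is not a multiple of 3 shifts the reading frame, which is why NHEJ typically knocks out the gene, whereas the HDR sketch writes in exactly the sequence supplied by the template.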