Sonification of DNA & Amino Acids


by Stephen Andrew Taylor (2016, updated 1 August 2020)


The four nucleotide bases of DNA - A, T, C, G - are each given a single sound, to signify the genetic code they carry. Even though they look fairly similar to the amino acids shown below, their importance lies in the information they carry in their sequence (for example, the start codon ATG), rather than their physical structure. As a mnemonic, each sound is made from a different kind of instrument:

A (adenine) is a metal bell
T (thymine) is a wooden guiro (made from a tree)
C (cytosine) is a ceramic vase
G (guanine) is a wine glass

I made a few other musical choices: the two larger molecules, A and G, have sounds with a long decay, to signify their greater size; the two smaller molecules, T and C, are staccato. Also, A/T and C/G are bound together to create the double helix; but the A/T bond is not as strong, so those sounds (bell and guiro) have a softer onset than C and G (ceramic and glass). Finally, when DNA is transcribed to RNA, T is replaced with U (uracil); this is why the guiro sound for T is so different from the other three sounds.
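The choices above can be summarized as a small lookup table. This is only an illustrative sketch: the field names (instrument, decay, onset) and the helper sounds_for are hypothetical labels for this example, not the code or sample names actually used to make the sounds.

```python
# Hypothetical lookup table summarizing the mnemonic described above.
# "long"/"staccato" reflect molecule size; "soft"/"sharp" reflect bond strength.
BASE_SOUNDS = {
    "A": {"instrument": "metal bell", "decay": "long", "onset": "soft"},
    "T": {"instrument": "wooden guiro", "decay": "staccato", "onset": "soft"},
    "C": {"instrument": "ceramic vase", "decay": "staccato", "onset": "sharp"},
    "G": {"instrument": "wine glass", "decay": "long", "onset": "sharp"},
}


def sounds_for(sequence):
    """Return the instrument name for each base, ignoring case."""
    return [BASE_SOUNDS[base]["instrument"] for base in sequence.upper()]


print(sounds_for("atg"))
```

Looking up the start codon ATG this way gives bell, guiro, glass.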


Click each image below to hear its corresponding sound.
DNA

A (adenine)

metal

T (thymine)

wood

C (cytosine)

ceramic

G (guanine)

glass

The DNA sequence for the Indian Hedgehog gene begins like this:

gcccccgcctggagccccccggagccacccggacgcctgagcccccgcagcgctcccgtcgacgcgcctgcccatcagcccaccaggagacctcgcccgccgctcccccgggctccccggccATGTCTCCCGCCCGGCTCCGGCCCCGACTGCACTTCTGCCTGGTCCTGTTGCTGCTGCTGGTGGTGCGGCA...

And sounds like this:

You can also make your own interactive DNA sonifications, to sonify any gene or sequence of DNA or RNA.

Genes consist of exons (which code for amino acids to make a protein) and introns (informally known as "junk" DNA). In the exons, shown in upper-case letters, we can hear the amino acids, each coded for by a triplet of DNA bases (a codon). For instance, the sequence ATG codes for the amino acid methionine; it is the first upper-case triplet in the sequence above and marks the actual start of the gene (you can hear this at 20 seconds in the recording above, followed by several other amino acid codons). Ribosomes read codons, from the messenger RNA transcribed from the DNA, to join together (or elongate) amino acids, which form a chain (a peptide), which eventually folds into a protein (see this Wikipedia article for more information). Some ribosomes work as fast as 10 codons per second; some are slower, at two codons per second; often they change their speed. In the sonification above the electronics play at triplet = 120, or two codons (producing two amino acids) per second.
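The timing described above can be worked through in a few lines of Python. This is a rough sketch, not the program behind the recording: it assumes every base (coding or not) lasts the same length of time, that lower-case letters are non-coding, and that the first upper-case letter starts the reading frame. The function name codon_schedule is made up for this example.

```python
BASES_PER_CODON = 3
CODONS_PER_SECOND = 2  # "triplet = 120": 120 codons per minute
BASES_PER_SECOND = BASES_PER_CODON * CODONS_PER_SECOND


def codon_schedule(sequence):
    """Return (start_time_in_seconds, codon) pairs for the upper-case region."""
    start = next((i for i, ch in enumerate(sequence) if ch.isupper()), None)
    if start is None:
        return []  # no upper-case (coding) letters found
    coding = sequence[start:]
    usable = len(coding) - len(coding) % BASES_PER_CODON  # drop a partial codon
    schedule = []
    for offset in range(0, usable, BASES_PER_CODON):
        codon = coding[offset:offset + BASES_PER_CODON]
        schedule.append(((start + offset) / BASES_PER_SECOND, codon))
    return schedule


# A short excerpt around the start codon of the sequence shown above.
for time, codon in codon_schedule("ggctccccggccATGTCTCCCGCCCGG"):
    print(f"{time:4.1f} s  {codon}")
```

In this excerpt, 12 lower-case bases come before ATG, so the start codon lands at 12 / 6 = 2 seconds. In the full sequence above there are roughly 120 lower-case bases before ATG, which at six bases per second comes to about 20 seconds, consistent with the timing noted for the recording.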

Amino Acids

Coded for by the four DNA bases, the twenty amino acids below fit together, like Lego blocks, to build proteins. To represent their physical structure, for each amino acid, every "step" in the side chain (carbon, OH, NH, etc.) gets a note (the diagrams below start at the right; side chains stretch out to the left). A ring of atoms gets a chord. If an amino acid has sulfur, you can hear a hissing sound. If the amino acid is hydrophilic, the notes ascend; if it is hydrophobic, they descend. The more extreme an amino acid is (based on this Wikipedia article on hydrophobicity scales), the farther up or down the notes go; if it's electrically charged (the five amino acids colored in lavender), reverb is added. Many thanks to Daniel Stelzer for help with programming, and to saxophonist Nicki Roman and flutist Melody Chua, for providing the samples I used to make these sounds - although they sound kind of like pizzicato strings, they are actually woodwinds! The descending (hydrophobic) sounds are mostly saxophone; ascending (hydrophilic) sounds are flute.
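The rules above could be sketched in Python roughly as follows. Everything numeric here is an assumption made for illustration: the starting pitch (MIDI note 60), the interval range of 1 to 4 semitones, and the "extremeness" value from 0.0 to 1.0 are not taken from the actual sounds, and the function names are hypothetical.

```python
def side_chain_contour(steps, hydrophilic, extremeness, start_pitch=60):
    """Return one MIDI pitch per side-chain step.

    steps        number of steps in the side chain (one note each)
    hydrophilic  True: notes ascend; False: notes descend
    extremeness  assumed value from 0.0 (mild) to 1.0 (extreme)
    """
    # Assumed interval range: 1 semitone (mild) up to 4 semitones (extreme).
    interval = 1 + round(3 * extremeness)
    direction = 1 if hydrophilic else -1
    return [start_pitch + direction * interval * i for i in range(steps)]


def extra_effects(has_sulfur, charged):
    """List the extra sound treatments named in the text."""
    effects = []
    if has_sulfur:
        effects.append("hiss")
    if charged:
        effects.append("reverb")
    return effects


print(side_chain_contour(3, hydrophilic=True, extremeness=1.0))
print(side_chain_contour(4, hydrophilic=False, extremeness=0.0))
print(extra_effects(has_sulfur=True, charged=False))
```

A strongly hydrophilic side chain climbs in wide leaps, while a mildly hydrophobic one steps down by semitones. Rings, which are played as chords, are left out of this sketch.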

[Update, 1 August 2020] The sounds below do a decent job representing amino acids; but they take too long to build a really big protein (for example, the spike protein from the novel coronavirus). To address this problem of scale, Data-Driven Music from the Coronavirus uses a single note for each amino acid, going from low (hydrophobic) to high (hydrophilic). The only exceptions are the acids that end with a ring of atoms, like W (tryptophan), Y (tyrosine), F (phenylalanine) and P (proline); these molecules are represented by two-note chords (dyads). The electrically charged hydrophilic acids have long decays (like a harp), while hydrophobic acids have an "oily" sound, like a violin bowed with the wood of the bow (col legno).
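A one-note-per-amino-acid mapping of this kind might look like the sketch below. It is not the code from Data-Driven Music from the Coronavirus. The choice of the Kyte-Doolittle hydropathy scale, the MIDI range 48 to 84, the perfect fifth used for dyads, and the name pitches_for are all assumptions made for this example; the text does not say which scale or pitches were used.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic). Assumed scale.
HYDROPATHY = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9,
    "R": -4.5,
}
RING_ENDED = {"W", "Y", "F", "P"}  # the four named in the text as dyads
LOW_PITCH, HIGH_PITCH = 48, 84     # assumed MIDI range (three octaves)


def pitches_for(residue):
    """Return the MIDI pitch (or two pitches, for a dyad) for one amino acid."""
    # Map +4.5 (most hydrophobic) to LOW_PITCH and -4.5 to HIGH_PITCH.
    fraction = (4.5 - HYDROPATHY[residue]) / 9.0
    root = LOW_PITCH + round(fraction * (HIGH_PITCH - LOW_PITCH))
    if residue in RING_ENDED:
        return [root, root + 7]  # assumed dyad interval: a perfect fifth
    return [root]


# A made-up four-residue peptide, one entry per amino acid.
print([pitches_for(residue) for residue in "IRFK"])
```

With these assumptions, isoleucine (the most hydrophobic on this scale) gets the lowest note and arginine the highest, while phenylalanine comes out as a two-note chord.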

Click each image below to hear its corresponding sound; the notes follow each carbon atom in a side chain, going from right to left.

Amino acids - special cases

G Glycine

(no side chain; just a single note)

P Proline

(a single ring, played as a chord)

Hydrophilic amino acids with polar uncharged side chains - ascending

S Serine

T Threonine

N Asparagine

Q Glutamine

Hydrophilic with positively charged side chains - with reverb

H Histidine

R Arginine

K Lysine

Negatively charged - more reverb

D Aspartic Acid

E Glutamic Acid

Amino acids with hydrophobic side chains - descending

A Alanine

V Valine

C Cysteine

I Isoleucine

L Leucine

M Methionine

F Phenylalanine

Y Tyrosine

W Tryptophan