The Theory of Information

“Mary, you really are a great person. I hope we can keep in correspondence. I said I would write.”

- Your friend always, Jonathon, Nova Scotia, 1985.

That was a message found in a half-broken bottle that washed ashore on a Croatian beach. It had spent 28 years at sea between the time it was written and the time it was finally found. Who Jonathon and Mary were, and what the message actually means, we may never know. But this sort of thing has captivated us all at one time or another. The romance of a message in a bottle that might only be read years, maybe decades, later, if at all, is well known.

But where the interest really lies is in the desire to get a thought across. A message.

Maybe it’s to someone you love, or maybe it’s to a complete stranger. And we don’t just care about what it is that we are saying, but also how it is received by the other person. If Jonathon, say, really liked Mary, not only would he want to make sure that his words reached her in this letter through time, but he would also want to make sure that, when they do find their way to her, she gets the complete message.

While we no longer send hopeful messages in bottles these days, our concerns have not changed all that much, have they? It’s still just as important to us that that special someone gets our message quickly and efficiently. Today, we might take the speed and accuracy of texting for granted, but the history of how we arrived here, like the man who made it all possible, can sometimes go unnoticed.

One Claude Shannon was thinking about the problem of information at a time when the most consequential of professions was cryptography. Shannon was one of the brightest in the field, and the importance of his work often meant that he had a direct line to both the US and UK heads of state. Yet these perks of the job were hardly of value to Shannon. He was more interested in the theoretical aspects of reality: the crux of it all, the fundamental principles that govern what we call “information.” These thoughts began a 10-year period in which Shannon worked, alone, on a unified theory of information that would not only open the doors to a new field, but also pose most of the important questions and answer them as well.

Shannon thought of information not as knowledge or intelligence, but rather as anything intentional that you can distinguish from noise. How can you send a signal from one point to another and recover it completely, or at least approximately, from the noise of the medium? These were the questions Shannon was interested in. The key to answering those questions was to boil information down to one simple idea. Any information can be boiled down to a combination of “yes” or “no” questions. True or false. On or off. Heads or tails. A 1 or a 0.
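To make the “yes” or “no” idea concrete, here is a minimal sketch of a guessing game. The function names and the halving strategy are my own illustration, not something taken from Shannon’s paper. It picks out one item from a list of options by asking “is it in the upper half?” over and over, and writes down each answer as a 1 or a 0.

```python
def questions_needed(num_options: int) -> int:
    """Fewest yes/no questions that always single out one of num_options items."""
    if num_options < 1:
        raise ValueError("need at least one option")
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 1, using only integers.
    return (num_options - 1).bit_length()


def ask_questions(secret_index: int, num_options: int) -> str:
    """Find secret_index by halving the candidates; return the answers as bits."""
    low, high = 0, num_options          # candidates are low .. high - 1
    answers = ""
    while high - low > 1:
        mid = (low + high) // 2
        if secret_index >= mid:         # "Is it in the upper half?" Yes.
            answers += "1"
            low = mid
        else:                           # No.
            answers += "0"
            high = mid
    return answers


print(questions_needed(26))   # questions needed to pin down one letter of the alphabet
print(ask_questions(5, 8))    # the answers that single out item 5 of 8
```

With 8 options, the three answers for item 5 come out as 101, which happens to be 5 written in binary. That is the whole trick: a string of yes or no answers is already a string of bits.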

And thus, the binary digit, or bit, was introduced to the world as the unit of information.

To communicate a sentence, for example, each character would have to be converted into multiple yes or no questions, or multiple bits. But that wasn’t all. Shannon also wanted to investigate the contents of the message. He, presumably informed by his cryptography work, realized that English, like all languages, has many patterns and tendencies. Thanks to these patterns, we have learned to make sense of the language even when a lot of characters have been erased. It’s the reason why texting feels so effortless despite the prevalence of shortened, incomplete words and phrases. What that also means is that language carries a lot of redundancy, material that it really doesn’t need, especially when we are considering things like bandwidth and signal noise, and where we have to send a message as efficiently as possible.
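Here is a rough sketch of what turning characters into bits can look like. It assumes plain 8-bit ASCII, which is just one convenient convention picked for illustration and not the encoding Shannon himself worked with, and the function names are made up.

```python
def text_to_bits(text: str) -> str:
    """Spell out each character as eight yes/no answers (assumes ASCII text)."""
    return " ".join(format(byte, "08b") for byte in text.encode("ascii"))


def bits_to_text(bits: str) -> str:
    """Reverse the process: read each group of eight bits back as a character."""
    return bytes(int(group, 2) for group in bits.split()).decode("ascii")


encoded = text_to_bits("Hi")
print(encoded)                 # 01001000 01101001
print(bits_to_text(encoded))   # Hi
```

Notice that every character costs the same eight bits here, whether it is a common “e” or a rare “z”. That flat price is exactly the kind of waste the patterns in a language let you squeeze out.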

But this idea extends to other forms of information as well. The common image format JPEG uses both contextual awareness and probabilities to store minimal information about the picture being stored. Of course, when you bring probability into anything, you forfeit the right to say anything with certainty; you are approximating. JPEG, too, loses some confidence in its data, and some pixels may not be quite what they were in the actual image. But, to the naked eye, these compressed images often seem identical to the original. It’s why the JPEG format is able to achieve a 10:1 (and sometimes even 20:1) compression ratio. These compression fundamentals were crucial in the space race and the Voyager missions before they finally made their way into our pockets.
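To get a feel for the trade of accuracy for size, here is a deliberately simplified sketch. It is not how JPEG actually works (the real format rounds off frequency components of small image blocks, not raw pixels), and the pixel values and step size are made up. It only shows the core move: round values to a coarser grid so there are fewer distinct things to store, and accept a small error in return.

```python
def quantize(pixels, step=16):
    """Snap each 0-255 brightness value to the nearest multiple of step."""
    half = step // 2
    return [min(255, (value + half) // step * step) for value in pixels]


row = [12, 13, 15, 14, 200, 201, 199, 203]    # a made-up row of pixel values
compact = quantize(row)
errors = [abs(a - b) for a, b in zip(row, compact)]

print(compact)                                # [16, 16, 16, 16, 208, 208, 192, 208]
print(len(set(row)), "->", len(set(compact))) # 8 -> 3 distinct values to store
print(max(errors))                            # 8, the worst any pixel is off by
```

Eight different values became three, and no pixel moved by more than 8 out of 255. The original row is gone for good, though, which is what makes this kind of compression lossy.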

The video you are watching also goes through layers of compression before it reaches your screen. Tech YouTuber Marques Brownlee once experimented with this very phenomenon by uploading and then re-uploading the same video 1,000 times to see how the compression slowly rips the video apart. Each time, the conversion leads to a slightly worse approximation of the original piece of information than the last time, and before you know it, you are left with only blotches of color that can hardly be considered proper footage.

But back to the fundamentals of information. If I were to send you a message saying “what goes up must come down” 24 times, how much of that message would truly be considered information? Instead of writing the same sentence 24 times, turning them all into bits, sending them over and then converting them all back to legible English sentences, it would be more efficient if I simply wrote the sentence once and left a command for your device to copy the line 24 times when it reaches you. Of course, you could shorten the line even further by removing certain letters using the concept of redundancy I just mentioned. The 23 other iterations of the first sentence carry no information because they tell you nothing you don’t already know.
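The “write it once and leave a copy command” idea can be sketched in a few lines. This is a toy version of what is usually called run-length encoding. The function names are made up, and real compressors are far more elaborate.

```python
def compress_repeats(lines):
    """Collapse runs of identical consecutive lines into (line, count) pairs."""
    runs = []
    for line in lines:
        if runs and runs[-1][0] == line:
            runs[-1] = (line, runs[-1][1] + 1)   # same line again: bump the count
        else:
            runs.append((line, 1))               # a new line starts a new run
    return runs


def expand_repeats(runs):
    """Follow the copy commands to rebuild the original lines exactly."""
    return [line for line, count in runs for _ in range(count)]


message = ["what goes up must come down"] * 24
packed = compress_repeats(message)

print(packed)                              # [('what goes up must come down', 24)]
print(expand_repeats(packed) == message)   # True
```

Nothing is lost here. The receiver rebuilds all 24 lines from one sentence and one number, which is a fair measure of how little the repeats were adding.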

Think about it: if the weather in a region is always exactly the same, is there any reason for you to report on the weather? It wouldn’t be informative at all. From that perspective, the amount of information in a message is directly proportional to the amount of surprise in a message, or the stuff you don’t already know - meaning everything that you cannot reduce due to a pattern, because whatever you can reduce will be reduced. You are then left with only the disorganized, out-of-pattern parts that will need to be stored whole. Of course, most messages don’t repeat the same line 24 times. This is just a simple idea to demonstrate the phenomenon. But, really, English is said to be around 75% redundant! In that sense, it is not wrong to say that nonsense tweets are more informative than Shakespeare.
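The usual way to put a number on surprise is to take the logarithm of one over the probability, which gives an answer in bits. The sketch below assumes that standard definition, and the weather probabilities are invented for the example.

```python
import math


def surprise_in_bits(probability: float) -> float:
    """Bits of information gained when an event with this probability occurs."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log2(1 / probability)


print(surprise_in_bits(1.0))     # 0.0 - "sunny again" in a place where it is always sunny
print(surprise_in_bits(0.5))     # 1.0 - a coin flip
print(surprise_in_bits(1 / 8))   # 3.0 - a one-in-eight event
```

A forecast that is always right because nothing ever changes is worth zero bits, and the rarer the event, the more bits it carries.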

It is no wonder, then, that Shannon liked to equate information and entropy - something you and I usually call disorder. Shannon’s entropy is the absolute minimum amount of information that we need to store to fully capture the contents of a message. You would imagine that disorder is typically the complete opposite of being informative. After all, in the physical sense, a disordered arrangement is the least useful state something can be in, and we spend a lot of effort trying to prevent that from happening. Yet, in the world of information, disorder is key.
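Here is a small sketch of how that minimum can be estimated. It assumes the simplest possible model, where each character is treated on its own and only how often it shows up matters. That ignores patterns between letters and words, so for real English it overestimates the true figure. The function name is made up.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(message: str) -> float:
    """Average surprise per character, based only on character frequencies."""
    if not message:
        return 0.0
    total = len(message)
    return sum(
        (count / total) * math.log2(total / count)   # p * log2(1 / p)
        for count in Counter(message).values()
    )


print(entropy_bits_per_symbol("aaaa"))   # 0.0 - perfectly predictable
print(entropy_bits_per_symbol("abab"))   # 1.0 - one yes/no question per character
print(entropy_bits_per_symbol("abcd"))   # 2.0 - two questions per character
```

The more evenly spread and unpredictable the characters are, the higher the number, and the more bits you are forced to keep.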

However, the same trend does still apply. A tendency to move towards disorder is still a bad thing. While we want surprising, disordered bits to be conveyed correctly, too much disorder will be indistinguishable from noise. Completely and utterly useless.

Trying to think of information at such a fundamental level allowed Shannon to see its use in fields one might not expect. Thanks to an encouraging and challenging graduate advisor, Shannon was pushed to take a small detour into genetics during his time as a doctoral student. He saw that genetics is just another way of transferring information, only biologically. Our bodies are anything but immune to the effects of aging. And at a deeper level, this aging is really the loss of genetic information due to entropy.

At the forefront of anti-aging research is Dr. David Sinclair. In his book Lifespan: Why We Age and Why We Don’t Have To, Dr. Sinclair refers to Shannon’s mathematical theory of communication as a way to understand the aging process. Shannon, in his time, was quite occupied with the problem of sending information over a noisy channel and ensuring that the information gets transmitted properly. Dr. Sinclair says that the aging problem is similar.

In communication between, say, two computers, you would have a source of information, a transmitter, and a receiver. In the aging analogy, the sources of genetic information would be the egg and the sperm, the transmitter would be the epigenome - the reader of the genome - and the receiver would be your body in the future.

We all start off with a clean copy of the source information. We grow into our teens and then some more in seemingly perfect condition until, slowly but surely, every successive cell division, thanks to entropy, starts making mistakes. Keeping things ordered becomes harder and harder over time. Each successive copy of the source information is a slightly worse copy of the one you had before - just like the Marques Brownlee video - till eventually you are left with a copy that barely resembles its old self - fraught with illness, frail, and about to perish.

To go back to the computer examples, it turns out we have a solution to that problem. Your webpages don’t seem to load any worse if you keep reloading them, do they? Well, how do they cross a noisy network and still manage to keep all the data intact, then? They use a set of rules called TCP/IP. The sender holds on to the original data, the receiver checks every piece that arrives, and anything that was lost or damaged along the way simply gets sent again. Wouldn’t it be great if we had a TCP/IP protocol for our genome and epigenome? It turns out we do. Shinya Yamanaka found a combination of four genes, now known as the Yamanaka factors, that can turn old adult cells back into potent stem cells, building on John Gurdon’s earlier work showing that mature cells can be reprogrammed. The two shared the Nobel Prize in Physiology or Medicine in 2012 for their discoveries. What this basically means for the anti-aging fight is that we now have the perfect copy that we have all been looking for. It’s now a matter of how we can access it and allow our bodies to rediscover their younger selves.
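Here is a heavily simplified sketch of the “check it and ask again” idea. It is not how TCP is actually implemented: real TCP has its own 16-bit checksum, sequence numbers and acknowledgements, and this toy assumes the checksum itself always arrives undamaged. The flaky channel and all the function names are made up for the demonstration.

```python
import zlib


def is_intact(payload: bytes, checksum: int) -> bool:
    """Receiver-side check: does the data still match the sender's checksum?"""
    return zlib.crc32(payload) == checksum


def send_reliably(payload: bytes, channel, max_attempts: int = 10):
    """Resend the sender's pristine copy until the receiver's check passes."""
    checksum = zlib.crc32(payload)             # computed from the clean original
    for attempt in range(1, max_attempts + 1):
        received = channel(payload, attempt)   # the channel may damage the data
        if is_intact(received, checksum):
            return received, attempt
    raise RuntimeError("gave up: every attempt arrived damaged")


def flaky_channel(data: bytes, attempt: int) -> bytes:
    """Hypothetical noisy link that flips one bit on the first two attempts."""
    if attempt <= 2 and data:
        damaged = bytearray(data)
        damaged[0] ^= 0x01
        return bytes(damaged)
    return data


original = b"what goes up must come down"
received, attempts = send_reliably(original, flaky_channel)
print(received == original, attempts)   # True 3
```

The important part is that every retry starts from the untouched original and not from the damaged copy. That is the difference between a webpage that reloads perfectly and a video that has been re-uploaded a thousand times.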

But for all the bad rep that disorder has gotten so far, let me be devil's advocate for once.

Studies show that, human to human, we are 99.9 percent identical in our genetic makeup. What does the remaining 0.1% account for? Let’s put it this way, in the words of Riccardo Sabatini: “a printed version of your entire genetic code would occupy some 262,000 pages, or 175 large books. Of those pages, just about 500 would be unique to us.”

Those 500 pages would be your essence, your likeness, your voice. And according to Shannon’s theory, they are the disordered, out-of-pattern, unique pieces you share with no one else. In other words, it’s the disorder in those 500 pages that truly makes you, you. Try your best to preserve those pages against the noisy tides of life. And if you can manage that, who knows, maybe someday someone might find them in a broken bottle somewhere far, far away.

Because remember, it’s the little things in life that truly count.

- MA, MM