What’s in a sentence?

Machines have their work cut out for them when it comes to understanding not just language, but the point at which language becomes meaning.

Back in 1948, Claude Shannon published his landmark paper on information theory, “A Mathematical Theory of Communication.” Out of that paper grew a whole field devoted to the ways we quantify, store, and convey information across different channels. A channel can be a number of things: a molecule, a Wi-Fi signal, or a sentence.

Shannon, who was a Bell Labs employee at the time, was most interested in communications channels like copper wire and radio waves, and in their fundamental limits. How much information can such a channel carry in a given amount of time? What effect does outside interference have?

But with his focus mostly on the channels themselves, he gave little attention to the other side of the equation: the actual information being transmitted. We’ve established the physical limits of these communication channels, but what about the limits of language? We can estimate the maximum number of words or sounds transmitted within a certain time frame, but how do we estimate how much information is being conveyed?
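
A crude first pass is to treat a sentence as a stream of symbols and measure its Shannon entropy. The sketch below (the sample message is just an illustration) scores a string in bits per character based on how its characters are distributed:

```python
import math
from collections import Counter

def char_entropy(text: str) -> float:
    """Shannon entropy of the text's character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

message = "How much information is being conveyed?"
bits = char_entropy(message)
print(f"{bits:.2f} bits/char, about {bits * len(message):.0f} bits total")
```

Measured this way, English prose comes out at roughly four bits per character; Shannon’s own later experiments with human predictors put the figure closer to one bit once longer-range structure is accounted for. Either way, the number says nothing about meaning, which is exactly the problem.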

Language and compression

You could do some data crunching and say that this blog post contains a few kilobytes of information. But what does that mean, exactly? Is the storage capacity needed to recreate the characters in this piece of text equivalent to the meaning it conveys? The answer is no, because language is inherently compressed.
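
One quick way to see that raw byte counts overstate things, even at the character level, is to run text through an off-the-shelf compressor. In this sketch (the sample text is invented for illustration), the compressed size is an upper bound on the characters’ information content, and the meaning a reader reconstructs is captured by neither number:

```python
import zlib

# A deliberately repetitive sample, standing in for ordinary redundant prose.
text = ("A sentence is never just a sentence, because it is imbued "
        "with all the additional assumptions we bring to it. ") * 5

raw = text.encode("utf-8")
packed = zlib.compress(raw, 9)  # level 9: maximum compression

print(f"raw: {len(raw)} bytes, compressed: {len(packed)} bytes")
# The compressed size bounds the characters' information from above;
# the meaning unpacked by a reader with context is in neither figure.
```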

Any communication we engage in as people draws heavily on existing experience, knowledge, and context. A sentence is never just a sentence, because it’s imbued with all the additional assumptions we bring to it. Let’s say I greet someone with the word “Hi.” Simple enough, right?

Unfortunately not, because that one word conveys information about me, about the person I’m talking to, about mood, rank, background, past interactions, and so on. I’m using “Hi” in place of other potential greetings such as “G’day,” “Hello, Fred,” “Good evening, sir,” “How do you do, your majesty,” and “Ciao!” To know why, you need context.
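
One way to make this concrete is surprisal: the information carried by a word is the negative log of its probability, and that probability depends on who is speaking to whom. The conditional distributions below are invented purely for illustration, but they show how the same “Hi” carries very different amounts of information in different contexts:

```python
import math

# Hypothetical probabilities of each greeting, conditioned on context.
# These numbers are invented for illustration only.
greeting_given_context = {
    "old friend": {"Hi": 0.6, "G'day": 0.2, "Hello, Fred": 0.2},
    "royal audience": {"Hi": 0.01, "How do you do, your majesty": 0.9,
                       "Good evening, sir": 0.09},
}

def surprisal_bits(word: str, context: str) -> float:
    """Self-information of a word under a context-conditional distribution."""
    return -math.log2(greeting_given_context[context][word])

for ctx in greeting_given_context:
    print(f"'Hi' to a {ctx}: {surprisal_bits('Hi', ctx):.1f} bits")
```

To an old friend, “Hi” is nearly a foregone conclusion and carries under a bit of information; blurted out to royalty, the same word is a six-and-a-half-bit surprise. The choice among alternatives is the signal.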

Finding informational gaps

This is why getting computers to understand language is tough. Not only do they not have the context that we do, but we ourselves struggle to understand just how much context goes into creating a simple sentence – let alone a whole conversation. So where do we even begin?

One place to start is with a generative approach. This involves taking a sentence and asking to what extent your view of the world changes as a result of comprehending it. Assuming perfect understanding, the information the sentence contains is your understanding of the world after hearing it minus your understanding before. This “gap” is the information conveyed by the sentence, and each additional sentence creates further gaps that can be measured along the way. In Bayesian terms, the gap is the difference between your posterior and prior beliefs.
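
If we are willing to write those before-and-after worldviews as probability distributions over hypotheses, the gap has a standard measure: the Kullback-Leibler divergence from prior to posterior, in bits. A minimal sketch (the hypothesis dictionaries are whatever your model supplies):

```python
import math

def information_gain_bits(prior: dict, posterior: dict) -> float:
    """Bits conveyed by updating from prior to posterior beliefs:
    the Kullback-Leibler divergence D(posterior || prior).
    Assumes every hypothesis with posterior mass also has prior mass."""
    return sum(p * math.log2(p / prior[h])
               for h, p in posterior.items() if p > 0)
```

If a sentence leaves your beliefs untouched, the divergence is zero: it told you nothing you didn’t already believe.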

Let’s say we ask a stranger what they did on their weekend, and they respond simply: “It was boring.” This response doesn’t explicitly tell us what they did, but it does communicate information about what they probably did or didn’t do. Most likely they didn’t go skydiving or swim with sharks. They probably didn’t get married or fly to France.

As we continue to talk to them, we can learn more about what their weekend involved, what made it boring, and whether their boring is the same as our boring. We update our worldview accordingly, and the gap between prior and posterior understanding shows just how much information each utterance conveyed.
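
Here is what that update might look like for the weekend exchange, as a sketch with invented hypotheses, priors, and likelihoods (none of these numbers come from data):

```python
import math

# Hypothetical prior over what the stranger did, before they say anything.
prior = {"stayed home": 0.50, "went hiking": 0.20, "went skydiving": 0.10,
         "got married": 0.10, "flew to France": 0.10}

# Invented likelihoods: how probable "it was boring" is under each hypothesis.
likelihood = {"stayed home": 0.70, "went hiking": 0.30, "went skydiving": 0.02,
              "got married": 0.02, "flew to France": 0.05}

# Bayes' rule: posterior is proportional to likelihood times prior.
unnormalized = {h: likelihood[h] * prior[h] for h in prior}
total = sum(unnormalized.values())
posterior = {h: p / total for h, p in unnormalized.items()}

# The "gap": KL divergence from prior to posterior, in bits.
gain = sum(p * math.log2(p / prior[h]) for h, p in posterior.items() if p > 0)

print(f"information gained: {gain:.2f} bits")
for h in sorted(posterior, key=posterior.get, reverse=True):
    print(f"  {h}: {prior[h]:.2f} -> {posterior[h]:.2f}")
```

Two words shift “stayed home” from a coin flip to a strong favorite and all but rule out the wedding and the skydive, which under these made-up numbers works out to roughly half a bit of information.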

Not all communication is created equal

However, we still need to allow room in our model for information density, and for information loss. Information is invariably lost in a conversation, because context can never be communicated in its totality, and with each utterance we lose a little more of it. Information density is also variable: it may take five minutes to learn about a stranger’s weekend, but no more than a smile or a raised eyebrow to determine whether your partner’s weekend was boring.

Clearly machines have their work cut out for them when it comes to understanding not just language, but the point at which language becomes meaning, and how that meaning is encoded. How can a machine learn to understand the depth of meaning built into the simplest communicative act, and how can it learn to respond appropriately?

The route that our computational implementations take will be very different from our biological one, but I think it’s inevitable that the two will inform each other, giving us a greater understanding of human communication. Advances in neuroscience will give context to algorithmic understanding, and advances in AI will provide insight into ourselves. Human communication stretches back millennia, but we’re standing on the brink of a whole new understanding of it.
