Four Score And…

We’ve been discussing the nature of randomness and how it affects communications. Last week I proposed a method of transmitting a message such that an enemy would not be able to intercept it.

Suppose you want to send a secure message from the moon. You know the enemy can listen in to whatever message you send because your beam will spread out on the way to Earth, and they likely have broken all your codes. So you signal your contact on the ground that you have a message to send. The contact can transmit on a narrow beam that cannot be intercepted by the enemy. Your contact starts to receive the message by asking, “Is the first letter A?” You respond yes or no. The contact proceeds this way guessing what the next letter in the message is and recording the correct guesses in the proper order until you signal the end of message. Questions:

  1. On the average, how many guesses will your contact have to make for each letter in your message? Assume the message is in normal conversational English and you are both native speakers.
  2. Is this a truly secure way to transmit a message? If not, how could it be broken?
  3. Would this system work with binary data instead of alphanumeric symbols?

The rather surprising answer to the first question is that a typical English speaker receiving a reasonably long message will make about 1.8 – 1.9 guesses per letter. That is, most of the time, the receiver will guess correctly what the next letter will be. Even with a primitive automated computer program doing the guessing instead of a human, the number of guesses can be held to less than four per letter. Of course these estimates are deliberately stated in a rather hazy fashion because the results for a given message will depend on many factors including the expectations of the receiver. If the receiver has a context in which to place an incoming message, then the results could be even better. For instance, if the first words of the message were “Four score and,” the receiver would likely guess next “seven years ago,” etc.

The ability to guess the next letter is enabled by the built-in redundancy in the English language, but similar results hold for other major languages because they are all solving the same problem of how to communicate between humans with an acceptable error rate. One way of reducing errors in transmission is to add redundancy. A primitive way of doing this is to simply send the same message more than once. If the errors in transmission are random and not dependent on the nature of the massage, then comparisons of the copies will allow one to throw out obvious noise.

The second question turns the situation around and deliberately tries to prevent communication be making the transmission be identical to a random sequence of yeses and nos. I’m still getting some comments on this aspect of the problem, so we will hold off on discussing it at this time.
The third question was a bit of a spoof. Suppose the message was indeed being sent one bit at a time. Then if the receiver always asked it the next bit was a “1,” then the sender would transmit the actual message with no apparent encoding. But the enemy supposedly does not know the algorithm that the receiver is using, so can the enemy rely on the inferred message? Before you answer, consider that the receiver could ask using an algorithm about the next bit such that the returned answers mimicked a false message.

Sending a message with letters is the same as sending it by bits but with the bits clumped together in a standard form. Clumping bits together facilitates various types of error detecting and even error correcting which can be much more efficient (using less bandwidth) than simple redundancy. Most information transmitted between computers is group in packets and these packets have associated with them a certain number of bits to detect and correct errors. The earliest version of this is simple parity checking which takes a single bit and can detect certain classes of likely errors.

In response to the interest my original tutorial generated, I have completely re – written and expanded it. Check out the tutorial availability through Lockergnome. The new version is over 100 pages long with chapters that alternate between discussion of the theoretical aspects and puzzles just for the fun of it. Puzzle lovers will be glad to know that I included an answers section that includes discussions as to why the answer is correct and how it was obtained. Most of the material has appeared in these columns, but some is new. Most of the discussions are expanded compared to what they were in the original column format.

[tags]decision theory,sherman e. deforest,communication,chaos,randomness[/tags]