Last week I featured some of the strange properties of noise and how they affect probabilities of events. Probabilities are what we use to make decisions. Therefore an understanding of the nature of noise in the data that comes to us is important. Contrary to common belief, not all noise follows a gaussian distribution with a well-defined mean.

The first issue we run into when considering noise is the same issue we had with trying to extract probabilities (future actions) from statistics (past actions). How many statistical data points are necessary to ensure the derived probability is accurate to some desired value? If we poll 200 people on how they will vote, is that enough data to predict the eventual winner of an election? Suppose the election is for the mayor of a town of 1000 people. Now suppose the election is for president of the USA. What is the relative accuracy of the derived probabilities?
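To make the polling question concrete, here is a quick sketch using the standard normal approximation for the margin of error of a proportion. The function name and numbers are my own illustration; the finite-population correction is what makes a 200-person sample mean something different in a town of 1000 than in a national electorate.

```python
import math

def margin_of_error(n, p=0.5, population=None, z=1.96):
    """Approximate 95% margin of error for a poll of n respondents.

    Uses the normal approximation to the binomial. The optional
    finite-population correction matters when n is a sizeable
    fraction of the whole electorate.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se

# Polling 200 of 1000 townspeople: the correction shrinks the error.
print(round(margin_of_error(200, population=1000), 3))  # 0.062
# Polling 200 out of many millions: about +/- 7 percentage points.
print(round(margin_of_error(200), 3))  # 0.069
```

Either way, a 200-person poll cannot resolve a race closer than several percentage points, which is why close elections defeat small samples.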

In the same sense, to specify a noise distribution in a system such as the bit errors in transferring data from or to a hard drive, sufficient data must be collected. Transfer a gigabyte with no errors. Does that tell us anything about the expected distribution of bit errors?
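An error-free gigabyte does tell us something, but only an upper bound. A handy back-of-the-envelope tool here is the "rule of three": observing zero errors in n trials bounds the true error rate at roughly 3/n with 95% confidence. The function below is my own illustrative sketch of that rule.

```python
import math

def ber_upper_bound(bits_transferred, confidence=0.95):
    """Upper confidence bound on bit-error rate after zero errors.

    If no errors occur in n independent trials, -ln(1 - confidence) / n
    bounds the true error probability (about 3/n at 95% confidence).
    """
    return -math.log(1.0 - confidence) / bits_transferred

# One error-free gigabyte (8e9 bits) bounds the BER, nothing more:
print(f"{ber_upper_bound(8e9):.2e}")  # 3.74e-10
```

So an error-free gigabyte is consistent with any bit-error rate below a few parts in 1e10; it says nothing at all about the shape of the error distribution, only its overall rate.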

Collecting sufficient data is particularly important in distinguishing gaussian distributions from their bell-shaped cousins. Several types of noise commonly encountered in electronic systems follow bell-shaped curves with broader tails than a gaussian. The exact nature of the tails is important in deciding how you deal with the infrequent data points that fall a long way from the mean value. And it can get complex rather quickly. Try searching on "non-gaussian noise" and read a few of the papers that pop up.
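You can see why the tails demand so much data with a small simulation. Below I compare a gaussian against a Laplace distribution (a common heavier-tailed cousin, built here as the difference of two exponentials), both with unit variance. Near the peak they look alike; the difference lives in the rare far-out samples, and theory predicts roughly an order of magnitude more of them beyond four standard deviations for the Laplace case.

```python
import random
import statistics

random.seed(42)
N = 100_000

# Gaussian samples with unit variance.
gauss = [random.gauss(0, 1) for _ in range(N)]
# Laplace samples: difference of two exponentials with rate sqrt(2)
# has unit variance, so the two sets are directly comparable.
lap = [random.expovariate(2**0.5) - random.expovariate(2**0.5)
       for _ in range(N)]

def tail_fraction(xs, k=4):
    """Fraction of samples more than k standard deviations from zero."""
    s = statistics.pstdev(xs)
    return sum(abs(x) > k * s for x in xs) / len(xs)

print(tail_fraction(gauss))  # theory predicts about 6e-5 beyond 4 sigma
print(tail_fraction(lap))    # theory predicts about 3.5e-3 -- far more
```

With only a few thousand samples you would likely see zero extreme events from either distribution and could not tell them apart, which is exactly the data-sufficiency problem.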

Much of image processing is based on various types of noise reduction. Impulsive noise (bad pixels) cannot be easily removed by simple processes such as averaging, but a simple median filter can do a good job of reducing the effect of this type of noise while preserving sharp features such as edges. Median filters are non-linear and the mathematics of just what they do is complex, but anyone can apply a median of various sizes to an image and see the results. This works because the information in the image is spread over areas that are large compared to a single pixel. The same technique would not work to correct errors in transferring data from a hard drive.
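Anyone can indeed try this. Here is a minimal pure-Python median filter (my own sketch; real work would use an image library) applied to a tiny patch with one hot pixel and a sharp edge:

```python
import statistics

def median_filter(img, size=3):
    """Apply a size x size median filter to a 2-D list of pixel
    values, replicating edge pixels at the border."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)]
                         [min(max(x + dx, 0), w - 1)]
                      for dy in range(-r, r + 1)
                      for dx in range(-r, r + 1)]
            out[y][x] = statistics.median(window)
    return out

# A flat gray patch with one 'hot' pixel and a sharp edge at column 3.
img = [[10, 10, 10, 80, 80],
       [10, 255, 10, 80, 80],   # 255 is the bad pixel
       [10, 10, 10, 80, 80]]
out = median_filter(img)
print(out[1][1])  # 10 -- the bad pixel is replaced by the local median
print(out[1][3])  # 80 -- the edge survives intact
```

Note the contrast with averaging: a 3x3 mean would smear the 255 into its neighbors and blur the edge, while the median simply discards the outlier.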

Of course many clever people have developed ways to encode information in a digital stream that can both detect and correct errors. These range from simple parity checks to complex encoding as used to return data from deep space probes. All such techniques cost some overhead in terms of bandwidth, but the tradeoff is generally favorable. An interesting question is what is the nature of noise received from an error-correcting system when the noise gets above a certain threshold. In general these systems do not fail gracefully. They are like good tires that hold onto the road better than cheap ones, but have the habit of suddenly letting go completely when stressed too hard. Cheaper tires probably start slipping sooner, but give an unwary driver more warning. In the same way, a complex error-correcting system can suddenly start transmitting pure noise. But it was good while it lasted.
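The sudden-failure behavior is easy to demonstrate with one of the simplest correcting codes, the classic Hamming(7,4) code, which protects 4 data bits with 3 parity bits. The sketch below (my own toy implementation) corrects any single flipped bit perfectly, but two flipped bits push it past its threshold and it confidently "corrects" to the wrong answer:

```python
def hamming74_encode(d):
    """Encode 4 data bits as 7 bits with 3 parity bits (Hamming(7,4))."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]

def hamming74_decode(c):
    """Correct any single-bit error, then return the 4 data bits."""
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 1-based position of the error
    c = list(c)
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]

data = [1, 0, 1, 1]
code = hamming74_encode(data)

# One flipped bit is corrected...
bad1 = list(code); bad1[2] ^= 1
print(hamming74_decode(bad1) == data)  # True

# ...but two flipped bits are 'corrected' to the wrong codeword,
# with no indication anything went wrong.
bad2 = list(code); bad2[0] ^= 1; bad2[4] ^= 1
print(hamming74_decode(bad2) == data)  # False
```

Below its threshold the channel looks perfectly clean; one error above it, the decoder silently delivers garbage. That is the tire letting go.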

*In response to the interest my original tutorial generated, I have completely rewritten and expanded it. Check out the tutorial availability through Lockergnome. The new version is over 100 pages long with chapters that alternate between discussion of the theoretical aspects and puzzles just for the fun of it. Puzzle lovers will be glad to know that I included an answers section that includes discussions as to why the answer is correct and how it was obtained. Most of the material has appeared in these columns, but some is new. Most of the discussions are expanded compared to what they were in the original column format.*

[tags]decision theory, statistics, probability[/tags]