For instance, we all watch crime shows in which DNA evidence is presented as overwhelming proof that a person accused of a crime was actually at the scene. Assume a relatively poor DNA sample is taken at a crime scene in a major city. About four hours after the crime was committed, a suspect is apprehended, and the forensic evidence matches the suspect, with only a one-in-a-million chance that a randomly chosen person would match. Taken at face value, that says the suspect is guilty, or at least was present at the event. This is pretty incriminating, but…
In a major city, people come and go all the time. We can ask how many people are reasonably within a four-hour radius of the crime scene. That is, of all the people on Earth, how many can we absolutely eliminate simply because they could not reasonably have been at the scene four hours earlier? Do the estimate as many different ways as you want, with as many different assumptions as you want. I think we can agree that 10 million people being within a four-hour radius of a given scene in a major city is reasonable. It might be higher, but probably not much lower. If we then examined each of those individuals, and the probability of matching the evidence is one in a million, we would expect about 10 people to match. It might be only eight, or it could be 12, but the most likely number is 10. Let us assume one of those 10 is the criminal. With nothing else to distinguish them, the identified suspect has a probability of one in 10 of being guilty. That is a big difference from saying that the identified suspect is guilty beyond a reasonable doubt.
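The counting argument above can be sketched in a few lines of Python. This is only an illustration under the text's own assumptions (about 10 million people within reach, a one-in-a-million match probability, and exactly one true source among those who match). The function names are made up for this example.

```python
def expected_matches(people_in_range: int, one_in: int) -> float:
    """Expected number of people in range who would match the evidence."""
    return people_in_range / one_in


def chance_suspect_is_source(people_in_range: int, one_in: int) -> float:
    """Chance that one particular matching person is the true source.

    Assumes, as the text does, that exactly one of the matching people
    is the criminal and that nothing else singles out the suspect.
    """
    matches = expected_matches(people_in_range, one_in)
    # Cap at one match so the result never exceeds a probability of 1.
    return 1.0 / max(matches, 1.0)


# Try a few population estimates, since the 10 million figure is a guess.
for people in (5_000_000, 10_000_000, 20_000_000):
    matches = expected_matches(people, 1_000_000)
    chance = chance_suspect_is_source(people, 1_000_000)
    print(f"{people:>10,} people: about {matches:.0f} matches, "
          f"chance the suspect is the source = {chance:.0%}")
```

Changing the population estimate shifts the answer, but for any plausible figure the result stays far from certainty.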
A similar issue comes up when assessing the probability that a person has a disease based on test results. Many medical tests are not as accurate as one would like. They have two ways of being in error: false negatives and false positives. But the test for HIV is particularly good. It has extremely high sensitivity and specificity. Knowing only that a person tested positive is enough for most, if not all, physicians to be certain the person is infected. Many physicians will flatly state that if you test positive, you’ve got it. No question. No room for doubt. You’ve got it. To see whether that is true with an excellent test such as the HIV test, consider two patients, one from a high-risk group and one from a low-risk group.
For both patients the parameters of the test are the same.
Sensitivity = 99.9% (probability of a positive response for an infected person)
Specificity = 99.99% (probability of a negative response for a person who is not infected)
One patient is from a low-risk group with a known infection rate of 0.01%.
The other patient is from a higher-risk group with a known infection rate of 1.5%.
What is the probability that each patient has an HIV infection, given that both get a positive response?
The higher-risk patient has a probability of about 150/151 (roughly 99.3%) of being infected, which is essentially a sure thing.
The low-risk patient has a probability of about 50% of being infected.
If there is sufficient interest, I will post the calculations that give these results. They are simple and do not involve any advanced math. Hint: assume a population of 10,000 people to be examined. What are the results?
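One way to follow the hint is to count expected true and false positives in a population of 10,000 and then take the fraction of positives that are real. The Python sketch below does this with the sensitivity, specificity, and infection rates given above. The function names are invented for illustration, and the approach is a standard application of Bayes' rule, not necessarily the exact calculation the author had in mind.

```python
def expected_positives(population, prevalence, sensitivity, specificity):
    """Return (true positives, false positives) expected in the population."""
    infected = population * prevalence
    uninfected = population - infected
    true_pos = infected * sensitivity            # infected and test positive
    false_pos = uninfected * (1 - specificity)   # not infected but test positive
    return true_pos, false_pos


def prob_infected_given_positive(prevalence, sensitivity, specificity):
    """Fraction of all positive results that come from infected people."""
    true_pos, false_pos = expected_positives(
        1.0, prevalence, sensitivity, specificity
    )
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.999    # 99.9%, from the text
SPECIFICITY = 0.9999   # 99.99%, from the text

for label, prevalence in (("low-risk", 0.0001), ("higher-risk", 0.015)):
    tp, fp = expected_positives(10_000, prevalence, SENSITIVITY, SPECIFICITY)
    p = prob_infected_given_positive(prevalence, SENSITIVITY, SPECIFICITY)
    print(f"{label}: about {tp:.2f} true and {fp:.2f} false positives "
          f"per 10,000, so P(infected | positive) = {p:.1%}")
```

In round numbers, the low-risk group yields about one true positive and one false positive per 10,000 people, while the higher-risk group yields about 150 true positives and one false positive, which is where the 50% and 150/151 figures come from.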
The point is that by knowing the population each person comes from, we have more information, and that can change the meaning of the test results greatly. The more we know, the better we can compute the probabilities, but most people do not use all the information available to make the correct decision.
The difference in the probability of infection between these two patients comes down to what we mean by random sampling, that is, which population each patient is treated as a random sample from. If we knew nothing about the patients other than the fact that they had tested positive, then we would have to use the known average infection statistics for humanity as a whole. But by adding information, such as whether either one avoids risky sexual practices, we can remove some uncertainty.