Sunday, November 29, 2009

Mammograms, Drug Tests, and Bayes

The nation has been bombarded with hysteria about the new recommendations for mammograms for the last week or so. The real concern behind the recommendations was the high incidence of false positive results and their costs to the people involved.

As I listened to the debates, I realized two things: first, no one had any idea about the numbers involved in the analysis of the problem; and second, the same mathematics are crucial in the debate over drug testing in schools and the work place.

The crucial problem in both issues is that of false positive results in the testing: a mammogram showing that a healthy woman has breast cancer or a drug test showing that an abstinent subject has used drugs.

That problem can only be analyzed through the application of a subtle and sophisticated branch of advanced mathematics known as Bayesian analysis, or conditional probability.
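For readers who want the formal statement, the arithmetic below is nothing more than Bayes' rule for conditional probability, written out with counts instead of probabilities. In its usual form (a standard identity, not anything peculiar to this example):

\[
P(\text{user} \mid \text{positive}) \;=\;
\frac{P(\text{positive} \mid \text{user})\, P(\text{user})}
     {P(\text{positive} \mid \text{user})\, P(\text{user}) \;+\; P(\text{positive} \mid \text{non-user})\, P(\text{non-user})}
\]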

The issue, and why it is a problem, can be demonstrated through a simple example that uses nothing more complicated than addition, subtraction, and decimal arithmetic.

However, if anyone gets the urge to present this information to a school board, get a statistics or probability instructor from the local college to help you. Very few high school math or science teachers are familiar with it, and the chance of finding a school board member who knows anything about it is about the same as that of finding a penguin on Miami Beach.

 * * *

Assume that a drug test is to be given to 10,000 high school students. Before the test is given, two things are known:

1. The test is 95% accurate; that is, on average, out of every 100 tests given, 5 of the results will be wrong. (Most of the available tests are advertised as 99% accurate, but that figure is derived from rating the test in carefully controlled laboratory situations with randomized samples. In real life, with poorly trained part-time administrators, poorly handled specimen cups, bad sanitation conditions, sloppy records, uncalibrated lab equipment, and out-of-date or impure reagents, almost all of them will perform at less than a 95% level.)

2. Only 5% of the group to be tested will have used the drug in question. (In actuality, only marijuana use will have reached the 5% level -- and it presents other, different problems.) No other illegal drug is used at levels anywhere near that high.

Based on these assumptions, of the 10,000 people tested, 9,500 will be non-users and 500 will be drug users.

When the 500 drug users are tested, 5% of the test results, or 25, will be incorrect. The other 475 results will be correct. The results are:

475 true positives (drug users found to be drug users)

25 false negatives (drug users found to be non-users).

These results, by themselves, are harmless and probably acceptable. But what happens when the test is administered to the non-users?

The 9,500 tests administered to non-users will also be right in 95% of the cases, or 9,025 correct results. Wrong results will appear in the other 5%, or 475 cases. The results of testing the non-users are:

9,025 true negatives (non-users identified as non-users)

475 false positives (non-users identified as users).

Now combine the two sets of results:

475 true positives (drug users correctly identified as drug users)

475 false positives (non-users incorrectly identified as drug users).

In other words, if someone fails the drug test, she is just as likely to be a non-user as she is to have used drugs. This is the same result one would get by tossing a coin – and the coin toss costs less and doesn’t invade anyone’s privacy. Any test that can get a student kicked off the football team, forced into rehab, or even expelled from school should be more reliable than a coin toss.
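For anyone who would rather let a computer do the bookkeeping, here is a minimal Python sketch of the same calculation. It assumes nothing beyond the figures already used above (10,000 students, a 95% accurate test, 5% of students using the drug); the variable names are mine.

# Assumptions taken from the example above, not real-world measurements.
population = 10000   # students tested
accuracy = 0.95      # fraction of results the test gets right
prevalence = 0.05    # fraction of students who actually use the drug

users = population * prevalence        # 500 actual users
non_users = population - users         # 9,500 non-users

true_positives = users * accuracy             # 475 users correctly flagged
false_negatives = users * (1 - accuracy)      # 25 users missed
true_negatives = non_users * accuracy         # 9,025 non-users cleared
false_positives = non_users * (1 - accuracy)  # 475 non-users wrongly flagged

# Of everyone who fails the test, what fraction actually used the drug?
ppv = true_positives / (true_positives + false_positives)
print(ppv)   # 0.5, no better than a coin toss

Changing the accuracy or prevalence figures and re-running it shows immediately how sensitive that fifty-fifty answer is to both assumptions.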

The story of mammograms is more complex. The consequences of false negatives – undetected cancers – and of false positives – anxiety, more testing, or even surgery, chemotherapy, or radiation – are much more severe. The test itself is not a simple yes/no like the drug test; its reliability is well below 95% and depends partly on the skill and experience of the examiner. I will have to leave the full explanation to a professional statistician in another forum.
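I won't guess at the mammogram figures here, but the same arithmetic, packaged as a small function, shows why the answer gets worse as the condition being screened for becomes rarer or the test becomes less reliable. The prevalence and accuracy values tried below are purely illustrative assumptions, not clinical data, and the sketch keeps the simplification from the drug-test example that the error rate is the same for people with and without the condition.

# Illustrative sketch only; the numbers below are assumptions, not clinical figures.
def positive_predictive_value(prevalence, accuracy):
    # Probability that a person who tests positive actually has the condition,
    # assuming the test's error rate is the same in both groups.
    true_positives = prevalence * accuracy
    false_positives = (1 - prevalence) * (1 - accuracy)
    return true_positives / (true_positives + false_positives)

for prevalence in (0.05, 0.01, 0.005):
    for accuracy in (0.95, 0.90):
        print(prevalence, accuracy, round(positive_predictive_value(prevalence, accuracy), 3))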
