Many customers who made the same types of phone calls as you also bombed The World Trade Center

I’m not ordinarily a defender of Bush Administration actions concerning its response to The World Trade Center attacks, but the database analysis proponent in me feels something should be clarified in the minds of most Americans. According to a recent NEWSWEEK poll, “53 percent of Americans think the NSA’s surveillance program ‘goes too far in invading people’s privacy.’” The program in question, of course, involves collecting cell phone and other telephone records and mining them for clues to possible terrorists.

The outcry, I think, is in part because when we think of phone surveillance we think of wire-tapping (or, in the case of cell phones, wireless-tapping). However, if I understand this situation correctly, the NSA used this vast database of phone call numbers (both of originators and recipients), along with call dates, times and lengths, to look for suspicious patterns that were similar to those found in known terrorists’ phone behaviors.

I know, I know. If you analyze for this type of activity, you can also find patterns in the activities of your political enemies. Imagine the blackmail potential! It could shut down Washington! (Hmmm … could the blackmail have already begun?)

But let’s assume for the moment that we could somehow shine some light on the activity, thus preventing such abuses. Is this data mining an invasion of privacy? I suspect it’s closer to the surveillance we’re all accustomed to — and appreciative of — in our quiet suburban neighborhoods.

Probable cause is a term used to justify a police officer pulling over a citizen for questioning. I would equate this database research to looking for probable cause. So how is the research done? It uses the same technique that marketers use to predict whether a consumer will like this product versus that one.

For instance, you buy a CD on Amazon, and the web site immediately says, “Other of our customers who bought that CD also purchased these.” Then it lists three or four other, often surprisingly unrelated, artists, along with their latest CDs. If you have a big enough music collection, and predictable enough tastes, you’re surprised that you already love the work of one or two of those other artists. Amazing!

Amazon, and other large marketers using this profiling, let you know in advance that they looked into their database and found those correlations (through the statement, “Other customers of ours …”). What they don’t tell you is that usually, those data relationships are — on their own — too obscure or unrelated to be recognized in any way other than by using a sophisticated statistical regression analysis.
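To make the “customers who bought this also bought” idea concrete, here is a minimal sketch of my own. The purchase data and the function name `also_bought` are made up for illustration, and a real retailer's system is certainly far more elaborate than this. It simply counts which other items turn up in the baskets of people who bought a given CD.

```python
from collections import Counter

# Hypothetical purchase histories (customer -> set of CDs bought).
# These names and baskets are invented for illustration only.
ORDERS = {
    "ann": {"cd_a", "cd_b", "cd_c"},
    "bob": {"cd_a", "cd_b"},
    "cat": {"cd_a", "cd_d"},
    "dan": {"cd_b", "cd_c"},
}


def also_bought(item, orders, top_n=3):
    """Return up to top_n (other_item, count) pairs, most frequent first,
    counted over every basket that contains `item`."""
    counts = Counter()
    for basket in orders.values():
        if item in basket:
            # Count every other item this customer bought.
            counts.update(basket - {item})
    return counts.most_common(top_n)


if __name__ == "__main__":
    for other, n in also_bought("cd_a", ORDERS):
        print(f"{n} customer(s) who bought cd_a also bought {other}")
```

Raw counts like these are the crudest version of the idea. The regression analysis I mentioned goes further by weighing how surprising each pairing is, which this sketch does not attempt.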

The same goes for this NSA action. I think a lot of Americans are concerned because they imagine an all-seeing computer is examining every single phone call they make or receive. I also suspect they are angry because now they have yet another privacy vulnerability to worry about, along with identity theft, spyware, etc.

But I suspect the process of profiling that was done by the NSA is more along the lines of the Amazon example. The predictive model takes into consideration thousands of weak correlations, possible coincidences that are only significant because, when added together, they match the behavior of known terrorists. (I would say convicted, but good ole Mr. Moussaoui is about it, and that’s an awfully small sample to try to model against! Known domestic terrorists would include the guys who died in their planes on 9/11, and made plenty of phone calls before they did.)
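Here is a rough sketch of how weak signals can add up. I have no knowledge of the NSA's actual model, so this is only an illustration: the signal names, the probabilities, and the threshold are all invented, and the method (adding up log-likelihood ratios as if the signals were independent) is a simple stand-in I chose, not a description of what anyone really runs.

```python
import math

# Hypothetical weak signals. Each maps to a pair of made-up probabilities:
# (how often it appears in the known pattern, how often it appears in everyone else).
SIGNALS = {
    "late_night_calls": (0.30, 0.20),
    "short_calls": (0.40, 0.30),
    "many_new_numbers": (0.25, 0.15),
    "calls_abroad": (0.35, 0.25),
}


def score(observed, signals=SIGNALS):
    """Sum the log-likelihood ratios of the observed signals.

    Assumes the signals are independent, which is a simplification.
    """
    return sum(math.log(signals[s][0] / signals[s][1]) for s in observed)


def flag(observed, threshold=1.0):
    """Flag a record only when the combined evidence reaches the threshold."""
    return score(observed) >= threshold


if __name__ == "__main__":
    print(flag(["late_night_calls"]))   # one weak signal alone
    print(flag(list(SIGNALS)))          # all four signals together
```

With these invented numbers, no single signal comes close to the threshold, and only several of them together push a record over it. That is the sense in which each correlation is a possible coincidence on its own.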

So, if that is the case, is this intrusive? That depends.

Is a police officer driving down your quiet residential neighborhood invading your neighborhood’s privacy when looking for probable cause to investigate a possible crime? This officer may not stop if one suspicious fact is noted about someone in your neighborhood. Maybe even two or three aren’t sufficient for probable cause. Each on its own may be too subtle, too similar to the behavior of those not breaking the law. But if there are enough suspicious facts concentrated around the behavior of, let’s say, that guy parked outside your door, then the officer will conclude the correlation is too great. The behavior and evidence surrounding that guy show too many similarities to those of convicted criminals. This behavior taken as a whole is too close to that of a burglar, let’s say.

The brain of that cop isn’t going to retain much information the next day, or even the next hour, about the non-suspicious behaviors that were observed, and in a similar way, I don’t think the NSA’s computers will be able to do much else but identify the behavior patterns they are programmed to sniff out.

Which brings me back to my original observation. How in the world did I become a defender of Bush? The answer is that the NSA, under his watch, found a non-intrusive way to comb this country for possible criminal activity. I only pray that there will now be enough judicial (and judicious) oversight to ensure that the profiling being done is for real enemies of the state, and not enemies of the administration and its incumbents.