The dangers of the NSA's data-mining efforts

By John Bambenek

In the last few weeks, a vigorous debate on the legality of the NSA's data-mining efforts in the war on terror has raged, fueled by Edward Snowden's revelations about the previously undisclosed size and scope of U.S. intelligence-gathering efforts.

Besides the obvious privacy and Fourth Amendment concerns, there is one component that should be more thoroughly discussed: whether data mining, given its weaknesses in accuracy and its vulnerability to manipulation, is safe to use in a national security context.

Data mining has weaknesses in terms of accuracy in the results it produces. You can get to higher levels of accuracy by applying different layers of filters, sure, but some base level of inaccuracy still persists.

According to the NSA, it captures a great deal of Internet communications and phone record metadata and then applies a variety of indicators to determine the "foreignness" of the subject. In general, 51 percent confidence that a subject is foreign (essentially a coin toss) is all that is required to proceed.

You are either inside the territorial United States or you are not. Because the bar is set quite low and the technology cannot establish a subject's location with near-100 percent confidence all of the time, Americans engaged in purely domestic communications will be monitored.
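To make the coin-toss point concrete, here is a minimal sketch of what a 51 percent bar implies. It assumes, purely for illustration, that each score is a well-calibrated probability that the subject is foreign; the function name and the sample scores are hypothetical and are not drawn from any NSA system.

```python
def expected_domestic(scores, threshold=0.51):
    """Return (number flagged, expected number of those who are domestic).

    Assumption: each score is a calibrated probability that the subject
    is foreign, so (1 - score) is the chance the subject is domestic.
    """
    flagged = [p for p in scores if p >= threshold]
    return len(flagged), sum(1.0 - p for p in flagged)


# Hypothetical "foreignness" scores for eight subjects.
scores = [0.51, 0.55, 0.60, 0.75, 0.90, 0.99, 0.40, 0.30]
flagged_count, domestic_expected = expected_domestic(scores)
print(f"{flagged_count} flagged, about {domestic_expected:.2f} expected to be domestic")
```

Under these made-up numbers, six subjects clear the bar and between one and two of them would be expected to be inside the United States. The closer the scores sit to the threshold, the worse that ratio gets.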

This raises the question of how accurate data mining can be when you are looking for a more sophisticated attribute, like "Is this person a terrorist?"

To be fair, that is why ultimately communications are prioritized and then examined by a human to make those subjective calls. However, thousands of innocent and purely domestic communications were monitored and processed by the NSA. To deal with this problem, the NSA hired more analysts and the number of "privacy violations" actually increased proportionally.

The government points to a number of terrorist acts disrupted by data mining, and disrupted attacks are a good thing. The concern is how many innocent people had their privacy violated by the same operations.
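The arithmetic behind that concern can be sketched in a few lines. Every number below is a hypothetical chosen for illustration, not a figure from the NSA or any published source: a population of 300 million, 3,000 genuine targets, and a classifier that catches 99 percent of targets while wrongly flagging 1 percent of everyone else.

```python
def flagged_breakdown(population, actual_targets, true_positive_rate, false_positive_rate):
    """Split the people a classifier flags into real targets and innocents.

    All inputs are illustrative assumptions, not measured values.
    """
    innocents = population - actual_targets
    true_hits = actual_targets * true_positive_rate
    false_hits = innocents * false_positive_rate
    # Share of flagged people who really are targets.
    precision = true_hits / (true_hits + false_hits)
    return true_hits, false_hits, precision


true_hits, false_hits, precision = flagged_breakdown(
    population=300_000_000,    # hypothetical
    actual_targets=3_000,      # hypothetical
    true_positive_rate=0.99,   # hypothetical
    false_positive_rate=0.01,  # hypothetical
)
print(f"real targets flagged: {true_hits:,.0f}")
print(f"innocent people flagged: {false_hits:,.0f}")
print(f"share of flags that are real targets: {precision:.4%}")
```

With these assumed inputs, innocent people outnumber real targets among those flagged by roughly a thousand to one, even though the classifier is "99 percent accurate" in both directions. When the thing you are hunting for is rare, false alarms dominate.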

As an analogy, if the government monitored every bedroom of every married couple in America, it surely would detect many cases of domestic violence. It would also monitor far more cases of spouses in intimate relations without any indication of malfeasance. If you were in the latter camp of innocent "collateral damage," you'd have every right to be somewhat less than pleased.

Those inaccuracies exist even before a bad actor decides to try to manipulate the data. One of the most sophisticated public data-mining technologies is something most of us access every day: a search engine. Search engines work by scanning the entire Internet so that when a consumer wants to find web pages related to X, they get only web pages related to X.

If you take a general subject, you can see active attempts to "manage" search engine results. Search for "lawyers" in a search engine of your choice. Which results are listed first is generally the work of professionals who take action to make sure their clients are listed first. I know, because I run a firm that charges law firms good money to make sure they show up first in web search results.

If someone knows something about the algorithm, they can manipulate the results. Sure, the NSA's algorithms are private, but so are Google's. As more is revealed about the program, more information is available to anyone who wants to start manipulating the results.

It is known that the NSA monitors phone records and call logs. Hackers in the '80s would routinely tap into hard-line phones to make "free phone calls." With cell phone "cloning," it is possible to make phone calls from a victim's cell phone number.

It was revealed that the Drug Enforcement Administration has been using NSA data in drug investigations. If someone made enough calls from a victim's home phone, could that lead to a SWAT raid on an otherwise innocent victim? It isn't outside the range of possibility.

Data mining is still a technology in development. It is a serious thing to have someone's privacy compromised by these operations. There needs to be real examination and discussion of the risks of taking action based on potentially bad data.

John Bambenek is a computer security expert from Champaign. He is president of Bambenek Consulting, a cybersecurity firm, and a visiting lecturer in the Department of Computer Science at the University of Illinois at Urbana-Champaign. He can be reached at
