Fun With Data Mining
I think this article is making the rounds. I just saw it hit Slashdot late last night, though I pulled it out of EFFector myself.
Data Mining 101: Finding Subversives with Amazon Wishlists is an interesting article on pulling information out of Amazon wishlists. Basically, the author downloaded the first page of 260,000 wishlists and then searched through the data for “dangerous” books and keywords (dangerous, that is, if you were a paranoid government). The books included On Liberty, Fahrenheit 451, and 1984. Some of the keywords were Michael Moore, Rush Limbaugh, and Koran. You get the idea. He comes up with a list of the books people seem to be interested in, then goes on to demonstrate how easily he can find where many of these people live and even generate a map showing all of them.
And in case you were wondering, it was all fairly simple. It didn’t require any special resources of any kind. Anyone who knows how to program could do the same.
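To show just how little code the searching step takes, here is a minimal Python sketch. It is not the author's actual script: the function names, the sample wishlists, and the owner names are all made up for illustration, and the watch list simply reuses the titles and keywords mentioned above. It assumes the wishlist pages have already been downloaded and reduced to lists of titles.

```python
from collections import Counter

# Watch list built from the books and keywords named in the post.
WATCH_TERMS = [
    "on liberty", "fahrenheit 451", "1984",
    "michael moore", "rush limbaugh", "koran",
]

def flag_wishlists(wishlists, terms=WATCH_TERMS):
    """Return {owner: [matched terms]} for every wishlist that mentions a term.

    `wishlists` maps an owner name to a list of item titles (hypothetical
    structure; the real data would come from the downloaded pages).
    """
    flagged = {}
    for owner, titles in wishlists.items():
        text = " ".join(titles).lower()
        # Plain substring match: crude, and it will over-match
        # (any title containing "1984" counts as a hit).
        hits = [term for term in terms if term in text]
        if hits:
            flagged[owner] = hits
    return flagged

def term_counts(flagged):
    """Count how many flagged wishlists mention each term."""
    return Counter(term for hits in flagged.values() for term in hits)

# Made-up sample data, purely for illustration.
SAMPLE = {
    "alice": ["1984 (Signet Classics)", "The Joy of Cooking"],
    "bob": ["Gardening Basics"],
    "carol": ["On Liberty", "Fahrenheit 451"],
}

flagged = flag_wishlists(SAMPLE)
for owner, hits in flagged.items():
    print(owner, hits)
print(term_counts(flagged).most_common())
```

The only real work is a loop and a substring test. Scaling it to a few hundred thousand wishlists is mostly a matter of download time, which is the unsettling part.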
This is all pretty straightforward though. And although it is disturbing as a concept, there are two reasons why it’s even more disturbing as a reality.
First of all, the author links what he’s doing to the whole Patriot Act wiretap thing. This is where the government spies on its own citizens in order to catch terrorists. Bush had said that while there were wiretaps, they were only being used on international calls involving people who were known to have ties to terrorists. What he didn’t mention was that, apparently, data mining such as this was used to determine who should get those wiretaps. So in a way, everyone was being spied on anyway.
Secondly, the Patriot Act can keep the whole thing hidden from everyone, and the FBI seems to be considering some data mining of its own. From the article:
This is what’s possible with publicly available information, but imagine if one had access to Amazon’s entire database - which still contains every sale dating back to 1999 by the way. Under Section 215 of the Patriot Act, the FBI can require Amazon to turn over its records, without probable cause, for an “authorized investigation . . . to protect against international terrorism or clandestine intelligence activities.” Amazon is forbidden to disclose that they have turned over any records, so that you would never know that the government is keeping records of your book purchases. And obviously it is quite simple to crossreference this info with data available in other databases.
On a final note, the FBI is now hiring computer scientists to implement a project that sounds very similar to what I just did:
“Currently, the FBI is strengthening systems engineering in order to tie new systems together architecturally and ensure that standards for custom and packaged applications are enforced, and it needs engineers to accomplish this goal, the agency said.
“The FBI is also focusing on data warehousing as well as federated search technology, which allows a single search query to be deployed across a number of databases, regardless of whether those databases belong to the same protocol or platform.
“‘Warehousing has been very successful, yet enterprise extraction, translation and loading processes must be fine-tuned,’ the FBI said. ‘Data engineers are needed to model legacy databases for federated search and participate in legacy transition planning.’” (Computerworld)
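For anyone wondering what “federated search” amounts to in practice, here is a rough Python sketch of the idea: one query, sent to several separate databases, with the results gathered in one place. Everything in it is an assumption for illustration only. The two in-memory SQLite databases, the table names, the rows, and the `federated_search` helper are invented, and a real system would have to bridge different database platforms rather than two copies of the same one.

```python
import sqlite3

def make_db(table, rows):
    """Create a throwaway in-memory database with one (name, detail) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} (name TEXT, detail TEXT)")
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", rows)
    return conn

def federated_search(sources, name):
    """Run the same lookup against every source and merge the results.

    `sources` maps a label to a (connection, table_name) pair. Table names
    come from our own code, never from user input, so the f-string is safe
    in this sketch; the searched value is passed as a bound parameter.
    """
    results = []
    for label, (conn, table) in sources.items():
        cursor = conn.execute(
            f"SELECT name, detail FROM {table} WHERE name = ?", (name,)
        )
        results.extend((label, row[0], row[1]) for row in cursor.fetchall())
    return results

# Two unrelated, made-up data sets standing in for separate databases.
SOURCES = {
    "purchases": (make_db("purchases", [("carol", "On Liberty"),
                                        ("bob", "Gardening Basics")]),
                  "purchases"),
    "addresses": (make_db("addresses", [("carol", "Springfield")]),
                  "addresses"),
}

for hit in federated_search(SOURCES, "carol"):
    print(hit)
```

One name goes in, and a book purchase and a home town come back side by side. That is the cross-referencing the quoted article is worried about.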





