Archive for August, 2006

AOL Search Data Reveals a Great Deal

Thursday, August 31st, 2006

As I’m sure you’ve already heard, a research team over at AOL made a little mistake when they decided to release a three-month sample of their search log data to the academic community. Of course the dataset was pulled from their servers within a matter of days, but by that point there were mirrors of the data everywhere and it was too late.

During the week of August 6, some people in AOL’s research division decided to release a little database they had to the public. It contained a list of about 658,000 users and the Web searches each one made from March to May. If you were one of those lucky, randomly selected souls, every search term you entered was opened to the world.

AOL didn’t tell its users it could do this, nor that it was going to, and it didn’t offer anyone the opportunity to opt out. It did take a small step back from the abyss by substituting a number for the users’ screen names.

“So what?” you might say. “As long as no one knows it was me searching for ‘dwarf prostitutes in south dakota,’ what difference does it make?”

The problem is that searches aren’t anonymous, even if the screen names were withheld to protect the innocent. The New York Times proved this when it tracked down user 4417749, one Thelma Arnold of Lilburn, Ga., from her searches.

And you don’t need the resources of the Times. Even a part-time technology columnist of average intelligence can glean plenty from the database.
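To see why swapping screen names for numbers offers so little protection, consider what happens when the log is simply grouped by that number. The sketch below assumes a tab-separated layout with AnonID, Query and QueryTime columns (an assumption about the released files, not something stated in this post), and the sample rows are invented for illustration, not taken from the real dataset.

```python
import csv
import io
from collections import defaultdict

# Invented sample rows in an assumed AOL-style tab-separated layout:
# AnonID, Query, QueryTime. These are NOT real records from the dataset.
SAMPLE_LOG = (
    "AnonID\tQuery\tQueryTime\n"
    "1001\tplumbers in springfield\t2006-03-01 09:15:00\n"
    "1002\tcheap flights\t2006-03-01 10:02:00\n"
    "1001\tused cars springfield\t2006-03-02 18:40:00\n"
    "1001\tspringfield high school reunion\t2006-03-05 21:11:00\n"
)

def profiles_by_user(log_text):
    """Group every query under its numeric ID, in chronological order."""
    reader = csv.DictReader(io.StringIO(log_text), delimiter="\t")
    profiles = defaultdict(list)
    for row in reader:
        profiles[row["AnonID"]].append((row["QueryTime"], row["Query"]))
    # Timestamps in this fixed format sort correctly as plain strings.
    return {uid: [query for _, query in sorted(rows)]
            for uid, rows in profiles.items()}

if __name__ == "__main__":
    for uid, queries in profiles_by_user(SAMPLE_LOG).items():
        print(uid, queries)
```

Nothing in the output names anyone, but a cluster of queries that keep mentioning one town and one set of interests narrows the field quickly, which is the kind of trail the Times followed to user 4417749.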

Feel free to check out a few of the websites that have been built around this data set in the past few weeks:

Data Mining Used to Find New Materials

Thursday, August 31st, 2006

Researchers at MIT seem to have created a new approach for predicting crystal structures by combining data mining with quantum mechanics. They use the same data mining techniques that are employed in consumer applications such as e-commerce recommendation engines and market basket analysis.
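For readers unfamiliar with market basket analysis, its core idea is counting how often items show up together. The sketch below computes the support and confidence of rules like “customers who buy A also buy B” over a handful of made-up baskets and ranks recommendations by confidence. It illustrates the general technique only and is not the MIT team’s code.

```python
# Made-up shopping baskets for illustration only.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
]

def rule_stats(baskets, a, b):
    """Return (support, confidence) for the association rule a -> b."""
    n = len(baskets)
    with_a = sum(1 for basket in baskets if a in basket)
    with_both = sum(1 for basket in baskets if a in basket and b in basket)
    support = with_both / n  # share of all baskets holding both items
    confidence = with_both / with_a if with_a else 0.0  # estimate of P(b | a)
    return support, confidence

def recommend(baskets, item):
    """Rank other items by the confidence of item -> other, highest first."""
    others = set().union(*baskets) - {item}
    scored = []
    for other in others:
        _, conf = rule_stats(baskets, item, other)
        if conf > 0:
            scored.append((other, conf))
    # Break ties alphabetically so the output order is deterministic.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

if __name__ == "__main__":
    print(recommend(BASKETS, "bread"))
```

Here “butter” comes out on top for “bread” because it appears in two of the three baskets that contain bread.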

The MIT team preloaded the entire body of historical knowledge of crystal structures into a computer algorithm, or program, which they had designed to find correlations among the data based on the underlying rules of physics.

Harnessing this knowledge, the program then delivers a list of possible crystal structures for any mixture of elements whose structure is unknown. The team can then run that list of possibilities through a second algorithm that uses quantum mechanics to calculate precisely which structure is energetically the most stable, a standard technique in the computer modeling of materials.
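The two-stage workflow described above can be sketched roughly as follows: first shortlist candidate structure types for a mixture by how often they co-occur, in a historical table, with structure types already known for related compositions of that mixture; then keep the shortlisted candidate with the lowest energy. Everything here is hypothetical: the structure-type labels, the historical table, and especially `compute_energy`, which is a stand-in for a real quantum-mechanical calculation and just looks up placeholder numbers. The raw co-occurrence count is a simplification chosen for illustration, not the model used in the paper.

```python
from collections import Counter

# Hypothetical historical records: for each known chemical system, the set
# of structure types observed in it. Labels are invented for illustration.
HISTORY = [
    {"S1", "S2"},
    {"S1", "S2", "S3"},
    {"S1", "S4"},
    {"S2", "S3"},
]

def rank_candidates(history, known, top_k=2):
    """Score unseen structure types by co-occurrence with the known ones."""
    scores = Counter()
    for structures in history:
        overlap = len(known & structures)
        if overlap:  # this system shares at least one known structure type
            for s in structures - known:
                scores[s] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [s for s, _ in ranked[:top_k]]

# Placeholder energies (arbitrary units) standing in for a quantum-mechanical
# calculation; a real workflow would compute these, not look them up.
PLACEHOLDER_ENERGY = {"S2": -1.2, "S3": -1.5, "S4": -0.8}

def compute_energy(structure):
    """Hypothetical stand-in for an expensive energy calculation."""
    return PLACEHOLDER_ENERGY[structure]

def predict_structure(history, known, top_k=2):
    """Shortlist by data mining, then pick the lowest-energy candidate."""
    shortlist = rank_candidates(history, known, top_k)
    return min(shortlist, key=compute_energy)

if __name__ == "__main__":
    print(rank_candidates(HISTORY, {"S1"}))
    print(predict_structure(HISTORY, {"S1"}))
```

The point of the split is cost: the cheap statistical step prunes the list so that the expensive energy calculation only has to run on a few candidates.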

The research has been published in Nature Materials under the title “Predicting crystal structure by merging data mining with quantum mechanics” (Volume 5, Number 8, Pages 641-646, August 2006).