Author Archive

Minister of Information

Wednesday, June 20th, 2007

New York Magazine has a good profile of Dr. Edward Tufte. If you are not familiar with his work, you should be. Dr. Tufte is an expert and pioneer in the field of visual communication of information and this is a nice introduction to read before you buy your copies of his fantastic books.

He keeps going on the road, selling steadily, a few gigs a month, year after year. That may be why there are 1.4 million copies of his titles in print—a staggering figure for self-publishing. (The top seller, The Visual Display of Quantitative Information, has been a reliable mover since 1983.) And at these six-and-a-half-hour presentations, the audience starts cheering when he hits the floor, clamors for their books to be signed, buys posters at the table out front. As soon as the applause stops, Tufte bolts backstage, enthusiastically draining a Corona.

Agriculture Department Exposes SSNs

Friday, April 20th, 2007

Came across an article in the New York Times describing the latest case in the growing trend of private consumer information being exposed on the internet, whether inadvertently or on purpose. Now, because of obvious concerns about identity theft, millions of government dollars will have to be spent monitoring these people’s credit reports. Even worse is the number of copies of this database that already exist completely outside the agency’s control.

The Agriculture Department said that its review of the database shows that between 100,000 and 150,000 people could be at risk.

Privacy advocates say the actions by the agencies may not be enough. The database is more than two decades old, and is used by many federal and state agencies, by researchers, by journalists and by other private citizens to track government spending. Thousands of copies of the database exist.

Information Rich Web Design

Saturday, April 14th, 2007

Dr. Tufte has posted on his blog a letter he wrote to the Executive Editor of the Washington Post, following their site’s recent redesign. In short, he delivers the Editor the following excellent instructions to be handed off to their web designer:

Make our webpage straightforward, and if possible elegant–and, no matter what, increase the amount of news available within the immediate eyespan of the viewer on the homepage. We want more of what we do well immediately visible. People come to our website for the news, not for the interface.

Edward Tufte
March 29 2007

Sage advice any site designer should heed. Click over to Dr. Tufte’s site to join in the discussion about the Post’s redesign.

Web Analytic Solution Comparison

Wednesday, April 11th, 2007

Manoj Jasra posted a very useful web analytic solution comparison on his blog recently. If you are using, or are considering using, any kind of web analytic package on your site, his collection of links is definitely worth browsing through.

It’s Official: PowerPoint Bad for Brains

Tuesday, April 10th, 2007

The Register UK reports on new research coming out of Australia which recommends doing away with PowerPoint presentations as a means to communicate information.

Anyone who’s been a victim of “death by PowerPoint” - that glazed and distant feeling that overwhelms you when some sales droid starts their presentation - will be reassured by Aussie researchers who’ve discovered biological reasons for the feeling.

Humans just don’t like absorbing information verbally and visually at the same time - one or the other is fine but not both simultaneously.

Researchers at the University of New South Wales in Australia found the brain is limited in the amount of information it can absorb - and presenting the same information in visual and verbal form - like reading from a typical PowerPoint slide - overloads this part of memory and makes absorbing information more difficult.

Professor Sweller said: “The use of the PowerPoint presentation has been a disaster. It should be ditched.

“It is effective to speak to a diagram, because it presents information in a different form. But it is not effective to speak the same words that are written, because it is putting too much load on the mind and decreases your ability to understand what is being presented.”

The theory of “cognitive load theory” suggests the memory can deal with two or three tasks for a period of a few seconds - any more than that and information starts to get lost.

Read the abstract of Professor Sweller’s work.

AOL Search Data Reveals a Great Deal

Thursday, August 31st, 2006

As I’m sure you’ve already heard, a research team over at AOL made a little mistake when they decided to release a three-month sample of their search log data to the academic community. The dataset was pulled from their servers within a matter of days, but by then mirrors of the data were everywhere and it was too late.

During the week of August 6, some people in AOL’s research division decided to release to the public a little database they had. It contained a list of about 658,000 users and the Web searches each made from March to May. If you were one of those lucky, randomly selected souls, every search term you entered was opened to the world.

AOL didn’t tell its users it could do this, nor that it was going to, and it didn’t offer anyone the opportunity to opt out. It did take a small step back from the abyss by substituting a number for the users’ screen names.

“So what?” you might say. “As long as no one knows it was me searching for “dwarf prostitutes in south dakota” what difference does it make?”

The problem is that searches aren’t anonymous, even if the screen names were withheld to protect the innocent. The New York Times proved this when it tracked down user 4417749, one Thelma Arnold of Lilburn, Ga., from her searches.

And you don’t need the resources of the Times. Even a part-time technology columnist of average intelligence can glean plenty from the database.
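To make the point concrete, here is a minimal sketch of why swapping a screen name for a number does not make a search log anonymous: every query still hangs off the same ID, so they can be regrouped into a per-user profile. The log rows, IDs, and function names below are invented for illustration and are not taken from the AOL data.

```python
from collections import defaultdict

# Hypothetical pseudonymized log rows: (user_id, query).
# These are made-up examples, not records from the AOL release.
LOG = [
    (1001, "plumber in springfield"),
    (1002, "cheap flights"),
    (1001, "springfield high school reunion"),
    (1001, "homes for sale maple street"),
    (1002, "weather tomorrow"),
]


def group_by_user(rows):
    """Collect every query issued under the same pseudonymous ID."""
    profiles = defaultdict(list)
    for user_id, query in rows:
        profiles[user_id].append(query)
    return dict(profiles)


def shared_terms(queries, min_count=2):
    """Return terms that recur across one user's queries.

    A crude stand-in for the clues (town names, surnames, streets)
    that a human reader would notice when skimming a profile.
    """
    counts = defaultdict(int)
    for query in queries:
        for term in set(query.split()):
            counts[term] += 1
    return sorted(term for term, n in counts.items() if n >= min_count)


profiles = group_by_user(LOG)
print(profiles[1001])                # all queries tied to one ID
print(shared_terms(profiles[1001]))  # recurring terms hint at a location
```

Nothing here needs special tooling; a numeric ID works as a join key exactly the way a screen name would.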

Feel free to check out a few of the websites that have been built around this data set in the past few weeks:

Data Mining Used to Find New Materials

Thursday, August 31st, 2006

An interesting combination of data mining and quantum mechanics at MIT seems to have created a new approach for predicting crystalline structures. They use the same data mining techniques that are employed in consumer applications like e-commerce shopping recommendation engines and market basket analysis.

The MIT team preloaded the entire body of historical knowledge of crystal structures into a computer algorithm, or program, which they had designed to make correlations among the data based on the underlying rules of physics.

Harnessing this knowledge, the program then delivers a list of possible crystal structures for any mixture of elements whose structure is unknown. The team can then run that list of possibilities through a second algorithm that uses quantum mechanics to calculate precisely which structure is the most stable energetically - a standard technique in the computer modeling of materials.

The latest research work has been published by Nature Materials under the title “Predicting crystal structure by merging data mining with quantum mechanics” (Volume 5, Number 8, Pages 641-646, August 2006).
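The two-stage idea described above (a data-driven step proposes candidate structures, then an energy calculation picks the most stable one) can be sketched in a few lines. This is only an illustration under loud assumptions: the element and structure labels are placeholders, the co-occurrence voting is a simple stand-in for whatever correlations the MIT program actually learns, and `toy_energy` returns made-up numbers where a real workflow would run a quantum-mechanical calculation.

```python
from collections import Counter

# Hypothetical "historical knowledge": element sets paired with the
# structure type observed for them. All labels are placeholders.
KNOWN = [
    ({"A", "B"}, "structure_1"),
    ({"A", "C"}, "structure_1"),
    ({"B", "C"}, "structure_2"),
    ({"A", "D"}, "structure_3"),
]


def suggest_candidates(elements, known, top_n=3):
    """Stage 1: rank structure types by how many elements they share
    with the query across the known data (a co-occurrence vote)."""
    votes = Counter()
    for known_elements, structure in known:
        votes[structure] += len(elements & known_elements)
    return [s for s, v in votes.most_common(top_n) if v > 0]


def toy_energy(structure):
    """Stage 2 stand-in: made-up energies. A real pipeline would
    compute these with a quantum-mechanical method instead."""
    return {"structure_1": -1.0, "structure_2": -1.5, "structure_3": -0.2}[structure]


def predict(elements, known, energy_fn):
    """Shortlist candidates from the data, then keep the lowest-energy one."""
    candidates = suggest_candidates(elements, known)
    return min(candidates, key=energy_fn)


print(suggest_candidates({"B", "C"}, KNOWN))
print(predict({"B", "C"}, KNOWN, toy_energy))
```

The design point is the division of labor: the cheap data-mining pass narrows the search so the expensive energy calculation only runs on a short list.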

Feds Sharpen Secret Tools for Data Mining

Wednesday, July 26th, 2006

Big brother may be trying to watch you, but it’s unclear how skilled he is at dealing with the petabytes of information being collected.

Data-mining systems used by intelligence agencies include:

• Hardware and software from NCR subsidiary Teradata that is capable of storing and searching databases as large as 4 million gigabytes, or twice as much information as is held in all research libraries in the USA. Teradata executive Bill Cooper won’t say what’s in the Teradata systems that intelligence agencies use, but he says their applications include searching financial transactions for signs of money laundering.

• A program designed to identify members of terrorist networks and determine the most important members of those networks. Cogito Inc., of Draper, Utah, sold the program to the National Security Agency and other intelligence agencies, company executive William Donahoo says.

• Software from Verity Inc. used by the Defense Intelligence Agency and the Department of Homeland Security. A 2004 congressional report says DIA’s Verity system includes personally identifiable information about Americans from other agencies and commercial sources.

The five data-mining programs developed under Total Information Awareness are among at least eight TIA projects that have continued since Congress killed TIA in 2003. They include four efforts to create software that searches through mountains of data for evidence of terrorists and three projects that allow intelligence analysts from many different agencies to collaborate on computer networks. A contract to pull all of the new software together into a working system also remained active until at least last year, government records show.

AI Set to Exceed Human Brain Power

Tuesday, July 25th, 2006

While the pace of advancement in machine intelligence has been slower than most had hoped, progress is being made. New approaches are needed to assimilate and understand the petabytes of information being generated by our 21st-century society. Existing methods of computing and analysis need to evolve significantly to keep up with the rising data tide; otherwise it will be all we can do to process and store the information being created, let alone glean useful knowledge from it.

Nick Bostrom, Director of the Future of Humanity Institute at the UK’s Oxford University, said that AI-inspired systems were already integral to many everyday technologies such as internet search engines, bank software for processing transactions and in medical diagnosis. “A lot of cutting edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it’s not labelled AI anymore.”

But Bostrom said that traditional “top-down” approaches to AI, in which programmers coded machines to cope with specific situations, were being supplemented by “bottom-up” systems inspired by enhanced understanding of the neural networks of the brain, leading to more subtle forms of AI.

“The more we discover how the human brain achieves intelligence the more we’ll be able to use the same computational architecture and logarithms in computers,” said Bostrom.

Analysis is Not Evil

Wednesday, June 28th, 2006

This article raises an important point about the negative connotation the term “data mining” often has for people. It stems from users’ past experience with data mining tools that were ineffectual, difficult to use, and produced results that were more abstract than actionable.

Linda Koontz, information management issues director at the Government Accountability Office, said some agencies she interviewed about programs that mine data refuse to identify their programs as such.

“Different people sometimes mean different things by the term data mining,” she said. “There isn’t one definition that everyone agrees with. A lot of people feel aversion to using the word ‘data mining’ because they think that casts a negative pall over what they are doing.”

GAO defines data mining as the application of database technology and techniques to uncover hidden patterns and subtle relationships in data and infer rules that allow for the prediction of future results. Koontz said she doesn’t understand why data mining has a negative connotation. “Analysis is not evil,” she said.

Read more about the CDC’s BioSense initiative