Sparklines: Merging Visual Data with Text
Wednesday, May 24th, 2006 | Justin Bugajski | No Comments
Take a look at this recent blog post that mentions Edward Tufte’s sparkline concept and links to a couple of other sites with information about deploying sparklines effectively.
Information Visualization Toolkit
Thursday, May 4th, 2006 | Justin Bugajski | No Comments
If you are a developer looking for a better visualization toolkit, check out Prefuse, a newly released beta described as “a Java-based toolkit for building interactive information visualization applications.” It is BSD-licensed, so it won’t break the bank for you to try it.
More insight into this particular development package can be found on Matt Stephens’ blog.
Storytelling Style for PowerPoint
Wednesday, May 3rd, 2006 | Justin Bugajski | No Comments
This story from the LA Times is about Cliff Atkinson, who runs a one-man, Los Angeles-based company called Sociable Media. His site and his work are worth checking out. While the business world might in fact be better off if PowerPoint were eliminated completely from desktops, that seems unlikely to happen.
A better approach is to learn the tools necessary to communicate effectively with this software, not forgetting that the slideshow is meant to enhance your presentation, not be your presentation. So why not use some of the same techniques that Hollywood has employed for years in producing movies, namely the 3-act storytelling structure? Something to consider the next time you are preparing yet another rambling list of bullet points and hoping that your audience will stay awake. Get them engaged by using a well-thought-out flow.
Beating Traffic
Monday, April 24th, 2006 | Justin Bugajski | No Comments
This post from Brandon Hansen’s blog describes how he analyzed his personal commute times in order to make the best use of his time on the road.
The idea is to minimize time in the car without changing his standard work schedule too much. It is clear that Brandon put a substantial amount of effort into this analysis, and his results are presented in a straightforward manner. I also found the U.S. Census report about commute times interesting, along with the other reference materials Brandon uncovered during his initial data gathering.
Keep Things Exceptional
Thursday, April 20th, 2006 | Nick Bugajski | No Comments
A recent story about a driver circumventing the traffic lights on his way to work reminds us that security does not work without monitoring. Exceptional events should be rare, which means it should be reasonable to keep track of when they occur.
In this instance the problem was solved when people noted that the same car seemed to be around when the traffic lights were behaving abnormally. There is no reason that this could not have been noticed sooner by an automated feedback system. With a proper automated system, it should be simple to note the rise in occurrence of these exception events (he was going to work every day!) and notify a person who can decide whether it is meaningful or worth further investigation. Even if this person were to decide it is not, the system could notify them again later when it becomes apparent that the events are occurring at regular intervals. And even if that is dismissed, the logging of events at least gives a person somewhere to go for investigation should they note the abnormal behavior independently of the system, as was the case in this situation.
Hopefully such feedback is part of the upgrade they mentioned would be implemented. More security does not really seem necessary, just better feedback.
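The kind of feedback described above can be sketched in a few lines of Ruby. This is only an illustration, not a description of any real traffic system: the class name ExceptionMonitor, its threshold and window_days parameters, and the strict "evenly spaced" test for regularity are all assumptions made for the example.

```ruby
require 'date'

# Hypothetical monitor: keeps a log of exceptional events and reports when
# they stop looking exceptional. Names and thresholds are illustrative.
class ExceptionMonitor
  def initialize(threshold:, window_days:)
    @threshold = threshold      # events within the window that trigger an alert
    @window_days = window_days  # size of the look-back window, in days
    @events = []                # full log, kept for later investigation
  end

  # Record the date of one exceptional event. Nothing is ever discarded.
  def record(date)
    @events << date
  end

  # True when at least `threshold` events fall within the last `window_days` days.
  def alert?(today)
    recent = @events.count { |d| (today - d).to_i.between?(0, @window_days - 1) }
    recent >= @threshold
  end

  # Crude regularity check: true when three or more events are evenly spaced.
  def regular?
    return false if @events.size < 3
    gaps = @events.sort.each_cons(2).map { |a, b| (b - a).to_i }
    gaps.uniq.size == 1
  end
end
```

A person would still make the call on whether an alert matters. The sketch only makes sure they are asked, and asked again if the pattern turns out to be regular. A real system would need a more forgiving regularity test, since a commuter skips weekends.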
An Engaging Presentation Style
Friday, April 14th, 2006 | Justin Bugajski | No Comments
How do you give a 15-minute presentation on a technical subject, and keep the audience engaged and interested? I came across this presentation by Dick Hardt, the CEO of Sxip, a software security company headquartered in Vancouver. Sxip stands for “Simple, eXtensible Identity Protocol”, and is pronounced “Skip” in case you were wondering.
What is really interesting to see is that Dick uses hundreds of slides in a 15-minute presentation, leaving each slide on the screen for no more than a couple of seconds. The slides don’t contain flashy diagrams or reams of 10pt bulleted lists; rather, with a refined simplicity, each contains only a few words or a simple picture. Investigating further, I learned that this presentation style originated with Stanford law professor Lawrence Lessig, and is known fondly as the “Lessig Method”.
While this unique approach may not be appropriate for all situations, it certainly gives us a sense of how PowerPoint can be used to effectively complement a talk, rather than replacing the talk with words that are read off the screen.
Why not give it a try, even if just for a part of your presentation next time? See if you can grab the audience the way Dick was able to!
Sun’s Open Sourced Modeling Tools
Thursday, April 13th, 2006 | Justin Bugajski | No Comments
A good bit of news out of Silicon Valley today: Sun is releasing open source UML software in a bid to compete with the IBM Eclipse development project. Having two large tech companies making significant open source contributions is an important step towards expanding the reach of analytics software.
Don’t Summarize Away Everything
Tuesday, April 11th, 2006 | Nick Bugajski | No Comments
When presenting the results of an analysis, it is very important to make sure statistics are presented along with their constraints. Leaving details out may make for an easier read, but it could very well leave the reader misinformed. Such questionable presentation of statistical information might also lead a critical reader to become prejudiced against the writer. An article about legislation to close a road in Golden Gate Park a second day each week provides an example:
The academy sees 10 percent fewer visits on Sundays than it does on Saturdays, the closed roads making the difference, Kilduff said.
Now it could be that the difference in attendance at the museum is indeed made up by the road closure. Complex problems like museum attendance usually have more than one variable, making it hard to believe, for those with some scientific background, that the number presented is accurate. Those readers unable to pick up on the simplification of a complex problem might now be under the impression that attendance at the museum on Sunday will go up at least 10% if the road is opened back up on that day.
If we assume that the statement is correct and a result of an unbiased study, only poorly stated, all that need be done to clarify it is a slight rewording:
The academy has attributed a 10 percent drop in attendance on Sundays in comparison to Saturdays to the road closures alone.
The Horrors of Poor Visualization
Thursday, April 6th, 2006 | Justin Bugajski | No Comments
Data visualization expert Howard A. Spielman wrote a recent article in BI Review Magazine that accurately describes one of the biggest problems that arises from feature-rich “business intelligence” tools that are incredibly weak at helping users communicate: poor graph design can confuse and distort your message to the point of misinformation. Simple is always better. Don’t let the tool get in the way of your message! If you take a step back and decide what you are trying to say before you begin to create your chart or graph, I promise you will have better results.
Quick ETL
Wednesday, April 5th, 2006 | Nick Bugajski | No Comments
If you have a large number of data logs that you need to process, you might consider writing a parsing script in Ruby.
A lot of server logs come in CSV files that are gzipped. Rather than going through a process involving unzipping and parsing files one at a time or configuring an ETL tool to do it for you, it may be easier and faster to just write a script that does everything you need all at once. Consider the following Ruby snippet:
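require 'zlib'
require 'csv'

# Stream rows straight from the gzipped file, with no unzip step on disk
Zlib::GzipReader.open('data.csv.gz') do |gz|
  CSV.new(gz).each do |row|
    # do something interesting! (row is an array of field strings)
  end
end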
Two lines of code and it is already time to add your business logic!
Given that most ETL tools are complicated and/or expensive, writing a simple script often seems the path of least resistance, especially when you just need to load the data to do some analysis and not set up an ongoing processing system.
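As a rough idea of what the "business logic" might look like for a one-off analysis, here is a minimal sketch that tallies the values in one column of a gzipped CSV log. The helper name count_by_column and the assumption that the file has no header row are mine, chosen only for illustration.

```ruby
require 'zlib'
require 'csv'

# Hypothetical helper: count how often each value appears in one column of
# a gzipped CSV file, streaming rows instead of loading the whole file.
# Assumes the file has no header row.
def count_by_column(path, column_index)
  counts = Hash.new(0)  # unseen values start at zero
  Zlib::GzipReader.open(path) do |gz|
    CSV.new(gz).each do |row|
      counts[row[column_index]] += 1
    end
  end
  counts
end
```

Calling count_by_column('data.csv.gz', 1) would return a hash mapping each distinct value in the second column to its number of occurrences. Fields come back as strings, so convert them before doing any arithmetic on them.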