December 14, 2011

Using indexing to avoid the lie factor – Fox News Chart redesign

By in Experiments in Info Design, Not So Great Information Design

Ran across an interesting article on a graphic shown on Fox News regarding unemployment under President Obama.  Here’s the original chart:

Now, it’s probably just a data input mistake, but notice how the 8.6% on the far right isn’t exactly where it should be.  Here are a couple of horizontal lines to help you out:

It’s not really that this bothers me much; mistakes happen.  What I didn’t like was moveon.org’s redesign:

which they claim is more “honest”.  Unfortunately, they forgot about a little thing called the Lie Factor (a la Tufte).  In this particular chart, the most recent drop in unemployment appears to be about 33% as drawn, when in fact it’s only about 4.5% (a 0.4-point change on a base of 9%).
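For anyone who wants to check the arithmetic, here is a minimal sketch of the Lie Factor: the size of the effect shown in the graphic divided by the size of the effect in the data.  The 33% is an eyeball reading of the chart, and the function name is just a label I made up for illustration.

```python
def lie_factor(graphic_effect: float, data_effect: float) -> float:
    """Lie Factor: effect shown in the graphic / effect in the data."""
    return graphic_effect / data_effect


# Figures quoted in the post: the drawn drop looks like about 33%,
# while the data moved 0.4 points on a base of 9.
data_effect = 0.4 / 9.0   # relative change in the data
graphic_effect = 0.33     # apparent change, read off the chart by eye

factor = lie_factor(graphic_effect, data_effect)
print(f"data effect: {data_effect:.1%}, lie factor: {factor:.1f}")
```

A value near 1 means the graphic shows the change at its true size; with these inputs the factor comes out well above 7.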

This is a problem I run into frequently – when the chart creator is most interested in showing the change in a variable that starts at a high base. Another common example is a quality rate that varies between 99% and 100% – how do you determine where to set the Y-axis start and end points without skewing the data?

My suggestion is to think about the question you’re really trying to answer.  In the chart above, the author is most interested in the change in unemployment during the Obama administration.  The key word there is change - the relative start and end points aren’t nearly as important as whether it went up or down.  This is a great time to use indexing to eliminate the lie factor and tell an honest story.  Here’s a quick example using the data above, but indexing the unemployment rate back to January of 2009 when Obama was inaugurated:
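Indexing itself is only a division.  Here is a rough sketch, assuming a list of rates in date order with the base period first.  Only the final 8.6 comes from the chart discussed here; the other rates are placeholders for illustration, not the actual data series, and `index_to_base` is a hypothetical helper name.

```python
def index_to_base(values, base_position=0, base_value=100.0):
    """Rescale a series so the value at base_position equals base_value."""
    base = values[base_position]
    if base == 0:
        raise ValueError("cannot index to a base of zero")
    return [base_value * v / base for v in values]


# Illustrative unemployment rates in percent, oldest first.
# Placeholder values except the final 8.6, which is quoted in the post.
rates = [7.8, 9.4, 9.8, 9.0, 8.6]

indexed = index_to_base(rates)
for rate, idx in zip(rates, indexed):
    print(f"rate {rate:4.1f}%  ->  index {idx:6.1f}")
```

Because every point is expressed relative to the starting value, the Y-axis can start wherever is convenient without exaggerating the change: an index of 110 always means 10% above the base period.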

This chart tells a slightly different story than either the original or the redesign.  Obama had a rough couple of years, but in 2011 it looks like there has actually been some progress.  Still, the unemployment rate is 10% higher now than when he took office (note:  I’m not arguing causality here – my economics professors can rest easy).

This redesign isn’t perfect, and it would be difficult to show only this chart in a larger conversation about unemployment, but if you’re looking to show the change in the unemployment rate during Obama’s presidency, I would argue that this is a more effective way to do so than either the flawed original or the deceitful redesign.

August 25, 2011

Displaying Categorical Information: The Patents of Steve Jobs (via NY Times)

By in Neat Visualizations

The New York Times has really outdone itself this time, with a beautiful and highly informative look at the patents Steve Jobs has been assigned to over the past thirty years.  I’m not even as interested in the content as in the visually striking, well-organized, and easy-to-navigate interface.  To think that it was completed in the day following Steve’s resignation is equally impressive.  The categorization of patents makes the information accessible, and the images are both informative and visually engaging.  The pictures afford clicking and, as you would expect, display more information about the particular patent in question.  Finally, the summaries on the left give the appropriate amount of information to describe each category and highlight an interesting patent within the category.

I also like it because it reminds me of McMaster-Carr‘s home page, which is also a beautiful display of categorical information.

New York Times, the Patents of Steve Jobs

 

 

July 28, 2011

From the NY Times: Comparing the Deficit Reduction Plans

By in Neat Visualizations

Just a very quick post about a nice interactive graphic at nytimes.com (which happens to be, in my opinion, the leader in informative, clear, and *usually* unbiased representations of quantitative data).  It shows how the deficit reduction plans have changed over the past week and how the plans of various parties differ.

The point I want to make about this particular graphic is that no graphs were used, and I consider that a good thing.  Many publications would be tempted to add some silly pie chart showing only two numbers, or a wasteful cutesy pictograph with green dollar bills.  In this case, none of these were necessary and if present would have distracted from the simplicity and clarity of the figures.  I also appreciate how explanations are muted but in close proximity to the data they are describing.  A clean, balanced look at the chaos that is currently ensuing in Washington.

Don’t be afraid to use numbers to describe data – they can be quite effective, when there aren’t many of them.

Comparing the Deficit Reduction Plans - Interactive Graphic from NYTimes

July 10, 2011

Using Mechanical Turk to Generate Random Numbers: Episode 1 – The Jerks

By in Experiments in Info Design

Wait, what?  Who would use Amazon’s crowd-sourcing marketplace to generate random numbers?  A fine question!  Two types of people – those with a rock-solid faith in the abilities of mankind, and those who get into discussions in a Google cafe about the likelihood of certain numbers (like primes or perfect squares) being over- or under-represented in the resulting sample if you were to do such a thing.  I’ll pay you $0.025 if you correctly guess which one I am.

Why $0.025?  Well, because that’s what I paid my helpful mechanical turkers, on average, to complete this task:

“ Just send me a single random number between 1 and 100 (including numbers 1 and 100).”

If you’ve ever used Mechanical Turk, you know that it’s pretty much a breeze to set up such a task, and I was able to do so in just a couple minutes.   My first intention was to pay 1,000 people $.01 to complete the task.  So I published the request, sat back, and waited for the joyful flow of random numbers to hit my screen.

Unfortunately, I underestimated either the number of people willing to make 1 cent for 30 seconds of work or the number of people on Mechanical Turk.  After seven hours I had only received 53 responses.  After 24, I had 165.  I didn’t really feel like waiting another 8 days to get to 1,000, so I decided to add a little flavor to the experiment.

Instead of asking 1,000 people to complete the task for $.01, I would collect buckets of 100 replies, raising the reward by 1 cent for each bucket.  I had my original 165, and over the next week, submitted requests at $.02, $.03, $.04, and $.05.  I thought it would be interesting to see if the accuracy or response rate changed as I increased the payoff.

After about a week, the experiment was completed.  Here’s a timeline of the first 100 responses I got for each project request, starting on Saturday, June 04 and concluding sometime in the morning of the following Saturday (you’ll need to click through to see a larger version):

Mechanical Turk Random Number Requests - Overall Project Timeline

Response Rates

You’ll notice that the various projects took different amounts of time to get to 100 replies.  I expected at least an upward trend in response rate as the reward increased, but that wasn’t the case.  Here is a time-normalized view of the same data:

Mechanical Turk Random Number Replies, Time-Normalized

From this view, it looks like the response rate decreased as the reward increased.  This was unexpected, but clearly there are a lot of outside factors that could have impacted these rates.  I didn’t start the requests at the same time of day, I have no idea what other jobs were available for Turkers at the time of publishing, and Amazon may mess with the prioritization of projects.  But, for completeness, here is a table/plot of the average time between responses for each reward:

Mean Response Rate between Mechanical Turk Responses

So, I have absolutely no idea why this would be the case.  I’m debating ways I can run this experiment that might produce better results, such as increasing the reward increment, starting the projects simultaneously or at the same day and time of the week, etc.  Until then, I’ll have to claim that in this reward range, there is no obvious increase in the response rate when you offer to pay a bit more.
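For the curious, the average time between responses can be computed along these lines.  This is only a sketch: the timestamps are made up to show the mechanics (they are not the experiment’s data), and `mean_gap_minutes` is a name invented for this example.

```python
from datetime import datetime
from statistics import mean


def mean_gap_minutes(timestamps):
    """Average number of minutes between consecutive responses."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]
    return mean(gaps)  # needs at least two timestamps


# Hypothetical response times per reward, for illustration only.
responses_by_reward = {
    0.01: ["2011-06-04 09:00", "2011-06-04 09:10", "2011-06-04 09:40"],
    0.02: ["2011-06-05 14:00", "2011-06-05 14:30", "2011-06-05 15:30"],
}

mean_gaps = {}
for reward, stamps in responses_by_reward.items():
    parsed = [datetime.strptime(s, "%Y-%m-%d %H:%M") for s in stamps]
    mean_gaps[reward] = mean_gap_minutes(parsed)
    print(f"${reward:.2f}: {mean_gaps[reward]:.1f} min between responses")
```

Sorting first means the order in which replies were exported doesn’t matter, and working with gaps rather than raw clock times is what makes requests that started on different days comparable.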

Errors!

In the timeline charts above, you’ll notice some ugly red marks.  Yes, these are the people that somehow failed to earn their Abrahams.  Despite the seemingly simple task, about 5% of responses were incorrect.  5%!  I might expect 5% of people to incorrectly answer questions that involve some sort of mathematical operation (addition, perhaps), but to simply give me a number?   I am saddened.

After looking through the 33 replies I rejected, I have decided to group the people that submitted these replies into two groups:

  1. Those who have trouble with directions.  20 of the 33 rejected responses included somewhere between two and a thousand numbers between 1 and 100.  Not a single number, but multiple.  Maybe they thought they would get a bonus for extra effort?  Interestingly enough, not one answer came from a random number generator – most were patterns of numbers (though sadly, no Fibonacci).
  2. Jerks.  13 of the 595 people who submitted replies are jerks.  Does 12,900 look like a number between 1 and 100?  How about “23556235845101000000000000000000000000000000”?  No, it does not.  You, sir, are a jerk.  And I will not pay you!

Here is a summary of the errors by reward, for those that are interested:

Table of Mechanical Turk Response Errors

 

…sigh.

Episode 1 Wrap-Up

Well, I learned two things.  First, I didn’t really think through the set-up of my experiment very well.  But for less than $20, I can always give it another shot and I’m open to suggestions.  Second, 2.2% of Mechanical Turkers are jerks, with a fairly tight confidence interval around that number.  More or less than the real world?  We’ll never know.
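To put a number on that confidence interval, here is a rough sketch using the 13-out-of-595 figure from above.  The choice of a 95% Wilson score interval is an assumption made for this example, as is the helper name `wilson_interval`.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts quoted in the post: 13 jerk replies out of 595 submissions.
low, high = wilson_interval(13, 595)
print(f"observed: {13 / 595:.1%}, 95% interval: {low:.1%} to {high:.1%}")
```

Under these assumptions the interval runs from a little over 1% to a little under 4%.  The Wilson form is used here because the simpler normal approximation can behave poorly when the proportion is this close to zero.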

Stay tuned for Episode 2: Distributed Terror, where I’ll look at the distribution of response rates (are they Poisson-distributed?  I really hope so…) and the actual distribution of Turker responses from 1 to 100. How uniform are they?  Which numbers were most popular?  Which were shunned by the Turk community?  Can we somehow tie this all back to the golden ratio?  These and other questions will be answered.

July 7, 2011

Limit Redundant Information in Tables

By in Experiments in Info Design

There are two basic ways to present quantitative information – in a chart or in a table.  We often focus on the proper design of charts, probably because they are the more engaging of the two presentation formats.  However, table design is a fascinating subject that deals with how we can quickly perceive structure and differences between nominal values.  One of the tenets of table design is to use redundancy with intent, and eliminate redundancy that simply clutters the message.  For example, consider the table below, taken from this article on performance enhancements in Mac OS X:

Mac OS X Performance Improvements Table

Along with some relatively obvious flaws like color intensity selection, alignment, labeling, and fill, one of the biggest problems with this table is that it contains a large amount of redundant information.  In the tables above, I count about 140 distinct pieces of information (words or numbers).  Many of these words are repeated – things like the test names, the phrase “Standard Graphics Speed Tests”, etc.  Besides presenting more information than is necessary for the reader, the redundancy obfuscates the design of the experiment.  For example, it isn’t obvious that the two trials used similar tests (Cinebench 11.5, Xbench 1.3) until the reader actually reads all of the test names.  Similarly, it isn’t obvious where the differences lie either – the (2x) and (8x) for Cinebench rendering are hidden behind redundant text, making them easy to miss and inviting false assumptions (e.g. that the Mac Pro and the Macbook Pro both used 8x Cinebench rendering).

Consider the redesign below.  I was able to reduce the number of pieces of information to ~95, roughly a one-third reduction from the original table design.  I eliminated as much redundancy as I could and used spatial proximity and a logical table structure to reveal the design of the experiment – two different computers, many similar tests.  A reader can ascertain all of this information at a glance instead of reading through gobs of text.  I also took one liberty with the data – I highlighted “significant” changes as opposed to those that appeared to be just noise, drawing the reader’s attention to the tests that showed the most change.

Redesigned Table with Reduced Redundancy

When designing a table, remember that the very structure of the table is a tool that you can use to convey organization, whether it be an experiment design, parent-child relationships (notice the indented lines), or similarity.  Eliminate redundancy when it obfuscates these messages, and use it to highlight important messages (notice the redundant color selection between the title of the chart and the column labels).

June 13, 2011

Clear Data on U.S. Tax Rates from the Center for American Progress

By in Neat Visualizations

Admittedly, I haven’t checked the “honest” half of this information, but Paul White’s article about US tax rates does a great job of clearly showing a number of different data sets.  Take this example, which describes income taxes paid by wealthy Americans:

chart of taxes paid by wealthy americans

Pretty much the only thing in the chart is information – a starting year and value for each line, and an ending year and value for each line.  The purpose here is to show the trend, not clutter the graph with meaningless data.  The artist takes the additional helpful step of eliminating a legend and fixing labels in close proximity to the lines (and even uses the same color for the label and the line!).

In the following chart, the artist makes use of both hue and intensity as preattentive attributes (pdf, page 5) to draw our attention to interesting data points (the smallest, the biggest, and similar countries in dark blue, the US in bold red):

Nice chart describing corporate tax rates

Notice how quickly you can understand the point of the chart – the United States’ position in a list of corporate tax rates by country.

Overall a very nice, minimalist, and effective presentation of information.  Good way to end the weekend!

 

June 4, 2011

Concept Design Studio: Playing with SketchUp

By in General Information

I’ve been playing with Google’s SketchUp product since before Google owned it (waaaay back in 2004).  It’s a fantastic product for creating quick 3D models, made all the more powerful by the library of objects created by SketchUp users.  I don’t use it often, but every once in a while it’s a fun way to pass an hour. Yesterday I did just that to imagine what my design studio of the future would look like.

About two weeks ago I was fortunate enough to get a tour of an industrial design studio in San Francisco.  The space was incredible — a building with hundreds of years of history housing some of the most fascinating product designs of the future.  The atmosphere was dripping with creativity, innovation, and potential.  It made the Google offices I work in, while fun and inspiring themselves, drab by comparison.

I had never been in a design studio before, and afterwards  I was asked how my preconceptions about design space differed from the reality. I struggled to put my thoughts into words, but the one component I was surprised at was the spatial separation of digital and physical design space.  To me, the two have become so intertwined that I believe it makes sense to combine them into a single workflow, so prototypes can be made on a bench next to massive displays showing renderings next to whiteboards with design sketches.  Instead of seeing digital efforts as a separate workflow, intertwine them physically with the method that has been used since industrial design has existed:  sketch, prototype, refine.

Unable to clearly articulate these thoughts, I thought it would be fun to spend an hour in SketchUp throwing together my future design studio (hah!).  Some of the highlights:

  1. The space is designed into pods consisting of computer space, workspace (big tables for messing with physical media like foam board), creative space (huge whiteboards where sketches can be made or posted), and inspiration space (big LED televisions that can show renderings or cycle through photos that will inspire the designers).
  2. The designers’ computer space always faces the creative space, so they are continuously presented with the entirety of their efforts.  This allows for subconscious and conscious assimilation of the entirety of design information, and plays to the senses described in Daniel Pink’s “A Whole New Mind” – most importantly, design, story, and symphony.
  3. Windows.  Lots and lots of windows.  Sunlight makes us happy.  Happiness begets creativity.  Creativity inspires design.  Win.
  4. Some relaxing space to take a breather.  I am assuming the kitchen and meeting rooms are on a separate floor, so I’ve just tossed in a few inspiring design pieces (an Eames lounge chair, for example).

If I spent more time with it, I would do a few things differently.  First, probably lower the height of the whiteboards – I feel like the space has become a bit too closed off. Second, I might switch the external pods to be facing the windows – design should always be looking towards the world, right?  Finally, I should probably add some plants, and maybe a patio with a garden.  Nonetheless, it was really fun to see what could be done in an hour.  While it certainly won’t be winning any awards for beauty, it’s a really easy way to sketch some prototypes for your new office, house, etc.

So, what’s in your ideal studio space?

 

 

 

May 22, 2011

Data Lite: Birth of Technology via Ngram

By in Neat Visualizations

Google’s Ngram Viewer lets you see how often a word or phrase has appeared in Google’s scanned book library (full about).  Here’s a quick one with regard to technology over the last 170 years.  Kind of interesting how early on “telegraph” was mentioned, and what’s with “computer” showing up in 1900?

Google ngram about technology

May 19, 2011

Dishonesty from the Wall Street Journal

By in Not So Great Information Design

The New Republic highlights the dangers of dishonest data.  Apparently the chart  on the left below, originally accompanying this article in the Wall Street Journal (the same publication that produced this book??), has been making the rounds of conservative blogs as support for arguments against increasing taxes on the wealthy – because, clearly, all the wealth resides in the middle class.  Right?

Mother Jones does us a solid by re-designing the chart (on the right below) in a slightly more (but not completely) honest fashion (I’m assuming this is due to their lack of source data).  The problem with the original, of course, is the arbitrary selection of bucket sizes, which range from $4,000 ($1K – $5K) to $4 million ($1M – $5M).  The presentation of data should always objectively represent the underlying information.  Simply using constant bucket sizes for the histogram (say, $10k buckets) would eliminate these issues.  Except in extreme cases, use constant bucket sizes for histograms.  Anything else skews data in unexpected ways.
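To see how much the bucket choice matters, here is a small sketch that counts the same values two ways.  The incomes and the uneven bucket edges are invented for illustration; they are not the figures behind either chart, and `histogram` is just a hand-rolled helper.

```python
def histogram(values, edges):
    """Count values in each half-open bucket [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts


# Hypothetical incomes in dollars, made up for this example.
incomes = [12_000, 18_000, 25_000, 31_000, 38_000,
           44_000, 47_000, 52_000, 75_000, 140_000]

# Uneven buckets: the last one is fifteen times wider than the first.
uneven_edges = [0, 10_000, 20_000, 50_000, 200_000]
uneven_counts = histogram(incomes, uneven_edges)

# Constant $10k buckets over the same range.
even_edges = list(range(0, 210_000, 10_000))
even_counts = histogram(incomes, even_edges)

print("uneven:", uneven_counts)
print("even:  ", even_counts)
```

With the uneven edges, the widest buckets collect large counts simply because they cover more ground.  With constant $10k buckets the same ten values spread out, and the bar heights can be compared directly.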

Shame on the original designer and the Wall Street Journal for allowing this kind of crap on their site.

where the money is redesign by mother jones

May 18, 2011

Use Tree Plots Sparingly

By in Experiments in Info Design, Not So Great Information Design

Big things stand out.  So, when Asymco decided to report on the relative market and profit share in the smartphone industry, they got big.  Really big:

The dark orange represents Apple, and the remainder of the color-coding is meant to represent profitability – pink and orange are profitable, white is (presumably) break-even, and blue is a company operating at a loss.  The design in the original post has a number of flaws that make it difficult to draw conclusions from the information:

  1. Chart choice.  This is the primary flaw of the design.  A tree map-style chart was the wrong choice for this data.  Though tree charts are capable of showing the relative size of different elements, they aren’t very effective.  Humans have a difficult time determining the difference in size between two areas, particularly when those areas are not in close proximity.  For example, try to determine from the charts above who has a larger unit share, Motorola or HTC.  The rectangles are pretty much the same size, with Motorola possibly edging out HTC.  Tree maps also tend to take a lot of space to say very little.
  2. Color selection.  Two issues here.  First, the colors break conventional norms, with red representing profitability and blue representing a loss.  Typically, “in the red” refers to a loss, and black or another more positive color like green would be used to represent positive profits.  Second, there is no legend on the charts to describe which color means what.  White isn’t labeled anywhere, including the author’s explanatory text.
  3. Labels.  There is no reason not to label these large boxes with the full name of the manufacturer.  I could probably figure out what “SE” means, but I don’t want to (and neither does anyone else).  Also, what do “Diversified” and “Smart” mean?  I can pretty much figure it out, but I shouldn’t have to.
  4. Contrast and assumed proximity.  The contrast between the “Diversified” and “Smart” areas is low, and I have to assume that anything below the “SMART” label is a smartphone-only manufacturer.  Why should I make that assumption?  Tree charts don’t have to follow that convention, so the author is again taking liberties with his audience’s ability to interpret.

Here is a quick alternative, with estimated numbers since I don’t have access to the source data.  I’m not 100% sold on the side-by-side charts, but I thought I’d throw it out there and see if it stuck.  I also haven’t reduced the space of the chart as much as I’d like.  A better alternative might be a scatter plot, but my attempts yielded a fairly cluttered chart.  If I chose to reduce the information presented (like eliminating the distinction between Diversified and Smartphone-only manufacturers), a scatter plot might be very effective.

However, even in this simple redesign, notice how easy it is to draw conclusions both within a category (by units or by operating profit) and across categories.  I’ve eliminated the confusing color coding and used simple English to highlight companies without profits and to distinguish between the different types of manufacturers.  Color is only used to highlight the main point of the chart – that Apple makes a few phones but takes home most of the money.  What do you think?  How could this design be further improved?

Asymco Chart Redesign