
We all know by now that visualization, thanks to its amazing communication power, can be used to convey messages that stick in people’s minds, effectively and persuasively. This same power, however, can also be used to mislead and misinform very effectively! When techniques like non-zero baselines, scaling by area (a quadratic change representing a linear one), or bad color maps are used, it is very easy to send the wrong message to your readers (whether on purpose or out of ignorance). But how easy is it, exactly?

How easy is it to deceive people with visualization?

This is the main question of the work we recently published at ACM CHI’15, titled “How Deceptive Are Deceptive Visualizations: An Empirical Analysis of Common Distortion Techniques“. It was developed in my lab at NYU with our amazing students Anshul Vikram Pandey and Katharina Rall, and with my colleagues Meg Satterthwaite and Oded Nov.

The deception/distortion problem is of course very well known. Tufte has whole chapters on “Chartjunk” and the “Lie Factor” in “The Visual Display of Quantitative Information“. The classic “How to Lie with Statistics” has lots of interesting examples with graphs and stats. You may also want to read “How to Lie with Maps” and “How to Lie with Charts”, as well as the excellent article “Disinformation Visualization: How to lie with datavis” (hosted by the equally excellent Visualizing Advocacy).

All these sources discuss the problem in detail and provide many examples, but we were surprised by the lack of experimental work in this area. To test the deception effect, we therefore designed an experiment.

The Experiment

For the experiment we selected a series of well-known graphical distortion techniques and developed a study on Amazon Mechanical Turk with hundreds of participants to test the deception effect.

We tested techniques such as:

Truncated Axis


When the bars do not start from zero (right) the difference looks much bigger.
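To make the distortion concrete, here is a minimal sketch (my own illustration, not code from the paper) of Tufte’s “Lie Factor” applied to a truncated axis: the ratio of the change shown in the graphic to the change in the data.

```python
def lie_factor(values, baseline):
    """Tufte's Lie Factor for a bar chart whose axis starts at
    `baseline` instead of zero: visual change / data change."""
    a, b = values
    data_change = (b - a) / a                                    # change in the data
    visual_change = ((b - baseline) - (a - baseline)) / (a - baseline)
    return visual_change / data_change

# Two bars encoding 80% and 85% access to drinking water:
print(lie_factor((80, 85), baseline=0))    # honest axis: factor is 1.0
print(lie_factor((80, 85), baseline=75))   # truncated at 75: factor is 16.0
```

With the axis truncated at 75, a 6.25% difference in the data is drawn as a 100% difference in bar height, so the graphic overstates the effect sixteen-fold.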

Inverted Axis


When the axis is inverted (right) the values grow towards the bottom and thus mislead the reader.

Aspect Ratio


The aspect ratio can be manipulated to give the impression that a quantity grows much faster or slower than it actually does.

Area Mapping


When a quantity is mapped to the radius of a bubble, the perceived quantity is the bubble’s area, which grows quadratically with the radius.
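As a hedged sketch (mine, not from the paper), the standard fix is to map the value to the bubble’s area, i.e. make the radius proportional to the square root of the value:

```python
import math

def radius_for_value(value, scale=1.0):
    """Radius that makes the bubble's *area* proportional to `value`."""
    return scale * math.sqrt(value)

v1, v2 = 2.0, 4.0                            # the data ratio is 2x
naive_area_ratio = (v2 / v1) ** 2            # radius = value: areas differ by 4x
corrected = radius_for_value(v2) ** 2 / radius_for_value(v1) ** 2
print(naive_area_ratio, round(corrected, 6))  # 4.0 2.0
```

With the naive radius mapping, doubling the value quadruples the perceived size; the square-root mapping keeps the area ratio equal to the data ratio.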

How is the deceptive effect measured?

We present half of the participants with a deceptive version of the chart and the other half with a non-deceptive one. We then ask both groups the same question and measure the difference between their answers. Let me give you an example.

We show bar charts similar to those above and say that they represent the percentage of the population with access to drinking water in Willowtown and Silvatown (yes, we use fake names; see why in the paper). One version has a truncated axis; the other does not. Then we ask: “How much better are the drinking water conditions in Willowtown as compared to Silvatown?“.

The participants do not have to provide a precise number; rather, they give a rough estimate of the effect on a 5-point Likert scale ranging from “slightly better” to “substantially better“. We do the same with other charts and data, using slightly different questions, and then compare the two versions for each case. Here are the results.

Results

The results, perhaps not surprising but certainly reassuring, look like this:

effect size and confidence intervals of deception study

How do you read this? It is a comparison of the average responses with confidence intervals. The average is the dot; the confidence interval is the pair of whiskers around the dot (the confidence interval shows the range of plausible values, that is, where the dot may actually fall). Each color represents one chart type: line, bubble, bars (the paper also covers the inverted axis, but it is not shown here).

For each chart, the control, that is, the version without the deceptive effect, should lead to a smaller estimate in the response. When I ask you “How much better are the drinking water conditions in Willowtown as compared to Silvatown?”, the control chart should show that the difference is not that big. But when I show you the same data with a truncated axis, the difference between the bars looks bigger, and this should lead to a much bigger estimate. This is exactly what you can see in the chart above: for each condition, the deceptive version leads to bigger estimates. You can also see that line charts and bar charts seem to have a more dramatic effect than bubbles.
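To illustrate how such group means and intervals are computed (with made-up numbers, not the study’s actual data), here is a minimal sketch using a normal-approximation 95% confidence interval:

```python
import statistics

def mean_ci95(responses):
    """Mean of Likert responses plus the half-width of a
    normal-approximation 95% confidence interval."""
    n = len(responses)
    mean = statistics.mean(responses)
    sem = statistics.stdev(responses) / n ** 0.5   # standard error of the mean
    return mean, 1.96 * sem

# Hypothetical 5-point Likert responses (1 = "slightly better"):
control   = [2, 2, 3, 2, 1, 2, 3, 2]
deceptive = [4, 5, 4, 3, 5, 4, 4, 5]
for label, data in (("control", control), ("deceptive", deceptive)):
    m, h = mean_ci95(data)
    print(f"{label}: {m:.2f} +/- {h:.2f}")
```

When the two intervals do not overlap, as in the plot above, the deceptive version plausibly shifted the audience’s estimate rather than the gap being noise.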

The paper covers other aspects of the study. For instance, we checked whether the effect is modulated by individual differences like gender, age, and education level, but we did not find significant differences (which of course does not mean they do not exist).

Implications

What are the implications of the study? I think it very simply indicates that this effect is real and can be very large. So the next time you want to show someone that this effect is supported by evidence, you can point them to our paper. I also think this work paves the way for other potentially interesting future work. We need much more evidence-based research in visualization to back up our intuitions; this is a first step in that direction, and hopefully it will inspire other researchers to do the same.

Also, this work is a collaboration with top Human Rights experts (Prof. Satterthwaite at NYU Law and her student Katharina Rall), and I am excited to see our work having an impact on such an important community. Human Rights advocates have a constant need to communicate messages effectively, persuasively, and with little bias. This is a tiny new brick in their toolbox, and that is very exciting!

 


I often find myself asking: “What do we do this Data Visualization thing for?”. Of course, I do it mostly because it’s fun, and I bet it’s the same for you. Yet, is there a way to find some deeper meaning in it? Are there higher-level purposes we can identify? Meaning often comes from the impact one can have on other people’s lives, so here is a tentative list, off the top of my head, of how vis can impact people’s lives (feel free to add yours in the comments below).

1) To help scientists and researchers do great things with data.

This is probably the most important one for me, clearly influenced by the way I work. I often pair up with domain experts and help them make sense of complex data through visualization. My reasoning goes like this: if I provide tools for scientists to reason about complex issues, I may end up playing a role in important discoveries. I have worked with biologists and doctors who study cancer, diabetes, and other nasty diseases. I have worked with development agencies that try to eradicate poverty in the world. I have worked with climate scientists who try to understand how the climate evolves. Isn’t that great? Working with scientists is hard but also deeply satisfying. When you see that spark in their eyes, it is all worth it. I urge everyone to try. That’s a great way of doing vis.

2) To increase people’s awareness and understanding of complex issues that matter.

Here we are clearly in the data journalism realm, though not exclusively. Think about educating students in school, writing a book full of charts, or publishing a paper. With visualization we can increase people’s knowledge; isn’t that fantastic? Maybe some of them will make important life decisions as a result: choose a career path, become an activist, change school, become a doctor, who knows? This power of course comes with a lot of responsibility, but we have the ability to change people’s lives through information communication. We can have an immediate effect.

3) To provide pleasurable experiences.

Ok, let me just say it: people like visualization because it’s beautiful. Maybe not only because of that, but beauty plays a major role and we should celebrate it. What is life without beauty, after all? Data visualization has an artistic side, whether the designer’s main purpose is art or something else. It is similar to architecture, I guess: a great new building or bridge can be a pleasure to behold even if its goal is mainly functional. As creators of visuals that people use or consume, we have the opportunity to provoke emotion and pleasure. The emotion of understanding, on a visceral level, how much dedication and experience is embedded in a beautiful artifact. The emotion of realizing that behind data there may be people’s lives, and, for that reason, finding a new way to deeply connect with them. Art should be celebrated more. When done well, it goes directly to people’s hearts.

4) To increase data literacy.

And of course we can also teach people how to think with data, so that they acquire a meta-skill! Teaching visualization is deeply satisfying and has such a direct impact on people’s lives. My students are so excited when they take my course. I like to think that this is because of me, but it’s mostly because the subject is beautiful and empowering. And how about teaching kids in school? I am still waiting for that to happen.

And now it’s your turn. How do you find meaning in visualization?


Book: Statistics as Principled Argument

by Enrico on January 9, 2015

in Reviews

I just started reading Statistics as Principled Argument and I could not resist writing something about it because, simply stated, it’s awesome.

The reason I am so excited is that this is probably the first stats book I have found that focuses exclusively on the narrative and rhetorical side of statistics.

Abelson makes explicit what most people don’t seem to see, or be willing to admit: no matter how rigorous your data collection and analysis are (and, by the way, it’s very hard to be rigorous in the first place), every conclusion you draw from data is full of rhetoric.

I think this is a super important message not only for those who produce stories or arguments based on data, from scientists to journalists, but also, and above all, for the population at large. Too often people are impressed by the aura of scientific rigor and objectivity that numbers and technology provide. There is no such thing as total neutrality and objectivity. There are only credible and not-so-credible arguments.

Here are a few sentences extracted from the book I’d like to share:

“… the presentation of the inferences drawn from statistical analysis importantly involves rhetoric” …

and then on the narrative role of stats:

“Beyond its rhetorical function, statistical analysis has a narrative role. Meaningful research tells a story with some point to it, and statistics can sharpen the story.”

and on interestingness:

“I have been led to consider what kind of claims a statistical story can make, and what makes a claim interesting. Interestingness seems to have to do with changing the audience’s belief about important relationships, often by articulating circumstances in which obvious explanations of things break down”

and on the purpose of statistics:

“I have arrived at the theme that the purpose of statistics is to organize a useful argument from quantitative evidence, using a form of principled rhetoric.”

and then he brilliantly warns us that we cannot just do without the rigorousness of numbers and stats:

“The word principled is crucial. Just because rhetoric is unavoidable, indeed acceptable, in statistical presentations it does not mean that you should say anything you please.”

It looks like a great book everyone should read. I am on chapter four.

p.s. Thank you very much Alberto for suggesting the book to me in the first place and Stefania for reminding me.


I had a fantastic visit at ProPublica yesterday (thanks Alberto for inviting me and Scott for having me; you have an awesome team!) where we discussed lots of interesting things at the intersection of data visualization, literacy, statistics, and journalism. But one thing really caught my attention. Lena very patiently (thanks Lena!) showed me some of the nice visualizations she created and then asked:

How do you evaluate visualization?

How do you know if you have done things right?

Heck! This is the kind of question I should be able to answer. I did have some suggestions for her, yet I realized there are no established methodologies. This came as a bit of a surprise, since I have been organizing the BELIV Workshop on Visualization Evaluation for a long time and have been running user studies myself for quite some time now.

Yet when we are confronted with the task of evaluating visualization for communication purposes and for a wide audience, what is the best way to go? I am not aware of established practices or methodologies that address this problem. Traditionally, academic work has focused more on exploratory data analysis problems conducted by experts, or on very narrow experimental work on graphical perception.

But let’s look at the main issues and options …

1) Expert Review or User Study? This is a classic problem in usability evaluation. Should we ask an expert to look at our visualization and suggest how to improve it, or involve users and see how they perform? Both are very valid and not necessarily mutually exclusive options. Typically, expert reviews are less costly, and as such they are used in the early phases of the development process to iterate quickly on the design. User studies involve a (hopefully) representative sample of people who are exposed to the visualization, plus some sort of qualitative or quantitative data collection about their experience. The unique problem of visualization, as opposed to the more generic problem of user-interface design, is that there are not that many experts out there. Plus, the experts do not use an established methodology, so the whole process does not scale. But if you want to run user studies, your life does not get easier. User studies are a huge mess, and if you don’t have experience running them you can do lots of things wrong. Very wrong.

2) Representative Sample? Assuming you want to run a user study, what is a representative sample? Once again, I think visualization poses unique challenges here. The problem is that visual literacy is quite low in the general population, so it’s not clear what you should aim for. If you want to communicate to the layperson, you might end up not using visuals at all! But at the same time, if every agency out there plays it safe, we won’t see any progress and we cannot expect visual literacy to increase. It’s a catch-22: if we don’t use advanced graphics, people don’t learn; but if we use these visuals, people might not be able to read our message. So we are left with the question of what a representative sample is. The main question here is: representative of what? One way to go is to create a profile and recruit people matching it, or to try to cover a whole spectrum of profiles, which of course can be much more costly and time-consuming.

3) Data Collection: Benchmark Tasks or What? Ok, now we have a representative sample of our readers; how do we test our visualization with them? One might try to adopt established methods from usability evaluation, but the problem is that usability evaluation is mostly based on the concept of a “task”: I show my study participants my interface and ask them to do something with it. Is that a good method for vis? I am not sure. Communication-oriented visualization is not really about performing a specific task to achieve a well-defined goal; it is more about information transfer. How do we measure information transfer? Maybe we show the visualizations first and then ask questions afterwards, to see what information people have retained? That’s a viable approach, but it does not capture the visualization process itself, that is, what and how the user thinks during his or her interaction with the visuals. Another option is a “think-aloud” protocol: you sit next to your users and ask them to vocalize what they are thinking. This way you have direct insight into what is going on. But, once again, this is easier said than done: the way you interact with your participants, and what you ask them, when, and how, can heavily influence the outcome. So you have to be very careful there too.

There are probably many many more issues here but the common thread seems to be that while there are established methods and methodologies one may be able to adopt from traditional usability testing, visualization poses some unique challenges that are not solved yet.

On a side note, we also discussed the use of crowdsourcing platforms like Amazon Mechanical Turk to evaluate visualization. This is another viable route. It may solve the sampling problem, but it does not solve the others. Actually, the others get even more complicated when you have limited interaction with your target population.

And you? Do you have any experience doing evaluation in this area? Are there other important issues or solutions worth mentioning?

Once again thanks Scott, Alberto and Lena for the inspiring discussion that triggered this blog post. There is so much more work that needs to be done!

Take care.


I could not resist writing this short blog post after having such a nice conversation with Scott Davidoff yesterday. Scott is a manager of the Human Interfaces Group at NASA JPL, where he leads a group of people who take care of big data problems at NASA (I mean really big data, like the data coming from telescopes and space missions).

While on the phone he said:

“You know Enrico … the way I see it is that we are mechanics for scientists … the same way Formula 1 has mechanics for their cars“.

What a brilliant metaphor! Irresistible. It perfectly matches my philosophy and, at the same time, sorry to say, I think it does not match very well the way most people see vis right now.

It reminds me of “The Computer Scientist as Toolsmith“, the fantastic essay by Fred Brooks (ACM Turing Award winner) which I adopted a long time ago as my personal manifesto. Brooks advocated a different way to see the role of Computer Science (one I am sure many of my colleagues reject): as an engineering discipline whose purpose is to provide services to scientists. He famously stated that:

IA > AI (Intelligence Amplification can beat Artificial Intelligence).

That is, a machine and a mind can beat a mind-imitating machine working by itself.

And this all reminds me why I do what I do and why I think we should do more. Much more. In 2011 I was invited to Visualizing Europe, an event organized by Visualizing.org, and I gave a talk that covered pretty much the same ground: “Data Visualization is NOT Useful. It’s Indispensable“.

Talking with Scott, I realized once again how many people out there need our help. These are the people who may discover the next cure for cancer, help us get to Mars, find a way to preserve our planet, or prevent terrorist attacks and disasters, just to name a few. You may think these people already have the necessary knowledge, means, and skills to tackle big data problems on their own, but you would be wrong. These people are busy with their science, and for good reason!

All these people need us! Let me repeat it: all these people need us! It’s up to us to show them what they can do with our tools and skills. Most of them simply do not imagine how powerful some of the things we do may be for them.

Let me tell you one thing: I have collaborated with a few scientists in my career so far, and they love it when we make their lives easier. Often they are blown away by simple tricks we take for granted.

So if you are passionate about data and data visualization, I urge you to think about this: you can decide to tackle hard problems with data. You can decide to make a big difference by pairing up with people who deal with hard scientific problems and helping them make progress. It’s up to you to make this choice.

C’mon!

My biggest ambition is to be a mechanic. A mechanic for the Formula 1 of science.

And you?

 


Data Visualization or Data Interaction?

May 8, 2014

… or whatever we want to call it. Yin Shanyang writes on Twitter in response to my last post on vis as a bidirectional channel: This comment really hits a nerve with me, as I have been thinking about this issue quite a lot lately. I must confess I am no longer satisfied with the word […]


Visualization as a bidirectional channel

May 6, 2014

I am preparing a presentation for a talk I am giving next week and I have a slide I always use at the beginning that asks this question: How do we get information from the computer into our heads? This works as a motivation to introduce the idea that regardless of the data crunching power we […]


My (stupid) fear we may, one day, become irrelevant

April 1, 2014

[Be warned: this is me in a somewhat depressive state after the deep stress I have endured by submitting too many papers to VIS’14 yesterday. I hope you will forgive me. In reality I could not be more excited about what I am doing and what WE are doing as a community. Yet, I feel […]


Course Diary #3: Beyond Charts: Dynamic Visualization

March 7, 2014

This is the last lecture of the introductory part of my course, where I give a very broad (and admittedly shallow) overview of some key visualization concepts I hope will stick in my students’ heads. After talking about basic charts and high-information graphics I introduce dynamic visualization as visual representations that can change through user […]


Course Diary #2: Beyond Charts: High-Information Graphics

February 28, 2014

Hi there! We had a one week break at school as the inclement weather forced us to cancel the class last week. Here are the lecture slides from this class: Beyond Charts: High-Information Graphics. In this third lecture I have introduced the concept of “high-information graphics”, a term I have stolen from Tufte’s Visual Display […]
