July 11, 2012, 12:30 p.m.
Aggregation & Discovery

Are we stuck in filter bubbles? Here are five potential paths out

Algorithms can help, but more fundamentally, we need to figure out what we want a diverse pool of information to look like.

The filter bubble is a name for an anxiety — the worry that our personalized interfaces to the Internet will end up telling us only what we want to hear, hiding everything unpleasant but important. It’s a fabulous topic of debate, because it’s both significant and marvelously ill-defined. But to get beyond arguing, we’re going to need to actually do something. I have five proposals.

If you’re not familiar with the filter bubble argument, start with Eli Pariser’s TED talk. The basic idea is this: All of us now depend on algorithmic personalization and recommendation, such as Google’s personalized results and the Facebook news feed, which decides for us whose updates we see. But if these individually tailored filters are successful in giving us only what we want — as measured by what we click on or “like” — then maybe they’ll remove all the points of view we disagree with, all of the hard truths we’d prefer to ignore, and everything else in the world that might broaden our horizons. Stuck in our own little sycophantic universes, we’ll be isolated, only dimly aware that other people exist or that we might need to work together with them in a shared world.

Or maybe not. The hyperlink has the magical ability to expose us to something completely different in just a single click. Different people have different information needs, and without algorithmic filtering systems we’d be lost in the flood of the web. And anyway, was traditional, non-personalized news really that good at diversity?

People have been talking about the dangers of personalized algorithmic filters since the dawn of the web — here’s Jaron Lanier in 1995, and Cass Sunstein in 2002 — and we’re still talking about it. We can order another round and argue about this forever, or we can try some new things.

1. Stop speculating and start looking

When we look at how people interact on the web, what do we actually see? It’s now possible to visualize the data trails left by communities.

On Amazon, Orgnet showed that most people buy “conservative” or “liberal” books but not both, by mapping the “people who read X also read Y” recommendations. On Facebook, a 2008 analysis showed that, yes, our “friends” are more likely to agree with our political attitudes than random strangers — 17 percent more likely, to be exact. But it also showed that we tend to imagine our friends to be much more like us than they really are, thus inflating our perception of a filter bubble. On Twitter, people who tweet political terms break into left- and right-leaning social network clusters.

But these sorts of studies cannot answer questions of cause and effect. Do filtered media worlds cause the online segregation we see, or do people construct self-reinforcing filters because they already have divergent beliefs?

This is why the recent Facebook study of link sharing is so unusual: it’s a comparative experiment to determine whether seeing a link in your Facebook news feed makes you more likely to share it. Drawing from a pool of 250 million users and 73 million URLs, Facebook researchers hid certain links from the control group, experimentally removing the effect of seeing that link on Facebook. This severs one arrow in the chain of causation, which makes it possible to estimate, by comparing the two groups, the true influence of the algorithmically customized news feed on your behavior.
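As a toy illustration of that comparative logic (this is not the study’s actual code, and every number below is invented), the feed’s causal effect can be read off as a simple difference in share rates between exposed and control users:

```python
# Toy sketch of the experiment's comparative logic (invented numbers,
# not Facebook's data or code). For each URL, some users are randomly
# assigned to a control group whose feed never shows the link.

def causal_lift(shares_exposed, n_exposed, shares_control, n_control):
    """Estimate the feed's causal effect on sharing as a difference in rates."""
    p_exposed = shares_exposed / n_exposed  # share rate when the feed could show the link
    p_control = shares_control / n_control  # share rate when the link was hidden
    return p_exposed - p_control            # random assignment makes the difference causal

# Hypothetical: 900 of 1,000,000 exposed users shared, vs. 100 of 1,000,000 controls.
print(causal_lift(900, 1_000_000, 100, 1_000_000))  # 0.0008 absolute lift
```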

The results, summarized by Farhad Manjoo at Slate, aren’t a ringing validation of the filter bubble argument. Unsurprisingly, people are more likely to share links posted by close friends, where “close” is counted (algorithmically) in terms of number of recent likes, comments, etc. But because most people have many more distant friends than close friends, most of what the news feed actually influences us to share comes from weak ties, not strong ones. In other words, the news feed tends to prompt us to view and re-share information from the edges of our social network, not the center.

This study has its limitations: it tracks all URLs shared, not just “news.” Maybe we’re happy to share cat pictures from random folks, but we only trust close friends when it comes to political issues. Still, I’m holding it up as an example because it’s both empirical (it looks at the real world) and experimental (comparing to a control group lets us determine causation). It might be the largest media effects study ever undertaken, and if we’re serious about understanding the filter bubble we need more work like this.

Simultaneously, I think we also need to be studying older forms of media. It’s not enough to compare what we have now with some idealization of the past; let’s really look critically at the broadcast media era to better understand the tradeoffs now being made. There’s a strong argument that mainstream news only ever really represented the concerns of white, middle-class men, and of course it used to be much harder to consciously seek out alternative perspectives. But nobody ever talks about the “filter bubble of the 1960s.”

2. Bring curation into journalism

Editors still command attention. Every time someone subscribes in any medium, whether that’s in print or on Twitter, they are giving an editor license to direct their attention in some small way. Every time an article page includes a list of suggested stories, someone is directing attention. Editors can use this donated attention to puncture filter bubbles in ways people will appreciate.

But if there has been a decline in the power of editors to set the agenda for public discussion, maybe that’s because the world has gotten a lot bigger. A news editor has always been a sort of filter, making choices to cover particular stories and selecting their placement and prominence. But they filter only the product of their own newsroom, while many others are filtering the entire web. How can users depend on a filter who ignores most of everything?

Editors could become curators, cultivating the best work from both inside and outside the newsroom. A good curator rewards us for delegating our attentional choices to them. We still like to give this job to people instead of machines, because people are smart, creative, idiosyncratic, and above all personal. We can form a relationship with a good curator, sometimes even a two-way relationship when we can use social networks to start a conversation with them at any moment.

But traditional journalism isn’t really in this game. For a start, curation simply wasn’t possible in broadcast and print, because those media don’t have hyperlinks. News organizations tied to those media have been very slow to understand and embrace links and linking (see this, this, and this). Meanwhile, the classic “link roundup” continues to thrive as an online form, social media has created a new class of curation stars such as Maria Popova and Andy Carvin, and there are hugely popular news sources that mostly curate (BuzzFeed) or only curate (BreakingNews).

There are many possible reasons why linking and curation have not been more fully adopted by traditional news organizations, but at heart I suspect it boils down to cultural issues and anxieties about authorship. There are glorious exceptions, such as Reuters’ Counterparties, which captures what Felix Salmon and Ryan McCarthy are reading. I’d love to know what other good reporters find noteworthy; that information would be at least as valuable to me as the articles they eventually produce. I believe there’s still a vital role for human “filters,” but only if they’re willing to direct my attention to other people’s work.

3. Build better filtering algorithms

Filtering algorithms are here to stay, and we can make them better. In his book, Pariser suggests a diversity control on our news reading applications:

Alternatively, Google or Facebook could place a slider bar running from “only stuff I like” to “stuff other people like that I’ll probably hate” at the top of search results and the News Feed, allowing users to set their own balance between tight personalization and a more diverse information flow.

I really like the concept of giving users simple controls over their personalized filters, but it’s a monumental UI and technical challenge. We can throw around phrases like “my newsreader should show me more diverse viewpoints,” but it’s really hard to translate that into code, because we’re not being very specific.
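To make the difficulty concrete, here’s a minimal sketch of what such a slider might control, assuming each item already carries a personal relevance score and a novelty score. Both scores are hypothetical, and computing them well is exactly the hard part:

```python
def rank_items(items, diversity):
    """Re-rank items by blending personal relevance with novelty.

    `diversity` is the slider position in [0, 1]: 0 means "only stuff
    I like," 1 means "stuff far outside my usual reading." Each item's
    `relevance` and `novelty` scores are assumed to be precomputed.
    """
    def score(item):
        return (1 - diversity) * item["relevance"] + diversity * item["novelty"]
    return sorted(items, key=score, reverse=True)

items = [
    {"title": "Candidate gaffe roundup", "relevance": 0.9, "novelty": 0.1},
    {"title": "Polish cinema revival", "relevance": 0.2, "novelty": 0.9},
]
print([i["title"] for i in rank_items(items, diversity=0.7)])
# ['Polish cinema revival', 'Candidate gaffe roundup']
```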

The task comes down to finding an algorithmic definition of diversity, and there are several avenues we could explore. Most recommendation systems try to maximize the chance that you’ll like what you get (that is, literally “like” it, or click on it, or rate it five stars, or whatever). This is an essentially conservative approach. Instead, a filtering algorithm could continually explore the boundaries of your interests, looking for what you didn’t know you wanted. Luckily, this idea has mathematical form: We can borrow ideas from statistics and information theory and say that the algorithm should sample the space of possible items in a way that reduces uncertainty fastest.
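Here’s a rough sketch of that idea, in the style of a multi-armed bandit: model each topic’s appeal as a Beta distribution over per-topic likes and dislikes (the topics and counts below are invented), and occasionally show the topic whose appeal is most uncertain, since the reader’s reaction to it teaches the filter the most:

```python
import random

# Sketch of an "exploring" filter: usually show the reader's predicted
# favorite topic, but sometimes show the topic we know least about,
# because the reaction to it reduces our uncertainty fastest.

def beta_variance(likes, dislikes):
    """Variance of a Beta(likes+1, dislikes+1) belief about a topic's appeal."""
    a, b = likes + 1, dislikes + 1
    return (a * b) / ((a + b) ** 2 * (a + b + 1))

def pick_topic(history, explore_prob=0.3):
    """history maps topic -> (likes, dislikes)."""
    if random.random() < explore_prob:
        # Explore: the most uncertain topic is the most informative to test.
        return max(history, key=lambda t: beta_variance(*history[t]))
    # Exploit: the topic with the highest estimated appeal.
    return max(history, key=lambda t: (history[t][0] + 1) / (sum(history[t]) + 2))

history = {"us-politics": (40, 10), "beekeeping": (1, 0), "cinema": (3, 4)}
print(pick_topic(history))
```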

Reddit already uses this idea in its comment filtering system, which asks users to vote items up or down. But you can’t vote on comments you never see, which tends to trap voting-based filtering systems in a popularity feedback loop. In 2009, Reddit found a better answer: take into account the number of people who have actually voted on the comment, and “treat the vote count as a statistical sampling of a hypothetical full vote by everyone, much as in an opinion poll,” as Randall Munroe of xkcd fame explains (with pictures!). What’s really going on here is that the filtering algorithm takes into account what it doesn’t yet know about its audience, and tries to find out quickly; the math is here.
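That math is the lower bound of the Wilson score confidence interval: rank each comment by a pessimistic estimate of its true upvote fraction, given how many votes it has actually received. A sketch:

```python
from math import sqrt

def wilson_lower_bound(upvotes, total_votes, z=1.96):
    """Lower bound of the Wilson score interval for a comment's true upvote fraction.

    With few votes the bound stays low, so one lone upvote doesn't outrank
    80-up/20-down; as votes accumulate it converges to the observed fraction.
    z=1.96 corresponds to a 95% confidence interval.
    """
    if total_votes == 0:
        return 0.0
    p = upvotes / total_votes
    centre = p + z * z / (2 * total_votes)
    margin = z * sqrt(p * (1 - p) / total_votes + z * z / (4 * total_votes ** 2))
    return (centre - margin) / (1 + z * z / total_votes)

print(wilson_lower_bound(1, 1))     # ~0.21: a single upvote proves little
print(wilson_lower_bound(80, 100))  # ~0.71: more votes, a tighter estimate
```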

Another possibility is to analyze social networks to look for alternate perspectives on whatever you’re reading. If people in our personal social network all hold similar opinions, our filters could trawl for what people are reading outside of our home cluster, retrieving items which match our interests but aren’t on our social horizon. I suspect such an algorithm could be built from a mashup of document similarity and cluster analysis techniques.
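Here’s a rough sketch of that mashup, under strong simplifying assumptions: suppose a community-detection pass has already labeled each article with the social cluster that shared it (the labels and texts below are invented), and use TF-IDF cosine similarity as the document-similarity half:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Sketch: find an article similar to what I'm reading, but shared by a
# different social cluster than mine. Cluster labels are assumed to come
# from a prior community-detection step on the social graph.

articles = [
    {"text": "senate debates carbon tax bill", "cluster": "my-cluster"},
    {"text": "carbon tax fight splits senate", "cluster": "other-cluster"},
    {"text": "new sitcom premieres this fall", "cluster": "other-cluster"},
]
reading = "senate votes on the carbon tax"

vectors = TfidfVectorizer().fit_transform([reading] + [a["text"] for a in articles])
sims = cosine_similarity(vectors[0], vectors[1:]).ravel()

# Keep only items from outside my home cluster, then take the best match.
outside = [(a["text"], s) for a, s in zip(articles, sims) if a["cluster"] != "my-cluster"]
print(max(outside, key=lambda pair: pair[1]))
```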

There’s huge scope for possible filtering algorithms. But there isn’t much scope for non-engineers to experiment with them, or for engineers who don’t work at Google or Facebook. What I’d really like to see is an ecology of custom filters, a thriving marketplace for different ways of selecting what we see. Then we could curate filtering algorithms just as we curate sources! This idea seems to have been most fully articulated by digital humanities scholar Dan Cohen in his PressForward platform.

4. Don’t just filter, map

I think a lot about how to design better filters, and I always run into the same basic problem: I’m just not sure how to decide how someone’s horizons should be broadened. There is far more that is important, a far greater number of issues that really matter, than one person could possibly keep up with. So how are we to make the choice of what someone should see on any given day? I have no good answer to this. But I see another approach: Don’t try to choose for someone else. Instead, just make the possibilities clear to them.

We have no maps of the web. We have no visceral sense of its scale and richness. The great failing of search algorithms is that they only give you what you ask for, but I want a picture of the entire discoverable universe. This is the core idea behind the Overview Project, where my team and I are building a visualization system to help investigative journalists sort through huge quantities of unstructured text documents.

So what if news readers included a map of available items, and the relationships between them? Here’s what it might look like for political books:

This map, produced by Orgnet, shows the relationship of U.S. political books to one another in August 2008, as recorded in Amazon.com sales data. An arrow means that people who bought one book also tended to buy the other. It’s clear that people are living in two different worlds, with only a few bridges between them.

Now imagine a “You Are Here” marker on the map, highlighting books (or articles) that you’ve already read. Pariser argues that curiosity is the sense that you’re missing something — but we’re always missing something, and we’re always part of a community that is isolated from others. Let’s make those truths palpable in our information consumption systems. There are many concrete ways to do this; I offer some suggestions here and here.
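As a toy sketch of the data structure underneath such a marker (invented titles, networkx for the graph): the “You Are Here” set is just what you’ve already read, from which the map can show what’s nearby and what’s off your horizon entirely:

```python
import networkx as nx

# Toy "You Are Here" map: nodes are books, an edge means buyers of one also
# tended to buy the other (as in the Orgnet co-purchase data). Titles invented.

graph = nx.Graph([
    ("Book A", "Book B"), ("Book B", "Book C"),  # one cluster
    ("Book X", "Book Y"), ("Book Y", "Book Z"),  # another cluster
    ("Book C", "Book X"),                        # one of the rare bridges
])
already_read = {"Book A", "Book B"}

# Everything within two co-purchase hops of what you've read: your bubble.
nearby = set()
for book in already_read:
    nearby |= set(nx.single_source_shortest_path_length(graph, book, cutoff=2))

print("You are here:", sorted(already_read))
print("On your horizon:", sorted(nearby - already_read))
print("Off your map entirely:", sorted(set(graph) - nearby))
```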

In physical space, we can stare at a map of the world and find ourselves on it. We can tap our finger on where we came from, and realize how much we have yet to see. It is this experience that I want to replicate online.

5. Figure out what we really want

The filter bubble is a pretty abstract concept. It needs concrete examples for illustration, but nearly all of the examples offered so far have come down to a concern that the American left and right will become increasingly isolated from one another, unable to work together to solve common problems.

This is a concern, because American politicians and the public alike really have become more divided and polarized over the last several decades. But if the goal is less polarization, the filter bubble is several steps removed. Addressing polarization by addressing the filter bubble depends on a chain of big assumptions: You have to believe that the filter bubble causes or enables social segregation by politics, that exposing people to content from alternate political viewpoints will reduce political extremism, and that there isn’t some other, more direct or effective way we could deal with polarization. For example, our personal relationships change us far more than any “information” we consume, so maybe we should be talking about connecting people, not content. If we want to address political polarization, then we should start asking questions at the beginning, rather than immediately assuming that filter bubbles are the issue.

Conversely, the filter bubble concept seems like it should apply to a lot more than American politics. Pariser might imagine that a good filter gives a nice balance of liberal and conservative views, but what about more unorthodox philosophies? What about things that aren’t politics at all? Maybe a diverse filter should tell me about the environmental effects of bees, or the innovations of Polish cinema. For that matter, I haven’t heard anyone mention language bubbles, which are far more pervasive and invisible. Why don’t my filters show me more material that was translated from Chinese? In a global era, exposing different countries and cultures to each other might ultimately be a far more important goal.

These two questions have been in the background of the filter bubble discussion, but they should be central. First, what is the scope of “diversity”? Does it mean more than domestic political attitudes? Are domestic politics even the example we should be worrying about most? Second, what is it that we are trying to accomplish? How would we know if we were successful? Why do we believe that changing our filters is the best way forward? If we can’t answer these questions, then we have no basis to create better filters.

Girl-in-the-bubble photo by Mike Renlund used under a Creative Commons license.
