Calculating and Visualizing Offer Similarity

When a user visits a TrialPay touchpoint, they’re shown a variety of offers from our advertisement network they can complete in order to earn some virtual good for free (e.g., anti-virus software, Fandango movie tickets, or in-game virtual currency).  Our data science team has been working to determine whether certain offers are more similar than others, and whether offers can be sorted into “communities” of offers that tend to be completed by the same people, using only user clickstream data.

Visualization by Dave Holtz. Adapted from a visualization by Mike Bostock

The graphic above is a visual representation of the calculated similarity between (and subsequent clustering of) different offers in TrialPay’s advertisement network. An interactive version can be found here. Similarity between offers is calculated by constructing a completor-offer matrix. Each row in the matrix corresponds to a TrialPay offer completor, and each column corresponds to a TrialPay offer. The cosine similarity between each pair of columns is then calculated to build an offer-offer similarity matrix.
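To make the calculation concrete, here is a minimal sketch in plain Python. It is not our production code: the function name `offer_similarity` and the toy completion data are made up for illustration, and it assumes a dense 0/1 matrix small enough to loop over directly.

```python
import math

def offer_similarity(matrix):
    """Return the offer-offer cosine similarity matrix.

    `matrix` is a list of rows, one per completor; each row holds one
    0/1 entry per offer (1 = this completor completed this offer).
    """
    n_offers = len(matrix[0])
    # Pull out each offer's column as a vector over completors.
    columns = [[row[j] for row in matrix] for j in range(n_offers)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in columns]
    sim = [[0.0] * n_offers for _ in range(n_offers)]
    for i in range(n_offers):
        for j in range(n_offers):
            if norms[i] == 0 or norms[j] == 0:
                continue  # an offer nobody completed: leave similarity at 0
            dot = sum(a * b for a, b in zip(columns[i], columns[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim

# Toy data (made up): 4 completors x 3 offers.
completions = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]
similarity = offer_similarity(completions)
```

In this toy example, offers 0 and 1 share two completors and end up highly similar, while offers 1 and 2 share none and get a similarity of zero.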

A naive way to visualize this data is to construct a network diagram, in which each node corresponds to an offer and each (weighted) link corresponds to the calculated similarity between two offers. However, once the data becomes large (as TrialPay’s is), network diagrams tend to become inscrutable hairballs. “Hairballs” might make a great addition to MoMA’s latest collection of abstract art, but they won’t help you draw useful insights about user behavior!

A helpful alternative to the network diagram is representing the network as an adjacency matrix, where each cell A_ij represents a link from node i to node j. This is the approach we’ve taken above. Again, in our case nodes correspond to offers, and the links correspond to the strength of their similarity in the offer-offer matrix. The more intense the coloring of the cell, the more similar the offers.

You might have noticed that the cells in this adjacency matrix visualization are colored differently. Color corresponds to different ‘clusters’ or subcommunities of offers. Clusters are calculated using the R package igraph’s implementation of the Walktrap community-detection algorithm [1], with nsteps = 4. Clustering offers allows us to answer the question, “What larger groups of offers tend to be completed in tandem?” A logical next step is to create user profiles corresponding to the offer subcommunities. A few prominent ones present in the above dataset include:

People with wanderlust:

  • These users complete offers like Orbitz Vacation Packages, Smart Destinations, Orbitz Hotels, and Funjet Vacations. With all of that traveling, it’s a marvel that these people have time to get free online goods through TrialPay!

People who love sharing the details of their medical history

  • These users complete offers like the Gout Survey, the Rheumatoid Arthritis Survey, the Pediatric Depression Survey, and the Crohn’s Disease Survey. These are the TrialPay equivalent of that one uncle at Thanksgiving who loves to show you those fresh stitches from his gnarly surgery in October.

People who spend a lot of time around the house, but still want to have fun

  • These users complete offers like Flirty Aprons, Creative Girls Club, Annie’s Creative Painting Kit Club, and Amora Coffee. They’re probably spending a lot of time at home with the kids (hence the need to stay fueled with some Amora Coffee!). But hey, just because you’re taking care of household chores doesn’t mean you don’t want to spice things up now and again with a Flirty Apron…right? If Danny Tanner of Full House completed offers with TrialPay (and we like to think he would have, had the internet existed in the days of TGIF), he’d probably fall into this group.
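For a feel of why Walktrap groups offers like these together: short random walks on the similarity graph tend to stay inside a community, so two offers in the same community "see" the rest of the graph in nearly the same way after a few steps. The sketch below is not igraph's implementation and does not perform the full clustering. It only computes the random-walk distance described by Pons and Latapy [1] on a made-up four-offer graph, with four steps to mirror the setting above. The helper names and weights are assumptions for illustration.

```python
import math

def transition_matrix(weights):
    """Row-normalise a symmetric weight matrix into random-walk probabilities."""
    return [[w / sum(row) for w in row] for row in weights]

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def walk_distance(weights, i, j, steps=4):
    """Random-walk distance between nodes i and j after `steps` steps.

    Compares where a walker starting at i and one starting at j are likely
    to be, weighting each destination k by 1 / (weighted degree of k).
    """
    p = transition_matrix(weights)
    pt = p
    for _ in range(steps - 1):
        pt = mat_mul(pt, p)
    degree = [sum(row) for row in weights]
    return math.sqrt(sum((pt[i][k] - pt[j][k]) ** 2 / degree[k]
                         for k in range(len(weights))))

# Toy similarity graph (made up): offers 0-1 and 2-3 form two tight pairs
# joined by one weak link. The diagonal entries act as self-loops.
weights = [
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.0],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.0, 0.9, 1.0],
]
within = walk_distance(weights, 0, 1)  # two offers in the same tight pair
across = walk_distance(weights, 0, 2)  # two offers in different pairs
```

Here `within` comes out smaller than `across`. The full algorithm uses distances like these to merge the closest communities step by step.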

The interactive version of this visualization allows you to re-order the matrix’s rows and columns via the drop-down menu. Re-ordering the rows and columns of the adjacency matrix can surface fascinating insights into TrialPay user behavior. There are four ways to organize the rows and columns.

  • What’s in a name? This option sorts the rows and columns by the offer name.
  • By frequency: “What’s the frequency, Kenneth?” This option sorts the rows and columns by how many offers a node has some connection to (i.e., how many offers have non-zero similarity to this one?).
  • By cluster: “Ain’t nobody fresher than my mother***in’ clique!” This option sorts the rows and columns by their cluster / subcommunity.
  • By offer type: What’s your type? Here at TrialPay, we assign offers a type (such as cost per lead or cost per sale). This option sorts the rows and columns by type.
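Under the hood, each of these orderings is one permutation applied to both the rows and the columns. A rough sketch in Python follows. The offer names, cluster labels, and similarity values are hypothetical, and the interactive version itself is not implemented this way.

```python
def reorder(matrix, order):
    """Apply the same permutation to the rows and columns of a square matrix."""
    return [[matrix[i][j] for j in order] for i in order]

def order_by(keys):
    """Return offer indices sorted by the given per-offer keys (stable sort)."""
    return sorted(range(len(keys)), key=lambda i: keys[i])

# Hypothetical offers, clusters, and similarity values, for illustration only.
names = ["Travel B", "Survey A", "Travel A", "Survey B"]
clusters = [0, 1, 0, 1]
similarity = [
    [1.0, 0.0, 0.8, 0.0],
    [0.0, 1.0, 0.0, 0.7],
    [0.8, 0.0, 1.0, 0.1],
    [0.0, 0.7, 0.1, 1.0],
]

# "Frequency": how many other offers have non-zero similarity to this one.
frequency = [
    sum(1 for j, value in enumerate(row) if j != i and value > 0)
    for i, row in enumerate(similarity)
]

by_name = order_by(names)
by_frequency = order_by([-f for f in frequency])  # most-connected offers first
by_cluster = order_by(clusters)

clustered = reorder(similarity, by_cluster)
```

Sorting by offer type would work the same way, with a list of type labels passed to `order_by`.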

Notice that sorting by a characteristic that correlates strongly with offer similarity (e.g., cluster) produces a more highly diagonalized adjacency matrix (i.e., most non-zero values appear near the diagonal). Characteristics that don’t correlate with similarity (e.g., name) show no such trend.
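One informal way to put a number on "how diagonalized" an ordering is: take the similarity-weighted average distance of the off-diagonal cells from the diagonal. This score is an illustrative assumption, not something the visualization computes, and the similarity values below are made up.

```python
def mean_offdiagonal_distance(matrix):
    """Similarity-weighted average of |row - column| over off-diagonal cells.

    Smaller values mean the non-zero cells sit closer to the diagonal.
    """
    total_weight = 0.0
    weighted_distance = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if i != j:
                total_weight += value
                weighted_distance += value * abs(i - j)
    return weighted_distance / total_weight if total_weight else 0.0

def reorder(matrix, order):
    """Apply the same permutation to the rows and columns of a square matrix."""
    return [[matrix[i][j] for j in order] for i in order]

# Hypothetical similarity values: offers 0 and 2 form one cluster, 1 and 3 another.
similarity = [
    [1.0, 0.0, 0.8, 0.0],
    [0.0, 1.0, 0.0, 0.7],
    [0.8, 0.0, 1.0, 0.1],
    [0.0, 0.7, 0.1, 1.0],
]
original_score = mean_offdiagonal_distance(similarity)
cluster_score = mean_offdiagonal_distance(reorder(similarity, [0, 2, 1, 3]))
```

Grouping the clusters together pulls the large similarity values next to the diagonal, so `cluster_score` is lower than `original_score`.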

Even though the adjacency matrix is an effective way to visualize this information, it still doesn’t scale well to data as large as ours (thousands of offers). In order to procure a more digestible data set, the above visualization only shows offers meeting certain selection criteria pertaining to value, similarity, and reward currency.

[1] Pascal Pons, Matthieu Latapy: Computing communities in large networks using random walks
