Posts Tagged ‘realtime data’

As we wrote in our last post, Gnip co-sponsored the 2011 Dreamforce Hackathon, where teams of developers from all over the world competed for the top three overall cash prizes as well as prizes in multiple categories.  Our very own Rob Johnson (@robjohnson), VP of Product and Strategy, helped judge the entries, selecting the Enterprise Mood Monitor as winner of the Gnip category.

The Enterprise Mood Monitor pulls in data from a variety of social media sources, including the Gnip API, to provide realtime and historical information about the emotional health of a company’s employees. It shows both individual and overall company emotional climate over time and can send an SMS message to a manager when the mood level drops below a threshold. In addition, HR departments can use this data to track employee morale and satisfaction over time, eliminating the need for standard employee satisfaction surveys. This mood analysis data can also be correlated with business metrics such as Sales and Support KPIs to identify drivers of business performance.

Pretty cool stuff.

The three developers from Comity Designs behind this idea (Shamil Arsunukayev, Ivan Melnikov and Gaziz Tazhenov) set out to create a cloud app for the social enterprise built on one of Salesforce’s platforms. They spent two days brainstorming the possibilities before diving into two days of rigorous coding. The result was the Enterprise Mood Monitor, built on the Force.com platform using Apex, Visualforce, and the following technologies: Facebook API (Graph API), Twitter API, Twitter Sentiment API, LinkedIn API, Gnip API, Twilio, Chatter, and Google Visualization API. The team entered the Enterprise Mood Monitor into the Twilio and Gnip categories. We would like to congratulate the guys on their “double-dip” win: they took third place overall and won the Gnip category prize!

Have a fun and creative way you’ve used data from Gnip? Drop us an email or give us a call at 888.777.7405 and you could be featured in our next blog post.

While Steve Jobs’ resignation yesterday had investors anxiously watching how $AAPL fared in trading, we at Gnip were having fun watching a different ticker: the realtime Twitter feed.

As you can see from the graph below (these represent the number of “Steve Jobs” mentions per minute*), Twitter showed an incredible spike almost immediately. Apple-specific activity peaked 11 minutes after the news broke, showing how quickly the word spread. Honors for first tweet go to @AronPinson, who must have some blazing fast typing skills.

Once again, it’s incredible to see how social media is quickly becoming a trusted means of accessing and delivering realtime information.

*For more details on how we conducted this search across the millions of real-time tweets we have access to, contact us!

It’s been a volatile few weeks for the markets, and today especially: the Dow closed down 635 points, the S&P down 80, and the NASDAQ down 175. While there’s no shortage of opinions on whether and how the market will recover, one thing is certain: having the right data to make decisions is more important than ever.

Part of the reason is that the markets are clamoring for trends: definitive information on stock trends and market sentiment. That’s why it’s exciting to see how our finance clients are using Gnip’s realtime social media data feeds. In a time of increased volatility, our hedge fund (and other buy-side) clients are leveraging social media data as a new source of analysis and trend identification. With the number of users and Tweets per day growing constantly, Twitter is exploding, and market-leading funds are looking at the data we provide as a way to tap more accurately into the voice of the market. They’re looking at overall trend data from millions of Tweets to predict consumer sentiment, as well as researching specific securities based on what’s being said about them online.

While the early-adopters of this data have been funds, this type of analysis is available to individuals as well. Check out some start-ups doing incredible things at the intersection of finance and social media:

  • Wall Street Birds is a service that allows average investors to make investment decisions based on the analysis of social media data
  • Centigage provides analytics and intelligence designed to enable financial market participants to use social media in their investment decision-making process
  • SNTMNT offers an online tool that gives daily insights into online consumer sentiment surrounding 25 AEX funds and the index itself

For the first time in history, access to (literally) millions of voices is at our fingertips. As the market continues its volatility, those voices gain resonance as a source of pertinent information.

The social ecosystem has become the pulse of the world. From delivering breaking news like the death of Osama Bin Laden before it hit mainstream media to helping President Obama host the first Twitter Town Hall, the realtime social web is flooded with valuable information just waiting to be analyzed and acted upon. With millions of users and billions of social activities passing through the ever-growing realtime social web each day, it is no wonder that companies need to reevaluate their traditional business models to take advantage of this valuable data.

But as the social web keeps growing exponentially, massive amounts of data are pouring into and out of social media publishers’ websites and APIs every second. In a talk I gave at GlueCon a couple of months ago, I ran through some math to put things into perspective. The numbers are a little dated, but the impact is the same. At that time there were approximately 155,000,000 Tweets per day, and the average size of a Tweet was approximately 2,500 Bytes (keep in mind this could include Retweets).

A Little Bit of Arithmetic

155,000,000 Tweets/day × 2,500 Bytes/Tweet = 387,500,000,000 Bytes/day

387,500,000,000 Bytes/day ÷ 24 hours/day = 16,145,833,333 Bytes/hour

16,145,833,333 Bytes/hour ÷ 60 minutes/hour = 269,097,222 Bytes/minute

269,097,222 Bytes/minute ÷ 60 seconds/minute = 4,484,953 Bytes/second

4,484,953 Bytes/second ÷ 1,048,576 Bytes/Megabyte ≈ 4.28 Megabytes/second

And in terms of data transfer rates . . .

1 Megabyte/second = 8 Megabits/second

So . . .

4.28 Megabytes/second × 8 Megabits/Megabyte ≈ 34.2 Megabits/second
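The same arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes the figures quoted above (155,000,000 Tweets/day, 2,500 Bytes/Tweet) and binary megabytes (1,048,576 Bytes); the function name `tweet_stream_bandwidth` is purely illustrative. It computes a daily average, so bursts would run well above it.

```python
def tweet_stream_bandwidth(tweets_per_day: int, bytes_per_tweet: int) -> dict:
    """Convert a daily Tweet volume into average sustained transfer rates.

    Uses binary units (1 megabyte = 1,048,576 bytes) and 8 bits per byte,
    matching the arithmetic above. Hypothetical helper for illustration.
    """
    seconds_per_day = 24 * 60 * 60
    bytes_per_day = tweets_per_day * bytes_per_tweet
    bytes_per_second = bytes_per_day / seconds_per_day
    megabytes_per_second = bytes_per_second / 1_048_576
    megabits_per_second = megabytes_per_second * 8
    return {
        "bytes_per_day": bytes_per_day,
        "bytes_per_second": bytes_per_second,
        "megabytes_per_second": megabytes_per_second,
        "megabits_per_second": megabits_per_second,
    }


rates = tweet_stream_bandwidth(155_000_000, 2_500)
print(f"{rates['megabytes_per_second']:.2f} MB/s, "
      f"{rates['megabits_per_second']:.1f} Mbit/s")  # 4.28 MB/s, 34.2 Mbit/s
```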

That’s a Lot of Data

So what does this mean for data consumers, the companies wanting to reevaluate their traditional business models to take advantage of vast amounts of Twitter data? At Gnip we’ve learned that some common industry data processing tools simply don’t work at this scale: out-of-the-box HTTP servers and configurations aren’t sufficient to move the data, TCP stacks with default configurations can’t deliver this much data, and consumption via typical synchronous GET request handling isn’t applicable. So we’ve built our own proprietary data handling mechanisms to capture and process massive amounts of realtime social data for our clients.
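Gnip’s own delivery mechanisms aren’t described here, but the general difference between streaming consumption and a synchronous GET can be sketched. Instead of waiting for a complete response body, a streaming consumer handles a long-lived response piece by piece as bytes arrive. The sketch below assumes newline-delimited JSON records; `iter_activities` is a hypothetical name, and the list of byte chunks stands in for data read from a live HTTP connection.

```python
import json
from typing import Iterable, Iterator


def iter_activities(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield JSON objects from a stream of arbitrarily split byte chunks.

    Assumes newline-delimited JSON; blank lines (often used as keep-alives)
    are skipped. A record split across chunk boundaries is buffered until
    its terminating newline arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()  # also removes a trailing "\r"
            if line:
                yield json.loads(line)
    # Any trailing partial record left when the stream closes is ignored.


chunks = [b'{"id": 1}\r\n{"i', b'd": 2}\r\n', b'\r\n', b'{"id": 3}\n']
print([a["id"] for a in iter_activities(chunks)])  # [1, 2, 3]
```

Because records are handled as soon as their delimiter arrives, memory use stays bounded by the largest single record rather than by the total volume delivered.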

Twitter is just one example. We’re seeing more activity on today’s popular social media platforms and a simultaneous increase in the number of popular social media platforms. We’re dedicated to seamless social data delivery to our enterprise customer base and we’re looking forward to the next data processing challenge.

The Twitter Streaming API is designed to deliver limited volumes of data via two main types of realtime data streams: sampled streams and filtered streams. Many users like to use the Streaming API because the streaming nature of the data delivery means that the data is delivered closer to realtime than it is from the Search API (which I wrote about last week). But the Streaming API wasn’t designed to deliver full coverage results and so has some key limitations for enterprise customers. Let’s review the two types of data streams accessible from the Streaming API. 

The first type of stream is “sampled streams.” Sampled streams deliver a random sampling of Tweets at a statistically valid percentage of the full 100% Firehose. The free access level to the sampled stream is called the “Spritzer,” and Twitter currently sets it to approximately 1% of the full Firehose. (You may also have heard of the “Gardenhose,” a randomly sampled 10% stream. Twitter used to provide increased access levels to businesses, but announced last November that it is no longer granting increased access to new companies and is gradually transitioning its current Gardenhose-level customers to Spritzer or to commercial agreements with resyndication partners like Gnip.)
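To illustrate what a random sample at a fixed percentage means, here is a minimal Bernoulli-sampling sketch: each Tweet is kept independently with probability `rate`, so a 1% rate yields roughly 1% of the input. This is not Twitter’s actual sampling method, which isn’t described here; `sample_stream` is a hypothetical helper.

```python
import random
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def sample_stream(items: Iterable[T], rate: float,
                  rng: random.Random) -> Iterator[T]:
    """Keep each item independently with probability `rate` (Bernoulli sampling)."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    for item in items:
        # rng.random() is uniform on [0.0, 1.0), so this is True with probability `rate`.
        if rng.random() < rate:
            yield item


rng = random.Random(42)  # seeded so the run is reproducible
spritzer_like = list(sample_stream(range(100_000), 0.01, rng))
print(len(spritzer_like))  # close to 1,000, but varies with the seed
```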

The second type of data stream is “filtered streams.” Filtered streams deliver all the Tweets that match a filter you select (e.g., keywords, usernames, or geographical boundaries). This can be very useful for developers or businesses that need limited access to specific Tweets.
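A filtered stream can be pictured as a predicate applied to every Tweet. The sketch below is a simplified, hypothetical `StreamFilter`: it treats keywords, usernames, and bounding boxes as alternatives (a Tweet matching any one of them is delivered) and uses plain case-insensitive substring matching for keywords, which is cruder than real tokenized matching.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# (west, south, east, north) in decimal degrees
BoundingBox = Tuple[float, float, float, float]


@dataclass
class StreamFilter:
    """Hypothetical filter spec combining keywords, usernames, and boxes with OR."""
    keywords: List[str] = field(default_factory=list)
    usernames: Set[str] = field(default_factory=set)
    boxes: List[BoundingBox] = field(default_factory=list)

    def matches(self, text: str, username: str,
                coords: Optional[Tuple[float, float]] = None) -> bool:
        """Return True if the Tweet matches any keyword, username, or box.

        coords is (longitude, latitude), or None for Tweets without a location.
        """
        lowered = text.lower()
        if any(keyword.lower() in lowered for keyword in self.keywords):
            return True
        if username.lower() in {name.lower() for name in self.usernames}:
            return True
        if coords is not None:
            lon, lat = coords
            return any(west <= lon <= east and south <= lat <= north
                       for west, south, east, north in self.boxes)
        return False


f = StreamFilter(keywords=["coca cola"], usernames={"gnip"},
                 boxes=[(-105.3, 39.9, -105.1, 40.1)])  # rough box around Boulder, CO
print(f.matches("That Coca Cola ad!", "someone"))       # True (keyword)
print(f.matches("hello", "Gnip"))                        # True (username)
print(f.matches("hello", "someone", (-105.27, 40.01)))  # True (inside the box)
print(f.matches("hello", "someone"))                     # False
```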

Because the Streaming API is not designed for enterprise access, however, Twitter imposes some restrictions on its filtered streams that are important to understand. First, the volume of Tweets accessible through these streams is limited so that it will never exceed a certain percentage of the full Firehose. (This percentage is not publicly shared by Twitter.) As a result, only low-volume queries can reliably be accommodated. Second, Twitter imposes a query limit: currently, users can query for a maximum of 400 keywords and only a limited number of usernames. This is a significant challenge for many businesses. Third, the Streaming API does not support Boolean operators the way the Search API (and Gnip’s API) does. And finally, there is no guarantee that Twitter’s access levels will remain unchanged in the future. Enterprises that need guaranteed access to data over time should understand that building a business on any free, public API can be risky.
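The keyword limit is easy to run into. As a rough sketch, a hypothetical helper like `split_by_track_limit` can report which terms in a watch list would fit under a 400-keyword cap (the figure cited above, which Twitter may change) after removing blank entries and case-insensitive duplicates; everything past the cap would simply go untracked.

```python
from typing import Iterable, List, Tuple

MAX_TRACK_KEYWORDS = 400  # limit cited above; subject to change by Twitter


def split_by_track_limit(keywords: Iterable[str],
                         limit: int = MAX_TRACK_KEYWORDS) -> Tuple[List[str], List[str]]:
    """Return (accepted, rejected) keyword lists under a per-query cap.

    Blank entries and case-insensitive duplicates are dropped first so they
    don't use up slots; original order is preserved.
    """
    seen = set()
    unique: List[str] = []
    for keyword in keywords:
        cleaned = keyword.strip()
        key = cleaned.lower()
        if key and key not in seen:
            seen.add(key)
            unique.append(cleaned)
    return unique[:limit], unique[limit:]


brands = [f"brand{i}" for i in range(1000)] + ["Brand0", "  "]
accepted, rejected = split_by_track_limit(brands)
print(len(accepted), len(rejected))  # 400 600
```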

The Search API and Streaming API are great ways to gather a sampling of social media data from Twitter. We’re clearly fans over here at Gnip; we actually offer Search API access through our Enterprise Data Collector. And here’s one more cool benefit of using Twitter’s free public APIs: unlike premium Twitter feeds from Gnip and other resyndication partners, those APIs don’t prohibit displaying the Tweets you receive to the general public.

But whether you’re using the Search API or the Streaming API, keep in mind that those feeds simply aren’t designed for enterprise access. As a result, you’re using the same data sets available to anyone with a computer, your coverage is unlikely to be complete, and Twitter reserves the right to change the data accessibility or Terms of Use for those APIs at any time.

If your business dictates a need for full coverage data, more complex queries, an agreement that ensures continued access to data over time, or enterprise-level customer support, then we recommend getting in touch with a premium social media data provider like Gnip. Our complementary premium Twitter products include Power Track for data filtered by keyword or other parameters, and Decahose and Halfhose for randomly sampled data streams (10% and 50%, respectively). If you’d like to learn more, we’d love to hear from you at sales@gnip.com or 888.777.7405.

The Twitter Search API can theoretically provide full coverage of ongoing streams of Tweets. That means it can, in theory, deliver 100% of Tweets that match the search terms you specify almost in realtime. But in reality, the Search API is not intended and does not fully support the repeated constant searches that would be required to deliver 100% coverage. 

Twitter has indicated that the Search API is primarily intended to help end users surface interesting and relevant Tweets that are happening now. Since the Search API is a polling-based API, the rate limits that Twitter has in place impact the ability to get full coverage streams for monitoring and analytics use cases.  To get data from the Search API, your system may repeatedly ask Twitter’s servers for the most recent results that match one of your search queries. On each request, Twitter returns a limited number of results to the request (for example “latest 100 Tweets”). If there have been more than 100 Tweets created about a search query since the last time you sent the request, some of the matching Tweets will be lost.
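The loss described here can be modeled with a toy simulation. This sketch assumes each request returns at most `page_size` (100) of the newest matching Tweets and that anything beyond that since the previous request is never retrieved; `simulate_polling` and its input are hypothetical.

```python
from typing import Iterable, Tuple


def simulate_polling(arrivals_per_poll: Iterable[int],
                     page_size: int = 100) -> Tuple[int, int]:
    """Return (received, missed) for a poller limited to `page_size` results per request.

    Each element of arrivals_per_poll is the number of matching Tweets created
    since the previous request.
    """
    received = missed = 0
    for arrived in arrivals_per_poll:
        got = min(arrived, page_size)  # only the newest page_size come back
        received += got
        missed += arrived - got        # older overflow is lost
    return received, missed


# A quiet stretch followed by a spike between two requests.
print(simulate_polling([40, 80, 250, 90]))  # (310, 150)
```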

So . . . can you just make requests more frequently? Well, yes, you can, but the total number of requests you’re allowed to make per unit of time is constrained by Twitter’s rate limits. Some queries are so popular (hello, “Justin Bieber”) that it can be impossible to make enough requests for that query alone to keep up with its stream. And this is only the beginning of the problem, since no monitoring or analytics vendor is interested in just one term; many have hundreds to thousands of brands or products to monitor.

Let’s consider a couple of examples to clarify. First, say you want all Tweets mentioning “Coca Cola” and only that one term. Usually there might be fewer than 100 matching Tweets per second, but if there’s a spike (say that term becomes a trending topic after a Super Bowl commercial), there will likely be more than 100 per second. If Twitter’s rate limits allow you to send only one request per second, you will miss some of the Tweets generated at the most critical moment of all.

Now, let’s be realistic: you’re probably not tracking just one term. Most of our customers are interested in tracking anywhere from dozens to hundreds of thousands of terms. If you add 999 more terms to your list and can still send only one request per second, you’ll be checking for Tweets matching “Coca Cola” only once every 1,000 seconds. And in 1,000 seconds, there could easily be more than 100 Tweets mentioning your keyword, even on an average day. (Keep in mind that there are over a billion Tweets per week nowadays.) So, in this scenario, you could easily miss Tweets if you’re using the Twitter Search API. It’s also worth bearing in mind that the Tweets you do receive won’t arrive in realtime, because you’re querying for them only every 1,000 seconds.
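The arithmetic in this example generalizes. If terms are polled in turn at a fixed request rate, each term is revisited every `num_terms / requests_per_second` seconds, and a term can be covered without loss only if it produces no more than one page of results (assumed here to be 100 Tweets) per revisit. `round_robin_coverage` is a hypothetical helper for that back-of-the-envelope check, and it assumes Tweets arrive evenly rather than in bursts.

```python
from typing import Tuple


def round_robin_coverage(num_terms: int, requests_per_second: float,
                         page_size: int = 100) -> Tuple[float, float]:
    """Return (seconds between polls of one term,
    highest steady Tweets/second per term that can be captured without loss)."""
    revisit_seconds = num_terms / requests_per_second
    max_tweets_per_second = page_size / revisit_seconds
    return revisit_seconds, max_tweets_per_second


print(round_robin_coverage(1, 1.0))     # (1.0, 100.0)
print(round_robin_coverage(1000, 1.0))  # (1000.0, 0.1)
```

With 1,000 terms at one request per second, any term averaging more than 0.1 Tweets per second (360 per hour) would start losing Tweets, and bursts would lose more.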

Because of these issues related to the monitoring use cases, data collection strategies relying exclusively on the Search API will frequently deliver poor coverage of Twitter data. Also, be forewarned, if you are working with a monitoring or analytics vendor who claims full Twitter coverage but is using the Search API exclusively, you’re being misled.

Although its coverage is not complete, one great thing about the Twitter Search API is its support for complex operators, such as Boolean queries and geo filtering. Some people opt to use the Search API to collect a sampling of Tweets that match their search terms precisely because of these operators. Because these filtering features have been so well liked, Gnip has replicated many of them in our own premium Twitter API (made even more powerful by the full coverage and unique data enrichments we offer).

So, to recap, the Twitter Search API offers great operator support but you should know that you’ll generally only see a portion of the total Tweets that match your keywords and your data might arrive with some delay. To simplify access to the Twitter Search API, consider trying out Gnip’s Enterprise Data Collector; our “Keyword Notices” feed retrieves, normalizes, and deduplicates data delivered through the Search API. We can also stream it to you so you don’t have to poll for your results. (“Gnip” reverses the “ping,” get it?)

But the only way to ensure you receive full coverage of Tweets that match your filtering criteria is to work with a premium data provider (like us! blush…) for full coverage Twitter firehose filtering. (See our Power Track feed if you’d like more info on that.)

Stay tuned for Part 3, our overview of Twitter’s Streaming API coming next week…

You may find yourself wondering . . . “What’s the best way to access the Twitter data I need?” Well, the answer depends on the type and amount of data you are trying to access. Given that there are multiple options, we have designed a three-part series of blog posts that explains the differences between the coverage the general public can access and the coverage available through Twitter’s resyndication agreement with Gnip. Let’s dive in . . .

Understanding Twitter’s Public APIs . . . You Mean There is More than One?

In fact, there are three Twitter APIs: the REST API, the Streaming API, and the Search API. Within the world of social media monitoring and social media analytics, we need to focus primarily on the latter two.

  1. Search API – The Twitter Search API is a dedicated API for running searches against the index of recent Tweets
  2. Streaming API – The Twitter Streaming API allows high-throughput, near-realtime access to various subsets of Twitter data (e.g., 1% random sampling of Tweets, filtering for up to 400 keywords, etc.)

Whether you get your Twitter data from the Search API, the Streaming API, or through Gnip, only public statuses are available (and NOT protected Tweets). Additionally, before Tweets are made available to both of these APIs and Gnip, Twitter applies a quality filter to weed out spam.

So now that you have a general understanding of Twitter’s APIs . . . stay tuned for Part 2, where we will take a deeper dive into understanding Twitter’s Search API, coming next week…

 

When you think about it, the stock market is a pretty inspiring thing.

Over the past several centuries, humans have created an infrastructure that lets people put their money where their mouth is: an infrastructure that provides a mechanism for daily valuation of companies, currencies and commodities. It’s unbelievable how far we’ve come, and reflecting on the innovation that’s led us here brings to light a common but powerful denominator: information.

  • When traders began gathering under a buttonwood tree at the foot of Wall Street in the late 1700s, it was because proximity allowed them to gossip about companies.
  • When Charles Dow began averaging “peaks and flows” of multiple stocks in 1883, his ‘index’ became a new type of data with which to make decisions.
  • In 1975, when the sheer volume of paper necessary for trades became unmanageable, the SEC changed rules to permit electronic trading, allowing for an entirely new infrastructure.
  • And in the 1980s, when Michael Bloomberg and his partners began building machines (the now ubiquitous Bloomberg Terminals), they tapped into an ever-growing need for more data.

These are just a few moments from that history, and they’re exciting for us @Gnip because of the powerful signal the market is now sending about social media. Here are some of the more recent signals we’ve seen:

  • The Bank of England announcing they were using Google search results as a means of informing their “nowcasts” detailing the state of the economy.
  • Derwent Capital Markets launching the first social-media based hedge fund this year.
  • The dedication of an entire panel to Social Media Hedge Fund Strategies at the Battle of the Quants conference in London last week.
  • Weekly news articles that describe how traders are using social data as a trading indicator (here’s one as an example).
  • Incorporation of social data into the algorithms of established hedge funds.

In other words, the market is tapping into a new and unique source of information as a means of making trading decisions. And the reason social media data is so exciting is because it offers an unparalleled view into the emotions, opinions and choices of millions of users. A stream of data this size, with this depth and range, has never really existed before in a format this immediate and accessible. And that access is changing how our clients analyze the world and make trades.

We’ve been privileged to see these use cases as we continue to serve a growing number of financial clients. Most exciting to us, as we respond to the market’s outreach for our services, is understanding our pivotal place in this innovation. As the premier source of legal, reliable and realtime data feeds from more than 30 sources of social media, including our exclusive agreement with Twitter, we’re at the center of how firms are integrating this data as an input. And that’s incredible stuff.

Are you in the financial market looking for a social media data provider? Contact us today to learn more! You can reach us at 888.777.7405 or by email.

Have you ever thought, “Gnip? That is a strange name for a company. What does it mean?” As one of the newest members of the Gnip team, I found myself thinking that very same thing. And as I began telling my friends about the amazing new start-up I was going to be working for in Boulder, Colorado, they too began to ask about the meaning behind the name.

Gnip (pronounced guh’nip) got its name from the very heart of what we do: realtime social media data collection and delivery. So let’s dive into . . .

Data Collection 101

There are two general methods for data collection, pull technology and push technology. Pull technology is best described as a data transfer in which the request is initiated by the data consumer and responded to by the data publisher’s server. In contrast, push technology refers to the request being initiated by the data publisher’s server and sent to the data consumer.
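The difference comes down to who initiates each transfer. In this toy, in-memory sketch (the `Publisher` class and its methods are hypothetical, not any real API), a pull consumer must call `fetch_recent` again and again and may get the same items back, while a push consumer registers a callback once and the publisher delivers each new item as it is created.

```python
from typing import Callable, List


class Publisher:
    """Toy data publisher that supports both delivery models."""

    def __init__(self) -> None:
        self._items: List[str] = []
        self._subscribers: List[Callable[[str], None]] = []

    def fetch_recent(self, limit: int) -> List[str]:
        """Pull: the consumer initiates every request and gets the newest items."""
        if limit <= 0:
            return []
        return self._items[-limit:]

    def subscribe(self, callback: Callable[[str], None]) -> None:
        """Push: the consumer registers once; the publisher initiates delivery."""
        self._subscribers.append(callback)

    def publish(self, item: str) -> None:
        self._items.append(item)
        for callback in self._subscribers:
            callback(item)


pub = Publisher()
pushed: List[str] = []
pub.subscribe(pushed.append)

pub.publish("a")
pub.publish("b")
first_poll = pub.fetch_recent(10)   # ['a', 'b']
pub.publish("c")
second_poll = pub.fetch_recent(10)  # ['a', 'b', 'c'] (repeats 'a' and 'b')
print(pushed)                       # ['a', 'b', 'c'], each delivered once
```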

So why does this matter . . .

Well, most social media publishers use the pull method. This means that the data consumer’s system must constantly go out and “ping” the data publisher’s server asking, “do you have any new data now?” . . . “how about now?” . . . “and now?” And this can cause a few issues:

  1. Deduplication – If you ping the social media server one second and then ping it again a second later when there are no new results, you will receive the same results you got one second earlier. The data then has to be deduplicated.
  2. Rate Limiting – Every social media data publisher’s server sets its own rate limits, which control the number of times you can ping the server in a given time frame. These rate limits are constantly changing and typically aren’t published. As a result, if your server pings the publisher’s server above the rate limit, your data collection could be shut down completely, leaving you to determine why the connection is broken (Is it the API? Is it the rate limit? What is the rate limit?).
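Both problems have common client-side mitigations, sketched below under stated assumptions: deduplicating results by a unique `id` field (assumed present on every result), and waiting progressively longer between requests after hitting a rate limit (exponential backoff). `dedupe` and `backoff_delays` are hypothetical helpers, and the default delays are arbitrary examples, not any publisher’s documented limits.

```python
from typing import Hashable, Iterable, Iterator, List, Set


def dedupe(results: Iterable[dict], seen_ids: Set[Hashable]) -> Iterator[dict]:
    """Yield results whose 'id' has not been seen before; records new ids in seen_ids."""
    for result in results:
        if result["id"] not in seen_ids:
            seen_ids.add(result["id"])
            yield result


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 300.0, attempts: int = 6) -> List[float]:
    """Exponential backoff schedule in seconds, capped at `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(attempts)]


seen: Set[Hashable] = set()
poll_1 = [{"id": 1}, {"id": 2}]
poll_2 = [{"id": 2}, {"id": 3}]  # overlaps the previous poll
print([r["id"] for r in dedupe(poll_1, seen)])  # [1, 2]
print([r["id"] for r in dedupe(poll_2, seen)])  # [3]
print(backoff_delays())  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
```

The `seen` set grows without bound here; a long-running collector would need to expire old ids, for example by keeping only those from a recent time window.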

So as you can see, pull technology can be a tricky beast.

Enter Gnip

Gnip set out to give our customers the option to receive data in either the push model or the pull model, regardless of the native delivery method of the data publisher’s server. In other words, we wanted to reverse the “ping” process for our customers. Hence, we reversed the word “ping” to get the name Gnip. And there you have it: the story behind the name!

Over 500 individuals recently gathered in New York City for this year’s TechCrunch Disrupt Hackathon. This annual event, fueled by pizza, beer, and Red Bull, features teams of die-hard techies that spend 20 hours, many without sleep (hence the Red Bull), developing and coding the next big idea. Participants compete in a lightning round of pitches in front of a panel of judges with the winners receiving an opportunity to pitch on the main stage at the TechCrunch Disrupt Conference in front of more than 1,000 venture capitalists and industry insiders.

We are excited that one of the apps that was developed at the 2011 Hackathon was powered by Gnip data! We love it when our customers find new and creative ways to use the data we provide.

Edward Kim (@edwkim) and Eric Lubow (@elubow) from SimpleReach (@SimpleReach), which provides next generation social advertising for brands, put a team together to develop LinkCurrent, an app powered by Gnip data and designed to measure the current and future social value of a specific URL. When fully developed, the LinkCurrent app will provide the user with a realtime dashboard illustrating various measures of a URL’s worth — featuring an overall social score, statistics on the Klout Scores of people who have Tweeted the URL, how many times the URL has been Liked on Facebook and posted on Twitter, and geo-location information to provide insight into the content’s reach. Call it influence-scoring for web content.

The hackathon team also included Russ Bradberry (@devdazed) and Carlos Zendejas (@CLZen), also of SimpleReach, Jeff Boulet (@properslang) of EastMedia/Boxcar (@eastmedia/@boxcar), Ryan Witt (@onecreativenerd) of Opani (@TheOpanis), and Michael Nutt (@michaeln3) of Movable Ink (@movableink). Congratulations to everyone who participated! You created an amazing app in less than 20 hours and developed a creative new use for Gnip data. I highly encourage all of you to check it out: www.linkcurrent.co

Have a fun and creative way you have used data delivered by Gnip? We would love to hear about it, and you could be featured in our next blog post. Drop us an email or give us a call at 888.777.7405.
