Last data update: Mon Jan 05 10:04:12 +0100 2009
We have attempted to collect a variety of data about the relative popularity of programming languages, mostly out of curiosity. To some degree popularity does matter; however, it is clearly not the only thing to take into account when choosing a programming language. Most experienced programmers should be able to learn the basics of a new language in a week, and be productive with it in a few more weeks, although it will likely take much longer to truly master it.
Browser requirements: sadly, yes, this site requires a browser that supports
JavaScript and the canvas tag. I wasn't happy with other options for creating
charts in image formats, and this also saves some bandwidth, since the JavaScript is only
downloaded once. Firefox, IE and Safari ought to work, although I haven't tested it with
Safari. Konqueror apparently does not work. The charts are created with Chartr and Plotr.
Note: these results are not scientific. They are interesting nonetheless, and are an attempt to glean as much data as possible notwithstanding the fact that gathering precise data is impossible. We hope you find them interesting as well. Constructive suggestions on improving them are welcome. Contact information is provided at the bottom of the page.
Yahoo provides an API to its search engine. Previous versions of these
statistics used numbers from Google, but since Google has deprecated its own API, we now use
Yahoo's. Searches took the form "language programming".
This is a fairly crude approximation of popularity; however, it's worth including because, all other things being equal, the more popular a language is, the more pages will exist mentioning it.
We used Yahoo's search API for this too, with queries like this: language
programmer -"job wanted" site:craigslist.org
Popular languages are used more in industry, and consequently, people post job listings that seek individuals with experience in those languages. This is probably something of a lagging indicator, because a language is likely to gain popularity prior to companies utilizing it and consequently seeking more people with experience in it.
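As a rough illustration of the two query forms described above, here is a minimal Python sketch. The function names are hypothetical and are not the code used to produce these statistics; only the query strings themselves come from the descriptions above, and the call to the search API is left out entirely.

```python
def web_query(language: str) -> str:
    """Build a general web search query of the form '<language> programming'."""
    return f"{language} programming"


def craigslist_query(language: str) -> str:
    """Build a job-listing query restricted to craigslist.org.

    The -"job wanted" term excludes posts from people looking for work,
    so that only listings seeking programmers are counted.
    """
    return f'{language} programmer -"job wanted" site:craigslist.org'


if __name__ == "__main__":
    # Example languages only; any language name can be substituted.
    for lang in ("Python", "Haskell"):
        print(web_query(lang))
        print(craigslist_query(lang))
```

Each string would then be sent to the search API, and the reported number of hits recorded for that language.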
Utilizing Amazon's search API, we searched for language programming in the
books index.
Books are also a lagging indicator, but a good way to eliminate languages that aren't "established". Haskell may not be widely used in industry, for instance, but it does have a few books written about it, or at least mentioning it.
Data from Freshmeat was obtained from this page: http://freshmeat.net/browse/160/
Freshmeat is a good place to get data on open source projects that have passed the early stages and actually released something. These results most likely reflect differences in what people are paid to work with and what they choose to work with when they can choose. There were no freshmeat projects utilizing Cobol, for example, although it seems to fare decently in the other results.
Data from Google Code Search was obtained using the API to search here: http://www.google.com/codesearch
This is similar to Freshmeat in that it favors open source projects with code floating around on the web. Unfortunately, it seems that the Google Code people don't like Forth much, as it's not on their list of languages. I have renewed the request to add it.
Data from Del.icio.us was obtained with the Yahoo Search API, because the del.icio.us API
really isn't up to the job yet. We did site: searches like language
programming.
This is an interesting bit of data for a couple of reasons. First of all, it seems more linear than the others. It ought to reflect what people genuinely find interesting or useful themselves, rather than what they put out there at random, which means they have an incentive to be 'honest'. The order of the languages also seems to change significantly compared to the other data sets.
This is a chart showing combined results from all data sets.
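The exact method of combining the data sets is not spelled out here, so the following Python sketch is only one plausible way to do it, under stated assumptions: each data set is scaled so that its top language scores 1.0, and the scaled scores are then averaged with equal weight. The function names and the numbers in the example are made up for illustration.

```python
def normalize(counts: dict[str, float]) -> dict[str, float]:
    """Scale one data set so that its highest count becomes 1.0.

    Assumes the data set is non-empty.
    """
    top = max(counts.values())
    if top == 0:
        return {lang: 0.0 for lang in counts}
    return {lang: n / top for lang, n in counts.items()}


def combine(datasets: list[dict[str, float]]) -> dict[str, float]:
    """Average the normalized scores across data sets (equal weights assumed).

    A language missing from a data set counts as 0 for that data set.
    """
    normalized = [normalize(d) for d in datasets]
    languages = set().union(*normalized)
    return {
        lang: sum(d.get(lang, 0.0) for d in normalized) / len(normalized)
        for lang in languages
    }


if __name__ == "__main__":
    # Made-up hit counts for two hypothetical data sets.
    search_hits = {"A": 10, "B": 5}
    book_hits = {"A": 2, "B": 4}
    print(combine([search_hits, book_hits]))
```

Scaling by the maximum keeps a single very large data set (such as web search hits) from drowning out the smaller ones.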
For fun (well, this whole site is "for fun"... let's just say it's extra data we don't include in the main results), we also gathered some data from sites programmers often visit to talk about programming languages. Because of how this industry functions, what people are experimenting with, what they want to use, and what they're paid to use every day are often different things. For the moment, we use three sites:
The data were obtained using Yahoo's search API on the Lambda The Ultimate web site, utilizing the
title: query option in an attempt to eliminate false positives due to the
presence of these terms on every page: Erlang, Lisp, Haskell, Tcl, Python.
This site is firmly grounded in academia, and many participants are associated with programming language research, so more "experimental" or innovative languages are commonly discussed and well regarded. What's interesting about the numbers is that there seems to be a cap, with several languages equal to the maximum. Perhaps it's an error with Yahoo's data - we'll keep an eye on it for future versions of this report.
The data were obtained using Yahoo's search API with the programming.reddit.com web site, and the
title: query option, due to the (c) 2007 at the bottom of every
reddit page that returns lots of false positives for C.
This site has gained in popularity recently, and often has decent discussions of programming languages and their relative merits. The community is generally curious about up and coming languages like Haskell and Erlang. Of course there are also many people working in industry with languages like Java and PHP.
The data were obtained using Yahoo's search API with the Slashdot web site. We use the title: query
option here too, to be fair.
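A title-restricted query for the discussion sites might be built along these lines. This is a sketch only: the helper name is hypothetical, and combining title: with a site: term in a single query string is an assumption about how the searches were put together.

```python
def title_query(language: str, site: str) -> str:
    """Build a query that matches the language name in page titles only.

    Restricting to titles avoids false positives from text that appears
    on every page of a site, such as a "(c) 2007" footer matching "C".
    """
    return f"title:{language} site:{site}"


if __name__ == "__main__":
    print(title_query("Haskell", "programming.reddit.com"))
```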
Slashdot reaches a very wide audience, and while it hasn't been quite as popular as more recent arrivals like reddit, it's still a very popular site, and has been around for a while, so is worth including.
Normalized results from the discussion site data sets - these results are not included with the 'normalized results' above. It's interesting to note how languages like Haskell and Erlang are talked about a lot, despite scoring fairly low on the normalized popularity chart above. People are interested in them, but haven't begun to use them on a large scale yet.
With the proper infrastructure in place for gathering and saving data, we intend to update this data on a regular basis, as well as show historical trends.
Past versions of these statistics used data on prices of keywords in programs like Google's AdSense. We have applied for access to this data from Google, and are waiting on approval.
"C" named languages are something of a problem. Queries for "C" tend to return results
for C# and C++ as well. One way of dealing with this would be to run queries like this:
C -C# -C++. However, that unfairly penalizes pages that contain discussions
of both C and C++. The D programming language suffers from a similar problem (it tends to
be confused with "3-d programming"), so we tweaked some of the searches to account for
this, and use "D programming language" where appropriate.
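One way to handle such special cases is a small table of per-language overrides, sketched below in Python. The table and function names are hypothetical; the only override taken from the text above is the quoted phrase for D. C is deliberately left on the default query, since excluding C# and C++ would penalize pages that discuss C alongside them.

```python
# Languages whose names are too ambiguous for the default query.
# Only the D entry comes from the description above.
QUERY_OVERRIDES = {
    "D": '"D programming language"',
}


def search_terms(language: str, suffix: str = "programming") -> str:
    """Return the search terms for a language, using an override if one exists."""
    override = QUERY_OVERRIDES.get(language)
    if override is not None:
        return override
    return f"{language} {suffix}"


if __name__ == "__main__":
    for lang in ("C", "D", "Erlang"):
        print(search_terms(lang))
```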
More sources of data are always welcome.
We're willing to add other languages, but they should register in all of our existing data sources.
Check out the Google Group for announcements / updates, or if you want to make a suggestion about the survey. We also welcome email to suggestions --- at --- langpop.com. Thanks!