A while back I started thinking about how to get data on individual users of weblogs in order to analyze it for trends in usage, abandonment, and demographics.  Seb has some stuff here, but it is not detailed enough to create data-driven graphs that reveal trends and other patterns.  Phil's email was good for some single data points as well, but again no historical trend analysis is really possible from it.  Plus, while the method used to obtain the data was brilliant, it is less than perfect in regards to accuracy and precision (as he notes). Ross Mayfield posted a snapshot of some LiveJournal stats, which resulted in Joe Jenett posting a graph of the Ageless Project data (but the Ageless Project notes that the sample and data quality are not pristine). Blogger has told everyone that they have 1 million registered users, but they haven't been kind enough to share how many blogs exist (plus, users do not map to blogs in a 1:1 manner) or how many are "active" in some sort of way.  I emailed Dave Winer about Radio usage but never heard back from him. In a Wired article from August 2002, Dave is quoted as saying that there are about 50,000 Radio blogs with a growth rate of 4,000 a month (that seems pretty high to me), but again there is no mention of "active" vs. abandoned blogs or any type of demographic information.  Finally, Movable Type, Blosxom, Greymatter, and the host of other blogging tools out there don't appear to have any discernible method to get usage statistics (other than doing Phil's trick of Google searching for key phrases inserted by the blogging tool).

Thank God for the Livejournal folks and their handy dandy stats page (also in raw text format).

Most stats are updated on a daily basis, while some are live.  The stats page includes such goodies as:

For those of you who don't know, Deadjournal runs the Livejournal backend (open source goodness) and thus also outputs the same stats page.  After examining the Deadjournal stats page for a while, I noticed a little CVS entry on their version of the page. I started to get giddy inside and downloaded WinCVS to see if Livejournal entered their stats data into CVS on a regular basis.  After getting re-acquainted with CVS, I downloaded the LiveJournal tree and went hunting for a stats repository (specifically the raw .txt file).  Alas, I struck out. That's not to say they don't have it somewhere, but I didn't find it when I went hunting...

But then I got another grand idea: since Livejournal and Deadjournal are public websites, why not use the all-seeing and all-archiving Wayback Machine to get the great Livejournal-provided data from history?  Almost a grand slam (imho), except for the following:

Nevertheless I am happy to say I did get some data assembled, and it does show some interesting  trends.  I think some of the trends displayed by this data can/will be generalized to all blogs, but as Jeff Jarvis points out in the comment left on Ross Mayfield's page, Livejournals have unique aspects that set them apart from other blogs.

Now that the method is out of the way, let's get to the fun stuff...data.  Here is the raw data I gathered and entered into Excel (that was fun...not).  Note that the second worksheet in the Excel workbook also has miscellaneous data points from a variety of sources.  If you use the data, please attribute it appropriately. Below are some preliminary charts that I found interesting.  I called my dad for a statistics brush-up (since it is a part of his working set and I haven't thought about it in nearly 6 years).  While the data is very sparse for 2002 (1 data point) and clumped for 2000 (4 data points in 4 months), the graphs do show some interesting stuff.

The graph above shows the total users that have signed up for a Livejournal and the total number of users that have posted something in the last 30 days.  I like Phil's definition of active (2 posts on 2 different days in the last month) but we have to work with what data we have.  Thus, active users from here out will be the number of users that have posted at least once in the last 30 days.
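To make the difference between the two definitions of "active" concrete, here is a minimal sketch in Python. The function names and the idea of having a per-user list of post dates are assumptions for illustration only; the Livejournal stats page publishes aggregate counts, not per-user post dates.

```python
from datetime import date, timedelta


def is_active_simple(post_dates, today, window_days=30):
    """Definition used in this article: at least one post in the window."""
    cutoff = today - timedelta(days=window_days)
    return any(cutoff <= d <= today for d in post_dates)


def is_active_strict(post_dates, today, window_days=30):
    """Phil's definition: posts on at least 2 different days in the window."""
    cutoff = today - timedelta(days=window_days)
    distinct_days = {d for d in post_dates if cutoff <= d <= today}
    return len(distinct_days) >= 2


# Hypothetical user who posted twice, but on the same day.
today = date(2003, 4, 9)
same_day_poster = [date(2003, 4, 1), date(2003, 4, 1)]
print(is_active_simple(same_day_poster, today))  # True
print(is_active_strict(same_day_poster, today))  # False
```

The stricter definition filters out people who sign up, post once or twice in a single sitting, and never come back, which is why the simple definition probably overstates the active count somewhat.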

The slope of active users is less than that of total users over time, or stated differently, the delta between sign-ups and active users is increasing.  So while more people are starting Livejournals, a smaller share of them are remaining active after creation.  One plausible explanation is that as the "gospel" about blogs/journals has become more mainstream, the rate of people trying them has increased but the percentage of converts is decreasing.  Another possibility is that people are signing up for multiple tools in order to select the one they like most, but with Livejournal's closed nature this seems a bit unlikely to be reflected in their stats.  I tend to (want to?) believe the abandonment of blogs/journals is natural and that no one factor can single-handedly account for the trend.
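The slope comparison can be sketched as follows. The snapshot numbers below are made up for illustration and are not the Livejournal figures; the `slope` helper is an ordinary least-squares fit written out by hand, which may or may not match how the trend lines in the chart were produced.

```python
def slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# HYPOTHETICAL snapshots: (months since first snapshot, total users, active users)
snapshots = [
    (0, 100_000, 60_000),
    (6, 250_000, 130_000),
    (12, 400_000, 190_000),
    (18, 550_000, 240_000),
]
months = [s[0] for s in snapshots]
total = [s[1] for s in snapshots]
active = [s[2] for s in snapshots]

total_slope = slope(months, total)    # new sign-ups per month
active_slope = slope(months, active)  # new active users per month
gaps = [t - a for t, a in zip(total, active)]      # absolute delta
shares = [a / t for t, a in zip(total, active)]    # fraction still active

print(total_slope, active_slope)
print(gaps)
print([round(s, 3) for s in shares])
```

With numbers shaped like these, both lines rise, but the gap widens and the active share falls at every snapshot, which is the pattern described above.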

What does this mean for other blogging tools? 

I would venture that this trend is similar for blogger.com blogs, pitas blogs, and other free browser-based blog tools. However, it is important to note that Livejournal is a closed community that requires an invite or $$$ to join, while blogger/pitas/others are 100% free. Thus it is very likely that the delta between total sign-ups and active users is even more pronounced within blogging systems with a painless sign-up process and no barrier to entry (a high profile can't help either...).  Radio requires users to buy a seat after its free 30-day trial period, so I would guess its total sign-ups vs. active users delta is significantly larger than Livejournal's from a percentage point of view.  But that delta is heavily impacted by economic factors, so it is not an apples-to-apples situation.  For the host-it-yourself tools (MT, GM, Blosxom, etc.) I doubt that this trend is as pronounced because of the initial time investment it takes to get them up and running.

The graph above shows that females are signing up for Livejournals at a faster pace than males. In 2001 there is a more dramatic jump in female users; however, that could be influenced by the lack of data during that time period.  One additional data point that can be brought in is that 74.1% of Deadjournal users are female (based on the 4/3/2003 stats page).  Any set of factors could be influencing why females are signing up at a faster pace for Livejournals, but looking at the raw data it is clear that female users have always outnumbered male users in the system.  Perhaps there are aspects of Livejournal that specifically appeal to female users and/or marketing mechanisms (e.g. friend of a friend, branding) that promote higher adoption among females.

What does this mean for other blogging tools?

I really can't draw any conclusions without usage data from other tools. 

This sexy-looking graph confirms (over time) what the daily snapshots of Livejournal data show...the tool is predominantly used by people under the age of 35.  It also shows how little data from 2002 we have.  Now I am going to go out on a limb here and throw out something that may or may not have any truth.  It is an observation that I need to put some time into gathering data to support or negate, but it feels "right" when I reason it out in my mind.  In my previous job I focused on the needs and wants of the NetGen audience.  Essentially, that is the set of kids aged 13-24 who grew up with the internet and treat being online like oxygen.  This group is constantly exploring, embracing, and popularizing trends online and in the physical world. Thus they are the holy grail for marketers and, even more importantly, for technologies that require viral adoption (think email, IM, P2P networks, etc.).  This is a gross simplification, so go get more info by reading this or searching for "net generation" and "trend adoption models".

I hope the part about viral adoption got your attention.  In the late 80s and early 90s the NetGen embraced email and helped push it into the mainstream.  Granted, business reliance also significantly propagated email to the mass consciousness, but NetGeners (especially older ones with their college email addresses and newly geographically distributed friendships to maintain) were very important influencers.  Then in the mid 90s the NetGen jumped on the IM (instant message) ship and really started to internalize it.  IM is quickly becoming a standard for corporations and people of all ages (including my mom!), and again it is hard to dismiss the NetGen's early adoption and key role in making it a mass trend.  Now I could go on about P2P networks, SMS, and other technology that was embraced by them, but you get the idea.  Let's say there are lots of NetGeners interested in blogging/journaling (as the graph above shows)...this could be an indicator or perhaps a key aspect of the future growth and impact of blogging/journals.  Granted, this is a weak analysis at best, but it has been swimming around my head for a while and I had to let it out.  Actually, now that it is out I might have to go look for some data and try to back it up....:)

What does this mean for other blogging tools?

I would be willing to bet that diaryland, xanga, and deadjournal have similar trends in the age of their users.  This opinion is based on the way they are presented as well as how they work (community-focused features similar to livejournal).  Blogger is a wild card as they are fairly mainstream and appear to attract users from all sorts of backgrounds. Same goes for Pitas.  My gut says that Movable Type, Blosxom, and Radio are used by an older crowd.  Why?  One reason is they are more geeky (powerful) to set up, use, and tweak.  Another is the type of marketing they have done...definitely not going for the 13-year-old user like diaryland does.

For this graph I calculated the average age from the raw Livejournal data and added some subcategories: NetGeners, and NetGeners + PCgeners (the group that grew up with the un-networked PC as oxygen). The graph shows a perplexing movement in the average age of Livejournal users.  Why did the age drop until late 2001 and then start to grow again?  Nothing obvious jumps to mind, but I was showing this work to my roommate and we came up with two ideas.  The 9/11 tragedy might have had some influence, as it created a lot of buzz about blogs/journals in the mainstream consciousness.  The other possible influencer could be the dot-com fallout.  Or maybe this just means absolutely nothing...that would serve me right for trying to find patterns in data.
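For anyone who wants to redo the average-age calculation, here is a rough sketch of a weighted average over an age histogram. The counts are hypothetical, not the Livejournal data. The 13-24 NetGen range comes from the definition earlier in this article; the upper bound of 35 used for NetGen + PCgen is purely a placeholder assumption, since the PCgen age range is not spelled out here.

```python
def average_age(age_counts, min_age=None, max_age=None):
    """Weighted average age over users whose age falls in [min_age, max_age]."""
    total_users = 0
    total_years = 0
    for age, count in age_counts.items():
        if min_age is not None and age < min_age:
            continue
        if max_age is not None and age > max_age:
            continue
        total_users += count
        total_years += age * count
    if total_users == 0:
        return None  # no users in the requested range
    return total_years / total_users


# HYPOTHETICAL histogram: age -> number of users reporting that age
age_counts = {13: 100, 16: 400, 18: 500, 21: 300, 24: 200, 30: 100, 40: 50}

overall = average_age(age_counts)
netgen = average_age(age_counts, min_age=13, max_age=24)
# ASSUMPTION: 35 is a placeholder upper bound for the PCgen group.
netgen_plus_pcgen = average_age(age_counts, min_age=13, max_age=35)

print(round(overall, 2), round(netgen, 2), round(netgen_plus_pcgen, 2))
```

Repeating this for each archived snapshot gives one point per date for each of the three lines on the chart.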

One interesting observation to note is the relative stability of the NetGen average age.  While the total user base dropped ~2.5 years in average age, the NetGen dropped ~1 year in average age.  This could of course be explained by the significantly smaller range of ages encompassed by the NetGen, but then again the slopes before the upturn at the end of 2001 are noticeably different.  Furthermore, the NetGen + PCgen average is very similar to that of the total user base, so something appears to be afoot.

What does this mean for other blogging tools?

Not a clue, other than it would be fantastic to see if this trend existed outside of Livejournal users. If the trend exists across all tools it would be a very interesting sociology study to figure out why it happened. 

Future Work

There is still a lot of crunching that needs to be done, but I am excited about the data so far.  Hopefully other eyes looking at it will help uncover more relationships, trends and other sources of data to be included for future analysis (maybe others have saved a stats page on their local drive?).  Some ideas I would like to explore:

After getting some feedback from the world on this, doing some more crunching, and relearning some statistics, I plan on trying to get this work published in an academic journal.  We are seeing the world embrace blogs, and helping build the academic research foundation for them is very important, imho.

By: Neel "Bubba" Murarka | me@neelbubba.com | www.neelbubba.com on 04/09/2003

Updates:

4/17/2003: Got a comment on this?  Click here to post it.  Also, this was posted to lj_biz by Evan Martin today.  Lots of new readers as a result. 

4/18/2003: Correction from Joe Jenett.

Copyright 2003 Neel Murarka.