Search Referral Zeitgeist

Most webloggers pay an inordinate amount of attention to their referer logs, the logs made by the web server that show the page from which a visitor linked to their site. A significant fraction of these inbound links come from Internet search engines such as Google, Yahoo, or Excite, and the search terms people used to find your site can be as entertaining as they are informative. The German word zeitgeist literally means "time ghost" and figuratively the "spirit of the time," and search engine traffic can provide a peek at this.

Many ISPs and web hosting providers offer a dry, matter-of-fact view of referer logs in the form of a "stats page," but if you want to share a view of the inbound search traffic with your readers, pointing them at your stats page is probably not the best way to go about it.

If you have access to the raw log files for your site, and your web server is configured to capture referer information, then a Perl script can build a web page that shows the search terms from search engine referrals in a decorative way, with the text size of each term indicating how many times it was used to find your site.
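
To make the text-size idea concrete, here is a minimal sketch. It is not taken from Zeitgeist.pm: the term counts, the 10 to 32 pixel range, and the helper name font_size are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical counts: search phrase => number of referrals that used it.
my %count = ( 'perl zeitgeist' => 12, 'referer log' => 5, 'apache' => 1 );

# Map a count onto a font size between $min_px and $max_px (linear scale).
sub font_size {
    my ($n, $lo, $hi, $min_px, $max_px) = @_;
    return $min_px if $hi == $lo;    # every term is equally popular
    return int( $min_px + ($n - $lo) * ($max_px - $min_px) / ($hi - $lo) + 0.5 );
}

my @sorted = sort { $a <=> $b } values %count;
my ($lo, $hi) = @sorted[0, -1];      # smallest and largest count

for my $term (sort keys %count) {
    my $px = font_size($count{$term}, $lo, $hi, 10, 32);
    # Real search terms should be HTML-escaped before being printed.
    print qq{<span style="font-size: ${px}px">$term</span>\n};
}
```

A linear scale is the simplest choice. If a handful of terms dominate the counts, a logarithmic scale may spread the sizes more evenly.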


Referer Logs

The first thing you need to do is make sure that your web server log file is recording referer information. There are two approaches to referer logs: combine them with the access log, or keep a separate log just for referrals. If your web server is already using the Combined Log Format, your log entries will look similar to this:

    127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

The http://www.example.com/start.html part is the referer. The Apache configuration for Combined Log Format is:

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"" combined
    CustomLog /var/log/httpd/access_log combined
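
If you want to check which whitespace-separated fields of a Combined Log Format entry hold the referer and the requested path, a rough sketch like this will do. The sample line uses 127.0.0.1 as a placeholder host, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Sample Combined Log Format entry; the host is a placeholder.
my $line = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
         . '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
         . '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"';

# Split on whitespace; fields are numbered from zero.
my @word = split ' ', $line;

# Field 10 is the quoted referer, field 6 is the requested path.
(my $referer = $word[10]) =~ s/^"|"$//g;    # strip the surrounding quotes
my $target = $word[6];

print "referer: $referer\n";
print "target:  $target\n";
```

Splitting on whitespace is crude: a log line whose earlier fields contain extra spaces would shift these positions.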

If your web server is not already configured to use Combined Log Format, then you should consider making a separate referer log (because who knows what other hacks are already parsing your current log format!). To do that in Apache, add the following to your Apache configuration:

    LogFormat "%h [%{Referer}i] : %U" reflog
    CustomLog /var/log/httpd/referer_log reflog

(making adjustments to suit where your log files are kept). This will result in referer log entries that have the following format:

    127.0.0.1 [http://www.example.com/start.html] : /apache_pb.gif
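
As a sketch of how an entry in this separate referer log format could be taken apart, the following uses a single regular expression. The function name parse_reflog_line and the placeholder host are assumptions for illustration, not part of Zeitgeist.pm.

```perl
use strict;
use warnings;

# Split a '%h [%{Referer}i] : %U' log entry into (host, referer, target).
# Returns an empty list if the line does not match.
sub parse_reflog_line {
    my ($line) = @_;
    return $line =~ /^(\S+) \[(.*)\] : (\S+)\s*$/;
}

my ($host, $referer, $target) =
    parse_reflog_line('127.0.0.1 [http://www.example.com/start.html] : /apache_pb.gif');

print "host=$host referer=$referer target=$target\n";
```

Requests that arrive without a referer are logged by Apache with a "-" in that field, so a real script would want to skip those.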

You may have to work with your ISP or web server admin to arrange these things. Referrals from search engines usually carry the search terms in the query string of the referring URL.


The trick is to locate all of the referral URLs that come from search engines, parse out the search terms, and keep track of how many there are and what they're pointing to.
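
To give a feel for that trick, here is a minimal sketch using only core Perl. It is not how Zeitgeist.pm does it. The parameter names (q, p, query), the sample referer URLs, and the function name search_terms are assumptions for illustration. It also makes no attempt to check that the referring host really is a search engine.

```perl
use strict;
use warnings;

# Query parameters that commonly carry the search phrase. This list is an
# assumption for illustration; real engines vary.
my @params = qw(q p query);

# Return the decoded, lowercased search phrase from a referer URL,
# or undef if no known search parameter is present.
sub search_terms {
    my ($url) = @_;
    my ($query) = $url =~ /\?([^#]*)/ or return;
    for my $pair (split /[&;]/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $value && length $value;
        next unless grep { $_ eq $key } @params;
        $value =~ tr/+/ /;                            # '+' encodes a space
        $value =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;  # decode %XX escapes
        return lc $value;
    }
    return;
}

# Hypothetical referers, as they might appear in a referer log.
my @referers = (
    'http://www.google.com/search?q=referer+log&hl=en',
    'http://search.yahoo.com/search?p=Referer%20Log',
    'http://www.example.com/start.html',
);

# Count how often each phrase was used.
my %count;
for my $url (@referers) {
    my $terms = search_terms($url);
    $count{$terms}++ if defined $terms;
}

printf "%-15s %d\n", $_, $count{$_} for sort keys %count;
```

Lowercasing the phrase lets "Referer Log" and "referer log" count as the same search. Whether that is what you want is a matter of taste.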

The Zeitgeist Perl Module

That's where the Zeitgeist.pm Perl module comes in. Once you know what format your referer logs are in, where they are kept, and where you want your zeitgeist page to live, you can use Zeitgeist.pm to write a script that does all the work. If you have separate referer logs in the format shown above, then here is a simple script that will take /var/log/httpd/referer_log and build a zeitgeist page /home/user/www/zeitgeist.html:

    use Zeitgeist;

    my $reflog = '/var/log/httpd/referer_log';
    my $zeitgeist = '/home/user/www/zeitgeist.html';

    my $z = new Zeitgeist();
    $z->readlogs( files => [$reflog] );

You could run this by hand from time to time, or set up a cron job that runs it periodically to keep the page up to date.

If your logs are in Combined Log Format, you will have to make a couple of adjustments:

    my $reflog = '/var/log/httpd/access_log';
    my $z = new Zeitgeist( refpos => 10, targetpos => 6);

This indicates the position of the referer URL and the target URL in each log line, counting "words" separated by whitespace, starting at zero.

There are other situations that the Zeitgeist.pm module can deal with. For example, say you rotate and compress your log files periodically, but you still want to include the search terms from the compressed logs. Also, you would prefer to have '+' signs separating the search terms in the HTML output (instead of the default · character). For the latter, you would use the separator option to the Zeitgeist::new method:

    my $z = new Zeitgeist( refpos    => 10,
                           targetpos => 6,
                           zcat      => '/usr/local/bin/zcat',
                           separator => '+');
    $z->readlogs( files => [$reflog, "$reflog.1.gz", "$reflog.2.gz"] );

Zeitgeist.pm automatically tries to decompress files ending in '.gz'. You may need to tell Zeitgeist where the zcat binary that does the decompression is located, as the zcat option above does. You can also pass readlogs an open FileHandle object, if you need to do something more complex to get at your referer information:

    use FileHandle;
    $z->readlogs( handle => new FileHandle("/my/groovy/hack |") );
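
As one example of "something more complex", here is a sketch of a helper that returns an open FileHandle for either a plain or a gzip-compressed log. The name open_log is made up for illustration, and the sketch assumes a gzip binary is on your PATH.

```perl
use strict;
use warnings;
use FileHandle;

# Return an open FileHandle for a log file, reading '.gz' files through
# a gzip pipe. Assumes a gzip binary is on the PATH.
sub open_log {
    my ($path) = @_;
    my $fh = FileHandle->new;
    if ( $path =~ /\.gz$/ ) {
        # List-form open avoids passing the path through a shell.
        open( $fh, '-|', 'gzip', '-dc', $path ) or die "gzip $path: $!";
    }
    else {
        open( $fh, '<', $path ) or die "open $path: $!";
    }
    return $fh;
}

# Usage (the path is a placeholder):
#   my $fh    = open_log('/var/log/httpd/referer_log.1.gz');
#   my $first = <$fh>;
```

The returned handle could then be given to readlogs through the handle option, in place of the piped FileHandle shown above.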

Since the output is pretty plain, you can sandwich it between header and footer HTML code by supplying Zeitgeist::new with the names of files to include at the top and bottom of the output:

    my $z = new Zeitgeist( header => "/home/user/www/z_head.html",
                           footer => "/home/user/www/z_foot.html");

Alternatively, you could include the zeitgeist.html file into another page using a Server-side Include:

    <!--#include file="zeitgeist.html"-->


-- JimFl - 26 Mar 2003

Attachment     Size    Date                  Who     Comment
Zeitgeist.pm   12222   27 Mar 2003 - 06:13   JimFl   The Zeitgeist.pm perl module


Topic revision r1.2 - 27 Mar 2003 - 21:31 GMT - JimFl

The contents of this page are licensed under a Creative Commons License