Who's Linking to You?

by Jeff Burchell 22 Nov 1996

Jeff Burchell is HotWired's resident Unix guy. It's rarely his fault.

Q:  Is there any way to tell what other sites are linking to my pages?
- Curious in Castro Valley

A:  There are a couple of ways to do this. The least effective (but easiest and fastest) way is to use a search engine. Both AltaVista and HotBot allow you to search for links in their databases. The second and most effective method uses something called the referer log, which is kept by most Web servers.

Search engines

To see which pages in the HotBot database are linked to www.eff.org, you just have to type link:www.eff.org into the search box. It's that easy.

Referer logs

A referer log keeps track of what page a user was reading immediately before coming to your site. Usually, this means that there's a link to your site from that page. Most Web servers keep referer logs, though the log's exact syntax varies from server to server. (Editor's note: Apparently, the engineer who coined the phrase "referer log" didn't know how to spell it.)

I'm going to explain the referer log generated by the default logging module of Apache, the server software we use at HotWired.

A referer log looks like this:

    http://www.blah.com/index.html -> /story/index.html
    http://www.svelt.com/burn/ -> /icns/wow.gif
    http://www.mom.com/ippy/ -> /index.html
    http://www.meep.com/trash/ -> /so/cool.html

That's nice, but what does it mean?

The syntax of a referer log reads like this:

    <pointing page> -> <page pointed to>

So, http://www.svelt.com/burn/ -> /icns/wow.gif means there's a link on the page http://www.svelt.com/burn/ that points to /icns/wow.gif.
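
To make that syntax concrete, here's a minimal sketch that splits each entry into its two halves with awk. The sample entries are the four shown above, written to a scratch file so the example stands on its own; the function name split_entries is made up for illustration and isn't part of Apache or any standard tool.

```shell
#!/bin/sh
# Sample data: the four entries shown above, in a scratch file.
sample_log=$(mktemp)
cat > "$sample_log" <<'EOF'
http://www.blah.com/index.html -> /story/index.html
http://www.svelt.com/burn/ -> /icns/wow.gif
http://www.mom.com/ippy/ -> /index.html
http://www.meep.com/trash/ -> /so/cool.html
EOF

# split_entries (hypothetical helper): print the pointing page and the
# page pointed to on separate, labeled lines. Splitting on the whole
# " -> " separator keeps hyphens inside URLs intact.
split_entries() {
    awk -F' -> ' '{ printf "from: %s\n  to: %s\n", $1, $2 }' "$1"
}

split_entries "$sample_log"
```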

So, what are some neat tricks I can do?

Well, if you have access to the referer log, then you probably have access to a Unix box, with its array of text utilities. Here are a couple of common referer-log munging techniques. Each of these is a command (or several commands piped together) that should be typed on a Unix command line.

For the purposes of this demonstration, assume the referer log's filename is ref_log.

What pages link to me?

The command: cut -d' ' -f 1 ref_log | sort | uniq will return a list of every page mentioned as a referrer in your referer log.

What it does:

    cut -d' ' -f 1 ref_log
    Keeps only the first space-separated field of each line, dropping the -> and everything after it, so you just get a list of who is linking to you, and not the pages they're linking to. (Splitting on the space rather than on the hyphen keeps URLs that contain hyphens intact.)

    sort
    Alphabetizes the list (needed for uniq, later).

    uniq
    Unique. Deletes duplicate lines from an already sorted list.
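
If you care about which sites link to you rather than which individual pages, the same idea can be narrowed to hostnames. This is a sketch rather than part of the original recipe: referring_hosts is a made-up function name, and it assumes every entry starts with a full URL of the form scheme://host/path.

```shell
# referring_hosts (hypothetical helper): list each referring host once.
# Assumes each line looks like "http://host/path -> /local/page".
referring_hosts() {
    cut -d' ' -f 1 "$1" |   # keep the pointing page only
        cut -d/ -f 3 |      # "http:" / "" / "host": the third field is the host
        sort -u             # sort and drop duplicates in one step
}
```

You would call it as referring_hosts ref_log.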

How many times has someone been referred from a particular site?

The command: grep 'www\.meep\.net' ref_log | wc -l will return the number of times the site www.meep.net appears in your referer log.

What it does:

    grep 'www\.meep\.net' ref_log
    Picks out lines in the file that contain "www.meep.net". (In grep, an unescaped "." matches any character, so you need to put a backslash in front of each one. The single quotes keep the shell from stripping the backslashes before grep sees them.)

    wc -l
    Counts the number of lines (one hit = one line).
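
As a variation, grep can do the counting itself and treat the site name as a plain string, which sidesteps the escaping question entirely. A minimal sketch, assuming a POSIX grep; count_referrals is a made-up name.

```shell
# count_referrals (hypothetical helper): count log lines mentioning a site.
#   $1 = site name, e.g. www.meep.net    $2 = referer log file
count_referrals() {
    # -F: treat the pattern as a fixed string, so "." is just a dot
    # -c: print the number of matching lines instead of the lines
    # --: end of options, in case the site name starts with "-"
    grep -F -c -- "$1" "$2"
}
```

Called as count_referrals www.meep.net ref_log. Note that this is still a substring match, so it would also count a host such as www.meep.net.au.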

Who is linking to a page other than /index.html, and where are they linking to?

The command: grep -v ' /index\.html$' ref_log | sort | uniq | less

What it does:

    grep -v ' /index\.html$' ref_log
    Gets lines that don't end in " /index.html". The -v means get lines that don't match. The $ matches the end of the line, indicating that you're only looking at the ends of lines. The leading space keeps links to pages like /story/index.html in the list.

    sort | uniq
    Puts the list in order, and throws away duplicates.

    less
    Lets you look at the list one page at a time.
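
The same question can also be asked field by field instead of with a pattern. Here's a sketch using awk, assuming the three-column layout shown earlier (pointing page, ->, page pointed to) with no spaces inside the URLs; deep_links is a made-up name.

```shell
# deep_links (hypothetical helper): unique entries whose target is not /index.html.
#   $1 = referer log file
deep_links() {
    # $3 is the page pointed to; awk prints the line only when it differs
    awk '$3 != "/index.html"' "$1" | sort | uniq
}
```

To page through the result, run deep_links ref_log | less.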

That's just the beginning of what you can do to manipulate your referer log. Combinations of these commands can be used to produce almost any kind of output.
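
As one example of such a combination, here's a sketch that ranks referring pages by how often they appear. It relies on cut, sort, and uniq again, plus uniq's -c flag and a numeric sort; top_referrers is a made-up name, and the default of 10 results is an arbitrary choice.

```shell
# top_referrers (hypothetical helper): most frequent pointing pages first.
#   $1 = referer log file    $2 = how many to show (defaults to 10)
top_referrers() {
    cut -d' ' -f 1 "$1" |       # pointing page only
        sort |                  # group identical pages together
        uniq -c |               # collapse each group, prefixed with its count
        sort -rn |              # highest count first
        head -n "${2:-10}"      # keep only the top entries
}
```

For instance, top_referrers ref_log 5 would show the five pages that sent you the most visitors.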

