Q: Is there any way to tell what other sites are linking to my pages?
- Curious in Castro Valley
There are a couple of ways to do this. The least effective (but easiest and fastest) way is to
use a search engine. Both AltaVista and HotBot allow you to search for links in
their databases. The second and most effective method uses something called the referer log, which is kept by most Web servers.
To see which pages in the HotBot database are linked to www.eff.org, you just have to type link:www.eff.org into the search box. It's that easy.
A referer log keeps track of what page a user was reading immediately
before coming to your site. Usually, this means that there's a link to your site from that
page. Most Web servers keep referer logs, though the log's exact syntax
varies from server to server. (Editor's note: Apparently, the engineer who
coined the phrase "referer log" didn't know how to spell it.)
I'm going to explain the referer log generated by the default logging module of Apache, the server software we use at HotWired.
A referer log looks like this:
http://www.blah.com/index.html -> /story/index.html
http://www.svelt.com/burn/ -> /icns/wow.gif
http://www.mom.com/ippy/ -> /index.html
http://www.meep.com/trash/ -> /so/cool.html
That's nice, but what does it mean?
The syntax of a referer log reads like this:
<pointing page> -> <page pointed to>
So, http://www.svelt.com/burn/ -> /icns/wow.gif means there's a link on the page http://www.svelt.com/burn/ that points to /icns/wow.gif.
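If you want to pull those two halves apart on the command line, here's a minimal sketch using awk. It assumes every line uses the exact " -> " separator shown above; the function name split_ref_line is made up for illustration.

```shell
# Hypothetical helper: read referer-log lines on standard input and print
# the pointing page and the page pointed to, separated by a tab.
# Assumes the "<pointing page> -> <page pointed to>" syntax shown above.
split_ref_line() {
    awk -F' -> ' '{ printf "%s\t%s\n", $1, $2 }'
}

# Try it on one of the sample lines.
echo 'http://www.svelt.com/burn/ -> /icns/wow.gif' | split_ref_line
```

Splitting on the whole " -> " string, rather than on a single character, keeps URLs intact even when they contain hyphens or spaces of their own.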
So, what are some neat tricks I can do?
Well, if you have access to the referer log, then you probably have access to a Unix box, with its array of text utilities. Here are a couple of common referer-log munging techniques. Each of these is a command (or several commands piped together) that should be typed on a Unix command line.
For the purposes of this demonstration, assume the referer log's filename is ref_log.
What pages link to me?
The command: sort ref_log | cut -d- -f 1 | uniq will return a list of every referring page mentioned in your referer log.
What it does:
sort ref_log
Alphabetizes the list (needed for uniq, later).
cut -d- -f 1
Drops everything from the first hyphen onward, which is normally the -> in the log, so you just get a list of who is linking to you, and not the pages they're linking to. (A referring URL that contains a hyphen of its own gets cut short at that hyphen.)
uniq
Unique. Deletes duplicate lines from an already sorted list.
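The pipeline above lists referring pages. If you'd rather see referring sites (hostnames) and how often each one shows up, something like the following sketch may help. It assumes every line starts with a full URL such as http://host/path, so that the hostname is the third "/"-separated field; the function name referring_sites is made up for illustration.

```shell
# Hypothetical helper: tally referring hostnames in a referer log,
# busiest site first. Takes the log's filename as its only argument.
# Assumes each line begins with scheme://host/..., so that splitting on "/"
# leaves the hostname in field 3.
referring_sites() {
    awk -F/ '{ print $3 }' "$1" | sort | uniq -c | sort -rn
}
```

You'd run it as referring_sites ref_log | less. The uniq -c step prefixes each hostname with the number of lines it appeared on, and sort -rn puts the biggest counts at the top.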
How many times has someone been referred from a particular site?
The command: grep 'www\.meep\.net' ref_log | wc -l will return the number of lines in your referer log that mention the site www.meep.net.
What it does:
grep 'www\.meep\.net' ref_log
Picks out lines in the file that contain "www.meep.net". (To grep, a bare "." matches any character, so put a backslash in front of each one, and wrap the pattern in single quotes so the shell doesn't strip the backslashes before grep sees them.)
wc -l
Counts the number of lines (one hit = one line).
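If you'd rather not think about regular-expression escaping at all, grep's -F option treats the pattern as a plain string and -c does the counting, so the wc step isn't needed. Here's a minimal sketch; the function name count_hits is made up for illustration.

```shell
# Hypothetical helper: count the lines in a referer log that contain a
# given string, matched literally (no regular-expression characters).
# Usage: count_hits <string> <logfile>
count_hits() {
    # -F: fixed-string match, so "." only matches a real dot
    # -c: print the number of matching lines instead of the lines themselves
    grep -F -c -- "$1" "$2"
}
```

For example, count_hits www.meep.net ref_log prints a single number. Like the original command, it counts any line containing the string, whether the site appears as the referrer or somewhere else on the line.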
Who is linking to a page other than /index.html, and where are they linking to?
The command: grep -v ' /index\.html$' ref_log | sort | uniq | less
What it does:
grep -v ' /index\.html$' ref_log
Get lines that don't end in " /index.html". The -v means get
lines that don't match. The $ anchors the pattern to the end of the line, indicating that you're only
looking at the ends of lines.
sort | uniq
Put the list in order, and throw away duplicates.
less
Look at the list one page at a time.
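To see which of your inner pages attract the most outside links, you could tally the right-hand side of the log instead. This is only a sketch: it assumes the " -> " separator shown earlier and no trailing whitespace on the lines, and the function name deep_link_targets is made up for illustration.

```shell
# Hypothetical helper: tally the pages being linked to, skipping
# /index.html, most-linked page first. Takes the log's filename as its
# only argument. Assumes "<pointing page> -> <page pointed to>" lines.
deep_link_targets() {
    awk -F' -> ' '$2 != "/index.html" { print $2 }' "$1" | sort | uniq -c | sort -rn
}
```

Run it as deep_link_targets ref_log | less to page through the counts.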
That's just the beginning of what you can do to manipulate your referer log. Combinations of these commands can be used to produce almost any kind of output.