andiFashion Apparel Directory: Link Checking Routine

© 2002-2006 Andilinks.

How all 75,000+ links are checked, ongoing.



I use Xenu's Link Sleuth.   I have additional link checkers listed here and have tried several of these but I have had the best (though not perfect) results with Xenu.   The character Xenu is the devil for the Church of Scientology, but that's another story--not about link checking--that can be read on Xenu's website.   I share Tilman Hausherr's dislike for religious cults, BTW.

PING:   v.  To determine if an IP address is accessible. More details on "ping" at  Webopedia.com.

This is a supplement to Xenu's excellent explanation on how to check links with his program.   I provide it for anyone who has a website with many links or has an interest in how I keep all these links current.    The link checker also has some features that don't apply to my particular situation, so it might pay to read the Xenu site too if you're planning to use it for the first time.   There is also an excellent online manual and a Yahoo! user group for the Xenu checker.   The first time you check your links should be the most difficult because you will find many dead ones, especially if you haven't done a global check in a long time.

After the initial learning curve, repeating the check on a regular basis keeps the task easy and routine. Whenever I perform the check I usually think of some way to make the task easier the next time, and if it would be useful to others I will add it to this page.

Please do not run the link checker on andilinks.com URLs because this causes an unnecessary bandwidth drain on this free bookmark site. I do block the IPs of unapproved bot activity, including those checking this site with Xenu.

While it is possible to check your pages while online, you will use fewer internet resources and cause fewer headaches for your server's system administrator if you check local versions on your hard drive. Using Xenu irresponsibly can affect its usefulness because many sysadmins and site owners (myself included) will deny access to IPs or user-agents that repeatedly abuse their servers or sites. I don't believe a single hit every ten days is abuse, but if a particular user-agent shows up in a web site's logs too often the site owner is likely to ban it. More on sites that deny the Xenu user-agent further down this page.

The link checker is free, works extremely fast and gives a useful output with the link titles listed in the same order that the links are given on the page. It will skip duplicate links and has options to exclude internal links or any URLs of your choosing. Any link on the result page can be double-clicked to recheck it immediately, or at a later time if you save the Xenu file (double-click or right-click the red links to see if they are really dead). There is a menu item under File to "Retry Broken Links," and clicking on the heading of the "Status" column sorts by status. If your links are not in alphabetical order on your page it will then be difficult to re-sort them back into their original order unless you have a saved version.
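For readers who want to see what checking a local copy involves, here is a minimal sketch in Python of collecting the external links from a saved page, in page order and with duplicates skipped. It is only an illustration of the idea and not how Xenu works internally; the class and function names are made up for this example, and it assumes only absolute http/https links are of interest.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (url, link text) pairs from <a href=...> tags in page order."""

    def __init__(self):
        super().__init__()
        self.links = []      # (url, link text) in the order found on the page
        self._seen = set()   # URLs already recorded, used to skip duplicates
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # pieces of text inside the open <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Assumption: internal (relative) links are excluded from the check.
            if href and href.startswith(("http://", "https://")):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if self._href not in self._seen:
                self._seen.add(self._href)
                self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def links_in_page(html_text):
    """Return the unique external links of one page, in page order."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links
```

Reading a file from the hard drive and passing its text to links_in_page() gives a list that can then be checked or compared against a database.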

I run the link checker about every 14 days, avoiding weekends because it seems more sites are down for maintenance then. On a recommendation from someone on a message board I downloaded the Xenu checker for the first time in December 2002 and have been doing this check regularly since then. I checked about 5,000 links on that first link-check with Xenu and I found over a hundred links that stayed dead for a week. Since I keep all the links in a database as well as on my site, I find it more convenient to make up special HTML pages just for link checking, but I began by checking local versions of the actual web pages.

It is remarkable how consistently about 1% of all the links fail to respond at any one time, yet that figure winnows down to about 0.1% after a week of checking and correcting (see table below). Many of the links shown dead on the first pass are domain name server (DNS) errors at my ISP or other packet-switching errors and not dead sites. A single scan of over 13,000 links takes under 30 minutes, which is amazing.
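As a quick check of those percentages, here is a small Python calculation using the figures recorded for the 07/03/03 check in the statistics table on this page. The function name is made up for the example. Note that the "deleted" figure in that table belongs to the previous checking cycle, so the second rate is only approximate.

```python
# Figures for the 07/03/03 check, copied from the statistics table on this page.
total_pinged = 9414
dead_on_first_pass = 124
dead_links_deleted = 10   # recorded against the previous cycle, so approximate


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100.0 * part / whole, 2)


first_pass_rate = percent(dead_on_first_pass, total_pinged)
deleted_rate = percent(dead_links_deleted, total_pinged)
print(first_pass_rate, deleted_rate)
```

For that date the first-pass rate works out to a little over 1% and the deleted rate to about a tenth of a percent.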

Over the next ten to fourteen days it is easy to recheck the dead links periodically from the saved Xenu file, and many do return to life. Typically about 40% will return to life in the first few hours, and at the end of 14 days usually about 10-30 truly dead links remain (again, table below). At that time I move the links to an inactive HTML page and delete them from the site. Some of the 404 errors and other dead links may be caused by a typo made while transcribing the URL, a change in the default page name, or some other minor change made by the site, so it pays to investigate dead links carefully before deleting them permanently. Often a Google search on the site title or keywords will tell you all you need to know. It is very useful to retry the URL truncated to its root directory to see whether just one page has been removed rather than the entire site being dead.
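Truncating a URL to its root directory is easy to do by hand, but if you are working through a long list it can be scripted. This is a minimal Python sketch; the function name is my own invention and the example URL is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def root_url(url):
    """Return the URL truncated to the root directory of its site."""
    parts = urlsplit(url)
    # Keep the scheme and host; drop the path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/", "", ""))


print(root_url("http://www.example.com/catalog/2005/page.htm?id=3#top"))
```

If the root URL responds while the full URL does not, the site is probably alive and only the one page has moved or been removed.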

I do not use the report feature but use the application interface itself for locating broken links in my database.  For this I use the menu option "View > Show broken links only" and right-click the red URL to "Copy URL" then paste this to the "Find" box of my database (or HTML editor if working directly on the page).   This ensures that I have the exact broken link as there may be many from the same domain, yet only one broken by as little as a single incorrect character.

One thing that has surprised me is that even a few "big time" sites will remain down for days at a time. Occasionally I will run the link checker on the "Inactive" page and find one or two that have reactivated.

There are some sites that deny access to robots or specifically deny the Xenu User-Agent, and you'll learn which those are on the first pass. This is one reason why the first pass is the most difficult. Be sure to mark those that deny Xenu so they can be easily excluded or remembered on the next pass. They appear on the report marked "forbidden" though a normal browser does not return the "forbidden" page. Check them again with a browser, because "forbidden" sometimes does actually mean the site is down, possibly for bandwidth overage or some other intentional "deny all."

A very few others regularly return "time out" or "not found" to Xenu yet work fine with IE. I don't know why, but since they consistently show this behavior over time I have concluded they deny just Xenu, probably using some mod_rewrite rule, but that's a discussion for another page that I haven't written yet. For now, Google "mod_rewrite" for more on this.
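The follow-up I apply to each kind of result can be written down as a small lookup. This Python sketch is only a summary of the rules of thumb above; the status labels and the function name are hypothetical and would need adjusting to whatever your checker actually reports.

```python
def next_step(status):
    """Suggest a follow-up for one result label from a link check.

    The labels are hypothetical; adjust them to what your checker reports.
    """
    label = status.strip().lower()
    if label == "ok":
        return "keep"
    if label == "forbidden":
        # Usually a site that denies robots, but sometimes it really is down.
        return "check in a browser"
    if label in ("time out", "not found"):
        # May be a temporary failure or a site that defeats the checker.
        return "retry later, then check in a browser"
    return "investigate"
```

Anything that still fails in a browser after repeated retries is a candidate for the inactive page.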

It is only about a tenth of a percent of the links overall that deny or defeat Xenu so it doesn't affect the routine or Xenu's usefulness much once you have identified them.   The Xenu FAQ # 20 goes into greater detail about this.   A URL that ends in a double slash "//" will return a "not found" on Xenu while functioning normally for a browser.   There are only two like this (Greenfield Online, SiteSeer) among my thousands of links.  The Xenu "Starting Point" dialog box will remember the URLs to skip if you enter them there.

These 178 sites (6/24/2005) on Andilinks deny Xenu or defeat the ping in one fashion or another, listed alphabetically:

2Wire, 3D Gamers, A Business Resource Directory for the Internet., A9.com, AAA Computer Search, Akademie.de, Aljazeera.Net English, AllAfrica, Allen, Dean, Arrow, Artcyclopedia, arXiv, Assemblylanguage.Net, Atrio Systems, Auburn's Links, auditmypc, Barnstead International, Barrys Clipart, Bartlett's Familiar Quotations, Behavior OnLine, Belgium, Blogography, Bluetooth - Wikipedia, the free encyclopedia, Brazil, Brown, Mike, Business 2.0, Campus Pipeline, Carper, Glenn E., Chipworks, Choate, Brad, Code Project, Codefixer, Colorado State University Pueblo, Commence, CommWeb, CompInfo, Configate, CoverYourASP, CreatingOnline, Creative Mac, CRM911, Cryonics Belgium, Dave's Links, Decision Craft, delphi3000, Derwent, Dixon, Dr Patrick, dWoz.com, Economics and Statistics Administration (ESA), Edison MediaWiki, Education4Today, EduHound, EHTML Shareware Listings, Ethical Hackerz Security Services, European Physical Journal, Firewall.cx, First 4 Internet, First Crack Podcast with Garrick Van Buren, First Impressions, Firstwave, Fonts Free, fonts.ontheweb, Froogle, Frost & Sullivan, frost.com, Gem of Proverbs and Poems, Globus Alliance, Good Jobs First, Google Advanced News Search, Google News, Google News Search, Google Unclesam, Greenfield Online, Gruppo CRIBISNET, GSAAuctions.gov, H&M, Hanover Company Services, Hawkins Alwin (Nurse and Mac user), History of Computing Foundation, Hobbes' Internet Timeline, Homeschool Community, Houlihan Lokey Howard & Zukin, Howard Forums, Human Nature Review, Hyperborean Moose, The, India Times, Indian Institute of Science, Indonesia, InFocus, InfoSysSec, INTA, Internet Public Library, Teen Division, Internet Tips, INVESTools, Iomega, Keyword Suggestion Tool, Kodak Business Imaging Systems, Labjunk, LightCross, Lisa Loeb, LocalHarvest, Lynk Systems, Magellan Ingénierie, Mangrove Systems, May Department Stores Company, MetLife, Mind Body Wellness, Mitchell, morons.org, NDTV, NetScreen, Nexans, NEXCOM, NextGen, NORDX/CDT, NS Lookup, Odeo Blog, Only Cgi, Open Source Awards, Opticom, OurTrainingSite.com, Ozark Airlines, Page Computer, Paint Shop Pro, Patni, pepys project, Peters, Tom, Phrase Finder, PocketPC thouhts, Podcasting - Wikipedia, the free encyclopedia, Popular Science, Projistics, Pronto Software, Red Ferret Journal, Red Ferret Links, Reuters, RichLink, Ritz Camera, Rosetta Project, SanDisk, SciGate (IISc), Serif DrawPlus, Serif PagePlus, ServerSide, The, SkyREPORT, Sono Group, Space Telescope Science Inst, Textism, thefreecountry, Thomson Derwent, Top Site Listings, Trodo, Uchronia, USA CityLink, User Interface Engineering, User Interface Engineering - Web Site Usability Book, Voice over IP - Wikipedia, the free encyclopedia, Web Standards Awards, WebmasterWorld, Werbach, WhatYouGet, Wikipedia, Williams Communications, Winthrop Publications e Business Strategy Journal, Wireless Systems Design, Woman's Day, Women's Health Links, Wonderfile, WorldClass, XDE, XMLPitstop, Yes Television, Yushin America, Inc.

It's a small number percentage-wise, but annoying when they turn up every time to be rechecked.   Since I make up a custom page just for link checking, I omit them from the list automatically and check them manually once a month or so.  
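Leaving the known problem sites out of the custom checking page amounts to splitting the link list in two. A minimal Python sketch of that split follows, assuming the URLs known to deny the checker are kept in a set; the function name and the example URLs are placeholders, not my actual data.

```python
def split_for_checking(urls, deny_checker):
    """Split URLs into those to check automatically and those to check by hand.

    deny_checker is a set of URLs already known to deny or defeat the checker.
    URLs ending in a double slash are also set aside, since they can report
    "not found" to the checker while working normally in a browser.
    """
    automatic, manual = [], []
    for url in urls:
        if url in deny_checker or url.endswith("//"):
            manual.append(url)
        else:
            automatic.append(url)
    return automatic, manual
```

The automatic list goes onto the page that the checker reads, and the manual list is what I look at by hand once a month or so.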

I do not count redirects as errors. In fact, whenever possible I link to just root directories, omitting "index.htm" or whatever, in order to keep my file sizes down. This also eliminates errors should the site change the name of their default (index.html) page. Redirects that show up "red" on the Xenu result page can be traced to the original URL by right-clicking to "Properties." The URL at the bottom of the Properties box will need to be re-checked, repaired or deleted.
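Dropping the default page name from a link can also be done in bulk. This Python sketch assumes a short list of common default page names; the list and the function name are my own choices for illustration, and a site may of course use some other name.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of default page names; extend it to suit your own links.
DEFAULT_PAGES = ("index.htm", "index.html", "default.htm")


def strip_default_page(url):
    """Return the URL with a trailing default page name removed, if present."""
    parts = urlsplit(url)
    directory, _, last = parts.path.rpartition("/")
    # Leave URLs with a query string alone; the page name may matter there.
    if last.lower() in DEFAULT_PAGES and not parts.query:
        return urlunsplit((parts.scheme, parts.netloc, directory + "/", "", ""))
    return url
```

It is worth confirming in a browser that the shortened URL still loads the same page before changing the link.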

For some reason the Xenu program sometimes refuses to close until it is done and occasionally I must kill it with the ctrl/alt/del Task manager "End Task," after saving the results.

There are many reasons why a link should be deleted besides failing this check. Often the domain goes to a generic directory or domain seller, sometimes the character of the site changes radically, and sometimes the content just goes stale. So I do a lot of link checking the old-fashioned way, looking at the site. This is much slower than using Xenu, but unlike the Xenu check, I learn a lot about the content of the sites. And that, after all, is why I began this site in the first place!


Link Checking notes, blog.

06/05/06   The women's wear pages were just checked while I added thumbnail images with the HTML2JPG bot. It is an effective way to check links if you also happen to be making images of the pages but would be quite tedious as a link checker alone. My web host, Futurequest.net, is now blocking Xenu globally, which has increased the number of "forbidden" returns with Xenu. I have investigated one server-side link checker, but since it uses JavaScript redirects I am reluctant to try it because of the possibility that Google might misinterpret it as a spamming scheme. They are very slow to correct such errors when they happen. So I continue using Xenu, though if more sysadmins block it globally its usefulness will diminish.

03/27/06   I just finished checking the "big table" which currently holds around 29,000 links. The local database is up to date but I'm updating the site while changing its appearance so it will be four or five more days before that's complete.

02/07/06   Link checking is now done on a page by page basis in the women's wear and pop music categories using the methods on this page and these sections in general are being kept up-to-date better than any ever have.   The "big table" is still checked from time to time but as many of the categories become less relevant to the general theme of the site they may be dropped entirely.   The reason they remain at all is that with occasional link checking the pages are still valuable as a reference and do get some traffic.

10/17/05   "Within the week," my estimate from the last post turned into exactly two weeks but I am very pleased to report that 324 dead links have been purged from the "big table" and my routines are approaching their pre-ban quality.

10/03/05   I am still very unhappy with Google and the way this whole banning thing was handled but Andilinks is finally back in the index.   Now Yahoo has dropped Andifashion and Google still has it in its 'sandbox' so there is no longer any reason to have the andifashion domain.    All the pages are now back in the andilinks.com domain.   I have deleted 39 dead links from women's wear and 134 dead links from Pop Music.   The checking for the 'big list' is in process and within the week the entire database will be purged of dead links.    It will take a few days more for the static pages to be updated but I'm pleased to say things are back on track. 

9/15/05   On July 28, 2005 my primary Andilinks site was banned by Google and that domain was dropped from the index.   Needless to say this disrupted many of my routines and has lowered my opinion of the dominant search ****.   I will survive, and continue this site and the original Andilinks.   Since then I have had to check the Google site:domain page visually with a browser for reasons too complicated to go into here, but I am nearly finished with that process for the Andi Fashion site and will resume using Xenu globally again soon.   I have been using Xenu to check individual pages since with the new andifashion domain I have begun an entirely new database file and so the stats table below can no longer be updated.

6/24/05   Back on a 21 day interval, though the figures shown on the chart below only include the largest table. The smaller tables are easier to do manually during spare moments since they are less of a chore. I have updated the list of sites that deny Xenu, but this list too is only from the largest table and doesn't include the pop music or women's wear pages.

6/05/05   A complete schedule for checking all the links is now in place but it will be another week to ten days before I've completed the first cycle. I am confident that I will be able to keep up, leaving no broken link on the site for more than 21 days, less in most cases. There are the difficult cases where a dead link goes directly to a "parked" position with a domain registrar or a generic directory. Those are more difficult to catch and I haven't found a satisfactory method for doing so with the less popular pages. Pages that see heavy traffic are checked manually from time to time.

5/03/05   I have completed my relocation from Illinois to New Mexico and have just finished one complete check/deletion.   The moving has interfered with the regular routine but now that I'm getting settled I will resume the regular checks and also have time to improve and document the procedures.   Thanks for your patience.

3/13/05   Today I finished checking the women's wear table (3,410 links) deleting 16 dead links and repairing four.   I don't want to keep adding these smaller totals to the chart below so I must think of a better way to document this progress or just stop publishing the figures.

2/18/05   I combined several tables just for the purpose of link checking today.   It worked well but I'm not sure it saved any time over doing the tables separately spread over fourteen days.   As you can see in the chart below ~4,000 links were checked outside this process and the ongoing check reduced the number deleted this time.   This explanation is incomplete but there's no point in writing up the whole procedure until I have it somewhat more optimized.

2/4/05   The largest link table is now broken into several parts and the link checking and deletion is more of an ongoing process than a bi-weekly chore. This is easier on me and allows me to pay closer attention to the pages getting the heaviest traffic. I will periodically post a cumulative total of the links deleted or repaired since the last posting. The routine is still a bit disorganized and I will need to firm up schedules for the various tables. Once that is done I will rewrite the text above, which may take several weeks.

1/18/05   It is not practical for me to record the link checking done for the smaller data tables (~8000 links) on this page. I am in the process of further sub-dividing the largest link table, but while it still exists as the largest table I will continue to record the results here. Today I deleted 108 dead links from the largest database table (34,524 links). It will take another week for me to update the web pages themselves.

12/04/04   I am in the process of restructuring my database.   The restructuring will allow me to check links on the more heavily visited pages more frequently than every two weeks, while leaving the less popular pages on a two, three or four week cycle.   When I have this completed I will rewrite this page to reflect the changes, though much of the explanation for the old routine will remain recorded here.
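A schedule with different intervals for different tables can be kept with a few lines of Python. In this sketch the table names come from this page, but the intervals and dates are made-up examples and the function name is hypothetical.

```python
from datetime import date, timedelta


def tables_due(last_checked, interval_days, today):
    """Return the names of tables whose check interval has elapsed, sorted."""
    due = []
    for name, checked_on in last_checked.items():
        if today - checked_on >= timedelta(days=interval_days[name]):
            due.append(name)
    return sorted(due)


# Example values only: heavily visited tables get the shorter intervals.
last_checked = {
    "women's wear": date(2004, 12, 1),
    "pop music": date(2004, 12, 1),
    "big table": date(2004, 11, 20),
}
interval_days = {"women's wear": 7, "pop music": 14, "big table": 28}
print(tables_due(last_checked, interval_days, date(2004, 12, 10)))
```

With those example values only the women's wear table would be due on the date shown.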

11/27/04   I've fallen a week behind in this task; I hope to bring the interval back to 14 days with the next cycle.

9/20/04   Still matching dead link records manually, but I have gotten quite efficient at it, finishing the entire 80 record matching task in under 10 minutes.

7/13/04   I have begun spidering entire categories for meta-tag descriptions for the purpose of creating more detailed pages; you may find an example here. While this is more labor intensive than Xenu it is more effective in producing quality lists. I am doing two or three categories a day and this will no doubt affect the stats below. But I will continue the Xenu routine as well, and these two methods together should keep the entire site fresh. Still no significant change in the procedure of matching Xenu data to Access data, though I am getting faster with the existing manual matching.

6/28/04   I matched the broken links in the Andilinks database with the exported Xenu results by copying the broken link titles in the Xenu table to the same field in my main links table, then alphabetizing the list and matching the records manually.   It worked fine--certainly much better than the old method--but I think it is still a clumsy way to do this and I will work on a better solution over the next two weeks.
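One possible alternative to alphabetizing titles and matching by eye is to match on the exact URL instead. This Python sketch is not the method described above; it assumes the broken URLs have been exported as a plain list and that each database record is available as a dictionary with a "url" field, both of which are assumptions made for the example.

```python
def match_broken_links(broken_urls, records):
    """Pair each broken URL with the database records that have exactly that URL.

    Returns (matched, unmatched): matched maps URL -> list of records, and
    unmatched lists the broken URLs with no record, in their original order.
    """
    by_url = {}
    for record in records:
        by_url.setdefault(record["url"], []).append(record)
    matched = {url: by_url[url] for url in broken_urls if url in by_url}
    unmatched = [url for url in broken_urls if url not in by_url]
    return matched, unmatched
```

Matching on the full URL rather than the domain or title avoids confusing several links from the same site when only one of them is broken.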

6/14/04   The number of links in my database has grown to the point where I must export the Xenu results directly to my MS Access database.   I have experimented with it this time and I will write up a procedure here soon, probably before 6/28 when the next check is due.   It is much easier and I wish I had started doing it this way long ago.



Link Checking Statistics
2,275 dead links deleted since 3/08/03.

This chart is being updated sporadically though regular link checking continues, see the blog entries above for a more detailed explanation.
date       total pinged   dead on first pass*   dead links deleted   links repaired
11/18/05   30,555         n/a                   138                  18
06/24/05   31,410         n/a                   102                  10
06/05/05   n/a            n/a                   196                  28
05/03/05   n/a            n/a                   225                  20
03/16/05   n/a            n/a                   144                  26
03/06/05   n/a            n/a                   49                   6
02/18/05   30,726         n/a                   54                   12
02/04/05   n/a            n/a                   112                  9
01/18/05   34,624         n/a                   108                  7
11/27/04   33,612         n/a                   68                   6
11/07/04   31,455         n/a                   79                   9
10/18/04   30,521         n/a                   81                   8
10/04/04   29,249         n/a                   68                   13
09/20/04   28,746         n/a                   69                   8
09/06/04   27,040         n/a                   57                   5
08/23/04   25,180         n/a                   51                   9
08/09/04   24,716         n/a                   50                   8
07/26/04   23,505         n/a                   49                   n/a
07/12/04   22,961         n/a                   51                   n/a
06/28/04   21,552         n/a                   47                   6
06/14/04   19,073         n/a                   30                   9
05/31/04   16,709         n/a                   30                   8
05/17/04   15,737         n/a                   28                   9
05/03/04   15,565         n/a                   26                   6
04/19/04   15,306         n/a                   23                   8
04/03/04   14,999         n/a                   39                   5
03/20/04   14,649         n/a                   17                   7
03/06/04   14,147         n/a                   36                   12
02/17/04   13,808         n/a                   23                   6
02/03/04   13,566         n/a                   30                   12
01/21/04   13,232         n/a                   15                   5
01/12/04   12,953         n/a                   17                   8
12/31/03   12,669         n/a                   28                   14
12/21/03   12,567         n/a                   18                   2
12/10/03   12,343         n/a                   22                   8
11/30/03   12,360         n/a                   14                   10
11/20/03   12,243         n/a                   17                   8
11/10/03   12,010         n/a                   18                   9
10/29/03   11,781         n/a                   15                   7
10/19/03   11,589         n/a                   11                   9
10/09/03   11,414         n/a                   19                   14
09/29/03   11,290         n/a                   15                   12
09/18/03   11,037         n/a                   20                   7
09/08/03   10,818         n/a                   11                   6
08/29/03   10,534         n/a                   16                   7
08/19/03   10,333         n/a                   7                    17
08/04/03   9,990          n/a                   14                   6
07/24/03   9,813          n/a                   12                   12
07/14/03   9,557          n/a                   5                    3
07/03/03   9,414          124                   10                   4
06/23/03   9,056          93                    7                    3
06/13/03   8,909          75                    16                   4
06/04/03   8,512          75                    13                   6
05/25/03   8,266          125                   9                    6
05/15/03   8,052          80                    10                   6
05/05/03   7,645          88                    8                    5
04/25/03   7,516          74                    7                    4
04/15/03   7,152          72                    8                    5
04/05/03   6,876          79                    6                    5
03/27/03   6,447          66                    11                   10
03/18/03   6,274          88                    8                    9
03/08/03   n/a            n/a                   5                    8

*I am no longer recording this figure--the history over four months was consistent.
The figures in the last two columns reflect the result from the previous checking cycle.
The total site link count is . Approximately 20% are duplicates not pinged.
