There is one scenario I see play out again and again on Web Design-L, css-discuss, and countless other forums. Newbie Designer posts a link to a test page, asking for help because it doesn’t behave as expected in this or that browser. Guru Designer replies, telling Newbie Designer that their page doesn’t validate, and that they should go validate their page before asking such questions. There is no further discussion; no further replies are posted; no one else is willing to help.
Why does this happen? Why won’t we help you?
The short, smart-alec, Zen-like answer is that we are helping you, you just don’t realize it yet. The full answer goes like this:
Validation may reveal your problem. Many cases of “it works in one browser but not another” are caused by silly author errors. Typos like missing attribute values can cause browsers to crash; validation catches these typos. Simple errors like missing end tags (such as </div>) or missing elements (such as <tr>) can cause different problems in different browsers. Small mistakes like this are difficult for you to spot in your own code, but the validator pinpoints them immediately.
I am not claiming that your page, once validated, will automatically render flawlessly in every browser; it may not. I am also not claiming that there aren’t talented designers who can create old-style “Tag Soup” pages that do work flawlessly in every browser; there certainly are. But the validator is an automated tool that can highlight small but important errors that are difficult to track down by hand. If you create valid markup most of the time, you can take advantage of this automation to catch your occasional mistakes. But if your markup is nowhere near valid, you’ll be flying blind when something goes wrong. The validator will spit out dozens or even hundreds of errors on your page, and finding the one that is actually causing your problem will be like finding a needle in a haystack.
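To make the missing-end-tag case concrete, here is a minimal sketch in Python of the kind of check a validator automates. It is not how the W3C validator works: it only tracks whether tags are balanced, it knows nothing about DTDs, and it treats every non-void element as needing an explicit end tag (XHTML-style strictness). The class name `UnclosedTagFinder` and the function `find_problems` are made up for this illustration.

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class UnclosedTagFinder(HTMLParser):
    """Toy balance checker (hypothetical name), far simpler than a real validator."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag name, line number where it opened)
        self.problems = []   # human-readable messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_tags.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.open_tags and self.open_tags[-1][0] == tag:
            self.open_tags.pop()
        else:
            line = self.getpos()[0]
            self.problems.append(f"line {line}: unexpected </{tag}>")


def find_problems(markup):
    """Return a list of messages about unbalanced tags in the given markup."""
    finder = UnclosedTagFinder()
    finder.feed(markup)
    finder.close()
    leftovers = [f"line {line}: <{tag}> is never closed"
                 for tag, line in finder.open_tags]
    return finder.problems + leftovers


print(find_problems("<div>\n<p>text</p>\n"))
```

Even a toy like this points at the line where the trouble starts, which is the part that is hard to do by eye in your own markup.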
Validation may solve your problem. HTML is not “anything goes”; it has rules of how elements can be used and combined. Browsers are written to understand these rules and render your page accordingly. Browsers also have special-case logic to deal with various types of invalid markup, including vendor-specific tags and attributes, illegal combinations of block-level and inline elements, and overlapping elements. Different browsers create different internal representations of this so-called “Tag Soup” markup, which can lead to unexpectedly varying results when they go to apply styles or execute script on your page.
Ian Hickson illustrates these differences. Dave Hyatt, one of the developers of Apple’s Safari browser, talks about the “residual style” problem caused by improperly nested elements. As Dave’s example shows, this doesn’t just affect CSS-based pages; it affects pure-HTML pages too.
I am not claiming that validation is a magic bullet that will automatically solve all your web design problems; it is not. Designers still cope with lots of cross-browser and cross-platform compatibility problems with valid markup. But validating your pages eliminates a vast array of potential incompatibilities, leaving a manageable subset of actual incompatibilities to work with. Which leads me to my next point…
Valid markup is hard enough to debug already. Debugging Tag Soup is an order of magnitude harder. It’s also not terribly rewarding. Some of us are good at it; many of us have been around long enough to have dealt with it at one point or another. But it’s not where we like to focus our energies. There’s nothing aesthetically pleasing or intellectually satisfying about helping a hack-and-slash coder tweak their shitty markup and bludgeon a few browsers into submission. We know it’ll only break again next week; we’ve been there, we know what happens next week. We know you’re just coding on borrowed time.
And did I mention that debugging this stuff is hard? There’s a lot to keep track of, even when you do everything right. There are bugs in Windows browsers, bugs in Mac browsers, bugs in browsers old and new, bugs in Opera, bugs in Netscape, bugs in MSIE too. Dr. Seuss could make great poetry out of all the bugs we cope with in our valid, standards-compliant pages. And on top of that, you want us to keep track of the near-infinite variety of bugs that could be triggered by your Tag Soup? We don’t have that kind of time, and the time we do have is better spent elsewhere. Which leads me to my final point…
Validation is an indicator of cluefulness. There are a lot of people who need our help, and there are relatively few of us who have the combination of time, expertise, and inclination to debug the work of strangers for free. It’s those pesky power laws at work again: we simply can’t help everyone who asks. Like a Human Resources department that gets 500 resumes for every open position, we have to filter on something, and validation has proven to be a good filter. It is possible — in fact, it is almost inevitable — that this will keep us from interacting with otherwise talented designers who would have turned out to be great friends or professional associates later in life, but that’s the way it goes. It might also be the case that, out of 500 applicants, the perfect candidate for that open position is the one with 5 spelling mistakes on their resume. But you can’t interview everyone. You have to filter on something.
Why is validation a good filter? Because nobody makes valid pages by accident. If you come to us and say, “Hey, I have this page, it’s valid XHTML and CSS, and I’m having this specific problem,” well OK, you’ve obviously put some work into it, you’ve met us more than halfway, let’s see what we can do. But if you come to us and say, “Hey, I slapped together this page and it works in X browser but now my client says it doesn’t work in W, Y, and Z, they must be buggy pieces of shit,” well… you catch more flies with honey, and you get more help with valid markup.
My friend has a graphic of Bart Simpson writing “I will check Google and the manual before asking for help online.”
If I find really obvious questions that they should figure out themselves I just ignore the email and let the rest of the list deal with it.
— Jake of 8bitjoystick.com
Agreed 100%, Mark.
Developing a website without validating is like running around with a blindfold on. As soon as you ask for directions, don’t be surprised if the only answer you’ll get is “take your blindfold off”.
You’d be amazed at the number of people who take offence to being told this, though.
One important but invisible visitor to your web site is the search engine spider. This is a mindless robot that decides how highly your pages will rank when someone does a web search. Invalid markup doesn’t just confuse some browsers – it can confuse a search engine spider and cause your page to be listed on the 27th page of results, or possibly not listed at all.
— Guy Macon, Electrical Engineer
There is a nice logic in what you say. And I agree totally to a point.
There is a problem though, at a certain level of knowledge, that you need a little help to understand what validation is and how to use a validator.
I had a real steep learning curve when I started doing web standards stuff, and frankly I got a bit peeved at RTFM type comments. Sometimes you don’t have quite enough knowledge to know WHY that part of the manual is of use to you.
Oops..sorry for the duplicate pings. I should be more careful with trackback.
100% agree with the debugging part. It’s really difficult to keep track of the standard versus the implementation. Lots of ignorant web designers care only about how their design displays in IE, not about syntax validation.
I have just made my site that uses “Valid” code and have experienced all the things you mentioned. Excellent post.
Here was my take on it, it is entitled ‘They are the few, we are the many’ check it out: http://www.smithpaul.com/archives/000033.html
— Paul Michael Smith
Though I don’t mind going through and explaining the errors reported by the validator. It’s definitely that someone has at least made an effort that makes a difference for me. Willful ignorance drives me barmy.
Well, looks like this is fast turning out to be the most cited URI of the week in alt.html, alt.html.critique and comp.infosystems.www.authoring.html.
Congrats, this is the best written “Why Validate?” answer I’ve come across.
— David Dorward
Wow, this is so comp.lang.perl.misc from like 5 years ago. Actually, still could be like that now, I should say “comp.lang.perl.misc when I stopped reading it.”
This is an excellent answer to a question I’ve been asked by so many arrogant web developers. More often than not, however, the “tag soup” coders fight back (as if validation advocacy were an assault on their species), making out that validation is pointless because of browser bugs. Some of these are professional designers, and I pity their clients who are paying for sites that are, in effect, incomplete.
The expression is also commonly heard as “You catch more bears with honey”. Just a variant.
— Phillip Harrington
This is why I re-tagged my weblog. I’ve been trying to find a good subtitle for it, and since I fight so much to keep it valid, that was why I picked my current tagline.
While I find validators useful for debugging, actually getting a page to fully validate is often unnecessary. Certain standards are just too picky about every image needing an ALT tag, single quote vs. double quote, etc. Browsers and spiders tend to be much more forgiving with HTML.
Meanwhile, getting multiple browsers to respond the same way to CSS is always a challenge even when completely valid.
In my undergraduate days, I brought a print out of an assembly program to my instructor to see if he could help out. He glanced at it and said “Of course it isn’t working, it doesn’t have any comments in it.”
Amazingly, 5 minutes into manually writing comments beside each chunk while he watched, I had an “oh shit moment”, spotting the mistake that I’d spent over an hour looking for.
— Jonathan Peterson
Well, if we don’t validate, and don’t understand the W3C’s confusing explanations of why, then what do we do then?
# Line 58, column 3: document type does not allow element “ul” here; missing one of “object”, “applet”, “map”, “iframe”, “button”, “ins”, “del” start-tag
Huh? How else am I supposed to use a UL tag? That looks perfectly legit to me.
Is it an obtuse reference to the BR tag that follows? Or that UL tags cannot be nested inside P tags?
Also is it correct for the validator to choke on the use of an ampersand in URLs?
I would RTFM, and I did RTFM, and many of these “bugs” in my XHTML don’t make sense to me.
— Joe Grossberg
(Yes, I know I have other errors in my XHTML, but I want to be 100% compliant, not 95% compliant.)
— Joe Grossberg
On the other hand, sometimes that cliched answer is just an excuse for the guru to feel superior and put down a lowly peon.
It’s often fun for people to get off on being rude, but it rarely is helpful. Let’s not pretend that everyone who dismissively posts RTFM is doing it out of the goodness of their hearts. Sometimes it’s done just because the gurus are assholes.
Joe: P can not contain UL. Close your P before starting your UL.
Joe – address the problems that make sense, and you’ll find the ones that don’t sometimes go away. Your block-level UL could be showing up inside of an inline element, causing the validator to choke. Or something else completely: your tags are being parsed by the validator – not a good sign.
Actually, looking further, it’s probable the ampersands in your URLs are causing other validator problems elsewhere. Replace each & with an & and try again. You might cut your errors in half or more.
(yes, you have to do this for proper validation. See http://www.mezzoblue.com/cgi-bin/mt/mezzo/archives/000070.asp)
— Dave S.
dammit. stripping. recap:
your <MT> tags are being parsed by the validator – not a good sign.
Replace each & with an &amp; and try again.
— Dave S.
See also: Eric Raymond’s classic “How to ask questions the smart way”
Thank you Mark and Dave S.!
(Note: Dave’s URL was linked incorrectly (it included the “)” character at the end. Is this an MT regexp bug, and should we submit it somewhere?)
FWIW, this is an example of something I’ve learned: it helps to preface newbie questions with something honest like, “I’ve tried reading the documentation, and am still confused.” It often preempts the RTFMs.
— Joe Grossberg
PS: I emailed Joe about this privately, but for the record, that error message he got? It’s the validator’s completely stupid and worthless way of saying “hey, you forgot to close your paragraph tag before starting that unordered list.”
I don’t think it’s Joe’s fault that the validator message confuses him. It’s an opaque error message to nearly anyone even if they try to read the fucking manual.
Also: Yes, Joe, you need to encode your ampersands by writing them as &amp; instead of & (this won’t break anything, even though it seems at first like it should). In general, this is a good habit to get into, although most of the time this doesn’t cause any real problems because browsers rarely choke on this.
Er writing them as &amp;.
Sorry. (Don’t replace them with &&)
“(Note: Dave’s URL was linked incorrectly (it included the “)” character at the end. Is this an MT regexp bug, and should we submit it somewhere?)”
No, this is a user error. Parentheses are allowed in URLs and MovableType shouldn’t be assuming that they don’t belong there. The solution is that whenever a user is adding a URL in something which will automatically recognize a URL by pattern, the URL should be surrounded on both sides by whitespace.
Like this: ( http://kynn.com/ )
Not this: (http://kynn.com/)
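A small Python sketch shows why the whitespace matters. The pattern below is a hypothetical auto-linker, not the one Movable Type actually uses: it assumes a URL is simply `http://` or `https://` followed by every non-whitespace character, which is the safe assumption when parentheses are legal inside URLs.

```python
import re

# Hypothetical auto-link pattern (an assumption, not Movable Type's code):
# a URL runs from "http://" or "https://" to the next whitespace character.
URL_PATTERN = re.compile(r"https?://\S+")


def find_urls(text: str) -> list[str]:
    """Return every substring the pattern would turn into a link."""
    return URL_PATTERN.findall(text)


# The closing parenthesis is swallowed when it touches the URL...
print(find_urls("Not this: (http://kynn.com/)"))
# ...and left alone when whitespace separates it from the URL.
print(find_urls("Like this: ( http://kynn.com/ )"))
```

A matcher that tried to guess whether a trailing “)” belongs to the URL would break links that legitimately end in one, so putting the burden on whitespace is the simpler rule.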
“In my undergraduate days, I brought a print out of an assembly program to my instructor to see if he could help out. He glanced at it and said “Of course it isn’t working, it doesn’t have any comments in it.””
Wise professor. More Web developers should use comments.
I try (but often fail) to put a comment after each closing tag, like </div> <!-- class="navbar" -->; this trick alone reduces the time spent debugging layout.
Kynn, hope you don’t mind, I edited one character of your comment so your ampersand advice reads how I’m sure you meant it to read. Ironically, this is not an ideal forum for discussing HTML markup and entities, since I strip HTML tags but don’t auto-encode ampersands.
Anyway, if Newbie Designer comes back and says “I tried to validate and I’m down to one error and I can’t figure out what this error message means”, they usually get a much more receptive audience. Joe asked a reasonable question in a smart way, and he immediately got a helpful answer.
As someone else pointed out, it’s the willful ignorance (“No! I won’t validate! I’ve been coding pages this way since you were in diapers!”) that grates on us.
“More Web developers should use comments.”
I agree, wholeheartedly, but inexperienced developers often use good comments to justify shit code:
“Good code is its own best documentation. As you’re about to add a comment, ask yourself, ‘How can I improve the code so that this comment isn’t needed?’ Improve the code and then document it to make it even clearer.” — Steve McConnell “Code Complete”
Since HTML is a markup language, and not an imperative one (it doesn’t *do* anything), comments are still quite valuable. (Note to self — add comments to my MT templates.)
— Joe Grossberg
“As someone else pointed out, it’s the willful ignorance (‘No! I won’t validate! I’ve been coding pages this way since you were in diapers!’) that grates on us.”
That reminds me of this one time we were playing basketball with an older guy at the gym. He took three steps on layups (an obvious “travelling” call), but insisted it was a legal move because he “always played that way.”
— Joe Grossberg
I agree that the validator’s messages are confusing.
Getting valid is particularly tough when you’re dealing with a dynamic system where you didn’t build the templates.
Even little things like not being sure which version of HTML you’re “closest” to makes this confusing.
Is there a good reference anywhere that maybe ties validator errors to the version history of markup? (“Your document is basically HTML 3.2, but this particular tag includes parameters that were added only with v4.01” would be cool. Hmmm, even an app where you could enter a little chunk of HTML and it would tell you which versions it’s valid against would be pretty cool, and probably lots easier to build.)… Oh LazyWeb!
“Good code is its own best documentation. As you’re about to add a comment, ask yourself, ‘How can I improve the code so that this comment isn’t needed?’ Improve the code and then document it to make it even clearer.” (Steve McConnell, “Code Complete”)
I disagree with this advice. Comments should be used even when they’re not deemed “necessary” — because of course the person writing the comment will be able to see, at the time he’s writing it, what the code does. Just as it’s hard to edit your own work (because you know what you meant to say), it’s also hard to judge whether or not your own code is obvious. So err on the side of commenting, and 3 years down the line you won’t be scratching your head wondering why you ever did things that way. :)
(I also use comments like <!-- ## edit this ## --> to remind me what may need to be updated on a manually maintained page.)
I was emailed yesterday by someone asking for help in getting a web page to validate. I refused assistance when I saw this at the top of the page:
(slightly altered to remove hyperlink)
I cannot believe that anyone uses MS Word to create web pages, much less expect them to validate.
— Simon Jessey
bill, it appears that parts of that code were meant to be XHTML, and other parts were meant to be HTML. briefly:
1. end tag for element “HEAD” which is not open: refers to the fact that your <link> is attempting to close itself, which is bad foo. changing <link … /> to <link …> should fix that.
2. document type does not allow element “BODY” here: this is actually derived from the same problem #1 suffers.
3. document type does not allow element “UL” here; missing one of “APPLET”, “OBJECT”, “MAP”, “IFRAME”, “BUTTON” start-tag: <small><ul> is also bad foo. sounds like a good place to utilize some CSS (see your favorite reference site for some examples); if you really like <small>, make sure you use it inside your <li>s.
4. there is no attribute “WRAP”: doesn’t get much simpler than that… delete ‘WRAP=”virtual”‘; note that this may have side-effect in older browsers.
5. end tag for “DIV” omitted, but its declaration does not permit this: <div> is not like <p>, in that you must close it. adding “</div>” before “</form>” should do the trick.
note: whereas your code would now validate, it qualifies as what mark describes as “tag soup.” to change your code to semantic markup (blah blah blah) is quite an endeavor, but the benefits are many. (case in point: this site.)
— louis bennett
Wow, Louis, thanks loads! I’ll try to handle those during the upgrade I’m working on this week.
“I cannot believe that anyone uses MS Word to create web pages, much less expect them to validate.”
There are lots of people out there who want to make Web pages, who know nothing of this semantics of which we speak, who look at Word and think “oooh, I can make this into a Web page.” I work with many of them. ;)
It is possible to get structural markup out of Word, but it takes some finessing. Since I do a lot of what I call “conversion projects,” I’ve found a method that works w/out driving me crazy.
1) Go through the source document to use as many Word styles as possible. (I’m encouraging people to do this themselves, too.)
2) Use the Word Filter, or whatever they call it, to generate HTML, instead of the usual save as HTML.
3) Run through Dreamweaver’s clean up Word HTML command. Depending on what I’m doing with it, there’s also a few things that I do in HTML-Kit or Dreamweaver’s source code mode if necessary.
4) Delete Word styles and substitute my own.
5) Validate and tweak. (I know my way around the error messages now, but I’d say there’s a market out there for a validator-to-natural-English translator.)
It may sound like a bit of work, but it’s easier than retyping, and sometimes even easier than cut and paste, given the source material. (as an example, I’m rather proud of how our policy manual – http://www.pierce.ctc.edu/policy/ – turned out.)
You’re likely to get some heckling for your table-based layout, but the page is nicely styled, and really pretty minimal.
I think it’s unreasonable for everyone to have to learn an increasingly complex system to publish.
I think yours is a good hybrid approach.
I think it’d be better if there were nicer tools so that such ad-hocery weren’t necessary.
— Jeremy Dunck
one of the reasons why word’s HTML is known to be so poor is that many people’s word docs have no semantic value. quickly enough, WYSIWYG (that’s “what you see is what you get”) becomes WYSIAYG (“what you see is *all* you get”). elaine, this is where your comment, to “use as many Word styles as possible” comes into play: as a word processor, Word may not have a need for semantics, but as it offers a style system, the migration is cleaner. i, too, can vouch for Dreamweaver’s good cleaning of Word HTML. (my only complaint was that i had to give Dreamweaver about 300 megs of memory to clean a 1 meg Word doc’s HTML output (which, as i recall, was about 2 megs of HTML).)
— louis bennett
Too picky about alt attributes? So I guess then you don’t care that people using a screenreader on your page have to put up with ‘image. image. image. image.’ then?
The Office 2000 HTML filter mentioned is here:
but it’s inexcusable to need an extra $300 worth of software to get something decent out of the most ubiquitous word processor on the planet.
— Jonathan Peterson
A very nice piece, but this page itself does not validate. The errors are FAQ-level stuff: failures to encode ampersands and smart quotes correctly.
Flavell’s Law comes to mind, although this page is probably out of its scope.
(I think the CMS should take care of these things, but unfortunately many content management systems do not check the content.)
— Henri Sivonen
Ha! It validated yesterday. MT’s handling of comments is sub-optimal. I fixed one problem yesterday with a trackback from a URL with an ampersand in it; apparently there are other problems with comments as well. Checking…
Fixed for the time being, although the root cause still exists. Looking into solutions now…
What is Flavell’s Law?
In brief, it’s the observation by Alan Flavell that new sites offering HTML tutorials fail miserably when submitted to validation or markup checkers.
I’d imagine this is akin to programming books published with code that won’t compile.
— Joe Grossberg
Yes, also a problem is web accessibility articles on sites with tons of unlabeled spacer GIFs, etc.
I try to practice what I preach. If I say accessibility is important, I try to be as accessible as possible (note this is somewhat subjective at higher levels). If I say validation is important, I try to ensure that every page on my site is and remains valid (not just the home page). If I say CSS is cool, I try to use it in obviously cool ways.
The unencoded ampersand problem in comments is a pickle, though. I need to encode them, but just the ones that aren’t already part of an HTML entity. Perhaps decoding everything with an entity-aware parser, then re-encoding everything. This would also allow people to more easily enter HTML markup examples, since it would auto-encode it for them and display it as-is in the comments. I think.
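The decode-then-re-encode idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the fix that was applied to this site: the function name `normalize_comment` is made up, and it relies on the standard library's `html.unescape` and `html.escape` to play the role of the entity-aware parser.

```python
import html


def normalize_comment(text: str) -> str:
    """Encode markup-special characters exactly once (hypothetical helper)."""
    # Step 1: decode every character reference, so "&amp;" and a bare "&"
    # both end up as a plain "&".
    decoded = html.unescape(text)
    # Step 2: re-encode &, < and > so the text displays as-is.
    # Caveat: html.unescape follows HTML5 rules, so a few legacy names are
    # decoded even without a trailing semicolon (for example "&copy=1").
    return html.escape(decoded, quote=False)


print(normalize_comment("AT&T &amp; <b>"))
```

Because everything is decoded before it is encoded, running the function twice gives the same result as running it once, so already-encoded comments are not double-encoded. The cost is that named entities such as `&eacute;` come out as literal characters rather than as entities.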
extra $300 worth of software? okay, I guess that’s true if you’re (still?!) using Office 98, but the plug-in is free if you’re already on Office 2000.
Jeremy, thank you for the kind words. (I debated whether to make the table of contents into a table, but it actually seemed semantically table-like to me, so I took the easier road.) I’d love to see something that made such “ad-hocery” less necessary too.
stupid ampersands. ;)
“but it’s inexcusable to need an extra $300 worth of software to get something decent out of the most ubiquitous word processor on the planet.”
1. The filter download is free, to the best of my knowledge.
2. Word is still a word processor, not an html editor. I can get lots of decent somethings out of it, without extra add-ins of any sort. Those somethings are mostly .doc files, but that’s to be expected when I use a program for producing .doc files.
As for those ampersands, perl can do “negative look-ahead assertions” in regexps, where it won’t match a pattern if it is followed by another pattern. Like: don’t match an ampersand if it’s followed by amp; or #109;. I do that in two lines of a self-rolled weblog post processor, see just under the SPECIAL CHARACTERS:
You could look into that.
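For readers who have not met negative look-ahead before, here is a rough equivalent in Python rather than Perl. It is a sketch, not the two lines from the post processor mentioned above (those are not reproduced here); the names `BARE_AMPERSAND` and `encode_bare_ampersands` are invented for the example.

```python
import re

# An ampersand that is NOT followed by something shaped like a character
# reference: a name (amp;), a decimal number (#109;), or a hex number (#x6D;).
# The (?! ... ) group is the negative look-ahead: it checks, but does not consume.
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)"
)


def encode_bare_ampersands(text: str) -> str:
    """Replace only the ampersands that are not already part of an entity."""
    return BARE_AMPERSAND.sub("&amp;", text)


print(encode_bare_ampersands("a & b &amp; c &#109; http://x/?a=1&b=2"))
```

The pattern only checks the shape of what follows the ampersand, so something like `&bogus;` is left alone even though it is not a real entity; a validator would still complain about that one.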
I think there needs to be some clarification on this. It’s not what you say, it’s how you say it!
First, anyone who posts “My web page doesn’t work in X browser what’s wrong?” should not get replied to with “Go validate your markup.” They should get “Please check the FAQ at this link,” or “Please see this link on Mark Pilgrim’s page, it has a good explanation on how and why you should check to make sure your markup is valid. Once you check your markup please give us some details on your specific problem. Thanks!”
On one level, you want to tell off people who are too lazy to read the manual. On the other hand, if you take the high road and gently guide them to the manual and see if they try it out, you’ll find they will probably more often than not actually read it. For those that don’t read the book, they may scream and shout and stamp their feet, but they’ll have no room to complain because you gave them the answer.
That’s my only comment, because while the spirit of this explanation is great, the tone of this article says “why aren’t we helping you” when that should not even be a consideration. It should come right out and say: we aren’t ignoring you, we are helping you! Here is why we are helping you.
From helping people 101 ;)
You can also use HTML Tidy to clean up Word generated (and other bloatedly formed) HTML – http://www.w3.org/People/Raggett/tidy/
I guess I’m picky, but I don’t like how Tidy (at least as it’s built into HTML-Kit, which I otherwise love) re-indents the code.
(maybe I *should* learn Python; I hear it’s all about the whitespace….)
I really like the irony of this post.
Another bit of sly Pilgrim humour, eh?
— Jacques Distler
a good related resource is Steve Champeon’s article on how to RTFM: http://hotwired.lycos.com/webmonkey/00/08/index2a.html?tw=commentary