Hi! I'm Ryan Grove: Sorcerer at SmugMug, lover of movies, eater of pie, connoisseur of awesome.

Posts tagged with “programming”

Displaying items 1 - 10 of 78


GitHub has always exposed raw source files in public repos via the raw.github.com domain, but — presumably for security reasons — they serve most files with a text/plain content type. This prevents all browsers from rendering HTML, and prevents some browsers from executing JavaScript and CSS.

I‘ve always thought this was unfortunate. Being able to point directly to raw files in a GitHub repo can be useful for testing and for quick demos. I especially want it when I’m writing jsPerf tests and want to compare the performance of code in different branches.

So I threw together rawgithub.com.

It's essentially a proxy for raw.github.com that serves things with the correct content type based on the file extension. Just replace raw.github.com in any GitHub URL with rawgithub.com and you can render HTML, execute JavaScript, and apply CSS right from your GitHub repos.

Have fun with it, but don't use it for anything important! It's slow and it might break occasionally.

Oh, and the source is on GitHub, naturally.

Extending JavaScript natives

The argument about whether or not it’s okay to extend JavaScript natives tends to focus too much on whether you should or shouldn’t, instead of on when you should and shouldn’t, which would be a much more useful discussion.

The ability to extend natives is a powerful feature of the language, and it’s one that can be used to do fantastic things. When used in the wrong places or for the wrong reasons, it can seriously fuck shit up and make developers cry.

Generally speaking, there are two kinds of JavaScript code that you run on any website you build: there’s the code you (or your team) write yourself, and there’s the code that comes from libraries someone else wrote.

Extending natives in your own code is a bit like decorating your living room. It makes perfect sense. You live there, you have opinions about how it should look and where things should go, and you’re the one who either benefits or suffers as a result of your choices, so you should feel free to go nuts.

Extending natives in library code that will be used on someone else’s pages is a bit like being invited into someone else’s house and — without asking permission — repainting the walls, moving furniture around, adding new furniture, breaking expensive vases, and just generally being an asshole.

You could argue that the person who invited you into their home knew you were an asshole before they invited you, so they knew what they were getting themselves into, but that would just make you a victim-blaming asshole, which is even worse than a normal asshole. So don’t argue that.

Sure, some libraries are so well-known for extending natives that anyone who uses them can be said to be opting into those risks. But later, when the developer adds another library to the page to help them build some new functionality and the second library expects natives to behave like real natives when they actually behave however the first library thinks they should behave, it’s the developer who suffers.

And that developer will inevitably file a bug against the second library, because things broke when they started using it. And then that library’s authors suffer.

And then the authors of the second library end up writing code like this to protect users from brokenness caused by the first library, and the fiery heat death of the universe draws just a little bit closer.

All because some asshole thought repainting someone else’s living room was a good idea.

Connecting the dots

Designing a flexible framework is tricky. Most framework users are blinded by choice and would rather just connect the dots. And boy do they hate it if the dots don’t connect up to create exactly the picture they want.

Framework designers need to understand this and at least provide clearly defined guideposts for the users who want to connect the dots, while also making it possible for more adventurous users to blaze their own trails without too much effort.

Framework users should strive to see beyond their immediate goals and look at the ways the framework can make their work easier, not just the ways it can do their work for them.

Simple makefile to minify CSS and JS

I recently needed a quick and easy way to minify CSS and JS for the new YUI Library website (launching soon!). In the past I’ve written powerful and complicated tools for doing static asset management and minification, but this time I wanted something simple.

A good old-fashioned makefile turned out to be the perfect tool for the job. Here’s what I came up with. Feel free to use it in your own projects. This version requires the YUI Compressor, but that can easily be replaced with Closure Compiler, Uglify, or any other tool of your choice.

# Patterns matching CSS files that should be minified. Files with a -min.css
# suffix will be ignored.
CSS_FILES = $(filter-out %-min.css,$(wildcard \
	public/css/*.css \
	public/css/**/*.css \

# Patterns matching JS files that should be minified. Files with a -min.js
# suffix will be ignored.
JS_FILES = $(filter-out %-min.js,$(wildcard \
	public/js/*.js \
	public/js/**/*.js \

# Command to run to execute the YUI Compressor.
YUI_COMPRESSOR = java -jar yuicompressor-2.4.6.jar

# Flags to pass to the YUI Compressor for both CSS and JS.
YUI_COMPRESSOR_FLAGS = --charset utf-8 --verbose

CSS_MINIFIED = $(CSS_FILES:.css=-min.css)
JS_MINIFIED = $(JS_FILES:.js=-min.js)

# target: minify - Minifies CSS and JS.
minify: minify-css minify-js

# target: minify-css - Minifies CSS.
minify-css: $(CSS_FILES) $(CSS_MINIFIED)

# target: minify-js - Minifies JS.
minify-js: $(JS_FILES) $(JS_MINIFIED)

%-min.css: %.css
	@echo '==> Minifying $<'

%-min.js: %.js
	@echo '==> Minifying $<'

# target: clean - Removes minified CSS and JS files.

# target: help - Displays help.
	@egrep "^# target:" Makefile

To use this, save it as a makefile, customize it as necessary, and then run make minify to minify your .js and .css files. Minified files will be saved with a -min suffix alongside the originals. Only files that have changed since the last time you minified them will be processed.

This file is also available as a gist if you’d like to fork it and improve it. Enjoy!

Steve Jobs on programmer productivity

Steve Jobs discussing programmer productivity in his keynote Q&A from the 1997 WWDC:

The way you get programmer productivity is not by increasing the lines of code per programmer per day. That doesn’t work. The way you get programmer productivity is by eliminating lines of code you have to write.

The line of code that’s the fastest to write, that never breaks, that doesn’t need maintenance, is the line you never had to write.

This isn’t the only gem from that keynote. The entire thing is fantastic and worth watching, if only to see in hindsight what an amazing product visionary Steve is.

In this unrehearsed Q&A session (not a prepared presentation), he lays out many of the ideas that will chart Apple’s course for the next 15 years. Mac OS X, iLife, Xcode, MobileMe, iCloud, and even the iPhone and iPad—the seeds of all of these ideas were clearly already present in Steve’s mind.


There's more to HTML escaping than &, <, >, and "

A few days ago I tweeted:

If I had a dollar for every HTML escaper that only escapes &, <, >, and ", I'd have $0. Because my account would've been pwned via XSS."

This was exaggeration for effect—there aren’t many cases where a simple XSS injection could actually empty a bank account—but I wanted to make a point.

By some coincidence, I’ve found myself working with various open source projects recently that take a half-assed approach to HTML escaping. It’s something that tends to be implemented as an afterthought, which is unfortunate because it can be critical for the security of users of these projects. I won’t name any names in this post (pull requests are forthcoming), but I will explain some of the common problems I’ve seen, why they’re problems, and what can be done to fix them.

This post is not an introduction to HTML escaping. It assumes that you already know what HTML escaping is and why it’s necessary. This post also is not a comprehensive catalog of XSS vectors; the examples here are illustrative, but they certainly aren’t the only attacks you need to worry about. The intent of this post is to explain some dangers that you may not be aware of, and to encourage you to read more about them and write safer code.

Note that this post only discusses escaping, which is something entirely different (and far less complicated) than sanitizing. HTML sanitization is a topic for another time.

Escaping < and > isn’t enough

The worst HTML escaper I’ve seen in a major open source project only escapes the < and > characters. This may actually be worse than not escaping anything at all, since it gives the illusion of security, but is trivial to defeat.

For example, let’s say I have the following template, and I’m going to replace the placeholder values, indicated in [square brackets], with HTML-escaped user input:

<a href="/user/[username]">[username]</a>

The attacker enters foo" onmouseover="alert(1) as their username. End result, even after escaping:

<a href="/user/foo" onmouseover="alert(1)">foo" onmouseover="alert(1)</a>

Because the " character wasn’t escaped and the attacker’s input was used in an attribute value, the attacker was able to inject arbitrary attributes and therefore JavaScript (which, in a real XSS attack, would probably be something more harmful than an alert).

This is a classic example of making input safer in one context—in this case, as the content of an <a> element—without considering the other contexts in which it’s likely to be used, such as inside an attribute value.

Escaping &, <, >, and " isn’t enough

The characters &, <, >, and " are the ones most commonly targeted by HTML escaper implementations. This seems to be the minimum set of characters that people think need to be escaped. Unfortunately, it’s still not safe if you don’t have complete control over where the escaped values will be used.

Consider the following template, in which the template author has used single-quoted attribute values:

<a href='/user/[username]'>[username]</a>

This is exploitable using the same attack as the previous example, but with single quotes instead of double quotes: foo' onmouseover='alert(1):

<a href='/user/foo' onmouseover='alert(1)'>foo' onmouseover='alert(1)</a>

You may be saying, “But I always use double quotes to quote attribute values!” Are you also the only person who will ever use your HTML escaper? And are you immune to typos?

Escaping &, <, >, ", and ' isn't enough

This is the character set used by PHP’s ubiquitous htmlspecialchars function, and as you may have guessed, it still falls down on attribute values for two reasons.

First, as Hacker News users DanBlake and nbpoole pointed out in a discussion of this blog post, Internet Explorer treats ` as an attribute delimiter. It may be an edge case, but it’s still a potential attack vector, so ` needs to be escaped too.

Second, HTML also allows attribute values to be completely unquoted. Believe it or not, unquoted attribute values are fairly popular (some people are too lazy to quote them, others are performance zealots who can’t bear the thought of wasting those extra bytes).

Unquoted attribute values are one of the single biggest XSS vectors there is. If you don’t quote your attribute values, you’re essentially leaving the door wide open for naughty people to inject naughty things into your HTML. Very few escaper implementations cover all the edge cases necessary to prevent unquoted attribute values from becoming XSS vectors.

Escaping &, <, >, ", ', `, , !, @, $, %, (, ), =, +, {, }, [, and ] is almost enough

All those characters up there (including the space character!) can be used to break out of an unquoted HTML attribute value. If you escape every last one of them, then you’re probably pretty close to being safe. But you’re still not so safe that you can just start throwing around user input willy nilly.

Why? Because this still doesn’t cover some context-specific cases like inserting user input into the body of an inline <script> element or using user input as part of a URL.

Context is key

If you haven’t figured it out already, the primary message I’m trying to convey here is that you must be aware of the context in which you’re working with user input. Some contexts are more susceptible to attack than others, and there’s no single magic escaping bullet that will protect you or your users in all cases.

In other words, you don’t need to escape everything all the time, but you do need to escape everything that’s important in the particular contexts in which you’re displaying user input.

But there’s still one more wrench to throw into the works…

Always specify a charset, or UTF-7 will eat your face

Even if you do everything else right, serving a page that doesn’t explicitly specify a character set can leave Internet Explorer users open to XSS, thanks to the way IE sniffs out the charset when it isn’t specified.

If an attacker is able to get your page to echo back something that looks like UTF-7 encoding early enough in the page, he may be able to trick IE into rendering the page using UTF-7. This could turn the following seemingly harmless input…


…into something potentially harmful:


I recommend specifying a UTF-8 charset in both the Content-Type HTTP response header and a <meta> tag, since it’s easy for one or the other to get switched off or omitted inadvertently as a codebase ages (this has happened to me).

Further reading

As I mentioned in the disclaimer at the top of this post, this is not a comprehensive reference of all the things that can go wrong with HTML escaping. It’s not even a guide. It’s more of a tip-of-the-iceberg preview. Please don’t assume that, having read this post, you now know everything there is to know about HTML escaping. I can guarantee that you don’t, because I don’t.

I learned a lot from the following sources, and I highly recommend them if you’re interested in learning more:

Sanitize 2.0.0 released

Version 2.0.0 of Sanitize, my whitelist-based HTML filtering library for Ruby, is now available. This release includes several new features and some changes to existing features. I’ll cover the big stuff in this blog post; for the complete list of changes, see the HISTORY.md file.


To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Sanitize is fully compatible with Ruby 1.8.7, 1.9.1 and 1.9.2.


The most significant change in this release is that Sanitize’s core filtering logic is now implemented entirely as a set of always-on transformers. This simplifies the core code and means that Sanitize itself is now built on the same powerful transformer architecture that you can use in your own apps to enhance or alter Sanitize’s functionality.

The environment object provided as input to transformers now contains a slightly different set of data, and transformer output has been simplified. Transformers are no longer required to return anything, and are expected to make any desired alterations directly to the current node and/or document.

Sanitize now has the ability to traverse the document and execute transformers using either depth-first traversal (the default behavior, same as before) or breadth-first traversal (new in 2.0.0). If necessary, you can even run one set of transformers using one traversal method and another using the other method. This allows for greater flexibility and less complexity when writing certain types of transformers.

The README has more details on these changes and new features.

Other notable changes

  • Sanitize now outputs HTML4/HTML5 markup by default instead of XHTML (e.g., <img src="foo.jpg"> instead of <img src="foo.jpg" />, etc.). If you prefer the old behavior, you can set the :output config to :xhtml.
  • Some new elements and attributes (including several HTML5 elements) have been added to the built-in basic and relaxed whitelists. See HISTORY.md for the complete list.
  • Elements like <br>, <p>, and others are now replaced with whitespace when they’re removed in order to preserve the readability of the remaining text content. The list of elements that will be replaced with whitespace when removed is configurable using the :whitespace_elements setting.

Be aware that if you expect specific output from Sanitize in your unit tests, you may need to update your tests. The HTML output from this release may not precisely match the output from previous releases.

Try it out, report bugs

As always, you can try out Sanitize’s built-in filters using the test page at sanitize.pieisgood.org. Please use Sanitize’s GitHub issue tracker to report bugs and file feature requests.

Introducing YUI 3 AutoComplete

In November I gave a talk at YUIConf 2010 introducing the new AutoComplete widget I wrote for YUI 3.3.0, which we’ll be releasing soon. The video of the talk is now up on YUI Theater for your viewing pleasure. The slides are available on SlideShare.

If you didn’t attend YUIConf and haven’t yet had a chance to check out the videos, you should. The conference was packed with excellent talks this year, including quite a few about topics not directly related to YUI, so there’s sure to be something there for you even if you’re not a YUI user.

Ruby script to retrieve and display Comcast data usage

Update (2011-04-03): Comcast’s user account pages now appear to require JavaScript, which makes it impossible to scrape the usage data using a simple script. As a result, this script no longer works.

Comcast has often advertised their high speed Internet service as providing “unlimited” data transfer, but when they say “unlimited”, what they really mean is “limited to 250GB a month”.

Just before the new year, Comcast finally rolled out a data usage meter to users in the Portland, Oregon area so we can actually tell when we’re in danger of exceeding that 250GB ceiling. I find this usage meter incredibly helpful in achieving my goal of using as much of my monthly 250GB data allotment as I possibly can. I feel it’s my duty to get my full money’s worth.

Unfortunately, the meter is buried several pages deep in Comcast’s account site, which is a slow and ugly beast that requires a login, several redirects, and a click or two. So I whipped up a little Ruby script to do the dirty work for me and just print out my current usage total.

Before using the script, you’ll need to install the Mechanize gem:

gem install mechanize

Here’s the script:

#!/usr/bin/env ruby

require 'rubygems'
require 'mechanize'

URL_LOGIN = 'https://login.comcast.net/login?continue=https://login.comcast.net/account'
URL_USERS = 'https://customer.comcast.com/Secure/Users.aspx'

abort "Usage: #{$0} <username> <password>" unless ARGV.length == 2

agent = Mechanize.new

agent.follow_meta_refresh = true
agent.redirect_ok = true
agent.user_agent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6'

login_page = agent.get(URL_LOGIN)

login_form = login_page.form_with(:name => 'signin')
login_form.user = ARGV[0]
login_form.passwd = ARGV[1]

redirect_page = agent.submit(login_form)
redirect_form = redirect_page.form_with(:name => 'redir')

abort 'Error: Login failed' unless redirect_form

account_page = agent.submit(redirect_form, redirect_form.buttons.first)

users_page = agent.get(URL_USERS)
usage_text = users_page.search("div[@class='usage-graph-legend']").first.content

puts usage_text.strip

Save it to an executable file (I called it capmon.rb), then run it like so, passing in your Comcast.net username and password (they’ll be sent securely over HTTPS):

./capmon.rb myusername mypass

The script will log into your Comcast account, go through all those painful redirects and clicks, and eventually spit out your usage stats, which will look something like this:

166GB of 250GB

Couldn’t be simpler! Naturally, this script won’t work for you unless you’re a Comcast customer in a region where the usage meter is currently available. Also, the script will break if Comcast changes their login flow or page structure, but I’ll try to keep this post updated if that happens.

This script is available as a GitHub gist as well. If you’d like to modify it and make it better, please fork the gist.

Read about Storage Lite on YUIBlog

I’ve written an article on YUIBlog introducing my new Storage Lite YUI 3 Gallery module, which provides a lightweight, cross-browser client-side storage API without relying on any browser plugins. This is an updated version of the storage library I mentioned last year in my post about the development of Yahoo! Search Pad.

Head over to YUIBlog to read the article, and feel free to fork Storage Lite on GitHub and add awesomeness (or subtract crappiness).