wonko.com

Hi! I'm Ryan Grove: Sorcerer at SmugMug, lover of movies, eater of pie, connoisseur of awesome.

Posts tagged with “code”


Simple makefile to minify CSS and JS

I recently needed a quick and easy way to minify CSS and JS for the new YUI Library website (launching soon!). In the past I’ve written powerful and complicated tools for doing static asset management and minification, but this time I wanted something simple.

A good old-fashioned makefile turned out to be the perfect tool for the job. Here’s what I came up with. Feel free to use it in your own projects. This version requires the YUI Compressor, but that can easily be replaced with Closure Compiler, Uglify, or any other tool of your choice.

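# Patterns matching CSS files that should be minified. Files with a -min.css
# suffix will be ignored. Note that make's wildcard function doesn't treat **
# as recursive, so the second pattern only matches files exactly one directory
# below public/css. The same applies to the JS patterns further down.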
CSS_FILES = $(filter-out %-min.css,$(wildcard \
	public/css/*.css \
	public/css/**/*.css \
))

# Patterns matching JS files that should be minified. Files with a -min.js
# suffix will be ignored.
JS_FILES = $(filter-out %-min.js,$(wildcard \
	public/js/*.js \
	public/js/**/*.js \
))

# Command to run to execute the YUI Compressor.
YUI_COMPRESSOR = java -jar yuicompressor-2.4.6.jar

# Flags to pass to the YUI Compressor for both CSS and JS.
YUI_COMPRESSOR_FLAGS = --charset utf-8 --verbose

CSS_MINIFIED = $(CSS_FILES:.css=-min.css)
JS_MINIFIED = $(JS_FILES:.js=-min.js)

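# Mark command-style targets as phony so a stray file with the same name
# can't prevent them from running.
.PHONY: minify minify-css minify-js clean help

# target: minify - Minifies CSS and JS.
minify: minify-css minify-js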

# target: minify-css - Minifies CSS.
minify-css: $(CSS_FILES) $(CSS_MINIFIED)

# target: minify-js - Minifies JS.
minify-js: $(JS_FILES) $(JS_MINIFIED)

%-min.css: %.css
	@echo '==> Minifying $<'
	$(YUI_COMPRESSOR) $(YUI_COMPRESSOR_FLAGS) --type css $< >$@
	@echo

%-min.js: %.js
	@echo '==> Minifying $<'
	$(YUI_COMPRESSOR) $(YUI_COMPRESSOR_FLAGS) --type js $< >$@
	@echo

# target: clean - Removes minified CSS and JS files.
clean:
	rm -f $(CSS_MINIFIED) $(JS_MINIFIED)

# target: help - Displays help.
help:
	@grep -E "^# target:" $(firstword $(MAKEFILE_LIST))

To use this, save it as a makefile, customize it as necessary, and then run make minify to minify your .js and .css files. Minified files will be saved with a -min suffix alongside the originals. Only files that have changed since the last time you minified them will be processed.
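
The "only files that have changed" behavior comes from make comparing file modification times. As a rough illustration (not something the makefile needs), here's the same check sketched in Ruby. The helper names minified_path and stale? are made up for this example, and the check is a simplification of what make actually does.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: maps "foo.css" to "foo-min.css", like the
# $(CSS_FILES:.css=-min.css) substitution in the makefile.
def minified_path(source)
  ext = File.extname(source)
  source.sub(/#{Regexp.escape(ext)}\z/, "-min#{ext}")
end

# Hypothetical helper: true when the minified file is missing or older than
# its source, which is roughly the check make performs for each rule.
def stale?(source)
  target = minified_path(source)
  !File.exist?(target) || File.mtime(target) < File.mtime(source)
end

# Small demo in a throwaway directory.
Dir.mktmpdir do |dir|
  source = File.join(dir, 'site.css')
  File.write(source, 'body { margin: 0; }')
  puts stale?(source)   # no site-min.css exists yet

  FileUtils.touch(minified_path(source))
  puts stale?(source)   # the target now exists and is not older than the source
end
```

Editing the source again would make its timestamp newer than the minified file, and the next make minify would rebuild just that one file.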

This file is also available as a gist if you’d like to fork it and improve it. Enjoy!

Sanitize 2.0.0 released

Version 2.0.0 of Sanitize, my whitelist-based HTML filtering library for Ruby, is now available. This release includes several new features and some changes to existing features. I’ll cover the big stuff in this blog post; for the complete list of changes, see the HISTORY.md file.

Installing

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Sanitize is fully compatible with Ruby 1.8.7, 1.9.1 and 1.9.2.

Transformers

The most significant change in this release is that Sanitize’s core filtering logic is now implemented entirely as a set of always-on transformers. This simplifies the core code and means that Sanitize itself is now built on the same powerful transformer architecture that you can use in your own apps to enhance or alter Sanitize’s functionality.

The environment object provided as input to transformers now contains a slightly different set of data, and transformer output has been simplified. Transformers are no longer required to return anything, and are expected to make any desired alterations directly to the current node and/or document.

Sanitize now has the ability to traverse the document and execute transformers using either depth-first traversal (the default behavior, same as before) or breadth-first traversal (new in 2.0.0). If necessary, you can even run one set of transformers using one traversal method and another using the other method. This allows for greater flexibility and less complexity when writing certain types of transformers.

The README has more details on these changes and new features.
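
To make the difference between the two traversal orders concrete, here's a toy model in plain Ruby. It doesn't use Sanitize or Nokogiri at all: the Node struct, the node helper, and both traverse methods are invented for this sketch, and "depth-first" is assumed to mean that the deepest nodes are visited before their parents. The transformers are just lambdas that receive an environment Hash and don't need to return anything.

```ruby
# Toy tree node; a stand-in for a parsed HTML node, not a Nokogiri object.
Node = Struct.new(:name, :children)

# Hypothetical convenience constructor for building a small tree.
def node(name, *children)
  Node.new(name, children)
end

# Depth-first: visit children before their parent (deepest nodes first).
def traverse_depth(current, transformers)
  current.children.each { |child| traverse_depth(child, transformers) }
  transformers.each { |t| t.call(:node => current) }
end

# Breadth-first: visit nodes level by level, starting from the top.
def traverse_breadth(root, transformers)
  queue = [root]
  until queue.empty?
    current = queue.shift
    transformers.each { |t| t.call(:node => current) }
    queue.concat(current.children)
  end
end

tree = node('div', node('p', node('b')), node('ul', node('li')))

# A transformer that only records what it sees; its return value is ignored.
visited  = []
recorder = lambda { |env| visited << env[:node].name }

traverse_depth(tree, [recorder])
depth_order = visited.dup

visited.clear
traverse_breadth(tree, [recorder])
breadth_order = visited.dup

puts depth_order.join(' ')
puts breadth_order.join(' ')
```

A transformer that needs to see a parent before its children (for example, to make a decision about a whole subtree) is easier to write against the breadth-first order, while one that works from the inside out fits the depth-first order.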

Other notable changes

  • Sanitize now outputs HTML4/HTML5 markup by default instead of XHTML (e.g., <img src="foo.jpg"> instead of <img src="foo.jpg" />, etc.). If you prefer the old behavior, you can set the :output config to :xhtml.
  • Some new elements and attributes (including several HTML5 elements) have been added to the built-in basic and relaxed whitelists. See HISTORY.md for the complete list.
  • Elements like <br>, <p>, and others are now replaced with whitespace when they’re removed in order to preserve the readability of the remaining text content. The list of elements that will be replaced with whitespace when removed is configurable using the :whitespace_elements setting.

Be aware that if you expect specific output from Sanitize in your unit tests, you may need to update your tests. The HTML output from this release may not precisely match the output from previous releases.

Try it out, report bugs

As always, you can try out Sanitize’s built-in filters using the test page at sanitize.pieisgood.org. Please use Sanitize’s GitHub issue tracker to report bugs and file feature requests.

Ruby script to retrieve and display Comcast data usage

Update (2011-04-03): Comcast’s user account pages now appear to require JavaScript, which makes it impossible to scrape the usage data using a simple script. As a result, this script no longer works.

Comcast has often advertised their high speed Internet service as providing “unlimited” data transfer, but when they say “unlimited”, what they really mean is “limited to 250GB a month”.

Just before the new year, Comcast finally rolled out a data usage meter to users in the Portland, Oregon area so we can actually tell when we’re in danger of exceeding that 250GB ceiling. I find this usage meter incredibly helpful in achieving my goal of using as much of my monthly 250GB data allotment as I possibly can. I feel it’s my duty to get my full money’s worth.

Unfortunately, the meter is buried several pages deep in Comcast’s account site, which is a slow and ugly beast that requires a login, several redirects, and a click or two. So I whipped up a little Ruby script to do the dirty work for me and just print out my current usage total.

Before using the script, you’ll need to install the Mechanize gem:

gem install mechanize

Here’s the script:

#!/usr/bin/env ruby

require 'rubygems'
require 'mechanize'

URL_LOGIN = 'https://login.comcast.net/login?continue=https://login.comcast.net/account'
URL_USERS = 'https://customer.comcast.com/Secure/Users.aspx'

abort "Usage: #{$0} <username> <password>" unless ARGV.length == 2

agent = Mechanize.new

agent.follow_meta_refresh = true
agent.redirect_ok = true
agent.user_agent = 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6'

login_page = agent.get(URL_LOGIN)

login_form = login_page.form_with(:name => 'signin')
login_form.user = ARGV[0]
login_form.passwd = ARGV[1]

redirect_page = agent.submit(login_form)
redirect_form = redirect_page.form_with(:name => 'redir')

abort 'Error: Login failed' unless redirect_form

account_page = agent.submit(redirect_form, redirect_form.buttons.first)

users_page = agent.get(URL_USERS)
usage_text = users_page.search("div[@class='usage-graph-legend']").first.content

puts usage_text.strip

Save it to an executable file (I called it capmon.rb), then run it like so, passing in your Comcast.net username and password (they’ll be sent securely over HTTPS):

./capmon.rb myusername mypass

The script will log into your Comcast account, go through all those painful redirects and clicks, and eventually spit out your usage stats, which will look something like this:

166GB of 250GB

Couldn’t be simpler! Naturally, this script won’t work for you unless you’re a Comcast customer in a region where the usage meter is currently available. Also, the script will break if Comcast changes their login flow or page structure, but I’ll try to keep this post updated if that happens.

This script is available as a GitHub gist as well. If you’d like to modify it and make it better, please fork the gist.

Monkeypatch to fix Ruby Net::IMAP + Dovecot response parsing bug

The Net::IMAP standard library distributed with Ruby 1.8.6, 1.8.7, and 1.9.1 contains a response parsing bug that can cause an endless hang (in 1.8.x) or raise an exception (in 1.9.1) when switching between mailboxes on a Dovecot 1.2.x server.

The bug has been fixed in Ruby’s SVN trunk and should eventually make it into the 1.9.2 release, but if you’re using Net::IMAP with a current or older Ruby release and need a fix for this, the following monkeypatch (which just replaces the old buggy method with the fixed one from SVN) should do the trick.

Fortunately, this fix is the only difference from the 1.8.6, 1.8.7, and 1.9.1 versions of this method, so the monkeypatch works for all three versions. Just add it to your own code at some point after requiring Net::IMAP.

if RUBY_VERSION <= '1.9.1'
  module Net # :nodoc:
    class IMAP # :nodoc:
      class ResponseParser # :nodoc:
        private

        # This monkeypatched method is the one included in Ruby SVN trunk as
        # of 2010-02-08.
        def resp_text_code
          @lex_state = EXPR_BEG
          match(T_LBRA)
          token = match(T_ATOM)
          name = token.value.upcase
          case name
          when /\A(?:ALERT|PARSE|READ-ONLY|READ-WRITE|TRYCREATE|NOMODSEQ)\z/n
            result = ResponseCode.new(name, nil)
          when /\A(?:PERMANENTFLAGS)\z/n
            match(T_SPACE)
            result = ResponseCode.new(name, flag_list)
          when /\A(?:UIDVALIDITY|UIDNEXT|UNSEEN)\z/n
            match(T_SPACE)
            result = ResponseCode.new(name, number)
          else
            token = lookahead
            if token.symbol == T_SPACE
              shift_token
              @lex_state = EXPR_CTEXT
              token = match(T_TEXT)
              @lex_state = EXPR_BEG
              result = ResponseCode.new(name, token.value)
            else
              result = ResponseCode.new(name, nil)
            end
          end
          match(T_RBRA)
          @lex_state = EXPR_RTEXT
          return result
        end
      end
    end
  end
end
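
One note on the version guard: RUBY_VERSION <= '1.9.1' compares strings character by character, which is fine for the releases mentioned here. If you'd rather compare versions numerically, a sketch along these lines should work, assuming RubyGems is loaded. The needs_imap_patch? helper is a name made up for this example.

```ruby
require 'rubygems'

# Hypothetical helper, not part of the monkeypatch: true when the given Ruby
# version is at or below the last release this post describes as affected.
def needs_imap_patch?(ruby_version = RUBY_VERSION)
  Gem::Version.new(ruby_version) <= Gem::Version.new('1.9.1')
end

puts needs_imap_patch?('1.8.7')
puts needs_imap_patch?('1.9.2')
```

You could then wrap the patch in "if needs_imap_patch?" instead of the string comparison.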

If you’re a Larch user, the latest Larch development gem includes this fix.

Sanitize 1.2.0 released

Version 1.2.0 of Sanitize, my whitelist-based HTML sanitizing library for Ruby, is now available. Consult the HISTORY file for a complete list of changes.

Introducing Transformers

This release adds a major new feature called transformers. Transformers allow you to filter and alter HTML nodes using your own custom logic, on top of (or instead of) Sanitize’s core filter. A transformer is any Ruby object that responds to call() (such as a lambda or proc) and returns either nil or a Hash containing certain optional response values.

To use one or more transformers, pass them to the :transformers config setting:

Sanitize.clean(html, :transformers => [transformer_one, transformer_two])

Input

Each registered transformer’s call() method will be called once for each element node in the HTML, and will receive as an argument an environment Hash that contains Sanitize config information and a reference to a Nokogiri::XML::Node object.

The transformer has full access to the Nokogiri::XML::Node that’s passed into it and to the rest of the document via the node’s document() method. Any changes will be reflected instantly in the document and passed on to subsequently-called transformers and to Sanitize itself. A transformer may even call Sanitize internally to perform custom sanitization if needed.

Transformers have a tremendous amount of power, including the power to completely bypass Sanitize’s built-in filtering.

Output

A transformer may return either nil or a Hash. A return value of nil indicates that the transformer does not wish to act on the current node in any way. A returned Hash may contain instructions that tell Sanitize to whitelist certain attributes or nodes, or to replace the current node with a new node (see the README for specifics).

Example: Transformer to whitelist YouTube video embeds

The following example demonstrates how to create a Sanitize transformer that will safely whitelist valid YouTube video embeds without having to blindly allow other kinds of embedded content, which would be the case if you tried to do this by just whitelisting all <object>, <embed>, and <param> elements:

lambda do |env|
  node      = env[:node]
  node_name = node.name.to_s.downcase
  parent    = node.parent

  # Since the transformer receives the deepest nodes first, we look for a
  # <param> element or an <embed> element whose parent is an <object>.
  return nil unless (node_name == 'param' || node_name == 'embed') &&
      parent.name.to_s.downcase == 'object'

  if node_name == 'param'
    # Quick XPath search to find the <param> node that contains the video URL.
    return nil unless movie_node = parent.search('param[@name="movie"]')[0]
    url = movie_node['value']
  else
    # Since this is an <embed>, the video URL is in the "src" attribute. No
    # extra work needed.
    url = node['src']
  end

  # Verify that the video URL is actually a valid YouTube video URL. \A anchors
  # at the start of the string; ^ would also match after an embedded newline.
  return nil unless url =~ /\Ahttp:\/\/(?:www\.)?youtube\.com\/v\//

  # We're now certain that this is a YouTube embed, but we still need to run
  # it through a special Sanitize step to ensure that no unwanted elements or
  # attributes that don't belong in a YouTube embed can sneak in.
  Sanitize.clean_node!(parent, {
    :elements   => ['embed', 'object', 'param'],
    :attributes => {
      'embed'  => ['allowfullscreen', 'allowscriptaccess', 'height', 'src', 'type', 'width'],
      'object' => ['height', 'width'],
      'param'  => ['name', 'value']
    }
  })

  # Now that we're sure that this is a valid YouTube embed and that there are
  # no unwanted elements or attributes hidden inside it, we can tell Sanitize
  # to whitelist the current node (<param> or <embed>) and its parent
  # (<object>).
  {:whitelist_nodes => [node, parent]}
end

For more details on transformers, consult the README.

Installing

To install or upgrade Sanitize via RubyGems, run:

gem install sanitize

Pretty JSLint output for TextMate

My coworker Stoyan Stefanov wrote a helpful blog post a few weeks ago describing how to create a simple TextMate bundle that allows you to quickly run the current file through JSLint. I’ve extended Stoyan’s bundle command to prettify the JSLint output for display in an HTML window.

Here’s what the output looks like:

Screenshot of TextMate JSLint bundle output

To use this command, just follow the instructions in Stoyan’s blog post using the script below in place of his bundle command, then select “Show as HTML” from the Output dropdown below the command edit box.

#!/usr/bin/env ruby
require 'cgi'

lint = `java org.mozilla.javascript.tools.shell.Main ~/Library/JSLint/jslint.js "$TM_FILEPATH"`

lint.gsub!(/^(Lint at line )(\d+)(.+?:)(.+?)\n(?:(.+?)\n\n)?/m) do
  "<p><strong>#{CGI.escapeHTML($1)}<a href=\"txmt://open?url=file://TM_FILEPATH&line=#{CGI.escapeHTML($2)}\">#{CGI.escapeHTML($2)}</a>#{CGI.escapeHTML($3)}</strong>#{CGI.escapeHTML($4)}" <<
    ($5 ? "<pre>#{CGI.escapeHTML($5)}</pre>" : '')
end

lint.gsub!(/^(jslint:)(.+?)$/, '<p><strong>\1</strong>\2</p>')
lint.gsub!(/TM_FILEPATH/, ENV['TM_FILEPATH']) 

print <<HTML
<!doctype html>
<html>
<head>
  <style type="text/css">
    p { margin-bottom: 0; }
    pre {
      background: #f5f5f5;
      border: 1px solid #cfcfcf;
      font-size: 12px;
      margin-top: 2px;
      padding: 2px 4px;
    }
  </style>
</head>
<body>
  #{lint}
</body>
</html>
HTML

Update (2009-05-07): Added line number linkage courtesy of Steve Spencer.
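
If you want to see what the main regular expression does without firing up TextMate, here's a stripped-down sketch you can run on its own. The sample string is my assumption of what a single JSLint warning looks like; the real output may differ. The generated markup is also simplified (no txmt:// link).

```ruby
require 'cgi'

# Assumed sample of a single JSLint warning; the real text may differ.
sample = "Lint at line 3 character 9: Missing semicolon.\nvar x = 1\n\n"

# Same pattern as the bundle command: a header line, then an optional source
# excerpt terminated by a blank line.
pattern = /^(Lint at line )(\d+)(.+?:)(.+?)\n(?:(.+?)\n\n)?/m

html = sample.gsub(pattern) do
  line    = CGI.escapeHTML($2)
  message = CGI.escapeHTML($4)
  excerpt = $5 ? "<pre>#{CGI.escapeHTML($5)}</pre>" : ''
  "<p><strong>Line #{line}:</strong>#{message}</p>#{excerpt}"
end

puts html
```

The second capture group is the line number, the fourth is the warning text, and the optional fifth is the offending line of source, which ends up in the pre element.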

It's never easy

Non-developers (or developers unfamiliar with the details of a particular feature) sometimes say that a seemingly straightforward change “should be easy to implement” without knowing whereof they speak.

Usually what they mean is that in the perfect world that exists in their imagination it should be easy. Sadly, this is an imperfect world full of shitty code, and even when the code’s not shitty it may not always be possible to accommodate new requirements without refactoring.

Never tell a developer something should be easy to implement unless:

  1. The universe is perfect, or
  2. You plan to implement it yourself

This blogget was inspired by my hellish week and a coincidental tweet by @AptanaLoriHC.

Sanitize: A whitelist-based Ruby HTML sanitizer

Merry Christmas, Internets! My gift to you this year is Sanitize, a whitelist-based HTML sanitizer written in Ruby. Given a list of acceptable elements and attributes, Sanitize will remove all unacceptable HTML from a string.

Using a simple configuration syntax, you can tell Sanitize to allow certain elements, certain attributes within those elements, and even certain URL protocols within attributes that contain URLs. Any HTML elements or attributes that you don’t explicitly allow will be removed.

Because it’s based on Nokogiri, a full-fledged HTML parser, rather than a bunch of fragile regular expressions, Sanitize has no trouble dealing with malformed or maliciously-formed HTML. When in doubt, Sanitize always errs on the side of caution.

Using Sanitize is easy. First, install it:

gem install sanitize

Then call it like so:

require 'rubygems'
require 'sanitize'

html = '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'

Sanitize.clean(html) # => 'foo'

By default, Sanitize removes all HTML. You can use one of the built-in configs to tell Sanitize to allow certain attributes and elements:

Sanitize.clean(html, Sanitize::Config::RESTRICTED)
# => '<b>foo</b>'

Sanitize.clean(html, Sanitize::Config::BASIC)
# => '<b><a href="http://foo.com/" rel="nofollow">foo</a></b>'

Sanitize.clean(html, Sanitize::Config::RELAXED)
# => '<b><a href="http://foo.com/">foo</a></b><img src="http://foo.com/bar.jpg" />'

Or, if you’d like more control over what’s allowed, you can provide your own custom configuration:

Sanitize.clean(html, :elements => ['a', 'span'],
    :attributes => {'a' => ['href', 'title'], 'span' => ['class']},
    :protocols => {'a' => {'href' => ['http', 'https', 'mailto']}})

For more details, see the Sanitize Documentation.

Trogdor: Burninatingly fast search using Yahoo! BOSS

Everyone and their dog seems to have written an example of how to use the Yahoo! Search BOSS API to build a simple search tool. I wanted to take that one step further and build something that would serve both as an example and as a usable service, and that could be extended and enhanced by other developers. Since my day job involves constant tradeoffs between making Yahoo! Search slower (by adding features) and making it faster (by optimizing those features), my primary goal here was to make something as fast as technically possible.

To do this, I wrote a very simple JavaScript module called Trogdor that uses dynamic script nodes to make cross-domain JSONP requests to the BOSS API as you type your query. Search results are returned and rendered almost instantly on each keystroke, and you can use the up and down arrow keys (or tab) and enter to quickly select the result you want—no mouse necessary.

Trogdor doesn’t require a JavaScript framework and works great in all modern browsers (and even some ancient, crappy browsers like IE6). The entire package (HTML, CSS and JS) weighs just a smidge under 2KB after minification and gzip, and it’s wonderfully fast.

Try it out for yourself at pieisgood.org/search and be sure to grab the heavily-commented source code on GitHub. If you’ve got ideas for features and improvements, fork the repo and go nuts (and be sure to let me know what you come up with). You’re also welcome to use Trogdor (modified or unmodified) in your own projects, although I do ask that you please use your own BOSS API key rather than the one included in the example.

Update, 11/26: Changed the name of the library from FastSearch to Trogdor, since dragons are awesome (and apparently there’s a Microsoft search product called FastSearch).

Try to use one var statement per scope in JavaScript

JavaScript’s var statement declares and optionally initializes one or more variables in the scope of the current function (or as global variables when used outside a function). Since var accepts multiple declarations, separated by commas, there’s usually no reason to use it more than once per function; it’s just a waste of bytes.

Overuse of var statements is one of the most common problems I see in JavaScript code. I was guilty of it myself for quite a while and it took me a long time to break the habit.

Bad:

function getElementsByClassName(className, tagName, root) {
  var elements = [];
  var root     = root || document;
  var tagName  = tagName || '*';
  var haystack = root.getElementsByTagName(tagName);
  var regex    = new RegExp('(?:^|\\s+)' + className + '(?:\\s+|$)');

  for (var i = 0, length = haystack.length; i < length; ++i) {
    var el = haystack[i];

    if (el.className && regex.test(el.className)) {
      elements.push(el);
    }
  }

  return elements;
}

There are several things wrong with the example above.

The most obvious problem is that I’ve used the var statement no fewer than seven times. Somewhat less obvious: I’ve used it inside a loop. Since var declarations are hoisted to the top of the enclosing function, this doesn’t actually create a new variable on each iteration, but it wrongly suggests that el is scoped to the loop body. I’ve also unnecessarily redeclared two variables that were passed in as function arguments.

Naturally, there’s a much better way to do this.

Good:

function getElementsByClassName(className, tagName, root) {
  root    = root || document;
  tagName = tagName || '*';

  var elements = [],
      haystack = root.getElementsByTagName(tagName),
      length   = haystack.length,
      regex    = new RegExp('(?:^|\\s+)' + className + '(?:\\s+|$)'),
      el, i;

  for (i = 0; i < length; ++i) {
    el = haystack[i];

    if (el.className && regex.test(el.className)) {
      elements.push(el);
    }
  }

  return elements;
}

There are circumstances in which it is actually necessary to redeclare a variable within a single scope, but they’re very rare, and are more often than not a warning sign that you need to rethink the code you’re writing.