wonko.com

Hi! I'm Ryan Grove: Sorcerer at SmugMug, lover of movies, eater of pie, connoisseur of awesome.

Ruby script to sync email from any IMAP server to Gmail

Update (2009-03-16): This script has been superseded by Larch, a full-fledged Ruby application that does the same thing, only faster and more reliably.

Last night after Gmail began rolling out IMAP support, I started investigating ways to copy my huge email archive (thousands and thousands of messages dating back to 2003) from my IMAP server to Gmail’s IMAP server.

Copying the messages from one account to the other in Thunderbird works, but it’s glacially slow, needs babysitting, and is prone to creating duplicate messages unless the entire copy operation works right the first time. Great for copying a few messages, not so great for copying thousands.

I also investigated imapsync, a Perl script that’s somewhat faster and more reliable than Thunderbird and doesn’t create duplicate messages. For some reason, though, messages copied with imapsync end up timestamped on Gmail with the time they were imported rather than the time they were sent or received, which is unacceptable. I tried the --syncinternaldates option to rectify this, but it didn’t work.

So, since the best way to get something done right is to do it yourself, I set about writing my own tool to transfer my email. Thanks to Ruby and Net::IMAP, this turned out to be pretty easy.

Here’s what I came up with. It’s not pretty, it’s not user friendly, and it doesn’t do much error checking, but it’s extremely fast, it works, and if it fails at any point you can just run it again and it’ll pick up where it left off. Share and enjoy.
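A quick aside before the script itself. A rerun picks up where it left off because the script first collects the Message-ID of everything already in the destination folder, then skips any source message whose ID is in that set. Here’s a rough, standalone sketch of just that idea, with no IMAP connection involved. The helper name pending_message_ids and the sample IDs are made up for illustration and don’t appear in the script.

```ruby
require 'set'

# Hypothetical helper (not part of the script): given the Message-IDs found
# in the source folder and those already in the destination folder, return
# the ones that still need to be copied.
def pending_message_ids(source_ids, dest_ids)
  seen = Set.new(dest_ids)
  source_ids.reject { |mid| seen.include?(mid) }
end

source_ids = ['<a@example.com>', '<b@example.com>', '<c@example.com>']

# First run: nothing has been copied yet, so everything is pending.
first_run = pending_message_ids(source_ids, [])

# Suppose that run died after copying the first two messages. On the next
# run the destination already holds them, so only the third is pending.
second_run = pending_message_ids(source_ids,
                                 ['<a@example.com>', '<b@example.com>'])

puts first_run.inspect
puts second_run.inspect
```

The full script does the same comparison against live folders, using the message_id field of each message’s ENVELOPE.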

#!/usr/bin/env ruby
require 'net/imap'

# Source server connection info.
SOURCE_NAME = 'username@example.com'
SOURCE_HOST = 'mail.example.com'
SOURCE_PORT = 143
SOURCE_SSL  = false
SOURCE_USER = 'username'
SOURCE_PASS = 'password'

# Destination server connection info.
DEST_NAME = 'username@gmail.com'
DEST_HOST = 'imap.gmail.com'
DEST_PORT = 993
DEST_SSL  = true
DEST_USER = 'username@gmail.com'
DEST_PASS = 'password'

# Mapping of source folders to destination folders. The key is the name of the
# folder on the source server, the value is the name on the destination server.
# Any folder not specified here will be ignored. If a destination folder does
# not exist, it will be created.
FOLDERS = {
  'INBOX' => 'INBOX',
  'sourcefolder' => 'gmailfolder'
}

# Maximum number of messages to select at once.
UID_BLOCK_SIZE = 1024

# Utility methods.
def dd(message)
  puts "[#{DEST_NAME}] #{message}"
end

def ds(message)
  puts "[#{SOURCE_NAME}] #{message}"
end

def uid_fetch_block(server, uids, *args)
  pos = 0

  while pos < uids.size
    server.uid_fetch(uids[pos, UID_BLOCK_SIZE], *args).each {|data| yield data }
    pos += UID_BLOCK_SIZE
  end
end

@failures = 0
@existing = 0
@synced   = 0

# Connect and log into both servers.
ds 'Connecting...'
source = Net::IMAP.new(SOURCE_HOST, SOURCE_PORT, SOURCE_SSL)

ds 'Logging in...'
source.login(SOURCE_USER, SOURCE_PASS)

dd 'Connecting...'
dest = Net::IMAP.new(DEST_HOST, DEST_PORT, DEST_SSL)

dd 'Logging in...'
dest.login(DEST_USER, DEST_PASS)

# Loop through folders and copy messages.
FOLDERS.each do |source_folder, dest_folder|
  # Open source folder in read-only mode.
  begin
    ds "Selecting folder '#{source_folder}'..."
    source.examine(source_folder)
  rescue => e
    ds "Error: select failed: #{e}"
    next
  end

  # Open (or create) destination folder in read-write mode.
  begin
    dd "Selecting folder '#{dest_folder}'..."
    dest.select(dest_folder)
  rescue => e
    begin
      dd "Folder not found; creating..."
      dest.create(dest_folder)
      dest.select(dest_folder)
    rescue => ee
      dd "Error: could not create folder: #{ee}"
      next
    end
  end

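  # Build a lookup hash of all message ids present in the destination folder.
  # Note: messages without a Message-ID header all share one key (typically
  # nil), so at most one of them is treated as distinct.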
  dest_info = {}

  dd 'Analyzing existing messages...'
  uids = dest.uid_search(['ALL'])

  if uids.length > 0
    uid_fetch_block(dest, uids, ['ENVELOPE']) do |data|
      dest_info[data.attr['ENVELOPE'].message_id] = true
    end
  end

  dd "Found #{uids.length} messages"

  # Loop through all messages in the source folder.
  uids = source.uid_search(['ALL'])

  ds "Found #{uids.length} messages"

  if uids.length > 0
    uid_fetch_block(source, uids, ['ENVELOPE']) do |data|
      mid = data.attr['ENVELOPE'].message_id

      # If this message is already in the destination folder, skip it.
      if dest_info[mid]
        @existing += 1
        next
      end

      # Download the full message body from the source folder.
      ds "Downloading message #{mid}..."
      msg = source.uid_fetch(data.attr['UID'], ['RFC822', 'FLAGS',
          'INTERNALDATE']).first

      # Append the message to the destination folder, preserving flags and
      # internal timestamp.
      dd "Storing message #{mid}..."

      tries = 0

      begin
        tries += 1
        dest.append(dest_folder, msg.attr['RFC822'], msg.attr['FLAGS'],
            msg.attr['INTERNALDATE'])

        @synced += 1
      rescue Net::IMAP::NoResponseError => ex
        if tries < 10
          dd "Error: #{ex.message}. Retrying..."
          sleep 1 * tries
          retry
        else
          @failures += 1
          dd "Error: #{ex.message}. Tried and failed #{tries} times; giving up on this message."
        end
      end
    end
  end

  source.close
  dest.close
end

puts "Finished. Message counts: #{@existing} untouched, #{@synced} transferred, #{@failures} failures."
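
The uid_fetch_block helper in the script exists so that no single fetch command lists more than UID_BLOCK_SIZE UIDs. A minimal sketch of the same batching idea, written with each_slice and run against a stand-in object instead of a real server, might look like this. FakeServer, fetch_in_blocks, and the tiny block size of 4 are assumptions made for the demo; they are not part of the script or of Net::IMAP.

```ruby
# Stand-in for a Net::IMAP connection, purely for illustration: it records
# the size of each batch it is asked for and echoes the UIDs back.
class FakeServer
  attr_reader :batch_sizes

  def initialize
    @batch_sizes = []
  end

  def uid_fetch(uids, _attrs)
    @batch_sizes << uids.size
    uids
  end
end

# The script uses 1024; kept small here so the batches are easy to see.
BLOCK_SIZE = 4

# Same idea as uid_fetch_block: fetch at most BLOCK_SIZE UIDs per call and
# yield each returned item to the caller's block.
def fetch_in_blocks(server, uids, attrs)
  uids.each_slice(BLOCK_SIZE) do |slice|
    server.uid_fetch(slice, attrs).each { |data| yield data }
  end
end

server  = FakeServer.new
fetched = []
fetch_in_blocks(server, (1..10).to_a, ['ENVELOPE']) { |uid| fetched << uid }

puts server.batch_sizes.inspect # ten UIDs arrive as batches of 4, 4 and 2
puts fetched.length
```

Because each_slice never yields an empty slice, an empty UID list simply results in no fetch calls at all.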

Update: Now includes Steve K’s patch to fix BadResponseError exceptions. Thanks Steve!

Update (2009-03-02): Brought the script up to date with several bug fixes and enhancements (including those contributed in comments below). Thanks everyone!
