laurence.io

April 21, 2012 at 8:08am

A Better Website Editing Workflow: Word to Markdown

For the SULS website, most people prefer to submit content in Word format (docx or doc). This presents a slight problem since the website uses Markdown for content.

The solutions:

  1. manually convert the Word files to Markdown (extremely tedious and spirit-crushing);
  2. teach everyone how to use Markdown;
  3. automatically convert the Word files to Markdown.

The second option seems reasonable until attempted in practice. Markdown is an extremely simple format, and most people seem to get it really quickly. The problem is that the workflows for producing it are terrible. On a Mac, people who don’t use a dedicated editor tend to use TextEdit, which spits out RTF files that have to be manually converted regardless. While software like Mou is fantastic, not everyone has the latest version of OS X (which Mou requires) or even uses a Mac. On Windows, the WYSIWYG editor situation is pretty horrific as well.

Most of the Word content is actually pretty trivial (no complex formatting), so automated conversion has a real chance of working well. So, how do we do it? David on Stack Overflow helpfully provides a solution:

textutil -convert html -stdout file.doc | pandoc -f html -t markdown -o file.md

This works surprisingly well, although it has some minor issues converting links and a few other things. Even so, it takes a lot of the legwork out of converting the content by hand.
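
To handle a whole folder of submissions rather than one file at a time, the same conversion could be wrapped in a small shell script. This is only a rough sketch: it assumes macOS (for textutil) with pandoc installed, it passes textutil's -stdout option so the HTML goes straight down the pipe, and the names md_path_for, convert_word_dir and the content directory are made up for illustration.

```shell
#!/bin/sh
# Sketch: batch-convert Word files in a directory to Markdown.
# Assumes macOS (textutil) and pandoc on PATH; names here are hypothetical.

# md_path_for FILE: print the output path, swapping the last extension for .md.
md_path_for() {
    printf '%s\n' "${1%.*}.md"
}

# convert_word_dir DIR: convert each .doc/.docx file found directly in DIR.
convert_word_dir() {
    for src in "$1"/*.doc "$1"/*.docx; do
        [ -e "$src" ] || continue   # skip glob patterns that matched nothing
        out=$(md_path_for "$src")
        # Note: the pipeline's exit status is pandoc's, so a textutil
        # failure can go unnoticed and leave an empty .md file.
        if textutil -convert html -stdout "$src" | pandoc -f html -t markdown -o "$out"; then
            echo "converted: $src -> $out"
        else
            echo "failed: $src" >&2
        fi
    done
}

# Example: convert_word_dir content
```

Each Markdown file lands next to its Word source, so the output still needs a quick check by hand for the link issues mentioned above.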

The main shortcoming seems to be textutil's conversion of doc to HTML. This could perhaps be remedied by using Google Docs instead. One potential solution is to pull down all the website content from Google Docs as HTML at compile time and then convert it to Markdown, which could provide a nice, seamless editing workflow.
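
A very rough sketch of what that compile-time step might look like is below. It rests on two assumptions that are not tested here: that each document is shared so anyone with the link can view it, and that Google Docs serves an HTML export at the URL shape shown. The function names and the DOC_ID placeholder are hypothetical.

```shell
#!/bin/sh
# Sketch: fetch a Google Doc as HTML and convert it to Markdown at build time.
# ASSUMPTIONS: the document is viewable by anyone with the link, and the
# export URL below has this shape. Function names are hypothetical.

# gdoc_export_url DOC_ID: print the (assumed) HTML export URL for a document.
gdoc_export_url() {
    printf 'https://docs.google.com/document/d/%s/export?format=html\n' "$1"
}

# gdoc_to_markdown DOC_ID OUT: download the HTML to a temp file, then convert.
gdoc_to_markdown() {
    tmp=$(mktemp) || return 1
    if curl -fsSL -o "$tmp" "$(gdoc_export_url "$1")"; then
        pandoc -f html -t markdown -o "$2" "$tmp"
        status=$?
    else
        status=1   # download failed, so there is nothing to convert
    fi
    rm -f "$tmp"
    return "$status"
}

# Example (DOC_ID is a placeholder): gdoc_to_markdown DOC_ID content/about.md
```

Downloading to a temporary file first means a failed fetch is reported as a failure instead of silently producing an empty Markdown file.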