A Better Website Editing Workflow: Word to Markdown
For the SULS website, most people prefer to submit content in Word format (docx or doc). This presents a slight problem since the website uses Markdown for content.
The solutions:
- manually convert the Word files to Markdown (extremely tedious and spirit-crushing);
- teach everyone how to use Markdown;
- automatically convert the Word files to Markdown.
The second option seems reasonable, until attempted in practice. Markdown is an extremely simple format, and most people seem to get it really quickly. The problem is that the workflows for producing it are terrible. If people don't use a dedicated editor on Mac, they tend to use TextEdit, which spits out RTF files that have to be manually converted regardless. While software like Mou exists, which is fantastic, not everyone has the latest version of OS X (which is required by Mou) or even uses a Mac. On Windows, the WYSIWYG editor situation is pretty horrific as well.
Most of the Word content is actually pretty trivial (no complex formatting), so automated conversion actually has a chance of working well. So, how do we do it? David on Stack Overflow helpfully provides a solution:
textutil -convert html -stdout file.doc | pandoc -f html -t markdown -o file.md
This actually works surprisingly well. It does have some minor issues converting links and other things, but it takes a lot of the legwork out of manually converting the content.
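Since there is usually more than one Word file to deal with, the one-liner can be wrapped in a small script. The following is only a rough sketch: the function names (md_name, convert_word, convert_dir) are made up for illustration, and it assumes textutil (macOS only) and pandoc are both installed.

```shell
#!/bin/sh
# Sketch: batch-convert Word files to Markdown with textutil and pandoc.
# All function names here are hypothetical helpers, not part of either tool.

# Derive the Markdown filename for a Word file: report.docx -> report.md
md_name() {
  printf '%s.md\n' "${1%.*}"
}

# Convert one .doc/.docx file, writing the .md file next to it.
# -stdout makes textutil write the HTML to the pipe instead of a file.
convert_word() {
  textutil -convert html -stdout "$1" | pandoc -f html -t markdown -o "$(md_name "$1")"
}

# Convert every Word file directly inside the given directory.
convert_dir() {
  for f in "$1"/*.doc "$1"/*.docx; do
    [ -e "$f" ] || continue   # skip patterns that matched nothing
    convert_word "$f"
  done
}
```

After sourcing the script, something like `convert_dir content` would convert everything in a (hypothetical) content folder. The output still needs a quick read-through, since links and other details may not survive the conversion intact.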
The main shortcoming seems to be with textutil's conversion of doc to html. This could perhaps be remedied using Google Docs. One potential solution is to pull down all the website content dynamically from Google Docs as html at compile time and then convert it to md. This could provide a nice and seamless editing workflow.
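As a rough idea of what the Google Docs route might look like, here is an untested sketch. It assumes each document is shared so that anyone with the link can view it, and that the export URL pattern below (which is not from the original Stack Overflow answer) serves the document as HTML. The function names and the document ID are placeholders.

```shell
#!/bin/sh
# Sketch: fetch a Google Doc as HTML and convert it to Markdown.
# Assumption: the doc is viewable by anyone with the link, and the
# /export?format=html URL pattern returns the document as HTML.

# Build the (assumed) HTML export URL for a document ID.
gdoc_export_url() {
  printf 'https://docs.google.com/document/d/%s/export?format=html\n' "$1"
}

# Download one doc and convert it.
# Usage: fetch_gdoc_md DOC_ID output.md
# curl flags: -f fail on HTTP errors, -sS quiet but show errors, -L follow redirects.
fetch_gdoc_md() {
  curl -fsSL "$(gdoc_export_url "$1")" | pandoc -f html -t markdown -o "$2"
}
```

A build step could then call `fetch_gdoc_md SOME_DOC_ID about.md` once per page before compiling the site. Whether the HTML from Google Docs converts more cleanly than the HTML from textutil would have to be checked by trying it.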