html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
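To give a feel for this kind of conversion, here is a minimal sketch that uses only Python's standard `html.parser` module. It is not html2text.py's own code. It handles just a few tags (headings, paragraphs, bold/italic, links, flat lists). The names `MiniHTML2Markdown` and `html_to_markdown` are hypothetical and were chosen for illustration.

```python
import re
from html.parser import HTMLParser


class MiniHTML2Markdown(HTMLParser):
    """Hypothetical toy converter: a few HTML tags -> Markdown.

    This is an illustration only, not html2text.py's implementation.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.out = []     # Markdown fragments, joined at the end
        self.href = None  # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in ("p", "ul", "ol"):
            self.out.append("\n\n")  # block elements start a new paragraph
        elif tag in ("em", "i"):
            self.out.append("_")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.out.append("[")
        elif tag == "li":
            self.out.append("\n* ")  # flat lists only; no nesting

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol"):
            self.out.append("\n\n")
        elif tag in ("em", "i"):
            self.out.append("_")
        elif tag in ("strong", "b"):
            self.out.append("**")
        elif tag == "a":
            self.out.append("](%s)" % (self.href or ""))
            self.href = None

    def handle_data(self, data):
        # Collapse runs of whitespace to a single space, as a browser would.
        self.out.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    parser = MiniHTML2Markdown()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    raw = "".join(parser.out)
    # Trim each line, then squeeze runs of blank lines down to one.
    text = "\n".join(line.strip() for line in raw.splitlines())
    return re.sub(r"\n{3,}", "\n\n", text).strip()
```

For example, `html_to_markdown('<p>a <em>b</em> c</p>')` returns `'a _b_ c'`. The real script covers many more cases than this toy version.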

Also known as: html to text, htm to txt, htm2txt, ...


html2text is available under the GNU GPL 2.0.

Download the latest: html2text.py


2006-02-22: 2.24. preliminary support for dt/dd
????-??-??: 2.23. fix for python2.1
2004-08-27: 2.21. old bug with extra closing list tags (tx Jonathan)
2004-08-26: 2.2. text wrapping (tx++ Joey Schulze!), suppress dupe links (tx Ricardo Reyes), python2.1 support.
2004-08-23: 2.12. added hr (tx merlin)
2004-06-30: 2.11. python2.1 codec support.
2004-06-27: 2.1. better module, unicode support. expand ndash.
2004-03-27: 2.01a. fix bug w/ charrefs in links. tx Ian G.
2004-03-19: 2.0a. complete rewrite, supports Markdown
2003-03-16: 1.0. port to Python
2000-06-19: html2text.tcl (with Lars Pind)

Aaron Swartz (me@aaronsw.com)