OpenRefine (formerly Google Refine) is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.

Please note that Google has not actively supported this project since October 2nd, 2012, and that it has been rebranded as OpenRefine. Project development, documentation, and promotion are now fully supported by volunteers. Find out more about the history of OpenRefine and how you can help the community.

2017 OpenRefine User Survey

It’s been a while since our last user survey (see the results from the 2014 edition), and we would like to know who you are, how you use OpenRefine, and what your expectations are. So here is the 2017 edition of the OpenRefine user survey! Thank you for sharing it with your friends, coworkers, and communities!

Take the survey

Using OpenRefine - The Book

Using OpenRefine, by Ruben Verborgh and Max De Wilde, offers a great introduction to OpenRefine. Organized into recipes with hands-on examples, the book covers the following topics:

  1. Import data in various formats
  2. Explore datasets in a matter of seconds
  3. Apply basic and advanced cell transformations
  4. Deal with cells that contain multiple values
  5. Create instantaneous links between datasets
  6. Filter and partition your data easily with regular expressions
  7. Use named-entity extraction on full-text fields to automatically identify topics
  8. Perform advanced data operations with the General Refine Expression Language

Introduction to OpenRefine

1. Explore Data

OpenRefine can help you explore large datasets with ease. You can find out more about this functionality by watching the video below and going through these articles.

2. Clean and Transform Data

3. Reconcile and Match Data

OpenRefine can be used to link and extend your dataset with various web services. Some services also allow OpenRefine to upload your cleaned data to a central database. A growing list of extensions and plugins is available on the wiki.