
Data wrangling

Sometimes we have to do dirty jobs

Michele Mauri
DensityDesign Research Lab


Data is often messy and needs to be cleaned, or at least converted.


My data cleaning toolkit


1. TextWrangler* **
http://www.barebones.com/products/textwrangler/

* Notepad++ on Windows
** Actually, any advanced text editor


1. TextWrangler is useful to:
- remove text formatting
- clean hidden characters
- replace separator characters
- structure data
- apply regular expressions (regexp)
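For readers who prefer a script to an editor, the same find-and-replace cleanup can be sketched with Python's `re` and `csv` modules. This is a minimal sketch, not how TextWrangler works internally: it assumes the input is semicolon-separated, and that the hidden characters worth removing are the byte-order mark (U+FEFF), zero-width characters and non-breaking spaces. `clean_text` is a hypothetical helper name.

```python
import csv
import io
import re

# Assumed set of invisible characters: byte-order mark and zero-width
# space / non-joiner / joiner, often pasted in from web pages or PDFs.
HIDDEN = re.compile("[\ufeff\u200b\u200c\u200d]")


def clean_text(raw: str, old_sep: str = ";", new_sep: str = ",") -> str:
    """Remove hidden characters, tidy whitespace and swap the field separator."""
    # Drop invisible characters and turn non-breaking spaces into plain spaces.
    text = HIDDEN.sub("", raw).replace("\u00a0", " ")
    reader = csv.reader(io.StringIO(text), delimiter=old_sep)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=new_sep, lineterminator="\n")
    for row in reader:
        # Collapse runs of whitespace inside each field and trim the ends.
        writer.writerow([re.sub(r"\s+", " ", field).strip() for field in row])
    return out.getvalue()


print(clean_text("\ufeffname; city\nRossi, Anna;\u00a0Milano\n"))
```

Using `csv` rather than a plain `replace(";", ",")` keeps fields that themselves contain the new separator intact: `Rossi, Anna` comes out quoted instead of being split in two.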


2. OpenRefine
http://openrefine.org/


2. OpenRefine is useful to:
- convert formats
- reconcile data
- structure data
- enrich (link) data with Freebase
- apply GREL (General Refine Expression Language) functions
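OpenRefine handles format conversion through its import and export menus. As a rough stand-in outside the tool (this is not OpenRefine's API, and no GREL is involved), converting a CSV file with a header row into JSON can be sketched with Python's standard library. `csv_to_json` is a hypothetical name, and every value stays a string, since CSV carries no type information.

```python
import csv
import io
import json


def csv_to_json(csv_text: str) -> str:
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # One object per row; keys come from the header, values stay strings.
    records = [dict(row) for row in reader]
    # ensure_ascii=False keeps accented characters readable in the output.
    return json.dumps(records, ensure_ascii=False, indent=2)


print(csv_to_json("name,city\nAnna,Milano\nLuca,Forlì\n"))
```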


3. Data Wrangler
http://vis.stanford.edu/wrangler/


3. Data Wrangler is useful to:
- reformat data values
- correct erroneous or missing values
- (re)structure a dataset
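Two of the fixes listed above, filling in missing values and restructuring a dataset, can be sketched in plain Python. This illustrates the transformations only, not Data Wrangler's own interface. It assumes rows are dictionaries (for example from `csv.DictReader`), that an empty cell means "same as the row above" (as happens with merged cells in exported spreadsheets), and that the table is "wide", with one column per year. `fill_down` and `wide_to_long` are hypothetical names.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def fill_down(rows: List[Row], column: str) -> List[Row]:
    """Replace empty cells in `column` with the last non-empty value above them."""
    last: Optional[str] = None
    filled: List[Row] = []
    for row in rows:
        value = row.get(column)
        if value in (None, ""):
            value = last  # stays None if the first rows are empty too
        else:
            last = value
        # Build a new dict so the input rows are left untouched.
        filled.append({**row, column: value})
    return filled


def wide_to_long(rows: List[Row], id_col: str, key_name: str = "key",
                 value_name: str = "value") -> List[Row]:
    """Turn one column per variable (e.g. per year) into one row per (id, column) pair."""
    long_rows: List[Row] = []
    for row in rows:
        for key, value in row.items():
            if key != id_col:
                long_rows.append({id_col: row[id_col], key_name: key, value_name: value})
    return long_rows


table = [
    {"region": "North", "2012": "5", "2013": "6"},
    {"region": "", "2012": "7", "2013": "8"},  # blank left by a merged cell
]
for r in wide_to_long(fill_down(table, "region"), "region", key_name="year"):
    print(r)
```

The long layout (one observation per row) is usually easier to filter, aggregate and feed to charting tools than the wide one.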


4. Excel
http://office.microsoft.com/en-us/excel/


4. Excel is useful to:
- use formulas
- rearrange and filter data
- build pivot tables
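A pivot table that sums one field, grouped by two others, can be approximated in a few lines of Python. This is a minimal sketch, assuming rows are dictionaries with numeric values stored as text; `pivot_sum` is a hypothetical name, and only the "Sum" aggregation is covered.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def pivot_sum(rows: Iterable[Mapping[str, str]], row_key: str, col_key: str,
              value_key: str) -> Dict[str, Dict[str, float]]:
    """Sum `value_key` for every (row_key, col_key) pair, like a pivot table set to Sum."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        # Missing cells start at 0.0 thanks to defaultdict(float).
        table[row[row_key]][row[col_key]] += float(row[value_key])
    # Convert the nested defaultdicts back to plain dicts for printing.
    return {r: dict(cols) for r, cols in table.items()}


sales = [
    {"city": "Milano", "year": "2012", "amount": "10"},
    {"city": "Milano", "year": "2012", "amount": "5"},
    {"city": "Roma", "year": "2013", "amount": "7"},
]
print(pivot_sum(sales, "city", "year", "amount"))
```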


5. Code (Processing, JavaScript)


5. Code is useful to:
- do everything
