22 free tools for data visualization and analysis

Got data? These useful tools can turn it into informative, engaging graphics.
Sharon Machlis
April 20, 2011
(Computerworld) You may not think you've got much in common with an investigative journalist or an academic medical researcher. But if you're trying to extract useful information from an ever-increasing inflow of data, you'll likely find visualization useful -- whether it's to show patterns or trends with graphics instead of mountains of text, or to try to explain complex issues to a nontechnical audience.
There are many tools around to help turn data into graphics, but they can carry hefty price tags. The cost can make sense for professionals whose primary job is to find meaning in mountains of information, but you might not be able to justify such an expense if you or your users only need a graphics application from time to time, or if your budget for new tools is somewhat limited. If one of the higher-priced options is out of your reach, there are a surprising number of highly robust tools for data visualization and analysis that are available at no charge.
Here's a rundown of some of the better-known options, many of which were demonstrated at the Computer-Assisted Reporting (CAR) conference last month. Others are not as well known but show great promise. They range from easy enough for a beginner (i.e., anyone who can do rudimentary spreadsheet data entry) to expert (requiring hands-on coding). But they all share one important characteristic: They're free. Your only investment: time.
Data cleaning
Before you can analyze and visualize data, it often needs to be "cleaned." What does that mean? Perhaps some entries list "New York City" while others say "New York, NY" and you need to standardize them before you can see patterns. There might be some records with misspellings or numerical data-entry errors. The following two tools are designed to help get your data in tip-top shape to be analyzed.
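To make the idea of standardizing entries concrete, here is a minimal Python sketch. It is not taken from either tool reviewed below; the lookup table (including the "nyc" variant) and the function name are hypothetical, chosen only for illustration.

```python
# Hypothetical lookup table: known variants mapped to one canonical spelling.
CANONICAL = {
    "new york city": "New York, NY",
    "new york, ny": "New York, NY",
    "nyc": "New York, NY",
}

def standardize(value: str) -> str:
    """Return the canonical form of a place name, or the trimmed input if unknown."""
    # Lowercase and collapse runs of whitespace so minor variations still match.
    key = " ".join(value.lower().split())
    return CANONICAL.get(key, value.strip())

records = ["New York City", "new york, NY ", "Boston"]
cleaned = [standardize(r) for r in records]
print(cleaned)
```

A fixed lookup table only catches variants you already know about, which is why tools that suggest groupings automatically can save time on larger data sets.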
DataWrangler
What it does:
This Web-based service from Stanford University's Visualization Group is designed for cleaning and rearranging data so it's in a form that other tools such as a spreadsheet app can use.
Click on a row or column, and DataWrangler will suggest changes. For example, if you click on a blank row, several suggestions pop up such as "delete row" or "delete empty rows." There's also a history list that allows for easy undo -- a feature that's also available in Google Refine (reviewed next).
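For readers who want to see what a "delete empty rows" transformation amounts to, the following Python sketch performs a comparable step on CSV text. It is an illustration only and says nothing about how DataWrangler works internally; the function name and the sample figures are made up.

```python
import csv
import io
from typing import List

def drop_empty_rows(csv_text: str) -> List[List[str]]:
    """Parse CSV text and drop rows whose cells are all blank."""
    rows = csv.reader(io.StringIO(csv_text))
    # A row is kept only if at least one cell holds non-whitespace text.
    return [row for row in rows if any(cell.strip() for cell in row)]

# Made-up sample data with one row of empty cells and one fully blank line.
sample = "state,crimes\nAlabama,100\n,\n\nAlaska,50\n"
kept = drop_empty_rows(sample)
print(kept)
```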
Want to see all the tools at once? For quick reference, check out our chart listing 22 free data visualization tools.
What's cool:
Text editing is especially easy. For example, when I selected "Alabama" in one row of sample data headlined "Reported crime in Alabama" and then selected "Alaska" in the next group of data, it led to a suggestion to extract every state name. Hover your mouse over a suggestion, and you can see affected rows highlighted in red.
Drawbacks:
I found that unexpected changes occurred as I attempted to explore DataWrangler's options; I constantly had to click "clear" to reset. And not all suggestions are useful ("promote row to header" seemed an odd suggestion when the row was blank) or easy to understand ("fold split 1 using 2 as key").
And while the fact that DataWrangler is a Web-based service makes it convenient to use, don't forget that it sends your data off to an external site -- which means it isn't an option for sensitive internal information. However, there are plans for a future release of a stand-alone desktop version. Another important thing to keep in mind is that DataWrangler is currently alpha code, and its creators say it's "still a work in progress."
Skill level:
Advanced beginner.
Runs on:
Any Web browser.
Learn more:
There's a screencast on the DataWrangler home page. Also, see this post on using DataWrangler to format data (from Tableau Public's blog).
Google Refine 
What it does:
Google Refine can be described as a spreadsheet on steroids for taking a first look at both text and numerical data. Like Excel, it can import and export data in a number of formats including tab- and comma-separated text files and Excel, XML and JSON files.
Refine features several built-in algorithms that find text items that are spelled differently but actually should be grouped together. After importing your data, you simply select "edit cells --> cluster and edit" and then select which algorithm you want to use. After Refine runs, you decide whether to accept or reject each suggestion. For example, you could say yes to combining Microsoft and Microsoft Corp., but no to combining Coach Inc. with CQG Inc. If it's offering too few or too many suggestions, you can change the strength of the suggestion function.
There are also numerical options that offer quick and easy overviews of data distributions. This functionality can reveal anomalies that might be the result of data input errors, such as $800,000 instead of $80,000 for a salary entry. It can also expose inconsistencies, such as differences in the way compensation data is reported from entry to entry, with some showing, say, hourly wages and others showing weekly pay or yearly salaries.
Beyond data housekeeping, Google Refine offers some useful analysis tools, such as sorting and filtering.
Image caption: DataWrangler helps format table data so it can be better used and analyzed by other applications.
Image caption: Google Refine can make data 'cleaner' by helping to find errors or different versions of the same proper names.
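The article does not say which algorithms Refine uses to group differently spelled items, so the Python sketch below shows just one simple approach, sometimes called a key-collision or "fingerprint" method, as an assumption-laden illustration. The list of company suffixes to ignore and all function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical list of suffixes treated as noise when comparing company names.
NOISE_WORDS = {"corp", "inc", "co"}

def fingerprint(name: str) -> str:
    """Reduce a name to a normalized key: lowercase, no punctuation, sorted unique tokens."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    tokens = [t for t in tokens if t not in NOISE_WORDS]
    return " ".join(sorted(set(tokens)))

def cluster(names: List[str]) -> List[List[str]]:
    """Group names that share a fingerprint; return only groups with several spellings."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        groups[fingerprint(name)].add(name)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]

names = ["Microsoft", "Microsoft Corp.", "Coach Inc.", "CQG Inc.", "microsoft corp"]
suggestions = cluster(names)
print(suggestions)
```

In this sketch the Microsoft variants collapse to one key while Coach Inc. and CQG Inc. stay apart. A looser matching rule would produce more suggestions and more false positives, which is why a human still accepts or rejects each proposed merge.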
What's cool:
Once you get used to which commands do what, this is a powerful tool for data manipulation and analysis that strikes a good balance between functionality and ease of use. The undo/redo list of every action you've taken lets you roll back when needed. And text functions handle Java-syntax regular expressions, allowing you to look for patterns (such as, say, three numbers followed by two digits) as well as specific text strings and numbers. Finally, while this is a browser-based application, it works with files on your desktop, so your data remains local.
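As a rough illustration of pattern matching of the kind described above, here is a Python sketch that flags cells containing a run of exactly five digits, read here as three digits immediately followed by two more. Python's `re` module stands in for Java's regular expressions; the `\d{3}` and `\b` syntax used here is common to both. The sample cells and names are hypothetical.

```python
import re
from typing import List

# Three digits followed by two digits, bounded so longer digit runs are not matched.
PATTERN = re.compile(r"\b(\d{3})(\d{2})\b")

def cells_matching(cells: List[str]) -> List[str]:
    """Return the cells that contain the pattern anywhere in their text."""
    return [cell for cell in cells if PATTERN.search(cell)]

cells = ["ID 12345", "ID 1234", "ref 98765 ok", "123456", "none"]
matches = cells_matching(cells)
print(matches)
```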
Drawbacks:
Although Google Refine looks like a spreadsheet, you can't do typical spreadsheet calculations with it; for that, you must export to a conventional spreadsheet application. If you've got a large data set, carve out some time in your day to go through all of Refine's suggested changes, since it can take a while. And, depending on the data set, be prepared when looking for text items to merge: You're likely to get either a lot of false positives or missed problems -- or both.
Skill level:
Advanced beginner. Knowledge of data analysis concepts is more important than technical prowess; power Excel users who understand data-cleaning needs should be comfortable with this.
Runs on:
Windows, Mac OS X (if it appears to do nothing after loading on a Mac, point a browser manually to ), Linux.
Learn more:
These three screencasts give a good overview of why and how you'd use Refine; there's also fairly detailed documentation on the Google Code project area.
Statistical analysis
Sometimes you need to combine graphical representation of your data with heftier numerical analysis.
The R Project for Statistical Computing 
What it does:
R is a general statistical analysis platform (the authors call it an "environment") that runs on the command line. Need to find means, medians, standard deviations, correlations? R can handle that and much more, including "linear and generalized linear models, nonlinear regression models, time series analysis, classical parametric and nonparametric tests, clustering and smoothing," according to the project website.
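To show what those basic summary statistics look like in practice, here is a small sketch written in Python rather than R, using only the standard library. The data values are invented for illustration, and the `pearson` helper is a hand-written version of the usual correlation formula, not an R function.

```python
import statistics
from math import sqrt
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)

# Invented sample data: the second series is exactly double the first.
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 6, 8, 10]

print("mean:", statistics.mean(hours))
print("median:", statistics.median(hours))
print("sample standard deviation:", statistics.stdev(hours))
print("correlation:", pearson(hours, scores))
```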