• There are easily dozens of tools in each category, but we’re going to
focus on three: Excel, SQL, and Trifacta Wrangler.
• If your total data size is only a few megabytes, investing in a big data
distributed-processing platform like a Hadoop cluster would be a
massively wasteful use of computing power and budget.
• So generally, smaller data corresponds to smaller infrastructure needs
and bigger data corresponds to bigger infrastructure needs.
• Any dataset that you decide to wrangle using SQL must be rectangular
and must also conform to a specific schema.
• As with cells in Excel, the record fields in SQL can have a variety of
types.
• Different versions of SQL support different field types, but the basic
set of dates, times, strings, and numbers is universal.
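To make the "rectangular, schema-conforming" requirement concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table, its columns, and the sample rows are hypothetical and chosen only for illustration. SQLite is one SQL dialect among many, and it stores dates as text, so other databases may declare the date column differently.

```python
import sqlite3


def build_orders_table():
    """Create an in-memory table with a fixed schema and two sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE orders (
            order_id   INTEGER PRIMARY KEY,
            customer   TEXT NOT NULL,
            order_date TEXT NOT NULL,  -- SQLite keeps dates as ISO-8601 text
            amount     REAL NOT NULL
        )
        """
    )
    # Hypothetical sample data: every row supplies the same four fields.
    rows = [
        (1, "Ada", "2023-01-15", 19.99),
        (2, "Grace", "2023-02-03", 5.00),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    return conn


def insert_short_row(conn):
    """Try to insert a row with a missing field; return the error text, if any."""
    try:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (3, "Linus", "2023-03-01")
        )
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None
```

Calling `insert_short_row(build_orders_table())` returns an error message rather than `None`: a row that does not match the table's shape is rejected, which is the practical meaning of "rectangular" here.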
Trifacta Wrangler
• Trifacta, unlike Excel and SQL, can handle structured, semistructured,
and unstructured data.
• Like the other two tools, Trifacta supports a variety of different data
types, from the most basic integers, strings, and Booleans, to more
complex custom types like dates, US states, and phone numbers.
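One way to think about a custom type such as "US state" or "phone number" is as a validation rule applied to every value in a column. The sketch below illustrates that idea in plain Python. It is not Trifacta's implementation or API: the function names, the truncated state list, and the phone pattern are all assumptions made for this example.

```python
import re

# Illustrative subset only; a real check would list every state code.
US_STATES = {"CA", "NY", "TX", "WA"}

# Assumed pattern for 10-digit US numbers such as "(415) 555-0100".
PHONE_RE = re.compile(r"^\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}$")


def is_us_state(value):
    """Return True if the value is one of the known two-letter state codes."""
    return value.strip().upper() in US_STATES


def is_phone(value):
    """Return True if the value matches the assumed 10-digit phone pattern."""
    return bool(PHONE_RE.match(value.strip()))


def fraction_valid(values, check):
    """Return the share of values in a column that satisfy a type check."""
    if not values:
        return 0.0
    return sum(1 for v in values if check(v)) / len(values)
```

For example, `fraction_valid(["CA", "ZZ"], is_us_state)` gives 0.5, the kind of per-column summary that helps spot values that do not fit the expected type.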
Transformation Paradigms