Once the data quality is good, it is important to understand the context of the data being applied to the problem. For example, if a tree loses its leaves in the middle of the summer, it is a sign that the tree is unhealthy. The same tree losing its leaves in the middle of a cold winter is a normal occurrence. Without understanding the context of the data, you will likely misinterpret results. At the same time, considerable attention is paid to correlation between data elements. What are the relationships between conditions? In the example of the health of trees, there is a direct correlation between the seasons and the color and amount of leaves on the trees. But you also have to be careful about correlations. You might find a correlation that makes no sense because the context is wrong. There may seem to be a correlation between leaves falling off the trees and the number of coats being purchased online. While both events happen because the weather is colder, there is no direct relationship between trees and coats.
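The leaves-and-coats example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, and the names `temperature`, `leaves_fallen`, and `coats_sold` (along with the numbers used to generate them) are hypothetical, not taken from any real data set. The sketch compares the raw correlation between leaves and coats with a partial correlation that holds temperature fixed, which is one common way to check whether a shared cause explains an apparent relationship.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def partial_correlation(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data (an assumption for illustration only): both series are
# driven by temperature plus independent noise, so neither one causes
# the other.
rng = random.Random(0)  # fixed seed so the run is repeatable
temperature = [rng.uniform(-5.0, 25.0) for _ in range(200)]
leaves_fallen = [100.0 - 3.0 * t + rng.gauss(0.0, 5.0) for t in temperature]
coats_sold = [80.0 - 2.0 * t + rng.gauss(0.0, 5.0) for t in temperature]

raw_r = pearson(leaves_fallen, coats_sold)
partial_r = partial_correlation(leaves_fallen, coats_sold, temperature)

print(f"leaves vs. coats, raw correlation: {raw_r:.2f}")
print(f"leaves vs. coats, controlling for temperature: {partial_r:.2f}")
```

In this synthetic setup the raw correlation comes out strong, while the partial correlation falls close to zero once temperature is accounted for. That pattern is what you would expect when context (here, the weather) drives both measurements.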
For the business to effectively use machine learning to support business strategy, you need these statistical methods to find patterns and anomalies in the data sets. With the best available data, in the right volume and at the right level of cleanliness, it is possible to create a model by using the machine learning algorithm most appropriate to the business problem being addressed. This model is only the beginning of the machine learning workflow.
By leveraging massive amounts of data, it is possible to model the data, train the model on it, and then begin to learn from that data in order to make better decisions. The value of learning from data is that the machine learning system can look at underlying patterns and anomalies that aren’t necessarily obvious. Are there relationships between what customers buy and the time to repair? Does the weather affect sales during a given period of time? Are there signals in social media data that indicate subtle changes in customer perceptions or buying patterns? Being able to model massive amounts of data from different data sources can add insights that no single human could have reached by relying on data available in isolation.
There has been much discussion about correlation of data as an
analytic method. While data correlation is incredibly important, it can sometimes be misleading. There may seem to be a correlation between the consumption of orange juice in June and the rise in