


Homework 4 (Introduce a Big Data / IoT / Fintech Application)

Big Data Analytics
Instructor: Ting-Ying Chien

Created by:



1. Introduction
Why do Machine Learning on Big Data?

Nowadays, traditional analytics tools are not well matched to capturing the
full value of big data. The volume of data is too large for comprehensive
analysis, and the range of potential correlations and relationships between disparate
data sources is too great for any analyst to test all hypotheses and derive all the value
buried in the data.

Basic analytical methods used in business intelligence and enterprise reporting
tools reduce to reporting sums, counts, and simple averages and to running SQL queries.
Online analytical processing is merely a systematized extension of these basic analytics
that still relies on a human to direct activities and specify what should be calculated.

Machine learning is ideal for exploiting the opportunities hidden in big data. It
delivers on the promise of extracting value from big and disparate data sources with far
less reliance on human direction. It is data driven and runs at machine scale. It is well
suited to the complexity of dealing with disparate data sources and the huge variety of
variables and amounts of data involved. And unlike traditional analysis, machine
learning thrives on growing datasets. The more data fed into a machine learning system,
the more it can learn and apply the results to higher quality insights.

Freed from the limitations of human-scale thinking and analysis, machine
learning can discover and display the patterns buried in the data.

What is Skytree?

Skytree is a California-based software company that develops and publishes a
machine learning platform for advanced analytics. The company offers a platform that
gives organizations the power to discover analytic insights, predict future trends, make
recommendations, and reveal untapped markets and customers.

Skytree describes itself as the leader in enterprise-grade machine learning. It installs
on an organization's existing infrastructure, uses the entire data set, and relies on
high-performance algorithms and automated model building to deliver more accurate
predictive models in less time.

Why Skytree?

Every day, data scientists and analysts play a crucial role in driving optimal
executive-level decisions. Yet even the best-prepared data-driven organizations are
challenged by the growing volume, velocity, and variety of data. As the expectations of
analytics teams grow, the resources of infrastructure, time, and PhD-level staff become
more limited. Skytree addresses these limitations by making more accurate predictive
machine learning models easier and faster to produce and use.

Skytree has several key capabilities:

1. Highly Scalable Algorithms

Skytree claims to speed up machine learning methods by up to 150x compared to open
source options. Employing deeply optimized algorithms, Skytree performs
analytics in-memory and utilizes the latest high-performance computing techniques.
By taking fewer mathematical steps to achieve the same result, Skytree is, by the
company's account, the fastest machine learning software on the market.

2. Artificial Intelligence for Data Scientists

Even citizen data scientists can build accurate machine learning models with
Skytree's groundbreaking AutoModel technology. Using patent-pending global
optimization analytics to automate algorithm and parameter selection, Skytree saves
weeks or months of effort. Instead of manually running hundreds of experiments to
determine the best algorithm and parameters, Skytree does it with one click.
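
The idea of automating parameter selection can be illustrated with a minimal sketch.
This is not Skytree's AutoModel, whose internals are not described here. It is a plain
exhaustive search over candidate values of k for a k-nearest-neighbors classifier, scored
on a held-out validation set. The toy data and all names (knn_predict, select_best_k)
are assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # `train` is a list of (features, label) pairs; distance is squared Euclidean.
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, valid, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def select_best_k(train, valid, candidates):
    """Try every candidate k and return (best_k, best_accuracy)."""
    scored = [(accuracy(train, valid, k), k) for k in candidates]
    # Highest accuracy wins; ties are broken in favor of the smaller k.
    best_acc, best_k = max(scored, key=lambda s: (s[0], -s[1]))
    return best_k, best_acc


# Made-up data: two well-separated clusters labeled 0 and 1.
train = [((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0),
         ((1.0, 1.1), 1), ((0.9, 1.0), 1), ((1.2, 0.8), 1)]
valid = [((0.1, 0.1), 0), ((1.1, 1.0), 1), ((0.3, 0.2), 0), ((0.8, 0.9), 1)]

best_k, best_acc = select_best_k(train, valid, candidates=[1, 3, 5])
print(best_k, best_acc)
```

A real automated system would also search across algorithms and use a smarter search
strategy than trying every candidate, but the loop above shows the basic pattern of
replacing manual experiments with a scored search.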

3. Self-Documenting Models

Skytree allows data scientists to visualize and understand the logic behind ML
decisions. Skytree provides visual documentation, which logs every data set used,
data split done, transformation applied, algorithm run, and results obtained for each
model built with Skytree.
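
As a rough illustration of what such a log could contain, the sketch below records the
items listed above (data set, split, transformations, algorithm, results) in one
structure and serializes it. The ModelRecord class, its field names, and the sample
values are hypothetical and do not reflect Skytree's actual format.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ModelRecord:
    """Hypothetical audit record for one model build."""
    dataset: str
    split: str
    transformations: list = field(default_factory=list)
    algorithm: str = ""
    results: dict = field(default_factory=dict)


# Illustrative values only; none of these come from a real run.
record = ModelRecord(dataset="customers.csv", split="80/20 train/test")
record.transformations.append("standardize numeric columns")
record.algorithm = "gradient boosting"
record.results["test_accuracy"] = None  # filled in once the model is evaluated

# Serializing the record gives a log entry that can be stored with the model.
log_entry = json.dumps(asdict(record), indent=2)
print(log_entry)
```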

4. Model Interpretability
Results are easy to explain and justify to peers, management, and regulators with
Skytree's interpretation tools. By gaining visibility into the logic
behind the machine learning results, including variable importance, data scientists
can better understand and reproduce the decisions of the automated modeling.
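
One common way to estimate variable importance is permutation importance: shuffle one
input column and measure how much accuracy drops. The sketch below assumes this
technique for illustration; the source does not say how Skytree computes importance.
The stand-in model and the data are made up.

```python
import random


def model(row):
    """Stand-in for a trained model: predicts 1 when the first feature exceeds 0.5."""
    return 1 if row[0] > 0.5 else 0


def accuracy(rows, labels):
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(rows, labels, column, seed=0):
    """Accuracy drop after shuffling one column; a larger drop means more important."""
    baseline = accuracy(rows, labels)
    shuffled_values = [r[column] for r in rows]
    random.Random(seed).shuffle(shuffled_values)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted = [
        r[:column] + (v,) + r[column + 1:]
        for r, v in zip(rows, shuffled_values)
    ]
    return baseline - accuracy(permuted, labels)


# Made-up data: the label depends only on the first feature.
rows = [(0.1, 5.0), (0.9, 3.0), (0.2, 4.0), (0.8, 1.0), (0.3, 2.0), (0.7, 6.0)]
labels = [0, 1, 0, 1, 0, 1]

importances = {c: permutation_importance(rows, labels, c) for c in (0, 1)}
print(importances)
```

Because the stand-in model ignores the second feature, shuffling it cannot change any
prediction, so its importance is exactly zero.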

5. End-to-End Platform
Skytree isn't just another toolset or set of machine learning libraries; it's an
end-to-end enterprise platform for machine learning on big data. The software is
designed to solve demanding predictive problems with data preparation capabilities,
advanced machine learning algorithms, and options to build and deploy models in a
variety of formats.

6. Programmatic and GUI Access
Administrators, senior data scientists, and citizen data scientists alike can access
Skytree via the easy-to-adopt GUI or programmatically in Java, Python, or the
Skytree Command Line Interface. Because models are self-documenting, there's a
full audit trail that logs every dataset used, data split done, transformation
performed, algorithm run, and result obtained.

2. Applications

Skytree offers its machine-learning-enhanced data science solution on premises
or as a cloud deployment. Its software, Skytree Infinity, helps organizations mine big
data, reportedly tens of thousands of times faster than conventional methods. One
prominent example of Skytree in industry is the telecommunications sector, where data
collected from a variety of sources and in a variety of shapes, such as call
records, customer service logs, and geospatial and weather data, poses a significant
challenge for service providers trying to extract valuable insights.

Table 1 lists a set of measures used to evaluate the user value of Skytree's
machine learning solution. Considering the complexity of understanding
and applying machine learning algorithms in a real industrial application, Skytree
Infinity provides a relatively easy-to-use, scalable environment for uncovering important
insights from big data, which is reflected in a positive user experience.

Indicator          Rating
Fast Learning      Medium
User Interface     Positive
User Experience    Positive
Process impact     Low
User Feedback      Positive
Wow effect         Medium

Table 1. User value indicators for Skytree

3. Technologies and Product
Skytree's machine learning methods include random decision
forests, kernel density estimation, K-means, singular value decomposition, gradient
boosting, decision trees, 2-point correlation, range searching, the K-nearest neighbors
algorithm, linear regression, support vector machines, and logistic regression.
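
To make one of the listed methods concrete, the following is a minimal sketch of K-means
(Lloyd's algorithm) on one-dimensional data. It is a textbook version written for
illustration, not Skytree's optimized implementation; the function name, the initial
centers, and the sample values are assumptions.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on scalar data with caller-supplied initial centers."""
    centers = list(centers)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster.
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


# Made-up measurements that form two obvious groups, near 1 and near 5.
values = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
final_centers = kmeans_1d(values, centers=[0.0, 6.0])
print(final_centers)
```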
Skytree Server software runs on Linux on a single server or a multi-node
cluster, and is intended for use by modelers for development of machine learning
models, and for production deployments (in real time or batch usage). It is designed to
connect with existing IT infrastructure. It can be configured to accept data streams and
compute results from multiple sources. The resulting analytics are returned through the
same channels.
Standard data sources include both structured and unstructured data from:
a. Relational Databases (RDBMS)
b. Hadoop Systems (HDFS)
c. Flat File Databases (e.g. CSV)
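
As a generic illustration of combining such sources before modeling, the sketch below
reads rows from CSV text and from a relational table and joins them on a shared key. It
uses only the Python standard library, with an in-memory SQLite database standing in for
an RDBMS; HDFS is left out because it needs an external client. The table, the column
names, and the values are hypothetical, and this is not Skytree's connector API.

```python
import csv
import io
import sqlite3

# Flat file: parse CSV text (an in-memory stand-in for a file on disk).
csv_text = "customer_id,calls\n1,12\n2,7\n"
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Relational database: an in-memory SQLite table stands in for an RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (customer_id INTEGER, open_tickets INTEGER)")
conn.executemany("INSERT INTO tickets VALUES (?, ?)", [(1, 3), (2, 0)])
tickets = dict(conn.execute("SELECT customer_id, open_tickets FROM tickets"))
conn.close()

# Join the two sources on customer_id into one feature table.
features = [
    {
        "customer_id": int(r["customer_id"]),
        "calls": int(r["calls"]),
        "open_tickets": tickets.get(int(r["customer_id"]), 0),
    }
    for r in csv_rows
]
print(features)
```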

What is Skytree for?

1. Skytree for Financial Services

Financial services firms are no strangers to massive data sets and robust analytics.
They have a well-established history of harnessing sophisticated computational
capabilities for efforts such as trading and risk management, so to stay competitive
it is critically important that their analytics deliver faster results with higher accuracy.

2. Skytree for Government

(State-of-the-Art Machine Learning Software for Mission-Critical Results)
Government agencies are tasked with providing citizens with
more efficient, effective, and transparent services on strict and often decreasing
budgets. Government agencies can use machine learning to increase operational
efficiency by analyzing datasets, finding patterns and anomalies, and making
predictions about future events.
Skytree's state-of-the-art machine learning software can analyze both structured
and unstructured data sets in real time to produce fast, accurate, and scalable results
that the company says are up to 10,000 times faster than previous approaches. Skytree
can help government agencies:
a. Detect and prevent fraudulent transactions, accounts, and vendors
b. Identify anomalies or signatures to address proliferation, terrorism, money
laundering, counterfeit devices, threats, and other criminal activity
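
A simple way to see what identifying anomalies means in practice is a robust outlier
test on transaction amounts. The sketch below flags values by their modified z-score,
which is based on the median and the median absolute deviation, using the commonly
recommended cutoff of 3.5. It is a basic statistical baseline chosen for illustration,
not the method Skytree uses, and the amounts are made up.

```python
import statistics


def flag_anomalies(amounts, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = statistics.median(abs(a - median) for a in amounts)
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return [a for a in amounts if 0.6745 * abs(a - median) / mad > threshold]


# Made-up transaction amounts with one obviously unusual value.
amounts = [100, 102, 98, 101, 99, 103, 97, 5000]
flagged = flag_anomalies(amounts)
print(flagged)
```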

[Figure: Examples of high-value analytics use cases]

3. Skytree for Asset Intensive Industries

Machine learning can increase the value and operational efficiency of complex
equipment, proactively detect potential failures, and increase the operational readiness
of complex systems. Skytree Machine Learning delivers advanced predictive and
prescriptive maintenance analytics that:
a. Detect potential failure of equipment through analysis of sensors and operating
b. Analyze underlying causes of parts failure
c. Identify defects in the manufacturing process
d. Develop efficient preventive maintenance schedules

[Figure: Examples of high-value analytics use cases]

4. Skytree for Retail/eCommerce

Skytree provides a powerful, proven, repeatable solution to help retailers find
new customers, upsell existing customers, and prevent customer churn. Many retailers
struggle to address churn, understand the causes of drop-off, and identify the indicators
that signal it.

[Figure: Examples of high-value analytics use cases]

4. Use Cases
Companies large and small are using Skytree to improve outcomes at the
financial core of their industry: risk estimation in insurance, credit and risk scoring
in banking, and diagnoses in healthcare, for example. Across all industries, decision
makers face the kinds of problems that machine learning excels at solving, such as
churn, fraud, and prescriptive maintenance. Four main use cases stand out:

a. Fraud Detection
b. Recommendation
c. Failure Prediction & Prescriptive maintenance
d. Churn Mitigation & Purchase Prediction
For instance, the business challenge, Skytree solution, and business benefit for
failure prediction and prescriptive maintenance at a battery manufacturer are as follows.
Business challenge
a. A power systems company ships smart batteries with built-in diagnostic tools.
b. Batteries are replaced when they fail, resulting in interruption and downtime.
Skytree solution
a. Diagnostic measurements such as impedance, voltage, and temperature were used
to develop a prediction system for impending failure.
Business benefit
a. Unexpected, costly downtime is avoided by servicing batteries before failure.
b. Customer satisfaction improves because failure cases are prevented.
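
The battery example can be sketched as a small classifier that maps diagnostic readings
to a failure probability. The code below fits a logistic regression by batch gradient
descent. The choice of model, the readings (scaled to the range 0 to 1), the labels, and
the function names are assumptions for illustration; the source does not state which
algorithm or data the actual system used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=1.0, epochs=5000):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this reading drives the gradient.
            error = sigmoid(sum(w * v for w, v in zip(weights, x)) + bias) - y
            grad_w = [g + error * v for g, v in zip(grad_w, x)]
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def failure_probability(weights, bias, reading):
    """Estimated probability that a battery with this reading is about to fail."""
    return sigmoid(sum(w * v for w, v in zip(weights, reading)) + bias)


# Made-up readings scaled to [0, 1]: (impedance, voltage, temperature).
rows = [
    (0.20, 0.90, 0.30), (0.30, 0.80, 0.40), (0.10, 0.95, 0.20), (0.25, 0.85, 0.35),
    (0.80, 0.50, 0.70), (0.90, 0.40, 0.80), (0.70, 0.60, 0.90), (0.85, 0.45, 0.75),
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # 1 = failed soon after the reading (hypothetical)

weights, bias = train_logistic(rows, labels)
print(failure_probability(weights, bias, (0.8, 0.5, 0.8)))
```

A battery whose predicted probability crosses a chosen threshold would be scheduled for
service before it fails, which is the benefit described above.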

5. Conclusion
Skytree is a California-based software company that develops and publishes a
machine learning platform for advanced analytics. Skytree enables data scientists to
build more accurate models faster. It positions itself as the leading enterprise-grade
machine learning platform for big data, using the full dataset and automated modeling
based on artificial intelligence. Skytree also satisfies the specific requirements of
discerning data scientists and IT organizations.

