Findings summary
Date: Nov 2020
Authors: Veronika Monohan, Matthew Hicks, Santosh Balasubramanian
Objective
Identify the main pain points in the process of building a cloud-based data analytics solution and reveal opportunities for introducing optimizations and automated actions in the spirit of "intelligent workspaces", which would allow customers to benefit from intelligent detection and actions regarding solution patterns, opportunities, problems, and recommendations.
Methodology
Hybrid. Exploratory interviews and concept testing.
Participants
10 external data professionals who have experience in building
cloud-based data analytics solutions for their work including:
• Designing the solution;
• Configuring and deploying the solution;
• Optimizing the cost and security aspects of the solution;
• Managing the access and resources used by the solution.
© Microsoft Corporation
Azure
Main challenges
Data-related challenges
Based on participants' current projects

• Exploring data and trying to find patterns in new datasets that have not been used before, especially large ones, is among the main challenges because it takes time and involves a high level of ambiguity. For P4 it takes staring at the data from different views, defining something that is certain, and exploring it with a narrow focus to identify initial patterns.
• Detecting changes in the source data coming from the business is tricky and requires P10 to run hourly schema integrity checks as part of their monitoring.
• P8 wants the ability to transform the data while migrating it by swapping some of the old tables with new ones. Currently, he has to take three separate steps: load the data to Snowflake, use SQL to conduct transformations, and then orchestrate it in Ruby. He wishes he could blend the first two steps.
• The main data-related challenge for P7 is the extra time he needs to spend verifying the results of data transformations in Databricks, which take significantly longer than in SQL. (E.g., getting results in SQL takes 3-5 seconds and in Databricks, 25-30 seconds.)
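P10's hourly schema integrity checks are not described in detail; as a rough illustration of the idea, such a check could compare the incoming table's columns and types against an expected definition. The schema and the rules below are illustrative assumptions, not P10's actual implementation:

```python
# Minimal sketch of a schema integrity check, in the spirit of P10's hourly
# monitoring. The expected schema is a hypothetical example.

EXPECTED_SCHEMA = {
    "order_id": "bigint",
    "customer_id": "bigint",
    "amount": "decimal",
    "created_at": "timestamp",
}

def check_schema(actual_schema: dict) -> list:
    """Return a list of human-readable schema drift findings."""
    problems = []
    for col, dtype in EXPECTED_SCHEMA.items():
        if col not in actual_schema:
            problems.append(f"missing column: {col}")
        elif actual_schema[col] != dtype:
            problems.append(f"type changed: {col} {dtype} -> {actual_schema[col]}")
    for col in actual_schema:
        if col not in EXPECTED_SCHEMA:
            problems.append(f"unexpected new column: {col}")
    return problems
```

An empty result means the source schema still matches; any findings can feed the alerting described later in this report.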
Most expensive and time-consuming aspects

When asked what the most expensive aspects of their solution are, participants named compute, infrastructure and services cost, and developers' reimbursement. These were their top-of-mind, unaided responses.

Most expensive aspects (number of mentions out of 10, and roles):
• Compute: 9 (DE, DS, DA, SA)
• Infrastructure and services: 8 (DE, DS, DA, SA)
• Developer time and salary: 5 (DS, DE, SA)
• Storage: 1 (DE)
Time
Specific behaviors and gaps related to monitoring, data, cost & access control

Behavior patterns

Monitoring
The main success criterion for monitoring is knowing about failures and being able to fix them before your stakeholders start complaining.
• 5/10 use at least one dedicated monitoring tool, e.g., Sentry, Airflow, SnowPlow, or CloudWatch. (DS, DA, DE)
• 6/10 have a dashboard as part of their monitoring solution. (DS, DA, DE)
• 5/10 have set up additional alerts, e.g., email or Slack messages, and like to get notified when there is an anomaly so they can take quick action to fix it if needed. (Mostly DE, but also DS and DA)
• 2/10 find it important to be able to manually set thresholds for the monitoring alerts. (DS, SA)

Gaps
• Monitoring is harder for serverless than for provisioned compute, because the jobs run for a longer time.
• When looking at a job, P8 wants to see "the separate executions for the job instead of only seeing a green dot" once the job has run.
Cost control
The main success criterion for cost control is when the incurred cost matches the initial estimate.

Most of the participants said that it's tricky to predict the cost, because it is hard to estimate the exact inputs your solution needs. Therefore, most of them apply a trial-and-error approach.

"If there is an unexpected cost, there isn't much you can do. You can only post-fix it." – P1

Cost control strategies
Setting spending caps is one of the most common cost control strategies, mentioned by 5 participants. While it is not an ideal solution for them, it helps them prevent unexpectedly high bills.
Others try to estimate the cost of the different components of the solution using online calculators provided by the cloud platforms, and then monitor the actual cost on a daily or weekly basis. This strategy was mentioned by participants with more flexible budgets.
Only one participant shared that he uses a tool called Cloudability, which shows the AWS cost broken down by different resource levels and doesn't require setting up spending limits. (P10)
Cost control
6/10 participants were involved in cost control for their project.

Most participants said that their organization is not currently tracking the cost incurred by separate users or processes. Some said that they try to get project-level estimates instead.
Only one reported that his company's DevOps team is trying to track the cost by service and API call. Yet the way to achieve this is cumbersome, because it requires the developers to enter many tags, which is very time-consuming. (P3)

"The biggest annoyance is that I have to enter up to 10 or 15 different tags that are basically the same, and I have to enter them in as many as 10-20 different spots where technically it's the same service, so it could be easier." – P3
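P3's tagging setup is not described beyond the quote; as an illustration of how the repeated tag entry could be centralized, a shared tag set could be defined once and applied to every resource programmatically. The tag keys and resource names below are hypothetical:

```python
# Illustrative sketch: define the common cost-allocation tags once and
# stamp them onto every resource, instead of re-entering the same 10-15
# tags in 10-20 different spots. Tag keys and values are hypothetical.

COMMON_TAGS = {
    "team": "data-platform",
    "project": "analytics",
    "cost-center": "cc-1234",
    "environment": "prod",
}

def tag_resources(resource_ids, extra_tags=None):
    """Return a {resource_id: tags} mapping using the shared tag set."""
    tags = {**COMMON_TAGS, **(extra_tags or {})}
    return {rid: dict(tags) for rid in resource_ids}
```

In practice the resulting mapping would be pushed to the cloud provider's tagging API; the point is that the duplication lives in one place.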
Access
• Manual way of setting up access: 6 participants
Data
The ideal solution
Simplified setup and deployment process

• 4/10 want to remove the setup process and get started with their actual data-related work. The current process has too many steps, such as manually setting up the server endpoints, storage, VMs, and instances, which takes several days. Ideally, participants would like an automated one-click deployment with perfect security around it.

"I'm not getting paid for building the infrastructure." – P7

• In addition, they want to have a template so they can automate the deployment of different environments in a CI/CD process and don't have to do the same setup all over again. They might have anywhere between 3-4 and 20-30 environments. Using a template is also the best way to replicate the solution.
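Participants did not name a specific templating tool; as a sketch of the idea, a single template can be expanded into one deployment config per environment, with per-environment overrides. All field names below are illustrative assumptions:

```python
# Sketch: generate per-environment deployment configs from one template,
# so the same setup is not repeated by hand for each of the 3-30
# environments. The template fields are hypothetical.

TEMPLATE = {
    "vm_size": "medium",
    "storage_gb": 500,
    "autoscale": True,
}

def render_environments(names, overrides=None):
    """Expand the template into one config per environment name."""
    overrides = overrides or {}
    return {
        name: {**TEMPLATE, **overrides.get(name, {}), "env": name}
        for name in names
    }
```

A CI/CD pipeline could call this once per deployment and feed each rendered config to the provisioning step, which is the "don't repeat the setup" behavior participants describe.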
Magic wand
Getting clean, reliable data

4/10 want to skip the process of cleaning and transforming the data into the right format so they can jump straight to the part of the work they enjoy most, which is analyzing it. The most common reasons for transforming the data are:
• There is corrupt data which needs to be removed because it causes processes to fail. (P10)
• The data comes in a wrong/different format and needs to be converted/standardized. (P3, P4, P8)
• Dealing with upstream dependencies. (P3)

"If I can just get to a point where I can start plotting things, analyzing things, making predictions, looking for anomalies, looking for trending data, making forecasts… That part, at least for me, that's the part I enjoy. However, everything leading up to that feels like routine, automatable work that still ends up consuming an inordinate amount of time for me." – P3
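The cleanup participants call "routine, automatable work" could be sketched as follows. The record layout and the two rules (drop corrupt records, standardize a format) are illustrative assumptions, not any participant's actual pipeline:

```python
# Sketch of routine pre-analysis cleanup: drop corrupt records that would
# break downstream processes (per P10) and standardize formats (per P3,
# P4, P8). The fields and rules are hypothetical examples.

def clean_records(records):
    """Drop records with a missing amount and normalize country codes."""
    cleaned = []
    for rec in records:
        if rec.get("amount") is None:  # corrupt: would fail downstream
            continue
        rec = dict(rec)                # avoid mutating the caller's data
        rec["country"] = rec.get("country", "").strip().upper()
        cleaned.append(rec)
    return cleaned
```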
Optimizations
Self-serve reports for business stakeholders

P6 wants the ability to provide his business users with self-serve reports where they can drag, drop, and create their own insights. This would save time because the business users would not have to reach out to the developers to get reports.

"End-to-end self-serve reports" – P6
Decoupling of compute and storage; decoupling of reads and writes

P10 wants decoupling of compute and storage because the cluster they use has hit its storage limit while remaining far from its compute capacity. Having the option to select storage and compute separately would potentially improve the cost-efficiency of their project.

He also wants to decouple reads and writes: when they process new writes, their users' normal read processes get disrupted.
Recommendations
• About cost savings – queries that are rarely used, or how to optimize queries. (3)
"We've noticed no one ever queries this data that's being loaded. Do you need to load it? If not, then you can turn it off and you don't need to pay for it." – P5
• System recommendations about better resource management based on the actual usage in terms of storage and resources. (P1, P2)
• Recommendations about the most appropriate storage or folder structure based on the data and the features they have, e.g., buckets or blob-type storage. (P1)
• Tips about problematic areas in the solution, e.g., code which can cause it to break. (P2)
• Recommendations about security concerns or how to improve security. (P2, P7)
• Data cleansing and prep. (P3)
• How to improve query performance, e.g., in terms of structuring the queries. (P5, P6, P7)
• Reports which can be reused. (P6)
• Exploratory data analysis takes a lot of time for new data – scatterplots and stats, how many values are 0; it would be nice to get a solid view of basic stats so he can know very quickly when a data field is empty too often, plus edge cases and outliers.
• Reuse the existing results.
• Automatic email alerts with built-in analytics about system performance in case of high throughput.
• Which tables are not used, so he can remove or offload them. This way he can save time and cost and improve performance. (P10)
• Switching from one service to another, which would save cost.
• How to optimize code that might cause breakdowns.
Automatic actions
Intelligent features
Features budget
Automatic alerting
Configure alerts for a range of automatically detected conditions, such as changes, trends, or anomalies in your data, code, resources, or user activity.

9/10 participants would like to have automatic alerting as part of their solution.
Participants found that automatic alerting would bring them the following benefits:
• Spot anomalies in data or query performance.
• Spot changes in the incoming data.
• Save troubleshooting time.
• Increase availability and improve their customers' satisfaction.

"I'm ranking these (features) based on how I feel they would contribute to the relationship we have with our customers. So, this would help us build our trust most." – P8

This feature would resolve some of the main pain points for P8 and P10.

"This would be fantastic, because it covers what I mentioned before - a recurring issue that happens several times a week and takes several hours to fix, caused by significant changes in the data." – P8

"My bread and butter." – P10
His team built a whole framework for alerting that covers changes, trends, and anomalies. Their custom solution doesn't cover code, resources, or user activity, which would also be very useful.
Automatic alerting (continued)

The main concern about this feature is that it might alert them too much. Ideally, they would like to be able to set up thresholds themselves. (3/9)

"Great, but careful, it might give lots of false positives; nice to have a framework for this, but let the devs select the thresholds." – P3
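The threshold control P3 asks for could take many forms; one minimal sketch is an anomaly rule where the system measures deviation from recent history but the developer picks the cutoff. The z-score rule and default threshold below are illustrative assumptions:

```python
# Sketch of dev-selectable alert thresholds (per P3's concern about false
# positives): the system computes how unusual a value is, the developer
# decides how unusual is worth an alert. The rule shown is a simple
# z-score check, an illustrative choice.

from statistics import mean, stdev

def should_alert(history, value, threshold=3.0):
    """Alert when `value` is more than `threshold` std devs from the mean."""
    if len(history) < 2:
        return False  # not enough history to judge normality
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold
```

Raising `threshold` trades missed anomalies for fewer false positives, which is exactly the knob P3 wants in the developers' hands.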
Result caching
Appealing to: Data engineers, solution architects, SQL & Spark users
Intelligent result caching
If you have recurring pipelines, jobs, or queries in your analytics solution, you can save time and money by having the system automatically reuse the results of any redundant computations that are run in each recurrence. The system caches partial computations for reuse in future iterations of recurring jobs/pipelines.

4/9 said they think this type of feature already exists in some other services/providers.

"Fantastic, but most backends already have this; absolutely essential feature." – P3

"It's currently available in Redshift; not sure it works the same way, but it caches the computation for recurring queries; the first time running a query always takes much longer than later." – P10
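The concept-tested feature is described only at a high level; a stripped-down sketch of result caching for recurring queries might key stored results on the query text. Real systems (like the Redshift behavior P10 mentions) also invalidate the cache when the underlying data changes, which this illustration omits:

```python
# Sketch of result reuse for recurring queries: execute once, serve the
# stored result on recurrence. Invalidation on data change is omitted.

import hashlib

class QueryCache:
    def __init__(self, run_query):
        self._run = run_query  # function that actually executes the SQL
        self._cache = {}
        self.hits = 0

    def execute(self, sql):
        key = hashlib.sha256(sql.encode()).hexdigest()
        if key in self._cache:
            self.hits += 1      # recurrence: reuse the cached result
        else:
            self._cache[key] = self._run(sql)
        return self._cache[key]
```

The saving participants describe comes from `self._run` (the expensive computation) being invoked only on the first occurrence of each distinct query.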
Data profiling
Most appealing to: Data scientists, data analysts, solution architects & Spark users
Get insights into the data you are exploring, ingesting, or using in your analytics solution, such as data skew, size, format, and schema.

According to participants, the feature would increase their efficiency and reduce their cost. Some said that knowing more about their data skew, size, etc., would help them set up their schemas and tune their queries proactively.

P3 specified that this is his "number one" feature of all, because he trusts that it can be easily automated, and currently he wastes lots of time exploring his data.

P4 described this feature as one of his dream solutions: "This is what I was talking about - basic exploratory stats, I want that."

To make this feature even better, participants also want to see info about data volume and what data is missing (rows, columns).
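The "basic exploratory stats" P4 describes could be sketched as a per-column profile over a tabular dataset. The three statistics below (missing counts, zero counts, distinct counts) are an illustrative subset of what participants asked for, not the tested feature's actual output:

```python
# Sketch of a basic data profile over rows represented as dicts: per
# column, count missing values, zeros, and distinct values. A real
# profiler would add skew, size, format, and schema information.

def profile(rows):
    """Return {column: {"missing": n, "zeros": n, "distinct": n}}."""
    columns = {c for row in rows for c in row}
    stats = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        stats[col] = {
            "missing": len(values) - len(present),
            "zeros": sum(1 for v in present if v == 0),
            "distinct": len(set(present)),
        }
    return stats
```

A profile like this is enough to spot P3's time sinks quickly: a field that is "empty too often" shows up as a high `missing` count before any manual exploration.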
Central monitoring
Central solution monitoring and exploration
A central UI/experience to monitor all elements of the analytics solution, allowing you to observe and explore the solution and its status from high to low altitudes; navigate between compute resources, activity history, and code artifacts based on how they are related and used in the solution.

4/10 did not think their work would benefit from central monitoring. According to P10, alerts are more helpful for day-to-day management, but higher-level managers might find this type of monitoring helpful.

"Not sure why we need a central solution - each individual team can monitor their solution. Why do I need to have a central place in my project? I can check the analytics of my resources; what's the point of having a central place? I'm not concerned how another team is using resources." – P6

"Not too bad, but I don't want it." – P9

"Good BUT more relevant for higher management level than day-to-day management level. Day to day - alert when something fails or an issue happens." – P10
Predictions, recommendations, and optimizations
Receive contextual guidance, predictions, and recommendations to help you build your solution, predict computational failures, or optimize data based on certain conditions/indicators.

7/10 participants found it useful but were skeptical that it could actually work.
They could trust it for components that the system can automatically keep track of, such as computational failures, disk size, or memory problems, as well as for predictions about the data size that will accumulate over time and about query performance.

"Really good, because I have to do this manually; this would be very helpful and save time." – P7

"If it's in terms of queries - it would be helpful; if it is on a table level - not sure if it would be useful - not sure how the guidance would work here." – P6
Predictions, recommendations, and optimizations (continued)

However, 4/10 participants were "skeptical" and had mixed feelings about the solution architecture recommendations, because the usage of data is very different across companies, and they were not sure how such recommendations would work.

"Very skeptical of recommendations on building solutions, but predicting failures and … could be really helpful." – P4

"This would cost us more; I wouldn't like to increase the cost because of it." – P9

"Sounds like science fiction, but I would love to have something like this! It would speed up every project. In the beginning we usually have freeze time when trying to identify the potential architecture of the solution based on the problem at hand. This can not only save time but uncover something I didn't think about. Mostly useful for a new project." – P10

One participant said this feature is similar to a feature which currently exists in BigQuery.
Thanks!
© Microsoft Corporation. All rights reserved.