
TALK TO YOUR DATA WITH AMAZON QUICKSIGHT Q:

NLQ ON OLYMPIC GAME

Speakers
Ying Wang, Senior Data Architect, AWS, Global Specialty Practice
Nov 27, 2021

- Amazon Confidential -
AGENDA

• Demo: throw the question to Q!
• How to look for public dataset?
• How to clean the data?
    • UI tool
    • Python
    • SQL
• How to link the datasets with questions?
• How to build the NLQ topic?
• Q model introduction
• Q&A

Q DEMO
HOW TO LOOK FOR PUBLIC DATASET?
https://www.kaggle.com/datasets
https://data.world/datasets/olympics
https://www.reddit.com/r/datasets/comments/oqrg5u/dataset_of_tokyo_2020_2021_olympics/
https://github.com/Vinay-gupta9/2021-Tokyo-Olympics-Medals

Type your request and then get the appropriate datasets!


HOW TO CLEAN THE DATA?

https://aws.amazon.com/glue/features/databrew/
https://pandas.pydata.org/
https://numpy.org/
https://www.w3schools.com/sql/
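
As a rough illustration of the Python route, here is a minimal cleaning sketch with pandas and NumPy. The sample rows, the column names, and the choice to treat an unparseable medal count as 0 are all assumptions made for this example; they do not come from any of the datasets linked above.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical raw extract: team names and counts are invented for illustration.
RAW_CSV = """Rank, Team/NOC ,Gold,Silver,Bronze
1,Team A,39,41,33
2,Team B,38,32,18
2,Team B,38,32,18
3, Team C ,27,14,-
"""


def clean_medals(raw_csv: str) -> pd.DataFrame:
    df = pd.read_csv(io.StringIO(raw_csv))
    # Normalize headers: trim whitespace, lowercase, replace "/" with "_".
    df.columns = (
        df.columns.str.strip().str.lower().str.replace("/", "_", regex=False)
    )
    # Trim stray whitespace around team names.
    df["team_noc"] = df["team_noc"].str.strip()
    # Coerce medal counts to numbers; unparseable entries become NaN.
    medal_cols = ["gold", "silver", "bronze"]
    df[medal_cols] = df[medal_cols].apply(pd.to_numeric, errors="coerce")
    # Assumption for this sketch: a missing count means zero medals.
    df[medal_cols] = df[medal_cols].fillna(0).astype(np.int64)
    # Drop exact duplicate rows, then add a derived total column.
    df = df.drop_duplicates().reset_index(drop=True)
    df["total"] = df[medal_cols].sum(axis=1)
    return df


cleaned = clean_medals(RAW_CSV)
print(cleaned)
```

The same steps (rename columns, trim text, fix types, deduplicate) can also be done in a UI tool such as AWS Glue DataBrew or in SQL; pandas is used here only because it keeps the example short.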
HOW TO LINK YOUR DATASETS WITH QUESTIONS?

Tips and tricks:


Understand the business domain knowledge
Understand the raw data
Understand your user source

Q DEMO AGAIN
WHAT IS AMAZON QUICKSIGHT Q?

Ask natural language questions about your data and get answers in seconds

Type your questions and get instant answers
CHALLENGES

Hi Dan. I’m trying to compare the sales trend in California vs New York this year, but I couldn’t find it in the dashboard. Can you help pull this number for me real quick?

Hey Amy, how’s it going? Can you cut a ticket please and we’ll add it to our backlog.

I kind of need it urgently for a quarterly planning meeting tomorrow. When do you think you can get to it?

I’m in the middle of a sprint to onboard another team. I also have a long backlog of ad-hoc requests. Probably towards the end of next week.
CHALLENGES

“How can we help our business users get to the answer faster?”
Takes days or weeks

“How do we enable our business users to self-serve so that our team is not drowned in ad-hoc requests?”
Thinly staffed BI teams
KEY BENEFITS

• Natural Language Questions: Ask questions in plain business language. No need to learn any syntax.
• Get Instant Answers: Get answers instantly to ad-hoc questions not found in your dashboards.
• Ask about all of your data: No need to specify a particular dataset or dashboard to ask a question.
• Get started in minutes: Get started with your existing datasets with just a few clicks.
HARD PROBLEMS WE ARE TACKLING

Query Understanding

“Show me weekly revenue week over week for California compared to New York in 2020”

There are countless ways to ask this same question!

Data Understanding

Daily_rpt        wk_rpt_w_goals     Daily_rpt_customer
snapshot_date    snapshot_date_wk   snapshot_date
state            rev_gross          state
rev_gross        rev_net            rev_gross
rev_net          goal_net           rev_net
product_id       product_id         account_id
product_name     product_name       account_name
rpt_date         rpt_date           rpt_date

Data is cryptic, dirty and overlapping. How do we identify the right data to query?
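
To make the "overlapping data" problem concrete, the toy sketch below ranks the three tables from this slide by how many of the fields needed for a question they contain. It only illustrates why the choice is ambiguous; it is not a description of how Q selects data. The `rank_tables` helper and the set of needed fields are assumptions made for this example.

```python
# Column lists taken from the three tables shown on the slide.
SCHEMAS = {
    "Daily_rpt": {
        "snapshot_date", "state", "rev_gross", "rev_net",
        "product_id", "product_name", "rpt_date",
    },
    "wk_rpt_w_goals": {
        "snapshot_date_wk", "rev_gross", "rev_net", "goal_net",
        "product_id", "product_name", "rpt_date",
    },
    "Daily_rpt_customer": {
        "snapshot_date", "state", "rev_gross", "rev_net",
        "account_id", "account_name", "rpt_date",
    },
}


def rank_tables(needed_fields, schemas):
    """Return (table, covered_count) pairs, best coverage first (hypothetical helper)."""
    scored = [(name, len(needed_fields & cols)) for name, cols in schemas.items()]
    # Sort by coverage descending, then by name so ties are listed predictably.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Assumed fields for "weekly revenue ... for California compared to New York".
needed = {"snapshot_date", "state", "rev_gross"}
ranking = rank_tables(needed, SCHEMAS)
print(ranking)
```

Two tables cover every needed field equally well, so column names alone cannot settle which one to query. That tie is the ambiguity the slide describes.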
NAMED ENTITY RECOGNITION (NER)

• NER identifies the entities in the question.
• Entities represent the user’s intent and are used by downstream ML models to link to the data.
• NER needs to chunk entities correctly (e.g., Amazon Web Services, 3 words, represents a single entity).
• NER is schema aware, meaning it uses the topic’s dataset schema to inform what should be identified as an entity.
• NER is cell-value aware, meaning that it uses the topic’s dataset cell values to inform what should be identified as an entity.

Example: Show me the weekly sales for the last 2 years for Amazon Web Services
    weekly: Date aggregation
    sales: Metric
    last 2 years: Date filter
    Amazon Web Services: Cell Value Filter
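
The chunking and the schema and cell-value awareness described above can be sketched with a toy dictionary-based tagger. This is not the Q model, which is ML-based. The lookup tables, the regular expression, and the function names below are assumptions made for this example.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical topic metadata; a real topic would supply its own schema and values.
METRIC_FIELDS = {"sales", "revenue"}                 # from the dataset schema
CELL_VALUES = {"amazon", "amazon web services"}      # from dataset cell values
DATE_AGGREGATIONS = {"daily", "weekly", "monthly", "yearly"}
DATE_FILTER = re.compile(r"last \d+ (?:day|week|month|year)s?")
MAX_SPAN = 3  # longest chunk, in words, that this toy tagger considers


def classify(span: str) -> Optional[str]:
    """Return an entity label for a candidate chunk, or None."""
    if DATE_FILTER.fullmatch(span):
        return "Date filter"
    if span in DATE_AGGREGATIONS:
        return "Date aggregation"
    if span in METRIC_FIELDS:
        return "Metric"
    if span in CELL_VALUES:
        return "Cell Value Filter"
    return None


def recognize_entities(question: str) -> List[Tuple[str, str]]:
    words = re.findall(r"[a-z0-9]+", question.lower())
    entities = []
    i = 0
    while i < len(words):
        step = 1
        # Try the longest chunk first so multi-word values stay in one piece.
        for n in range(min(MAX_SPAN, len(words) - i), 0, -1):
            span = " ".join(words[i:i + n])
            label = classify(span)
            if label is not None:
                entities.append((span, label))
                step = n
                break
        i += step
    return entities


entities = recognize_entities(
    "Show me the weekly sales for the last 2 years for Amazon Web Services"
)
print(entities)
```

Because the longest chunk is tried first, "amazon web services" is tagged as one entity even though "amazon" alone is also a known cell value.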
NAMED ENTITY LINKING (NEL)

• NEL links the entities in the question to fields or values (deduped across datasets)
• Linking is performed based on the top 10 results returned by the index
• Entities can be linked to a field or a cell value

Example: Show me the weekly sales for the last 2 years for Amazon Web Services
    weekly: Order Date
    last 2 years: Order Date
    sales: Revenue
    Amazon Web Services: Customer = Amazon Web Services
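
A toy version of this linking step is sketched below: retrieve the top 10 candidates from a small index, then link the mention to the best one if it scores high enough. The index contents, the string-similarity scoring via `difflib`, and the 0.8 threshold are all assumptions made for this example. The actual index and ranking used by Q are not described in these slides, and linking date entities to Order Date is not modelled here.

```python
from difflib import SequenceMatcher

# Hypothetical index entries: (searchable text, kind, link target).
INDEX = [
    ("revenue", "field", "Revenue"),
    ("sales", "field", "Revenue"),  # assumed synonym for the Revenue field
    ("order date", "field", "Order Date"),
    ("customer", "field", "Customer"),
    ("amazon web services", "value", "Customer = Amazon Web Services"),
    ("amazon music", "value", "Customer = Amazon Music"),
]
TOP_K = 10  # the slide states that linking uses the top 10 index results


def search(mention, k=TOP_K):
    """Return up to k index entries, most similar to the mention first."""
    scored = [
        (SequenceMatcher(None, mention.lower(), text).ratio(), kind, target)
        for text, kind, target in INDEX
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


def link(mention, min_score=0.8):
    """Link a mention to a field or cell value, or return None if nothing is close."""
    candidates = search(mention)
    if candidates and candidates[0][0] >= min_score:
        return candidates[0][2]
    return None


print(link("sales"))
print(link("Amazon Web Services"))
print(link("weather"))
```

Keeping retrieval (`search`) separate from the final decision (`link`) mirrors the two-stage description on the slide: the index proposes candidates, and linking chooses among them.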
Q TOPIC CREATION BEST PRACTICES
5 STEPS FOR A GREAT Q TOPIC

1. Name your new topic and add a data set

2. Review and edit field settings

3. Start asking questions

4. Share your Topic

5. Continuously review questions and feedback
Q&A
THANK YOU

BTW:
DATA IS BEAUTIFUL
DATA SCIENCE IS FUN
DATA SCIENTIST IS COOL
