Star schema vs. snowflake schema:
1. In a star schema, the fact tables and the dimension tables are contained. In a snowflake schema, the fact tables, the dimension tables, as well as the sub-dimension tables are contained.
2. A star schema takes less time for the execution of queries, while a snowflake schema takes more time than a star schema for the execution of queries.
3. A star schema has high data redundancy, while a snowflake schema has low data redundancy.
3. What are the responsibilities of a Data Analyst?
->Data analysts work with data to help their organizations make
better business decisions. Using techniques from a range of
disciplines, including computer programming, mathematics, and
statistics, data analysts draw conclusions from data to describe,
predict, and improve business performance. They form the core of
any analytics team and tend to be generalists versed in the methods
of mathematical and statistical analysis.
Since one of the main goals of data cleansing is to make sure that the dataset is free of
unwanted observations, this is classified as the first step of data cleaning. Unwanted
observations in a dataset are of two types: duplicates and irrelevant observations.
Duplicate Observations
An observation is said to be a duplicate if it is repeated in a dataset, i.e. it has more
than one occurrence. This usually arises when the dataset is created by combining
data from two or more sources.
It can also occur in other cases, such as when a respondent makes more than one
submission to a survey, or through an error during data entry.
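Duplicate removal can be sketched in plain Python. The survey rows below are made up for illustration; in practice a library call such as pandas' `drop_duplicates` does the same job:

```python
# Made-up survey rows: record 1 appears twice (e.g. a respondent submitted
# the form twice, or the same record arrived from two merged sources).
rows = [
    {"id": 1, "answer": "yes"},
    {"id": 2, "answer": "no"},
    {"id": 1, "answer": "yes"},  # duplicate observation
]

seen = set()
deduped = []
for row in rows:
    key = tuple(sorted(row.items()))  # hashable fingerprint of the whole record
    if key not in seen:               # keep only the first occurrence
        seen.add(key)
        deduped.append(row)

print(len(deduped))  # 2
```

Keeping the first occurrence preserves the original row order, which matters when the dataset is time-ordered.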
Irrelevant Observations
Irrelevant observations are those that don’t actually fit the specific problem that you’re
trying to solve, such as having price data when you are only dealing with quantity.
For example, if you were building a model of apartment prices in an estate, you
don’t need data showing the number of occupants of each apartment. Irrelevant
observations mostly occur when data is generated by scraping from another data
source.
After removing unwanted observations, the next thing to do is to make sure that the
remaining observations are well-structured. Structural errors may occur during data
transfer due to human error at the data entry stage. Some of the things to look out
for when fixing data structure include typographical errors, grammatical blunders,
and so on. Structural cleaning is mostly concerned with categorical data.
Here, we correct misspelled words and summarize category headings that are too long.
This is very important because long category headings may not be fully shown on the
graph.
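This kind of category cleanup can be sketched with a small corrections table. The labels and the mapping below are hypothetical examples, not from any real dataset:

```python
# Hypothetical category labels with inconsistent casing, stray whitespace,
# a typo, and one heading too long to display fully on a chart.
values = [
    "Electronics",
    "electronics ",                           # casing + trailing whitespace
    "Electroncis",                            # typo
    "Home & Kitchen Appliances Department",   # too long for a chart legend
]

# Corrections table: known typo and an over-long heading, mapped to
# canonical, chart-friendly labels.
canonical = {
    "electroncis": "Electronics",
    "home & kitchen appliances department": "Home & Kitchen",
}

# Normalize whitespace and casing first, then apply the corrections table.
cleaned = [canonical.get(v.strip().lower(), v.strip().title()) for v in values]
print(cleaned)  # ['Electronics', 'Electronics', 'Electronics', 'Home & Kitchen']
```

Normalizing case and whitespace before the lookup means one mapping entry catches every casing variant of the same typo.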
Data Security - You have to ensure that only the parties who are entitled to
view the data have the required permissions and access to it. You don’t want
sales data published broadly, or, for that matter, any irrelevant access granted
to anybody within or outside the organization.
Technical challenges - Many times a data analyst does not have enough
access to the data to work with. It can be challenging to work with the data
engineering team or the database owner, as several justifications are often
required to get the necessary access, among other technical hurdles.
Visual Changes - This has happened to me a few times. There are certain
color palettes that are specific to your company or brand. You may have
created a great-looking dashboard or set of graphs, but sometimes you will
have to tone it down so that it aligns with the existing, familiar template.
Yes, sometimes others will tell you which colors to use.
6. List some of the best tools that can be useful for data analysis.
K-NN is a lazy learner, while K-Means is an eager learner. An eager learner has
a model-fitting (training) step, but a lazy learner does not have a training
phase.
K-NN performs much better if all of the features are on the same scale, but this
is not true for K-Means.
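The scale sensitivity of K-NN can be shown with a small, self-contained sketch. The income/age points below are made up; the point is that raw Euclidean distance is dominated by the feature with the larger numeric range, and z-scoring each feature fixes that:

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def standardize(rows):
    """Scale each column to zero mean and unit variance (z-score)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical [income, age] points.
raw = [
    [50000, 25],  # A
    [50200, 60],  # B: near A in income, far in age
    [50400, 26],  # C: near A in age, slightly farther in income
    [30000, 40],  # D: income outlier that stretches the income scale
]

# Unscaled, income dominates the distance: B looks like A's nearest neighbour.
print(euclidean(raw[0], raw[1]) < euclidean(raw[0], raw[2]))  # True

# After z-scoring both features, C (the intuitively similar point) is nearest.
scaled = standardize(raw)
print(euclidean(scaled[0], scaled[1]) < euclidean(scaled[0], scaled[2]))  # False
```

In practice a library scaler (e.g. scikit-learn's `StandardScaler`) would be used instead of the hand-rolled `standardize` above.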
Requirement Analysis
Test Planning
Test Design
Test Environment Setup
Test Execution
Test Closure
Q 7) What would you do if you have a large suite to execute in very little
time?
In that case, we should prioritize the test cases first, execute the high-priority
test cases, and then move on to the lower-priority ones. This way we can make sure
that the important aspects of the software are tested.
Alternatively, we may also ask the customer which functions of the software are
most important to them, start testing from those areas, and then gradually move to
the areas that are of less importance.
Q 9) Suppose you find a bug in production, how would you make sure that
the same bug is not introduced again?
The best way is to immediately write a test case for the production defect and include it
in the regression suite. We can also think of alternate or similar test cases and
include them in our planned execution.
Q 12) How would you ensure that your testing is complete and has good
coverage?
The Requirements Traceability Matrix (RTM) and test coverage matrices will help us
determine that our test cases have good coverage. The RTM will help us determine
that the test conditions are sufficient to cover all the requirements. The coverage
matrices will help us determine that the test cases are sufficient to satisfy all the
test conditions identified in the RTM.
Q 13) What are the different artifacts you refer to when you write the test
cases?
The main artifacts used are:
Q 18) In case you have any doubts regarding your project, how do you
approach them?
In case of any doubts, first try to clear them by reading the available application help.
If the doubts still persist, ask your supervisor or a senior member of your
team.
Q 20) How do you determine which piece of software requires how much
testing?
We can determine this by calculating the Cyclomatic Complexity.
The technique helps to answer the below 3 questions for the programs/features:
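Cyclomatic complexity itself can be estimated as the number of decision points plus one. A rough sketch for Python code, using only the standard-library `ast` module (boolean operators and `match` arms are ignored here, so this undercounts for code that uses them; it is an illustration, not a full implementation):

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        # Each branch, loop, or exception handler adds one independent path.
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp,
                             ast.ExceptHandler)):
            decisions += 1
    return decisions + 1

sample = """
def grade(score):
    if score >= 90:
        return "A"
    elif score >= 75:
        return "B"
    return "C"
"""

# Two If nodes (the elif is a nested If) -> three independent paths.
print(cyclomatic_complexity(sample))  # 3
```

A higher score means more independent paths to cover, so such a function would warrant proportionally more test cases.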