Data Science Interview Best

1) Mention what is the responsibility of a Data analyst?
Responsibility of a Data analyst include,
Provide support to all data analysis and coordinate with customers and staffs
Resolve business associated issues for clients and performing audit on data
Analyze results and interpret data using statistical techniques and provide ongoing reports
Prioritize business needs and work closely with management and information needs
Identify new process or areas for improvement opportunities
Analyze, identify and interpret trends or patterns in complex data sets
Acquire data from primary or secondary data sources and maintain databases/data systems
Filter and “clean” data, and review computer reports
Determine performance indicators to locate and correct code problems
Securing database by developing access system by determining user level of access
2) What is required to become a data analyst?
To become a data analyst,
Robust knowledge on reporting packages (Business Objects), programming language (XML, Javascript, or
ETL frameworks), databases (SQL, SQLite, etc.)
Strong skills with the ability to analyze, organize, collect and disseminate big data with accuracy
Technical knowledge in database design, data models, data mining and segmentation techniques
Strong knowledge on statistical packages for analyzing large datasets (SAS, Excel, SPSS, etc.)
3) Mention what are the various steps in an analytics project?
Various steps in an analytics project include
Problem definition
Data exploration
Data preparation
Modelling
Validation of data
Implementation and tracking
4) Mention what is data cleansing?
Data cleaning also referred as data cleansing, deals with identifying and removing errors and
inconsistencies from data in order to enhance the quality of data.
5) List out some of the best practices for data cleaning?
Some of the best practices for data cleaning includes,
Sort data by different attributes
For large datasets cleanse it stepwise and improve the data with each step until you achieve a good data
quality
For large datasets, break them into small data. Working with less data will increase your iteration speed
To handle common cleansing task create a set of utility functions/tools/scripts. It might include,
remapping values based on a CSV file or SQL database or, regex search-and-replace, blanking out all
values that don’t match a regex
If you have an issue with data cleanliness, arrange them by estimated frequency and attack the most
common problems
Analyze the summary statistics for each column ( standard deviation, mean, number of missing values,)
Keep track of every date cleaning operation, so you can alter changes or remove operations if required
ID-100353945
6) Explain what is logistic regression?
Logistic regression is a statistical method for examining a dataset in which there are one or more
independent variables that defines an outcome.
7) List of some best tools that can be useful for data-analysis?

Tableau
RapidMiner
OpenRefine
KNIME
Google Search Operators
Solver
NodeXL
io
Wolfram Alpha’s
Google Fusion tables
8) Mention what is the difference between data mining and data profiling?
The difference between data mining and data profiling is that
Data profiling: It targets on the instance analysis of individual attributes. It gives information on various
attributes like value range, discrete value and their frequency, occurrence of null values, data type,
length, etc.
Data mining: It focuses on cluster analysis, detection of unusual records, dependencies, sequence
discovery, relation holding between several attributes, etc.
9) List out some common problems faced by data analyst?
Some of the common problems faced by data analyst are
Common misspelling
Duplicate entries
Missing values
Illegal values
Varying value representations
Identifying overlapping data

10) Mention the name of the framework developed by Apache for processing large data set for an
application in a distributed computing environment?
Hadoop and MapReduce is the programming framework developed by Apache for processing large data
set for an application in a distributed computing environment.
11) Mention what are the missing patterns that are generally observed?
The missing patterns that are generally observed are
Missing completely at random

Missing at random
Missing that depends on the missing value itself
Missing that depends on unobserved input variable
12) Explain what is KNN imputation method?
In KNN imputation, the missing attribute values are imputed by using the attributes value that are most
similar to the attribute whose values are missing. By using a distance function, the similarity of two
attributes is determined.
13) Mention what are the data validation methods used by data analyst?
Usually, methods used by data analyst for data validation are
Data screening
Data verification
14) Explain what should be done with suspected or missing data?
Prepare a validation report that gives information of all suspected data. It should give information like
validation criteria that it failed and the date and time of occurrence
Experience personnel should examine the suspicious data to determine their acceptability
Invalid data should be assigned and replaced with a validation code
To work on missing data use the best analysis strategy like deletion method, single imputation methods,
model based methods, etc.
15) Mention how to deal the multi-source problems?
To deal the multi-source problems,

Restructuring of schemas to accomplish a schema integration
Identify similar records and merge them into single record containing all relevant attributes
without redundancy
16) Explain what is an Outlier?
The outlier is a commonly used terms by analysts referred for a value that appears far away and
diverges from an overall pattern in a sample. There are two types of Outliers
Univariate
Multivariate
19) Mention what are the key skills required for Data Analyst?
A data scientist must have the following skills
Database knowledge
Database management
Data blending
Querying
Data manipulation
Predictive Analytics
Basic descriptive statistics
Predictive modeling
Advanced analytics
Big Data Knowledge
Big data analytics
Unstructured data analysis
Machine learning
Presentation skill
Data visualization
Insight presentation
Report design
20) Explain what is collaborative filtering?
Collaborative filtering is a simple algorithm to create a recommendation system based on user

behavioral data. The most important components of collaborative filtering are users- items- interest.
23) Explain what is Map Reduce?
Map-reduce is a framework to process large data sets, splitting them into subsets, processing each
subset on a different server and then blending results obtained on each.
A good example of collaborative filtering is when you see a statement like “recommended for you” on
online shopping sites that’s pops out based on your browsing history.
21) Explain what are the tools used in Big Data?
Tools used in Big Data includes
Hadoop
Hive
Pig
Flume
Mahout
Sqoop
22) Explain what is KPI, design of experiments and 80/20 rule?
KPI: It stands for Key Performance Indicator, it is a metric that consists of any combination of
spreadsheets, reports or charts about business process
Design of experiments: It is the initial process used to split your data, sample and set up of a data for
statistical analysis
80/20 rules: It means that 80 percent of your income comes from 20 percent of your clients
24) Explain what is Clustering? What are the properties for clustering algorithms?
Clustering is a classification method that is applied to data. Clustering algorithm divides a data set into
natural groups or clusters.
Properties for clustering algorithm are
Hierarchical or flat
Iterative
Hard and soft
Disjunctive
25) What are some of the statistical methods that are useful for data-analyst?
Statistical methods that are useful for data scientist are
Bayesian method
Markov process
Spatial and cluster processes
Rank statistics, percentile, outliers detection
Imputation techniques, etc.
Simplex algorithm
Mathematical optimization
26) What is time series analysis?
Time series analysis can be done in two domains, frequency domain and the time domain. In Time
series analysis the output of a particular process can be forecast by analyzing the previous data by the
help of various methods like exponential smoothening, log-linear regression method, etc.
28) What is a hash table?
In computing, a hash table is a map of keys to values. It is a data structure used to implement an
associative array. It uses a hash function to compute an index into an array of slots, from which desired
value can be fetched.
29) What are hash table collisions? How is it avoided?
A hash table collision happens when two different keys hash to the same value. Two data cannot be
stored in the same slot in array.
To avoid hash table collision there are many techniques, here we list out two
Separate Chaining:
It uses the data structure to store multiple items that hash to the same slot.
Open addressing:
It searches for other slots using a second function and store item in first empty slot that is found
29) Explain what is imputation? List out different types of imputation techniques?
During imputation we replace missing data with substituted values. The types of imputation techniques
involve are
Single Imputation
Hot-deck imputation: A missing value is imputed from a randomly selected similar record by the help of
punch card
Cold deck imputation: It works same as hot deck imputation, but it is more advanced and selects donors
from another datasets
Mean imputation: It involves replacing missing value with the mean of that variable for all other cases
Regression imputation: It involves replacing missing value with the predicted values of a variable based
on other variables
Stochastic regression: It is same as regression imputation, but it adds the average regression variance to
regression imputation
Multiple Imputation
Unlike single imputation, multiple imputation estimates the values multiple times
30) Which imputation method is more favorable?
Although single imputation is widely used, it does not reflect the uncertainty created by missing data at
random. So, multiple imputation is more favorable then single imputation in case of data missing at
random.
32) Explain what is the criteria for a good data model?
Criteria for a good data model includes

It can be easily consumed
Large data changes in a good model should be scalable
It should provide predictable performance
A good model can adapt to changes in requirements
To fulfill the responsibilities a data analyst must possess a vast and rich skillset:
1. Degree and Domain Expertise

2. Knowledge of Programming
3. Knowledge of Data Analysis Tools
4. Understanding of Statistics and Machine Learning Algorithms
5. Knowledge of Data Visualization Tools
What are the best qualities of a data analyst?

To become a Data Analyst, you will need to:
 be good with numbers.
 have great IT skills, including being familiar with databases and query languages.
 have good analytical skills.
 be good at solving problems.
 to be able to work to deadlines.
 pay attention to details and accuracy.
 be able to work as part of a team.
Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly

formatted, duplicate, or incomplete data within a dataset. When combining multiple data
sources, there are many opportunities for data to be duplicated or mislabeled.
Top 10 Data Analytics Tools You Need To Know In 2023

 R and Python.
 Microsoft Excel.
 Tableau.
 RapidMiner.
 KNIME.
 Power BI.
 Apache Spark.
 QlikView.
Difference between Data Profiling and Data
Mining
Data profiling refers to a process of analyzing the gathered information and collecting
insights and statistics about the data. It plays a vital role for any organization since it
helps in assessing the quality of data by identifying an issue in the data set. There are
multiple methods of conducting data profiling in organizations such as mean, mode,
percentile, frequency, maxima, minima, etc. On the other hand, data mining refers to the
process of extracting useful data, patterns in the existing database. It is the process of
evaluating the existing database and transforming the raw data into useful information.
Read on the article to know the difference between data profiling and data mining.
What is Data Profiling?

Data profiling is also known as data archaeology. It is a process of evaluating data from
an existing source and analyzing and summarizing useful information about that data.
The primary task of data profiling is to identify issues like incorrect values, anomalies,
and missing values in the initial phases of data analysis. It can be done for many
reasons, but the most common part of data profiling is to find the quality of data as a
component of a huge project. Data profiling is linked with ETL (Extract, Transform, and
Load) process to transfer data from one system to another.
Data profiling Techniques

There are three different techniques of data profiling
o Structure discovery
o Content discovery
o Relationship discovery
Structure discovery
Play Videox
In structure discovery, the structural identity of the database should be maintained

properly. For example, in any organization, consider employees' attendance sheet; the
name column can not have numbers, and the phone number columns should have a
fixed number of digits. It helps the management team to maintain the accuracy and
consistency of the data.
Content discovery
Content discovery refers to the detailed analysis of structural discovery. It specifically

focuses on individual elements for null, ambiguous and redundant data.
Relationship discovery
Relationship discovery establishes the relationship between various identities. It finds

the key relationships and reduces the data overlaps.
Methods of data profiling

Data profiling can be performed in various ways; these are some methods that can be
used.
Cross Profiling
It counts how many times every value appears within each column in a table. It helps to
discover the trends and patterns within the data.
Cross column
The primary purpose of this method is to look across the column to perform key and
dependency analysis. Key analysis scans the total values in a table to place a potential
primary key. Dependency analysis finds the relationships within the sets of data. Both
these analyses find the relationships and dependencies within a table.
Cross table profiling
Cross table profiling looks across tables to identify the potential foreign keys. It helps
find the differences and similarities in syntax and data types between tables to
determine which data might be redundant and which could be mapped together.
What is Data Mining?

Data mining refers to a process used by various organizations to transform raw data into
useful information. Many organizations use software to discover data, trends, and
patterns in a huge amount of data to understand more about customer behaviours and
develop better marketing strategies. Data mining has broad applications in various
fields, like the IT sector and science and technology. Data mining is also known as KDD
(Knowledge Discovery in Data).
What Is Data Modeling?

Data modeling is the process of creating a data model. Throughout the
process, developers will work closely with stakeholders (i.e., potential users of
the to-be-created systems). Data modeling begins with collecting information
about business requirements and rules from stakeholders. These principles
are then converted into data structures, which are used to create a concrete
database design.
Data modeling involves various methods and techniques. Here are two
approaches to building data models (a third option is a combination of both):
 Bottom-up data modeling: Bottom-up data modeling focuses on

identifying and resolving the data requirements of small groups within a
company and then integrating these individual results together to meet
the requirements of the whole company.
 Top-down data modeling: Top-down data modeling initially focuses on
the principles and the problems of the whole company. Then, the model
is broken down into smaller pieces to meet the specific requirements of
each group within a company.
Types of Data Models

There are three main data model types. These types of data models are
usually known as perspectives or stages of data modeling processes:
 Conceptual data model: A conceptual data model describes business

concepts typically used at the start of a project when high-level
information is required. It refers to a structured view of data, which
supports business processes throughout the company. Its leading
objective is to identify the entity classes, rather than the specific
characteristics of each entity.
 Logical data model: A logical data model is similar to a conceptual
model, but it is more detailed. Besides entities, it defines the data
structure — the attributes of entities — and the relationships among
them.
 Physical data model: A physical data model describes the data model
in a high level of detail. It expands upon a logical data model with
primary and foreign keys, data types, validation rules, triggers, and
procedures. This is typically used in preparation for an implementation
process.
There are three main data model types, including physical, logical, and conceptual models.
Image credit: Fintechos
Advantages of Data Models

Data models offer many benefits to a business, including:
 Improved quality: Before creating and building any applications, it's

recommended to create data models to serve as their blueprints first.
Data models encompass well-defined terms and enable alignment
among stakeholders. Moreover, data models help to reveal initial
problems before they escalate and require consideration of various
approaches — ultimately increasing application quality and
performance.
 Data source revelation: When preparing a data model, a critical step is
identifying the essential data sources to be used in future applications.
It's important to know which data will be needed and how they will be
structured and organized. Companies can decide which data is more or
less important based on different departments or business processes.
 Clear data flow within organizations: A data model enables a clear
flow of data throughout an organization. In other words, a well-designed
data model can serve as a valuable reference point.. Everyone knows
where to find the necessary data, who is responsible for the data, how
to work with the data, how the data are interconnected, and more. Data
modeling provides documentation to facilitate cross-organizational
communication.
 Cost reduction: Data models can help reveal potential problems for
future development. Mistakes and failures can be prevented in advance,
thus allowing for easy corrections without additional costs.
 Faster performance: A well-designed data model is the basis of every
database that’s part of the prepared application. The rules for converting
the data model into a database must be precisely defined to achieve the
data model's optimal performance and fast response. Because
developers know the principles they must follow, they can develop the
overall application without any significant flaws to improve its
performance.
Data Models in GoodData

GoodData's data model is broken down into a logical data model and a
physical data model (PDM). A logical data model (LDM) describes how data is
organized in a workspace, and you can define datasets, customize datasets,
and manage relationships between the datasets. A PDM specifies how the
data elements in the LDM are written to the data warehouse on a technical
level. The user's interface (UI) sits on the top of the data models. It's a
framework for building analytical applications on top of the users' data by
using multiple libraries. You can create utility programs that interact with
GoodData's platforms, create different visualizations of stored data using
visual components, embed insights, or create new features based on your
specific analytical requirements. Check our documentation to learn more
about GoodData's UI.
The GoodData LDM’s key feature is that users can interact with the LDM
without needing to know how to query the PDM — the required knowledge for
the analytics used for database technologies and SQL are removed. Only the
business intelligence engineers have to directly interact with the physical data
model. The LDM helps and assists users in answering analytical questions
without preparing datasets specific to use cases. The LDM contains entities
based on real-life objects separately, as well as the relationships among them
and their specified attributes. It ensures that different analysis results make
sense.
GoodData's Logical Data Model

As previously mentioned, there are many tools supporting data modeling.
GoodData offers different possibilities for how users can create and customize
data models:
 GoodData LDM Modeler: The GoodData LDM Modeler supports the

creation, testing, and deployment of LDMs in workspaces on the
GoodData platform. Users can define datasets, add attributes, and
develop connections through a drag-and-drop interface. To find out
more about the GoodData LDM Modeler, read our blog post on what is
a logical data model.
 Rest APIs: Rest APIs allow developers to programmatically create and
manage LDMs, such as upload an LDM; load and modify workspaces;
and create and manage metrics, insights, and dashboards. To learn
more about GoodData's Rest APIs, see the API Reference page in our
documentation.
 Gray pages: Gray pages are used to make small changes in your
workspaces. Developers can use the gray pages, a form-based
interface, to make these changes to the technical description of your
deployed workspaces. Read our documentation about Access the Gray
Pages for a Workspace to find out more about gray pages.
The writer goes on to define the four criteria of a good data model: “ (1) Data in a good model
can be easily consumed. (2) Large data changes in a good model are scalable. (3) A good
model provides predictable performance. (4)A good model can adapt to changes in
requirements, but not at the expense of 1-3.”
Time series analysis is a specific way of analyzing a sequence of data points collected
over an interval of time. In time series analysis, analysts record data points at consistent
intervals over a set period of time rather than just recording the data points intermittently or
randomly.
A PivotTable is an interactive way to quickly summarize large amounts of data. You can
use a PivotTable to analyze numerical data in detail, and answer unanticipated questions about
your data. A PivotTable is especially designed for: Querying large amounts of data in many
user-friendly ways.
32) Explain what is the criteria for a good data model?
Criteria for a good data model includes
It can be easily consumed
Large data changes in a good model should be scalable
It should provide predictable performance
A good model can adapt to changes in requirements
Cash, stocks, bonds, mutual funds, and bank deposits are all are examples of financial
assets. Unlike land, property, commodities, or other tangible physical assets, financial assets do
not necessarily have inherent physical worth or even a physical form.
Dashen Bank | Always One Step Ahead!
What are services provided by Dashen Bank?

This enables consumers to conveniently pay for their purchases, utility bill payments,
person-to-person (P2P) transfers, electronic mobile top-up, Cash-in, and Cash-out
What is the mission of Bank?

Here is a typical mission statement for a bank: “We will be the preferred provider of
targeted financial service in our communities based on strong customer
relationships. We will strengthen these relationships by providing the right solutions
that combine our technology, experience, and financial strength
What is Dashen Bank slogan?
Always One Step Ahead!

Dashen Bank - Always One Step Ahead!
Who are the stakeholders of Dashen Bank?
Dashen has four shareholders who each hold 19pc shares, including the three
brothers of the Ethio-Saudi tycoon, Sheik Mohammed Al-Amoudi, and Nyala
motors.
How many branches have Dashen Bank?

Headquartered in Addis Ababa, the Bank is one of the largest private banks in Ethiopia;
currently operating through a network of more than 500+ branches, 16 dedicated
Forex windows and 9 Forex bureaus, more than 400+ATMs and over 1400+ point-of-
sale (POS) terminals spread across the country.
what is data modeling & its stages
Data modeling is the process of diagramming data flows. When creating a new or alternate database
structure, the designer starts with a diagram of how data will flow into and out of the database. This
flow diagram is used to define the characteristics of the data formats, structures, and database handling
functions to efficiently support the data flow requirements. After the database has been built and
deployed, the data model lives on to become the documentation and justification for why the database
exists and how the data flows were designed.
What are the 3 Types of Data Models?

Conceptual data models, logical data models and physical data models make up the three types of
data model. While they require different approaches to build, each type of data model conveys the
same information, from different perspectives. Leveraging all three types of data models helps
address different stakeholders’ needs and levels of expertise.
This is because the three data models address the use of data assets from different degrees of
abstraction. The models increase in complexity – starting with conceptual, through to physical data
models. The models are used in different stages of the development process to foster the alignment
of business goals and requirements with how data resources are used.
 Conceptual data models are used to communicate business structures and concepts at a high level of
abstraction. These models are constructed without taking system constraints into account and are usually
developed by business stakeholders and data architects to define and organize the information that is
needed to develop a system.
 Logical data models are concerned with the types, attributes, and relationships of the entities that will
inhabit the system. A logical model is often created by a data architect and used by business analysts. The
goal is to develop a platform-independent representation of the entities and their relationships. This stage
of data modeling provides organizations with insight pertaining to the limitations of their current
technologies.
 Physical data models are used to define the implementation of logical data models employing a particular
database management system (DBMS). They are built with the current – or expected to be – technological
capabilities. Database developers and analysts work with physical data models to enact the ideas and
processes refined by conceptual and logical models.
“Data Governance is a system of decision rights and accountabilities for

information-related processes, executed according to agreed-upon models which
describe who can take what actions with what information, and when, under what
circumstances, using what methods.”
Data governance (DG) refers to the overall management of the availability,

usability, integrity, and security of the data employed in an enterprise.
What is data governance in simple terms?

Data governance is all about data quality and reliability – it establishes the rules,
policies, and procedures that ensure data accuracy, reliability, compliance, and security.
Master data management is another term for the concept of a centralized, single source
for enterprise data (one version of the truth).
Data Governance Examples

 Data Usability. If you want your employees to use your data, it needs to be accessible
and easy to understand. ...
 Metadata. Metadata is qualitative information that describes the other data you've
collected at your business. ...
 Data Security. ...
 Data Quality. ...
 Data Integration. ...
 Data Preservation.
Most Compelling Benefits of Big Data and Analytics

1. Customer Acquisition and Retention. ...
2. Focused and Targeted Promotions. ...
3. Potential Risks Identification. ...
4. Innovate. ...
5. Complex Supplier Networks. ...
6. Cost optimization. ...
7. Improve Efficiency.
Advantages of Big Data
1. Better Decision Making
Companies use big data in different ways to improve their B2B operations, advertising,
and communication. Many businesses including travel, real estate, finance, and
insurance are mainly using big data to improve their decision-making capabilities. Since
big data reveals more information in a usable format, businesses can utilize that data to
make accurate decisions on what consumers want or not and their behavioral
tendencies.
Big data facilitates the decision-making process by providing business intelligence and
advanced analytical insights. The more customer data a business has, the more
detailed overview it can gain about its target audience.
Data-driven insights reveal business trends and behaviors and allow companies to
expand and compete by optimizing their decision-making. Furthermore, these insights
enable businesses to create more tailored products and services, strategies, and well-
informed campaigns to compete within their industry.
2. Reduce costs of business processes
The surveys conducted by New Vantage and Syncsort (now Precisely) reveals that big
data analytics has helped businesses to reduce their expenses significantly. 66.7% of
survey respondents from New Vantage claimed that they have started using big data to
reduce expenses. Furthermore, 59.4% of survey respondents from Syncsort claimed
that big data tools helped them reduce costs and increase operational efficiency.
Do you know that Big data analytics tools like Cloud-Based Analytics and Hadoop can
help reduce costs for storing big data?
3. Fraud Detection
Financial companies, in particular, use big data to detect fraud. Data analysts
use machine learning algorithms and artificial intelligence to detect anomalies and
transaction patterns. These anomalies of transaction patterns indicate something is out
of order or a mismatch giving us clues about possible frauds.
Fraud detection is significantly important for credit unions, banks, credit card companies
to identify account information, materials, or product access. Any industry, including
finance, can better serve its customers by early identification of frauds before something
goes wrong.
For instance, credit card companies and banks can spot fraudulent purchases or stolen
credit cards using big data analytics even before the cardholder notices that something
is wrong.
4. Increased productivity
According to a survey from Syncsort, 59.9% of survey respondents have claimed that
they were using big data analytics tools like Spark and Hadoop to increase productivity.
This increase in productivity has, in turn, helped them to improve customer retention
and boost sales.
Modern big data tools help data scientists and analysts to analyze a large amount of
data efficiently, enabling them to have a quick overview of more information. This also
increases their productivity levels.
Besides, big data analytics helps data scientists and data analysts gain more

information about themselves so that they can identify how to be more productive in
their activities and job responsibilities.
Therefore, investing in big data analytics offers a competitive advantage for all
industries to stand out with increased productivity in their operations.
5. Improved customer service
Improving customer interactions is crucial for any business as a part of

their marketing efforts.
Since big data analytics provide businesses with more information, they can utilize that
data to create more targeted marketing campaigns and special, highly personalized
offers to each individual client.
The major sources of big data are social media, email transactions, customers’ CRM
(customer relationship management) systems, etc. So, it exposes a wealth of
information to businesses about their customers’ pain points, touchpoints, values, and
trends to serve their customers better.
Moreover, big data helps companies understand how their customers think and feel and
thereby offer them more personalized products and services. Offering a personalized
experience can improve customer satisfaction, enhance relationships, and, most of all,
build loyalty.
6. Increased agility
Another competitive advantage of big data is increasing business agility. Big data

analytics can help companies to become more disruptive and agile in markets.
Analyzing huge data sets related to customers enables companies to gain insights
ahead of their competitors and address the pain points of customers more efficiently
and effectively.
On top of that, having huge data sets at disposal allows companies to improve
communications, products, and services and reevaluate risks. Besides, big data helps
companies improve their business tactics and strategies, which are very helpful in
aligning their business efforts to support frequent and faster changes in the industry.
Disadvantages
1. Lack of talent
According to a survey by AtScale, the lack of big data experts and data scientists has
been the biggest challenge in this field for the past three years. Currently, many IT
professionals don’t know how to carry out big data analytics as it requires a different
skill set. Thus, finding data scientists who are also experts in big data can be
challenging.
Big data experts and data scientists are two highly paid careers in the data science
field. Therefore, hiring big data analysts can be very expensive for companies,
especially for startups. Some companies have to wait for a long time to hire the required
staff to continue their big data analytics tasks.
2. Security risks
Most of the time, companies collect sensitive information for big data analytics. Those
data need protection, and security risks can be demerits due to the lack of proper
maintenance.
Besides, having access to huge data sets can gain unwanted attention from hackers,
and your business may be a target of a potential cyber-attack. As you know, data
breaches have become the biggest threat to many companies today.
Another risk with big data is that unless you take all necessary precautions, important
information can be leaked to competitors.
3. Compliance
The need to have compliance with government legislation is also a drawback of big
data. If big data contains personal or confidential information, the company should make
sure that they follow government requirements and industry standards to store, handle,
maintain, and process that data.
So, data governance tasks, transmission, and storage will become more difficult to
manage as the big data volumes increase.
Conclusion
Nowadays, big data analytics can be crucial for any company. However, every

business should figure out whether the pros of big data analytics outweigh the cons,
particularly in their own case and situation. If they decide that advantages are greater,
they also need to find out ways to overcome the disadvantages.
WHAT IS ETL (EXTRACT, TRANSFORM,
LOAD)?
More Data Engineering Topics
ETL, which stands for “extract, transform, load,” are the three processes that, in
combination, move data from one database, multiple databases, or other sources to a
unified repository—typically a data warehouse. It enables data analysis to provide
actionable business information, effectively preparing data for analysis and business
intelligence processes.
As data engineers are experts at making data ready for consumption by working with
multiple systems and tools, data engineering encompasses ETL. Data engineering
involves ingesting, transforming, delivering, and sharing data for analysis. These
fundamental tasks are completed via data pipelines that automate the process in a
repeatable way. A data pipeline is a set of data-processing elements that move data
from source to destination, and often from one format (raw) to another (analytics-ready).
<="" a="" alt="data engineering - data engineering training" class="ppc-image"

src="https://www.snowflake.com/wp-content/uploads/2021/09/300X250-blu-3-1.webp"
srcset="https://www.snowflake.com/wp-content/uploads/2021/09/300X250-blu-3-1.webp"
width="380px" style="box-sizing: border-box; outline: 0px; line-break: strict; border: none; max-
width: 100%; height: auto; position: relative;">
WHAT IS THE PURPOSE OF ETL?

ETL allows businesses to consolidate data from multiple databases and other sources
into a single repository with data that has been properly formatted and qualified in
preparation for analysis. This unified data repository allows for simplified access for
analysis and additional processing. It also provides a single source of truth, ensuring
that all enterprise data is consistent and up-to-date.
ETL PROCESS
There are three unique processes in extract, transform, load. These are:
Extraction, in which raw data is pulled from a source or multiple sources. Data could
come from transactional applications, such as customer relationship management
(CRM) data from Salesforce or enterprise resource planning (ERP) data from SAP, or
Internet of Things (IoT) sensors that gather readings from a production line or factory
floor operation, for example. To create a data warehouse, extraction typically involves
combining data from these various sources into a single data set and then validating the
data with invalid data flagged or removed. Extracted data may be several formats, such
as relational databases, XML, JSON, and others.
Transformation, in which data is updated to match the needs of an organization and

the requirements of its data storage solution. Transformation can involve standardizing
(converting all data types to the same format), cleansing (resolving inconsistencies and
inaccuracies), mapping (combining data elements from two or more data models),
augmenting (pulling in data from other sources), and others. During this process, rules
and functions are applied, and data cleansed to prevent including bad or non-matching
data to the destination repository. Rules that could be applied include loading only
specific columns, deduplicating, and merging, among others.
Loading, in which data is delivered and secured for sharing, making business-ready

data available to other users and departments, both within the organization and
externally. This process may include overwriting the destination’s existing data.
ETL VERSUS ELT

ELT (extract load transform) is a variation in which data is extracted and loaded and
then transformed. This sequence allows businesses to preload raw data to a place
where it can be modified. ELT is more typical for consolidating data in a data
warehouse, as cloud-based data warehouse solutions are capable of scalable
processing.
Extract, transform, load is especially conducive to advanced analytics. For example,

data scientists commonly load data into a data lake and then combine it with another
data source or use it to train predictive models. Maintaining the data in a raw (or less
processed) state allows data scientists to keep their options open. This approach is
quicker as it leverages the power of modern data processing engines and cuts down on
unnecessary data movement.
ETL TOOLS
ETL tools automate the extraction, transforming, and loading processes, consolidating
data from multiple data sources or databases. These tools may have data profiling, data
cleansing, and metadata-writing capabilities. A tool should be secure, easy to use and
maintain, and compatible with all components of an organization’s existing data
solutions.
What is ETL?
 Knowledge center»
 Data integration»
 ETL vs ELT: Defining the Difference»
 What is ETL?
Related articles
 What is Reverse ETL? Meaning and Use Cases
 Data Extraction Tools: Improving Data Warehouse Performance
 Best Practices for Managing Data Quality: ETL vs ELT
 Data Wrangling vs. ETL
 Data Wrangling: Speeding Up Data Preparation
Quick answer? ETL stands for "Extract, Transform, and Load."

In the world of data warehousing, if you need to bring data from multiple different data sources into
one, centralized database, you must first:
 EXTRACT data from its original source

 TRANSFORM data by deduplicating it, combining it, and ensuring quality, to then
 LOAD data into the target database
ETL tools enable data integration strategies by allowing companies to gather data from multiple data
sources and consolidate it into a single, centralized location. ETL tools also make it possible for different
types of data to work together.
A typical ETL process collects and refines different types of data, then delivers the data to a data lake or
data warehouse such as Redshift, Azure, or BigQuery.
ETL tools also makes it possible to migrate data between a variety of sources, destinations, and analysis
tools. As a result, the ETL process plays a critical role in producing business intelligence and executing
broader data management strategies. We are also seeing the process of Reverse ETL become more
common, where cleaned and transformed data is sent from the data warehouse back into the business
application.
How ETL works

The ETL process is comprised of 3 steps that enable data integration from source to destination: data
extraction, data transformation, and data loading.
Step 1: Extraction
Most businesses manage data from a variety of data sources and use a number of data analysis tools to
produce business intelligence. To execute such a complex data strategy, the data must be able to travel
freely between systems and apps.
Before data can be moved to a new destination, it must first be extracted from its source — such as a
data warehouse or data lake. In this first step of the ETL process, structured and unstructured data is
imported and consolidated into a single repository. Volumes of data can be extracted from a wide range
of data sources, including:
 Existing databases and legacy systems

 Cloud, hybrid, and on-premises environments
 Sales and marketing applications
 Mobile devices and apps
 CRM systems
 Data storage platforms
 Data warehouses
 Analytics tools
Although it can be done manually with a team of data engineers, hand-coded data extraction can be
time-intensive and prone to errors. ETL tools automate the extraction process and create a more
efficient and reliable workflow.
Step 2: Transformation
During this phase of the ETL process, rules and regulations can be applied that ensure data quality and
accessibility. You can also apply rules to help your company meet reporting requirements. The process
of data transformation is comprised of several sub-processes:
 Cleansing — inconsistencies and missing values in the data are resolved.

 Standardization — formatting rules are applied to the dataset.
 Deduplication — redundant data is excluded or discarded.
 Verification — unusable data is removed and anomalies are flagged.
 Sorting — data is organized according to type.
 Other tasks — any additional/optional rules can be applied to improve data quality.
Transformation is generally considered to be the most important part of the ETL process. Data
transformation improves data integrity — removing duplicates and ensuring that raw data arrives at its
new destination fully compatible and ready to use.
See why Talend was named a Leader in the 2022 Magic Quadrant™ for Data Integration Tools
for the seventh year in a row
Get the report

Step 3: Loading
The final step in the ETL process is to load the newly transformed data into a new destination (data lake
or data warehouse.) Data can be loaded all at once (full load) or at scheduled intervals (incremental
load).
Full loading — In an ETL full loading scenario, everything that comes from the transformation assembly
line goes into new, unique records in the data warehouse or data repository. Though there may be
times this is useful for research purposes, full loading produces datasets that grow exponentially and can
quickly become difficult to maintain.
Incremental loading — A less comprehensive but more manageable approach is incremental loading.
Incremental loading compares incoming data with what’s already on hand, and only produces additional
records if new and unique information is found. This architecture allows smaller, less expensive data
warehouses to maintain and manage business intelligence.
ETL use case: business intelligence
Data strategies are more complex than they’ve ever been; SaaS gives companies access to data from
more data sources than ever before. ETL tools make it possible to transform vast quantities of data into
actionable business intelligence.
Consider the amount of raw data available to a manufacturer. In addition to the data generated by
sensors in the facility and the machines on an assembly line, the company also collects marketing, sales,
logistics, and financial data (often using a SaaS tool).
All of that data must be extracted, transformed, and loaded into a new destination for analysis. ETL
enables data management, business intelligence, data analytics, and machine learning capabilities by:
Delivering a single point-of-view

Managing multiple data sets in a world of enterprise data demands time and coordination, and can
result in inefficiencies and delays. ETL combines databases and various forms of data into a single,
unified view. This makes it easier to aggregate, analyze, visualize, and make sense of large datasets.
Providing historical context
ETL allows the combination of legacy enterprise data with data collected from new platforms and
applications. This produces a long-term view of data so that older datasets can be viewed alongside
more recent information.
Improving efficiency and productivity

ETL Software automates the process of hand-coded data migration and ingestion, making it self-service.
As a result, developers and their teams can spend more time on innovation and less time managing the
painstaking task of writing code to move and format data.
Finding an ETL solution

ETL can be accomplished in one of two ways. In some cases, businesses may task developers with
building their own ETL. However, this process can be time-intensive, prone to delays, and expensive.
Most companies today rely on an ETL tool as part of their data integration process. ETL tools are known
for their speed, reliability, and cost-effectiveness, as well as their compatibility with broader data
management strategies. ETL tools also incorporate a broad range of data quality and data governance
features.
When choosing which ETL tool to use, you’ll want to consider the number and variety of connectors
you’ll need as well as its portability and ease of use. You’ll also need to determine if an open-source tool
is right for your business since these typically provide more flexibility and help users avoid vendor lock-
in.
ELT — the next generation of ETL

ELT is a modern take on the older process of extract, transform, and load in which transformations take
place before the data is loaded. Over time, running transformations before the load phase is found to
result in a more complex data replication process. While the purpose of ETL is the same as ELT, the
method is evolved for better processing.
ELT vs ETL
Traditional ETL software extracts and transforms data from different sources before loading it into a
data warehouse or data lake. With the introduction of the cloud data warehouse, there was no longer
the need for data cleanup on dedicated ETL hardware before loading into your data warehouse or data
lake. The cloud enables a push-down ELT architecture with two steps changed from the ETL pipeline.
 EXTRACT Extract the data from multiple data sources and connectors
 LOAD Load it into the cloud data warehouse
 TRANSFORM Transform it using the power and scalability of the target cloud platform
If you are still on premises and your data isn't coming from several different sources, ETL tools still fit
your data analytics needs. But as more businesses move to a cloud data architecture (or hybrid), ELT
processes are more adaptable and scalable to evolving needs of cloud-based businesses.
ETL process vs ELT processes

ETL tools require processing engines for running transformations prior to loading data into a destination.
On the other hand, with ELT, businesses use the processing engines in the destinations to efficiently
transform data within the target system itself. This removal of an intermediate step streamlines the data
loading process.
21 Mostly Asked Banking Interview

Questions and Answers
Vidhi Shukla / Aug. 17, 2022512315
Share
Banking offers numerous career options for freshers and experienced professionals.
Nevertheless, along with the academic qualifications and aptitude, the interview process
has to be cleared which consists of varied types of interview questions.

A job in banking may or may not require experience but it does require an impressive
interview round.
The finance and banking industry attempts a range of entrances for graduates from
various academic regulations such as corporate banking, Customer relationship
management, researchers or tax analysts, analysts etc.

In the article below, let's explore the top 21 banking interview questions and answers
which will help you clear the interview with flying colours.
Top 21 Banking Interview Questions and Answers are:
Question 1: Brief me about yourself?
Answer: It is the first fundamental question that every interviewer asks a candidate to
start the conversation and know about the person. So, always be positive and introduce
yourself starting with your name, qualification and all the other required information that
is important for an interviewer to know. Just complete it within 2 minutes so that it
should not be extended as a boring conversation.
Question 2: Why do you want to join the banking sector?
Answer: In this question, be logical and answer it by telling why banking sectors have
influenced people with all the facts and figures, ready as to why the banking sector is the
fastest-growing sector. Do not start by telling that you want to have a stable career or
some personal view. Just make it well versed which can form a correct opinion of your
answer.
Question 3: What are the types of accounts in a bank?
Answer: Be straight forward and start your answer by telling the information which can
match the question asked by an Interviewer. The types of accounts in banks are:
 Checking Account: You can access the account as saving account but, unlike saving account,
you cannot earn interest on this account. The benefit of opening a checking account in a bank is
there is no limit for withdrawal.
 Money Market Account: This account gives both the benefit of savings account and checking
accounts. You can withdraw the amount and yet you can earn higher interest on it. This type of
account can be opened with a minimum balance.
 Certificate of Deposit Account (CD): By the opening of such account you have to deposit your
money for the fixed period like five years or seven years, and you will earn the interest on it. The
rate of interest will be decided by the bank, and you cannot withdraw the funds until the fixed
period expires.
 Saving Account: You can save your money in such account and also earn interest on it. The
number of withdrawal is limited and need to maintain the minimum amount balance in the
account to remain active.
Question 4: What are the necessary documents a person requires to open an account
in a bank?
Answer: As per the RBI advised banks to follow the Know Your Customer (KYC)
guidelines where the bank obtains some personal information of the account holder. The
primary document that is needed to open an account are photographs, proof of identity
proof like Aadhar card or Pan Card etc., and address proof as well.
Question 5: What are the types of Commercial Banks?
Answer: The types of Commercial Banks are:

 Retail or Consuming Bank: - It is small to the midsize branch that directly deals with
consumer’s transaction rather than corporate or other banks.
 Corporate or business banking: - Corporate banking deals with cash management,
underwriting, financing and issuing of stocks and bonds.
 Non- traditional Options: - There are many non-banks entities that offer financial services like
that of the bank. The entities include credit card companies, credit card report agencies and credit
card issuers.
 Securities and Investment Banking: - Investment banking manages portfolios of financial
assets, commodity and currency, corporate finance, fixed income, debt and equity writing etc.
Question 6: What is the annual percentage rate (APR)?
Answer: APR is known as the Annual percentage rate. It is a charge or interest that the
bank imposes on their customers for using their services like loans, credit cards etc. The
interest is calculated annually.
Question 7: What is Amortization and negative amortization?
Answer: Amortization refers to the repayment of the loan by instalment to cover

principal amount with interest whereas, negative amortization is when the repayment of
the loan is less than the loans accumulated interest, then negative amortization takes
places.
Question 8: What is the debt to income ratio?
Answer: Debt to income ratio is calculated by dividing a loan applicant’s total debt
payment by his gross income.
Question 9: What is loan grading?
Answer: Loan grading is the classification of the loan based on various risks and
parameters like repayment risk, borrowers credit history etc. The system places a loan on
one to six categories, based on the stability and risk associated with the loan.
Question 10: What do you mean by Co-Maker?
Answer: A person who signs a note to guarantee the payment of the loan on behalf of
the main loan applicant’s is known as Co-maker or signer.
Question 11: What is the line of credit?
Answer: Line of credit is an agreement between the bank and a borrower, to provide a
certain amount of loans on borrower’s demand. The borrower can withdraw the amount
at any moment and pay the interest only on the amount withdraw.
Question 12: How banks earn a profit?
Answer: The bank earns profit in various ways:
 Accepting deposit
 Banking Value chain
 Interest spread
 Providing funds to borrowers on interest
 Additional charges on services like checking account maintenance, online bill payment etc.
Question 13: What is the payroll card?
Answer: Payroll cards are types of smart cards issued by banks to facilitate salary
payments between employer and employees. Through payroll card, the employer can
load salary payments onto an employee’s smart card, and employee can withdraw the
salary even though if he or she doesn’t have an account in the bank.
Question 14: What is the card-based payment?
Answer: There are two types of card payments:
 Credit Card Payment

 Debit Card Payment
Question 15: What is a Payday loan?
Answer: A Payday loan refers to a small amount and a short term loan available at the
high-interest rate.
Question 16: What is a charge off?

Answer: Charge off is a declaration by a lender to a borrower for non- payment of the
remaining amount when borrower badly falls into debt. The unpaid amount is settled as a
bad debt.
More related questions for banking interview:
Question 17: What are the different types of loans offered by commercial banks?
Question 18: What are the different types of fixed deposit?
Question 19: What is a home equity loan?
Question 20: What is the interbank deposit?
Question 21: What are the non-performing assets of the company?
So, these are the question and answer that can easily help you to clear the interview panel
and get the job position in the banking sector. You can also surf for more questions
through Google that can lend you as a helping hand.
What are the common ways to operate your bank

accounts?
Bank accounts can be operated through:
1. Branches of Bank.
2. Mobile Banking.
3. ATM.
4. Internet Banking.
What are the software banking applications available

in the industry?
The 8 commonly used software banking applications are:
1. Internet banking system

2. ATM banking
3. Core banking system
4. Loan management system
5. Credit management system
6. Investment management system
7. Stock-market management system
8. Financial management system
10 Top Bank Interview

Questions
Bank interview questions will explore your technical experience and skills as well
as the key job competencies as they relate to the banking position.
The personal interview for banking jobs can be tough, it is important to come
across as comfortable and confident.
Carefully consider these frequently asked banking interview questions and use
the excellent interview answer help and guidelines to prepare your own winning
bank interview answers. Get the banking job you want!
10 Key Bank Interview Questions and

Answers
1. Why should this bank hire you?
Focus on specific experience and training in your career history.
Emphasize what qualifies you for this banking job and how you can add value to
both the position and the bank. Look at the banking job requirements such as:
 accuracy
 customer care
 computer skills
 numeracy skills
 communication skills
Highlight how you have demonstrated these skills previously.
If there are areas of the job function that you do not yet have experience in, then
highlight what skills you have that will facilitate learning and performing these
tasks.
 For example your ability to remain calm under pressure and communicate
clearly will help you in dealing with customers.
 Emphasize qualities like loyalty, integrity, confidentiality and commitment.
 State your technical knowledge and confirm your understanding of the
basics of bank products and services.
Those banking interview questions that ask you to provide an example of how
you have previously demonstrated a skill or ability are called behavioral based
interview questions.
These type of questions are used to assess whether you have the necessary
competencies to perform in a banking job. The behavioral interview question and
answer guide will help you to prepare for these bank interview questions.
2. Tell me about a situation where you had to deal with an angry

customer.
Customer service skills are essential for any front line banking job. Your example
should display:
 good listening skills

 the ability to clarify the customer's needs
 the ability to respond with patience and empathy
Show how you develop positive customer relationships by trying to meet the
customer's expectations and taking responsibility for the customer's satisfaction.
Find out more about customer service job interview questions and answers and
be well prepared for any customer service orientated interview questions.
Bank job candidates may be asked to define good customer service in order to
evaluate their customer service orientation.
3. Tell me about a time where you had to use your discretion and
tact to do your job properly.
You are often required to display diplomacy and tact with customers in a banking
environment. Provide an example of a challenging situation where you had to
handle the customer carefully and with discretion.
Discuss how you used your sensitivity and communication skills to manage the
situation.
4. How do you check your work for accuracy?

In asking bank interview questions about your attention to detail and accuracy
the interviewer wants to know if you are able to accurately carry out your tasks
with close attention to all aspects of your work.
Provide specific examples of how you check your outputs for accuracy and
completeness and what you do if you find a mistake.
5. Tell me about a time you had to explain a process or situation

to a confused customer. How did you approach this?
Banking means dealing with a variety of people. The ability to adjust to the
customer and the situation is an important job requirement.
Your example should clearly indicate how you changed your communication style
to meet the customer's needs.
6. Tell me about a time you felt pressured by conflicting work

demands, how did you respond?
Staying calm under pressure is a key requirement of working in the front line of a
bank. Your ability to maintain performance under stress and to use appropriate
coping techniques should be demonstrated.
Discuss the resources you use to meet the different work demands including
prioritizing, planning, scheduling and asking for assistance when appropriate.
7. Tell me about a time you saw a co-worker do something that

you didn't think was appropriate. What did you do?
Integrity and honesty are core to a banking job. Focus on taking immediate
action and using the right resources to deal with the problem. Emphasize your
commitment to adhering to company policies and regulations.
Your judgment is also under scrutiny here so describe your motivation to take
action.
8. How many scheduled days have you missed during the last
four months?
Bank interview questions will explore your reliability. Be honest about this as it
can always be verified with a reference check.
Focus on your reliability, punctuality and your willingness to work extra hours if
needed.
9. What are the most important qualities for a bank teller job?
Bank interview questions like this are asked to explore your understanding of
banking job requirements. Focus on technical skills such as:
 numeracy
 computer literacy
 product and services knowledge
Discuss key job competencies including accuracy, customer service orientation,
judgment, integrity, reliability and the ability to cope under pressure.
Point out your strengths as they relate to these qualities. Use this list of
strengths to help you.
The bank teller job description gives you a complete understanding of the job

requirements.
10. What made you decide to work for this bank?

This is your opportunity to shine. Do your homework and research the bank
before your interview. Make a list of three or four reasons why this is the right
bank for you. Use the resources at preparing for a job interview to help you with
this.

Data Science Interview Best

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Science Interview Best

Uploaded by

Copyright:

Available Formats

1) Mention what is the responsibility of a Data analyst?

Responsibility of a Data analyst include,

Identify new process or areas for improvement opportunities

Analyze, identify and interpret trends or patterns in complex data sets

Filter and “clean” data, and review computer reports

Determine performance indicators to locate and correct code problems

Securing database by developing access system by determining user level of access

2) What is required to become a data analyst?

To become a data analyst,

3) Mention what are the various steps in an analytics project?

Various steps in an analytics project include

Implementation and tracking

4) Mention what is data cleansing?

5) List out some of the best practices for data cleaning?

Some of the best practices for data cleaning includes,

Sort data by different attributes

6) Explain what is logistic regression?

7) List of some best tools that can be useful for data-analysis?

Google Search Operators

Google Fusion tables

The difference between data mining and data profiling is that

9) List out some common problems faced by data analyst?

Some of the common problems faced by data analyst are

Varying value representations

Identifying overlapping data

The missing patterns that are generally observed are

Missing completely at random

Usually, methods used by data analyst for data validation are

14) Explain what should be done with suspected or missing data?

To deal the multi-source problems,

A data scientist must have the following skills

Collaborative filtering is a simple algorithm to create a recommendation system based on user

23) Explain what is Map Reduce?

21) Explain what are the tools used in Big Data?

Tools used in Big Data includes

Properties for clustering algorithm are

Statistical methods that are useful for data scientist are

28) What is a hash table?

29) What are hash table collisions? How is it avoided?

30) Which imputation method is more favorable?

32) Explain what is the criteria for a good data model?

Criteria for a good data model includes

1. Degree and Domain Expertise

What are the best qualities of a data analyst?

Data cleaning is the process of fixing or removing incorrect, corrupted, incorrectly

Top 10 Data Analytics Tools You Need To Know In 2023

What is Data Profiling?

Data profiling Techniques

In structure discovery, the structural identity of the database should be maintained

Content discovery refers to the detailed analysis of structural discovery. It specifically

Relationship discovery establishes the relationship between various identities. It finds

Methods of data profiling

Cross table profiling

What is Data Mining?

What Is Data Modeling?

 Bottom-up data modeling: Bottom-up data modeling focuses on

Types of Data Models

 Conceptual data model: A conceptual data model describes business

Advantages of Data Models

 Improved quality: Before creating and building any applications, it's

Data Models in GoodData

GoodData's Logical Data Model

 GoodData LDM Modeler: The GoodData LDM Modeler supports the

Criteria for a good data model includes