Introduction:
Overview of Big Data: Big data is data that contains greater variety, arrives in increasing
volumes, and moves with more velocity. The term refers to large volumes of data that keep
growing rapidly over time. It includes structured, semi-structured, and unstructured data
that is too large and complex to be managed by traditional data management tools.
Specialized big data management tools are required to store and process it.
Data Analytics: Data analytics is the process of analysing raw data and drawing
conclusions from it. Examining the data uncovers patterns and yields valuable insights.
The aim of data analytics is to enhance productivity and business gain. It helps companies
understand their customers better, plan strategies accordingly, and develop products.
The four basic types of data analytics are descriptive, diagnostic, predictive, and
prescriptive.
Use of Data Analytics: Data analytics plays an important role in several key domains and
strategic planning techniques:
• Improved Decision-Making – When supporting data backs a decision, we can implement
it with a higher probability of success. For example, if a certain decision or plan
has led to better outcomes in the past, the data justifies implementing it again.
• Better Customer Service – Churn modelling is a good example: we try to predict or
identify what leads to customer churn and change those things accordingly, so that
customer attrition stays as low as possible, which is an important factor for any
organization.
• Efficient Operations – Data analytics helps us understand what a situation demands
and what should be done to get better results, so that we can streamline our
processes, which in turn leads to efficient operations.
• Effective Marketing – Market segmentation techniques address this area: we look for
the marketing techniques that will help us increase sales and lead to effective
marketing strategies.
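As a small illustration of the churn idea above, the sketch below computes the churn rate per customer segment, which is often a first step before building a predictive churn model. The customer records, the segment names ("monthly", "annual"), and the function name churn_rate_by_segment are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical customer records: (contract type, whether the customer churned).
customers = [
    ("monthly", True),
    ("monthly", True),
    ("monthly", False),
    ("monthly", False),
    ("annual", False),
    ("annual", False),
    ("annual", False),
    ("annual", True),
]


def churn_rate_by_segment(records):
    """Return the fraction of churned customers in each segment."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for segment, has_churned in records:
        totals[segment] += 1
        if has_churned:
            churned[segment] += 1
    return {segment: churned[segment] / totals[segment] for segment in totals}


rates = churn_rate_by_segment(customers)
for segment, rate in sorted(rates.items()):
    print(f"{segment}: {rate:.0%} churn")
```

With this made-up data, monthly customers churn at twice the rate of annual customers, which would suggest looking more closely at what drives monthly customers away.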
The Data Scientist: Data scientists are a new breed of analytical data experts who have
the technical skills to solve complex problems and the curiosity to explore what problems
need to be solved. They are part mathematician, part computer scientist, and part
trend-spotter.
A data scientist uses data to understand and explain the phenomena around them
and to help organizations make better decisions.
Data scientists draw on mathematics, statistics, and computer science to collect,
analyse, and interpret large amounts of data. They are responsible for providing
insights beyond statistical analyses.
Challenges of Big Data:
• Storage: With vast amounts of data generated daily, the greatest challenge is
storing it within legacy systems, especially when the data arrives in different
formats. Unstructured data does not fit well into traditional relational databases.
• Processing: Processing big data refers to reading, transforming, extracting, and
formatting useful information from raw data. Getting input and output into unified
formats continues to present difficulties.
• Security: Security is a big concern for organizations. Unencrypted information is at
risk of theft or damage by cyber-criminals. Data security professionals must
therefore balance access to data against maintaining strict security protocols.
• Finding and Fixing Data Quality Issues: Many organizations deal with challenges
related to poor data quality, but solutions are available.
• Scaling Big Data Systems: Database sharding, memory caching, moving to the cloud,
and separating read-only and write-active databases are all effective scaling
methods. Each approach helps on its own, and combining them allows a system to
scale further.
• Evaluating and Selecting Big Data Technologies: Companies are spending millions on
new big data technologies, and the market for such tools is expanding rapidly as the
IT industry has caught on to the potential of big data and analytics in recent years.
• Big Data Environments: In a big data environment, data is constantly being ingested
from various sources, which makes it more dynamic than a data warehouse. The people
in charge of the environment can quickly lose track of where each data collection
came from and what it contains.
• Real-Time Insights: The term "real-time analytics" describes the practice of
performing analyses on data as a system is collecting it. Decisions may be made
more efficiently and with more accurate information thanks to real-time analytics
tools, which use logic and mathematics to deliver insights on this data quickly.
• Data Validation: Before using data in a business process, its integrity, accuracy, and
structure must be validated. The output of a data validation procedure can be used
for further analysis, BI, or even to train a machine learning model.
• Healthcare Challenges: Electronic health records (EHRs), genomic sequencing,
medical research, wearables, and medical imaging are just a few examples of the
many sources of health-related big data.
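The Data Validation item above names three things to check before data is used: structure, integrity, and accuracy. A minimal sketch of such a check is shown below. The record fields (patient_id, heart_rate, recorded_on), the schema, and the plausible heart-rate range are assumptions made up for this illustration, not part of any particular tool or standard.

```python
from datetime import date

# Hypothetical schema: field name -> expected Python type.
SCHEMA = {"patient_id": str, "heart_rate": int, "recorded_on": date}


def validate_record(record):
    """Return a list of problems found in one record (empty if it is valid)."""
    problems = []
    # Structure: every expected field is present and has the expected type.
    for field, expected_type in SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(f"wrong type for {field}")
    if problems:
        # Value checks below only make sense on a well-formed record.
        return problems
    # Integrity: the identifier must not be blank.
    if not record["patient_id"].strip():
        problems.append("empty patient_id")
    # Accuracy: values must fall in a plausible range (assumed bounds).
    if not 20 <= record["heart_rate"] <= 250:
        problems.append("heart_rate out of range")
    if record["recorded_on"] > date.today():
        problems.append("recorded_on is in the future")
    return problems


# Hypothetical input records.
records = [
    {"patient_id": "p-001", "heart_rate": 72, "recorded_on": date(2023, 1, 5)},
    {"patient_id": "p-002", "heart_rate": 900, "recorded_on": date(2023, 1, 5)},
    {"patient_id": "p-003", "heart_rate": "n/a"},
]
valid = [r for r in records if not validate_record(r)]
print(len(valid), "of", len(records), "records passed validation")
```

Only the records that pass would then be handed on to further analysis, BI, or model training. Returning a list of problems rather than a single true/false value makes it easier to report why a record was rejected.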
Statistical Concepts:
Hypothesis Testing Procedure:
1. State the hypotheses - State both the null and the alternative
hypothesis. The hypotheses should be stated so that they are mutually
exclusive: if one is true, the other must be false.
2. Formulate an analysis plan - The analysis plan describes how to
use the sample data to evaluate the null hypothesis. The evaluation
centers on a single test statistic.
3. Analyze sample data - Find the value of the test statistic (such as a
mean score, proportion, t statistic, or z-score) stated in the analysis
plan.
4. Interpret results - Apply the decision rule stated in the analysis plan.
If the value of the test statistic is very unlikely under the null
hypothesis, reject the null hypothesis.
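The four steps above can be sketched with a short worked example. It uses a two-sided one-sample z-test, which is only one possible choice of test statistic. The sample values, the hypothesized mean of 500, the assumed known standard deviation of 3.0, and the 0.05 significance level are all made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean

# Hypothetical sample: fill weights (in grams) of 10 packages.
sample = [498, 502, 495, 501, 497, 499, 503, 496, 494, 500]

# Step 1: state the hypotheses.
# H0: the population mean equals 500. H1: the population mean differs from 500.
mu_0 = 500

# Step 2: formulate the analysis plan.
# Two-sided z-test at significance level 0.05, with the population
# standard deviation assumed to be known (an assumption of this sketch).
alpha = 0.05
sigma = 3.0

# Step 3: analyze the sample data by computing the test statistic.
sample_mean = mean(sample)
z = (sample_mean - mu_0) / (sigma / sqrt(len(sample)))

# Probability of a z value at least this extreme if H0 is true.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 4: interpret the results using the rule from the analysis plan.
reject_null = p_value < alpha
print(f"z = {z:.3f}, p-value = {p_value:.3f}, reject H0: {reject_null}")
```

With this sample the p-value is above 0.05, so the null hypothesis is not rejected. If the population standard deviation were unknown, a t statistic based on the sample standard deviation would be the usual alternative.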