
Lecture №6

Data analysis. Data management.


Aim of the lecture: to learn the basics of data analysis and the methods of
collection, classification and prediction.
• After studying this session, you will have an understanding of:
• Data analysis.
• Data management.
• Visualization of data.
1. Data Science Process
1. Data analysis basics. The process
• Data analysis is a process for obtaining raw data and
subsequently converting it into information useful for
decision-making by users.[1]
• The gathering of data should be planned so as to make its analysis
easier, more precise or more accurate.

• 1. "Transforming Unstructured Data into Useful Information", Big Data, Mining, and Analytics,
Auerbach Publications, pp. 227–246, 2014-03-12, doi:10.1201/b16666-14, ISBN
978-0-429-09529-0, retrieved 2021-05-29
1. Data analysis basics. Methods of collection,
classification and prediction.
• Data analysis is a process of inspecting, cleansing, transforming, and
modelling data with the goal of discovering useful information,
informing conclusions, and supporting decision-making.[1]
• Data analysis is used in different business, science, and social science domains.
• Data analysis plays a role in making decisions more scientific and
helping businesses operate more effectively.[3]
Data analysis. (2022, August 24). In Wikipedia. https://en.wikipedia.org/wiki/Data_analysis
https://en.wikipedia.org/wiki/Data_mining
• Data mining is the process of extracting and discovering patterns in
large data sets involving methods at the intersection of
machine learning, statistics, and database systems.[1]
• Data mining is an interdisciplinary subfield of computer science and
statistics with an overall goal of extracting information (with
intelligent methods) from a data set and transforming the
information into a comprehensible structure for further use.[1][2][3][4]
• Aside from the raw analysis step, it also involves database and
data management aspects, data pre-processing, … visualization, and
online updating.[1]
• The data analysis process consists of:
1. Data Requirement Gathering: Ask yourself why you’re doing this
analysis, what type of data analysis you want to use, and what data
you are planning on analyzing.
2. Data Collection: Guided by the requirements you’ve identified, it’s
time to collect the data from your sources. Sources include case
studies, surveys, interviews, questionnaires, direct observation, and
focus groups. Make sure to organize the collected data for analysis.
3. Data Cleaning: Not all of the data you collect will be useful, so it’s
time to clean it up. This process is where you remove white spaces,
duplicate records, and basic errors. Data cleaning is mandatory
before sending the information on for analysis.
4. Data Analysis: Analyze the cleaned data with the tool of your choice.
Data analysis tools include Excel, Python, R, Looker, RapidMiner,
Chartio, Metabase, Redash, and Microsoft Power BI.
5. Data Interpretation: Now that you have your results, you need to
interpret them and come up with the best courses of action, based on
your findings.
6. Data Visualization: Data visualization is a fancy way of saying,
“graphically show your information in a way that people can read and
understand it.” You can use charts, graphs, maps, bullet points…
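The data cleaning step (step 3) can be illustrated with a minimal sketch. It assumes hypothetical records of the form (name, age) entered as text; the field names, the plausible age range and the sample rows are illustrative assumptions, not part of the lecture material.

```python
def clean_records(rows):
    """Strip white space, drop rows with basic errors, and remove duplicates."""
    seen = set()
    cleaned = []
    for name, age in rows:
        name = " ".join(name.split())  # trim and collapse white space
        age = age.strip()
        # Basic error check (assumed rule): a name is required and the age
        # must be a whole number between 0 and 120.
        if not name or not age.isdigit() or not 0 <= int(age) <= 120:
            continue
        record = (name.title(), int(age))
        if record in seen:  # duplicate record, keep only the first copy
            continue
        seen.add(record)
        cleaned.append(record)
    return cleaned


raw = [
    ("  Alice ", "34"),
    ("alice", " 34"),       # duplicate once white space and case are normalized
    ("Bob", "-5"),          # basic error: not a valid age
    ("Carol  Diaz", "41"),
    ("", "29"),             # basic error: missing name
]
print(clean_records(raw))  # [('Alice', 34), ('Carol Diaz', 41)]
```

In practice the rules for what counts as a "basic error" come from the requirements gathered in step 1.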
Types of Data Analysis

• Diagnostic Analysis: Diagnostic analysis answers the question, “Why did this
happen?” Using insights gained from statistical analysis (more on that later!),
analysts use diagnostic analysis to identify patterns in data.
• Predictive Analysis: Predictive analysis answers the question, “What is most likely
to happen?” By using patterns found in older data as well as current events,
analysts predict future events. There is no such thing as 100 percent
accurate forecasting.
• Prescriptive Analysis: Mix all the insights gained from the other data analysis
types, and you have prescriptive analysis. Sometimes, an issue can’t be solved
solely with one analysis type, and instead requires multiple insights.
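As an illustration of predictive analysis, the sketch below fits a straight line to older data by ordinary least squares and extends it one period ahead. The monthly sales figures are made up, and a linear trend is only one of many possible prediction methods; real data is noisier and the forecast is never exact.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


months = [1, 2, 3, 4, 5]            # hypothetical past periods
sales = [100, 110, 120, 130, 140]   # hypothetical observed values

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept    # predicted value for month 6
print(round(forecast, 2))           # 150.0
```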

Statistical analysis
It's the science of collecting, exploring and presenting large
amounts of data to discover underlying patterns and trends.

Statistics are applied every day – in research, industry and
government – to become more scientific about decisions that
need to be made.
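A minimal sketch of the "exploring and presenting" part of statistical analysis, using Python's standard `statistics` module. The list of scores is invented for illustration.

```python
import statistics

scores = [62, 70, 70, 75, 81, 88, 94]  # hypothetical sample

summary = {
    "mean": statistics.mean(scores),      # arithmetic average
    "median": statistics.median(scores),  # middle value of the sorted data
    "mode": statistics.mode(scores),      # most frequent value
    "stdev": statistics.stdev(scores),    # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Summary figures like these are usually the first step before looking for patterns and trends in the data.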

Text Analysis: Also called “text mining”.
• Text analysis uses databases and data mining tools to discover
patterns residing in large datasets.
• It transforms raw data into useful business information.
• Text analysis is arguably the most straightforward and the most direct
method of data analysis.
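One very simple form of text analysis is counting which terms occur most often. The sketch below assumes a few invented customer reviews and a small hand-picked stop-word list; real text analysis tools go much further.

```python
import re
from collections import Counter

# Assumed stop-word list for this example only.
STOP_WORDS = {"the", "a", "is", "and", "was", "but", "it"}


def top_terms(texts, n=3):
    """Count words across all texts, ignoring case and stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)


reviews = [
    "The delivery was late and the packaging was damaged.",
    "Late delivery, but the product is great.",
    "Great product and great price.",
]
print(top_terms(reviews))
```

Here the most frequent term is "great" (3 occurrences), followed by terms such as "delivery", "late" and "product" (2 each), which already hints at what customers talk about.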
Data Management.

• Introduction to Data Management.


The approach depends on the area of implementation (for example, epidemiology
versus a business project).
The Data Management System. The data management system is the set
of procedures and people through which information is processed. It includes the tools
for collection, manipulation, storage and retrieval of information.
The purpose is to ensure:
a) high quality data
b) accurate, appropriate, and defensible analysis and interpretation
Data Management.
• Data management is the process of ingesting, storing, organizing and
maintaining the data created and collected by an organization.
Effective data management is a crucial piece of deploying the IT
systems that run business applications and provide analytical
information to help drive operational decision-making and strategic
planning by corporate executives, business managers and other end
users.
• The data management process includes a combination of
different functions that collectively aim to make sure that the
data in corporate systems is accurate, available and
accessible.
The Data Management System.
• Acquire data and prepare them for analysis.
• The data management system includes an overview of the flow of
data from research subjects to data analysts. Before they can be analyzed,
data must be collected, reviewed, coded, computerized, verified,
checked, and converted to forms suited for the analyses to be conducted.
The process must be adequately documented to provide the foundation
for analyses and interpretation.
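The "coded" and "verified" steps can be sketched as follows. The codebook, the record IDs and the answers are hypothetical, and double data entry (keying the same forms twice and comparing the results) is used here only as one common way to verify computerized data.

```python
# Hypothetical codebook mapping free-text answers to numeric codes.
CODEBOOK = {"never": 0, "former": 1, "current": 2}


def code_response(answer):
    """Return the numeric code for an answer, or None if it is not in the codebook."""
    return CODEBOOK.get(answer.strip().lower())


def double_entry_mismatches(first_pass, second_pass):
    """List the record IDs whose coded values differ between two independent entries."""
    return sorted(rid for rid in first_pass if first_pass[rid] != second_pass.get(rid))


# The same three forms keyed in by two different people.
entry_a = {"001": "Never", "002": "current", "003": "Former"}
entry_b = {"001": "never", "002": "former", "003": "Former "}

coded_a = {rid: code_response(ans) for rid, ans in entry_a.items()}
coded_b = {rid: code_response(ans) for rid, ans in entry_b.items()}

print(double_entry_mismatches(coded_a, coded_b))  # ['002']
```

Records flagged this way are checked against the original form before the data are converted for analysis.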

• Maintain quality control and data security. Threats to data quality
arise at every point where data are obtained and/or modified. The value
of the research will be greatly affected by quality control, but achieving
and maintaining quality requires activities that are often mundane and
difficult to motivate.
• Quality control includes:
• Preventing and detecting errors in data through written procedures, training,
verification procedures, and avoidance of undue complexity.
• Avoiding or eliminating inconsistencies, errors, and missing data through
review of data collection forms (ideally while access to the data source is still
possible to enable uncertainties to be resolved) and datasets.
• Assessing the quality of the data through notes kept by interviewers, coders,
and data editors, through debriefing of subjects, and through reviews or repetition
of data collection for subsamples.
• Avoiding major misinterpretations and oversights through “getting a feel” for
the data.
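Part of this review can be automated. The sketch below flags missing values and out-of-range entries in completed forms; the field names, the age limits and the sample forms are assumptions made for illustration.

```python
# Hypothetical fields that every completed form must contain.
REQUIRED_FIELDS = ("subject_id", "age", "visit_date")


def find_problems(record):
    """Return a list of quality problems found in one data collection form."""
    problems = []
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    age = record.get("age")
    # Assumed plausibility rule: ages outside 0-120 are treated as errors.
    if isinstance(age, int) and not 0 <= age <= 120:
        problems.append("age out of range")
    return problems


forms = [
    {"subject_id": "S01", "age": 42, "visit_date": "2022-03-01"},
    {"subject_id": "S02", "age": 420, "visit_date": "2022-03-02"},
    {"subject_id": "S03", "age": None, "visit_date": ""},
]

report = {form["subject_id"]: find_problems(form) for form in forms}
for subject, problems in report.items():
    print(subject, problems or "ok")
```

Running such checks while the data source is still accessible makes it possible to resolve the flagged uncertainties.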
Security concerns:
(1) legal,
(2) safety of the information,
(3) protection from external sources,
(4) protection from internal sources.
• Support inquiries, review, reconstruction, and archiving.

Special care is needed to design a data management system that can
prevent the possibility of linking data to individual subjects. For
example, standard data collection procedures such as
• the use of ID numbers,
• inclusion of exact dates on all forms,
• and recording of supplemental information
may need to be modified, since they can make such linking possible.
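One possible way to reduce the risk of linking data to individual subjects is sketched below: the ID number is replaced by a keyed hash and the exact date is reduced to year and month. This is an illustrative assumption, not a procedure prescribed by the lecture; the key, the field names and the record are hypothetical, and such measures alone do not guarantee anonymity.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be stored apart from the data.
SECRET_KEY = b"replace-with-a-key-stored-separately"


def pseudonym(subject_id):
    """Replace an ID number with a keyed hash that cannot be reversed without the key."""
    digest = hmac.new(SECRET_KEY, subject_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


def coarsen_date(iso_date):
    """Keep only the year and month of a YYYY-MM-DD date."""
    return iso_date[:7]


record = {"subject_id": "S01", "visit_date": "2022-03-17", "answer": 2}

safe = {
    "pid": pseudonym(record["subject_id"]),
    "visit_month": coarsen_date(record["visit_date"]),
    "answer": record["answer"],
}
print(safe)
```

The same subject always receives the same pseudonym, so records can still be joined for analysis without exposing the original ID.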
The data are merely the objects being manipulated by the data
management system.
• Two-way communication.
• Consistency. Consistency is essential in the implementation of the protocol, in
the data collection process, and with regard to decisions made during the project.
• Lines of authority and responsibility. Authority and responsibility need to be
clearly defined and the designated persons accessible to the staff.
• Flexibility. The data management system must be flexible to respond to changes
in the protocol, survey instruments, and staff.
• Simplicity. Integration. Standardization. Pilot testing.
Visualization of data
https://www.youtube.com/watch?v=YaGqOPxHFkc

• Data visualization is the graphical representation of information and
data. By using visual elements like charts, graphs, and maps, data
visualization tools provide an accessible way to see and understand
trends, outliers, and patterns in data.
• Data visualization refers to transforming figures and raw data into
visual objects: points, bars, line plots, maps, etc. By combining
user-friendly and aesthetically pleasing features, these visualizations make
research and data analysis much quicker and are also a powerful
communication tool.
• Aim: to make data easier for the human brain to understand and
pull insights from.
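The idea of turning raw figures into visual objects can be shown even without a plotting library. The sketch below draws a plain-text bar chart; the daily visit counts are invented, and in practice a charting tool would normally be used instead.

```python
def text_bar_chart(data, width=20):
    """Render a dict of label -> value as horizontal bars scaled to `width` characters."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / largest * width)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return "\n".join(lines)


visits = {"Mon": 120, "Tue": 90, "Wed": 60}  # hypothetical daily counts
print(text_bar_chart(visits))
```

Even in this crude form, the relative lengths of the bars show the downward trend faster than reading the three numbers.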
