
UNIT-2 DATA ANALYSIS

What is Data Collection?

Data collection is the process of gathering and evaluating information from multiple
sources to solve research problems, answer questions, evaluate outcomes, and forecast
trends and probabilities. It is an essential phase in all types of research, analysis,
and decision-making, including that done in the social sciences, business, and
healthcare.

Accurate data collection is necessary to make informed business decisions, ensure
quality assurance, and maintain research integrity.

During data collection, the researchers must identify the data types, the sources of data,
and what methods are being used. We will soon see that there are many different data
collection methods. There is heavy reliance on data collection in research, commercial,
and government fields.

Before collecting data, an analyst must answer three questions:

• What is the goal or purpose of this research?

• What kinds of data will be gathered?

• What methods and procedures will be used to collect, store, and process the
information?

Additionally, we can break up data into qualitative and quantitative types. Qualitative
data covers descriptions such as color, size, quality, and appearance. Quantitative data,
unsurprisingly, deals with numbers, such as statistics, poll numbers, percentages, etc.
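
The distinction matters in practice because the two kinds of data are summarized differently. The following is a minimal sketch, assuming a small set of invented survey records; the field names and the `summarize` helper are hypothetical and only illustrate the idea of counting categories for qualitative data and averaging for quantitative data.

```python
from collections import Counter
from statistics import mean

# Hypothetical survey records, invented for illustration only.
responses = [
    {"color": "red", "rating": 4},
    {"color": "blue", "rating": 5},
    {"color": "red", "rating": 3},
]

def summarize(records, field):
    """Count categories for descriptive values, average numeric values."""
    values = [record[field] for record in records]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        return {"kind": "quantitative", "mean": mean(values)}
    return {"kind": "qualitative", "counts": dict(Counter(values))}

color_summary = summarize(responses, "color")    # qualitative: category counts
rating_summary = summarize(responses, "rating")  # quantitative: average
```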

What is Data Classification?


Data classification is broadly defined as the process of organizing data by relevant
categories so that it may be used and protected more efficiently. On a basic level, the
classification process makes data easier to locate and retrieve. Data classification is of
particular importance when it comes to risk management, compliance, and data
security.

Data classification involves tagging data to make it easily searchable and trackable. It
also eliminates duplicate copies of data, which can reduce storage and backup
costs while speeding up the search process. Though the classification process may
sound highly technical, it is a topic that your organization’s leadership should
understand.

Types of Data Classification


Data classification often involves a multitude of tags and labels that define the type of
data, its confidentiality, and its integrity. Availability may also be taken into
consideration. Data is often classified by its level of sensitivity, based on varying
levels of importance or confidentiality, and each classification level then determines the
security controls and protection measures applied to it.

There are three main types of data classification that are considered industry standards:

• Content-based classification inspects and interprets files looking for sensitive
information.
• Context-based classification looks at application, location, or creator, among other
variables, as indirect indicators of sensitive information.
• User-based classification depends on a manual, end-user selection of each
document. It relies on user knowledge and discretion at creation, edit, review, or
dissemination to flag sensitive documents.
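
Content-based classification, the first type listed above, can be sketched as a scan of a document's text for patterns that suggest sensitive information. This is only an illustration under stated assumptions: the two regular expressions, the pattern names, and the "confidential" and "public" labels are hypothetical, and real classification tools rely on much more robust detection.

```python
import re

# Hypothetical patterns and labels, chosen only for illustration.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "card_number": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def classify_text(text):
    """Return a sensitivity label and the names of the patterns that matched."""
    hits = sorted(
        name for name, pattern in SENSITIVE_PATTERNS.items()
        if pattern.search(text)
    )
    label = "confidential" if hits else "public"
    return label, hits

label, hits = classify_text("Contact jane.doe@example.com for the invoice.")
```

A context-based or user-based approach would replace the text scan with metadata (such as the creating application) or with a label chosen by the user.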

What is data management and why is it important?

Data management is the process of ingesting, storing, organizing and maintaining
the data created and collected by an organization. Effective data management is a
crucial piece of deploying the IT systems that run business applications and
provide analytical information to help drive operational decision-making and
strategic planning by corporate executives, business managers and other end users.

The data management process includes a combination of different functions that
collectively aim to make sure the data in corporate systems is accurate, available
and accessible. Most of the required work is done by IT and data management
teams, but business users typically also participate in some parts of the process to
ensure that the data meets their needs and to get them on board with policies
governing its use.

Importance of data management

Data increasingly is seen as a corporate asset that can be used to make
better-informed business decisions, improve marketing campaigns, optimize business
operations and reduce costs, all with the goal of increasing revenue and profits. But a
lack of proper data management can saddle organizations with incompatible data silos,
inconsistent data sets and data quality problems that limit their ability to run business
intelligence (BI) and analytics applications -- or, worse, lead to faulty findings.

Data management has also grown in importance as businesses are subjected to an
increasing number of regulatory compliance requirements, including data privacy and
protection laws such as GDPR and the California Consumer Privacy Act (CCPA). In
addition, companies are capturing ever-larger volumes of data and a wider variety of
data types -- both hallmarks of the big data systems many have deployed. Without good
data management, such environments can become unwieldy and hard to navigate.

What Is Big Data Management?

Big data management refers to the organization, administration and governance of large
volumes of unstructured and structured data. A high level of data quality and
accessibility for business intelligence and big data analytics applications is the aim of
big data management. Businesses, enterprises, and governments use big data
management strategies to tackle the vast and rapidly expanding data pools that typically
have hundreds of terabytes or even petabytes of data stored in various file formats.
Facebook, for instance, gets over 500 terabytes of new data into its databases daily.
Effective big data management helps a company locate valuable information in large
volumes of unstructured and semi-structured data from a variety of disparate sources,
such as call records, system logs, images, social media sites, and sensors.

Big data management includes the following processes:

• Using a centralized interface or dashboard to monitor and ensure the availability of all
big data resources

• Maintaining databases to get better outcomes

• Implementing and monitoring big data analytics, big data reporting and other similar
solutions

• Designing and implementing data life-cycle processes efficiently

• Controlling access to, and the security of, big data repositories

• Using data virtualization to reduce data volume and improve big data operations

• Applying data virtualization techniques so that multiple users can work with the same
data set simultaneously

• Capturing and storing data from all resources


Big Data Management Benefits


Although implementing big data management has its challenges, it also brings numerous
benefits. Let us take a look at some of them.

• Higher revenue: Well-managed data and better data quality solutions help
organizations increase revenue.

• Better customer service: Big data initiatives frequently name customer service as a
primary objective, and big data management helps deliver it.

• Better marketing: Timely and personalized customer communications, made possible
mainly by better data quality, improve the quality of marketing.

• Cost-effectiveness: Big data management makes efforts to decrease expenses more
efficient, so processes become more cost-effective.

• Accurate analytics: Big data management practices can improve the accuracy and
dependability of big data analytics. When well-formed data enters the analytics
solution, the organization is positioned to get high-quality business insights from it.

• Competitive advantage: Big data management gives businesses an edge over their
rivals because it enables analytics.

Outliers in data mining

"Outliers" are data points that fall outside of what is expected. What matters most about
outliers is what you do with them. Whenever you analyze a data set, you make some
assumptions about how the data was generated. Data points that are likely to contain
some form of error are treated as outliers, and depending on the context, you will want
to correct those errors. The data mining process involves analyzing data and making
predictions from it. In 1969, Grubbs introduced an early, widely cited definition of
outliers.

Difference between outliers and noise

Noise is any unwanted error or random variance in a measured variable. Before looking
for outliers in a data set, it is recommended to remove the noise first.
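
The text does not name a specific noise-removal technique, so the following is only a sketch of one common option, a centered moving average, with invented readings. The `moving_average` helper and its `window` parameter are assumptions made for illustration.

```python
def moving_average(values, window=3):
    """Smooth a sequence by averaging each point with its neighbours."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        # Shrink the window at the edges instead of padding the data.
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        neighbourhood = values[lo:hi]
        smoothed.append(sum(neighbourhood) / len(neighbourhood))
    return smoothed

# Hypothetical sensor readings with small random fluctuations.
readings = [10.0, 12.0, 11.0, 13.0, 12.0]
smoothed = moving_average(readings)
```

A small window is a deliberate choice here: heavy smoothing can also flatten genuine outliers, which would defeat the purpose of the later detection step.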

Types of Outliers

Outliers are divided into three different types:

1. Global or point outliers
2. Collective outliers
3. Contextual or conditional outliers

Global Outliers

Global outliers, also called point outliers, are the simplest form of outlier. A data
point that deviates markedly from all the other data points in a given data set is a
global outlier. Most outlier detection procedures are aimed at finding global outliers.
In the figure, the green data point is the global outlier.
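
As a rough illustration of detecting a global outlier, the sketch below uses the interquartile range (IQR) rule, which flags values far outside the middle half of the data. The text does not prescribe a detection method, so the IQR rule, the multiplier `k=1.5`, and the sample values are all assumptions.

```python
from statistics import quantiles

def global_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements; 95 sits far away from every other point.
sample = [10, 12, 11, 13, 12, 11, 95]
outliers = global_outliers(sample)
```

The IQR rule is used here instead of a mean-and-standard-deviation test because a single extreme value inflates the standard deviation and can hide itself in small samples.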

What is data visualization?

Data visualization is the process of using visual elements like charts,
graphs, or maps to represent data. It translates complex, high-volume,
or numerical data into a visual representation that is easier to
process. Data visualization tools improve and automate the visual
communication process for accuracy and detail. You can use the
visual representations to extract actionable insights from raw data.
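
To make the idea concrete, here is a minimal sketch that turns a few numbers into a text-only bar chart using nothing but the Python standard library. The `text_bar_chart` helper and the sales figures are hypothetical; dedicated visualization tools produce far richer charts, but the principle of mapping values to visual length is the same.

```python
def text_bar_chart(data, width=20):
    """Render a dict of label -> non-negative number as horizontal text bars."""
    largest = max(data.values())
    label_width = max(len(label) for label in data)
    lines = []
    for label, value in data.items():
        # Scale each bar relative to the largest value.
        bar_length = round(value / largest * width) if largest else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * bar_length} {value}")
    return "\n".join(lines)

# Hypothetical monthly sales figures, invented for illustration.
sales = {"Jan": 120, "Feb": 180, "Mar": 90}
chart = text_bar_chart(sales)
print(chart)
```

Even in this crude form, the longest bar makes the strongest month visible at a glance, which is harder to see in the raw numbers.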

Why is data visualization important?


Modern businesses typically process large volumes of data from
various data sources, such as the following:

• Internal and external websites
• Smart devices
• Internal data collection systems
• Social media

But raw data can be hard to comprehend and use. Hence, data
scientists prepare and present data in the right context. They give it a
visual form so that decision-makers can identify the relationships
between data and detect hidden patterns or trends. Data visualization
creates stories that advance business intelligence and support
data-driven decision-making and strategic planning.

What are the benefits of data visualization?


Some benefits of data visualization are as follows:

Strategic decision-making

Key stakeholders and top management use data visualization to
interpret data meaningfully. They save time through faster data
analysis and the ability to visualize the bigger picture. For example,
they can identify patterns, discover trends, and gain insights to remain
ahead of the competition.
