ITS TYPES
2. Economic status (poor, middle income, wealthy), course grades (A+, A-, B+, B-, C),
education level (Elementary, High School, College, Graduate, Post-graduate)
3. DISCRETE DATA
3. Number of employees
The number of employees a company has is another type of discrete data. Companies may
track their number of employees because this information is relevant to their growth
goals. Some companies also try to maintain a specific ratio of management to lower-level
employees to ensure every employee receives guidance and direction in their roles.
4. CONTINUOUS DATA
4. The amount of time required to complete a project, the height of children, the amount of
time it takes to sell shoes.
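The examples above can be illustrated with a minimal Python sketch. It assumes the education levels are listed from lowest to highest, and all numbers (employee counts, project duration) are hypothetical values chosen only for illustration.

```python
# Ordered categories taken from the education-level example.
# Assumption: the list runs from lowest to highest level.
EDUCATION_LEVELS = ["Elementary", "High School", "College", "Graduate", "Post-graduate"]


def education_rank(level):
    """Return the position of a level in the ordered list (0 = lowest)."""
    return EDUCATION_LEVELS.index(level)


# Discrete data: values are counted, so whole numbers are natural.
# Hypothetical company figures.
total_employees = 120
managers = 15
staff_per_manager = (total_employees - managers) / managers

# Continuous data: values are measured and can take any value in a range.
# Hypothetical project duration in days.
project_duration_days = 12.75

print(education_rank("College") < education_rank("Graduate"))  # ordering is meaningful
print(staff_per_manager)
```

The ratio of lower-level employees to managers is derived from two discrete counts, while the project duration is stored as a floating-point number because it is a measurement.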
SIMILARITY AND DISSIMILARITY
A similarity measure is a mathematical function that quantifies how alike two objects or data points are. It takes
two data points as input and produces a numerical score as output, typically ranging from 0 (completely
dissimilar) to 1 (identical or perfectly similar), although some measures, such as cosine similarity and the
Pearson correlation coefficient, can range from -1 to 1. Common similarity measures include cosine similarity,
Jaccard similarity, and the Pearson correlation coefficient. Similarity measures are generally used to identify
duplicate records, equivalent instances, or clusters.
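The three measures named above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function names are chosen here for clarity, inputs are assumed to be equal-length lists of numbers (or sets, for Jaccard), and treating two empty sets as identical is a convention adopted for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def jaccard_similarity(s, t):
    """Size of the intersection divided by size of the union of two sets."""
    s, t = set(s), set(t)
    if not s and not t:
        return 1.0  # convention for this sketch: two empty sets are identical
    return len(s & t) / len(s | t)


def pearson_correlation(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_a * var_b)


print(cosine_similarity([1, 0], [1, 0]))                       # identical direction
print(jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"}))    # 2 shared out of 4
print(pearson_correlation([1, 2, 3], [2, 4, 6]))               # perfectly linear
```

Jaccard similarity works on sets, so it suits categorical or tag-like data, while cosine similarity and Pearson correlation work on numeric vectors.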
A dissimilarity measure is a mathematical function that quantifies how different two objects or data points are.
It takes two data points as input and produces a numerical score as output, where 0 means identical or perfectly
similar and larger values mean more dissimilar. Some dissimilarity measures are scaled to the range 0 to 1,
while distance measures such as Euclidean and Manhattan distance have no fixed upper limit. Common
dissimilarity measures include Euclidean distance, Manhattan distance, and Hamming distance. Dissimilarity
measures are often used to identify outliers, anomalies, or clusters.
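The three distances named above can be sketched in plain Python as follows. This is a minimal illustration under the assumption that both inputs have the same length; the function names are chosen here for clarity.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


print(euclidean_distance([0, 0], [3, 4]))      # 5.0
print(manhattan_distance([0, 0], [3, 4]))      # 7
print(hamming_distance("karolin", "kathrin"))  # 3
```

All three return 0 for identical inputs. Euclidean and Manhattan distance grow without limit as points move apart, whereas Hamming distance can never exceed the sequence length.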
STATISTICAL SIGNIFICANCE
Statistical significance refers to the claim that a set of observed data are not
the result of chance but can instead be attributed to a specific cause.
Statistical significance is important for academic disciplines or practitioners
that rely heavily on analyzing data and research, such as economics,
finance, investing, medicine, physics, and biology.
A high degree of statistical significance indicates that an observed
relationship is unlikely to be due to chance. The calculation of statistical
significance is itself subject to error: a test can flag a chance result as
significant (a false positive) or miss a real effect (a false negative).
Statistical significance can be misinterpreted when researchers do not use
language carefully in reporting their results. Several types of significance tests
are used, depending on the research being conducted.
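As one illustration of a significance test, the sketch below runs an exact two-sided permutation test on the difference between two group means. The choice of test, the function name, and the sample values are assumptions made for this example and do not come from the text above. The test asks how often a random split of the pooled data produces a difference at least as large as the one observed; that fraction is the p-value.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means.

    Enumerates every way of splitting the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    splits = 0
    at_least_as_extreme = 0
    for indices in combinations(range(len(pooled)), n_a):
        splits += 1
        sum_a = sum(pooled[i] for i in indices)
        # Small tolerance guards against floating-point rounding.
        if abs_mean_diff(sum_a) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / splits


# Hypothetical measurements for two small groups.
p = permutation_p_value([1, 2, 3], [7, 8, 9])
print(p)  # 0.1: only 2 of the 20 possible splits are as extreme as the observed one
```

Under the common convention of a 0.05 threshold (a convention, not a rule from the text), a p-value of 0.1 would not be called statistically significant, even though the two groups look very different. With only three values per group, chance alone produces such a split fairly often.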
CONCLUSION
• Data = Knowledge. Good data provides indisputable evidence, while anecdotal evidence, assumptions, or abstract
observation might lead to wasted resources due to taking action based on an incorrect conclusion.
• Data allows organizations to measure the effectiveness of a given strategy. When strategies are put into place to
overcome a challenge, collecting data will allow you to determine how well your solution is performing and
whether your approach needs to be tweaked or changed over the long term.
• Data allows organizations to more effectively determine the cause of problems. Data allows organizations to
visualize relationships between what is happening in different locations, departments, and systems.
• Data is a key component of systems advocacy. Utilizing data will help present a strong argument for systems
change. Whether you are advocating for increased funding from public or private sources, or making the case for
changes in regulation, illustrating your argument through the use of data will allow you to demonstrate why
changes are needed.
• Data increases efficiency. Effective data collection and analysis will allow you to direct scarce resources where they
are most needed. If an increase in significant incidents is noted in a particular service area, this data can be
dissected further to determine whether the increase is widespread or isolated to a particular site. If the issue is
isolated, training, staffing, or other resources can be deployed precisely where they are needed, as opposed to
system-wide. Data will also help organizations determine which areas should take priority over others.
ACKNOWLEDGEMENT
I would like to express my gratitude and special thanks to my respected Professor, Miss
Prithwa Ghosh, for giving me the golden opportunity to work on this project, “A Brief
Overview on Data and Its Types”, and for helping me complete it on time. I learned
about many fields that were new to me while working on this project.