Component 2 Part A

Data collection and analysis

Characteristics of data and information

Holiday Data

First column – Customer email. Emails have the format username@domain.com. This format has three parts: the username, which is unique to the address owner and can contain a combination of letters, numbers and special characters; the @ symbol, which separates the username from the domain name; and the domain name, which shows the organisation that provides the email service.
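The three-part structure described above can be checked with a short script. This is only a minimal sketch in Python. The function name and the rule that the domain must contain a dot are assumptions made for illustration. Real email validation is much more complicated than this.

```python
def is_valid_email_format(email: str) -> bool:
    """Simplified check of the username@domain.com format.

    This is NOT a full email validator. It only checks the three parts
    described above: a username, one @ symbol and a domain name.
    """
    # Split around the @ symbol; there must be exactly one @
    parts = email.split("@")
    if len(parts) != 2:
        return False
    username, domain = parts
    if not username:
        return False  # the username part is empty
    # Assumption: the domain needs at least one dot, e.g. "example.com"
    name, dot, suffix = domain.rpartition(".")
    return bool(name) and bool(dot) and bool(suffix)


print(is_valid_email_format("jsmith84@example.com"))  # True
print(is_valid_email_format("jsmith84.example.com"))  # False (no @)
print(is_valid_email_format("@example.com"))          # False (no username)
```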

Second and fourth columns – Date of survey and holiday date. The dates of the survey and the holiday are written in short form to reduce the time spent typing them. The survey date helps the company know how old the data is.

Third column – Gender. This is stored in a single-letter format. If the person the data is about is male, the letter M is shown, and if they are female, the letter F is shown. Using a short form saves time and reduces the risk of human error. This tells us whether the customer is male or female.

Fifth column – Holiday duration. The data is shown in a numerical format to reduce the time taken to enter it, and the number shows how many days the holiday lasted. Using just numbers makes it easier to perform calculations, such as finding the average length of a holiday.

Sixth column – Holiday type. The data is shown as text. Text is used because it is easier to read and see which holiday is being shown. This is also useful because there is a wide selection of holidays.

Seventh, eighth and tenth columns – Accommodation, booking and transport ratings. These are shown in a numerical format because they are ratings from 1 to 10. This data is input by customers to show the quality of different parts of the trip. This will help the company a lot, as it shows trends and areas that need improvement.

Ninth column – Transport. The information is shown in a textual format. This makes it
easier to know exactly what transport is being shown. The data here shows the type of
transport being used by each person.

Twelfth column – Price paid. This data is shown in British pounds (£) with two decimal places. It can be used to find trends and make calculations.

Supermarket data

First column – Customer number. This data is in a numerical format. It shows the unique number that every customer has, so that one customer's data is not mixed up with another's. The number is seven digits long.

Second column – Date. The data is shown in a numerical format and shows the exact date the data was collected. The date is shown in short form to reduce the time it takes to record. This data shows which parts of the year have the most shopping and how people's spending changes from month to month.

Third column – Time. The time is shown in short form to save time. It shows the exact time the checkout happened and the data was collected, and it uses the 24-hour format. The time can show which part of the day most shopping is done.
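The busiest part of the day can be found by counting how many checkouts fall in each hour. This is a minimal sketch in Python. It assumes that each time is stored as a 24-hour "HH:MM" string. The sample times are made up and are not taken from the supermarket's data.

```python
from collections import Counter


def busiest_hour(times: list[str]) -> int:
    """Return the hour (0-23) with the most checkouts.

    Assumes each time is a 24-hour "HH:MM" string, e.g. "17:45".
    """
    # Take the hour part before the colon and count how often each appears
    hours = Counter(int(t.split(":")[0]) for t in times)
    hour, _count = hours.most_common(1)[0]
    return hour


# Hypothetical checkout times, not real supermarket data
sample_times = ["09:15", "12:05", "12:40", "17:30", "12:55", "17:10"]
print(busiest_hour(sample_times))  # 12
```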

Fourth column – Amount spent. This shows the amount of money spent by each customer, in the British pound currency format. The data can be used to work out the total value of the supermarket's sales.

Fifth column – Gender. Gender is shown in a single-letter format, which reduces the risk of human error. F is used if the customer is female and M is used if the customer is male. It is faster to type one letter than the whole word. This is called coding data for quick entry.

Sixth column – Self-checkout. This shows whether a person used self-checkout or not. The letter Y is used if they did and N if they did not. This makes the data quicker to type and helps avoid human error.

Seventh column – Minutes to process. This is the amount of time it took for all items to be scanned. It is stored as a number, which makes it easier to use for calculations and pattern recognition.

Holiday data collection

The data has been collected from the company’s database, the electoral roll and a survey company that gets data from customers via email. According to the company, 50% of the customers contacted via email responded.

The electoral roll is a list of the names and addresses of people who are registered to vote in government elections. Because the holiday company is using data from this roll, it is secondary data. The company has been given the legal right to use this data for its own purposes. Using the government’s data is good because it is usually accurate. The downside of using the electoral roll is that not everyone is able to vote in government elections, so some customers will not appear on it.

The holiday company paid the survey company to collect data for it. This is secondary data because the company is using data that another party has collected. The survey company used an email-based system to collect data. This may explain the 50% response rate, as many people are annoyed by survey emails, even when there is a reward for completing the survey.

Supermarket data collection

The supermarket used primary data collection, as the data was collected automatically by the supermarket itself using computers. It collected data on whether customers used self-checkout or not, the amount they spent, how long their checkout took to process, the time and the customer’s number. The data was collected in 2018, so it is quite old. Self-checkout is done on machines inside the supermarket, which are easy for customers to use and give data through. This means that there is no need to ask customers for extra, unnecessary data.

The supermarket estimated that 90% of customers have a loyalty card.

Collecting data via a loyalty card is easier, as there is no need to go through the process of using questionnaires. The data is also collected automatically, so human error is avoided. Rewards are given to loyalty card owners, so they are more willing to give out data. Since the introduction of the GDPR, customers are told what data about them is being collected.

Quality of data collected

Holiday Data

The holiday company uses both primary and secondary data. The primary data is taken from the booking system, so it is logged on the computer when a customer books a holiday. This makes data easier to collect. Using computers also reduces the risk of human error, because the data is entered through a website form with questions and text boxes. However, using computers does not completely remove the risk of human error. There is still a chance that the person filling in the form makes a typo or a misclick. People may also skip questions they do not understand, although this can easily be prevented by making the questions required. Even then, the person filling in the data may give a false answer, for example because they are confused by the question.

Secondary data was collected from the local government’s electoral roll. Of the electoral registers, the local government registers were 91% accurate and 84% complete, and the parliamentary registers were 91% accurate and 85% complete. This is good because the more accurate the data is, the more likely you are to find trends and patterns that are useful and relevant. Having registers that are more than 80% complete is also good. More people are included, so patterns are more prominent, which leads to better decisions. Electoral rolls are usually updated annually, which helps make sure that trends are up to date.

Secondary data was also collected from a specialist survey company. Only 50% of the surveys were completed, so the data is less valid than it would be if all the surveys had been completed: the customers who replied may not represent all customers. Another issue is that the surveys ask for opinions. People are more likely to exaggerate about the whole experience when they particularly liked or disliked one part of the trip. This could lead to patterns that show wrong or exaggerated trends. However, the data can still be used to see trends in various parts of the company.

Supermarket data

The supermarket collected primary data. The data is collected when customers use their loyalty card. This is good because it is very accurate and reduces the risk of human error. It is also fast, as swiping a card is instantaneous. The data is dependable because it is big data with a lot of samples, so trends are more obvious and valid to the company. Because data is collected as items are sold, it will not normally contain missing information.

The information could have been better than it is now. For example, the type of item bought could have been recorded to show which kinds of items are bought the most and the least at different times and in different months. The payment method could also have been useful. The data is also quite outdated, so current trends could be different from those shown in the data. Having up-to-date data is particularly important for a supermarket, as trends could help the company know which products to restock and market more than others.

Decision making
Holiday company

The holiday company’s data can be used for many purposes such as:

- Personalise communications and offers based on interests.
- Track satisfaction over time, and see trends in popular holidays and transport types, to provide offers that increase profit in less popular areas.
- Analyse minimum, maximum and average amounts spent.
- Pricing analysis to create strategies that enable the company to have competitive pricing that suits customers’ budgets.
- Look at ratings and recommendations to see where the company falters and fix it by making improvements based on feedback.
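Finding the minimum, maximum and average amounts spent from the price paid column is a simple calculation. This is a minimal sketch in Python. The prices are made-up examples, not the company's data. For a quick analysis, floats are acceptable. A real finance system would usually use `decimal.Decimal` for money to avoid rounding errors.

```python
from statistics import mean

# Hypothetical "Price paid" values in pounds, not the company's real data
prices = [499.99, 1250.00, 875.50, 320.25]

lowest = min(prices)
highest = max(prices)
average = mean(prices)

# Format as British pounds with two decimal places, matching the column
print(f"Minimum: £{lowest:.2f}")   # Minimum: £320.25
print(f"Maximum: £{highest:.2f}")  # Maximum: £1250.00
print(f"Average: £{average:.2f}")
```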

Supermarket

The data the supermarket has collected can help the company in many ways.

- Customer numbers will help in monitoring shopping behaviour and making offers suitable to each customer.
- The date and time will help with looking for trends and finding peak periods, so that staffing and inventory can be optimised.
- Gender data will help with targeted marketing and with deciding what proportion of stock to give to certain products.
- Self-checkout data and minutes-to-process data will enable the supermarket to optimise its system and make sure that checking out is as seamless and quick as possible.
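Comparing the self-checkout column with the minutes-to-process column shows whether self-checkout is faster or slower on average. This is a minimal sketch in Python. The rows are hypothetical. Each row is assumed to be a pair made of the Y/N self-checkout code and the minutes to process.

```python
from statistics import mean

# Hypothetical rows: (self_checkout "Y"/"N", minutes_to_process)
transactions = [("Y", 3.5), ("N", 2.0), ("Y", 4.0), ("N", 3.0), ("Y", 2.5)]


def average_minutes(rows, used_self_checkout: str) -> float:
    """Average processing time for rows with the given Y/N code."""
    times = [minutes for code, minutes in rows if code == used_self_checkout]
    return mean(times)


print(average_minutes(transactions, "Y"))  # self-checkout average
print(average_minutes(transactions, "N"))  # 2.5
```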

Factors that affect quality of data

Collection method

Different methods of data collection can lead to different outcomes. Surveys can lead to biased responses because of opinion questions. Observation can lead to observation bias, where the data is interpreted in the way that best suits the observer’s point of view. Manual methods that require humans to type in data can lead to typos and other human errors. Data collected automatically at the point of sale is more accurate, as it is quicker and avoids most bias and human error.

Completeness

If the data is not complete, the missing parts cannot be analysed, and incomplete records may be useless and a waste of time. Some parts of the data are crucial for finding trends. For example, in the supermarket, you would not be able to find a lot of the trends without data from daytime periods, as more shopping is done in the daytime than at night.

Consistency

Consistent data has a consistent format, with all the data arranged in the same way. This makes it easier for everyone to enter and analyse the data. Collecting data at a consistent point in time helps as well. For example, collecting data at 12:00am every Sunday ensures that data from the whole previous week is collected consistently, so trends can be shown more clearly.

Accuracy

The data collection method affects accuracy. Data collected by humans is subject to bias and human error, while data collected by computers avoids these flaws. However, computer glitches can also cause data to be inaccurate or make no sense.

Validity

Validity is about whether the data is correct and sensible. Invalid data makes the analysis incorrect and causes problems when analysing information. This can lead to wrong decisions for the company and cause it to lose money.

Timeliness

Data being used must be up to date. Using old data will make trend analysis incorrect, as trends may have changed over time. For example, promoting a product that was only popular five years ago will not be as effective, because trends may have changed, and the money spent on advertising will be wasted.

How can data be improved?

To make data more accurate, we need to improve data validation. Validation is checking that data follows set rules when it is entered. It cannot prove that data is correct, only that it is sensible and in the expected form.

Range check

A range check makes sure that the data entered falls within a particular range. An example of where this is used is the rating columns, where the data should be a number from 1 to 10.
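A range check for the 1 to 10 rating columns can be sketched in a few lines of Python. The function name and its default limits are illustrative. The sketch assumes the value is already a whole number, so a type check would normally run first.

```python
def range_check(rating: int, low: int = 1, high: int = 10) -> bool:
    """Return True if the rating is between low and high inclusive."""
    return low <= rating <= high


for r in [0, 1, 7, 10, 11]:
    # Prints: 0 False, 1 True, 7 True, 10 True, 11 False
    print(r, range_check(r))
```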

Type check

A type check ensures that the data entered is of the expected data type. This makes sure that the data is compatible with the other data in the column and makes it sensible to look for patterns. An example of this is making sure that the holiday duration column only contains whole numbers, not text.
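A column such as holiday duration should hold a whole number. For that kind of column, a type check can be sketched by trying to convert the typed text to an integer. This is a minimal Python sketch that assumes form entries arrive as text. Negative numbers such as "-3" pass this type check, so a range check is still needed afterwards.

```python
def is_whole_number(value: str) -> bool:
    """Type check: return True if the text can be read as a whole number."""
    try:
        int(value)  # raises ValueError for text like "seven" or "7.5"
    except ValueError:
        return False
    return True


print(is_whole_number("7"))      # True
print(is_whole_number("seven"))  # False
print(is_whole_number("7.5"))    # False
```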

Lookup check

This is used when the data entered must come from a pre-existing set of data. It checks that the data entered is found in the list of allowed values, which helps to make data more accurate and consistent. For example, the holiday company could use a lookup check to make sure that each holiday type entered is one that the company actually offers.
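A lookup check compares each entry against a fixed list of allowed values. Here is a minimal sketch in Python. The list of holiday types is hypothetical, because the company's real list is not given.

```python
# Hypothetical list of holiday types; the real company's list is not given
HOLIDAY_TYPES = {"Beach", "City break", "Cruise", "Skiing", "Safari"}


def lookup_check(holiday_type: str) -> bool:
    """Return True if the entry is one of the allowed holiday types."""
    return holiday_type in HOLIDAY_TYPES


print(lookup_check("Cruise"))  # True
print(lookup_check("Cruse"))   # False, so the typo is caught
```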

Presence check

This is a validation check that makes sure no required sections are left blank. This will help the holiday company. Otherwise, important data such as the date the survey was done could be left out, which would make it much more difficult to see trends or progress.
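A presence check can be sketched as looking through a record for required fields that are missing or blank. This is a minimal Python sketch. The field names and the sample survey record are hypothetical.

```python
def presence_check(record: dict, required: list[str]) -> list[str]:
    """Return the names of required fields that are missing or blank."""
    missing = []
    for field in required:
        value = record.get(field)
        # Treat None and empty or whitespace-only text as "left blank"
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


# Hypothetical survey record with the survey date left blank
survey = {"email": "jsmith84@example.com", "survey_date": "", "gender": "F"}
print(presence_check(survey, ["email", "survey_date", "gender"]))  # ['survey_date']
```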

Length check

This is a check that makes sure that the data entered is the correct length. Examples include making sure that UK phone numbers have eleven digits and that dates use six digits and two slashes. The holiday company will benefit from this when looking at dates, as using the same length avoids confusion when making sense of the data.
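A length check only compares the number of characters with an expected value. This Python sketch assumes dates are written as DD/MM/YY (8 characters). It also uses the seven-digit customer number from the supermarket data. A length check alone would still accept "ab/cd/ef", so it is normally combined with a type check.

```python
def length_check(value: str, required_length: int) -> bool:
    """Return True if the text has exactly the required number of characters."""
    return len(value) == required_length


# Assumed DD/MM/YY date: 6 digits + 2 slashes = 8 characters
print(length_check("14/07/23", 8))  # True
print(length_check("14/7/23", 8))   # False, the month is missing a digit
# Customer numbers are seven digits long
print(length_check("1234567", 7))   # True
print(length_check("123456", 7))    # False
```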

Data verification
Verification is a process that makes sure data has been entered correctly and matches the original data. The ways data verification is carried out are:

Double entry

This is entering data twice and comparing the two copies. An example of this is being asked to type a new password or email address twice when creating an account. If the two entries do not match, the data is rejected.
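The comparison step of double entry can be sketched very simply in Python. The function name is illustrative. The check only confirms that the two copies match. If the same mistake is typed twice, it will not be caught.

```python
def double_entry_check(first: str, second: str) -> bool:
    """Return True if both entries match exactly."""
    return first == second


print(double_entry_check("jsmith84@example.com", "jsmith84@example.com"))  # True
print(double_entry_check("jsmith84@example.com", "jsmtih84@example.com"))  # False
```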

Proofreading

This method is when someone looks at the data entered and compares it to the original. This takes time, and because a human does it, mistakes can still be missed. People also tend to be biased and think there are fewer mistakes in their own work than there really are.

Privacy of customers

Big data has become a growing concern, as companies continue to collect vast amounts of information about individuals without their knowledge or consent. This information can be used for bad purposes such as identity theft, where criminals use the information collected to pretend to be someone else. Additionally, inaccuracies in the data can occur when individuals fail to update their information after moving or other changes. It is important for individuals to be careful and take steps to protect their personal information.

GDPR

To stop this from happening, the General Data Protection Regulation (GDPR) was introduced. These laws change the way companies are allowed to use people’s data. They make sure that individuals know what data about them is being collected and what it is being used for. This is good, as people often give their data to companies without even knowing it.

Fraud

Data that is collected could be stolen and misused, for example to steal people’s identities. A stolen identity can be used to commit crimes such as theft in the victim's name while the real criminal stays anonymous. It can be difficult and expensive to prove that the crime was not yours, and it could take a long time to get your identity back.

Targeting vulnerable groups

Young and old people are particularly vulnerable on the internet. This is because they are often inexperienced with safety measures and can end up visiting dangerous sites or downloading malware without the proper protections. Younger people could find themselves in a dire situation because they put personal data on a dangerous website. They could also end up seeing mature media on the internet or falling for phishing and pharming schemes. According to data, older people are particularly susceptible to phishing. They could end up handing over their bank account details and having a lot of money stolen. Good education on the dangers of the internet could help keep their data secure.

Inaccurate information

There is a lot of data on everyone’s devices, but that raises the question, “how much of our data is actually accurate?” There are a lot of old accounts that are still open and hold a lot of personal data but have been forgotten. There are also a lot of lost computer drives holding personal information. It is important that data is always kept up to date and strongly protected, so there are no problems in the future.
