Holiday company data
First column – Customer email. Emails have the format username@domain.com. This format has three parts: the username, which is unique to the address owner and can contain a combination of letters, numbers and special characters; the @ symbol, which separates the username from the domain name; and the domain name, which shows the organisation that provides the email service.
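A simple way to check that an entry in the email column has the username@domain shape described above is a regular expression. This is a minimal sketch: the pattern is an assumption and deliberately loose. It only checks for one @ with text on each side and a dot in the domain name, not that the address actually exists.

```python
import re

# Loose pattern (an assumption, not a full email standard): text with no @ or spaces,
# then @, then a domain name containing at least one dot.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_email(value: str) -> bool:
    """Return True if value has the basic username@domain shape."""
    return EMAIL_PATTERN.fullmatch(value) is not None

print(looks_like_email("jsmith99@example.com"))      # True
print(looks_like_email("no-at-symbol.example.com"))  # False
```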
Second and fourth columns – Date of survey and Holiday date. The dates of the survey and the holiday are stored in short date format to reduce the time spent typing them. This also helps the company know how old the data is.
Third column – Gender. This is stored as a single letter: M if the customer is male and F if the customer is female. Using a short code saves time and reduces the risk of human error. This column shows whether each customer is male or female.
Fifth column – Holiday duration. This data is stored as a number to reduce the time needed to enter it, and the number shows how many days the holiday lasted. Storing just numbers makes it easier to perform calculations, such as finding the average length of a holiday.
Component 2 Part A
Sixth column – Holiday type. This data is stored as text because it is easier to read and shows clearly which type of holiday was booked. This is useful as the company offers a wide selection of holidays.
Seventh, eighth and tenth columns – Accommodation, booking and transport ratings. This data is stored as numbers because each rating is on a scale from 1 to 10. The ratings show how customers judged the quality of different parts of their trip. This helps the company spot trends and identify areas that need improvement.
Ninth column – Transport. The information is shown in a textual format. This makes it
easier to know exactly what transport is being shown. The data here shows the type of
transport being used by each person.
Twelfth column – Price paid. This data is stored in currency format, in British pounds to two decimal places. It can be used to find trends and to make calculations.
Supermarket data
First column – Customer number. This data is stored as a number. It is the unique seven-digit number given to every customer, so that one customer's data is not mixed up with another's.
Second column – Date. This is the date on which the data was collected, stored in a numerical short date format to reduce the time it takes to enter. This data shows which parts of the year particular items sell best and what people tend to buy in different months.
Third column – Time. The time is stored in short form to save time. It shows the exact time the customer checked out and the data was collected, and it uses the 24-hour format. This can show which part of the day most shopping is done.
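Because times are stored in the 24-hour hh:mm format, the hour can be read straight off each entry and counted to find the busiest part of the day. This is a minimal sketch; the sample times below are made up for illustration and are not taken from the supermarket's data.

```python
from collections import Counter

def busiest_hour(times: list[str]) -> int:
    """Return the hour (0-23) that appears most often in a list of 'hh:mm' times."""
    hours = Counter(int(t.split(":")[0]) for t in times)
    hour, _count = hours.most_common(1)[0]
    return hour

# Illustrative checkout times only.
sample_times = ["09:15", "12:05", "12:40", "17:30", "12:55", "17:10"]
print(busiest_hour(sample_times))  # 12
```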
Fourth column – Amount spent. This shows the amount of money spent by each customer, stored in British pound currency format. The data can be used to work out the total value of the supermarket's sales.
Fifth column – Gender. Gender is stored as a single letter: F if the customer is female and M if the customer is male. It is faster to type one letter than the whole word, and this also reduces the risk of human error. This is called coding data for quick entry.
Sixth column – Self-checkout. This shows whether a customer used self-checkout. The letter Y is used if they did and N if they did not, which makes the data quicker to type and helps avoid human error.
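Coding data for quick entry can be sketched as a pair of lookup tables that turn the single letters back into full words when a report is produced. This is a minimal sketch; the table and function names are hypothetical, and rejecting unknown letters is an assumed design choice.

```python
# Hypothetical code tables for the single-letter columns.
GENDER_CODES = {"M": "Male", "F": "Female"}
SELF_CHECKOUT_CODES = {"Y": "Used self-checkout", "N": "Did not use self-checkout"}

def decode(code: str, table: dict[str, str]) -> str:
    """Translate a single-letter code into its full meaning, rejecting unknown codes."""
    key = code.strip().upper()  # tolerate lower-case or padded entries
    if key not in table:
        raise ValueError(f"Unknown code: {code!r}")
    return table[key]

print(decode("f", GENDER_CODES))         # Female
print(decode("Y", SELF_CHECKOUT_CODES))  # Used self-checkout
```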
Seventh column – Minutes to process. This is the amount of time it took for all items to be scanned. It is stored as a number, which makes it easy to use in calculations and for spotting patterns.
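Because the processing time is numeric, it can be averaged directly, for example to compare customers who used self-checkout with those who did not. This is a minimal sketch; the record layout (a Y/N flag paired with minutes) and the figures are invented for illustration.

```python
from statistics import mean

# Each record: (self-checkout flag "Y"/"N", minutes to process). Illustrative values only.
records = [("Y", 4.0), ("N", 3.0), ("Y", 6.0), ("N", 5.0), ("N", 4.0)]

def average_minutes(rows: list[tuple[str, float]], flag: str) -> float:
    """Mean processing time for rows whose self-checkout flag equals flag."""
    return mean(minutes for used, minutes in rows if used == flag)

print(average_minutes(records, "Y"))  # 5.0
print(average_minutes(records, "N"))  # 4.0
```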
The holiday company's data has been collected from the company's own database, the electoral roll and a survey company that gathers data from customers by email. According to the company, 50% of the customers contacted by email responded.
The electoral roll is a list of the names and addresses of people who are registered to vote in government elections. Because the holiday company is using data from this roll, it is secondary data. The company has been given the legal right to use this data for its own purposes. Using government data is good because it is generally reliable. The downside of using the electoral roll is that it only includes people who are registered to vote, so anyone under voting age or not registered is missing.
The holiday company paid the survey company to collect data for it. This is secondary data because the company is using data that another party has collected. The survey company used an email-based system to collect data. This may explain the 50% response rate, as many people find survey emails annoying and ignore them, even when a reward is offered for completing the survey.
The holiday company uses both primary and secondary data. The primary data is taken from the booking system, so it is logged on the computer when a customer books a holiday. This makes data easier to collect, and using computers reduces the risk of human error because customers answer questions and fill in text boxes on a website form. However, using computers does not completely remove the risk of human error. The person filling in the form could still make a typo or misclick. People may also skip questions they do not understand, although this can easily be countered by making questions required. Even then, the person filling in the data may still give wrong answers because they are confused.
Secondary data was collected from the local government's electoral roll. The local government registers were 91% accurate and 84% complete, and the parliamentary registers were 91% accurate and 85% complete. This is good because the more accurate the data, the more likely it is that the trends and patterns found will be useful and relevant. Having more than 80% of entries complete is also good, as the data covers more people, so patterns are more prominent, leading to better decisions. Electoral rolls are usually updated annually, which helps keep trends up to date.
Secondary data was also collected from a specialist survey company. Because only 50% of the surveys were completed, the data is less representative than it would be if all of them had been. Another issue is that the surveys ask for opinions. People are more likely to exaggerate about the whole experience when they did or did not like one part of the trip, which could lead to misleading or exaggerated trends. However, this data can still be used to see trends in various parts of the company.
Supermarket data
The supermarket collected primary data. The data is collected when customers use their loyalty card. This is good because it is highly accurate and reduces the risk of human error. It is also fast, as swiping a card is instantaneous. The data is dependable because it is big data with a large number of records, so trends are more obvious and more valid to the company. Because data is collected automatically as items are sold, it will not normally contain missing information.
The data could have been more useful. For example, the type of item bought could have been recorded to show which kinds of items sell the most and the least at different times of day and in different months. The payment method could also have been useful. The data is also quite outdated, so current trends may be different from the ones it shows. Having up-to-date data is particularly important for a supermarket, as trends help the company decide what to restock and which products to market.
Decision making
Holiday company
The holiday company’s data can be used for many purposes such as:
Supermarket
The data the supermarket has collected can help the company in many ways.
Collection method
Different methods of data collection can lead to different outcomes. Surveys can produce biased responses because of opinion questions. Observation can lead to observation bias, where the data is interpreted in the way that best suits the observer's point of view. Manual methods that require people to type in data can lead to typos and other human error. Data collected automatically at the point of sale is more accurate because it is quick and avoids bias and human error.
Completeness
If the data is not complete, the records with missing values may be unusable, and the time spent collecting them is wasted. Some parts of the data are crucial for finding trends. For example, without data from daytime periods, the supermarket would miss many of its trends, as more shopping is done in the daytime than at night.
Consistency
Consistent data uses the same format throughout and is arranged in the same way. This makes it easier for everyone to enter and analyse. Collecting data at a consistent point in time helps as well. For example, collecting data at 12:00am every Sunday ensures that a full week of data is collected each time, so trends can be compared more easily.
Accuracy
Validity
Valid data is data that is sensible and follows the rules set for it. Invalid data makes analysis incorrect and causes problems when interpreting information. This can lead to wrong decisions that cost the company money.
Timeliness
Data being used must be up to date. Using old data will make trend analysis incorrect, as trends may have changed over time. For example, promoting a product that was popular five years ago will not be as effective if it is no longer in demand, and the money spent on advertising will be wasted.
Range check
A range check makes sure that the data entered falls within a set range. An example is the rating columns, where data should be a number from 1 to 10.
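A range check on the rating columns can be sketched as a single comparison against the lowest and highest allowed values. This is a minimal sketch; the function name is hypothetical and the limits of 1 and 10 are taken from the rating scale above.

```python
def range_check(value: int, low: int = 1, high: int = 10) -> bool:
    """Return True if value lies between low and high inclusive."""
    return low <= value <= high

# Illustrative ratings: 0 and 11 fall outside the 1-10 scale and would be rejected.
ratings = [7, 10, 0, 11, 1]
print([r for r in ratings if not range_check(r)])  # [0, 11]
```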
Type check
A type check makes sure that the data entered is of the expected data type. This ensures the data is compatible with the rest of the column and makes it sensible to look for patterns. An example is making sure that the holiday duration column only contains whole numbers.
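A type check can be sketched by testing whether the text entered can be read as the expected type. This sketch assumes a column, such as holiday duration, that should hold whole numbers of days; the function name is hypothetical.

```python
def is_whole_number(value: str) -> bool:
    """Type check: True if the text entered can be read as a whole number."""
    try:
        int(value)  # int() rejects text such as "two" and decimals such as "7.5"
    except ValueError:
        return False
    return True

print(is_whole_number("14"))   # True
print(is_whole_number("two"))  # False
print(is_whole_number("7.5"))  # False
```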
Lookup check
A lookup check compares the data entered against a pre-existing list of allowed values. This makes sure that the data is valid and appears in the allowed list, which helps make data more accurate and consistent. The holiday company could use this to check that each holiday type entered is one that the company actually offers.
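A lookup check can be sketched as a membership test against the list of allowed values. The list of holiday types below is hypothetical; a real check would use the company's own list.

```python
# Hypothetical list of holiday types the company offers.
VALID_HOLIDAY_TYPES = {"Beach", "City break", "Cruise", "Skiing"}

def lookup_check(value: str, allowed: set[str]) -> bool:
    """Return True if value appears in the list of allowed values."""
    return value in allowed

print(lookup_check("Cruise", VALID_HOLIDAY_TYPES))  # True
print(lookup_check("Safari", VALID_HOLIDAY_TYPES))  # False
```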
Presence check
This is a validation check that makes sure no required fields are left blank. This will help the holiday company, as important data such as the date the survey was done could otherwise be left out, making it much more difficult to see trends or progress.
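A presence check can be sketched by listing every required field that is missing or blank in a record. The field names below are hypothetical labels for the holiday company's columns.

```python
def missing_fields(record: dict[str, str], required: list[str]) -> list[str]:
    """Return the names of required fields that are absent or blank."""
    return [field for field in required if not record.get(field, "").strip()]

# Hypothetical record in which the survey date has been left blank.
row = {"email": "jsmith99@example.com", "survey_date": "", "gender": "F"}
print(missing_fields(row, ["email", "survey_date", "gender"]))  # ['survey_date']
```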
Length check
This check makes sure that the data entered is the correct length. For example, UK phone numbers should have eleven digits, and dates in dd/mm/yy format should have six digits and two slashes. The holiday company will benefit from this when looking at dates, as using the same length avoids confusion when making sense of the data.
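A length check can be sketched by comparing the number of characters entered with the expected length. This sketch assumes UK phone numbers are typed as eleven digits with no spaces and dates as dd/mm/yy (six digits plus two slashes, eight characters in total); the function name is hypothetical.

```python
def length_check(value: str, expected: int) -> bool:
    """Return True if value has exactly the expected number of characters."""
    return len(value) == expected

print(length_check("07700900123", 11))  # True: 11 digits
print(length_check("0770090012", 11))   # False: only 10 digits
print(length_check("12/06/23", 8))      # True: dd/mm/yy is 8 characters
```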
Data verification
Verification is a process that checks that data has been entered correctly and matches the original data. Two ways of carrying out data verification are:
Double entry
This is entering data twice and comparing the two copies. An example is being asked to type an email address or password twice when creating an account; if the two entries do not match, the data is rejected.
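Double entry verification can be sketched as comparing the two copies character by character and reporting where they first differ, so the person entering the data knows what to correct. This is a minimal sketch; the function name is hypothetical.

```python
from typing import Optional

def first_difference(first: str, second: str) -> Optional[int]:
    """Return the index where the two entries first differ, or None if they match."""
    for i, (a, b) in enumerate(zip(first, second)):
        if a != b:
            return i
    if len(first) != len(second):
        # One entry is a prefix of the other, so they differ where the shorter one ends.
        return min(len(first), len(second))
    return None

print(first_difference("jsmith99@example.com", "jsmith99@example.com"))  # None
print(first_difference("jsmith99@example.com", "jsmtih99@example.com"))  # 3
```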
Proofreading
This method is when someone reads through the data entered and compares it with the original. It takes time, and because a person is doing it, some mistakes can still be missed. People checking their own work also tend to assume it contains fewer mistakes than it really does.
Privacy of customers
The issue of big data has become a growing concern as companies continue to collect vast amounts of information about individuals without their knowledge or consent. This information can be used for malicious purposes such as identity theft, where criminals pretend to be someone else using the information collected. Inaccuracies can also occur when individuals fail to update their information after moving or other changes. It is important for individuals to take care and protect their personal information.
GDPR
The General Data Protection Regulation (GDPR) was introduced to help stop this from happening. It changes the way companies are allowed to use people's data and makes sure that individuals know what data is being collected about them and what it is being used for. This is important because people often give their data to companies without realising it.
Fraud
Data that is being collected could be stolen and used to steal people's identities. A stolen identity can be used to commit crimes such as theft in the victim's name while the real criminal stays anonymous. It can be difficult and expensive to prove that you did not commit the crime, and it could take a long time to get your identity back.
Young and old people are particularly vulnerable online. This is because they are often inexperienced with safety measures and can end up visiting dangerous sites or downloading malware without proper protection. Younger people could find themselves in a dire situation if they put personal data on a dangerous website. They could also end up seeing mature content or falling for phishing and pharming scams. Older people are particularly susceptible to phishing and could end up handing over their bank account details and having a lot of money stolen. Good education about the dangers of the internet could help keep their data secure.
Inaccurate information