
1. DATA, INFORMATION, KNOWLEDGE AND PROCESSING

1.1 Data, Information and Knowledge


Data is raw numbers, letters, symbols, sounds or images with no meaning.
Data needs to be turned into meaningful information and presented in its most
useful format.
Data must be processed in a context in order to give it meaning.
Information: data with context and meaning.
Knowledge: information to which human experience has been applied.
Knowledge requires a person to understand what the information means, based on
their experience and knowledge base.
Knowledge base: the information a person or medium knows, which often expands
over time as new information is added.
Knowledge is also what a machine knows through the use of a knowledge base
consisting of rules and facts, often found in knowledge-based systems, modelling
and simulation software.
1.2 Sources of data
Static data: data that does not normally change.
Static data is either fixed or has to be changed manually by editing a document.
Examples: titles of web pages, magazines, content on a CD-ROM, instructions on a
data entry screen.
Dynamic data: data that changes automatically without user intervention.
Dynamic means ‘moving’. It is data that updates as a result of the source data
changing.
Examples: a website showing the availability of concert tickets, product prices at a
point of sale.
• Static information source: sources where information does not change on a
regular basis.
o Information can go out of date quickly.
o Information can be viewed offline since no live data is required.
o More likely to be accurate since information will be validated before it is
published.
• Dynamic information source: information is automatically updated when the
source data changes.
o Information is most likely to be up to date.
o An internet/network connection to the source data is required.
o Data may be less accurate since it is produced very quickly, so it may contain
errors.

Direct data source: data that is collected for the purpose for which it will be used
Indirect data source: data that was collected for a different purpose (secondary
source)

Direct data source:
• Data collected is relevant because only what is needed has been collected.
• The original source is known, so it is trustworthy.
• It may take time to collect data from the original source, rather than using data
that already exists.
• Data will be up to date.
• Data collected can be presented in the required format.
• Data is more likely to be unbiased.

Indirect data source:
• Required data may not exist, and irrelevant data may exist, which will take time
to sort.
• The original source is not verified.
• Data is immediately available.
• Data may not be up to date.
• If statistical data is needed, it is often available in large samples.
• Data may be biased because the source is not verified.
• Extraction may be difficult if the data is in a different format.

1.3 Quality of information


• Accuracy: Data must be accurate.
Data must be correct. For example, if the price of a product is 903.9 but the
decimal point is misplaced so that it is entered as 90.39, the price becomes
inaccurate.
• Relevance: Information must be relevant to its purpose.
Information that is not actually required for the purpose is irrelevant, for example
being given a bus timetable when you want to catch a train.
• Age: Information must be up to date.
Information that is out of date cannot be used without updating. For example, the
number of residents living in a town in 2011 will have changed by 2020, as more
people now live in that town.
• Level of detail: Good quality information requires the right amount of detail.
Too little detail makes information unusable. For example, when ordering a pizza,
giving the street without the building number means the driver does not have
enough information to fulfil the order.
• Completeness: All information required should be present.
For example, all the information needed to make a delivery must be present.
1.4 Coding, encoding and encrypting data
Coding: representing data by assigning a code to it for classification or
identification.
You are probably already familiar with coding: we send messages in code when
texting.
Examples
LOL = laugh out loud
2 = to
4 = for
BRB = be right back
M = male
F = female
More examples
• DR = dress
• 2XL = extra extra large
• BL = blue
• DR2XLBL = a dress in size extra extra large and colour blue
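A code like DR2XLBL only works if everyone uses the same lookup tables to read it back. The minimal Python sketch below decodes it, assuming the code is always item, then size, then colour; the table and function names are hypothetical and contain only the example codes above.

```python
# Hypothetical lookup tables built from the example codes in these notes
ITEM_CODES = {"DR": "dress"}
SIZE_CODES = {"2XL": "extra extra large"}
COLOUR_CODES = {"BL": "blue"}


def decode_product(code: str) -> tuple[str, str, str]:
    """Split a code such as 'DR2XLBL' into item, size and colour.

    Assumes the parts always appear in the order item + size + colour.
    """
    for item_code, item in ITEM_CODES.items():
        if not code.startswith(item_code):
            continue
        rest = code[len(item_code):]
        for size_code, size in SIZE_CODES.items():
            if not rest.startswith(size_code):
                continue
            colour_code = rest[len(size_code):]
            if colour_code in COLOUR_CODES:
                return item, size, COLOUR_CODES[colour_code]
    raise ValueError(f"Unknown product code: {code}")


print(decode_product("DR2XLBL"))  # ('dress', 'extra extra large', 'blue')
```

A code that is not in the tables raises an error, which reflects one disadvantage of coding: the data is useless to anyone who does not know the codes.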
Advantages of coding:
• Data can be presented in a small space.
• Less storage space is required.
• Speed of input increases.
• Data can be processed faster.
• Validation becomes easier (for example, checking that a code has a particular
length, such as a registration code of a maximum of 3 characters).
• Confidentiality increases (only people who know the code can understand it).
• Consistency increases.

Disadvantages of coding:
• There is a limited number of codes.
• Interpretation may be difficult (a user might read ET as Ethiopia when it was
meant to stand for Egypt).
• Similar characters may lead to errors (O and 0).
• Efficiency decreases if the user does not know the code.
• Some information may be lost during coding.

Encoding: storing data in a specific format.

Computers store data as binary digits; they do not read data the same way we do.
Data is converted into 1s and 0s, where 1 is on and 0 is off. Therefore, data needs
to be encoded in a format the computer understands.
Text is encoded as a number that is then represented by a binary number.

Codec: a computer program that encodes and decodes a digital data stream
ASCII (American Standard Code for Information Interchange) is a common
method for encoding text
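To make the two steps concrete (character to number, number to binary), here is a minimal Python sketch of ASCII text encoding. ASCII codes use 7 bits; padding each one to 8 bits (one byte) is an assumption about how they are stored, and `text_to_binary` is a hypothetical name.

```python
def text_to_binary(text: str) -> list[str]:
    """Encode each character as its ASCII number, then as an 8-bit binary string."""
    result = []
    for ch in text:
        code = ord(ch)  # character -> number (its ASCII value for ASCII text)
        if code > 127:
            raise ValueError(f"{ch!r} is not an ASCII character")
        result.append(format(code, "08b"))  # number -> 8 binary digits
    return result


print(text_to_binary("Hi"))  # ['01001000', '01101001']
```

'H' is ASCII 72 and 'i' is ASCII 105, which is why they become 01001000 and 01101001.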
Images are encoded as bitmaps using various parameters (such as width/height, bit
count, compression type, horizontal/vertical resolution, and raster data).
Screen graphics are made up of a tiny grid of dots called pixels. More pixels =
higher resolution = better quality = more storage needed.
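The link between pixels and storage can be worked out directly: uncompressed image data needs width x height x colour depth bits. The sketch below assumes example values (a 1920 x 1080 image with 24-bit colour) and ignores the small file header that real bitmap files also store.

```python
def bitmap_size_bytes(width: int, height: int, colour_depth_bits: int) -> int:
    """Uncompressed image data size: every pixel stores colour_depth_bits bits."""
    total_bits = width * height * colour_depth_bits
    return (total_bits + 7) // 8  # 8 bits in a byte, rounded up to whole bytes


# Example values (assumed): a 1920 x 1080 image with 24-bit colour
print(bitmap_size_bytes(1920, 1080, 24))  # 6220800 bytes, about 6.2 MB
print(bitmap_size_bytes(3840, 2160, 24))  # 24883200 bytes: double the width and height, four times the storage
```

With a colour depth of 1 bit (black and white), eight pixels fit into one byte.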
Bitmap images are widely used on smartphones, cameras and online. Bitmaps are
organised as grids of coloured squares called pixels. That is why, when you zoom in
on an image, the pixels are stretched into larger blocks, and why bitmaps appear to
be of poor quality when enlarged.
In a black-and-white image where each pixel uses one bit, a byte (eight bits) can
represent eight pixels.
Each colour of an image is stored as a binary number.
As the image gets bigger, it takes up more storage.
Therefore, a method called run-length encoding (RLE) can be used to reduce the
amount of storage space that is used. This is known as compression.
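RLE replaces a run of identical pixels with one count and one colour. The minimal sketch below assumes a single row of pixels stored as colour letters ('W' for white, 'B' for black); real image formats pack the counts into bytes. Because the original row can be rebuilt exactly, RLE is a lossless method.

```python
def rle_encode(pixels: list[str]) -> list[tuple[int, str]]:
    """Run-length encode a row of pixels as (count, colour) pairs."""
    encoded: list[tuple[int, str]] = []
    for colour in pixels:
        if encoded and encoded[-1][1] == colour:
            count, _ = encoded[-1]
            encoded[-1] = (count + 1, colour)  # extend the current run
        else:
            encoded.append((1, colour))  # start a new run
    return encoded


def rle_decode(encoded: list[tuple[int, str]]) -> list[str]:
    """Expand (count, colour) pairs back into the original row."""
    return [colour for count, colour in encoded for _ in range(count)]


row = ["W", "W", "W", "W", "B", "B", "W", "W"]
packed = rle_encode(row)
print(packed)  # [(4, 'W'), (2, 'B'), (2, 'W')]
print(rle_decode(packed) == row)  # True
```

Eight pixel values are stored as three pairs here; RLE saves most space on images with large areas of the same colour.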
Sometimes when files are compressed, they use lossy compression, which means
some of the original data is removed and the quality is reduced.
• Images are often encoded into file types such as:
o JPEG/JPG (Joint Photographic Experts Group), a common bitmap file type
o GIF (Graphics Interchange Format), a common bitmap file type
o PNG (Portable Network Graphics), a common bitmap file type
o SVG (Scalable Vector Graphics), which stores images as vectors rather than as a
bitmap
Sound is encoded by storing the sample rate, bit depth and bit rate.
When sound is recorded, it is converted from its original analogue format to a
digital format, which is broken down into thousands of samples per second. Each
sound sample is stored as binary data.
The sample rate, or frequency, is the number of audio samples per second. It is
measured in hertz (Hz).
The higher the sample rate, the higher the quality of the music, but also the more
storage that is required.
The bit depth is the number of bits (1s and 0s) used for each sample. A higher bit
depth gives a higher quality sound.
The bit rate is the number of bits processed every second.
• bit rate = sample rate x bit depth x number of channels
• bit rate is measured in kilobits per second (kbps)
• Uncompressed encoding uses WAV (Waveform Audio File Format)
Lossy compression: reduces file size by reducing the bit rate, causing some loss in
quality.
Lossless compression: reduces the file size without losing any quality, but can
typically only reduce the file size to about 50% of the original.
A CD sound file has a sample rate of 44.1 kHz (44,100 Hz), a bit depth of 16 bits and
two channels (left and right for stereo).
• bit rate = 44,100 x 16 x 2 = 1,411,200 bps ≈ 1.4 Mbps (megabits per second)
• That means that about 1.4 megabits are required to store every second of audio.
• Multiply the bit rate by the number of seconds to find the file size. For a track of
210 seconds (3 minutes 30 seconds):
• file size (in bits) = 1,411,200 x 210 = 296,352,000 bits (about 296 megabits)
• There are eight bits in a byte and we use bytes to measure storage, so the file size
in bits is divided by eight:
• file size (in bytes) = 296,352,000 ÷ 8 = 37,044,000 bytes ≈ 37 MB (megabytes)
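The same calculation can be written as two small Python functions, which makes it easy to try other sample rates, bit depths or track lengths. This is a sketch of the formulas above for uncompressed audio; the function names are hypothetical.

```python
def audio_bit_rate(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bit rate (bits per second) = sample rate x bit depth x number of channels."""
    return sample_rate_hz * bit_depth * channels


def audio_file_size_bytes(bit_rate_bps: int, seconds: int) -> int:
    """File size: bits for the whole clip, divided by 8 to convert to bytes."""
    return bit_rate_bps * seconds // 8


rate = audio_bit_rate(44_100, 16, 2)  # CD quality stereo
size = audio_file_size_bytes(rate, 210)  # a 210-second track
print(rate)  # 1411200 bits per second (about 1.4 Mbps)
print(size)  # 37044000 bytes (about 37 MB)
```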
Video
When video is encoded, it needs to store images and sound. Images are stored as
frames. Standard quality video normally has 24 frames per second (fps); high
quality video uses 50 fps or 60 fps. The higher the frame rate, the larger the
storage needed, but the higher the quality.
Size: an HD video has an image size of 1920 pixels wide and 1080 pixels high.
The bit rate of video includes both audio and frames. The bit rate is the number of
bits to be processed every second. A higher frame rate requires a higher bit rate.
Example: one hour of eight-bit HD video at 24 fps would require 334 GB
(gigabytes) of storage, which is too much data to download. Therefore,
compression is required. Compression involves reducing:
• the resolution
• the bit rate
• the image size
These all result in lossy compression.
Advantages of encoding:
• Reduced file size.
• Enables real-time streaming of video and music in restricted bandwidth.
• Reduces the time taken to download files.
• Enables different formats to be used (for example, GIF allows animated images).
• Makes it easy to download music, images or video from websites.

Disadvantages of encoding:
• The required codecs may not be installed, so a file cannot be saved in the desired
format.
• The necessary codecs need to be installed to open encoded files.
• Not all software is able to open different file types.
• Some hardware, such as music and video players, only plays files encoded in
certain formats (for example, a CD player may not be able to play MP3 files).
• The quality of images, sound and videos is lost when files are compressed using
lossy compression.
• Text encoded using ASCII or Unicode needs to be decoded using the correct
format when it is opened.
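The last disadvantage is easy to demonstrate. In this Python sketch, text is encoded with UTF-8 (one of the Unicode encodings) and then decoded twice: once correctly, and once with Latin-1, which is used here only as an example of the wrong format.

```python
text = "café"
stored = text.encode("utf-8")  # é is stored as two bytes in UTF-8
print(stored)  # b'caf\xc3\xa9'

print(stored.decode("utf-8"))  # café   (correct format)
print(stored.decode("latin-1"))  # cafÃ©  (wrong format: the text is garbled)
```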

Encryption (protection): scrambling data so that it cannot be understood without a
decryption key, making it unreadable if intercepted.
The purpose of encryption is to make the data difficult or impossible to read if it is
accessed by an unauthorised user.
Accessing encrypted data legitimately is known as decryption.
Data can be encrypted when it is stored on disks or other storage media, and when
it is sent across the internet.
A cipher is a special type of algorithm that converts a message into an encrypted
message. The Caesar cipher is a secret way of writing that selects replacement
letters by shifting along the alphabet; for example, with a shift of 3, H becomes K.
Decipher: to decrypt the message.
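Here is a minimal Python sketch of the Caesar cipher described above. The shift acts as the key; a shift of 3 matches the H to K example, and deciphering is the same operation with the shift reversed. Leaving spaces and punctuation unchanged is a design choice for this sketch.

```python
import string

ALPHABET = string.ascii_uppercase  # 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'


def caesar(message: str, shift: int) -> str:
    """Shift each letter along the alphabet, wrapping from Z back to A.

    Non-letters (spaces, digits, punctuation) are left unchanged.
    """
    result = []
    for ch in message.upper():
        if ch in ALPHABET:
            index = (ALPHABET.index(ch) + shift) % 26
            result.append(ALPHABET[index])
        else:
            result.append(ch)
    return "".join(result)


secret = caesar("HELLO", 3)  # encrypt with a shift of 3 (H -> K)
print(secret)  # KHOOR
print(caesar(secret, -3))  # HELLO (decipher by shifting back)
```

With only 25 useful shifts, anyone can try them all, so the Caesar cipher is not secure.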
Symmetric encryption (oldest method)
Both the sender and the recipient must possess the same secret key, which is used
for both encryption and decryption. The secret key needs to be sent to the
recipient, for example by post or over the internet.
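The key idea of symmetric encryption is that one shared key both encrypts and decrypts. The toy Python sketch below shows that idea using XOR with a repeating key. It is NOT secure and is not how real symmetric ciphers such as AES work; the key value is a made-up example.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every byte with the key (repeated as needed).

    Applying the same function with the same key a second time undoes it,
    so one shared key both encrypts and decrypts. Toy example only: not secure.
    """
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


shared_key = b"secret"  # hypothetical key that both people hold
ciphertext = xor_cipher(b"Meet at 5", shared_key)
plaintext = xor_cipher(ciphertext, shared_key)
print(plaintext)  # b'Meet at 5'
```

Anyone who intercepts the shared key while it is being sent can decrypt every message, which is the main weakness asymmetric encryption addresses.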

Asymmetric encryption (public key cryptography)


Sawsan sends a message to Jannat. Sawsan encrypts the message using Jannat’s
public key. Jannat receives the encrypted message and decrypts it using her
private key.
Asymmetric encryption takes longer to encrypt and decrypt data than symmetric
encryption.
To find a public key, a digital certificate is required, which identifies the user and
provides their public key.
When Sawsan needs to send encrypted data to Jannat, her computer requests
Jannat’s digital certificate, and Jannat’s public key can be found within it.
Everyone can have the public key, but only the recipient has the private key needed
to decrypt and access the data.
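The Sawsan and Jannat example can be sketched with RSA, one well-known public-key algorithm; the notes do not name an algorithm, so RSA is an illustrative choice. The tiny primes below are a standard textbook example and far too small to be secure. Python 3.8 or later is assumed for `pow(e, -1, phi)`.

```python
# Toy RSA with tiny primes: for illustration only, far too small to be secure.
p, q = 61, 53
n = p * q  # 3233, part of both keys
phi = (p - 1) * (q - 1)  # 3120
e = 17  # Jannat's public key is (e, n)
d = pow(e, -1, phi)  # 2753; Jannat's private key is (d, n)


def encrypt(m: int, public_key: tuple[int, int]) -> int:
    exponent, modulus = public_key
    return pow(m, exponent, modulus)


def decrypt(c: int, private_key: tuple[int, int]) -> int:
    exponent, modulus = private_key
    return pow(c, exponent, modulus)


message = 65  # a message encoded as a number ('A' in ASCII)
ciphertext = encrypt(message, (e, n))  # Sawsan uses Jannat's PUBLIC key
print(ciphertext)  # 2790
print(decrypt(ciphertext, (d, n)))  # 65: Jannat's PRIVATE key recovers it
```

The public key (e, n) can be shared with everyone, because working out d from it requires knowing the primes p and q.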
SSL: Secure Sockets Layer (used to secure websites)
TLS: Transport Layer Security
Asymmetric encryption is used by Secure Sockets Layer (SSL) to secure websites.
Transport Layer Security (TLS) has replaced SSL, but both are often referred to as
SSL. Once SSL has established an authenticated session, the client and server
create symmetric keys for faster secure communication.
HTTPS: Hypertext Transfer Protocol Secure
Hard disk
Full disk encryption encrypts all the data stored on a disk, which is different from
encrypting single files. To access the files on the disk, you need the encryption key.
This encryption method is also used on other storage media, such as backups,
universal serial bus (USB) drives and flash memory. These storage media are
portable, so they need to be encrypted as they can be stolen. Anyone who tries to
open the data cannot understand it because it is encrypted. To open the files or
access the data, a password or fingerprint is needed to unlock the encryption.
HTTPS
Normal web pages that are not encrypted are sent over HTTP, so anyone who
intercepts the data would be able to read the content of the page. This is a
particular problem when sending sensitive data such as credit card information.
HTTPS provides secure communication over the internet. It encrypts the data and
makes web pages secure. It uses TLS/SSL to encrypt and decrypt pages and
information sent and received by web users.
This method is used by banks: when a person logs in, the address will start with
https://.
When the browser requests a secure page, it checks the web server’s digital
certificate to make sure it is trusted, valid and related to the site being visited. The
web browser then uses the web server’s public key to encrypt a new symmetric key
and sends that encrypted symmetric key to the web server. The web server uses its
own private key to decrypt the new symmetric key.
The browser and web server now communicate using the same symmetric key.
Email
Email encryption uses asymmetric encryption. This means that the receiver must
have the private key that matches the public key used to encrypt the original
email (the receiver sends their public key to the sender). Both the sender and
receiver need to send each other their digital certificates so they can be stored in
their contacts.
Encrypting an email will also encrypt its attachments.
How encryption protects data
Encryption scrambles the data and makes it impossible to understand. However,
it does not stop data from being stolen. Strong encryption, such as 256-bit AES,
makes it virtually impossible for someone to decrypt the data without the key, so
the data is protected from prying eyes.
1.5 Checking the accuracy of data
Validation: the process of checking data to make sure it matches acceptable rules.
Example: a product is valid to use until its expiry date. The rule here is that the date
of use must be before the expiry date. If the data follows this rule, the data is valid.
Presence check: used to ensure that data is entered (present).
Example: when filling in contact details in a form, if data is not entered in a
required field, an error message is shown asking the user to enter the data.
Limit check: ensures that data is within a defined range. It contains one boundary:
either the highest possible value or the lowest possible value.
Example: Age >= 18 (respondents have to be at least 18 years old).
Range check: ensures that data is within a defined range. It contains two
boundaries: the lower boundary and the upper boundary.
> greater than, < less than, >= greater than or equal to, <= less than or equal to
Example: a gym class accepts ages between 15 and 70 only.
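The three checks above can be written as simple rules that return True (valid) or False (invalid). This minimal Python sketch assumes the gym's range of 15 to 70 includes both boundaries; the function names are hypothetical.

```python
def presence_check(value: str) -> bool:
    """Valid if something (other than spaces) has been entered."""
    return value.strip() != ""


def limit_check(age: int) -> bool:
    """One boundary: respondents must be at least 18."""
    return age >= 18


def range_check(age: int) -> bool:
    """Two boundaries: the gym class accepts ages 15 to 70 (inclusive, assumed)."""
    return 15 <= age <= 70


print(presence_check(""))  # False: show an error message asking for data
print(limit_check(18))  # True
print(range_check(71))  # False
```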
Type check: ensures that data is of a defined data type.
Data that is of the correct data type is valid, but data that is valid and of the
correct data type is not necessarily correct.
Example: 22/06/2087 is valid because it is a date data type, but it is clearly
incorrect (for example, as a date of birth).
Length check: ensures data is of a defined length or within a range of lengths.
Example: a password must be at least six characters long.
Having the correct length does not necessarily mean the format is correct: 2ndfeb
is six characters long but does not follow the required format.
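A sketch of the type and length checks, assuming dates are typed in as text in DD/MM/YYYY form. It shows both points made above: a valid date can still be wrong, and a value of the right length can still be in the wrong format.

```python
from datetime import datetime


def type_check_date(text: str) -> bool:
    """Valid if the text can be read as a date in DD/MM/YYYY form (assumed format)."""
    try:
        datetime.strptime(text, "%d/%m/%Y")
        return True
    except ValueError:
        return False


def length_check(text: str, minimum: int) -> bool:
    """Valid if the text is at least `minimum` characters long."""
    return len(text) >= minimum


print(type_check_date("22/06/2087"))  # True: a real date, even though it is clearly wrong
print(type_check_date("2ndfeb"))  # False: not a date
print(length_check("2ndfeb", 6))  # True: six characters, but not in the required format
```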
Format check: ensures data matches a defined format.
Example: an email address must include an @ symbol preceded by at least one
character and followed by other characters, such as fbc@gh.
Data that is valid and of the defined format is not necessarily correct. An email
address of fdc@jb meets the rules above but is clearly incorrect.
Lookup check: tests to see if data exists in a list. It is similar to referential integrity.
Example: when asking a user for their gender, they can respond with ‘Male’ or
‘Female’. A lookup validation rule would check that the value is within this list.
Consistency check: compares data in one field with data in another field that
already exists within a record, to check their consistency.
Example: when entering the gender of ‘M’ or ‘F’, a consistency check will prevent ‘F’
from being entered if the title is ‘Mr’ and will prevent ‘M’ from being entered if the
title is ‘Mrs’ or ‘Miss’.
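The format, lookup and consistency rules above can be sketched in Python as follows. The email pattern is a simplified assumption that only encodes the rule stated in these notes (at least one character, then @, then more characters); real email validation is more involved.

```python
import re

# Simplified rule from the notes: at least one character, an @, then more characters
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+")


def format_check_email(email: str) -> bool:
    return EMAIL_PATTERN.fullmatch(email) is not None


def lookup_check(value: str, allowed: list[str]) -> bool:
    """Valid only if the value appears in the list of allowed values."""
    return value in allowed


def consistency_check(title: str, gender: str) -> bool:
    """Reject 'F' with the title 'Mr', and 'M' with 'Mrs' or 'Miss'."""
    if title == "Mr" and gender == "F":
        return False
    if title in ("Mrs", "Miss") and gender == "M":
        return False
    return True


print(format_check_email("fbc@gh"))  # True: matches the rule, but may not be a real address
print(format_check_email("fbcgh"))  # False: no @ symbol
print(lookup_check("Male", ["Male", "Female"]))  # True
print(consistency_check("Mr", "F"))  # False
```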
Check digit: this is used when you want to be sure that a range of numbers has
been entered correctly, for example a barcode or an ISBN:
ISBN 1 84146 201 2
The check digit is the final number in the sequence, so in this example it is the
final ‘2’.
In this image, the check digit is the final digit, ‘6’.
The computer performs a calculation on the other digits and then compares the
result with the check digit. If they match, the data is accepted as entered correctly.
Check digits are used to identify errors in data entry. If someone types 2286 instead
of 2236, the result of the calculation will not match the check digit, so the error is
detected (but not corrected).
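The notes do not say which calculation is used, and it differs between systems. As one example, this sketch uses the ISBN-10 rule (weights 10 down to 1, total divisible by 11), which confirms the ISBN shown above. Barcodes such as EAN-13 use a different rule (alternating weights of 1 and 3, modulo 10).

```python
def isbn10_check_digit(first_nine: str) -> str:
    """Work out the ISBN-10 check digit from the first nine digits.

    Each digit is multiplied by a weight (10, 9, ..., 2); the check digit makes
    the weighted total of all ten digits divisible by 11. A value of 10 is written 'X'.
    """
    total = sum(int(d) * w for d, w in zip(first_nine, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn10_is_valid(isbn: str) -> bool:
    """Recalculate the check digit and compare it with the one that was entered."""
    digits = isbn.replace(" ", "").replace("-", "")
    if len(digits) != 10 or not digits[:9].isdigit():
        return False
    return isbn10_check_digit(digits[:9]) == digits[9].upper()


print(isbn10_check_digit("184146201"))  # 2
print(isbn10_is_valid("1 84146 201 2"))  # True
print(isbn10_is_valid("1 84146 206 2"))  # False: one digit mistyped
```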
Proofreading: checking information manually.
Example: when this book was written, it was checked for spelling errors, grammar
errors, formatting and accuracy. Proofreading is done on documents. Authors may
check their own documents but miss some errors, so proofreading helps to find
errors that would otherwise be missed.

Verification: the process of checking whether data entered into the system
matches the original source.
• Visual checking: visually checking whether the data matches the original source
by reading and comparing, usually done by the user.
Visual checking does not ensure that the data entered is correct. If the original
data is wrong, then the verification process may still pass.
For example, if the intended data is ABCD but ABC is on the source document,
then ABC will be entered into the computer and verified, but it should have been
ABCD in the first place
• Double data entry: Data is input into the system twice and checked for
consistency by comparing.
The two items of data are compared by the computer system and if they match,
then they are verified.
If there are any differences, then one of the inputs must have been incorrect.
By using both validation and verification, the chances of entering incorrect data
are reduced.
If data that is incorrect passes a validation check, then the verification check is
likely to spot the error.
Example: the validation rule is that a person’s gender must be a single letter. N is
entered. This passes the validation check but is clearly incorrect. When verified
using double entry, the user enters N the first time followed by M the second time,
so the verification process identifies the error. However, it is still possible that the
user could enter N twice, in which case both the validation and verification
processes would fail to detect the error.
The correct letter to enter was M.
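The gender example can be sketched as two separate steps: a validation rule (a single letter) and verification by double data entry (both inputs must match). Function names are hypothetical.

```python
def validate_gender(value: str) -> bool:
    """Validation rule from the example: a single letter."""
    return len(value) == 1 and value.isalpha()


def double_entry_verify(first: str, second: str) -> bool:
    """Verification by double data entry: both inputs must match."""
    return first == second


first_entry, second_entry = "N", "M"
print(validate_gender(first_entry))  # True: passes validation, yet wrong
print(double_entry_verify(first_entry, second_entry))  # False: verification spots the mismatch
print(double_entry_verify("N", "N"))  # True: the same mistake twice is not caught
```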
1.6 Summary
Information has context and meaning so a person knows what it means. The
quality of information can be affected by the accuracy, relevance, age, level of
detail and completeness of the information. Proofreading is the process of
checking information.
Data are raw numbers, letters, symbols, sounds or images without meaning.
Knowledge allows data to be interpreted and is based on rules and facts. Static
data does not normally change. Dynamic data updates as a result of the source
data changing. Data collected from a direct data source (primary source) must be
used for the same purpose for which it was collected. Data collected from an
indirect source (secondary source) already existed for another purpose.
Coding is the process of representing data by assigning a code to it for
classification or identification. Encoding is the process of storing data in a specific
format. Encryption is when data is scrambled so that it cannot be understood.
Validation ensures that data is sensible and allowed. Validation checks include a
presence check, range check, type check, length check, format check and check
digit. Verification is the process of checking data has been transferred correctly.
Verification can be done visually or by double data entry.
