Organisation and Structure of Data

• What is a file? A collection of data stored in one unit

• This unit is identified by a filename

File Structure
• A file is made up of a number of records
  – Each line in a file is a record
• A record is made up of a number of fields
  – Each field in a record holds a piece of data

File Structure
Record Type | Credit / Debit | Date | Description | Amount
B | C | 01/05/2011 | Counter credit | 250.00
B | D | 03/05/2011 | Chq No: 176534 | 26.34
B | D | 04/05/2011 | Chq No: 176535 | 134.28
B | C | 12/05/2011 | Counter credit | 50.00


File Structure
• When all the records are the same length, the file is said to be a fixed length file
File structure (field lengths in characters): ID: 5, Surname: 15, Forename: 15, DOB: 8

ID | Surname | Forename | DOB
92013 | … | Sidney | 23/04/1978
92017 | … | Lorraine | 08/03/1979
92114 | LAIDLAW | Mandy | 12/11/1979

Fixed Length Files
• Advantage:
  – Reading them can be very fast because the computer knows where each record/field is
• Disadvantage:
  – Lots of unused space in each record, therefore larger file sizes
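To show why reading is fast, here is a minimal Python sketch using the field widths from the File structure slide (ID 5, Surname 15, Forename 15, DOB 8). It assumes ASCII text, space padding and a DOB stored as DDMMYYYY with no slashes; the helper names and sample records are hypothetical. Because every record is exactly 43 bytes, record n always starts at byte n × 43, so the program can jump straight to it.

```python
import io

# Field widths (in characters) from the slide. Space padding, ASCII text and
# a DOB stored as DDMMYYYY (8 characters, no slashes) are assumptions.
FIELDS = [("ID", 5), ("Surname", 15), ("Forename", 15), ("DOB", 8)]
RECORD_LENGTH = sum(width for _, width in FIELDS)  # 43 bytes per record


def pack_record(values):
    """Pad each field with spaces to its fixed width (cutting off anything too long)."""
    text = "".join(str(values[name]).ljust(width)[:width] for name, width in FIELDS)
    return text.encode("ascii")


def read_record(f, n):
    """Jump straight to record n (counting from 0) without reading the ones before it."""
    f.seek(n * RECORD_LENGTH)
    raw = f.read(RECORD_LENGTH).decode("ascii")
    record, pos = {}, 0
    for name, width in FIELDS:
        record[name] = raw[pos:pos + width].rstrip()  # remove the unused padding
        pos += width
    return record


# Hypothetical sample records held in an in-memory "file".
rows = [
    {"ID": "10001", "Surname": "JONES", "Forename": "Amy", "DOB": "01022003"},
    {"ID": "10002", "Surname": "PATEL", "Forename": "Ravi", "DOB": "15092002"},
    {"ID": "10003", "Surname": "BROWN", "Forename": "Kate", "DOB": "30112003"},
]
f = io.BytesIO(b"".join(pack_record(r) for r in rows))
print(read_record(f, 2))
# {'ID': '10003', 'Surname': 'BROWN', 'Forename': 'Kate', 'DOB': '30112003'}
```

The padding is the disadvantage in action: "Amy" takes up all 15 characters of the Forename field even though it only needs 3.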

File Structure
• If each record has a different record length, then the file is said to be a variable length file
• Note how each field is separated by an *
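A minimal Python sketch of reading a variable length file, assuming one record per line, fields separated by "*" as on the slide, and a hypothetical field order and sample data:

```python
# Assumptions: one record per line, fields separated by "*", and this field order.
FIELD_NAMES = ["ID", "Surname", "Forename", "DOB"]


def parse_variable_length(text):
    """Split the file into records (lines), then each record into fields on '*'."""
    records = []
    for line in text.splitlines():
        if line:  # ignore blank lines
            records.append(dict(zip(FIELD_NAMES, line.split("*"))))
    return records


# Hypothetical file contents: each record is only as long as its data needs.
data = "10001*JONES*Amy*01022003\n10002*FITZGERALD*Alexandra*15092002\n"
records = parse_variable_length(data)
print(records[1]["Surname"])  # FITZGERALD
```

Because the records are different lengths, there is no way to calculate where record n starts; the program has to read through every record before it. That is the trade-off against the fixed length layout.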

File Sizes
• Can only be worked out on fixed length files
  1. Need to work out how long each field is: ID 5, Surname 15, Forename 15, YearGroup 2, FormNo 1, DOB 8
  2. Add the fields together to get the record size: 5 + 15 + 15 + 2 + 1 + 8 = 46
  3. Finally multiply by the number of records in the whole file: 46 x 1000 = 46,000 bytes
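The file size calculation above translates directly into a few lines of Python. This sketch assumes each character takes one byte, as the slide's answer in bytes implies:

```python
# Field lengths from the slide, assuming 1 byte per character.
field_lengths = {"ID": 5, "Surname": 15, "Forename": 15,
                 "YearGroup": 2, "FormNo": 1, "DOB": 8}
number_of_records = 1000

record_size = sum(field_lengths.values())     # 46 bytes per record
file_size = record_size * number_of_records   # 46,000 bytes in total

print(f"Record size: {record_size} bytes, file size: {file_size} bytes")
# Record size: 46 bytes, file size: 46000 bytes
```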

File Structure
• Fixed length files:
  – Very good when you have lots of data that is always the same length
  – Eg: transaction details (transaction id, bank sort code, account number, amount)
• Variable length files:
  – Very good when you have fields that differ in size
  – Eg: customer details (title, name, address, gender, DOB, etc.)

File Types
• Serial file: contains data in no particular order
  – Records are usually stored in the order they are received
• Records on this type of file are read from top to bottom

File Types
• Sequential file: records are stored in some sort of order
  – For example, account number order
• The file is still read from top to bottom, but it is sorted first
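A minimal Python sketch contrasting a serial file with a sequential one, using hypothetical (account number, amount) records. The early stop in the search is an extra benefit of sorting that this sketch assumes; the slide itself only says the file is read from top to bottom.

```python
# Hypothetical records as (account_number, amount), stored in the order they
# arrived: this is a serial file.
serial_file = [(40213, 26.34), (10457, 250.00), (30981, 134.28), (20566, 50.00)]

# A sequential file holds the same records sorted on a key field.
sequential_file = sorted(serial_file, key=lambda record: record[0])


def find(records, account_number):
    """Read from top to bottom. Because the records are sorted, the search
    can stop as soon as it passes the place where the record would be."""
    for number, amount in records:
        if number == account_number:
            return amount
        if number > account_number:
            break
    return None


print([number for number, _ in sequential_file])  # [10457, 20566, 30981, 40213]
print(find(sequential_file, 30981))  # 134.28
print(find(sequential_file, 25000))  # None (stops at 30981)
```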

File Types
• Indexed sequential file: each record is given a key which uniquely identifies it, and the file is kept in key order
• The file can be:
  – Read sequentially from top to bottom
  – Searched for a specific record using the index
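A simplified Python sketch of the two ways of using an indexed sequential file. The "index" here is just a sorted list of keys held in memory, searched with a binary search from the standard `bisect` module; real indexed sequential organisations usually index blocks of records rather than every record. The records are hypothetical.

```python
from bisect import bisect_left

# Hypothetical student records, kept in key (StudentID) order.
records = [("S001", "Amy"), ("S004", "Ravi"), ("S007", "Kate"), ("S010", "Tom")]

# Simplified index: the sorted keys, in the same positions as their records.
index = [key for key, _ in records]


def read_sequentially(records):
    """Sequential access: go through every record from top to bottom."""
    return [name for _, name in records]


def find_by_key(records, index, key):
    """Direct access: binary-search the index, then go straight to that record."""
    pos = bisect_left(index, key)
    if pos < len(index) and index[pos] == key:
        return records[pos]
    return None


print(read_sequentially(records))           # ['Amy', 'Ravi', 'Kate', 'Tom']
print(find_by_key(records, index, "S007"))  # ('S007', 'Kate')
print(find_by_key(records, index, "S005"))  # None
```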

File Types
• Random access file: each record is given a unique key, which is generated by an algorithm
• Because of the key, the file does not need to be kept in order
• The file is accessed using the key rather than by reading from top to bottom
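One common way to turn a key into a storage location is a hashing algorithm. The slide does not say which algorithm is used, so the remainder-after-division hash and the "try the next slot" collision rule below are assumptions. In this minimal Python sketch the file is modelled as a fixed number of slots, and the keys and names come from the earlier fixed length example.

```python
NUM_SLOTS = 11  # size of the hypothetical file, in record slots

slots = [None] * NUM_SLOTS


def address(key):
    """Hashing algorithm (an assumption): remainder after dividing by the number of slots."""
    return key % NUM_SLOTS


def store(key, data):
    """Write the record at its calculated address; on a collision, try the next slot."""
    pos = address(key)
    for _ in range(NUM_SLOTS):
        if slots[pos] is None or slots[pos][0] == key:
            slots[pos] = (key, data)
            return pos
        pos = (pos + 1) % NUM_SLOTS
    raise RuntimeError("file is full")


def fetch(key):
    """Recalculate the address from the key instead of reading top to bottom."""
    pos = address(key)
    for _ in range(NUM_SLOTS):
        if slots[pos] is None:
            return None
        if slots[pos][0] == key:
            return slots[pos][1]
        pos = (pos + 1) % NUM_SLOTS
    return None


store(92013, "Sidney")
store(92017, "Lorraine")
store(92114, "Mandy")
print(fetch(92017))  # Lorraine
```

The records end up in slots 9, 2 and 0, in no particular order, which is why the file does not need to be kept sorted.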

Adding Data to Files
• Serial files:
  – New records are appended to the bottom of the file
• Sequential files:
  – Records are read in one at a time and copied to a new file, with the new record slotted in at the appropriate place
  – EG: old file 1, 2, 3, 5 plus new record 4 gives new file 1, 2, 3, 4, 5
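For a serial file, adding a record is just an append. For a sequential file, the copy-and-slot-in process can be sketched in Python like this, with the "files" modelled as lists of record keys (an assumption for brevity):

```python
def add_to_sequential(old_file, new_record):
    """Copy records one at a time from the old file to a new file,
    slotting the new record in when the right place is reached."""
    new_file = []
    inserted = False
    for record in old_file:
        if not inserted and new_record < record:
            new_file.append(new_record)
            inserted = True
        new_file.append(record)
    if not inserted:  # the new record belongs at the end
        new_file.append(new_record)
    return new_file


print(add_to_sequential([1, 2, 3, 5], 4))  # [1, 2, 3, 4, 5]
```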

Adding Data to Files
• Indexed sequential files:
  – The record is added by slotting its key into the correct place, the same as with sequential files
• Random access files:
  – The records don't need to be stored in any order; the index that points to where the record is located is updated

Deleting Data from Files
• Serial files:
  – Each record is copied across one by one to a new file; the deleted record is just missed out
  – EG: deleting record 4 from old file 1, 2, 3, 4, 5 gives new file 1, 2, 3, 5
• Sequential files:
  – Same as with serial files
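The same copy-across idea for deletion, as a minimal Python sketch with the files again modelled as lists of record keys:

```python
def delete_record(old_file, record_to_delete):
    """Copy every record across to a new file, missing out the deleted one."""
    new_file = []
    for record in old_file:
        if record != record_to_delete:
            new_file.append(record)
    return new_file


print(delete_record([1, 2, 3, 4, 5], 4))  # [1, 2, 3, 5]
```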

Deleting Data from Files
• Indexed sequential files:
  – Both the key and the record must be deleted
• Random access files:
  – As the record can be stored anywhere, only the key needs to be deleted, without reorganising the file structure

File Types
• All of the file types described previously are relatively old, though they are still used
• A newer method for storing data is a database

Databases
• What is a database? A collection of related data organised in a structured way so as to allow easy management of the data
• What do we mean by management of the data?
  – Selection (retrieval) of data
  – Updating data already there
  – Inserting new data
  – Deleting data we don't want any more

Databases
• What is data? Raw facts or figures
• What is information? Data that has a meaning

Databases
• So, is this data or information? 20/07/1969
  – Data: whilst it is obviously a date, you don't know what the date relates to
• Date of the first crewed moon landing = 20/07/1969
  – Now it is information

Databases
• A database is managed by a database management system (DBMS)
• A DBMS is a collection of programs that provides the necessary tools to create and manipulate the data in a database

Database Management System
• The DBMS sits between the applications you are running and the files that hold the data: Applications → DBMS → File System
• The DBMS manages this data by:
  – Checking for data inconsistencies
  – Minimising duplicated data
  – Retrieving related data from different files

Databases
Advantages
• Data can be accessed quickly and manipulated to create new data
• A single database can be shared by many users
• Data validation can ensure good quality data
• Data duplication can be avoided
• Security of data can be centralised
Disadvantages
• Requires time to set up
• Centralised data can be easier to steal
• If data is not correct then all users will see the wrong data

Elements of a Database
• Table: a complete set of data, the equivalent of a file. Also known as an entity
• Record: one row of related data within a table
• Field: a property or characteristic of a table. Also known as an attribute

Types of Database
• Flat-file database: a single table
• Relational database: multiple tables

Designing a Database
• The first thing to do is to work out what data we have to store
• To do this you would carry out the analysis stage of the software development lifecycle, using:
  – Questionnaires
  – Interviews
  – Any current documentation

Designing a Database
• Next, we need to decide what is data and what is information
  – EG: Age: 21 is information; the underlying data is the date of birth
• How could we store this? As the date of birth, from which the age can be worked out

Designing a Database
• Next, we need to break everything down into atomic data
• "Mr Brad Pitt" could be held as a single Name field
• But it contains 3 separate pieces of data, so as atomic data it is held as Title (Mr), Forename (Brad) and Surname (Pitt)

Atomic Data
• Worcester Sixth Form College: Name / Organisation
• Spetchley Road: Address Line 1
• Worcester: Town / City
• WR2 5LU: Postcode
• 01905 632600: varies; usually TelNo, but could be AreaCode and TelNo

Data Types
• Standard data types:
  – String / Text
  – Integer / Number / Short / Long
  – Decimal / Real / Single / Double
  – Date and Time
  – Boolean
• Why don't we have currency?

Databases
• To help organise the data we need a primary key
• This is a field that holds a unique piece of data for each record, so that every record can be identified

Primary Keys
• What sort of field would be suitable for:
  – Students at college? StudentID
  – Customers for an online shop? Email address / CustomerID
  – Cars at a garage? Car registration

Getting Data Back
• One of the advantages of using a database is the ability to get back just the data you want
• For example:
  – All the names of students over 18
  – Registrations of cars with no road tax
  – How much tax a specific employee paid within a certain timeframe
• We do this with the use of queries

Queries
• Queries are written in a special language called Structured Query Language (SQL)
• SQL is used to get back and manipulate data from the database

Queries
• Queries can be used in a number of ways:
  – SELECTing records from the database that meet specific criteria (filtering)
  – COUNTing the number of records that meet specific criteria
  – Performing calculations on fields of the records that meet certain criteria (eg: SUM)
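The three kinds of query can be tried out with SQLite, which ships with Python as the `sqlite3` module. A minimal sketch with a hypothetical Student table (the table, columns and data are made up for illustration):

```python
import sqlite3

# In-memory database with a hypothetical Student table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (StudentID TEXT PRIMARY KEY, Name TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [("S001", "Amy", 17), ("S002", "Ravi", 19), ("S003", "Kate", 18), ("S004", "Tom", 20)],
)

# SELECT with a criterion (filtering): names of students over 18
over_18 = [row[0] for row in conn.execute(
    "SELECT Name FROM Student WHERE Age > 18 ORDER BY Name")]
print(over_18)  # ['Ravi', 'Tom']

# COUNT the records that meet a criterion
(count,) = conn.execute("SELECT COUNT(*) FROM Student WHERE Age >= 18").fetchone()
print(count)  # 3

# A calculation on a field: SUM of the ages of students over 18
(total,) = conn.execute("SELECT SUM(Age) FROM Student WHERE Age > 18").fetchone()
print(total)  # 39

conn.close()
```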

Validation and Verification
• When a user enters data into a system there is a chance that it could be wrong
• GIGO: Garbage In, Garbage Out

Validation and Verification
• Errors can occur:
  – When data is captured
  – When hardcopy data is copied onto a computer
  – When data is transmitted within a computer system
  – When data is being processed by software
• To prevent this we use validation and verification

Verification
• Checking by comparison that no alterations have been made to data as it is transferred from one system to another, or on first entry onto a computer
• EG: keying data twice and comparing the two inputs
Validation
• Checking that data is sensible, and rejecting data that is not
• EG: presence check / format check

Transcription errors
• Occur when data is manually copied

Transcription errors
• Usually due to:
  – Bad handwriting
  – Misreading
  – Mishearing
  – Long strings of meaningless numbers

Transcription errors
• Main type: transpositional errors
  – When two characters are swapped over, e.g. 134638 entered as 136438

Activity
• For each of these transcription errors explain what is wrong and how the error is likely to have been made:
  – SO23 5RT entered as SO23 SRT
  – Leeming entered as Lemming
  – 419863 entered as 419683
  – 2000000 entered as 200000
  – 238.591 entered as 2385.91
  – 23/05/89 entered as 23/05/07
  – 199503 entered as 195503

Transmission errors
• Data that is already entered correctly on a computer becomes corrupted when transferred to another computer

Processing Errors
• Errors due to incorrectly written software
• Could be due to:
  – Incorrect calculations
  – Records being ignored
  – Wrong records being updated

Verification
• Remember, this is checking that data has been entered or transferred correctly
• For example:
  – Double-entry verification
  – Using check digits
  – On-screen verification

Verification
• Double-entry verification:
  – The data is entered twice and the two entries are compared
  – For example:
    • Passwords
    • Email addresses
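The comparison at the heart of double-entry verification is a one-line check. A minimal Python sketch with a hypothetical function name; a real form would ask the user to type the data again when the entries differ. Note that it only shows that the two entries agree, not that either is what the user meant.

```python
def double_entry_check(first_entry, second_entry):
    """Double-entry verification: accept the data only if both entries match exactly."""
    return first_entry == second_entry


print(double_entry_check("amy@example.com", "amy@example.com"))  # True
print(double_entry_check("amy@example.com", "amy@exmaple.com"))  # False (letters swapped)
```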

Verification
• On-screen verification:
  – The entered data is displayed back to the user for them to check over manually
  – Often used in online forms when signing up to things

Verification
• Check digit:
  – A digit in a numerical field used to check that the overall sequence of numbers is valid
  – Calculated automatically from the other digits
  – EG:
    • Barcodes
    • Bank account numbers
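Check digit schemes vary. As one real example, EAN-13 barcodes (the 13-digit barcodes on most retail products) multiply the first 12 digits alternately by 1 and 3 and choose a 13th digit that brings the total up to a multiple of 10. This minimal Python sketch covers EAN-13 only; bank account numbers use different schemes. The weighting means most swaps of adjacent digits are caught, though not all: swapping two adjacent digits that differ by 5 goes undetected.

```python
def ean13_check_digit(first_12_digits):
    """Multiply the digits by 1, 3, 1, 3, ... from the left and add them up.
    The check digit is whatever brings that total up to a multiple of 10."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first_12_digits))
    return (10 - total % 10) % 10


def is_valid_ean13(code):
    """Recalculate the check digit from the first 12 digits and compare it
    with the 13th digit that was actually entered or scanned."""
    if len(code) != 13 or any(c not in "0123456789" for c in code):
        return False
    return ean13_check_digit(code[:12]) == int(code[12])


print(is_valid_ean13("4006381333931"))  # True
print(is_valid_ean13("4003681333931"))  # False: the 6 and 3 have been swapped over
```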

Validation
• Remember, this is checking that data follows the rules of the program and is sensible
• For example:
  – Character check

Validation
• Character check:
  – Checks for the appropriate range of characters
  – For example:
    • Upper / lower case characters
    • Numeric / alphanumeric characters in the correct fields

Validation
• Format check:
  – Checks to see if data is in a valid format
  – For example:
    • Dates being DD/MM/YYYY
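A format check is often written as a pattern (regular expression). A minimal Python sketch using the standard `re` module for the DD/MM/YYYY example; the function name is hypothetical. Note that it checks only the shape of the data, not whether the date actually exists.

```python
import re

# DD/MM/YYYY: two digits, "/", two digits, "/", four digits.
DATE_PATTERN = re.compile(r"[0-9]{2}/[0-9]{2}/[0-9]{4}")


def format_check(value):
    """Format check: the whole entry must match the pattern."""
    return DATE_PATTERN.fullmatch(value) is not None


print(format_check("23/04/1978"))  # True
print(format_check("23/041978"))   # False: a "/" is missing
print(format_check("31/02/2011"))  # True: right format, even though no such date exists
```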

Validation
• Length check:
  – Checks that the length of the data entered is correct
  – For example:
    • Account numbers
    • Credit card numbers
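A length check is usually combined with a character check. This minimal Python sketch validates an account number, assuming it must be exactly 8 digits (typical for UK bank accounts, but an assumption here); the function names are hypothetical.

```python
ACCOUNT_NUMBER_LENGTH = 8  # assumed length for this sketch


def character_check(value):
    """Character check: only the digits 0-9 are allowed."""
    return all(c in "0123456789" for c in value)


def length_check(value, required_length):
    """Length check: the entry must be exactly the required number of characters."""
    return len(value) == required_length


def valid_account_number(value):
    return character_check(value) and length_check(value, ACCOUNT_NUMBER_LENGTH)


print(valid_account_number("12345678"))  # True
print(valid_account_number("1234567"))   # False: too short
print(valid_account_number("1234567A"))  # False: contains a letter
```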

Validation
• Range check:
  – Checks that data is neither too large nor too small, and fits within certain limits
  – For example:
    • Checking that the age is over 18 once the DOB has been entered
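A minimal Python sketch of the age example: work out the age from the DOB, then check that it lies within a range. The limits are assumptions: "over 18" is taken to mean 18 or over, and 120 is used as a sanity upper bound. A fixed "today" keeps the example repeatable.

```python
from datetime import date


def age_on(dob, today):
    """Age in whole years, one less if this year's birthday hasn't happened yet."""
    before_birthday = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - (1 if before_birthday else 0)


def range_check(value, low, high):
    """Range check: accept values from low to high inclusive."""
    return low <= value <= high


today = date(2011, 5, 12)
print(range_check(age_on(date(1979, 11, 12), today), 18, 120))  # True (age 31)
print(range_check(age_on(date(1993, 5, 13), today), 18, 120))   # False (age 17)
```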

Validation
• Look-up lists:
  – Only allowing the user to select values from a list
  – For example:
    • Titles when entering a name (Mr, Mrs, etc.)
    • Months when entering a date
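On a form, a look-up list usually appears as a drop-down menu, but the program can still check on its side that the value received is one of the allowed ones. A minimal Python sketch; the list of titles is hypothetical.

```python
# Hypothetical look-up lists; a real form would show these as a drop-down menu.
TITLES = ["Mr", "Mrs", "Miss", "Ms", "Dr"]
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def lookup_check(value, allowed_values):
    """Look-up check: accept the value only if it appears in the list."""
    return value in allowed_values


print(lookup_check("Mrs", TITLES))      # True
print(lookup_check("Mister", TITLES))   # False
print(lookup_check("Febuary", MONTHS))  # False
```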

Validation
• Presence check:
  – Checks whether any data has been entered into a field
  – For example:
    • When filling out online forms

Validation
• Type check:
  – Checks that the appropriate data type has been entered
  – For example:
    • Checking that text isn't entered into a number field
    • Checking that only dates are entered into date fields
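Presence and type checks can be sketched in Python as below. The function names are hypothetical; the type checks simply try to convert the entry with `int()` or `datetime.strptime()` and reject it if the conversion fails. Unlike the format check, `strptime` also rejects dates that do not exist.

```python
from datetime import datetime


def presence_check(value):
    """Presence check: the field must contain something other than spaces."""
    return value.strip() != ""


def type_check_whole_number(value):
    """Type check: the entry must convert to an integer."""
    try:
        int(value)
    except ValueError:
        return False
    return True


def type_check_date(value):
    """Type check: the entry must be a real date written as DD/MM/YYYY."""
    try:
        datetime.strptime(value, "%d/%m/%Y")
    except ValueError:
        return False
    return True


print(presence_check("   "))             # False
print(type_check_whole_number("42"))     # True
print(type_check_whole_number("forty"))  # False
print(type_check_date("23/04/1978"))     # True
print(type_check_date("31/02/2011"))     # False: there is no 31st of February
```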
