
People's Democratic Republic of Algeria

‫الجمهورية الجزائرية الديمقراطية الشعبية‬

Ministry of Higher Education and Scientific Research


‫وزارة التعليم العالي والبحث العلمي‬

Djillali Liabes University of Sidi Bel Abbès
جامعة جيلالي ليابس - سيدي بلعباس

File and data structures

Presented by: Dr. MENAD

2023-2024
Chapter 1: File Generalities
I. Basic Concepts:
Nowadays, the use of computers is widespread. The computer offers a simple and fast means of
searching for information, allowing professionals and ordinary users alike to obtain information
on various subjects. No one can ignore the advantages of using a computer in education,
management, communication, etc., compared to a human being, who:
- Processes less quickly;
- Computes with less accuracy;
- Memorizes less information in a given time.
Thus, the computer is tasked with taking over the following human functions:
- Processing;
- Calculating;
- Memorizing.
To carry out the tasks assigned to it, the computer needs two essential elements:
- Programs: specially designed and written to address the problem posed.
- Data: related to the problem, on which the programs act to produce results.
Both data and programs are handled by the machine as files.

1. Definition of a file:
A file is a named, unified data structure. The computer file is a way of organizing
information.
There are different types of files: text, sound, data, source programs. Each file is
identified by a name and an extension; the extension indicates the type of the file and hence
the application that uses it.
Example: in course.doc, "course" is the name and ".doc" is the extension.

II. File, Record, Zone, Character:


1. The Bit Concept:
- The information processed by the computer is represented in binary form.
- All information is converted into a sequence of bits (0 or 1).
- The bit is the elementary unit of information. Physically, it is a memory cell that can
hold either 0 or 1.
2. The Character Concept:
- A character is a group of 6, 7, or 8 bits, depending on the code used; a group of 8 bits is
called a byte.
- These groupings represent alphanumeric characters (0-9, A-Z) as well as special
characters (?, @, #, ...).
- The character, or byte, is the unit of measurement for information.

• 8 bits = 1 byte
• 1 kilobyte (KB) = 1024 bytes (2^10)
• 1 megabyte (MB) = 1048576 bytes (2^20)
• 1 gigabyte (GB) = 1073741824 bytes (2^30)
• 1 terabyte (TB) = 1099511627776 bytes (2^40)
Example:
Binary representations of some characters in three codes:

Character   BCD code (6 bits)   ASCII code (7 bits)   EBCDIC code (8 bits)
0           000 000             011 0000              1111 0000
1           000 001             011 0001              1111 0001
2           000 010             011 0010              1111 0010
3           000 011             011 0011              1111 0011
...         ...                 ...                   ...
A           010 001             100 0001              1100 0001
B           010 010             100 0010              1100 0010
C           010 011             100 0011              1100 0011
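
The 7-bit ASCII and 8-bit EBCDIC codes of a character can be computed with a few lines of Python. This is only an illustrative sketch: the helper names `ascii_bits` and `ebcdic_bits` are not part of the course, and the EBCDIC values assume Python's `cp037` codec (one common EBCDIC variant).

```python
def ascii_bits(ch: str) -> str:
    """Return the 7-bit ASCII code of a single character as a binary string."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is not a 7-bit ASCII character")
    return format(code, "07b")


def ebcdic_bits(ch: str) -> str:
    """Return the 8-bit EBCDIC code (cp037 variant, an assumption) as a binary string."""
    return format(ch.encode("cp037")[0], "08b")


for ch in "0123ABC":
    # One line per character: the character, its ASCII code, its EBCDIC code
    print(ch, ascii_bits(ch), ebcdic_bits(ch))
```

For instance, `ascii_bits("A")` gives `1000001` and `ebcdic_bits("A")` gives `11000001`.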

3. The Zone Concept:


- The zone (also called a field) is the successive grouping of several characters.
- The zone represents information that is accessible through processing.
- To distinguish between the different zones, we assign them identifiers.
- The size or length of a zone is the number of characters in that zone.
- The type of characters in the zone defines the type of the zone.
Example:

Zone 1          Zone 2        Zone 3       Zone 4          Zone 5
Serial number   Family name   First name   Date of birth   Address
2022001         Mahmoudi      Fatima       14-04-2005      Sidi Bel Abbes

4. The Record Concept:

- A record is the grouping of several zones (fields) that relate to a single subject.
- A record is a set of data (information) stored in a file. It corresponds to the type of entity
that a database manipulates.
- To uniquely identify records, a field called the key is used, whose value is unique for each
record.
Example:

Zone 1          Zone 2        Zone 3       Zone 4          Zone 5
Serial number   Family name   First name   Date of birth   Address
2022001         Mahmoudi      Fatima       14-04-2005      Sidi Bel Abbes     <- one record

5. The File Concept:


- A file is a set of information stored on a physical medium (tapes, disks, CDs, ...).
- A file is a set of homogeneous information that is logically related.
- A file is a set of zones (fields) grouped into records.
- A file can be stored permanently on a physical medium.
Example:
- The student file contains the student information for a university.
- The vehicle file contains the vehicle information for a fleet of vehicles.
- The library file contains the book information for a library.
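
The hierarchy zone, record, file can be sketched in Python as follows. This is a minimal illustration, not a storage format: the class name `Student`, the helper `find_by_key`, and the second record are made up for the example, and the file is simply held as a list in memory.

```python
from dataclasses import dataclass


@dataclass
class Student:
    """One record: a group of zones (fields) about a single student."""
    serial_number: int   # the key: unique for each record
    family_name: str
    first_name: str
    date_of_birth: str
    address: str


# The "file": a collection of records with the same structure.
student_file = [
    Student(2022001, "Mahmoudi", "Fatima", "14-04-2005", "Sidi Bel Abbes"),
    Student(2022002, "Ben Ali", "Tarek", "02-11-2004", "Oran"),  # hypothetical record
]


def find_by_key(file, key):
    """Return the record whose key equals `key`, or None if there is none."""
    for record in file:
        if record.serial_number == key:
            return record
    return None


print(find_by_key(student_file, 2022001))
```

Because the key is unique, a search on it returns at most one record.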

III. File Activity:


The activity of a file characterizes all manipulations that are made to the file. It is
defined by the following four characteristics:
- Consultation rate
- Frequency of consultations
- Renewal rate
- Stability of the file

1. Consultation rate:
Refers to the ratio between the number of records viewed (or modified) and
the total number of records in the file, over a given period of time.

$$Cr = \frac{\text{number of records viewed or modified}}{\text{total number of records}}$$

Thus, a distinction is made between:
- The basic consultation rate: for a single processing run (one execution of a
program).
- The annual consultation rate: for one year.
2. Frequency of consultations:
Refers to an annual frequency, i.e., the number of times a record in the file is accessed
for simple query or update in a year.
3. Renewal rate:
Refers to a specific period of time. It expresses the relative number of new records
inserted into the file.
4. Stability of the file:
Refers to a specific period of time. A file is considered stable for a given period if the
number of records created is approximately equal to the number of records deleted.
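
The activity measures above can be turned into small helper functions. The sketch below rests on two assumptions that the text does not state: the renewal rate is taken relative to the total number of records, and "approximately equal" is interpreted as a relative difference of at most 5% (the `tolerance` parameter). All function names are illustrative.

```python
def consultation_rate(records_accessed: int, total_records: int) -> float:
    """Cr = records viewed or modified / total number of records."""
    return records_accessed / total_records


def renewal_rate(records_inserted: int, total_records: int) -> float:
    """New records relative to the total number of records (an assumption)."""
    return records_inserted / total_records


def is_stable(created: int, deleted: int, tolerance: float = 0.05) -> bool:
    """True if creations and deletions differ by at most `tolerance` (relative)."""
    largest = max(created, deleted)
    if largest == 0:
        return True  # nothing created, nothing deleted
    return abs(created - deleted) / largest <= tolerance


print(consultation_rate(250, 1000))  # 250 of 1000 records accessed
print(renewal_rate(100, 1000))
print(is_stable(100, 98))
```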

IV. File typology:


There are different types of files, depending on:
- The type of information they contain
- Their lifetime
- The type of media used for storage
- The organization of the information

1. The type of information they contain:


A file can contain two types of information, data or programs; accordingly, we speak of a
data file or a program file.
• Program files:
These are the files that contain the instructions of the program to be executed. These instructions
are first written in a programming language (Pascal, C, etc.).

• Data files:
These are the files that group the data a program may use and/or the results
it produces. Unlike program files, data files evolve over time, meaning that their data can be:
- Modified
- Deleted
- Added
- Consulted

2. File Type by Lifespan:

A file may exist continuously or only momentarily, depending on the purpose, value, and
relevance of the information it contains. As a result, files can be divided into six categories:
- Permanent file
- Movement file
- Manoeuvre file
- Intermediate file
- Archive file
- Table file

• Permanent file:
It is a file whose information is of paramount importance within an application. Its content
does not undergo frequent changes.
Example:
In the education management program, the Students file serves as a permanent file.

• Movement file:
It is a temporary file with a brief lifespan that is used to update a permanent file. Once
the processing is finished, it is no longer needed.
Example:
Consider a tuition management service that keeps a permanent file on the students. A new
cohort arrives at the beginning of each session. New enrollees are initially kept in a
Registrant file and, once their academic records have been verified, they are moved to the
Students file. The Students file is thus updated using the Registrant file.
• Manoeuvre file:
It is used when there is not enough memory to hold all the data required for a particular
processing. Its lifespan is limited to that of the processing that gave rise to it.
Example:
A Notes file with the structure (student number, note1, note2, ..., note12) is used to create the
list of students who take the remedial tests at the end of the semester. We assume that
there are many students and that the available memory capacity is fairly low. The outcomes
are therefore saved in a Decision file with the structure (student number, decision1,
decision2, ..., decision12), where each decision takes one of two values: exempted or maintained.
Using this file avoids saturating memory.

• Intermediate file:
It contains the results of a particular processing, which can be used by that same processing or
by subsequent ones. Unlike the manoeuvre file, which only passes its data back to the program
that generated it, the intermediate file permits the transmission of data between programs.
Additionally, the lifespan of an intermediate file is not limited to the lifespan of the
processing that generated it, which allows other processes to use it.
Example:
A Result file is generated from the grade files at the end of each academic year; it holds
the student grades for both semesters. The Result file is used for two tasks:
- Producing a detailed list of accepted students.
- Assigning accepted students to internships based on their performance.
The Result file is an intermediate file.

• Archive file:
It is a file used to keep data throughout the lifetime of an application, so that it can be
consulted later.
Example:
After a cohort has completed its last year of training and graduated from the university, all of
the information about its students is transferred from the Students file to a School Archive
file so that it may be consulted whenever necessary.

• Table file:
A table is a collection of records, each with one or more arguments and at least one value. In
most cases, the table is accessed through the argument and the corresponding value is retrieved.

Arguments   Values
A1          V1
A2          V2
A3          V3
...         ...
An          Vn

Example:
Product table:

Product number   Designation   Quantity
01               Desk          10
02               Table         14
03               Chair         50
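
Access by argument maps naturally onto a Python dictionary. The sketch below stores the product table with the product number as the argument and the pair (designation, quantity) as the value; the names `product_table` and `lookup` are illustrative.

```python
# Argument -> value(s): product number -> (designation, quantity)
product_table = {
    "01": ("Desk", 10),
    "02": ("Table", 14),
    "03": ("Chair", 50),
}


def lookup(table, argument):
    """Enter the table by its argument and come out with its value (or None)."""
    return table.get(argument)


print(lookup(product_table, "02"))
```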

3. File Types by Storage Media Used:

Although a file's content is unaffected by the medium used to store it, several aspects of the
file, such as how its data can be accessed, are strongly tied to the nature of the medium.
For instance, only sequential access is possible for files saved on magnetic tapes, whereas both
sequential and direct access are possible for files stored on magnetic disks.
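
The difference between the two access modes can be simulated in Python. In this sketch an in-memory `io.BytesIO` buffer stands in for the storage medium, and the 8-byte record size and the record contents are arbitrary choices made for the illustration.

```python
import io

RECORD_SIZE = 8  # illustrative fixed record length, in bytes


def read_sequential(medium, index):
    """Tape-like access: rewind, then read every record up to the wanted one."""
    medium.seek(0)
    record = b""
    for _ in range(index + 1):
        record = medium.read(RECORD_SIZE)
    return record


def read_direct(medium, index):
    """Disk-like access: compute the record's position and jump straight to it."""
    medium.seek(index * RECORD_SIZE)
    return medium.read(RECORD_SIZE)


# 100 records of 8 bytes each: b"REC00000", b"REC00001", ...
medium = io.BytesIO(b"".join(f"REC{i:05d}".encode("ascii") for i in range(100)))

print(read_sequential(medium, 42))
print(read_direct(medium, 42))
```

Both functions return the same record; the sequential version performs 43 reads to get there, the direct version a single one.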

4. File Types by Information Organization:

One of a file's most crucial aspects is the organization used because it specifies how to retrieve
the information it contains. Three primary types of organizations exist:
- Sequential organization
- Indexed sequential organization
- Random organization.

V. Fundamental operations on files:

The operations that can be performed on files are as follows:


- Creation
- Update
- Deletion
- Sorting
- Merge
- Join
- Split
- Extract
- Copy
1. Creation:
A file is created by:
- Defining its structure, i.e., specifying the types and lengths of its various fields.
- Recording the items on a medium, i.e., entering them into the file.

Example:
To create the Teachers file, we define its structure and the size of its records as follows:

Field name           Length of the field
Id                   6 characters
Teacher name         10 characters
Teacher first name   10 characters
Date of birth        8 characters
Teacher address      20 characters
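
The Teachers structure can be expressed with Python's `struct` module, where each field becomes a fixed-length byte string. This is a sketch under assumptions: ASCII text, padding with null bytes, and sample values that are invented for the illustration; `pack_teacher` and `unpack_teacher` are hypothetical helper names.

```python
import struct

# Field lengths from the Teachers structure: 6 + 10 + 10 + 8 + 20 characters.
# "=" selects standard sizes with no alignment padding.
TEACHER_FORMAT = "=6s10s10s8s20s"
RECORD_LENGTH = struct.calcsize(TEACHER_FORMAT)  # total record length in bytes


def pack_teacher(teacher_id, name, first_name, birth_date, address):
    """Build one fixed-length record; short fields are padded with null bytes."""
    fields = (teacher_id, name, first_name, birth_date, address)
    return struct.pack(TEACHER_FORMAT, *(f.encode("ascii") for f in fields))


def unpack_teacher(record):
    """Split a record back into its fields and strip the padding."""
    raw = struct.unpack(TEACHER_FORMAT, record)
    return tuple(f.rstrip(b"\x00").decode("ascii") for f in raw)


# Invented sample values, for illustration only
record = pack_teacher("T00001", "Mahmoudi", "Fatima", "14042005", "Sidi Bel Abbes")
print(RECORD_LENGTH, len(record))
print(unpack_teacher(record))
```

Every record has the same length (54 characters here), whatever the actual length of the names it contains.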

2. Update:
Updating a file covers the following three operations:
- Adding new records.
- Removing existing records.
- Modifying the content of records.

3. Deletion:
To delete a file is to cancel its storage, that is to say, to erase all the records that make up the
file. There are two types of deletion, logical and physical:
- Logical deletion marks the file as deleted so that it is no longer visible, although the data is
still present on the medium. The file can still be restored.
- Physical deletion is final: the storage space the file occupied is reclaimed and the file can
no longer be restored.
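
The two kinds of deletion can be contrasted with a toy directory in Python. This is a deliberately simplified model, not how a real file system works: the class `FileDirectory` and its method names are invented for the illustration, and a "deleted" flag stands in for the logical marking.

```python
class FileDirectory:
    """Toy directory that supports logical and physical deletion."""

    def __init__(self):
        self._files = {}  # name -> {"data": bytes, "deleted": bool}

    def create(self, name, data):
        self._files[name] = {"data": data, "deleted": False}

    def delete_logical(self, name):
        # The data stays on the medium; the file is only flagged as deleted.
        self._files[name]["deleted"] = True

    def restore(self, name):
        # Possible only after a logical deletion.
        self._files[name]["deleted"] = False

    def delete_physical(self, name):
        # The entry and its data are removed; the space is reclaimed.
        del self._files[name]

    def visible(self):
        return sorted(n for n, f in self._files.items() if not f["deleted"])

    def exists_on_medium(self, name):
        return name in self._files


directory = FileDirectory()
directory.create("students", b"...")
directory.delete_logical("students")
print(directory.visible(), directory.exists_on_medium("students"))
directory.restore("students")
directory.delete_physical("students")
print(directory.visible(), directory.exists_on_medium("students"))
```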

4. Sorting:
Searching is the main operation performed on files. It is therefore worthwhile to store
the information in a carefully chosen order, so as to reduce the time required for a search
(and to enable the user to obtain the information as quickly as possible). Arranging the
records in this way is called sorting.
A file is sorted when its records are arranged in ascending or descending order of the
value of one or more sorting arguments.
Example:
Consider a student record that includes the following data: student number,
student name, student first name, and student specialization. Selecting the key (student
number) as the sorting argument is the simplest way to arrange this information. This
choice is not always the best one, though: it is important to consider both the user's
needs and how the file will be used in the future.
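
Sorting on one or several arguments can be sketched with Python's built-in `sorted`. The records below are sample student data reduced to the four fields of the example; the variable names are illustrative.

```python
from operator import itemgetter

students = [
    {"number": 2022005, "name": "Serhan", "first_name": "Amina", "speciality": "ISI"},
    {"number": 2022001, "name": "Ben Ali", "first_name": "Tarek", "speciality": "WIC"},
    {"number": 2022003, "name": "Messaoudi", "first_name": "Karim", "speciality": "RSSI"},
]

# One sorting argument: the key (student number), ascending
by_number = sorted(students, key=itemgetter("number"))

# Same argument, descending order
by_number_desc = sorted(students, key=itemgetter("number"), reverse=True)

# Two sorting arguments: speciality first, then name
by_speciality_then_name = sorted(students, key=itemgetter("speciality", "name"))

for s in by_speciality_then_name:
    print(s["speciality"], s["name"], s["number"])
```

If most searches are made by speciality and name rather than by number, the last ordering serves the user better than sorting on the key.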
5. Merge:
It involves combining the records from two or more files into one file. The files being
merged must have the same type (structure). The resulting file will thus have the same
structure as the original files.

Example:
In a tuition management service, we assume that students are managed through three
files, according to the speciality they study:

The file Stud_WIC:

Registration number   Name        First name   Address          Speciality
2022001               Ben Ali     Tarek        Sidi Bel Abbes   WIC
2022002               Mahmoudi    Sarah        Oran             WIC

The file Stud_RSSI:

Registration number   Name        First name   Address          Speciality
2022003               Messaoudi   Karim        Saida            RSSI
2022004               Mammeri     Mohamed      Oran             RSSI

The file Stud_ISI:

Registration number   Name        First name   Address          Speciality
2022005               Serhan      Amina        Alger            ISI
2022006               Laabani     Imane        Sidi Bel Abbes   ISI

The result of the merge is the following Student file:

Registration number   Name        First name   Address          Speciality
2022001               Ben Ali     Tarek        Sidi Bel Abbes   WIC
2022002               Mahmoudi    Sarah        Oran             WIC
2022003               Messaoudi   Karim        Saida            RSSI
2022004               Mammeri     Mohamed      Oran             RSSI
2022005               Serhan      Amina        Alger            ISI
2022006               Laabani     Imane        Sidi Bel Abbes   ISI
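
The merge of the three student files can be sketched in Python with `heapq.merge`. The sketch assumes that each input file is already sorted on the registration number and that all records share the same structure; the function name `merge_files` is illustrative.

```python
import heapq

# (registration number, name, first name, address, speciality)
stud_wic = [
    (2022001, "Ben Ali", "Tarek", "Sidi Bel Abbes", "WIC"),
    (2022002, "Mahmoudi", "Sarah", "Oran", "WIC"),
]
stud_rssi = [
    (2022003, "Messaoudi", "Karim", "Saida", "RSSI"),
    (2022004, "Mammeri", "Mohamed", "Oran", "RSSI"),
]
stud_isi = [
    (2022005, "Serhan", "Amina", "Alger", "ISI"),
    (2022006, "Laabani", "Imane", "Sidi Bel Abbes", "ISI"),
]


def merge_files(*files):
    """Merge files sorted on the registration number into one sorted file."""
    return list(heapq.merge(*files, key=lambda record: record[0]))


student = merge_files(stud_isi, stud_wic, stud_rssi)
for record in student:
    print(record)
```

The order in which the input files are passed does not matter: the result is ordered by registration number.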

6. Join:
A new file is created by combining several source files.

7. Split:
It is the opposite of the join operation: several destination files are generated from one
source file.

Example:
The source file contains the zone "Employee address":

Employee address
Telagh 22007 Sidi bel abbes

We must perform processing on the new file that is based on the wilaya. In this case, the
"Employee address" zone is divided into the following sub-zones: City, Postcode, and Wilaya:

Employee address
City     Postcode   Wilaya
Telagh   22007      Sidi bel abbes

Only the method of representation changes; the data itself stays the same.
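
Dividing the address zone into sub-zones can be sketched with a regular expression. The sketch assumes that every address has the form "city, 5-digit postcode, wilaya" separated by spaces, which is only what the single example suggests; `split_address` is an illustrative name.

```python
import re

# Assumed layout: <city> <5-digit postcode> <wilaya>
ADDRESS_PATTERN = re.compile(
    r"^(?P<city>.+?)\s+(?P<postcode>\d{5})\s+(?P<wilaya>.+)$"
)


def split_address(address):
    """Split one address zone into the sub-zones city, postcode and wilaya."""
    match = ADDRESS_PATTERN.match(address.strip())
    if match is None:
        raise ValueError(f"unrecognized address layout: {address!r}")
    return match.groupdict()


print(split_address("Telagh 22007 Sidi bel abbes"))
```

The characters of the address are unchanged; only their grouping into zones differs, which makes processing by wilaya straightforward.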
8. Extract:
This operation consists of extracting or copying records (parts of a file) or parts of
records onto another medium in accordance with a specified criterion.
Example:
Print the list of admitted students from a Students file containing the information: student
number, student name, student first name, student address, student specialty, student
results.
The printed list will only contain the information: student number, student name, student
first name, student results.
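
Extraction combines a selection of records (the criterion) with a selection of fields. In the Python sketch below, the result values are invented for the illustration and the function name `extract` is hypothetical.

```python
students = [
    {"number": 2022001, "name": "Ben Ali", "first_name": "Tarek",
     "address": "Sidi Bel Abbes", "specialty": "WIC", "result": "admitted"},
    {"number": 2022002, "name": "Mahmoudi", "first_name": "Sarah",
     "address": "Oran", "specialty": "WIC", "result": "adjourned"},
    {"number": 2022003, "name": "Messaoudi", "first_name": "Karim",
     "address": "Saida", "specialty": "RSSI", "result": "admitted"},
]


def extract(records, criterion, fields):
    """Keep the records that satisfy `criterion`, reduced to the given fields."""
    return [{f: r[f] for f in fields} for r in records if criterion(r)]


admitted = extract(
    students,
    criterion=lambda r: r["result"] == "admitted",
    fields=("number", "name", "first_name", "result"),
)
for row in admitted:
    print(row)
```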
9. Copy:
Copying a file duplicates its contents on a medium. Several reasons may justify
this operation:
- Obtaining a quicker access time.
- Increasing reliability to prevent information loss.

VI. Differences between Central Memory (RAM) and Secondary Memory:


All data must be present in central memory before it can be manipulated by a program.
Central memory is the working memory.
Example:
When a C program stores a value in a variable, it uses central memory.

Secondary memory, on the other hand, is used to keep information until it is loaded into
central memory for processing. It is a permanent storage area: the data is still there after
the computer is switched off.
Example:
When you write a C program and save it to disk, you are using secondary memory.

Secondary memory is typically large, but its access time is very long compared to that of
central memory. Additionally, secondary memory is non-volatile and costs less per storage
unit than central memory. The following table compares central memory and secondary memory:

                Central memory                      Secondary memory
Storage         Temporary (volatile)                Permanent (non-volatile)
Memory size     Small                               Large
Access time     100,000 to 1,000,000 times faster   Very long
Cost per byte   More expensive                      Less expensive

VII. Physical and logical files:
• Logical file:
A logical file is described by its structure, that is, the structure of the records it contains.
This description is independent of the physical medium that will be used to store the file.
• Physical file:
A logical file becomes a physical file once it is stored on a physical medium.
A physical file is defined by its content and its physical medium.
VIII. Physical record and logical record:
• Logical record:
The items of a logical file are called logical records. A record's size is
expressed in bytes or characters and can be fixed, variable, or indefinite.
- Fixed-length record: the length of the record is the total number of characters in its
structure, and it is the same for every record.
Example:
Student file with fixed-length records of 50 characters:
- Number (8 characters)
- Name (20 characters)
- First name (20 characters)
- Section (2 characters)

- Variable-length record: the length differs from one record to another because at least one
zone does not have a fixed size.
Example:
Employee file with variable-length records:
Teacher number, Last name, First name, Grade, Position, Child(ren), Marital status.
The Child(ren) zone is variable because employees do not all have the same number of children.
Solution:
Create a separate file for children with the following structure: Child (Teacher number, First name,
Date of birth).
- 1 Fatima 06/30/1990
- 1 Mohamed 05/28/1994
- 1 Amina 01/28/1998
- 2 Sarah 04/14/2005

- Indefinite-length record: the length of the record can only be established at the time the
information is entered.
• Physical record:
Physical records are the records contained in a physical file. A physical record, often called
a block, is the quantity of data transferred between the storage unit and central memory: it is
the smallest unit of data that can be read or written in a single operation. A physical record
contains one or more logical records, possibly together with control and organizational
information.
IX. The blocking factor:
The blocking factor is the number of logical records in a physical
record. Grouping a given number of logical records into each physical record saves time,
because it reduces the number of I/O operations.

$$\text{Blocking factor} = \frac{\text{number of logical records}}{\text{number of physical records (blocks)}}$$

Example:
Consider a file of 1000 logical records. With a blocking factor of 1, we need
1000 I/O operations to read the entire file, whereas we only need 500 I/O operations if the
blocking factor is equal to 2.
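
The arithmetic of the example can be checked with a short Python sketch. It assumes one I/O operation per block and a last block that may be only partly filled; the function name `io_operations` is illustrative.

```python
def io_operations(logical_records: int, blocking_factor: int) -> int:
    """Number of block reads needed to read the whole file (one I/O per block)."""
    if blocking_factor < 1:
        raise ValueError("the blocking factor must be at least 1")
    # Ceiling division: a partly filled last block still costs one I/O.
    return (logical_records + blocking_factor - 1) // blocking_factor


for factor in (1, 2, 10):
    print(factor, io_operations(1000, factor))
```

Under these assumptions, a file of 1000 records needs 1000 I/O operations with a blocking factor of 1 and 500 with a blocking factor of 2.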
X. The static file and the dynamic file:
We represent the secondary memory as a contiguous zone of blocks with sequential
numbers (these numbers serve as the block addresses). Blocks are contiguous groups of
identically sized bytes that include data (records) from files among other things. The
secondary memory in the diagram below houses the three files E, F, and G.

File E consists of 4 contiguous blocks, file F consists of 7 contiguous blocks and file G is a
list of blocks.
