
Govt. Post Graduate Islamia College FSD

Department: Physics
Assignment on: Processing Data Storage
Course code: CSI-321
Course title: Introduction to Computing Applications
Submitted to: Miss Anam
Submitted by: Group C
ʍɨռɖ ʀɛǟɖɛʀֆ
Roll no: 30, 44, 39, 29, 40
Data processing
Definition:
Data processing is the process of converting raw
facts, or data, into meaningful information.

Stages of Data Processing


Data processing consists of the following six stages:
Collection:
Collection of data refers to the gathering of data. The data
gathered should be defined and accurate.
 Collecting data is the first step in data processing. Data is pulled
from available sources, including data lakes and data warehouses.
It is important that the data sources available are trustworthy and
well-built so the data collected (and later used as information) is of
the highest possible quality.
 The collection of raw data is the first step of the data processing
cycle. The type of raw data collected has a huge impact on the
output produced. Hence, raw data should be gathered from defined
and accurate sources so that the subsequent findings are valid and
usable. Raw data can include monetary figures, website cookies,
profit/loss statements of a company, user behavior, etc.
Preparation:
Preparation is the process of constructing a dataset from
data drawn from different sources for use in the processing step of the cycle.
 Once the data is collected, it then enters the data preparation stage.
Data preparation, often referred to as “pre-processing,” is the stage
at which raw data is cleaned up and organized for the following
stage of data processing. During preparation, raw data is diligently
checked for any errors. The purpose of this step is to eliminate bad
data (redundant, incomplete, or incorrect data) and begin to create
high-quality data for the best business intelligence.
 Data preparation or data cleaning is the process of sorting and
filtering the raw data to remove unnecessary and inaccurate data.
Raw data is checked for errors, duplication, miscalculations or
missing data, and transformed into a suitable form for further
analysis and processing. This is done to ensure that only the
highest quality data is fed into the processing unit.
 The purpose of this step is to remove bad data (redundant,
incomplete, or incorrect data) and to begin assembling high-quality
information that can be used in the best possible way for business
intelligence.
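A minimal sketch of this cleaning step in Python may help make it concrete; the records, field names, and validity rule below are invented for illustration:

# Data preparation sketch: drop redundant, incomplete, and incorrect
# records from raw data (all sample values are hypothetical).
raw_records = [
    {"name": "Ali", "marks": 78},
    {"name": "Ali", "marks": 78},     # redundant (duplicate)
    {"name": "Sara", "marks": None},  # incomplete (missing value)
    {"name": "Ahmed", "marks": 150},  # incorrect (out of range)
    {"name": "Zara", "marks": 91},
]

seen = set()
clean_records = []
for record in raw_records:
    key = (record["name"], record["marks"])
    if key in seen:
        continue  # skip duplicates
    if record["marks"] is None:
        continue  # skip incomplete records
    if not 0 <= record["marks"] <= 100:
        continue  # skip incorrect values
    seen.add(key)
    clean_records.append(record)

print(clean_records)  # only high-quality records remain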
Input:
Input refers to the supply of data for processing. It can be fed into
the computer through any input device, such as a keyboard, scanner, or
mouse.
 The clean data is then entered into its destination (perhaps a CRM
like Salesforce or a data warehouse like Redshift), and translated
into a language that it can understand. Data input is the first stage
in which raw data begins to take the form of usable information.
 In this step, the raw data is converted into machine-readable form
and fed into the processing unit. This can be in the form of data
entry through a keyboard, scanner or any other input source.
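As a small illustration of the input stage, the Python sketch below reads keyed-in data from CSV text into a machine-readable structure (the file contents and column names are hypothetical):

import csv
import io

# Simulated CSV text produced by data entry (contents illustrative only).
raw_csv = io.StringIO("name,marks\nAli,78\nZara,91\n")

# Convert the raw text into machine-readable records.
records = [{"name": row["name"], "marks": int(row["marks"])}
           for row in csv.DictReader(raw_csv)]
print(records)  # [{'name': 'Ali', 'marks': 78}, {'name': 'Zara', 'marks': 91}]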
Processing:
Processing refers to the actual execution of
instructions. In this stage, raw facts or data are converted into meaningful
information.
 During this stage, the data inputted to the computer in the previous
stage is actually processed for interpretation. Processing is done
using machine learning algorithms, though the process itself may
vary slightly depending on the source of data being processed (data
lakes, social networks, connected devices etc.) and its intended use
(examining advertising patterns, medical diagnosis from connected
devices, determining customer needs, etc.).
 In this step, the raw data is subjected to various data processing
methods using machine learning and artificial intelligence
algorithms to generate a desirable output. This step may vary
slightly from process to process depending on the source of data
being processed (data lakes, online databases, connected devices,
etc.) and the intended use of the output.
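Real systems may apply machine-learning algorithms at this stage, but the essential idea, turning input data into desired output, can be shown with a much simpler computation; the records and the statistic below are invented for illustration:

# A toy processing step: raw facts in, meaningful information out.
records = [{"name": "Ali", "marks": 78}, {"name": "Zara", "marks": 91}]

average = sum(r["marks"] for r in records) / len(records)
print(f"Average marks: {average:.1f}")  # summary produced from raw data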
Output and Interpretation:
In this stage, the output is
displayed to the user in the form of text, audio, video, etc. Interpretation of
the output provides meaningful information to the user.
 The output/interpretation stage is the stage at which data is finally
usable to non-data scientists. It is translated, readable, and often in
the form of graphs, videos, images, plain text, etc. Members of
the company or institution can now begin to self-serve the data for
their own data analytics projects.
 The data is finally transmitted and displayed to the user in a
readable form like graphs, tables, vector files, audio, video,
documents, etc. This output can be stored and further processed in
the next data processing cycle.
Storage:
In this stage, data, instructions, and information are stored
in permanent memory for future reference.
 The final stage of data processing is storage. After all of the data is
processed, it is then stored for future use. While some information
may be put to use immediately, much of it will serve a purpose
later on. Plus, properly stored data is a necessity for compliance
with data protection legislation like GDPR. When data is properly
stored, it can be quickly and easily accessed by members of the
organization when needed.
 The last step of the data processing cycle is storage, where data
and metadata are stored for further use. This allows for quick
access and retrieval of information whenever needed, and also
allows it to be used as input in the next data processing cycle
directly.
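A minimal sketch of the storage stage: the processed output and a little metadata are written to disk as JSON so they can be retrieved later or fed into the next cycle (the file name and metadata fields are assumptions):

import json
from datetime import datetime, timezone

result = {"average_marks": 84.5}

# Store the output together with metadata for future use.
stored = {
    "data": result,
    "metadata": {"processed_at": datetime.now(timezone.utc).isoformat()},
}
with open("cycle_output.json", "w") as f:
    json.dump(stored, f)

# Later, or in the next cycle, the stored data can be read back as input.
with open("cycle_output.json") as f:
    print(json.load(f))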
Types of Data Processing

Type: Batch Processing
Uses: Data is collected and processed in batches. Used for large amounts of data.
Eg: payroll system

Type: Real-time Processing
Uses: Data is processed within seconds when the input is given. Used for small amounts of data.
Eg: withdrawing money from an ATM

Type: Online Processing
Uses: Data is automatically fed into the CPU as soon as it becomes available. Used for continuous processing of data.
Eg: barcode scanning

Type: Multiprocessing
Uses: Data is broken down into frames and processed using two or more CPUs within a single computer system. Also known as parallel processing.
Eg: weather forecasting

Type: Time-sharing
Uses: Allocates computer resources and data in time slots to several users simultaneously.
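The contrast between batch and real-time processing can be sketched in a few lines of Python; the transactions and handler functions below are invented for illustration:

transactions = [120, 75, 300]

# Batch processing: collect everything first, then process in one run
# (as a payroll system would at the end of the month).
def process_batch(items):
    return sum(items)

print("Batch total:", process_batch(transactions))

# Real-time processing: handle each input the moment it arrives
# (as an ATM must when money is withdrawn).
balance = 1000
for amount in transactions:
    balance -= amount
    print("Balance after withdrawal:", balance)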


Data Processing Cycle
This process is completed in three stages:
Input (Data): Planning, Data Collecting, Input Data, Reification, Coding
Processing: Data Classification, Data Sorting, Data Calculation, Output the Result
Output (Information): Testing, Summarizing, Sorting Results, Feedback
Explanation:
 Data processing is the part of data
management that enables the creation of valid, useful
information from the collected data. Data processing
includes classification, computation, coding and
updating. Data storage refers to keeping data in the
most suitable format and on the best available medium.

 Data processing occurs when data is collected
and translated into usable information. Usually
performed by a data scientist or team of data
scientists, it is important for data processing to be
done correctly so as not to negatively affect the end
product, or data output.

 Data processing starts with data in its raw form
and converts it into a more readable format (graphs,
documents, etc.), giving it the form and context
necessary to be interpreted by computers and utilized
by employees throughout an organization.
Data Processing Methods
There are three main data processing methods: manual,
mechanical and electronic.

Manual Data Processing


This data processing method is handled manually. The entire
process of data collection, filtering, sorting, calculation, and other
logical operations is carried out with human intervention, without
the use of any electronic device or automation software. It is a
low-cost method that requires few or no tools, but it produces many
errors, entails high labor costs, and consumes much time and tedium.

In manual data processing, most tasks are done manually with
pen and paper. For example, in a busy office, incoming tasks
(input) are stacked in the “in tray”. The processing of each
task involves a person using the brain in order to respond to
queries.

The processed information from the out tray is then distributed to
the people who need it or stored in a file cabinet.

Mechanical Data Processing


Data is processed mechanically through the use of devices and
machines. These can include simple devices such as calculators,
typewriters, printing presses, etc. Simple data processing operations
can be achieved with this method. It produces far fewer errors than
manual data processing, but the growth of data has made this
method more complex and difficult.
Manual processing is cumbersome and tedious, especially for repetitive
tasks. Mechanical devices were developed to help automate
manual tasks. Examples of mechanical devices include the
typewriter, the printing press, and weaving looms. Initially, these
devices did not have electronic intelligence.

Electronic Data Processing


Data is processed with modern technologies using data processing
software and programs. A set of instructions is given to the
software to process the data and yield output. This method is the
most expensive but provides the fastest processing speeds with the
highest reliability and accuracy of output.

For a long time, scientists have researched how to develop
machines or devices that would simulate some form of human
intelligence during data and information processing. This was
made possible to some extent with the development of electronic
programmable devices such as computers.

The advent of microprocessor technology has greatly enhanced
data processing efficiency and capability. Microprocessor-controlled
devices include computers, cellular (mobile) phones, calculators,
fuel pumps, modern television sets, washing machines, etc.
Examples of Data Processing
Data processing occurs in our daily lives whether we are aware of it
or not. Here are some real-life examples of data processing:

 Stock trading software that converts millions of stock data points into a
simple graph.

 An e-commerce company uses the search history of customers to
recommend similar products.

 A digital marketing company uses demographic data of people to
strategize location-specific campaigns.

 A self-driving car uses real-time data from sensors to detect whether
there are pedestrians and other cars on the road.
Types of storage devices
SSD and flash storage
Flash storage is a solid-state technology that uses flash memory chips for writing
and storing data. A solid-state disk (SSD) flash drive stores data using flash
memory. Compared to HDDs, a solid-state system has no moving parts and,
therefore, less latency, so fewer SSDs are needed. Since most modern SSDs are
flash-based, flash storage is synonymous with a solid-state system.

Hybrid storage
SSDs and flash offer higher throughput than HDDs, but all-flash arrays can be
more expensive. Many organizations adopt a hybrid approach, mixing the speed of
flash with the storage capacity of hard drives. A balanced storage infrastructure
enables companies to apply the right technology for different storage needs. It
offers an economical way to transition from traditional HDDs without going
entirely to flash.

Cloud storage
Cloud storage delivers a cost-effective, scalable alternative to storing files on
on-premises hard drives or storage networks. Cloud service providers allow you
to save data and files in an off-site location that you access through the public
internet or a dedicated private network connection. The provider hosts, secures,
manages, and maintains the servers and associated infrastructure and ensures
you have access to the data whenever you need it.
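As a hedged example, the sketch below uses AWS S3 through the boto3 library as one possible cloud provider; the bucket name, object key, and local files are hypothetical, and credentials are assumed to be configured in the environment:

import boto3

# Save a local file to an off-site location in the cloud
# (bucket, key, and file names are hypothetical).
s3 = boto3.client("s3")
s3.upload_file("report.pdf", "example-bucket", "backups/report.pdf")

# The file can later be retrieved from anywhere with network access.
s3.download_file("example-bucket", "backups/report.pdf", "report_copy.pdf")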

Hybrid cloud storage


Hybrid cloud storage combines private and public cloud elements. With hybrid
cloud storage, organizations can choose in which cloud to store data. For instance,
highly regulated data subject to strict archiving and replication requirements is
usually more suited to a private cloud environment, whereas less sensitive data can
be stored in the public cloud. Some organizations use hybrid clouds to supplement
their internal storage networks with public cloud storage.

Backup software and appliances


Backup storage and appliances protect against data loss from disaster, failure or fraud.
They make periodic data and application copies to a separate, secondary device
and then use those copies for disaster recovery. Backup appliances range from
HDDs and SSDs to tape drives to servers, but backup storage can also be offered
as a service, also known as backup-as-a-service (BaaS). Like most as-a-service
solutions, BaaS provides a low-cost option to protect data, saving it in a remote
location with scalability.

Forms of data storage


Data can be recorded and stored in three main forms: file storage, block storage
and object storage.

File storage
File storage, also called file-level or file-based storage, is a
hierarchical storage methodology used to organize and store data.
In other words, data is stored in files, the files are organized in
folders and the folders are organized under a hierarchy of
directories and subdirectories.

Block storage
Block storage, sometimes referred to as block-level storage, is a
technology used to store data into blocks. The blocks are then
stored as separate pieces, each with a unique identifier.
Developers favor block storage for computing situations that
require fast, efficient and reliable data transfer.
Object storage
Object storage, often referred to as object-based storage, is
a data storage architecture for handling large amounts of
unstructured data. This data doesn't conform to, or can't be
organized easily into, a traditional relational database with
rows and columns. Examples include email, videos, photos,
web pages, audio files, sensor data, and other types of media
and web content (textual or non-textual).
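The difference between file storage and object storage can be sketched briefly: a hierarchical path versus a flat key with metadata. The paths, key, and the dict standing in for an object store below are purely illustrative:

import os

# File storage: data lives at a path inside a directory hierarchy.
os.makedirs("reports/2024", exist_ok=True)
with open("reports/2024/sales.txt", "w") as f:
    f.write("stored hierarchically in folders")

# Object storage (emulated here with a dict): data lives in a flat
# namespace under a unique identifier, together with its metadata.
object_store = {}
object_store["a1b2-sales-2024"] = {
    "data": b"stored as an object with metadata",
    "metadata": {"content_type": "text/plain", "year": 2024},
}

print(open("reports/2024/sales.txt").read())
print(object_store["a1b2-sales-2024"]["metadata"])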

Description of Errors in Data Processing


1. Computational errors
Occur when an arithmetic operation does not produce the expected
results. The most common computational errors include:
 Overflow
 Truncation
 Rounding

 Overflow errors
Occur if the result of a calculation is too large to be stored in the
allocated memory space. For example, if a value is represented using 8
bits, an overflow will occur if the result of a calculation gives a 9-bit
number.
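Python's own integers grow as needed and do not overflow, so the sketch below emulates an 8-bit signed register to reproduce the effect described above:

# Emulate an 8-bit signed integer to demonstrate overflow.
def to_int8(value):
    value &= 0xFF  # keep only the low 8 bits
    return value - 256 if value > 127 else value

print(to_int8(127))      # 127: largest value that fits in 8 bits
print(to_int8(127 + 1))  # -128: the 9-bit result wraps around (overflow)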
 Truncation errors
Result from real numbers that have a long fractional part which
cannot fit in the allocated memory space. The computer truncates,
or cuts off, the extra characters from the fractional part. For example, a
number like 0.784969 can be truncated to three decimal places to become 0.784.

 Rounding errors
Result from raising or lowering a digit in a real number to obtain the required
rounded number. For example, to round off 30.666 to one decimal place,
we raise the first digit after the decimal point if its successor is greater
than or equal to five. In this case the successor is 6, so 30.666
rounded to one decimal place is 30.7. If the successor is below
five, e.g. 30.635, we round the number down to 30.6.
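Both effects can be reproduced with the numbers used above; a short Python sketch:

import math

# Truncation: cut off extra fractional digits without rounding.
x = 0.784969
print(math.trunc(x * 1000) / 1000)  # 0.784

# Rounding: raise or lower the kept digit based on its successor.
print(round(30.666, 1))  # 30.7 (successor 6 >= 5, so round up)
print(round(30.635, 1))  # 30.6 (rounds down; note that floating-point
                         # representation can make borderline cases inexact)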

The accuracy of the computer output is critical. As the saying goes,
garbage in, garbage out (GIGO): the accuracy of the data entered into the
computer directly determines the accuracy of the information given out.

 Some of the errors that influence the accuracy of data input and
information output include:
 Transcription
 Computation
 Algorithm or logical errors.
 Transcription errors
Occur during data entry. Such errors include:
 Misreading errors
 Transposition errors
Misreading errors are brought about by the incorrect reading of the source
document by the user, who hence enters wrong values. For example, a user
may misread a handwritten figure such as 589 and type S89 instead, i.e.
confusing 5 for S.
Transposition errors result from the incorrect arrangement of characters,
i.e. putting characters in the wrong order. For example, the user might
enter 396 instead of 369. These errors may be avoided by using modern
data capture devices, such as bar code readers and digital cameras, which
enter data with minimum user intervention.
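A common manual safeguard against transcription errors is verification: keying the same source data twice and comparing the two entries. A minimal sketch, with hypothetical sample values:

# Double-entry verification: the same value is keyed in twice and any
# mismatch is flagged for review before the data is accepted.
def verify(first_entry, second_entry):
    return first_entry == second_entry

print(verify("369", "369"))  # True: entries agree, data accepted
print(verify("369", "396"))  # False: a transposition error is caught
print(verify("589", "S89"))  # False: a misreading error is caught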

 Algorithm or logical errors


An algorithm is a set of procedural steps followed to solve a given
problem. Algorithms are used as design tools when writing programs.
A wrongly designed algorithm results in a program that runs but
gives erroneous output. Such errors, which result from wrong algorithm
design, are referred to as algorithm or logical errors.
The Future of Data Processing
The future of data processing can best be summed up in one short phrase: cloud
computing.

While the six steps of data processing remain immutable, cloud technology has
provided spectacular advances in data processing technology, giving data
analysts and scientists the fastest, most advanced, most cost-effective, and most
efficient data processing methods today.

The cloud lets companies blend their platforms into one centralized system that’s
easy to work with and adapt.

Cloud technology allows seamless integration of new upgrades and updates to
legacy systems while offering organizations immense scalability. Cloud platforms
are also affordable and serve as a great equalizer between large organizations and
smaller companies.

So, the same IT innovations that created big data and its associated challenges have
also provided the solution. The cloud can handle the huge workloads that are
characteristic of big data operations.

Data storage for business


Computer memory and local storage might not provide enough storage, storage
protection, multiple users' access, speed and performance for enterprise
applications. So, most organizations employ some form of a SAN in addition to a
NAS storage system.

SAN

Sometimes referred to as the network behind the servers, a SAN is a
specialized, high-speed network that attaches servers and storage devices. It
consists of a communication infrastructure, which provides physical
connections, allowing any-to-any device connectivity across the network
using interconnected elements, such as switches and directors. The SAN can
also be viewed as an extension of the storage bus concept. This concept
enables storage devices and servers to interconnect by using similar
elements, such as local area networks (LANs) and wide-area networks
(WANs). A SAN also includes a management layer that organizes the
connections, storage elements and computer systems. This layer ensures
secure and robust data transfers.
Traditionally, only a limited number of storage devices could attach to a
server. By contrast, a SAN introduces networking flexibility, enabling one
server, or many heterogeneous servers across multiple data centers, to share
a common storage utility. The SAN also eliminates the traditional dedicated
connection between a server and storage and the concept that the server
effectively owns and manages the storage devices. So, a network might
include many storage devices, including disk, magnetic tape and optical
storage. And the storage utility might be located far from the servers that it
uses.

SAN components

The storage infrastructure is the foundation on which information relies.
Therefore, the storage infrastructure must support the company's business
objectives and business model. A SAN infrastructure provides enhanced
network availability, data accessibility and system manageability. In this
environment, simply deploying more and faster storage devices is not
enough. A good SAN begins with a good design.

The core components of a SAN are Fibre Channel, servers,
storage appliances, and networking hardware and software.
Fibre Channel

The first element to consider in any SAN implementation is the connectivity of
the storage and server components, which typically use Fibre Channel. SANs,
like LANs, interconnect the storage interfaces into many network
configurations and across longer distances.
Server infrastructure

The server infrastructure is the underlying reason for all SAN solutions, and this
infrastructure includes a mix of server platforms. With initiatives such as server
consolidation and Internet commerce, the need for SANs increases, making
network storage ever more important.

Storage system

A storage system can consist of disk systems and tape systems. The disk system
can include HDDs, SSDs or flash drives. The tape system can include tape
drives, tape autoloaders and tape libraries.

Network system

SAN connectivity consists of hardware and software components that
interconnect storage devices and servers. Hardware can include hubs, switches,
directors and routers.
