
AE 21: IT Application Tools in Business

MODULE 4
Title: Information Technology and the Internet

Name of Student:
Course/ year:
Class Schedule:

LECTURE NOTES
READ THIS…

Lesson 1: What is the Internet?


The Internet is a worldwide network of computer networks that connects university, government, commercial, and other
computers in over 150 countries. There are thousands of networks, tens of thousands of computers, and millions of users
on the Internet, with the numbers expanding daily. Using the Internet, you can send electronic mail, chat with colleagues
around the world, and obtain information on a wide variety of subjects.

Internet refers to the global information system that — (i) is logically linked together by a globally unique address space
based on the Internet Protocol (IP) or its subsequent extensions/follow-ons; (ii) is able to support communications using
the Transmission Control Protocol/Internet Protocol (TCP/IP) suite or its subsequent extensions/follow-ons, and/or other
IP-compatible protocols; and (iii) provides, uses or makes accessible, either publicly or privately, high level services
layered on the communications and related infrastructure described herein.

Three principal uses of the Internet are:


  Electronic mail. Electronic mail, or e-mail, lets you electronically "mail" messages to users who have Internet e-mail
addresses. Delivery time varies, but it's possible to send mail across the globe and get a response in minutes.
LISTSERVs are special interest mailing lists which allow for the exchange of information between large numbers
of people.
 USENET newsgroups. USENET is a system of special interest discussion groups, called newsgroups, to which
readers can send, or "post" messages which are then distributed to other computers in the network. (Think of it as
a giant set of electronic bulletin boards.) Newsgroups are organized around specific topics, for
example, alt.education.research, alt.education.distance, and misc.education.science.
 Information files. Government agencies, schools, and universities, commercial firms, interest groups, and private
individuals place a variety of information on-line. The files were originally text only, but increasingly contain
pictures and sound.

Foundation of the Internet

The Internet resulted from the effort to connect various research networks in the United States and Europe. First,
DARPA established a program to investigate the interconnection of “heterogeneous networks.” This program, called
Internetting, was based on the newly introduced concept of open architecture networking, in which networks with defined
standard interfaces would be interconnected by “gateways.” A working demonstration of the concept was planned. In
order for the concept to work, a new protocol had to be designed and developed; indeed, a system architecture was also
required.

In 1974 Vinton Cerf, then at Stanford University in California, and Robert Kahn, then at DARPA, collaborated on a
paper that first described such a protocol and system architecture—namely, the transmission control protocol (TCP),
which enabled different types of machines on networks all over the world to route and assemble data packets. TCP, which
originally included the Internet protocol (IP), a global addressing mechanism that allowed routers to get data packets to
their ultimate destination, formed the TCP/IP standard, which was adopted by the U.S. Department of Defense in 1980.
By the early 1980s the “open architecture” of the TCP/IP approach was adopted and endorsed by many other researchers
and eventually by technologists and businessmen around the world.
By the 1980s other U.S. governmental bodies were heavily involved with networking, including the National
Science Foundation (NSF), the Department of Energy, and the National Aeronautics and Space Administration (NASA).
While DARPA had played a seminal role in creating a small-scale version of the Internet among its researchers, NSF
worked with DARPA to expand access to the entire scientific and academic community and to make TCP/IP the standard
in all federally supported research networks. In 1985–86 NSF funded the first five supercomputing centres—at Princeton
University, the University of Pittsburgh, the University of California, San Diego, the University of Illinois, and Cornell
University. In the 1980s NSF also funded the development and operation of the NSFNET, a national “backbone” network
to connect these centres. By the late 1980s the network was operating at millions of bits per second. NSF also funded
various nonprofit local and regional networks to connect other users to the NSFNET. A few commercial networks also
began in the late 1980s; these were soon joined by others, and the Commercial Internet Exchange (CIX) was formed to
allow transit traffic between commercial networks that otherwise would not have been allowed on the NSFNET backbone.
In 1995, after extensive review of the situation, NSF decided that support of the NSFNET infrastructure was no
longer required, since many commercial providers were now willing and able to meet the needs of the research
community, and its support was withdrawn. Meanwhile, NSF had fostered a competitive collection of commercial Internet
backbones connected to one another through so-called network access points (NAPs).
From the Internet’s origin in the early 1970s, control of it steadily devolved from government stewardship to
private-sector participation and finally to private custody with government oversight and forbearance. Today a loosely
structured group of several thousand interested individuals known as the Internet Engineering Task Force participates in
a grassroots development process for Internet standards. Internet standards are maintained by the nonprofit Internet
Society, an international body with headquarters in Reston, Virginia. The Internet Corporation for Assigned Names and
Numbers (ICANN), another nonprofit, private organization, oversees various aspects of policy regarding Internet domain
names and numbers.

Commercial expansion
The rise of commercial Internet services and applications helped to fuel a rapid commercialization of the Internet. This
phenomenon was the result of several other factors as well. One important factor was the introduction of the personal
computer and the workstation in the early 1980s—a development that in turn was fueled by unprecedented progress
in integrated circuit technology and an attendant rapid decline in computer prices. Another factor, which took on
increasing importance, was the emergence of Ethernet and other “local area networks” to link personal computers. But
other forces were at work too. Following the restructuring of AT&T in 1984, NSF took advantage of various new options
for national-level digital backbone services for the NSFNET. In 1988 the Corporation for National
Research Initiatives received approval to conduct an experiment linking a commercial e-mail service (MCI Mail) to the
Internet. This application was the first Internet connection to a commercial provider that was not also part of the research
community. Approval quickly followed to allow other e-mail providers access, and the Internet began its first explosion in
traffic.
In 1993 federal legislation allowed NSF to open the NSFNET backbone to commercial users. Prior to that time,
use of the backbone was subject to an “acceptable use” policy, established and administered by NSF, under which
commercial use was limited to those applications that served the research community. NSF recognized that commercially
supplied network services, now that they were available, would ultimately be far less expensive than continued funding of
special-purpose network services.
Also in 1993 the University of Illinois made widely available Mosaic, a new type of computer program, known as
a browser, that ran on most types of computers and, through its “point-and-click” interface, simplified access, retrieval,
and display of files through the Internet. Mosaic incorporated a set of access protocols and display standards originally
developed at the European Organization for Nuclear Research (CERN) by Tim Berners-Lee for a new Internet application
called the World Wide Web (WWW). In 1994 Netscape Communications Corporation (originally called Mosaic
Communications Corporation) was formed to further develop the Mosaic browser and server software for commercial use.
Shortly thereafter, the software giant Microsoft Corporation became interested in supporting Internet applications on
personal computers (PCs) and developed its Internet Explorer Web browser (based initially on Mosaic) and other
programs. These new commercial capabilities accelerated the growth of the Internet, which as early as 1988 had already
been growing at the rate of 100 percent per year.

By the late 1990s there were approximately 10,000 Internet service providers (ISPs) around the world, more than
half located in the United States. However, most of these ISPs provided only local service and relied on access to regional
and national ISPs for wider connectivity. Consolidation began at the end of the decade, with many small to medium-size
providers merging or being acquired by larger ISPs. Among these larger providers were groups such as America Online,
Inc. (AOL), which started as a dial-up information service with no Internet connectivity but made a transition in the late
1990s to become the leading provider of Internet services in the world—with more than 25 million subscribers by 2000
and with branches in Australia, Europe, South America, and Asia. Widely used Internet “portals” such as AOL, Yahoo!,
Excite, and others were able to command advertising fees owing to the number of “eyeballs” that visited their sites.
Indeed, during the late 1990s advertising revenue became the main quest of many Internet sites, some of which began to
speculate by offering free or low-cost services of various kinds that were visually augmented with advertisements. By
2001 this speculative bubble had burst.

The 21st century and future directions


After the collapse of the Internet bubble came the emergence of what was called “Web 2.0,” an Internet with emphasis on
social networking and content generated by users, and cloud computing. Social media services such as Facebook, Twitter,
and Instagram became some of the most popular Internet sites through allowing users to share their own content with their
friends and the wider world. Mobile phones became able to access the Web, and, with the introduction of smartphones
like Apple’s iPhone (introduced in 2007), the number of Internet users worldwide exploded from about one sixth of the
world population in 2005 to more than half in 2020.

The increased availability of wireless access enabled applications that were previously uneconomical. For
example, global positioning systems (GPS) combined with wireless Internet access help mobile users to locate alternate
routes, generate precise accident reports and initiate recovery services, and improve traffic management and congestion
control. In addition to smartphones, wireless laptop computers, and personal digital assistants (PDAs), wearable devices
with voice input and special display glasses were developed.

While the precise structure of the future Internet is not yet clear, many directions of growth seem apparent. One is
toward higher backbone and network access speeds. Backbone data rates of 100 billion bits (100 gigabits) per second are
readily available today, but data rates of 1 trillion bits (1 terabit) per second or higher will eventually become
commercially feasible. If the development of computer hardware, software, applications, and local access keeps pace, it
may be possible for users to access networks at speeds of 100 gigabits per second. At such data rates, high-resolution
video—indeed, multiple video streams—would occupy only a small fraction of available bandwidth. Remaining
bandwidth could be used to transmit auxiliary information about the data being sent, which in turn would enable rapid
customization of displays and prompt resolution of certain local queries. Much research, both public and private, has gone
into integrated broadband systems that can simultaneously carry multiple signals—data, voice, and video. In particular,
the U.S. government has funded research to create new high-speed network capabilities dedicated to the scientific-
research community.
It is clear that communications connectivity will be an important function of a future Internet as more machines and
devices are interconnected. In 1998, after four years of study, the Internet Engineering Task Force published a new 128-bit
IP address standard intended to replace the conventional 32-bit standard. By allowing a vast increase in the number of
available addresses (2^128, as opposed to 2^32), this standard makes it possible to assign unique addresses to almost every
electronic device imaginable. Thus, through the “Internet of things,” in which all machines and devices could be
connected to the Internet, the expressions “wired” office, home, and car may all take on new meanings, even if the access
is really wireless.
The dissemination of digitized text, pictures, and audio and video recordings over the Internet, primarily available today
through the World Wide Web, has resulted in an information explosion. Clearly, powerful tools are needed to manage
network-based information. Information available on the Internet today may not be available tomorrow without careful
attention being paid to preservation and archiving techniques. The key to making information persistently available is
infrastructure and the management of that infrastructure. Repositories of information, stored as digital objects, will soon
populate the Internet. At first these repositories may be dominated by digital objects specifically created and formatted for
the World Wide Web, but in time they will contain objects of all kinds in formats that will be dynamically resolvable by
users’ computers in real time. Movement of digital objects from one repository to another will still leave them available to
users who are authorized to access them, while replicated instances of objects in multiple repositories will
provide alternatives to users who are better able to interact with certain parts of the Internet than with others. Information
will have its own identity and, indeed, become a “first-class citizen” on the Internet.

Internet Application

An Internet application does something for end users. It is generally not concerned with how data is actually transmitted
between the hosts. Here are some distributed applications that require well-defined application level protocols:
 Sending and receiving email
 Searching and browsing information archives
 Copying files between computers
 Conducting financial transactions
 Navigating (in your car, smart scooter, smart bike, or other)
 Playing interactive games
 Video and music streaming
 Chat or voice communication (direct messaging, video conferencing)
In addition, there are a number of network services such as:
 Name servers
 Configuration servers
 Mail gateways, transfer agents, relays
 File and print servers

Review of Layers

Remember how the Internet was said to be so well designed you never think about it? Here’s one bit of evidence: all
Internet applications work over the exact same transport layers. The Internet says nothing about how these
applications should work. It provides IP, TCP, and UDP, and that’s it. You can build anything on top of those.
Applications pretty much just need to know: (1) the IP address of the other party (what host the other party is running on,
a network layer concept), and (2) the port number of the application running at the other end (because the other
machine might be running multiple services, a transport layer concept). The application passes those two pieces of
information to the transport layer to make the communication happen.

APPLICATION LAYER (HTTP, FTP, SMTP, ...)

TRANSPORT LAYER (TCP, UDP, ...)

NETWORK LAYER (IP)

LINK LAYER (Ethernet, Wi-Fi, ...)

IP Addresses for Hosts

A host address will be 32 bits for IPv4 and 128 bits for IPv6.
IPv4 host addresses are usually written in dotted-quad notation, with each of its four octets written in decimal (0...255,
inclusive), e.g., 27.253.1.199.
IPv6 host addresses are usually written in hex with its eight hextets separated by colons, e.g., fe80:0:3:0:a299:cff:18:57d1.
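The two notations can be explored with Python's standard ipaddress module. The following is a minimal sketch, not part of the original notes; the helper name describe is only illustrative. It parses the two example addresses, reports their widths in bits, and shows that a dotted-quad address is just four octets packed into one 32-bit number.

```python
import ipaddress

# The two example addresses used in the text.
v4 = ipaddress.ip_address("27.253.1.199")
v6 = ipaddress.ip_address("fe80:0:3:0:a299:cff:18:57d1")

def describe(addr):
    """Return (IP version, address width in bits, address as an integer)."""
    return addr.version, addr.max_prefixlen, int(addr)

# Dotted-quad notation: four decimal octets packed into a single 32-bit value.
octets = [int(part) for part in str(v4).split(".")]
as_int = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]

print(describe(v4)[:2])   # version and width of the IPv4 address
print(describe(v6)[:2])   # version and width of the IPv6 address
print(as_int == int(v4))  # the hand-packed value matches the library's
print(v6.exploded)        # every hextet padded to four hex digits
```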
Port Numbers
Once packets get to the right machine, they have to get to the right program running on that machine. The abstraction here
is the port number. Port numbers are in the range 0..65535.
On the Internet, port numbers are partitioned as follows:
 0...1023 are the System Ports, assigned by the IETF Review or IESG Approval procedures described in RFC
8126.
 1024...49151 are the User Ports, assigned by IANA using the IETF Review process, the "IESG Approval"
process, or the "Expert Review" process, as per RFC 6335.
 49152...65535 are the Dynamic Ports, which are unrestricted, do with them whatever you want.
The authoritative list of port number assignments is published by IANA and is called the Service Name and Transport
Protocol Port Number Registry. It is worth browsing! Here are some highlights in the meantime:
7 echo
9 discard
13 daytime
17 qotd
19 chargen
20 ftp-data
21 ftp
22 ssh
23 telnet
25 smtp
37 time
43 nicname (WHOIS)
49 login
53 domain (DNS)
69 tftp
70 gopher
79 finger
80 http
88 kerberos
110 pop3
115 sftp
119 nntp
123 ntp
143 imap
179 bgp
194 irc
389 ldap
443 https
458 quicktime
540 uucp
546 dhcpv6-client
547 dhcpv6-server
563 nntps
565 whoami
636 ldaps
691 msexch-routing
765 webster
992 telnets
995 pop3s
1194 openvpn
1433 ms-sql-s
1649 kermit
1863 msnp
2049 nfs
3074 xbox
3306 mysql
3689 daap (iTunes)
3724 blizwow
5190 aol (AIM)
5432 postgresql
6000 x11
6346 gnutella-svc
6347 gnutella-rtr
7474 neo4j
26000 quake
33434 traceroute
Here are some other excellent partial lists:
 The Common Ports Appendix from the RHEL Security Guide
 A cool cheat sheet from Jeremy Stretch

Exercise: While World of Warcraft (3724) and Quake (26000) have officially assigned ports, many games such as Halo,
Call of Duty, and Half-Life use ports that are unassigned or have been assigned to applications that are uncommon. Make
a list of the ports used by popular games.

Exercise: The selected port assignments above are not actually accurate, because you need not only a port number but
also a protocol. Find IANA port assignments for which the udp and tcp assignments differ.

Sockets
So hosts have IP addresses and applications run on specific ports. But how do we program with that information? Generally,
there is an O.S.-level data structure called a socket consisting of:
  The remote IP address
  The remote port number
  The local IP address
  The local port number
  A message queue
It’s likely that the O.S. provides a blocking read_socket call, which blocks until a message is available. It looks something
like:
read_socket(socket_descriptor, &buffer, number_of_bytes_to_read)
When a datagram arrives from the network, the link layer passes it to the network layer which passes it to the transport
layer, where the port number is extracted. The O.S. uses the port number to send the data to the right application. If a
thread is waiting there, the correct amount of data is copied into the buffer and the thread gets unblocked.
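The blocking read described above can be tried on a single machine with Python's standard socket module. This is a minimal sketch under stated assumptions: both sockets live in one process on the loopback address, the O.S. picks the port, and a timeout is added only so the sketch cannot hang. Python's recvfrom plays the role of the read_socket call in the text.

```python
import socket

# Receiver: bind to the loopback address; port 0 asks the O.S. for any free port.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))
receiver.settimeout(5.0)  # safety net so the sketch cannot block forever
host, port = receiver.getsockname()

# Sender: a second socket that sends one datagram to the receiver's port.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"hello", (host, port))

# Blocks until a datagram is queued on the socket, then copies up to 1024 bytes.
data, (peer_ip, peer_port) = receiver.recvfrom(1024)
print(data, peer_ip, peer_port)

sender.close()
receiver.close()
```

The port number printed for the sender was chosen by the O.S., which is why it changes from run to run.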

Writing Your Own Internet Applications

There are two main paradigms: Client-Server and Peer-to-Peer (P2P).

Client-Server
The most common is client-server.
  Server: the application (not the machine) that passively waits for contact. Servers generally reside on big machines and
can serve many clients simultaneously. Servers often have fixed, permanent IP addresses. They don’t know in
advance who the clients are.
  Client: the application (not the machine) that actively initiates contact. Clients generally run on end users’ devices.

Once a connection is established, the client and server can talk back and forth to each other any way they want. (Clients
don’t talk to each other.)
In order to "fairly" handle multiple simultaneous clients, a server should
 Be multithreaded,
 Fork copies of itself, OR
 Be event-driven
Multithreaded servers have code that looks like this:
    while (true) {
        Connection c = waitForConnection();
        spawnANewThreadToHandle(c); // or submit the task to a thread pool
    }
Event-driven servers would look like:
    server.on('connect', (connection) => {
        connection.on('somemessage1', (data) => handlerForMessage1(data));
        ...
        connection.on('somemessageN', (data) => handlerForMessageN(data));
    });
But now we have many clients talking simultaneously to the service, which is "running on" a fixed port. How are these
distinguished?
When a client initiates a connection, networking software on the client's host assigns the client a port of its own. If there
are multiple clients on the same host, they will each have different port numbers. The server, then, uses the
combination of client IP address + client port number to know exactly which connection is being referred to.
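The way a server tells its clients apart can be observed directly. The following sketch assumes everything runs in one process on the loopback address, with the O.S. choosing all port numbers. It connects two clients to the same server port and prints the (IP address, port) pair the server sees for each.

```python
import socket

# Server socket on loopback; port 0 lets the O.S. choose a free port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(5)
server.settimeout(5.0)  # safety net so accept() cannot block forever
server_addr = server.getsockname()

# Two clients on the same host connect to the same server port.
clients = [socket.create_connection(server_addr, timeout=5.0) for _ in range(2)]
client_addrs = [c.getsockname() for c in clients]

# Each accepted connection reports the client's (IP address, port) pair.
accepted = [server.accept() for _ in clients]
peers = [peer for _conn, peer in accepted]

print("server:", server_addr)
print("clients as seen by the server:", peers)

for sock in clients + [conn for conn, _peer in accepted] + [server]:
    sock.close()
```

Both clients share the IP address 127.0.0.1, so only the client port number separates the two connections.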
When writing programs, your programming language will have some kind of library to create and manage these
connections, and to send and receive data through them. The abstraction for a connection is called a socket. A socket API
will often look something like this:
socket()
    Create a socket (whether it is TCP, UDP, or anything else is a parameter)
bind()
    Bind an IP address and port to the socket
setsockopt()
    Set socket options (configure the socket)
getsockopt()
    Get socket options (read the configuration)
listen()
    (Server) Mark the socket as ready to accept incoming connections
accept()
    (Server) Wait for and accept a connection from a client
getpeername()
    (Server) Get the IP address and port of the connected client
connect()
    (Client) Initiate a connection with a server
send()
    Send data
recv()
    Receive data
sendmsg()
    Send a message
recvmsg()
    Receive a message
close()
    Terminate the connection
Most programming languages layer higher-level constructs over this low-level socket API. You’ll commonly see
things like classes for “server sockets” that automatically bind and listen, or various events that replace blocking calls.
Sometimes you can create streams that abstract away repeated calls to send and receive.
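To show how these calls fit together, here is a minimal sketch of a multithreaded echo server and one client, written with Python's standard socket and threading modules. It is an illustration under stated assumptions, not a production design: server and client run in the same process on the loopback address, the server accepts a fixed number of clients, and the helper names handle and serve are made up for this example.

```python
import socket
import threading

def handle(conn: socket.socket) -> None:
    """Echo everything a single client sends, then close the connection."""
    with conn:
        while True:
            data = conn.recv(1024)   # recv(): blocks until data arrives
            if not data:             # empty bytes means the client closed
                break
            conn.sendall(data)       # send the same bytes back

def serve(server: socket.socket, max_clients: int) -> None:
    """Accept max_clients connections, handling each in its own thread."""
    for _ in range(max_clients):
        conn, _peer = server.accept()  # accept(): wait for the next client
        threading.Thread(target=handle, args=(conn,), daemon=True).start()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)    # socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)  # setsockopt()
server.bind(("127.0.0.1", 0))                                 # bind(); port 0 = any free port
server.listen()                                               # listen()
address = server.getsockname()
threading.Thread(target=serve, args=(server, 1), daemon=True).start()

# Client side: connect, send, receive, close.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(5.0)            # safety net so the sketch cannot hang
client.connect(address)           # connect()
client.sendall(b"ping")
reply = client.recv(1024)
client.close()                    # close()
server.close()
print(reply)
```

One thread per connection is the simplest of the three server styles listed earlier; a thread pool or an event loop would replace the Thread call in serve.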
See the following pages for socket-based programming examples in:
 Java
 Python
 JavaScript

Lesson 2: Spreadsheets

A Quick and Easy Guide to Spreadsheets


A spreadsheet is a software program you use to easily perform mathematical calculations on statistical data, such as totaling
long columns of numbers or determining percentages and averages.
And if any of the raw numbers you put into your spreadsheet should change – like if you obtain final figures to substitute
for preliminary ones for example – the spreadsheet will update all the calculations you’ve performed based on the new
numbers.
You also can use a spreadsheet to generate data visualizations like charts to display the statistical information you’ve
compiled on a website.
This tutorial will focus on the use of the free application Google Spreadsheets. To use Google Spreadsheets, you will need
to sign up for a free Google account. There are other spreadsheet programs you can purchase, like Microsoft Excel. While
this tutorial will focus primarily on Google Spreadsheets, most of its lessons will be applicable to any spreadsheet
software, including Excel.
Spreadsheet Layout
To create a new spreadsheet in Google Spreadsheet, sign into your Google Drive account. Then click on the New button
on the top left and select Google Sheets.
On your screen will appear a basic spreadsheet, divided into numbered rows and lettered columns.

The rows and columns intersect to create small boxes, which are called cells.
Each cell is identified by its column letter and row number.
Thus the very first cell in the upper left-hand corner is called A1.
Just below A1 is A2. Just to the right of A1 is B1. Just below B1 is B2, and so on.
In the image below, for example, cell D9 is highlighted.
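The naming scheme can be made concrete with a short function that turns a cell name into its column and row numbers. This is an illustrative sketch; the function name parse_cell is an assumption, and the handling of two-letter columns such as AA (the column after Z) goes slightly beyond what the text covers.

```python
import re

def parse_cell(ref: str) -> tuple[int, int]:
    """Convert a cell name such as 'D9' into 1-based (column, row) numbers."""
    match = re.fullmatch(r"([A-Za-z]+)([1-9][0-9]*)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Column letters work like base-26 digits: A=1 ... Z=26, AA=27, ...
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return column, int(digits)

print(parse_cell("A1"))    # first column, first row
print(parse_cell("D9"))    # fourth column, ninth row
print(parse_cell("AA10"))  # the column after Z, tenth row
```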

Setting the View Options


You can select some settings to change the view of the spreadsheet or display toolbars you frequently use, such as the one
for entering formulas to make calculations.
To do this, in the menu at the top click on View and make sure there’s a check mark next to Show Formula Bar (to
display a box to enter formulas).

Entering Information in a Cell


You enter information into a spreadsheet program by typing it into each of the cells.
You can enter three different types of information into a cell:
 Numbers – so you then can perform mathematical calculations on them.
 Text – to identify what the numbers in the columns and rows represent, usually by typing headings across the top
of the columns or on the left edge of the rows
 Formulas – to perform calculations on the numbers in a column or a row of cells.

To enter information into a cell, simply click on the cell and type in the information.
When you’re done, you can either press the enter/return key, which will take you down to the next cell, or the tab key,
which will advance you to the cell to the right.
Each time you type information into a cell, you’ll notice the information also appears in the Formula bar, the box just
above the columns and rows.
For example, if you click on cell:
B3
And type in the number:
100
You’ll see the number 100 displayed in the formula bar above.

Text Headings
To enter text headings for the various columns and rows to identify them, follow the same procedure as you would with
entering numbers. Click on the cell, type in the name of a heading and press the enter/return key.
You can also “freeze” this header row, so it stays in the same place, even if you scroll down a long spreadsheet. To do
this, grab the small bar in the corner of the spreadsheet area, and drag it down one row.

Importing Data Into a Spreadsheet


Many government agencies and private organizations provide data on their websites in a spreadsheet or other format that
you can download onto your computer.
To import a spreadsheet, .csv or other file you’ve downloaded on your computer into Google Spreadsheets, first create a
new spreadsheet in Google Docs. Then in the menu at the top click on File … Import and then Browse and select the
downloaded file.
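Conceptually, importing a .csv file means splitting each line into values and placing each value in a cell. The sketch below shows that idea with Python's standard csv module. It does not reproduce what Google's importer does internally, and the sample contents are made-up placeholder values, not real data.

```python
import csv
import io

# Made-up placeholder contents standing in for a downloaded .csv file.
sample = io.StringIO("Weapon,Year 1,Year 2\nExample A,10,12\nExample B,3,4\n")

rows = list(csv.reader(sample))

# Address each value the way a spreadsheet would: column letter + row number.
# (This simple lettering only covers the first 26 columns, A to Z.)
cells = {}
for r, row in enumerate(rows, start=1):
    for c, value in enumerate(row):
        cells[f"{chr(ord('A') + c)}{r}"] = value

print(cells["A1"], cells["B2"], cells["C3"])
```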
Importing Sample Data
Let’s download some data to demonstrate how to import it into a Google Docs spreadsheet, and also to give us some
sample data to use to show how to do calculations and use other features of a spreadsheet.
The FBI compiles national crime statistics, including data on the types of weapons used in homicides.
This data is in an Excel spreadsheet (.xls) file that can be downloaded from the FBI website and then imported into a
Google Doc spreadsheet.

Click on the link at the top for:


Download Excel
The file will be downloaded onto your computer.
(if for some reason you have trouble downloading this file, you can click here to download the file from our website)
To import the file into a Google Docs spreadsheet, create a new spreadsheet and in the menu at the top click on:
File…Import
Click on the Browse button and navigate to the downloaded FBI file which is
named expanded_homicide_data_table_8_murder_victims_by_weapon_2010-2014.xls. Google Spreadsheet also
allows you to import data from your Google Drive. It may give you an option to replace existing data, or to create a new
sheet. Choose the best option for your situation.
After a few seconds you should see a Google Docs spreadsheet that looks like this:
This spreadsheet shows the number of murder victims in each year from 2010 to 2014 in five columns, with the columns
labeled by year in cells B4 to F4.
Below that the spreadsheet shows the weapon used in the murders in 18 rows of data, with the rows labeled by type of
weapon in cells A5 (which is the overall total for all weapons) to A22.

Resizing Columns or Rows


You can improve the display of the data in a spreadsheet by increasing or decreasing the width of a column or the height
of a row.
To change a column’s width, in the gray bar at the top of the spreadsheet where the letters of the columns are displayed,
move your mouse cursor to the border between any two columns, then click and drag the border left or right.

Note for Excel: if you narrow the width of a column displaying a number too much, you will see a series of pound signs
displayed in the cell:
###
This doesn’t mean you’ve lost any data – you just made the column width too narrow to fit some of the numbers in the
cells in that column.
You can also speed up the resizing of columns and avoid making them too narrow by moving your mouse cursor to the
border separating two columns in the gray bar at the top and double-clicking on the border. This will automatically resize
the column to the left, making it just wide enough to fit the longest entry on any row in that column.
Deleting or Adding Columns or Rows
You can get rid of unwanted data or other information by deleting rows or columns.
For example, in our sample spreadsheet of weapons used in homicides, we might want to get rid of row 23, which is just a
footnote stating that one murder in which the victim was pushed to his/her death has been included in the “Personal
weapons” listing in row 14.

To delete a row, hover your mouse cursor over a row number in the gray area to the left, in this case row 23. Right click
and in the pop-up menu select Delete row.
Use the same procedure for deleting a column.
Hover your mouse cursor over a column letter in the gray area at the top, right click and in the pop-up menu select Delete
column (you also can click on the tiny downward-pointing arrow to get this pop-up menu).
If you want to add a column or row, again hover your mouse cursor over the appropriate column or row in the gray area
above or to the left, right click and in the pop-up menu select one of the Insert options.

Formulas – Adding, Subtracting, Multiplying and Dividing
With a spreadsheet you can insert a formula that will instantly add, subtract, multiply or divide numbers in columns or
rows.
To do this you select a cell in a new column or row and then type in a formula.
A formula starts with an equals sign (=) that tells the spreadsheet you want to do a calculation.

A formula then has a symbol for what kind of calculation you want to perform (add, subtract, multiply, divide, etc.). The
symbols a spreadsheet uses for calculations are:
 plus sign (+) for adding one number to another
 minus sign (-) for subtracting one number from another
 asterisk (*) for multiplying one number by another
  forward slash (/) for dividing one number by another
Then you type in the letters/numbers for the cells (A1, A2, B1, B2, etc.) to which you want to apply the calculation,
separated by the symbol for the type of calculation.
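How a spreadsheet might work out such a formula can be sketched in a few lines of Python. This is a simplified illustration, not how Google Spreadsheets or Excel are implemented: the function name evaluate is hypothetical, the cell values are made up, and only the four symbols listed above (plus parentheses and plain numbers) are supported.

```python
import ast
import operator
import re

# The four calculation symbols and the operations they stand for.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(formula: str, cells: dict[str, float]) -> float:
    """Evaluate a formula such as '=A1+A2' against a dict of cell values."""
    if not formula.startswith("="):
        raise ValueError("a formula must start with '='")
    # Swap each cell name for its current value (names are not case sensitive).
    expr = re.sub(r"[A-Za-z]+[0-9]+",
                  lambda m: repr(float(cells[m.group().upper()])),
                  formula[1:])

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported formula")

    return walk(ast.parse(expr, mode="eval").body)

cells = {"A1": 6, "A2": 3}           # made-up values for illustration
print(evaluate("=A1+A2", cells))     # addition
print(evaluate("=a1/A2", cells))     # division, lower-case name accepted
print(evaluate("=A1*A2-A2", cells))  # multiplication before subtraction
```

Because the formula refers to cells by name, changing a value in cells and calling evaluate again gives the updated result without editing the formula.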

Adding Numbers in Columns


Let’s write a formula for adding together a series of numbers.
In the spreadsheet for types of weapons used in murders that we downloaded from the FBI website, the spreadsheet
already included the total number of homicides in which any kind of firearm was used each year from 2010 to 2014.
Those numbers are in row 6.
But what if these totals hadn’t been included in the original data and you needed to calculate them yourself using the
spreadsheet (or if you wanted to use the spreadsheet to double-check the FBI’s calculations).
This would require totaling up for each year the column of numbers for the five weapon types in the spreadsheet:
 Handguns – row 7
 Rifles – row 8
 Shotguns – row 9
 Other guns – row 10
 Firearms, type not stated – row 11

To do this we need to insert a formula for adding a series of numbers in a column.


Let’s start by doing this for the year 2010. Click on cell:
B23
This cell is in the column that shows the numbers for weapons used in 2010.
In that cell, type:
=B7+B8+B9+B10+B11
(Note: the letters are not case sensitive, so you could type in either B7 or b7.)
This tells the spreadsheet to add up the number of murders committed with handguns (B7), rifles (B8), shotguns (B9),
other guns (B10), and firearms, type not stated (B11) for the year 2010.

You should type cell letters/numbers into a formula rather than the actual numbers.
That way if the numbers ever change (for example, if the FBI released updated murder weapon statistics for 2014), you
won’t have to re-enter the new numbers in the formula. Instead you’d just type the updated numbers into the appropriate
cells and the spreadsheet will apply the existing formula to the new numbers in those cells.
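The benefit of referring to cells instead of typing the numbers can be sketched outside the spreadsheet in Python. This is only an illustration: the values in `cells` are made-up placeholders (not the FBI figures), and `add_cells` is a hypothetical helper.

```python
# Hypothetical cell values (placeholders, not the FBI figures).
cells = {"B7": 10, "B8": 2, "B9": 3, "B10": 1, "B11": 4}

def add_cells(cells, refs):
    """Add the values stored at the given cell references, like =B7+B8+B9+B10+B11."""
    # Upper-case each reference so "b7" and "B7" mean the same cell.
    return sum(cells[ref.upper()] for ref in refs)

refs = ["B7", "b8", "B9", "B10", "B11"]
total = add_cells(cells, refs)
print(total)  # 20

# Updating a cell changes the result without rewriting the "formula".
cells["B7"] = 11
print(add_cells(cells, refs))  # 21
```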
Applying a Formula to Multiple Cells
If we now wanted to calculate the total number of gun related homicides for the other four years, we could repeat the
process of typing an addition formula into each cell in the rest of row 23. But a spreadsheet has a much faster way of
accomplishing this – by letting you simply copy the formula to one or more of the other cells in the same row.
To do this, click on cell:
B23
Where we typed in our addition formula
=B7+B8+B9+B10+B11
Pass your mouse cursor over the bottom right corner of cell B23 and notice your cursor changes from an arrow pointer
to a thin crosshairs.

Click on that crosshairs, hold down your mouse button and drag your mouse to the right over the rest of the cells in row
23.

An outline will appear around the cells you’ve selected.


Continue dragging your mouse until you get to cell:
F23
Release your mouse button and the total number of homicides involving firearms for each year from 2010 to 2014 will
appear in row 23.

These results again match the totals in row 6 of the original FBI spreadsheet.
The spreadsheet has calculated these totals for you by applying the formula you first typed in cell B23 to the rest of the
cells in row 23.
The spreadsheet keeps the formula (addition) the same, but shifts the cell numbers as it applies the formula to the other
cells to the right (so the formula in cell C23 is =C7+C8+C9+C10+C11, the formula in
cell D23 is =D7+D8+D9+D10+D11, and so on).
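The way the column letters shift as a formula is copied to the right can be imitated with a few lines of Python. This is a rough sketch, not how any spreadsheet is actually implemented: `shift_columns` is a hypothetical helper, and it only handles single-letter columns (A to Z) without `$` anchors.

```python
import re

def shift_columns(formula, offset):
    """Shift every column letter in a formula by `offset` columns (single letters only)."""
    def shift(match):
        col, row = match.group(1), match.group(2)
        return chr(ord(col) + offset) + row
    return re.sub(r"([A-Z])(\d+)", shift, formula)

base = "=B7+B8+B9+B10+B11"
for offset in range(1, 5):
    # Prints the formulas for cells C23, D23, E23 and F23 in turn.
    print(shift_columns(base, offset))
```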
Editing a Formula
When you type a formula into a cell and then hit the enter/return key, the formula will disappear, replaced by a number
that’s the result of the calculation.
So how can you edit the formula?
There are two ways:
You can double click on the cell to display the formula in the cell and then edit or retype it there.
Or you can click once on a cell and use the Formula bar above to edit it.
If you click once on a cell that has a formula hidden in it (replaced by a number that’s the result of the calculation), the
formula you originally typed will appear in the Formula bar above the columns and rows.

To edit the formula you can click in the formula bar where the formula for this cell is displayed. Then change the existing
formula or type a new one into the Formula bar, press the enter/return key and the new formula will be applied and the
numbers will be recalculated in the cell.
Understanding Cell Formats
Cells can display their data in many different ways. For example, you can format a cell to display data as currency, as a
date, scientific notation, or several other formats. You can adjust this by highlighting a cell, and changing its format under
the Format -> Number menu.

This can sometimes be counter-intuitive because what the cell displays can differ from the data that’s actually in the
cell. For example, in the case of currency format, the cell data could have several decimal places. But when formatted
as currency, the cell will display a dollar symbol and show only two decimal places (down to the hundredths place), even
if the actual data in the cell is more exact and has more decimal places.

The way to see the actual data in a cell is to look at the formula bar, which usually shows the raw data or the formula
behind it. The cell format is generally used to make things more human-readable. But sometimes this can be the cause of
consternation, especially when using formulas. It can be especially tricky when working with dates.
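The difference between what is stored and what is displayed can be illustrated in Python, where formatting a number produces a display string but leaves the stored value untouched. The value below is a made-up example.

```python
raw_value = 1234.56789  # hypothetical raw cell content

as_currency = f"${raw_value:,.2f}"  # what a currency-formatted cell might show
as_scientific = f"{raw_value:.2E}"  # the same value in scientific notation

print(as_currency)    # $1,234.57
print(as_scientific)  # 1.23E+03
print(raw_value)      # 1234.56789 (the stored value is unchanged)
```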

Percent Changes and Multiplying and Dividing


This next section will describe how to calculate a percent change between two numbers. A percent change is calculated
by finding the difference between the two numbers, and dividing that difference by the first number.
In our spreadsheet on murder weapons, we can calculate how much the use of each weapon increased or decreased between
2010 and 2014.
First click on cell G5 to the right of our existing data.
Type in the following formula:
=(F5-B5)/B5

Now let’s do the percent calculation, starting with the percent change in the total number of homicides (row 5).
First click on cell H5 to the right.
And type in the following formula:
=(F5-B5)/B5*100
This is the formula for calculating the percent change between two numbers.
This formula tells the spreadsheet to find the change in homicides by subtracting the total for 2010 (B5) from the total
for 2014 (F5), and then to divide the result by the original 2010 value (B5).
The forward slash ( / ) is the symbol for dividing, while the asterisk ( * ) is the symbol for multiplying.
(Note: The parentheses in this formula are also important to define the correct order of operations.)
Now hit the enter/return key to see the final result of the percent formula in cell G5:
-0.09138559708
The total number of homicides by all types of weapons declined by 9.1 percent from 2010 to 2014. But to make it into a
more human-readable format, we can change the data format of the cell to a percentage.

Now it will display as:


-9.14%
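The same percent change arithmetic can be written as a short Python function. This is a minimal sketch: `percent_change` is a hypothetical name, and the counts 200 and 150 are invented to keep the arithmetic easy to follow.

```python
def percent_change(old, new):
    """Fractional change from old to new, like =(F5-B5)/B5 with old=B5 and new=F5."""
    return (new - old) / old

# Hypothetical counts, chosen only to make the arithmetic easy to check.
change = percent_change(200, 150)
print(change)           # -0.25
print(f"{change:.2%}")  # -25.00%

# The fraction from the text, displayed in percentage format.
print(f"{-0.09138559708:.2%}")  # -9.14%
```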
Apply to the rest of the cells
Now let’s apply this percent change formula to the rest of the murder-by-weapon numbers. Click on cell:
G5
Pass your mouse over the bottom right corner of the cell until the cursor changes to thin crosshairs.
Click and drag the mouse cursor down over the rest of the cells in the G column. Release your mouse button when you
get to cell:
G22
The percent changes for all the different types of weapons used in homicides will appear on your screen.

Parentheses in a Formula

In the formula for percent change we used in the previous section, parentheses ( ) were included in the formula:
=(F5-B5)/B5
The parentheses in this formula are very important. These tell the spreadsheet to subtract the number of homicides in 2010
(B5) from the number of homicides in 2014 (F5) first, and then divide that amount by the number of homicides in 2010
(B5).
If you didn’t include the parentheses and had just typed in =F5-B5/B5, the spreadsheet would first divide B5 by B5
(yielding 1). Then it would subtract that 1 from F5, resulting in an incorrect number.
So if you are doing a calculation involving several steps, it is important to include parentheses so you can group the
numbers properly and the spreadsheet knows the order in which to do the calculations.
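Python follows the same order of operations (division before subtraction), so the effect of leaving out the parentheses can be checked directly. The two values below are invented stand-ins for cells B5 and F5.

```python
b5, f5 = 200, 150  # hypothetical values standing in for cells B5 and F5

with_parens = (f5 - b5) / b5   # subtraction first, then division
without_parens = f5 - b5 / b5  # division first: b5 / b5 is 1.0

print(with_parens)     # -0.25
print(without_parens)  # 149.0
```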

Using Formulas with a Fixed Cell


Another thing you can do with a spreadsheet is build a formula with a fixed cell, so that when you drag the formula
to apply it to other cells, the fixed reference doesn’t automatically switch to a new cell.
In our spreadsheet, for example, we might want to know what percentage of homicides in a specific year involved each
type of weapon. We would compare each cell to the total number of homicides for only that year, so
we don’t want the reference to that year’s total to change.
Let’s start with 2014. To create our percent formula click on cell:
H6
And type in this formula:
=F6/F5

This formula tells the spreadsheet to divide the number of homicides involving firearms in 2014 (F6) by the total number
of homicides that year (F5).
Press the enter/return key and switch the cell format to percentage. You’ll see the result is:
67.92%
So firearm related homicides were about two thirds of the total number of homicides in 2014. Good… so far.
You might then try to apply this same formula to the cells for the other types of weapons by dragging the crosshairs,
as we did in the previous example. But if you tried this, it would produce bizarre numbers in the H column, including
some weapon types that appear to account for more than 100% of the total.

What went wrong?


The problem is that when the spreadsheet copies a formula using this method, it shifts the references for both cells in
the original formula (F6 and F5) as it applies that formula to other cells (resulting in F7 divided by F6 in the next
cell down).
To fix this, we need to force the spreadsheet to always divide the numbers for each type of weapon used by a constant
number: the total number of homicides in cell F5. This is called anchoring the cell in our formula, and it forces the
spreadsheet to use that one cell every time.
You accomplish this by adding some $ signs to the formula that instruct the spreadsheet not to change cell F5 when
applying the formula to other cells.
So go back and click on cell:
H6
Delete that formula (press delete key), and instead type in this:
=F6/$F$5
The dollar signs tell the spreadsheet to always stay anchored on cell F5 and the data in it when applying this formula to
other cells.
Now we can drag the formula down through the column of cells and get the correct results.
So hover your mouse over cell:
H6
Then click on the crosshairs in the bottom right corner of the cell and drag down to cell:
H22
And release your mouse.

The correct percentage figure for each weapon type will now appear in the spreadsheet.
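The effect of the $ signs when a formula is dragged down can be imitated in Python. This is a simplified sketch rather than a description of how a spreadsheet works internally: `fill_down` is a hypothetical helper that shifts row numbers and leaves any row preceded by `$` alone.

```python
import re

def fill_down(formula, rows):
    """Shift row numbers by `rows`, leaving rows preceded by $ unchanged."""
    def shift(match):
        col_anchor, col, row_anchor, row = match.groups()
        if not row_anchor:  # only unanchored rows move
            row = str(int(row) + rows)
        return f"{col_anchor}{col}{row_anchor}{row}"
    return re.sub(r"(\$?)([A-Z]+)(\$?)(\d+)", shift, formula)

print(fill_down("=F6/F5", 1))    # =F7/F6    (both references shift)
print(fill_down("=F6/$F$5", 1))  # =F7/$F$5  (the anchored total stays put)
```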

Adding Numbers Using the SUM Formula


If you want to add a large group of numbers in a row or column, there’s another way to do that quickly in a spreadsheet by
using the SUM formula.
For example, in our spreadsheet on weapons used in homicides, what if you wanted to know the total number of
homicides that did not involve a firearm?
To calculate that, you could add up the numbers in rows 12 to 21 for each year using the SUM formula
(Note: row 22 – “Other weapons or weapons not stated” – may or may not involve a non-firearm-related homicide, so
we’re leaving that out of this calculation)
To use the SUM formula to calculate the number of non-firearm-related homicides in rows 12 to 21, first click on cell:
B23
In that cell type this formula
=SUM(B12:B21)
You’ll see there were 3,418 non-firearm-related homicides in 2010. In our formula, =SUM() is shorthand for telling a
spreadsheet to add up a series of numbers.
After typing =SUM, you type a set of parentheses, and inside the parentheses you include something called a range.

A range has two cell references separated by a colon, such as B12:B21. Ranges can even span multiple rows or multiple
columns, and can be used in numerous formulas.

Adding selected cells with the SUM formula instead of a range


You also can add up select numbers in a column, rather than a span of them, using the SUM formula.
To do that, in the SUM formula you replace the colon with commas to separate the specific cells you want to total up.
Thus if you wanted to total up only the number of homicides in 2010 in which either poison (cell B15) or narcotics
(cell B18) was involved, you would type this formula:
=SUM(B15,B18)
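The two ways of using SUM, with a range (colon) or with selected cells (commas), can be sketched in Python. The helpers `expand_range` and `sheet_sum` are hypothetical, they handle single-letter columns only, and the cell values are placeholders (cell B12 holds 12, B13 holds 13, and so on).

```python
def expand_range(cell_range):
    """Expand a single-column range such as 'B12:B21' into a list of cell references."""
    start, end = cell_range.split(":")
    col = start[0]
    return [f"{col}{row}" for row in range(int(start[1:]), int(end[1:]) + 1)]

def sheet_sum(cells, *args):
    """Mimic SUM: each argument is either a range ('B12:B21') or a single cell ('B15')."""
    total = 0
    for arg in args:
        refs = expand_range(arg) if ":" in arg else [arg]
        total += sum(cells[ref] for ref in refs)
    return total

# Placeholder values: cell B12 holds 12, B13 holds 13, and so on up to B21.
cells = {f"B{row}": row for row in range(12, 22)}

print(sheet_sum(cells, "B12:B21"))     # 165
print(sheet_sum(cells, "B15", "B18"))  # 33
```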

Shortcuts to Writing Formulas


There are a number of shortcuts for writing formulas in a spreadsheet.
To illustrate these, in our spreadsheet on types of weapons used in homicides, let’s add up the total number of firearm-
related homicides from 2010 to 2014. This would mean adding cells B6 through F6. We could manually type in
the =SUM(B6:F6) formula, but there is a more user-friendly tool for doing this without having to remember formulas.
To do this, first click on cell:
I6
Then use the spreadsheet’s Formulas tool that will shorten what you have to type.

Click on it and you’ll see a series of formulas you can select to insert into your spreadsheet.

In this case pick SUM and the formula =SUM() will be inserted into cell I6.
Now you can click the cells you want to be referenced, and they will be auto-populated into the formula. You can click-
and-drag to specify a range, or click and hold down the shift key and click another cell. To specify specific cells to add
without making it a range, you should hold down the command key (Mac) or Control key (PC) and click all the cells you
want.

Averaging Numbers
Another common calculation is averaging a series of numbers.
In our spreadsheet on the types of weapons used in homicides, for example, what if we wanted to know the average
number of firearm-related homicides each year between 2010 and 2014 (cells B6 to F6)?
To do this, click on cell:
J6
And in that cell type:
=AVERAGE(B6:F6)

This same process can be used to also calculate the MEDIAN(), MODE(), STDEV() (standard deviation) and other
statistical functions for a series of data points.
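For comparison, Python's standard `statistics` module offers the same kinds of calculations. The five values below are invented stand-ins for cells B6 to F6, and `stdev` here is the sample standard deviation.

```python
import statistics

# Hypothetical yearly counts standing in for cells B6 to F6.
values = [10, 12, 12, 14, 17]

print(statistics.mean(values))    # 13
print(statistics.median(values))  # 12
print(statistics.mode(values))    # 12
print(statistics.stdev(values))   # sample standard deviation, about 2.65
```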
Using Functions to Import Website Data
One advantage of Google spreadsheets is that they are designed to work with the Web. Specific functions let you load
data dynamically, directly from a website.
Import a data file published on the Web into your spreadsheet
CSV files (comma separated values) can be imported directly into a spreadsheet from anywhere on the Web. CSV is one
of the most common data formats and can be found with a simple Google search.
For sample data, we will use a piece of crime data from UC Berkeley in 2015 hosted on GitHub. The URL
is https://raw.githubusercontent.com/jrue/ucpd-crime/master/data/ucpd/ucpd_data_6.csv.
Let’s import this data into a new sheet. Click the small plus button at the bottom of our workbook document:

Click in cell A1 and type (or copy-and-paste) the following:


=ImportData("https://raw.githubusercontent.com/jrue/ucpd-crime/master/data/ucpd/ucpd_data_6.csv")

After a moment the data will load and should look like this:
Many files will not be this clean and may require cleanup. But if you can use the file as is, it’s especially useful.
Governments regularly update CSV files on their servers. This may happen frequently with certain files such as election
results.
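To see what a CSV file looks like once it is split into rows and columns, here is a small Python sketch using the standard `csv` module. The CSV text and its column names are invented for illustration and are not taken from the UC Berkeley file.

```python
import csv
import io

# Hypothetical CSV text standing in for a downloaded file; the columns are invented.
csv_text = "date,offense,location\n2015-01-02,Theft,Library\n2015-01-03,Vandalism,Gym\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

print(header)     # ['date', 'offense', 'location']
print(len(data))  # 2
```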

Adjusting Data Display by Changing Cell Formats


In the previous example, you might have noticed that the date and time columns display strange numbers where the dates
and times of each crime should be. The raw cell data for a date is a serial number that counts days from a fixed starting
date at the turn of 1900 (the exact starting point differs slightly between Google spreadsheets and Microsoft Excel), and
a time is stored as a fraction of a day.

We can easily adjust this by changing the cell format. Click on the column’s heading, then under the Format menu,
select Date for the first column, and Time for the second column.
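The raw numbers behind dates and times can be converted by hand, as this Python sketch shows. It assumes the Google Sheets convention that serial number 0 corresponds to December 30, 1899; `serial_to_datetime` is a hypothetical helper, and the serial numbers are examples rather than values from the crime data.

```python
from datetime import datetime, timedelta

# Assumed epoch: Google Sheets counts days from December 30, 1899.
SHEETS_EPOCH = datetime(1899, 12, 30)

def serial_to_datetime(serial):
    """Convert a serial number (whole days, with the time as a fraction) to a datetime."""
    return SHEETS_EPOCH + timedelta(days=serial)

print(serial_to_datetime(42005))     # 2015-01-01 00:00:00
print(serial_to_datetime(42005.75))  # 2015-01-01 18:00:00
```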
Import a table or list directly from a Web page
Tables can frequently be imported directly from a Web page into a spreadsheet. Let’s import a table from
Wikipedia’s page on gun violence by state.
Note: This example will tie into the next section on charts, so we use it for convenience. However, we do not advocate
using data from Wikipedia in any production sense. Always vet and corroborate data directly from the source when used
in journalism.
Open a new sheet and click in cell A1. Type:
=IMPORTHTML("https://en.wikipedia.org/wiki/Gun_violence_in_the_United_States_by_state", "table", 1)
The first parameter is the webpage Google will scan (make sure it’s in quotes). The second parameter is the HTML
element it’s looking for. In our case, we want it to find a <table> element. The third parameter is which table element we
should find, in case there are multiple. You may need to change the third parameter through trial-and-error, or look at the
source code of the webpage you’re scraping.
Hit enter and the spreadsheet should look like this:

The “table” parameter can be replaced with “list” so that it will look for the contents of <ul> <ol> and <dl> tags.
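What the table lookup involves can be approximated with Python's standard `html.parser` module. This is a simplified sketch and not how Google implements IMPORTHTML: `TableExtractor` and `import_html_table` are hypothetical names, the page content is invented, and nested tables are not handled.

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the rows of every <table> in a page as lists of cell text."""
    def __init__(self):
        super().__init__()
        self.tables = []  # one list of rows per table
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self.tables[-1][-1].append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.tables[-1][-1][-1] += data.strip()

def import_html_table(html, index):
    """Return the table at 1-based position `index`, like the third parameter above."""
    parser = TableExtractor()
    parser.feed(html)
    return parser.tables[index - 1]

# Hypothetical page content, invented for this example.
page = "<table><tr><th>State</th><th>Rate</th></tr><tr><td>A</td><td>1.5</td></tr></table>"
print(import_html_table(page, 1))  # [['State', 'Rate'], ['A', '1.5']]
```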
Load Dynamic Financial Data
Live data from Google Finance can be imported into your spreadsheet. The data updates automatically every time the
spreadsheet is loaded. Quotes can have up to a 20-minute delay, which is common for financial data.
Create a new spreadsheet that looks like this:
Type =GoogleFinance(".DJI", "price") in cell B2
Type =GoogleFinance(".INX", "price") in cell B3
Type =GoogleFinance(".IXIC", "price") in cell B4
The initials at the beginning of the parentheses are stock ticker symbols. You can find the symbol for any stock at Google
Finance.
The cells should update in a few moments and your spreadsheet should look like this:

Load historic financial data


The same function can be used to load historic data. Let’s pull in the daily closing price of Google stock for 2009.
Create a new spreadsheet.
In cell A1, type:
=GoogleFinance("GOOG", "close", "01/01/2009" , "12/31/2009", "DAILY")
Hit enter and the daily closes for 2009 should load into your spreadsheet.
The full documentation on all of the different parameters for the Google Finance function is listed on Google’s help
pages.
Sorting Results
After you’ve entered numbers or done calculations in a spreadsheet, you may want to sort the results from highest to
lowest or lowest to highest.
With the spreadsheet on types of weapons used in homicides, for example, you could more easily see which weapons are
most frequently used by ranking them from the highest number to the lowest number for any given year.
To do this, you first need to highlight the area of the spreadsheet that you want to sort.
Don’t just highlight a whole column of numbers to sort because the spreadsheet then will sort only the cells in that
column and not change the order of the corresponding cells in other columns (such as the headings that tell you which
type of weapon corresponds with the numbers of homicides).

The highlighted area now includes the headings for the types of weapons used and then the numbers for each type of
weapon for each year.
To sort the data, in the menu at the top, click on Data … Sort Range
In the box that appears, you’ll see the range of selected cells displayed at the top (in this case, cells A5 to F22).

You now can select the column by which you want to sort the data.
You also can select whether to sort that data in ascending order (A – Z) so the smallest number appears at the top of the
sorted data, or descending order (Z – A) so the largest number appears at the top.
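The reason for sorting whole rows rather than a single column can be seen in a short Python sketch. The rows below are invented: each holds a weapon-type label followed by counts for two years.

```python
# Hypothetical rows: a weapon-type label followed by counts for two years.
rows = [
    ["Knives", 30, 28],
    ["Handguns", 90, 85],
    ["Rifles", 12, 15],
]

# Sort whole rows by the second column (index 1), largest first,
# so each label stays attached to its numbers.
descending = sorted(rows, key=lambda row: row[1], reverse=True)

print([row[0] for row in descending])  # ['Handguns', 'Knives', 'Rifles']
```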
Formatting Cells
A spreadsheet provides a lot of options for re-formatting the information being displayed. These are similar to the options
in a word processing program like Microsoft Word or many other applications. They include:
 Changing the font size or style
 Defining the format for the kind of data in a cell, such as dates, times, currency or percents
 Changing the number of decimal places displayed in a number
 Changing the text color or the background color
 Adding borders around the cells
Some of these options are available by selecting Format in the menu at the top and then picking one of the choices in the
drop-down menu.
Or you can click on the icons in the middle of the toolbar for other options.

Lesson 3: What is a Database?

A database is a collection of related data which represents some aspect of the real world. A database system is designed to
be built and populated with data for a certain task.

What is DBMS?
A Database Management System (DBMS) is software for storing and retrieving users' data while applying
appropriate security measures. It consists of a group of programs which manipulate the database. The DBMS accepts a
request for data from an application and instructs the operating system to provide the specific data. In large systems, a
DBMS helps users and other third-party software to store and retrieve data.
A DBMS allows users to create their own databases as per their requirements. The term “DBMS” includes the user of the
database and other application programs. It provides an interface between the data and the software application.
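As a small illustration of an application asking a DBMS for data, the sketch below uses SQLite through Python's standard `sqlite3` module. The table, column names, and student records are all invented for the example.

```python
import sqlite3

# An in-memory SQLite database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO students (name, course) VALUES (?, ?)",
    [("Ana", "AE 21"), ("Ben", "AE 21"), ("Cara", "AE 22")],
)

# The application asks for data; the DBMS works out how to fetch it from storage.
names = [row[0] for row in conn.execute(
    "SELECT name FROM students WHERE course = ? ORDER BY name", ("AE 21",)
)]
print(names)  # ['Ana', 'Ben']
conn.close()
```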
History of DBMS

Here are the important landmarks in the history of DBMS:

 1960 - Charles Bachman designed the first DBMS.
 1970 - Edgar F. Codd of IBM introduced the relational model of data.
 1976 - Peter Chen coined and defined the Entity-relationship model, also known as the ER model.
 1980 - The relational model becomes a widely accepted database component.
 1985 - Object-oriented DBMS develops.
 1990s - Object-orientation is incorporated into relational DBMS.
 1991 - Microsoft ships MS Access, a personal DBMS that displaces all other personal DBMS products.
 1995 - First Internet database applications.
 1997 - XML is applied to database processing. Many vendors begin to integrate XML into DBMS products.

Characteristics of Database Management System

 Provides security and removes redundancy
 Self-describing nature of a database system
 Insulation between programs and data, and data abstraction
 Support of multiple views of the data
 Sharing of data and multiuser transaction processing
 A DBMS allows entities and the relations among them to form tables.
 It follows the ACID concept (Atomicity, Consistency, Isolation, and Durability).
 A DBMS supports a multi-user environment that allows users to access and manipulate data in parallel.
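Atomicity, the first of the ACID properties, can be demonstrated with SQLite through Python's standard `sqlite3` module. In this sketch the `accounts` table and the `transfer` helper are hypothetical; the point is that a failed transfer leaves both balances exactly as they were.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 500)  # would overdraw A, so the CHECK constraint fails
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, balances)  # False {'A': 100, 'B': 50}
```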

DBMS vs. Flat File

DBMS | Flat File Management System
Multi-user access | Does not support multi-user access
Designed to meet the needs of both small and large businesses | Limited to smaller systems
Removes redundancy and integrity issues | Redundancy and integrity issues
Expensive, but the long-term total cost of ownership is low | Cheaper
Easy to implement complicated transactions | No support for complicated transactions

Users in a DBMS environment

The following are the various categories of users of a DBMS:

Component Name | Task
Application Programmers | Application programmers write programs in various programming languages to interact with databases.
Database Administrators | The database administrator is responsible for managing the entire DBMS. He/She is called the database admin or DBA.
End-Users | End users are the people who interact with the database management system. They conduct various operations on the database, such as retrieving, updating, and deleting data.

Popular DBMS Software

Here is a list of some popular DBMS software:

 MySQL
 Microsoft Access
 Oracle
 PostgreSQL
 dBASE
 FoxPro
 SQLite
 IBM DB2
 LibreOffice Base
 MariaDB
 Microsoft SQL Server etc.

Application of DBMS

Sector | Use of DBMS
Banking | For customer information, account activities, payments, deposits, loans, etc.
Airlines | For reservations and schedule information.
Universities | For student information, course registrations, colleges and grades.
Telecommunication | For keeping call records, monthly bills, maintaining balances, etc.
Finance | For storing information about stock, sales, and purchases of financial instruments like stocks and bonds.
Sales | For storing customer, product and sales information.
Manufacturing | For managing the supply chain, tracking the production of items, and tracking inventory status in warehouses.
HR Management | For information about employees, salaries, payroll, deductions, generation of paychecks, etc.

Types of DBMS

The four types of DBMS are:

 Hierarchical database
 Network database
 Relational database
 Object-Oriented database

Hierarchical DBMS

In a hierarchical database model, data is organized in a tree-like structure. Data is stored in a hierarchical (top-down
or bottom-up) format and is represented using parent-child relationships. In a hierarchical DBMS, a parent may have many
children, but each child has only one parent.
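The one-parent rule can be pictured with a small Python dictionary in which every child records exactly one parent. The department names and the `path_to_root` helper are invented for illustration.

```python
# Each child records exactly one parent; the names are hypothetical.
parent_of = {
    "Sales": "Company",
    "Engineering": "Company",
    "Backend": "Engineering",
}

def path_to_root(node):
    """Follow parent links upward until reaching a node with no parent."""
    path = [node]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path

print(path_to_root("Backend"))  # ['Backend', 'Engineering', 'Company']
```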

Network Model

The network database model allows each child to have multiple parents. It helps you model more
complex relationships, such as the orders/parts many-to-many relationship. In this model, entities are organized in a
graph which can be accessed through several paths.
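The orders/parts relationship can be sketched in Python as a mapping in which a part may appear under several orders. The order and part names are invented, and `orders_containing` is a hypothetical helper.

```python
# Hypothetical orders and parts: a part can belong to many orders and vice versa.
order_parts = {
    "order1": {"bolt", "nut"},
    "order2": {"bolt", "gear"},
}

def orders_containing(part):
    """Find every order (parent) that a given part (child) belongs to."""
    return sorted(order for order, parts in order_parts.items() if part in parts)

print(orders_containing("bolt"))  # ['order1', 'order2']
print(orders_containing("gear"))  # ['order2']
```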

Relational model

Relational DBMS is the most widely used DBMS model because it is one of the easiest to use. This model is based on
normalizing data in the rows and columns of tables. Data in the relational model is stored in fixed structures and
manipulated using SQL.
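A minimal relational example, using SQLite through Python's standard `sqlite3` module, shows two tables linked by a shared column and queried with SQL. The tables and their contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two hypothetical tables: each order row points at a customer row by id.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         total INTEGER);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 40), (2, 1, 60), (3, 2, 25);
""")

# Join the tables and total each customer's orders.
rows = conn.execute("""
    SELECT c.name, SUM(o.total)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
print(rows)  # [('Ana', 100), ('Ben', 25)]
conn.close()
```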

Object-Oriented Model

In the object-oriented model, data is stored in the form of objects. The structures that hold the data are called
classes. The model defines a database as a collection of objects which store both the values of data members and the
operations on them.
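The idea of an object that holds both data members and operations can be shown with a small Python class. The `Account` class is hypothetical, and the list here merely stands in for a collection of stored objects; a real object-oriented DBMS would also persist them.

```python
class Account:
    """A hypothetical class: each object holds data members and the operations on them."""
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount
        return self.balance

# The "database" here is just a collection of objects kept in memory.
accounts = [Account("Ana", 100), Account("Ben")]
accounts[1].deposit(50)
print([(a.owner, a.balance) for a in accounts])  # [('Ana', 100), ('Ben', 50)]
```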

Advantages of DBMS

 A DBMS offers a variety of techniques to store and retrieve data.
 A DBMS serves as an efficient handler to balance the needs of multiple applications using the same data.
 Uniform administration procedures for data.
 Application programmers are never exposed to the details of data representation and storage.
 A DBMS uses various powerful functions to store and retrieve data efficiently.
 Offers data integrity and security.
 The DBMS applies integrity constraints to provide a high level of protection against prohibited access to data.
 A DBMS schedules concurrent access to the data in such a manner that only one user can access the same data at
a time.
 Reduced application development time.

Disadvantages of DBMS

A DBMS may offer plenty of advantages, but it has certain flaws:

 The cost of the hardware and software for a DBMS is quite high, which increases the budget of your organization.
 Most database management systems are complex, so users need training to use the DBMS.
 In some organizations, all data is integrated into a single database, which can be damaged by an electrical
failure or by corruption of the storage media.
 Simultaneous use of the same program by many users sometimes leads to the loss of some data.
 DBMS can't perform sophisticated calculations
