Introduction
What is a computer network?
To a user - a network is something that allows applications to talk to each other. Users have an idea of what
the infrastructure should allow them to do: email and web browsing should be reliable; internet banking
should also be secure; internet radio, voice over IP and video streaming should have good quality. VoIP,
unlike internet radio, also needs low latency (delay). The demands on the network depend on what type of
application you want to run.
To a developer - someone who needs to understand how the network works in order to create applications
that the user can use. The developer has an Application Programming Interface (API) to create the network
application. The API provides a set of functions (services), with quality of service parameters (such as low
delay, reliability etc.), and documentation as to how the API should be used. The network application
developer needs to know about protocols, and the valid sequence of function calls through the API.
To a network designer - has a structural view of the network. The network designer knows about the cables
and connectivity, and how different types of network can be linked together.
To a network provider - something to charge for!
Different sizes of networks
Protocols
Protocols define interactions, such as the way the programmer uses the network. A network has many
elements - links, switches, end-hosts, processes - and a protocol exists within a single layer of the network.
A protocol is only used for one service: it defines that service. Protocols are stacked in layers in order to
complete a full operation; these are known as network layers.
Network Layers
OSI 7 Layer Model (A Theoretical Model)
The OSI 7 Layer model is a theoretical model, and isn't necessarily how a real network is laid out. The model
was developed to be followed theoretically and consists of seven key layers, as follows.
Protocol Encapsulation
An application message will use the transport layer to move the message. In order for the transport layer to
be used, it has to put a header onto the front of the message before the message is moved down the network
layers. Each layer adds its own header, so the network and link layers do the same. Once it reaches the link
layer, the message can be moved across the network. The message has now gained additional control
information which can be used at the receiver end. When the message is received, the reverse process
occurs: each layer removes the header associated with it, performs any operations it needs to, and then
passes the message up to the next layer. This process is known as encapsulation.
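The layering described above can be sketched in a few lines of Python. This is a toy illustration only: real headers are binary structures, not text prefixes, and the layer names here are just labels.

```python
# A toy sketch of protocol encapsulation: each layer prepends its own
# header on the way down, and the receiver strips them in reverse order.

def encapsulate(message, layers):
    """Each layer prepends its own header as the message moves down."""
    for layer in layers:                      # e.g. transport, network, link
        message = f"[{layer}-hdr]" + message
    return message

def decapsulate(message, layers):
    """The receiver strips headers in reverse order on the way up."""
    for layer in reversed(layers):
        header = f"[{layer}-hdr]"
        assert message.startswith(header)     # each layer checks its own header
        message = message[len(header):]
    return message

stack = ["transport", "network", "link"]
wire = encapsulate("hello", stack)
print(wire)                       # [link-hdr][network-hdr][transport-hdr]hello
print(decapsulate(wire, stack))   # hello
```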
Physical Layer Connectivity
We're interested in measurements - the bandwidth available for the data, and whether or not the connection is
shared (e.g. wireless broadcasts) or dedicated.
At The Core
A packet may have to pass through more than one network. The packet moves through the network using a
process of forwarding. A forwarding node doesn't look at the whole network; it just makes local decisions.
It's impossible to get a global view of the internet.
Various tiers exist in the network. Any network tier has a series of points of presence (POPs) to
achieve a connection.
It's good not to have to go all the way through the network, so there's also connectivity between
networks in the same tier. Tier 3 is the local networks; ISPs connect tier 3 to the higher-level networks.
Network topologies (i.e. the layout of computers and how they're linked together) include ring, bus,
star, tree, and mesh layouts.
They all have their advantages and disadvantages (use common sense to work these out).
What can be very important is scalability: some network topologies support scaling better than others.
Types of Communication
1. Single (unicast) - one source to one destination.
2. All-nodes (broadcast) - one source to all nodes.
3. Multiple nodes, a subset of all nodes (multicast) - similar to broadcast, but fewer computers receive the
message.
Addressing - Used for Connectivity
Every device has to have an address in order to communicate with other devices, and the address has to be
unique across the entire world. An IP address (a 32-bit numeric value) can be used. An IP address consists of
two parts: firstly, the network of the device, and secondly, the computer on that particular network. IP
concatenates these two values, so for example:
130.88 (network) + 0.28 (machine) gives the IP address = 130.88.0.28.
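This split can be illustrated with a short Python sketch, assuming a 16-bit network prefix as in the example above:

```python
# Splitting the example address 130.88.0.28 into its network and host
# parts, assuming a 16-bit network prefix (a classful /16-style split).

def split_ip(address, prefix_bits=16):
    octets = [int(o) for o in address.split(".")]
    # pack the four octets into one 32-bit value
    value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    network = value >> (32 - prefix_bits)            # the 130.88 part
    host = value & ((1 << (32 - prefix_bits)) - 1)   # the 0.28 part
    return network, host

net, host = split_ip("130.88.0.28")
print(net, host)   # 33368 28  (130*256+88 = 33368; 0*256+28 = 28)
```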
Ports - Used for Connectivity
A computer can run multiple applications, so each application has a unique port number in order to ensure a
particular network application gets the information it wants, and not another application's data. TCP and UDP
use a 16-bit port number. Web browsers contact web servers on port 80.
The Reality of Networks - Networks are Unreliable!
The network is not reliable - but it should always appear to the end user (and perhaps to different network
layers) that it is. There are many reasons why things may go wrong, and so there are methods to work around
these faults.
Codes (checksums) are used to detect errors in the transmission. Acknowledgments are used to signal that a
message has arrived, or negative acknowledgments can be used to signal that a message didn't arrive.
Timeouts are used with acknowledgements - when they expire, a retransmission will take place. The main
principle is to hide some kinds of network failure and make it look more reliable than it is.
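As an illustration of error-detecting codes, the following Python sketch implements a 16-bit ones'-complement checksum of the kind used by IP, TCP and UDP (simplified: real implementations also cover pseudo-headers and are far more optimised).

```python
# A simplified 16-bit ones'-complement checksum, as used by IP/TCP/UDP.

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:                # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]          # add 16-bit words
        total = (total & 0xFFFF) + (total >> 16)       # fold any carry back in
    return ~total & 0xFFFF                             # ones' complement

msg = b"hello"
csum = internet_checksum(msg)
# The receiver recomputes the checksum and compares; corruption is detected.
print(internet_checksum(b"hello") == csum)   # True
print(internet_checksum(b"hallo") == csum)   # False: the error is caught
```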
Some good principles in which to create applications that achieve functionality and good network design are
the following:
Service model.
Global coordination (e.g. port 80 is a web service).
Minimise manual setup.
Minimise volume of information at any point (otherwise bottlenecks at nodes can occur).
Distribute information capture and management.
Extensibility.
Integration with all different systems (e.g. Windows/Mac/Linux).
Error detection.
Error recovery (reliability).
Scalability.
Circuit Switching
A fixed path (channel) through the network is set up with dedicated resources for the connection.
Each path through the network becomes dedicated to the first connection made, until it's released.
Establishing a second link over the same resources isn't possible because they aren't available.
This idea comes from the way telephones used to work.
Advantages: guaranteed performance.
Disadvantages: setup time, and the limitation of fixing an entire path through the network for only one
connection.
Packet Switching
The data is broken up into discrete 'chunks', which are sent when the resources are available.
These pieces are normally of a fixed size.
All the bits in a piece are reserved for an end-to-end transfer.
The resource piece is idle if not used by the owning transfer.
Time is split up.
At any time, the user gets all of the bandwidth.
The user gets bursts of connection time.
The bursts are so small you probably wouldn't notice the difference.
Advantages; constant speed, good for latency needs.
Rather than throwing packets away, the switch uses a buffer that the packets go into.
An assumption is made that the data traffic arrives in bursts, and a buffer can smooth these bursts out
into a more even rate of transfer.
Disadvantages: The packets may be lost if the buffer becomes full.
The total time to get a packet from one switch to another is the per-bit transmission time multiplied by the size
of the packet, plus the propagation time.
Total Packet Transmission Time = (Time per Bit * Size of the Packet) + Propagation Delay
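The formula can be made concrete with a small Python function; the link numbers below are made-up examples.

```python
# Total time for a packet to cross one link: transmission time (time to
# push all the bits onto the wire) plus the propagation delay of the link.

def packet_delay(packet_bits, bandwidth_bps, propagation_s):
    transmission = packet_bits / bandwidth_bps   # seconds to emit the packet
    return transmission + propagation_s

# e.g. a 1000-bit packet on a 1 Mbps link with 10 ms propagation delay:
delay = packet_delay(1000, 1_000_000, 0.010)
print(delay)   # 0.011  (1 ms transmission + 10 ms propagation)
```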
Traceroute
Traceroute is an application that shows the path a packet takes through a network from a source to a
destination.
Units of Measurement
At 1 kbps, each bit takes 1 ms to transmit.
At 1 Mbps, each bit takes 1 microsecond to transmit.
Network Applications
Examples include mail clients, web browsers, video games and there are many more!
A network application is an application that has parts running on different computers. They communicate over
the network. They run at the edge of the network.
Architecture
Client-Server
Server:
Always on. Has a permanent IP address so that it can always be found. There may be multiple
IP addresses for a popular site such as Google to improve performance.
Client:
Has a dynamic IP address. The client may not always be at the same address. Clients
communicate through a server.
Advantages: It's easy to find the information because the server never changes.
Disadvantages: Not very scalable.
Peer-to-Peer (P2P)
No server. Instead, a collection of machines that change over time. Peers are intermittently connected
and may have a different address for each connection. Highly scalable.
Disadvantage: It's hard to manage because it can be hard to know where the information is.
TCP
Reliable.
Can recover from errors.
Has delays while recovering from errors.
Example uses: E-mail, remote terminal access, web, file transfer, streaming media.
UDP
Unreliable.
No error recovery.
RFCs (Requests for Comments) - used to define worldwide protocols (e.g. email).
Proprietary implementations - the application just decides (e.g. Skype).
Application Data
The application source and destination must make sure they know and have the same interpretation of the
data.
The applications also need to know what encoding is being used.
Compression is a form of encoding that minimises the size of the data on the cable.
Understanding Data
Implicit Typing - the application at each end has to know what format the data will arrive in.
Explicit Typing - the data contains flags (typically 1 bit) which tell the application what's coming next. The
application reads flag/data/flag/data etc.
Data Conversion
The data may need to be converted, for example if the size of an int is different for different applications on
different systems (this is an application-layer level of conversion).
Heterogeneous systems - different operating systems working together may have to be allowed for, so for
example one messaging client may run on Mac OS X, whilst another client may be running on Linux - the two
have to be able to communicate together and this is an application-level issue.
Canonical approach - same representation across the cable. The source converts to this representation if it
needs to, and the destination translates from this representation if it needs to. For example, all integers could
be converted to, say, 16 bits before being transmitted, and then converted back.
Some information can be sent at the start of the transmission that identifies what needs to be converted (for
example how to convert ints). This information only needs to be sent once.
Binary attachments for emails are converted into 7-bit ASCII values. The encoding takes each series of 3
bytes and converts it into 4 ASCII characters. This is Base64 encoding.
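Python's standard base64 module implements exactly this 3-bytes-to-4-characters scheme:

```python
# Base64 in practice: three arbitrary bytes become four safe ASCII characters.
import base64

binary = bytes([0xFF, 0x00, 0xAB])          # arbitrary non-ASCII bytes
encoded = base64.b64encode(binary)          # 7-bit-safe ASCII output
print(encoded)                              # b'/wCr'
print(base64.b64decode(encoded) == binary)  # True: decoding restores the bytes
```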
Application Extensibility
Communicating between different versions of the application can cause problems. You want version 2 to be
compatible with version 1. If this is possible then application extensibility is achieved.
HTTP
Client-server model.
Client sends a request and the server sends something back.
Uses TCP and port 80, which has been universally accepted.
Stateless.
Every time the client sends a request, it is treated by the server as if it's never communicated with
the client before.
Maintaining state becomes too complicated - because it's a distributed system, it's possible for one
end to crash, or bits of the network can go down. A form of recovery mechanism would be
required.
Because of these complications, HTTP decided it's not going to bother with state.
The protocol has no concept of state. BUT cookies can allow a form of state.
The cookie can be sent back in the response message's header lines.
The client can then use this cookie when making further requests to the server, with the cookie
being sent in the header of the request message.
The server then knows things about the client from the cookie it receives.
The server can perform backend operations with a database before sending back information
relating to the user.
The header tells the server what the client wants to get or send and where, or what is going to be
received (how big, what type, the date, other information).
GET
HEAD
POST
PUT
DELETE
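A sketch of what a request built with these methods looks like on the wire: header lines, a blank line, then (for responses) a body. The host and path are made-up examples, and no real connection is made here.

```python
# Building and parsing HTTP messages by hand, to show the wire format.

def build_get(host, path):
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: {host}\r\n"
            f"Connection: close\r\n"
            f"\r\n")                 # the blank line ends the header section

request = build_get("example.org", "/index.html")
print(request.splitlines()[0])       # GET /index.html HTTP/1.1

# A response carries a status line, headers, a blank line, then the body:
response = "HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"
headers, _, body = response.partition("\r\n\r\n")
print(body)                          # hello
```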
Web caching
A cache keeps a copy of web items, which avoids re-fetching them from the server.
If the version on the server has changed, then the copy in the cache is out of date.
The server can be asked whether the version it has is different from the one in the cache.
Electronic Mail
Mail servers:
Dedicated mail servers are used to hold all the user's messages.
Mail must be sent to the correct server, which must then go into the correct mailbox on the server.
The server is often acting like a client in sorting these things out.
User agents:
Messages consist of header lines, followed by a blank line and the body (the main message of the
email and attachments via MIME).
MIME (Multipurpose Internet Mail Extensions):
Allow non-ASCII character sets and file attachments to be sent via the e-mail ASCII-encoded systems.
Additional header lines are defined in order to tell the client to interpret the data in the message in a
different way.
Content types - discrete types (e.g. image/gif, text/plain); application types (the type is application
followed by a subtype, e.g. "application/word", meaning that an application is responsible for
interpreting this section); and the multipart type, which means the body can contain multiple types.
MIME Encodings:
Only 7-bit ASCII values can be transmitted, so a message that is straight text doesn't need any
translation. However, non-ASCII characters have to be translated into ASCII values and then translated back
at the receiving end.
POP
The user agent communicates with the server and downloads all the messages.
So you can really only use one client, as the emails are downloaded to that single client.
IMAP
The client manages the emails on the server, as opposed to getting a copy.
The messages stay on the server, so whichever client you use, you always get the same state of
your messages.
DNS (Domain Name System)
Hierarchical namespace for internet objects (e.g. .co.uk, .com, .ac.uk are different hierarchies).
Names have to be unique. But not worldwide unique, just within the hierarchy.
Decentralised, because there are a lot of name mappings out there, so the looking up of these names
has to be optimised.
It's a "decentralised database" (but not a real database).
There's global coordination of the names.
A top organisation manages the top-level names, for example .com names. The organisation
delegates lower-level hierarchical names, such as .co.uk, down to other organisations. Further
delegations are made, such as .ac.uk.
Worldwide coordination.
Looking up the address for a name is done by first querying the root name server.
There is an implicit dot ('.') that's never written at the end of a web address; this represents the root. The
principle is that the name server for the next level down is looked up at the current name server,
so the root server can be queried for a '.com' server, which can then be queried for 'google.com'.
The implication is that the traffic levels for the root servers will be huge - so the root servers are
distributed across the world to spread out the load. The root server that's closest is the one that's
looked up.
There are around thirteen major root servers across the planet, with the majority in the USA.
ISPs, companies and universities have local servers.
Host queries are sent via a local DNS server that can act as a proxy forwarding the query into the
hierarchy.
All name servers can cache mappings they discover, which speeds up the response to queries and
minimises the remote load on the servers.
DNS: Iterative Resolution for name lookup
A host sends the name it's after to a local DNS server, which then queries the other servers itself,
one by one, before sending the resulting address back to the original host. Lots of network traffic
is generated at the local server.
DNS: Recursive Resolution for name lookup
A host sends the name it's after to a local DNS server which, if it doesn't know, asks the next
server which, if it doesn't know, asks the next server, and so on, until the server that knows the
address responds to the previous server, which responds to the previous server, and so on, until
the information is passed back to the original host.
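The walk down the hierarchy can be modelled with a toy zone hierarchy in Python. Here the local server performs the walk itself, querying one (fake) server per level; all the zone data, server names and the final address are invented for illustration.

```python
# A toy model of resolution: walk the hierarchy one name server per level,
# starting at the root. All data here is made up.

ROOT = {"com": "ns.com"}
SERVERS = {
    "ns.com": {"google.com": "ns.google.com"},
    "ns.google.com": {"www.google.com": "142.250.0.1"},  # fake address
}

def resolve(name):
    server = ROOT                        # start by querying the root
    labels = name.split(".")
    for i in range(len(labels) - 1, -1, -1):
        qname = ".".join(labels[i:])     # "com", "google.com", "www.google.com"
        answer = server[qname]
        if i == 0:
            return answer                # final answer: an address
        server = SERVERS[answer]         # otherwise: the next server down

print(resolve("www.google.com"))   # 142.250.0.1
```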
DNS Zone
Type specifies how to interpret the record (A = address, NS = authoritative name server for the
zone, CNAME = true name of an alias, MX = mail exchange/relay for the zone, PTR = used to
map addresses to names).
Class defines the purpose; it is an extension mechanism, because at the moment it's only
ever set to IN (internet).
Multimedia
Multimedia is anything that is transmitted over a network that is not ordinary textual data. Examples include
pictures, movies and sound (voice or music).
Most forms of media have analogue representation, yet we can interpret it as binary data. We can convert
sound wave voltages into binary information for transfer or manipulation, and then convert it back to analogue
in order to play it back (for example when playing back sound through some speakers).
Text when stored into a computer using binary data is very small, but sounds and videos can be quite large in
file size, as well as being of mixed formats, requiring multiple interpretations. Because of this, we have to be
aware of the amount of network bandwidth available for transmitting large files.
Data can be delivered via:
Broadband - uses the same copper wire but can now go up to 20 Mbps.
Broadcast television - now converging to broadband multi-service networks. HD television shows are
becoming available over a broadband link.
Media is delay sensitive. When playing media from a remote source, you have two options; either you
download the entire media piece and play it back, or you ensure that enough of the data has been transferred
before playing it back.
Media is loss tolerant. You don't notice small irregularities if the data received isn't exactly the same as what
was sent, although there is a threshold beyond which the user starts to notice. There are methods of
testing what this threshold of loss toleration is (such as investigating what percentage of an image's pixel
data can be randomised before the quality of the image becomes unsatisfactory).
Mobile Phone Networks
All digital.
Information can be compressed further/better.
The phone performs the audio-to-digital conversion, compresses the data and sends it to a Base
Transceiver Station (BTS), which then passes it via a standardised Abis Interface to a Base Station
Controller (BSC). The Base Station Controller then uses another standardised interface to send the
information to a Mobile Switching Centre (MSC). SS7 (Signalling System 7) is used to set up
connections and tear them down. The information continues to move across the backbone of the
phone network using standard IP and networking.
2G phones can't use VoIP. GSM (the phone standard) uses a time-sharing system with eight slots
per transmission channel (each user gets 1/8th of the transmitter at a time, and only gets this time
with the transmitter in bursts with long pauses between).
3G uses a bandwidth dependent on what other users are doing, and you don't have to find a slot on the
transmitter because you can always connect. The bandwidth is greater, but still not great.
Broadcast Television
A much more effective mechanism for broadcasting the same data to a wide range of devices. Much better
than the internet, because if the same number of people tried to get hold of the data via the internet, the
servers would become overloaded.
Files are transferred as HTTP objects embedded in TCP. The client then receives the data and
passes it to the player.
This is not streaming - it's just getting a simple file! There's a long delay before you can play it back,
because the entire file has to be downloaded to the client's local machine.
Streaming live multimedia
The data is then stored in a buffer when it arrives, and is played back to the user from this buffer.
An initial delay is set up so that the buffer has a bit of time to fill up.
If the buffer is under-filled then the video will stop and start.
Real Time Streaming Protocol (RTSP)
A metatrack has all the information about the streamed media, which the player can then use to get the
data it's after.
If the delay across the network changes, then the buffer time has to be adapted.
A buffer is a block of memory that you can put things into. You have a variable fill rate and a constant
drain rate. So long as the fill rate keeps up with the drain rate, everything will be smooth. If the buffer fills
up, it can give the server a rest.
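The fill/drain behaviour can be simulated in a few lines of Python. The rates and the initial delay below are made-up values: playback stalls whenever the buffer runs dry, and a short start-up delay lets the buffer fill first.

```python
# Simulating a playback buffer: variable fill rate, constant drain rate.

def simulate(fill_rates, drain_rate, initial_delay_steps):
    buffered, stalls = 0.0, 0
    for step, fill in enumerate(fill_rates):
        buffered += fill                      # data arriving off the network
        if step >= initial_delay_steps:       # playback starts after the delay
            if buffered >= drain_rate:
                buffered -= drain_rate        # constant playback drain
            else:
                stalls += 1                   # buffer under-filled: video stops
    return stalls

bursty = [0, 3, 0, 0, 3, 0, 0, 3, 0]          # bursty arrivals, average 1/step
print(simulate(bursty, drain_rate=1, initial_delay_steps=2))  # 0
print(simulate(bursty, drain_rate=1, initial_delay_steps=0))  # 1
```

With the two-step delay the buffer absorbs the gaps between bursts; starting playback immediately causes a stall before the first burst arrives.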
Real Time Interactive Multimedia and Internet Phone
Some packets may get lost, or some may arrive in a different order. At the receiver's end the packets
must be reassembled in a sensible and coherent way before data is played to the user. A buffer is used
to facilitate this. The buffer will allow for jitter (the variance in the delay) and make the delay be a set
amount as opposed to a varying amount which could cause problems with the smoothness and quality
of service of transmission.
Packets are sent every 20 milliseconds. Ideally at the client every 20 milliseconds a packet will arrive
and be played back. Again, a buffer and fixed delay time will have to be incorporated to make sure that
this happens over time, and to compensate for jitter.
Fixed playout delay - delay the playing of the data by a value q, so that if data arrives late it
won't be disregarded because it has missed its time to be played. A value of q must be selected to
maximise the number of packets that don't arrive too late, but also minimise the delay so as to make
for a more interactive experience. An adaptive playout delay could also be used.
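A sketch of the fixed playout delay in Python, with invented network delays: each packet sent at time t is played at t + q, and a packet counts as lost if it arrives after its playout time.

```python
# Counting packets that miss a fixed playout deadline of q seconds.

def late_packets(network_delays, q, interval=0.020):
    late = 0
    for i, delay in enumerate(network_delays):   # packet i is sent at i*interval
        send_time = i * interval
        arrival = send_time + delay
        playout = send_time + q
        if arrival > playout:                    # missed its playout time
            late += 1
    return late

delays = [0.050, 0.060, 0.120, 0.055, 0.070]     # made-up, jittery delays
print(late_packets(delays, q=0.100))  # 1  (only the 120 ms packet is late)
print(late_packets(delays, q=0.150))  # 0  (larger q loses nothing, but adds delay)
```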
Most multimedia is transmitted via TCP or UDP. Some media transmissions will also use RTP and RTCP,
which run on top of UDP.
If a packet is missing but the next one is here, just play the next one.
For media you need quality of service, so we want the "best effort" to get this. Best effort is the idea of the
protocol doing the best it can to deliver the data, but, if things go wrong then it's not the end of the world.
The internet gives no promises.
Sometimes the messages don't get delivered properly.
Causes of packet delay:
Encoding, sampling, packetising.
Queues and scheduling at the router.
Decoding, de-packetising etc.
Multi-media is delay sensitive, so we care about the best effort:
Delay - the difference between the time sent and the time received.
Jitter - the difference between the delay for the current packet and the previous one.
The delay and the jitter can be taken as an average over a period of time, which can be used to find
resolutions to network problems (more on this later).
Loss tolerant - infrequent losses cause minor glitches.
Opposite of data transfer (files, web pages etc.) - data is loss intolerant but delay tolerant.
Time stamp (32 bits long) - the sampling instant of the first byte in this data packet.
SSRC (32 bits long) - identifies the source of the RTP stream. For example, a session using sound and
video would have two unique SSRC values.
The sender sends RTP and RTCP to the internet, which goes to a number of receivers. The receivers
then send control packets back to the sender.
Receiver report packets include the fraction of packets lost, the last sequence number, the average
inter-arrival jitter.
The sender report packets include the SSRC of the RTP stream (this is the ID it's using), the current
time, the number of packets sent and the number of bytes sent.
Source description packets include the email address of the sender, the sender's name, the SSRC of
the associated RTP stream - the aim of which is to provide a mapping between the SSRC and the user/
host name.
We can now synchronise streams.
RTCP attempts to limit its traffic to 5% of the session bandwidth. Of this, RTCP gives 25% to senders,
and 75% to the receivers.
Routers
Providing Quality of Service
Quality of service is about trying to find the best service over the resources you have. Packets are divided into
different classes and isolated. These classes are then allocated resources. Fixed, non-sharable bandwidth is
allocated to the classes. At a router, packets can arrive in any order - they go into a queue. If the queue
becomes full then packets have to be dropped. Particular classes of packet can be given a higher priority
for being dropped when this happens.
Scheduling Policies for Routers
Prioritising - assigning priorities to different classes of traffic. Classes with a higher priority will be
forwarded first. This may not be fair on some classes.
Round robin - going round all the classes and forwarding a packet from each. However, if there's congestion
then classes such as 'Voice Data' may arrive too late.
Weighted fair queue - different classes of data coming in are divided into a different set of queues, so they
each get a fixed proportion of the bandwidth.
Policing Traffic
Traffic arrives in bursts. The aim of a policing mechanism is to limit the traffic using three set parameters:
1. Long-term average rate - the number of packets that can be sent per unit of time.
2. Peak rate - the maximum number of packets that can be sent over a short interval (e.g. packets per
minute). This must be at least the long-term average rate above.
3. Maximum burst size - the maximum number of consecutively sent packets.
A token bucket is used to throttle and limit the burst size and average rate. Tokens are added to the bucket
periodically, at a fixed rate, up to the bucket's capacity. In order for a packet to pass through the router it
must take a token from the bucket. Consequently, if little data arrives then the bucket fills up with tokens,
and if too much data arrives then the bucket is emptied of tokens. If there are no tokens then the data has
to wait for tokens to become available before it can continue through the network. If there are plenty of
tokens then a burst of data that is received is just forwarded straight through.
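A minimal token bucket sketch in Python; the rate and depth values are arbitrary examples.

```python
# A token bucket: tokens arrive at a fixed rate per tick, the bucket holds
# at most `depth` tokens, and each packet must consume one token to pass.

class TokenBucket:
    def __init__(self, rate, depth):
        self.rate = rate              # tokens added per tick (average rate)
        self.depth = depth            # maximum tokens held (max burst size)
        self.tokens = depth           # start with a full bucket

    def tick(self):
        self.tokens = min(self.depth, self.tokens + self.rate)

    def try_send(self):
        if self.tokens >= 1:
            self.tokens -= 1          # packet takes a token and passes
            return True
        return False                  # no token: packet must wait

bucket = TokenBucket(rate=1, depth=3)
burst = [bucket.try_send() for _ in range(5)]   # a burst of 5 packets at once
print(burst)        # [True, True, True, False, False]: burst capped at depth 3
bucket.tick()
print(bucket.try_send())   # True: a new token has arrived
```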
Content Distribution Networks (CDNs)
Content Replication
Origin server with all the original data -> distributes the information to multiple systems spread around ->
accessed by the user.
DNS can be used to replace (redirect) a query for a document to a more local query based on your current
location. This system can also be used to determine the data to be sent based on your location (such as local
language).
The CDN creates a "map" indicating distances from leaf ISPs to CDN nodes. It picks the closest CDN node
and redirects a user's query accordingly. The traditional client-server model is inefficient for mass
downloads: the servers can't handle the vast quantity of users demanding the data.
Peer-2-Peer
The client sends out a query which searches for the file on other machines recursively until one system
replies that it has the information. The same document can then be sent to multiple systems so there are
multiple locations to get the data from.
The load is now distributed and much lower. There's no heavy load on one single server.
BitTorrent uses a swarm of machines (any machine currently connected to the BitTorrent network) and a
tracker. When you request a document, the tracker asks the swarm for the information and gets a section
of the data from each machine. The information now comes from different machines, so it's even more
distributed. However, the upload speed of a user's client is often slower than that of a dedicated server.
to that process. The application (running as a client) then sends a message to the server with information
about its port number. When the server sends back information, it supplies the port number it received, so
that when this information is received by the client it can be passed to the correct process.
There's a buffer for each of the applications. When the information arrives, the transport layer puts the
information into the correct buffer. The data will then wait there for the application to request it (when it's
ready). The buffer is of a finite size, so when the buffer becomes full then the information is thrown away (this
may waste the network connection because the sender will keep sending the information, as no
acknowledgment will be sent back).
send all the data in the buffer, and the receiver will get more than it was originally supposed to all at once. So
if the sequence number counts segments, then the numbers would be meaningless, as this segment of data
is now different to what it should have been (because it effectively is a combination of two segments). If each
byte is numbered, then the receiver can tell exactly what it's receiving, and duplicate information can be
identified.
A TCP acknowledgement acknowledges data by stating the byte sequence number that it next expects to
receive - that's just how it works!
Sequence numbers are 32 bits long. Only the sequence number of the first byte in the segment is sent; the
other sequence numbers in the segment are implicitly calculated from the first. Sequence numbers in each
direction of the transmission are independent of each other.
The value of the retransmission timeout can't be too small (retransmission will occur too often), nor too large
(excessive delays before a retransmission takes place). An appropriate value will be related to the round trip
time. Because the round trip time varies depending on the current level of traffic going through the network,
an adaptive algorithm is needed. This algorithm can determine the current round trip time and adjust the
timeout accordingly. There are three main algorithms for this:
1. Basic algorithm - set the timeout time to twice the roundtrip time (gives enough margin). An average round
trip time (and thus the timeout time) is taken, which is updated every time a packet is received (because the
round trip time can be calculated every send-receive-acknowledge cycle). However, the problem is that
duplicate packets being retransmitted can be received, which will make the average less representative.
2. Karn/Partridge algorithm - only measures the round trip time for non-retransmitted segments in order to
work around the issue outlined for the basic algorithm.
3. Jacobson/Karels algorithm - more suited to communications where the round trip time is more varied and
an average isn't appropriate. This algorithm takes into account the variation in the round-trip time (the jitter).
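A sketch of the Jacobson/Karels style of calculation in Python, using the commonly cited gains (alpha = 1/8 for the smoothed RTT, beta = 1/4 for the variation, timeout = smoothed RTT + 4 × variation); the RTT samples are invented.

```python
# An exponentially weighted RTT average plus a multiple of the observed
# variation (jitter), in the style of the Jacobson/Karels algorithm.

def make_estimator(alpha=0.125, beta=0.25):
    state = {"srtt": None, "rttvar": 0.0}
    def update(sample):
        if state["srtt"] is None:                    # first sample seeds state
            state["srtt"], state["rttvar"] = sample, sample / 2
        else:
            state["rttvar"] = ((1 - beta) * state["rttvar"]
                               + beta * abs(sample - state["srtt"]))
            state["srtt"] = (1 - alpha) * state["srtt"] + alpha * sample
        return state["srtt"] + 4 * state["rttvar"]   # retransmission timeout
    return update

update = make_estimator()
for sample in [0.100, 0.110, 0.300, 0.105]:          # made-up RTT samples
    rto = update(sample)
print(round(rto, 3))   # the 300 ms spike inflates the variation, so the
                       # timeout stays well above the typical 100 ms RTT
```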
TCP Data Flow Control
The receiving buffer has a finite size. If the data is arriving more quickly than the application reads from it, then
the buffer will fill up. When the buffer becomes full, the receiver can't acknowledge any more data that it
receives (because it can't store it anywhere, so it'll just lose it). The sender's timeout will expire and a
retransmission will take place. Because the receiver's buffer may still not be empty, this loop will repeat and
bandwidth will be wasted sending the same thing again and again until the buffer is emptied.
A mechanism is required so that the receiver can control how much information the sender is sending. This
mechanism is known as flow control.
The sliding window size is not fixed. In the acknowledgment, the receiver lets the sender know how much
data it's prepared to receive. The window size can be set to 0 (so no more data is transmitted until the buffer is
freed, at which point the receiver will re-acknowledge the last byte with a non-0 window size). The flaw is that
the re-acknowledgment packet could be lost, which would cause a deadlock. The solution is that once a
window size of 0 is set at the sender a much longer timeout is used (such as two minutes) before sending the
next segment as normal.
Because the window size is carried in a 16-bit field, the advertised window cannot exceed 65,535 bytes (64 KB). This
means that for very fast connections the network cannot be fully utilised, as the sender will be left waiting for the
acknowledgment after sending a full window. One solution is to use multiple TCP connections. Another is to apply
a multiplier to the advertised window (this is what the TCP window scale option does), so that the effective window
can grow to, say, 640 KB or more.
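The reason a 64 KB window caps throughput is that the sender can have at most one window of unacknowledged data in flight per round trip. A quick calculation makes the limit concrete (the numbers below are just an example scenario):

```python
# Maximum throughput achievable with a fixed window: one window of data
# per round-trip time, regardless of how fast the link itself is.

def max_throughput_bps(window_bytes, rtt_seconds):
    return window_bytes * 8 / rtt_seconds

# A 65,535-byte maximum window on a 100 ms round trip caps the sender at
# roughly 5.2 Mbit/s, even over a gigabit link.
limit = max_throughput_bps(65535, 0.100)
```

This is why either multiple connections or a window multiplier is needed on fast, long-delay paths.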
TCP Connection Control
Setup is asymmetric, as one side is active and the other side is passive. The teardown is symmetric, as each
side must perform its own close.
A three-way handshake is used to establish a connection:
A - sends a packet across to B, setting the SYN control flag and an initial sequence number.
B - acknowledges this packet and sends its own initial sequence number, also setting the SYN control flag.
A - acknowledges B's sequence number, completing the handshake.
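The handshake's message sequence can be sketched as a toy exchange. This is not a real socket API; the function and the tuple format are invented here just to show how each side's initial sequence number gets acknowledged (as "sequence number + 1").

```python
# Toy sketch of the three-way handshake message sequence.
# Each entry is (flags, sequence number sent, sequence number acknowledged).

def three_way_handshake(client_isn, server_isn):
    log = []
    # 1. The active side sends SYN with its initial sequence number.
    log.append(("SYN", client_isn, None))
    # 2. The passive side acknowledges it and sends its own SYN.
    log.append(("SYN+ACK", server_isn, client_isn + 1))
    # 3. The active side acknowledges the server's sequence number.
    log.append(("ACK", client_isn + 1, server_isn + 1))
    return log
```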
In the reality of networks, data can become lost and re-emerge a lot later. If a first TCP connection is set up
and torn down whilst information from this connection is still travelling through the network and a new
connection is set up using the same addresses and port numbers, then when the data is finally received it's
treated as data for the new connection (if it's within the window size expected). If the initial sequence number is
negotiated afresh rather than set to zero every time, the window of sequence numbers the receiver is expecting will
be different, so the old data will be thrown away when it's received because its sequence number falls outside that
window. This is why TCP negotiates an initial sequence number.
TCP control is defined in a state transition diagram, which gives a structured method of laying out the
protocol.
TCP Congestion Control
TCP implements an algorithm that attempts to detect if congestion is occurring within the network. If an
acknowledgment is received, it determines that there was enough space for that segment to go through - so it
increases the amount of data it sends. When a segment gets lost, it determines that there must be congestion,
so it decreases the rate at which it sends data. The increase occurs slowly, whereas the decrease in
the data rate occurs very sharply, so a graph showing the amount of data TCP is transmitting would look like a
'sawtooth'. When TCP first connects, the transfer would be very slow by the rules of congestion control just
outlined. So, a different approach is used when the connection is first established. As soon as the sender
receives an acknowledgement it doubles the rate it sends data (instead of the normal, slow increase) until a
particular threshold is met, at which point the sending rate increases using the standard slow increase.
TCP Fairness
TCP tries to divide the connection bandwidth between the number of TCP connections currently active. But, if
an application opens up multiple TCP connections, then that application will get an unfair share of the
connection (TCP will have no knowledge of this).
Inter-Networking
A collection of networks (each with their own address scheme, service model etc.) can be made to look like
one huge, single network. The Internet Protocol (IP) manages to achieve this.
Physical networks differ in many ways, such as packet sizes, packet formats and support for broadcast (each
discussed below).
IP has to work around all these differences and create a uniform network where it doesn't matter what the
individual attributes of linked networks are. IP introduces a secondary, universal, logical address space
and maps physical addresses to logical addresses. This is to give every location a unique universal
identifier across the planet.
Different physical packet sizes can be made uniform by agreeing a packet size. The largest packet that can
cross a path is limited by the smallest maximum packet size of the networks along that path, which can vary
from route to route.
The packet formats can be different. The packets could be translated at each piece of technology, but this is
not always possible, so a new universal packet format may need to be introduced in order to encapsulate
packets.
Broadcasting can be implemented by sending messages to every single computer on the network (multiple
unicasts).
Service Model
This service model is provided to the transport layer and the other layers that exist above.
The first few bits contain the IP protocol version (typically v4 and v6 are in use at the moment).
There's a 32 bit source IP address.
There's a 32 bit destination IP address.
Options can hold a timestamp, record of the route taken, or a specified list of routers to take.
In order to stop an IP packet going round and round the network, the packet header also has a time-to-live
field which is decremented each time it passes through a router. Once it gets down to 0, the packet is
deleted and the source IP address is notified.
There's information about the upper layer protocol to use.
Datagrams must be encapsulated into a physical frame to be transmitted across the physical link layer. The
physical address must be mapped to the logical IP address. This mapping isn't known in advance; instead, the
Address Resolution Protocol (ARP) is used, which is associated with Ethernet/IEEE 802.3. A
broadcast message asks for the physical address corresponding to an IP address. The system with that
IP address responds with its physical address. In order to reduce the amount of traffic caused by this
protocol, the responses are cached and nodes along the network will also cache queries they see.
If a packet is too large for a network it's about to enter, it is split up. There are fields in the header of the
datagram that allow the packet to be reassembled, which occurs at the final destination. The more a packet is
split up, the less reliable the transfer becomes, since losing any one fragment means losing the whole datagram.
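Fragmentation and reassembly can be sketched as below. Real IPv4 measures fragment offsets in 8-byte units and carries them in header fields; this toy version keeps plain byte offsets for clarity.

```python
# Sketch of splitting a datagram to fit a smaller maximum packet size,
# and reassembling it at the final destination.

def fragment(payload, max_size):
    # Each fragment carries its offset so the receiver can reorder them.
    return [(offset, payload[offset:offset + max_size])
            for offset in range(0, len(payload), max_size)]

def reassemble(fragments):
    # Sort by offset and join. If any one fragment is missing, the whole
    # datagram is lost - which is why heavy fragmentation hurts reliability.
    return b"".join(chunk for _, chunk in sorted(fragments))
```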
Dynamic Host Configuration Protocol (DHCP) is used to configure a host with an IP address, so that
information such as the default router, netmask and DNS servers are known. It's a client-server protocol.
When a client starts up, it broadcasts a request for the DHCP configuration information. The server then sends
back the details of the configuration. The DHCP server is often part of the router. Clients can use static or
dynamic address assignment.
An organisation may want to restrict access to its network. This can be done using your own cable, although
this can be very expensive. Instead, a virtual private network can be implemented which creates a secure
tunnel between two ends. All the data has to pass through this tunnel.
Network Address Translation (NAT) allows multiple computers to share a unique worldwide address. Local
addresses, such as those beginning 192.168, are known as private addresses. Packets from these addresses that want to go to
another network have to go to a NAT box, which maps the address and port number to a worldwide address
and vice versa. To a private network, the NAT box works like a router.
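The NAT box's job boils down to a two-way mapping table. The sketch below is illustrative only: the class name, the port-allocation scheme, and the addresses in the usage are all made up for the example.

```python
# Sketch of a NAT mapping table: (private IP, private port) <-> public port.

class NatBox:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.next_port = 5000         # arbitrary starting public port
        self.out, self.back = {}, {}

    def outbound(self, private_ip, private_port):
        # Rewrite an outgoing packet's source: allocate a public port once
        # per (private IP, private port) pair and reuse it afterwards.
        key = (private_ip, private_port)
        if key not in self.out:
            self.out[key] = self.next_port
            self.back[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.out[key])

    def inbound(self, public_port):
        # Rewrite an incoming packet's destination back to the private
        # address; None means no mapping exists, so the packet is dropped.
        return self.back.get(public_port)
```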
Internet Protocol Version 6 (IPv6)
The motivation to develop a new version of the internet protocol was to deal with the growth of the internet.
The datagram format had to be changed for the new version, for example 32 bits for an address was
determined to be too small to hold all the network addresses one would want to use, so it was changed to 128
bits - which gives an almost unimaginably large number of unique IP addresses - but who knows, maybe
things in space may need IP addresses - planning for the future!
Multicast is to be implemented, which means that computers will be able to broadcast beyond the local
network. There will be a method of specifying how far you want to broadcast.
The main aim of the link layer is to provide node-to-adjacent-node transfer of a datagram over a link.
Services required to achieve this main aim
Framing - a packet needs to be encapsulated in a frame before it can be sent. The link layer doesn't look
inside the packet it carries.
Sharing - if a wire is to be shared this needs to be supported by the link layer.
Addressing - one machine needs to know which other machine it's trying to communicate with.
Flow control - the amount of data that's being sent needs to be controllable. (This is despite higher-level
protocols such as UDP having no flow control).
Error correction - forward error correction and other techniques are required at the receiving end in order to get
around the realistic problems of the network. This is to give the layers above the illusion of a reliable network.
1 bit in a million is the typical error rate for an Ethernet cable.
1 bit in a billion billion is the typical error rate in fibre optic cables.
Much more than 1 bit in a million is the typical error for wireless communications.
Full or half-duplex - determining the direction that information can be sent at one time, or whether information
can be sent in both directions.
Where is it implemented?
In every node and in every adaptor such as a network interface card (NIC) - such as an Ethernet card.
The link layer is not handled by the CPU (which handles all of the higher network layers). Instead, the link
layer is handled by a controller in the network card, which passes information over the host bus, and then
via interrupts and registers to the CPU.
Network adaptor sending:
Encapsulates the datagram in a frame, adding error checking bits, flow control and more.
Network adaptor receiving:
Checks for errors, flow control and other information in the frame. It extracts the information from the datagram
and passes it to the upper layer at the receiving side.
Packet Encapsulation
A header and a tail is added to the message, which is taken off when it's received.
Ethernet frame structure:
Header:
Preamble - allows the receiver to synchronise with the incoming signal. The preamble is set to 1,0,1,0, ..
etc. a number of times. Because the preamble is known, errors in it can be detected and recognised as possible
flaws in the network.
Start of frame delimiter - lets the receiver know that the frame itself is about to start; again a preset pattern.
MAC address of the destination - 48 bits in size.
MAC address of the source - again 48 bits in size.
Ether-type or packet length - so you know how much encapsulated data you have.
Tail:
CRC32 - A cyclic redundancy check - calculated by the sender and checked by the receiver to see if the
data has been received correctly. Not perfect. It's possible to pass the check and be passed up to the other
layers despite there being an error.
Inter-frame gap - a bit of space before the next frame comes along.
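The CRC32 check described above can be sketched with Python's `zlib.crc32`, which computes the same CRC-32 polynomial Ethernet uses (a real frame has extra details such as bit ordering that this toy glosses over; the function names are invented here).

```python
import zlib

# Sender appends a CRC over the payload; receiver recomputes and compares.

def make_frame(payload: bytes) -> bytes:
    crc = zlib.crc32(payload)
    return payload + crc.to_bytes(4, "little")

def check_frame(frame: bytes) -> bool:
    payload, crc = frame[:-4], int.from_bytes(frame[-4:], "little")
    return zlib.crc32(payload) == crc
```

Flipping a single bit in a frame makes the check fail, though as the notes say, the check isn't perfect: some multi-bit corruptions can still slip through.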
Flow Control
Optional.
Most network cards do have flow control. The aim is to ensure that the receiver's buffer doesn't overflow.
Implemented via either:
Handshake - a wire can be set to high or low to indicate whether the receiver is ready or not. This can
also be done using a software interpreted X-ON/X-OFF.
or...
Open-loop - pre-reserve and negotiate some of the resources of the receiver in order to deal with the
sender's data. A protocol that handles this idea is Connection Admission Control (CAC).
or...
Closed-loop - a method of reporting resource availability and resource needs and sending data
according to these reports. A message is broadcast to a special multicast address with a 16 bit time
request for a pause in the data transmission. The Asynchronous Transfer Mode (ATM) has an
Available Bit Rate (ABR) which guarantees a minimum bit rate to a sender and allows the receiver to
report back congestion - at which point the minimum bit rate is used.
IP Address:
32-bit.
Network-layer.
Partially geographical.
MAC Address:
MAC address is the link-layer address of your machine which is essential for communicating with
anything.
48 bits (6 bytes) - for most LANS: 3 bytes for the organisation identifier and three bytes for the NIC
identifier.
Set into the Network Interface Card (NIC), but can be software settable.
Expected to last until the year 2100 before the same addresses will be used again...
The aim of the MAC address is to assist the link layer in transmitting the framed packet from one
interface to another on a physically-connected interface.
Administered by IEEE - the manufacturer buys a portion of MAC address space in order to assure
uniqueness.
MAC addresses are flat, which means that you can't tell anything about the sender or receiver (other
than the manufacturer) from the numbers. This is opposed to IP addresses, which are geographically
allocated (there's no need for geographic information with MAC addresses).
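The 3-byte/3-byte split described above is easy to show in code. The function name is invented for this example, and the address in the usage is arbitrary.

```python
# Split a 48-bit MAC address (written as six colon-separated bytes) into
# its IEEE organisation identifier (first 3 bytes) and NIC identifier
# (last 3 bytes).

def split_mac(mac: str):
    parts = mac.lower().split(":")
    assert len(parts) == 6, "expected six colon-separated bytes"
    return ":".join(parts[:3]), ":".join(parts[3:])
```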
Mapping IP to/from MAC Addresses
Address Resolution Protocol (ARP) - an address resolution protocol table is used. When an IP address is
known, but the MAC address isn't (i.e. it's not in the lookup/mapping table) of a destination machine, then a
new frame is broadcast detailing the sender's IP and MAC address, and requesting the MAC address of the
system with the IP address we want as the destination. The destination machine responds with its MAC
address, and nodes along the path and the original sender make a note of this mapping within the ARP table
as a cache for later. Entries stay in the table for typically 20 minutes. This cache is to ensure the network isn't
constantly filled with traffic requesting the MAC address.
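The ARP cache with its ~20-minute expiry can be sketched as a dictionary with timestamps. The class name is illustrative, and the clock is injectable only to make the sketch testable.

```python
import time

# Sketch of an ARP cache: entries expire after roughly 20 minutes, after
# which a fresh broadcast query would be needed.

class ArpCache:
    TTL = 20 * 60                       # seconds an entry stays valid

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.table = {}                 # ip -> (mac, time learned)

    def learn(self, ip, mac):
        self.table[ip] = (mac, self.clock())

    def lookup(self, ip):
        entry = self.table.get(ip)
        if entry is None:
            return None                 # miss: would trigger a broadcast
        mac, learned = entry
        if self.clock() - learned > self.TTL:
            del self.table[ip]          # stale: ask the network again
            return None
        return mac
```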
Hubs and Switches
Hubs - allow the wires from a number of different machines to join together into an effectively
analogue shared medium. The aim is to allow multiple machines to share a connection. A frame is sent out to
all systems on the hub. Nodes connected to the hub can collide with each other and can listen to each other's
messages. When multiple messages get mixed up (and no longer make sense!) collision handling has
to take place, which may mean stopping transmission and backing off before retransmitting. Hubs are not very
common any more.
Switches - store and forward devices. Transparent - the hosts are unaware of switches. Plug-and-play and
self-learning. Each wire is a separate collision domain - they're separate so there's no collisions. They're also
full duplex wires. Switches can also buffer and queue the packets. Switches can do much more than hubs
can, which is why they've pretty much replaced hubs.
For each of the connections it provides it has a network interface card and a processor in order to look at the
data coming in, check CRCs, extract the datagrams and then send the data up the layers. If the data is to be
forwarded then it takes it up a layer, moves it to the correct network interface, encapsulates it, and sends it
down the wire.
A switch knows which destinations are reachable using a switch table. It fills the table with MAC addresses by
noting the source address and incoming port of each frame it sees, and floods frames whose destination isn't
yet in the table. This caching is very similar in spirit to how ARP works.
Switches can be connected together - the same principles apply, the switches will talk to each other in exactly
the same way.
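The self-learning behaviour can be sketched in a few lines: learn the source's port from every frame, forward out a single port when the destination is known, and flood everywhere else when it isn't. The class name and port numbers are made up for the example.

```python
# Sketch of a self-learning switch table.

class LearningSwitch:
    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}                          # MAC address -> port

    def handle(self, src_mac, dst_mac, in_port):
        self.table[src_mac] = in_port            # learn where src lives
        if dst_mac in self.table:
            return {self.table[dst_mac]}         # forward out one port
        return self.ports - {in_port}            # unknown: flood the rest
```

After each host has sent at least one frame, every frame goes out of exactly one port and the flooding stops.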
Gigabit Ethernet is used for faster transmissions, and is often used for backbones linking multiple standard
Ethernet switches as it has a higher bandwidth, but it is becoming more popular and mainstream for ordinary
Ethernet installations too.
Switches vs. Routers
They both act in the same store-and-forward way, but routers maintain routing tables and implement routing
algorithms, whilst switches maintain switch tables, implement learning algorithms, and implement filtering.
Routers can do more complicated forwarding because they can access the higher network layers and look at
IP addresses in the datagram etc.
One node with data to send can use the whole bandwidth of the channel, so that there's no competition for
resources. In practice, however, nodes are not capable of using the full channel.
The connection should be shared equally between all the nodes currently using it.
Decentralisation - so that there's no central point of failure.
Simple and inexpensive.
which don't affect other coded communications. The phones listen out for communications that are coded in
the format that they understand.
Clock Recovery
The clock is not sent alongside the data, as this would be a waste of the network's resources; instead it must
be recovered from the signal itself.
Non-Return to Zero Inverted
A transition in the voltage level always corresponds to a binary 1, whereas a constant voltage across
multiple clock cycles indicates a 0 bit in each clock cycle. Multiple 1s then make for good clock recovery, as
there'll be multiple transitions on the clock edge. This is why an ideal synchronisation pattern is 111111. On
the other hand, a whole sequence of 0s can mean the clock gets lost.
Manchester Encoding
Used in IEEE 802.3 Ethernet.
The clock runs at twice the data rate. The sender sends the XOR of the clock (which is twice as fast as the
data) and the data.
However, using this encoding, only 50% of the speed of the network can be used, as the clock has to be twice
the speed of the data. But there is very easy clock recovery. Strings of 0s and 1s can be dealt with quite
easily.
4B/5B
The bits are broken down into a series of 4-bit groups. Each group of 4 bits is encoded as a 5-bit code. The 5-bit
codes are chosen so that each has at most one leading 0 and at most two trailing 0s.
This means that there's only ever 3 zeroes in a row (even across code boundaries) before there's a transition to a 1. This encoding can
then be sent over Non-Return-to-Zero, so every 1 transmitted has a voltage transition (the more
transitions the better!), and has an efficiency of 80%.
There are spare codes (as you can use 32 codes with 5 bits and encoding 4 bits only takes up 16
codes). Some of these spare codes are used for transmitting control. 11111 == Idle (1s are used to
maintain the transitions and clock synchronisation so that when data does come in there's no transition
problems). 00000 == dead (no transitions, no transmission - it's dead because it's not moving!). 00100
== halt (a control to stop the transmission). There are 6 more control signals, and 7 unused codes
because they break the zeros rule (described above).
Signals and Modulation
Whatever medium is being used, the signals are electromagnetic waves, which travel at up to the speed of light.
Different materials have different refractive indices, which varies the propagation speed. A signal travelling
down a copper wire moves at roughly 2/3 of the speed of light. The velocity factor for a medium determines how
much the signal slows down when it passes through that medium.
Different methods of transmission:
Frequency (FM) - vary the frequency of the sine wave. Multiple frequencies can represent
more than 1 bit; for example, four different frequencies could represent 2 bits.
Phase (PM) - vary the angle of the sine wave. The angle determines where the changes occur.
Phase Shift Keying - multiple angles are used to transmit multiple bits per signal. The more bits per
signal, the faster the data rate.
Quadrature Amplitude Modulation (QAM):