
Prof NB Venkateswarlu

B.Tech(SVU), M.Tech(IIT-K), Ph.D(BITS, Pilani), PDF(U of Leeds,UK)

ISTE Visiting Fellow 2010-11


AITAM, Tekkali

- A Small Dose of Questions To Know You
- A Little Briefing about Unix Internals
- Recapitulation: What is the Internet
- Variety of Addresses Involved
- Socket Concepts
- Related System Calls
- Simple TCP Client and Server in Action
- Simple UDP Client and Server in Action
- What is DNS?

- What is the difference between data communications and computer networks?
- What is firmware?
- Why do we need to split a message?
- Why do we require so many levels of control? (Is the network reliable?)
- What are physical and logical addresses?
- What is the conceptual difference between DLL and NLL?

- What is fork()?
- What is a signal?
- What is a process, and what is a thread?
- What is a device driver?
- What is a daemon?
- What is exec()?
- What are locks?

Internet[work]
A collection of interconnected networks. Networks: different departments, labs, etc. (e.g., CS, EE, ME).
- Router: node that connects distinct networks
- Host: network endpoint (computer, PDA, light switch, ...)
- Together, an independently administered entity (enterprise, ISP, etc.)

Many differences between networks:
- Address formats
- Performance: bandwidth/latency
- Packet size
- Loss rate/pattern/handling
- Routing
Figure: an internetwork joining ATM, 802.3 and Frame relay networks.
How to translate and inter-operate?

Internet vs. internet
- The Internet: the interconnected set of networks of the Internet Service Providers (ISPs) and end networks, providing data communications services.
- A network of internetworks, and more: about 17,000 different ISP networks make up the Internet, plus many other end networks and hundreds of millions of hosts.

Figure: Node - Link - Node
Links can be wired or wireless.

Routers send packets towards the destination.
Figure: hosts (H) attached to a mesh of routers (R).

Packets
Why split a message into packets?
- Because of the noise conditions of the channels: noise (the bit error rate) is rated at about 1 in 10^5.
- Better link utilization.

Problem: Network Overload
Solution: Buffering and Congestion Control
- Short bursts: buffer them. Buffer sizes vary from network to network, so fragmentation takes place.
- What if the buffer overflows? Packets are dropped, and the sender adjusts its rate until load = resources: this is congestion control.

Problem: Packet size. On Ethernet, the maximum packet is about 1.5 KB; a typical web page is 10 KB.
Solution: Fragment data across packets.
Figure: the request "GET index.html" split into the pieces "GET ", "inde", "x.ht", "ml".

A protocol implements an agreement between parties on how communication should take place.
Example (a human protocol): friendly greeting; muttered reply; "Destination?"; "Madison"; "Thank you".

Each protocol offers interfaces:
- One to higher-level protocols on the same end host
  - Expects one from the layers on which it builds
  - Interface characteristics, e.g., the IP service model
- A peer interface to a counterpart on the destination
  - Syntax and semantics of communications
  - (Assumptions about) data formats
Protocols build upon each other:
- Adds value, improves functionality overall
  - E.g., a reliable protocol running on top of IP
- Reuse, avoid rewriting
  - E.g., the OS provides TCP, so apps don't have to rewrite it

Protocols are the key to interoperability.
- Networks are very heterogeneous:
  - Hardware/link: Ethernet (3Com, etc.)
  - Network: routers (Cisco, Juniper, etc.)
  - Application: email, AIM, IE, etc.
- The hardware/software of communicating parties is often not built by the same vendor, yet they can communicate because they use the same protocol.
  - Actual implementations could differ, but they must adhere to the same specification.
Protocols exist at many levels:
- Application-level protocols
- Protocols at the hardware level

- One or more protocols implement the functionality in a layer.
  - Only horizontal (among peers) and vertical (within a host) communication.
- Protocols/layers can be implemented and modified in isolation. Each layer offers a service to the higher layer, using the services of the lower layer. Peer layers on different systems communicate via a protocol.
- Higher-level protocols (e.g., TCP/IP, AppleTalk) can run on multiple lower layers; multiple higher-level protocols can share a single physical network.

OSI model        Internet (TCP/IP) model
Application      Application (plus libraries)
Presentation
Session
Transport        TCP/UDP
Network          IP
Data link        Data link
Physical         Physical

- Application protocols: FTP, HTTP, NV, TFTP
- Two transport protocols, TCP and UDP, provide logical channels to applications.
- IP interconnects network technologies (NET1, NET2, ..., NETn) into a single logical network.
- Network protocols are implemented by a combination of hardware and software.
Note: there is no strict layering. Application writers can define applications that run on any lower-level protocol.

The Hourglass Model
Figure: applications (FTP, HTTP, NV, TFTP) on top; TCP and UDP below them; IP at the waist; many data link and physical technologies (NET1, NET2, ..., NETn) at the bottom.
The waist: minimal, carefully chosen functions. Facilitates interoperability and rapid evolution.

Figure: a path Host, Bridge/Switch, Router/Gateway, Host, showing which of the layers (Application, Transport, Network, Link, Physical) each device handles.

Figure: encapsulation. User A sends "Get index.html" to User B; each layer adds its own header: a connection ID, source/destination addresses, and the link address.

Multiple choices at each layer: how to know which one to pick?
Figure: FTP, HTTP, NV, TFTP over TCP/UDP over IP over many networks (NET1, NET2, ..., NETn).

Multiple implementations of each layer: how does the receiver know which version/module of a layer to use?
Figure: IPv4 header fields: V/HL, TOS, Length, ID, Flags/Offset, TTL, Prot. (protocol), H. Checksum, Source IP address, Destination IP address, Options.
The packet header includes a demultiplexing field (e.g., Prot. in the IP header):
- Used to identify the right module for the next layer
- Filled in by the sender
- Used by the receiver
Multiplexing occurs at multiple layers, e.g., IP, TCP, ...

TCP
Telephone call:
- Guaranteed delivery
- In-order delivery
- Setup connection followed by conversation
TCP:
- Reliable, guaranteed delivery
- Byte stream, in-order delivery
- Checksum for validity
- Setup connection followed by data transfer
Example TCP applications: Web, Email, Telnet

UDP
- No guarantee of delivery
- Not necessarily in-order delivery
- No validity guarantee (the UDP checksum is optional over IPv4)
- Must address each independent packet
Postal mail:
- Unreliable
- Not necessarily in-order delivery
- Must address each reply
Example UDP applications: multimedia, voice over IP

Application              Data loss       Bandwidth                                      Time sensitive
file transfer            no loss         elastic                                        no
e-mail                   no loss         elastic                                        no
web documents            no loss         elastic                                        no
real-time audio/video    loss-tolerant   audio: 5 kbps-1 Mbps; video: 10 kbps-5 Mbps    yes, 100s of msec
stored audio/video       loss-tolerant   same as above                                  yes, few secs
interactive games        loss-tolerant   few kbps                                       yes, 100s of msec
financial apps           no loss         elastic                                        yes and no

Byte Order
- Different computers may have different internal representations of 16/32-bit integers (called host byte order). Examples:
  - Big-endian byte order (e.g., used by Motorola 68000): the most significant byte is stored at the lowest address.
  - Little-endian byte order (e.g., used by Intel 80x86): the least significant byte is stored at the lowest address.
- TCP/IP specifies a network byte order, which is big-endian byte order.
- For some WinSock functions, their arguments (i.e., the parameters to be passed to these functions) must be stored in network byte order. WinSock provides functions to convert between host byte order and network byte order: htons() and htonl() (host to network, short/long) and ntohs() and ntohl() (network to host).

Processes
- A process has text (machine instructions, which may be shared by other processes), data, and a stack.
- A process may execute either in user mode or in kernel mode.
- Process information is stored in two places: the process table and the user table.

User mode and Kernel mode
- At any given instant, a computer running the Unix system is either executing a process or running the kernel itself.
- The computer is in user mode when it is executing instructions in a user process, and in kernel mode when it is executing instructions in the kernel.
- The switch from user mode to kernel mode happens when a process executes a system call (e.g., to perform I/O operations) and when an interrupt such as the system clock interrupt occurs.

Process Table
There is a single process table for the entire system. An entry in the process table has the following information:
- process state:
  A. running in user mode or kernel mode
  B. ready in memory, or ready but swapped
  C. sleeping in memory, or sleeping and swapped
- PID: process ID
- UID: user ID
- scheduling information
- signals that have been sent to the process but not yet handled
- a pointer to the per-process region table

User Table (u area)
- Each process has only one private user table.
- The user table contains information that must be accessible while the process is executing:
  - a pointer to the process table slot
  - parameters of the current system call, return values, and error codes
  - file descriptors for all open files
  - current directory and current root
  - process and file size limits
- The user table is an extension of the process table.

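Figure: data structures for processes: the process table entry of the active process, its u area and per-process region table, and the region table entries for its text, data and stack, spanning the kernel address space and the user address space (resident and swappable parts).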

Shared Program Text and Software Libraries
- Many programs, such as the shell, are often being executed by several users simultaneously. The text (program) part can be shared.
- In order to be shared, a program must be compiled using a special option that arranges the process image so that the variable part (data and stack) and the fixed part (text) are cleanly separated.
- An extension of the idea of sharing text is sharing libraries. Without shared libraries, every executing program contains its own copy of the library routines.

Figure: two processes whose per-process region tables point to the same text region (reference count = 2), while each has its own data and stack regions.

System Call
A process accesses system resources through system calls.
- System calls for process control:
  - fork: create a new process
  - wait: allow a parent process to synchronize its execution with the exit of a child process
  - exec: invoke a new program
  - exit: terminate process execution
- File system:
  - File: open, read, write, lseek, close
  - inode: chdir, chown, chmod, stat, fstat
- Others: pipe, dup, mount, umount, link, unlink

System call: fork()
- fork is the only way for a user to create a process in the Unix operating system.
- The process that invokes fork is called the parent process, and the newly created process is called the child process.
- The syntax of the fork system call: newpid = fork();
- On return from fork, the two processes have identical copies of their user-level context except for the return value:
  - in the parent process, newpid = the child's process ID
  - in the child process, newpid = 0

/* forkEx1.c */
#include <stdio.h>
#include <unistd.h>   /* fork() */

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    fpid = fork();    /* from here on, two processes run this code */
    if (fpid == 0) {
        printf("Child Process fpid=%d\n", fpid);
    } else {
        printf("Parent Process fpid=%d\n", fpid);
    }
    printf("After forking fpid=%d\n", fpid);
    return 0;
}

$ cc forkEx1.c -o forkEx1
$ forkEx1
Before forking ...
Child Process fpid=0
After forking fpid=0
Parent Process fpid=14707
After forking fpid=14707
$

/* forkEx2.c */
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <unistd.h>   /* fork() */

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    system("ps");
    fpid = fork();
    system("ps");     /* run by both the parent and the child */
    printf("After forking fpid=%d\n", fpid);
    return 0;
}

$ forkEx2
Before forking ...
  PID TTY      TIME CMD
14759 pts/9    0:00 tcsh
14778 pts/9    0:00 sh
14777 pts/9    0:00 forkEx2
  PID TTY      TIME CMD
14781 pts/9    0:00 sh
14759 pts/9    0:00 tcsh
14782 pts/9    0:00 sh
14780 pts/9    0:00 forkEx2
14777 pts/9    0:00 forkEx2
After forking fpid=14780
$   PID TTY      TIME CMD
14781 pts/9    0:00 sh
14759 pts/9    0:00 tcsh
14780 pts/9    0:00 forkEx2
After forking fpid=0

$ ps
  PID TTY      TIME CMD
14759 pts/9    0:00 tcsh
$

System Call: getpid() getppid()
- Each process has a unique process ID (PID). A PID is an integer, typically in the range 0 through 65535.
- The kernel assigns the PID when a new process is created. Processes can obtain their PID by calling getpid().
- Each process has a parent process and a corresponding parent process ID. Processes can obtain their parent's PID by calling getppid().

/* pid.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    printf("pid=%d ppid=%d\n", getpid(), getppid());
    return 0;
}

$ cc pid.c -o pid
$ pid
pid=14935 ppid=14759
$

/* forkEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    int fpid;

    printf("Before forking ...\n");
    if ((fpid = fork()) == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    }
    printf("After forking fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    return 0;
}

$ cc forkEx3.c -o forkEx3
$ forkEx3
Before forking ...
Parent Process fpid=14942 pid=14941 ppid=14759
After forking fpid=14942 pid=14941 ppid=14759
$ Child Process fpid=0 pid=14942 ppid=1
After forking fpid=0 pid=14942 ppid=1
$ ps
  PID TTY      TIME CMD
14759 pts/9    0:00 tcsh

Here the parent exited before the child printed, so the orphaned child was adopted by init (PID 1) and getppid() returned 1.

System Call: wait()
The wait system call allows a parent process to wait for the termination of a child process. See forkEx4.c.

/* forkEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <unistd.h>

int main(void)
{
    int fpid, status;

    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    }
    /* the parent blocks here until the child exits;
       in the child, wait() returns -1 at once because it has no children */
    wait(&status);
    printf("After forking fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    return 0;
}

$ cc forkEx4.c -o forkEx4
$ forkEx4
Before forking ...
Parent Process fpid=14980 pid=14979 ppid=14759
Child Process fpid=0 pid=14980 ppid=14979
After forking fpid=0 pid=14980 ppid=14979
After forking fpid=14980 pid=14979 ppid=14759
$

System Call: exec()
- The exec() system call invokes another program by replacing the current process image.
- No new process table entry is created for the exec'd program, so the total number of processes in the system isn't changed.
- There are six different exec functions: execlp, execvp, execl, execv, execle, execve (see the man page for more detail).
- The exec system call allows a process to choose its successor.

#include <unistd.h>
int execl(const char *path, const char *arg0, ... /*, (char *)NULL */);
int execv(const char *path, char *const argv[]);
int execle(const char *path, const char *arg0, ... /*, (char *)NULL, char *const envp[] */);
int execve(const char *path, char *const argv[], char *const envp[]);
int execlp(const char *file, const char *arg0, ... /*, (char *)NULL */);
int execvp(const char *file, char *const argv[]);
The "l" variants take the arguments as a list ending in a null pointer, the "v" variants take an argv array, the "e" variants also pass an environment, and the "p" variants search PATH for file.

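/* execEx1.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("Before execing ...\n");
    execl("/bin/date", "date", (char *)NULL);   /* the argument list ends with a null pointer */
    printf("After exec\n");                     /* reached only if execl fails */
    return 0;
}

$ execEx1
Before execing ...
Sun May 9 16:39:17 CST 1999
$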

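/* execEx2.c */
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    int fpid;

    printf("Before execing ...\n");
    fpid = fork();
    if (fpid == 0) {
        execl("/bin/date", "date", (char *)NULL);   /* the child becomes date */
        perror("execl");                            /* reached only if execl fails */
        _exit(1);
    }
    /* only the parent gets here; it does not wait, so date may print later */
    printf("After exec and fpid=%d\n", fpid);
    return 0;
}

$ execEx2
Before execing ...
After exec and fpid=14903
$ Sun May 9 16:47:08 CST 1999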

Handling Signal
- A signal is a message from one process to another. Signals are sometimes called software interrupts.
- Signals usually occur asynchronously.
- Signals can be sent
  A. by one process to another (or to itself)
  B. by the kernel to a process.
- Unix signals are content-free: the only thing that can be said about a signal is whether it has arrived or not.

Handling Signal
Most signals have predefined meanings:
A. SIGHUP (hangup): when a terminal is closed, the hangup signal is sent to every process attached to that controlling terminal.
B. SIGINT (interrupt): politely asks a process to terminate.
C. SIGQUIT (quit): asks a process to terminate and produce a core dump.
D. SIGKILL (kill): forces a process to terminate.
See sigEx1.c.

/* sigEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait() */
#include <unistd.h>

int main(void)
{
    int fpid, status;   /* wait() stores the child's status in a real int */

    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
        for (;;);       /* loop forever */
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    }
    wait(&status);      /* wait for child process */
    printf("After forking fpid=%d pid=%d ppid=%d\n", fpid, getpid(), getppid());
    return 0;
}

$ cc sigEx1.c -o sigEx1
$ sigEx1 &
Before forking ...
Parent Process fpid=14989 pid=14988 ppid=14759
Child Process fpid=0 pid=14989 ppid=14988
$ ps
  PID TTY      TIME CMD
14988 pts/9    0:00 sigEx1
14759 pts/9    0:01 tcsh
14989 pts/9    0:09 sigEx1
$ kill -9 14989
$ ps
...

Scheduling Processes
- On a time-sharing system, the kernel allocates the CPU to a process for a period of time (a time slice or time quantum), preempts the process and schedules another one when the time slice expires, and reschedules the process to continue execution at a later time.
- The scheduler uses round robin with multilevel feedback to choose which process to execute:
  A. The kernel allocates the CPU to a process for a time slice.
  B. It preempts a process that exceeds its time slice.
  C. It feeds the process back into one of several priority queues.

Process Priority
Priority levels, from highest to lowest:
- Kernel mode: swapper, wait for disk I/O, wait for buffer, wait for inode, ..., wait for child exit
- User mode: user level 0, user level 1, ..., user level n

Process Scheduling (Unix System V)
Example: there are 3 processes A, B, C under the following assumptions:
A. They are created simultaneously with initial priority 60.
B. The clock interrupts the system 60 times per second.
C. These processes make no system calls.
D. No other processes are ready to run.
E. CPU usage calculation: CPU = decay(CPU) = CPU/2.
F. Process priority calculation: priority = CPU/2 + 60.
G. Rescheduling calculation is done once per second.
A lower priority value means a higher scheduling priority.

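Time (s)  Process A            Process B            Process C
          Priority  CPU count  Priority  CPU count  Priority  CPU count
0         60        0 ... 60   60        0          60        0
1         75        30         60        0 ... 60   60        0
2         67        15         75        30         60        0 ... 60
3         63        7 ... 67   67        15         75        30
4         76        33         63        7 ... 67   67        15
(x ... y: the process runs during that second and its CPU count rises from x to y, one per clock tick.)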

Booting
- When the computer is powered on or rebooted, a short built-in program (possibly stored in ROM) reads the first block or two of the disk into memory.
- These blocks contain a loader program, which was placed on the disk when the disk was formatted.
- The loader is started. It searches the root directory for /unix or /root/unix and loads that file into memory.
- The kernel starts to execute.

The first processes
- The kernel initializes its internal data structures: it constructs the linked lists of free inodes, regions, and page tables.
- The kernel creates the u area and initializes slot 0 of the process table: process 0 is created.
- Process 0 forks, invoking the fork algorithm directly from the kernel: process 1 is created.
- In kernel mode, process 1 creates its user-level context (regions) and copies code (/etc/init) into the new region.
- Process 1 calls exec (executes init).

init process
- The init process is a process dispatcher: it spawns processes and allows users to log in.
- init reads /etc/inittab and spawns getty; when a user logs in successfully, getty goes through a login procedure and execs a login shell.
- init executes the wait system call, monitoring the death of its child processes and of processes orphaned by the exit of their parents.

- init forks/execs a getty program to manage the line.
- getty prints the login: message and waits for someone to log in.
- The login process prints the password prompt, reads the password, then checks it.
- The shell runs programs for the user until the user logs off.
- When the shell dies, init wakes up and forks/execs a getty for the line.

File Subsystem
- A file system is a collection of files and directories on a disk or tape in standard UNIX file system format.
- Each UNIX file system contains four major parts:
  A. boot block
  B. superblock
  C. i-node table
  D. data blocks: file storage

File System Layout
Block 0: bootstrap
Block 1: superblock
Blocks 2 to n: i-nodes
Blocks n+1 to the last block: files (data blocks)

Boot Block
- A boot block may contain several physical blocks. Note that a physical block contains 512 bytes (or 1 KB or 2 KB).
- A boot block contains a short loader program for booting.
- It is blank on other file systems (those not used for booting).

Superblock
- The superblock contains key information about a file system:
  A. size and status of the file system:
     - label: name of this file system
     - size: the number of logical blocks
     - date: the last modification date of the superblock
  B. information about i-nodes:
     - the number of i-nodes
     - the number of free i-nodes
  C. information about data blocks: free data blocks
- The information in the superblock is loaded into memory.
I-nodes
- i-node: index node (information node)
- i-list: the list of i-nodes
- i-number: the index into the i-list
- The size of an i-node: 64 bytes.
- i-node 0 is reserved; i-node 1 is the root directory.
- i-node structure: next page

I-node structure
Figure: an i-node holds mode, owner, timestamp, size, reference count, block count, direct blocks 0-9, and single, double and triple indirect pointers; each indirect pointer leads through one, two or three levels of indirect blocks to data blocks.
- mode:
  A. type: file, directory, pipe, symbolic link
  B. access: read/write/execute (owner, group, others)
- owner: who owns this i-node (file, directory, ...)
- timestamps: modification, access, and i-node change time
- size: the number of bytes
- block count: the number of data blocks
- direct blocks: pointers to the data blocks
- single indirect: pointer to a block that points to data blocks (128 data blocks)
- double indirect: 128*128 = 16384 data blocks
- triple indirect: 128*128*128 = 2097152 data blocks

Data Block
- A data block has 512 bytes.
  A. Some file systems use 1 KB or 2 KB per block.
  B. See block size effect (next page)
- A data block may contain the data of a file or the data of a directory.
  - File: a stream of bytes.
  - Directory format: each entry holds i-#, Next, size, file name, pad.

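Figure: a directory tree with /home containing alex, jenny and john, and files such as Report.txt, notes, grep, bin and find. The data block of one directory holds a chain of variable-length entries (i-#, Next, size, name, pad), e.g. (i-#, Next, 10, Report.txt, pad), then entries for bin and notes, each Next field pointing to the following entry.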

Figure: the u area's current-directory inode points into the in-core inode table; in-core inodes mirror on-disk i-nodes (disk layout: boot block, superblock, i-nodes, data blocks), and directory data blocks (entries such as notes, source, Report.txt) are read through the device driver and hardware control.

In-core inode table
- A UNIX system keeps regular files and directories on block devices such as disk or tape. Such disk space is called the physical device address space.
- The kernel deals with the file system on a logical level (logical device address space) rather than with disks directly. The disk driver translates logical addresses into physical device addresses.
- The in-core (memory-resident) inode table stores inode information in kernel space.

In-core inode table
An in-core inode contains:
A. all the information of the inode on disk
B. the status of the in-core inode: inode is locked, inode data changed, file data changed
C. the logical device number of the file system
D. the inode number
E. a reference count

File table
The kernel has a global data structure, called the file table, to store information about file access. Each entry in the file table contains:
A. a pointer to an in-core inode table entry
B. the offset of the next read or write in the file
C. the access rights (r/w) allowed to the opening process
D. a reference count

User File Descriptor table
- Each process has a user file descriptor table to identify all its open files.
- An entry in the user file descriptor table points to an entry of the kernel's global file table.
- Entry 0: standard input; Entry 1: standard output; Entry 2: standard error.

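System Call: open
open: A process may open an existing file to read or write.
syntax: fd = open(pathname, flags);
A. pathname: the name of the file to be opened
B. flags: the access mode, O_RDONLY, O_WRONLY or O_RDWR
open returns the lowest-numbered unused file descriptor, or -1 on error.
Example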

/* openEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>

int main(void)
{
    int fd1, fd2, fd3;

    printf("Before open ...\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./openEx1.c", O_WRONLY);
    fd3 = open("/etc/passwd", O_RDONLY);
    /* descriptors 0, 1 and 2 are already in use, so new ones start at 3 */
    printf("fd1=%d fd2=%d fd3=%d \n", fd1, fd2, fd3);
    return 0;
}

$ cc openEx1.c -o openEx1
$ openEx1
Before open ...
fd1=3 fd2=4 fd3=5
$

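Figure: after openEx1, descriptors 3, 4 and 5 in the user file descriptor table (u area) are in use. Descriptors 3 and 5 point to two separate file-table entries (CNT=1, R), both referring to the in-core inode of /etc/passwd (CNT=2); descriptor 4 points to a file-table entry (CNT=1, W) for the opened source file, whose in-core inode has CNT=1.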

System Call: read
read: A process may read from an open file.
syntax: n = read(fd, buffer, count);
A. fd: file descriptor
B. buffer: where the data read is stored
C. count: the maximum number of bytes to read
read returns the number of bytes read, 0 at end of file, or -1 on error.
Example

/* openEx2.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* read() */

int main(void)
{
    int fd1;
    char buf1[20], buf2[20];

    buf1[19] = '\0';   /* read() does not terminate strings: do it ourselves */
    buf2[19] = '\0';
    printf("=======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    read(fd1, buf1, 19);   /* first 19 bytes */
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd1, buf2, 19);   /* next 19 bytes: the file offset has advanced */
    printf("fd1=%d buf2=%s \n", fd1, buf2);
    printf("=======\n");
    return 0;
}

$ cc openEx2.c -o openEx2
$ openEx2
=======
fd1=3 buf1=root:x:0:1:Super-Us
fd1=3 buf2=er:/:/sbin/sh
daemo
=======
$

/* openEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* read() */

int main(void)
{
    int fd1, fd2;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("/etc/passwd", O_RDONLY);   /* a second, independent open */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);                   /* has its own offset, so starts at 0 */
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx3.c -o openEx3
$ openEx3
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=root:x:0:1:Super-Us
======
$

Figure: after openEx3, descriptors 3 and 4 point to two separate file-table entries (CNT=1, R each, with independent offsets), both referring to the in-core inode of /etc/passwd (CNT=2).

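System Call: dup
dup: copies a file descriptor into the first free slot of the user file descriptor table and returns the new descriptor. Both descriptors then point to the same file-table entry, so they share one file offset.
syntax: newfd = dup(fd);
A. fd: file descriptor
Example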

/* openEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* read(), dup() */

int main(void)
{
    int fd1, fd2;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = dup(fd1);                        /* shares fd1's file-table entry */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);                   /* continues where fd1 stopped */
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx4.c -o openEx4
$ openEx4
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=er:/:/sbin/sh
daemo
======
$

Figure: after dup, descriptors 3 and 4 both point to the same file-table entry (CNT=2, R), which refers to the in-core inode of /etc/passwd (CNT=1).

System Call: creat


creat: A process may create a new file with the creat system call; the file is opened for writing.
syntax: fd = creat(pathname, mode);
A. pathname: file name
B. mode: permission bits for the new file (e.g., 0644)
Example

System Call: close


close: A process may close a file with the close system call.
syntax: close(fd);
A. fd: file descriptor
Example

System Call: write


write: A process may write data to an open file.
syntax: n = write(fd, buffer, count);
A. fd: file descriptor
B. buffer: the data to be written
C. count: the number of bytes to write
write returns the number of bytes actually written, or -1 on error.
Example

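/* creatEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>   /* chmod() */
#include <fcntl.h>      /* creat() */
#include <unistd.h>     /* write(), close() */

int main(void)
{
    int fd1;
    /* arrays padded with zero bytes, so writing all 20 and 30 bytes stays in bounds */
    char buf1[20] = "I am a string\n";
    char buf2[30] = "second line\n";

    printf("======\n");
    fd1 = creat("./testCreat.txt", 0666);   /* second argument: permission mode */
    write(fd1, buf1, sizeof buf1);          /* 20 bytes */
    write(fd1, buf2, sizeof buf2);          /* 30 bytes */
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    close(fd1);
    chmod("./testCreat.txt", 0666);         /* creat's mode is reduced by the umask */
    printf("======\n");
    return 0;
}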

$ cc creatEx1.c -o creatEx1
$ creatEx1
======
fd1=3 buf1=I am a string

======
$ ls -l testCreat.txt
-rw-rw-rw-   1 cheng    staff         50 May 10 20:37 testCreat.txt
$ more testCreat.txt
...

System Call: stat/fstat
stat/fstat: A process may query the status of a file: file type, file owner, access permissions, file size, number of links, inode number, access times.
syntax: stat(pathname, statbuffer); fstat(fd, statbuffer);
A. pathname: file name
B. statbuffer: the struct stat into which the data is read
C. fd: file descriptor
Example

/* statEx1.c */
#include <stdio.h>
#include <fcntl.h>
#include <sys/stat.h>

int main(void)
{
    int fd1, fd2;
    struct stat bufStat1, bufStat2;

    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./statEx1", O_RDONLY);
    fstat(fd1, &bufStat1);
    fstat(fd2, &bufStat2);
    /* st_ino, st_blksize and st_blocks are not plain ints: cast for printing */
    printf("fd1=%d inode no=%ld block size=%ld blocks=%ld\n", fd1,
           (long)bufStat1.st_ino, (long)bufStat1.st_blksize, (long)bufStat1.st_blocks);
    printf("fd2=%d inode no=%ld block size=%ld blocks=%ld\n", fd2,
           (long)bufStat2.st_ino, (long)bufStat2.st_blksize, (long)bufStat2.st_blocks);
    printf("======\n");
    return 0;
}

$ cc statEx1.c -o statEx1
$ statEx1
======
fd1=3 inode no=21954 block size=8192 blocks=6
fd2=4 inode no=190611 block size=8192 blocks=
======
...

System Call: link/unlink
link: creates a hard link, a new name for an existing file; unlink removes a name.
syntax: link(sourceFile, targetFile); unlink(file);
A. sourceFile, targetFile, file: file names
Example: Lab exercise: write a C program that uses the link/unlink system calls. Use ls -l to see the reference count.

System Call: chdir
chdir: A process may change its current directory.
syntax: chdir(pathname);
A. pathname: directory name
Example

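#include <stdlib.h>   /* system() */
#include <unistd.h>   /* chdir() */

int main(void)
{
    chdir("/usr/bin");   /* changes the current directory of this process only */
    system("ls -l");     /* the child shell inherits the new current directory */
    return 0;
}

The output is the same as running:
$ ls -l /usr/bin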

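- int pipe(int fd[2]): create a pipe; fd[0] is the read end, fd[1] the write end
- FILE *popen(const char *command, const char *mode): run command with a pipe to or from it
- int pclose(FILE *stream): close a stream opened by popen and wait for the command
- mknod(path, S_IFIFO|0644, 0): create a named pipe (FIFO)
- From the shell: mknod filename p, or mkfifo filename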

Signal      Description
SIGABRT     Process abort signal.
SIGALRM     Alarm clock.
SIGFPE      Erroneous arithmetic operation.
SIGHUP      Hangup.
SIGILL      Illegal instruction.
SIGINT      Terminal interrupt signal.
SIGKILL     Kill (cannot be caught or ignored).
SIGPIPE     Write on a pipe with no one to read it.
SIGQUIT     Terminal quit signal.
SIGSEGV     Invalid memory reference.
SIGTERM     Termination signal.
SIGUSR1     User-defined signal 1.
SIGUSR2     User-defined signal 2.
SIGCHLD     Child process terminated or stopped.
SIGCONT     Continue executing, if stopped.
SIGSTOP     Stop executing (cannot be caught or ignored).
SIGTSTP     Terminal stop signal.
SIGTTIN     Background process attempting read.
SIGTTOU     Background process attempting write.
SIGBUS      Bus error.
SIGPOLL     Pollable event.
SIGPROF     Profiling timer expired.
SIGSYS      Bad system call.
SIGTRAP     Trace/breakpoint trap.
SIGURG      High bandwidth data is available at a socket.
SIGVTALRM   Virtual timer expired.
SIGXCPU     CPU time limit exceeded.
SIGXFSZ     File size limit exceeded.

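void (*signal(int signo, void (*f)(int)))(int);
- signo: the signal number
- f: the handler, a function taking the signal number (or SIG_DFL / SIG_IGN)
signal() returns the previous handler.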

#include <stdio.h>     /* standard I/O functions */
#include <unistd.h>    /* standard unix functions, like getpid() and pause() */
#include <sys/types.h> /* various type definitions, like pid_t */
#include <signal.h>    /* signal name macros, and the signal() prototype */

/* first, here is the signal handler */
void catch_int(int sig_num)
{
    static const char msg[] = "Don't do that\n";
    (void)sig_num;
    /* re-set the signal handler again to catch_int, for next time
       (needed on systems that reset the handler to SIG_DFL on delivery) */
    signal(SIGINT, catch_int);
    /* and print the message; write() is safe inside a handler, printf() is not */
    write(STDOUT_FILENO, msg, sizeof msg - 1);
}

int main(void)
{
    /* set the INT (Ctrl-C) signal handler to 'catch_int' */
    signal(SIGINT, catch_int);
    /* now, let's get into an infinite loop of doing nothing. */
    for (;;)
        pause();
}

Signal sets
Signal sets are data types (sigset_t) that represent multiple signals. The following functions are used to manipulate them.

int sigemptyset(sigset_t *set);
This function initializes the signal set pointed to by set so that it contains no signals.

int sigfillset(sigset_t *set);
This function fills the signal set pointed to by set so that it contains all signals.

int sigaddset(sigset_t *set, int signo);
This function adds the signal signo to the signal set pointed to by set.

int sigdelset(sigset_t *set, int signo);
This function removes the signal signo from the signal set pointed to by set.

int sigismember(const sigset_t *set, int signo);
This function checks whether the signal signo is in the signal set pointed to by set: it returns 1 if it is and 0 if it is not.

int sigpending(sigset_t *set);
This function stores in the set pointed to by set the signals that are blocked from delivery and currently pending.

int sigsuspend(const sigset_t *set);
This function sets the signal mask of the process to the signal set pointed to by set and suspends the process until a signal is caught or until a signal occurs that terminates the process. When it returns after a handler runs, the previous mask is restored.

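int sigprocmask(int how, const sigset_t *set, sigset_t *oset);
Values of how: SIG_BLOCK (add set to the current mask), SIG_UNBLOCK (remove set from the mask), SIG_SETMASK (replace the mask with set).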

struct sigaction {
    void (*sa_handler)(int); /* pointer to function, or SIG_DFL or SIG_IGN */
    sigset_t sa_mask;        /* additional signals to be blocked during execution of the handler */
    int sa_flags;            /* special flags and options */
};

#include <stdio.h>
#include <stdlib.h>    /* exit() */
#include <string.h>    /* strcpy(), strlen() */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

/* glibc only declares struct msgbuf with _GNU_SOURCE, so define our own
   message layout: a positive type followed by the payload. */
struct message {
    long mtype;       /* message type, must be > 0 */
    char mtext[64];   /* message payload */
};

int main(void)
{
    struct message msg, recv_msg;
    ssize_t rc;

    /* create a private message queue, with access only to the owner. */
    int queue_id = msgget(IPC_PRIVATE, 0600);
    if (queue_id == -1) {
        perror("main: msgget");
        exit(1);
    }
    printf("message queue created, queue id '%d'.\n", queue_id);

    msg.mtype = 1;
    strcpy(msg.mtext, "hello world");
    /* the size argument counts only the mtext bytes, not mtype */
    if (msgsnd(queue_id, &msg, strlen(msg.mtext) + 1, 0) == -1) {
        perror("main: msgsnd");
        exit(1);
    }
    printf("message placed on the queue successfully.\n");

    rc = msgrcv(queue_id, &recv_msg, sizeof recv_msg.mtext, 0, 0);
    if (rc == -1) {
        perror("main: msgrcv");
        exit(1);
    }
    printf("msgrcv: received message: mtype '%ld'; mtext '%s'\n",
           recv_msg.mtype, recv_msg.mtext);

    msgctl(queue_id, IPC_RMID, NULL);   /* remove the queue when done */
    return 0;
}

[Figure: hosts 192.168.19.1, 192.168.19.2 and 192.168.19.3 on network 192.168.19.0 reach host 198.163.197.4 across the Internet. On each host, every service listens on its own port: HTTP [80], FTP [21], SMTP [25], Telnet [23].]

[Figure: telephone analogy. 15-441 students (clients) call professors at CMU on 412-268-8000 ext.123 and 412-268-8000 ext.654, just as network programming clients reach applications/servers such as Web (port 80) and Mail (port 25). The extension corresponds to the port number; the telephone number (area code, exchange, central number) corresponds to the IP address (network number + host number).]

Port numbers are used to identify entities on a host. Port numbers can be:
- Well-known (ports 0-1023)
- Dynamic or private (ports 1024-65535; IANA further divides these into registered ports 1024-49151 and dynamic/private ports 49152-65535)

Servers/daemons usually use well-known ports:
- Any client can identify the server/service
- HTTP = 80, FTP = 21, Telnet = 23, ...
- /etc/services defines well-known ports

Clients usually use dynamic ports:
- Assigned by the kernel at run time

[Figure: an NTP daemon on port 123 and a Web server on port 80 on one host, both above the same TCP/UDP, IP and Ethernet adapter stack.]
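The kernel's run-time port assignment can be observed directly: binding to port 0 asks the kernel to pick a free port, and getsockname() reports which one it chose. A minimal sketch, assuming a BSD-style sockets API; the function name kernel_assigned_port is hypothetical, and the range the kernel draws from is system-dependent (on Linux it is configured by net.ipv4.ip_local_port_range).

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return the port the kernel assigns to a UDP socket bound to port 0 on
   the loopback address, or -1 on error. */
int kernel_assigned_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int port = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                       /* 0 = any free port */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* 127.0.0.1 */

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(fd, (struct sockaddr *)&addr, &len) == 0)
        port = ntohs(addr.sin_port);                /* back to host order */

    close(fd);
    return port;
}
```

A client that calls connect() or sendto() without bind() gets the same treatment implicitly.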

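Consider a railway station with one counter per service:
Counter 0: Platform Tickets
Counter 1: Enquiries
Counter 2: Reservations
-----
Counter 8: Current Reservations
Counter 9: Cancellations
Every counter belongs to the same station, just as every port belongs to the same host; the counter number tells a passenger which service to go to, as the port number tells a packet which process to reach.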

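Host names and their IP addresses (one name may map to several addresses):
- medellin.cs.columbia.edu (128.59.21.14)
- newworld.cs.umass.edu (128.119.245.93)
- cluster.cs.columbia.edu (128.59.21.14, 128.59.16.7, 128.59.16.5, 128.59.16.4)

Each host machine has an IP address. When a packet arrives at a host, the destination port number selects the process that receives it.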

[Figure: the user at a host works through the FTP user interface and FTP client, which transfers files between the local file system and the FTP server's remote file system.]

FTP: transfer a file to/from a remote host
- Client/server model
- Client: side that initiates the transfer (either to or from the remote host)
- Server: remote host
- ftp: RFC 959
- ftp server: port 21

The FTP client contacts the FTP server at port 21, specifying TCP as the transport protocol. Two parallel TCP connections are opened:
- Control: exchange commands and responses between client and server ("out of band" control)
- Data: file data to/from the server

[Figure: FTP client and FTP server joined by a TCP control connection (port 21) and a TCP data connection (port 20).]

Sockets as a means for inter-process communication (IPC)

[Figure: a client process and a server process, each in the application layer and each attached to a socket. Below each socket is the OS network stack: transport layer (TCP/UDP), network layer (IP), link layer (e.g., Ethernet), physical layer. The two hosts are connected through the Internet.]

A socket is the interface that the OS provides to its networking subsystem.

- Address the machine on the network: by IP address
- Address the process: by port number

The pair IP address + port makes up a socket address.

[Figure: a client with socket address 128.2.194.242:3479 connected to a server (port 80) with socket address 208.216.181.15:80. The connection is identified by the socket pair (128.2.194.242:3479, 208.216.181.15:80).]
Client host address 128.2.194.242. Note: 3479 is an ephemeral port allocated by the kernel.
Server host address 208.216.181.15. Note: 80 is a well-known port associated with Web servers.
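In C, an IPv4 socket address is a struct sockaddr_in holding both parts of the pair. A minimal sketch that renders one as "a.b.c.d:port"; the helper name sockaddr_to_string is hypothetical. Both fields are stored in network byte order, so the port goes through ntohs() and the address through inet_ntop().

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Format an IPv4 socket address as "a.b.c.d:port".
   Returns 0 on success, -1 if buf is too small. */
int sockaddr_to_string(const struct sockaddr_in *sa, char *buf, size_t len)
{
    char ip[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &sa->sin_addr, ip, sizeof ip) == NULL)
        return -1;
    int n = snprintf(buf, len, "%s:%u", ip, (unsigned)ntohs(sa->sin_port));
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Filling a struct with 208.216.181.15 and port 80 and passing it in yields the server socket address shown above, "208.216.181.15:80".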

Examples of client programs: Web browsers, ftp, telnet, ssh

How does a client find the server?
- The IP address in the server socket address identifies the host.
- The (well-known) port in the server socket address identifies the service, and thus implicitly identifies the server process that performs that service.
- Examples of well-known ports:
  - Port 7: Echo server
  - Port 23: Telnet server
  - Port 25: Mail server
  - Port 80: Web server

[Figure: a client sends a service request for 128.2.194.242:80 and the server host's kernel delivers it to the Web server (port 80); a request for 128.2.194.242:7 is delivered to the echo server (port 7) on the same host.]

Servers are long-running processes (daemons):
- Created at boot time (typically) by the init process (process 1)
- Run continuously until the machine is turned off

Each server waits for requests to arrive on a well-known port associated with a particular service:
- Port 7: echo server
- Port 23: telnet server
- Port 25: mail server
- Port 80: HTTP server

Other applications should choose between 1024 and 65535. See /etc/services for a comprehensive list of the services available on a Linux machine.

What is a socket?
To the kernel, a socket is an endpoint of communication. To an application, a socket is a file descriptor that lets the application read/write from/to the network.
- Remember: all Unix I/O devices, including networks, are modeled as files.

Clients and servers communicate with each other by reading from and writing to socket descriptors. The main distinction between regular file I/O and socket I/O is how the application opens the socket descriptors.
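Because a socket is just a file descriptor, the ordinary read() and write() calls work on it. A minimal sketch, assuming a Unix system: socketpair() (used here instead of a network connection so the example needs no server) returns two connected socket descriptors. The helper name echo_through_socketpair is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Write msg into one end of a connected socket pair and read it back from
   the other end with plain read(). Returns bytes read, or -1 on error. */
ssize_t echo_through_socketpair(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n;
    size_t len = strlen(msg);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    if (write(sv[0], msg, len) != (ssize_t)len) {   /* same call as for files */
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    n = read(sv[1], out, outlen - 1);               /* leave room for '\0' */
    if (n >= 0)
        out[n] = '\0';
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

The only socket-specific call is the one that opens the descriptors, which matches the distinction drawn above.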

Endpoint Address
Generic endpoint address:
- The socket abstraction accommodates many protocol families.
- It supports many address families.
- It defines the following generic endpoint address: (address family, endpoint address in that family)
- Data type for the generic endpoint address: struct sockaddr

TCP/IP endpoint address:
- For TCP/IP, an endpoint address is composed of the following items:
  - Address family is AF_INET (Address Family for InterNET).
  - Endpoint address in that family is composed of an IP address and a port number.
- The IP address identifies a particular computer, while the port number identifies a particular application running on that computer.
- The TCP/IP endpoint address is a special instance of the generic one: struct sockaddr_in

Port number:
- A port number identifies an application running on a computer.
- When a client program is executed, the socket implementation (WinSock on Windows, the kernel on Unix) chooses an unused port number for it.
- Each server program must have a pre-specified port number, so that the client can contact the server.
- The port number is composed of 16 bits, and its possible values are used in the following manner:
  - 0 - 1023: for well-known server applications
  - 1024 - 49151: for user-defined server applications (typical range to be used is 1024 - 5000)
  - 49152 - 65535: for client programs
- Port numbers for some well-known server applications:
  - WWW server using TCP: 80
  - Telnet server using TCP: 23
  - SMTP (email) server using TCP: 25
  - SNMP server using UDP: 161
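The three ranges above can be expressed as a small lookup. A minimal sketch; the enum and the function name classify_port are hypothetical, and the boundaries are exactly the ones listed above.

```c
/* Categories for a port number, following the ranges listed above. */
enum port_kind { PORT_WELL_KNOWN, PORT_REGISTERED, PORT_DYNAMIC, PORT_INVALID };

enum port_kind classify_port(long port)
{
    if (port < 0 || port > 65535)
        return PORT_INVALID;       /* port numbers are 16 bits */
    if (port <= 1023)
        return PORT_WELL_KNOWN;    /* well-known server applications */
    if (port <= 49151)
        return PORT_REGISTERED;    /* user-defined server applications */
    return PORT_DYNAMIC;           /* client programs */
}
```

For example, classify_port(80) is PORT_WELL_KNOWN and classify_port(50000) is PORT_DYNAMIC.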

Unix File Descriptor Table
[Figure: a descriptor table with entries 0-4. Descriptors 0, 1 and 2 refer to standard input, standard output and standard error; the remaining descriptors point to the data structures of open files.]

Socket Descriptor Data Structure
[Figure: a descriptor table with entries 0-4, one of which points to a socket data structure containing Family: PF_INET, Service: SOCK_STREAM, Local IP: 111.22.3.4, Remote IP: 123.45.6.78, Local Port: 2249, Remote Port: 3726.]

Hierarchical vs. flat addresses
- Wisconsin / Madison / UW-Campus / Aditya vs. Aditya:123-45-6789
- Ethernet addresses are flat
- What information would routers need to route to Ethernet addresses?
- Hierarchical structure is crucial for designing a scalable binding from interface name to route: route to a general area, then to a specific location

What type of hierarchy?
- How many levels? Same hierarchy depth for everyone?
- Address broken into segments of increasing specificity
- Uniform for everybody: needs centralized management
- Non-uniform: more flexible, needs careful decentralized management

IPv4 addresses:
- Fixed length: 32 bits
- Total IP address space: about 4 billion (2^32) addresses
- Initial class-ful structure (1981):
  - Class A: 128 networks, 16M hosts
  - Class B: 16K networks, 64K hosts
  - Class C: 2M networks, 256 hosts

Class   Leading bits   Layout (bits 0-31)
A       0              network ID (bits 0-7)  | host ID (bits 8-31)
B       10             network ID (bits 0-15) | host ID (bits 16-31)
C       110            network ID (bits 0-23) | host ID (bits 24-31)
D       1110           multicast addresses
E       1111           reserved for experiments
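The class, and with it the classful network prefix, follows from the leading bits alone. A minimal sketch with addresses held as 32-bit integers in host byte order; the IPV4 macro and both function names are hypothetical conveniences.

```c
#include <stdint.h>

/* Build a host-byte-order address from four octets (hypothetical helper). */
#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Class of an IPv4 address, decided by its leading bits. */
char ipv4_class(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0)           return 'A';  /* 0...    */
    if ((addr & 0xC0000000u) == 0x80000000u) return 'B';  /* 10...   */
    if ((addr & 0xE0000000u) == 0xC0000000u) return 'C';  /* 110...  */
    if ((addr & 0xF0000000u) == 0xE0000000u) return 'D';  /* 1110... multicast */
    return 'E';                                            /* 1111... reserved */
}

/* Classful network prefix length: 8, 16 or 24 bits (0 for D and E). */
int classful_prefix_len(uint32_t addr)
{
    switch (ipv4_class(addr)) {
    case 'A': return 8;
    case 'B': return 16;
    case 'C': return 24;
    default:  return 0;
    }
}
```

For 128.2.11.43 (www.cmu.edu in the next example) this gives class B with a 16-bit prefix, i.e. network 128.2.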

Class-based forwarding:
- Address would specify the prefix for the forwarding table
- Simple lookup, e.g., www.cmu.edu has address 128.2.11.43:
  - Class B address: class + network is 128.2
  - Look up 128.2 in the forwarding table
  - Prefix: the part of the address that really matters for routing
- Forwarding table contains:
  - List of class+network entries
  - A few fixed prefix lengths (8/16/24)
- Large tables: 2 million class C networks

Original goal: the network part would uniquely identify a single physical network. This led to inefficient address space usage:
- Class A & B networks too big
- Each physical network must have one network number
- Also, very few LANs have close to 64K hosts
- Easy for networks to (claim to) outgrow class C

Routing table size is too high. We need a simple way to reduce the number of network numbers assigned.
- Subnetting: split up single network address ranges
- Fixes the routing table size problem, partially

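Subnetting:
- Add another "floating" layer to the hierarchy
- Variable length subnet masks
- Could subnet a class B into several chunks

[Figure: an address split as Network | Host becomes Network | Subnet | Host; the subnet mask 11111111 11111111 11111111 00000000 has 1s over the network and subnet bits.]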

Example:
- Assume an organization was assigned address 150.100 (class B)
- Assume < 100 hosts per subnet (department)
- How many host bits do we need? Seven: 2^6 = 64 addresses are too few, while 2^7 = 128 addresses cover 100 hosts
- What is the network mask? 11111111 11111111 11111111 10000000 = 255.255.255.128
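The host-bit count and mask above can be computed mechanically. A minimal sketch; the function names are hypothetical, masks are in host byte order, and it assumes the common convention that the all-zeros (subnet) and all-ones (broadcast) host values are reserved, so hosts + 2 addresses are needed.

```c
#include <stdint.h>

/* Smallest h with 2^h >= hosts + 2 (subnet and broadcast addresses
   are reserved). */
int host_bits_needed(unsigned hosts)
{
    int h = 0;
    while (h < 32 && ((uint64_t)1 << h) < (uint64_t)hosts + 2)
        h++;
    return h;
}

/* Subnet mask with the low host_bits bits cleared. */
uint32_t mask_from_host_bits(int host_bits)
{
    return host_bits >= 32 ? 0 : 0xFFFFFFFFu << host_bits;
}
```

For 100 hosts this gives 7 host bits and the mask 0xFFFFFF80, which is 255.255.255.128.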

Forwarding with subnets:
- Host configured with IP address and subnet mask
- Subnet number = IP (AND) Mask
- Forwarding table entries: (Subnet number SN, Subnet mask SM, Outgoing I/F OI)
- D = destination IP address
- For each forwarding table entry (SN, SM, OI):
    D1 = SM & D
    if (D1 == SN) deliver on OI
- If no entry matches, forward to the default router
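The forwarding procedure translates directly into a loop. A minimal sketch with addresses and masks in host byte order; struct fwd_entry, forward() and the IPV4 macro are hypothetical names, and interfaces are plain integers.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

/* One forwarding-table entry: (SN, SM, OI). */
struct fwd_entry {
    uint32_t subnet;   /* SN */
    uint32_t mask;     /* SM */
    int      out_if;   /* OI */
};

/* Interface for destination d, or default_if (the default router's
   interface) when no entry matches. */
int forward(const struct fwd_entry *table, size_t n, uint32_t d, int default_if)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t d1 = table[i].mask & d;   /* D1 = SM & D */
        if (d1 == table[i].subnet)
            return table[i].out_if;        /* deliver on OI */
    }
    return default_if;                     /* forward to default router */
}
```

With the 150.100 example, two /25 subnets 150.100.1.0 and 150.100.1.128 on interfaces 1 and 2 send 150.100.1.200 out of interface 2, and anything outside both goes to the default.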

Address space depletion:
- In danger of running out of classes A and B. Why?
  - Class C too small for most domains
  - Very few class A: very careful about giving them out
  - Class B poses the greatest problem
- Class B sparsely populated
  - But people refuse to give it back

CIDR (Classless Inter-Domain Routing):
- Allows arbitrary split between network & host part of address
  - Do not use classes to determine network ID
  - Use common part of address as network number
  - Allows handing out arbitrary sized chunks of address space
  - E.g., addresses 192.4.16 - 192.4.31 have the first 20 bits in common. Thus, we use these 20 bits as the network number: 192.4.16/20
- Enables more efficient usage of address space (and router tables)
  - Use a single entry for a range in forwarding tables
  - Combine forwarding entries when possible
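The "first 20 bits in common" can be computed by comparing the lowest and highest addresses of the range bit by bit. A minimal sketch in host byte order; common_prefix_len and the IPV4 macro are hypothetical names.

```c
#include <stdint.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Number of leading bits shared by two addresses. */
int common_prefix_len(uint32_t a, uint32_t b)
{
    uint32_t diff = a ^ b;   /* 1 bits mark positions where they differ */
    int len = 0;
    while (len < 32 && !(diff & (0x80000000u >> len)))
        len++;
    return len;
}
```

For 192.4.16.0 and 192.4.31.255 the third octets are 00010000 and 00011111, so the first difference is at bit 20, giving 192.4.16/20.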

CIDR example:
- Network is allocated 8 contiguous chunks of 256-host addresses: 200.10.0.0 to 200.10.7.255
  - Allocation uses 3 bits of class C space
  - Remaining 21 bits are the network number, written as 200.10.0.0/21
- Replaces 8 class C routing entries with 1 combined entry
  - Routing protocols carry the prefix with the destination network address

Network (network portion): get an allocated portion of the ISP's address space:

ISP's block      11001000 00010111 00010000 00000000   200.23.16.0/20
Organization 0   11001000 00010111 00010000 00000000   200.23.16.0/23
Organization 1   11001000 00010111 00010010 00000000   200.23.18.0/23
Organization 2   11001000 00010111 00010100 00000000   200.23.20.0/23
...              ...                                   ...
Organization 7   11001000 00010111 00011110 00000000   200.23.30.0/23
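Each organization's block is the ISP block plus a multiple of the sub-block size. A minimal sketch in host byte order; sub_block and the IPV4 macro are hypothetical, and it assumes 1 <= sub_len <= 32 and sub_len - len < 32.

```c
#include <stdint.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Base address of the i-th /sub_len block inside base/len, or 0 if i is
   out of range. A /20 split into /23s has 2^(23-20) = 8 blocks, each
   2^(32-23) = 512 addresses apart. */
uint32_t sub_block(uint32_t base, int len, int sub_len, unsigned i)
{
    uint32_t count = 1u << (sub_len - len);        /* number of sub-blocks */
    if (i >= count)
        return 0;
    return base + i * (1u << (32 - sub_len));      /* step by block size */
}
```

Applied to 200.23.16.0/20, block 1 starts at 200.23.18.0 and block 7 at 200.23.30.0, matching the table.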

How does an ISP get a block of addresses?
- From Regional Internet Registries (RIRs):
  - ARIN (North America), APNIC (Asia-Pacific), RIPE NCC (Europe, Middle East, parts of Central Asia), LACNIC (Latin America and the Caribbean), AFRINIC (Africa)

How about a single host?
- Hard-coded by the system admin in a file
- DHCP (Dynamic Host Configuration Protocol): dynamically get an address, "plug-and-play":
  - Host broadcasts a DHCP discover msg
  - DHCP server responds with a DHCP offer msg
  - Host requests IP address: DHCP request msg
  - DHCP server sends address: DHCP ack msg

[Figure: a provider is given 201.10.0.0/21 and divides it into 201.10.0.0/22, 201.10.4.0/24, 201.10.5.0/24 and 201.10.6.0/23.]

CIDR implications:
- Longest prefix match
- Route aggregation
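Longest prefix match chooses, among all entries that contain the destination, the one with the most specific (longest) prefix. A minimal linear-scan sketch over the provider's blocks, in host byte order; struct route, longest_prefix_match and the IPV4 macro are hypothetical, and real routers use tries or TCAMs instead of a scan.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV4(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                          ((uint32_t)(c) << 8) | (uint32_t)(d))

struct route {
    uint32_t prefix;   /* network address */
    int      len;      /* prefix length in bits, 0..32 */
};

static uint32_t len_to_mask(int len)
{
    return len == 0 ? 0 : 0xFFFFFFFFu << (32 - len);
}

/* Index of the matching route with the longest prefix, or -1. */
int longest_prefix_match(const struct route *rt, size_t n, uint32_t addr)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if ((addr & len_to_mask(rt[i].len)) == rt[i].prefix &&
            (best < 0 || rt[i].len > rt[best].len))
            best = (int)i;
    }
    return best;
}
```

Every address in 201.10.0.0/21 matches the aggregate entry, but 201.10.5.7 also matches 201.10.5.0/24, and the /24 wins.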

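[Figure: datagram forwarding. The packet carries only the destination address R; routers R1, R2 and R3 each look up R in their own forwarding table to choose the output port on the way from the sender to the receiver.]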

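[Figure: source routing. The sender lists the whole path in the packet (R1, R2, R3, R); each router removes its own entry and forwards the packet toward the next one (R2, R3, R, then R3, R) until it reaches the receiver.]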

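Virtual circuits: the network picks a path, assigns VC numbers for the flow on each link, and populates the forwarding tables.
[Figure: a packet travels from the sender to the receiver through routers R1, R2 and R3; each router's table maps an (input port, VC number) pair to an (output port, VC number) pair, so the packet's VC number changes from link to link.]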

[Figure: a packet with destination 128.2.198.222 arrives over the WAN at router 128.2.254.36, which connects to LAN 1 containing host 128.2.198.222 and other hosts.]

Routing gets the packet to the correct local network
- Based on IP address
- Router sees that the destination address is of a local machine

Still need to get the packet to the host
- Using a link-layer protocol
- Need to know the hardware address

Same issue for any local communication
- Find a local machine, given its IP address


ARP packet fields: op | Sender MAC address | Sender IP address | Target MAC address | Target IP address
- op: operation (1: request, 2: reply)
- Sender: host sending the ARP message
- Target: intended receiver of the message
- Diagrammed for Ethernet (6-byte MAC addresses)

ARP (Address Resolution Protocol) is a low-level protocol:
- Operates only within the local network
- Determines the mapping from IP address to hardware (MAC) address
- Mapping determined dynamically
  - No need to statically configure tables
  - Only requirement is that each host know its own IP address

ARP request: op | Sender MAC address | Sender IP address | Target MAC address | Target IP address
- op: operation, 1: request
- Sender: host that wants to determine the MAC address of another machine
- Target: the other machine

Requestor:
- Fills in its own IP and MAC address as "sender"
  - Why include its MAC address?
- Mapping: fills the desired host's IP address into "target IP address"
- Sending: send to MAC address ff:ff:ff:ff:ff:ff (Ethernet broadcast)
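The requestor's steps can be sketched as filling a byte buffer. The diagram above shows only op and the four addresses; the real ARP payload for Ethernet/IPv4 (RFC 826) also starts with hardware type, protocol type and the two address lengths, for 28 bytes in total. A minimal sketch; build_arp_request and ARP_LEN are hypothetical names, and IP addresses are passed as 4 bytes already in network order.

```c
#include <stdint.h>
#include <string.h>

#define ARP_LEN 28   /* ARP payload size for Ethernet + IPv4 */

/* Layout: htype(2) ptype(2) hlen(1) plen(1) op(2) sender MAC(6)
   sender IP(4) target MAC(6) target IP(4); multi-byte fields big-endian. */
void build_arp_request(uint8_t buf[ARP_LEN], const uint8_t my_mac[6],
                       const uint8_t my_ip[4], const uint8_t target_ip[4])
{
    buf[0] = 0x00; buf[1] = 0x01;    /* hardware type: Ethernet */
    buf[2] = 0x08; buf[3] = 0x00;    /* protocol type: IPv4 */
    buf[4] = 6;                      /* MAC address length */
    buf[5] = 4;                      /* IPv4 address length */
    buf[6] = 0x00; buf[7] = 0x01;    /* op 1 = request */
    memcpy(buf + 8, my_mac, 6);      /* requestor's own MAC as sender */
    memcpy(buf + 14, my_ip, 4);      /* requestor's own IP as sender */
    memset(buf + 18, 0, 6);          /* target MAC unknown: zeros */
    memcpy(buf + 24, target_ip, 4);  /* the IP address to resolve */
}
```

The Ethernet frame carrying this payload is addressed to ff:ff:ff:ff:ff:ff.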

ARP reply: op | Sender MAC address | Sender IP address | Target MAC address | Target IP address
- op: operation, 2: reply
- Sender: host with the desired IP address
- Target: original requestor

Responder becomes "sender":
- Fill in own IP and MAC address
- Set requestor as target
- Send to the requestor's MAC address
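The responder's steps amount to swapping roles in the received packet. A minimal sketch that rewrites a 28-byte Ethernet/IPv4 ARP request (RFC 826 layout: op at offset 6, sender MAC at 8, sender IP at 14, target MAC at 18, target IP at 24) into the reply in place; make_arp_reply and ARP_LEN are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define ARP_LEN 28

/* If pkt is a request for my_ip, turn it into the reply in place and
   return 1; otherwise leave it alone and return 0. */
int make_arp_reply(uint8_t pkt[ARP_LEN], const uint8_t my_mac[6],
                   const uint8_t my_ip[4])
{
    uint8_t req_mac[6], req_ip[4];

    if (pkt[6] != 0x00 || pkt[7] != 0x01)   /* op must be 1 (request) */
        return 0;
    if (memcmp(pkt + 24, my_ip, 4) != 0)    /* must be asking for us */
        return 0;

    memcpy(req_mac, pkt + 8, 6);            /* remember the requestor */
    memcpy(req_ip, pkt + 14, 4);

    pkt[7] = 0x02;                          /* op 2 = reply */
    memcpy(pkt + 8, my_mac, 6);             /* responder becomes sender */
    memcpy(pkt + 14, my_ip, 4);
    memcpy(pkt + 18, req_mac, 6);           /* requestor becomes target */
    memcpy(pkt + 24, req_ip, 4);
    return 1;                               /* then send unicast to req_mac */
}
```

Because the request already carries the requestor's MAC address, the reply can go straight back without another broadcast.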

Destination      Gateway        Genmask            Iface
128.2.209.100    0.0.0.0        255.255.255.255    eth0
128.2.0.0        0.0.0.0        255.255.0.0        eth0
127.0.0.0        0.0.0.0        255.0.0.0          lo
0.0.0.0          128.2.254.36   0.0.0.0            eth0

Routing table of host 128.2.209.100 when plugged into the CS Ethernet:
- Dest 128.2.209.100: routing to the same machine
- Dest 128.2.0.0: other hosts on the same Ethernet
- Dest 127.0.0.0: special loopback address
- Dest 0.0.0.0: default route to the rest of the Internet
  - Main CS router: gigrouter.net.cs.cmu.edu (128.2.254.36)

How does a host get its IP address, netmask, gateway, hostname, etc.?
- Type by hand!!!

IPv4 option 1: RARP (Reverse ARP)
- Data-link protocol
  - Uses the ARP format, with new opcodes: "request reverse" (3) and "reply reverse" (4)
- Send query: request-reverse [ether addr]; server responds with IP
  - Used primarily by diskless nodes, when they first initialize, to find their Internet address

IPv4 option 2: DHCP
- Dynamic Host Configuration Protocol
- RARP is fine for assigning an IP, but is very limited
- DHCP can provide all the info necessary

Message sequence: DHCPDISCOVER (broadcast), DHCPOFFER, DHCPREQUEST, DHCPACK

DHCPOFFER carries:
- IP addressing information
- Boot file/server information (for network booting)
- DNS name servers
- Lots of other stuff: the protocol is extensible; half of the options are reserved for local site definition and use

Lease-based assignment:
- Clients can renew; servers really should preserve this information across client & server reboots

Provide host configuration information:
- Not just IP address stuff: NTP servers, IP config, link layer config, ...

Use:
- Generic config for desktops/dial-in/etc.
  - Assign IP address/etc. from a pool
- Specific config for particular machines
  - Central configuration management

Goal: allow a host to dynamically obtain its IP address from a network server when it joins the network
- Can renew its lease on an address in use
- Allows reuse of addresses (only hold an address while connected and "on")
- Support for mobile users who want to join the network

DHCP overview:
- Host broadcasts "DHCP discover" msg [optional]
- DHCP server responds with "DHCP offer" msg [optional]
- Host requests IP address: "DHCP request" msg
- DHCP server sends address: "DHCP ack" msg

Network Layer


[Figure: three subnets (223.1.1.x, 223.1.2.x, 223.1.3.x) joined by a router, with a DHCP server on the 223.1.2.x subnet; an arriving DHCP client needs an address in this network.]

[Figure: DHCP client-server exchange over time between the DHCP server (223.1.2.5) and an arriving client.]
DHCP discover: src 0.0.0.0, 68; dest 255.255.255.255, 67; yiaddr 0.0.0.0; transaction ID 654
DHCP offer: src 223.1.2.5, 67; dest 255.255.255.255, 68; yiaddr 223.1.2.4; transaction ID 654; lifetime 3600 secs
DHCP request: src 0.0.0.0, 68; dest 255.255.255.255, 67; yiaddr 223.1.2.4; transaction ID 655; lifetime 3600 secs
DHCP ACK: src 223.1.2.5, 67; dest 255.255.255.255, 68; yiaddr 223.1.2.4; transaction ID 655; lifetime 3600 secs
yiaddr ("your" IP address) is the address being assigned; UDP port 67 is the DHCP server port and port 68 the client port.

DHCP: more than IP address
DHCP can return more than just the allocated IP address on the subnet:
- Address of the first-hop router for the client
- Name and IP address of the DNS server
- Network mask (indicating network versus host portion of address)

DHCP: example
[Figure: a connecting laptop and a router running the DHCP server (168.1.1.1); each side shows the protocol stack DHCP / UDP / IP / Eth / Phy.]
- The connecting laptop needs its IP address, the address of its first-hop router, and the address of a DNS server: use DHCP.
- The DHCP request is encapsulated in UDP, encapsulated in IP, encapsulated in 802.3 Ethernet.
- The Ethernet frame is broadcast (dest: FFFFFFFFFFFF) on the LAN and received at the router running the DHCP server.
- Ethernet is demuxed to IP, IP is demuxed to UDP, and UDP is demuxed to DHCP.

DHCP: example (continued)
- The DHCP server formulates a DHCP ACK containing the client's IP address, the IP address of the first-hop router for the client, and the name & IP address of the DNS server.
- The DHCP server's reply is encapsulated, the frame is forwarded to the client, and it is demuxed up to DHCP at the client.
- The client now knows its IP address, the name and IP address of the DNS server, and the IP address of its first-hop router.

DHCP: wireshark output (home LAN)

request:
Message type: Boot Request (1)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 0.0.0.0 (0.0.0.0)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 0.0.0.0 (0.0.0.0)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP Request
Option: (61) Client identifier
    Length: 7; Value: 010016D323688A;
    Hardware type: Ethernet
    Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Option: (t=50,l=4) Requested IP Address = 192.168.1.101
Option: (t=12,l=5) Host Name = "nomad"
Option: (55) Parameter Request List
    Length: 11; Value: 010F03062C2E2F1F21F92B
    1 = Subnet Mask; 15 = Domain Name
    3 = Router; 6 = Domain Name Server
    44 = NetBIOS over TCP/IP Name Server

reply:
Message type: Boot Reply (2)
Hardware type: Ethernet
Hardware address length: 6
Hops: 0
Transaction ID: 0x6b3a11b7
Seconds elapsed: 0
Bootp flags: 0x0000 (Unicast)
Client IP address: 192.168.1.101 (192.168.1.101)
Your (client) IP address: 0.0.0.0 (0.0.0.0)
Next server IP address: 192.168.1.1 (192.168.1.1)
Relay agent IP address: 0.0.0.0 (0.0.0.0)
Client MAC address: Wistron_23:68:8a (00:16:d3:23:68:8a)
Server host name not given
Boot file name not given
Magic cookie: (OK)
Option: (t=53,l=1) DHCP Message Type = DHCP ACK
Option: (t=54,l=4) Server Identifier = 192.168.1.1
Option: (t=1,l=4) Subnet Mask = 255.255.255.0
Option: (t=3,l=4) Router = 192.168.1.1
Option: (6) Domain Name Server
    Length: 12; Value: 445747E2445749F244574092;
    IP Address: 68.87.71.226;
    IP Address: 68.87.73.242;
    IP Address: 68.87.64.146
Option: (t=15,l=20) Domain Name = "hsd1.ma.comcast.net."
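Each "Option: (t=…,l=…)" line in the dump is one code/length/value triple in the options field that follows the magic cookie 99.130.83.99. A minimal parser sketch that extracts option 53 (DHCP Message Type); dhcp_message_type is a hypothetical name, and the handling of pad (0) and end (255) options follows RFC 2132.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Value of option 53 (1 DISCOVER, 2 OFFER, 3 REQUEST, 4 DECLINE, 5 ACK,
   6 NAK, 7 RELEASE, 8 INFORM), or -1 if missing or malformed. */
int dhcp_message_type(const uint8_t *opt, size_t len)
{
    static const uint8_t cookie[4] = {99, 130, 83, 99};   /* magic cookie */
    size_t i = 4;

    if (len < 4 || memcmp(opt, cookie, 4) != 0)
        return -1;
    while (i < len) {
        uint8_t code = opt[i];
        if (code == 0) {                 /* pad option: a single byte */
            i++;
            continue;
        }
        if (code == 255)                 /* end option */
            break;
        if (i + 1 >= len)                /* no room for the length byte */
            break;
        size_t olen = opt[i + 1];
        if (i + 2 + olen > len)          /* value runs past the buffer */
            break;
        if (code == 53 && olen == 1)
            return opt[i + 2];
        i += 2 + olen;                   /* skip code, length and value */
    }
    return -1;
}
```

For the request above, the bytes 53, 1, 3 give message type 3 (DHCPREQUEST); for the reply, 53, 1, 5 give 5 (DHCPACK).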

IPv6 stateless address autoconfiguration
- Serverless ("stateless"): no manual config at all
- Only configures addressing items, NOT other host things
  - Use DHCP for such things
- Link-local address: 1111 1110 10 :: 64-bit interface ID (usually from the Ethernet address)
  - (fe80::/64 prefix)
- Uniqueness test ("is anyone using this address?")
- Router contact (solicit, or wait for announcement)
  - Contains a globally unique prefix
  - Usually: concatenate this prefix with the local ID -> globally unique IPv6 ID
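One common way to derive the 64-bit interface ID "from the Ethernet address" is the modified EUI-64 format: insert ff:fe between the two halves of the MAC and flip the universal/local bit (0x02) of the first byte. A minimal sketch, assuming that method (systems may instead use random or stable-privacy IDs); link_local_from_mac is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Write the fe80::/64 link-local address for mac into out as text.
   Returns 0 on success, -1 if out is too small. */
int link_local_from_mac(const uint8_t mac[6], char *out, size_t outlen)
{
    struct in6_addr a;
    memset(&a, 0, sizeof a);
    a.s6_addr[0] = 0xfe;               /* 1111 1110 10... = fe80::/64 */
    a.s6_addr[1] = 0x80;
    a.s6_addr[8]  = mac[0] ^ 0x02;     /* flip the universal/local bit */
    a.s6_addr[9]  = mac[1];
    a.s6_addr[10] = mac[2];
    a.s6_addr[11] = 0xff;              /* ff:fe inserted in the middle */
    a.s6_addr[12] = 0xfe;
    a.s6_addr[13] = mac[3];
    a.s6_addr[14] = mac[4];
    a.s6_addr[15] = mac[5];
    return inet_ntop(AF_INET6, &a, out, (socklen_t)outlen) ? 0 : -1;
}
```

With the MAC from the wireshark capture, 00:16:d3:23:68:8a, this yields fe80::216:d3ff:fe23:688a.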

Outline:
- DNS design
- DNS today

- Need naming to identify resources
- Once identified, a resource must be located
- How to name a resource?
  - Naming hierarchy
- How do we efficiently locate resources?
  - DNS: name -> location (IP address)
- Challenge: how do we scale these to the wide area?

Why not look up names in a central DNS?
- Single point of failure
- Traffic volume
- Distant centralized database
- Single point of update
- Doesn't scale!

Why not use /etc/hosts?
- Original name-to-address mapping:
  - Flat namespace
  - Look up the mapping in /etc/hosts
  - Downloaded regularly
- Count of hosts was increasing: from a machine per domain to a machine per user
  - Many more downloads
  - Many more updates

DNS is basically a wide-area distributed database of name-to-IP mappings.
Goals:
- Scalability
- Decentralized maintenance
- Robustness
- Global scope
  - Names mean the same thing everywhere
Don't need:
- Atomicity
- Strong consistency

Conceptually, programmers can view the DNS database as a collection of millions of host entry structures:

/* DNS host entry structure */
struct hostent {
    char  *h_name;      /* official domain name of host */
    char **h_aliases;   /* null-terminated array of domain names */
    int    h_addrtype;  /* host address type (AF_INET) */
    int    h_length;    /* length of an address, in bytes */
    char **h_addr_list; /* null-terminated array of in_addr structs */
};

in_addr is a struct consisting of a 4-byte IP address.

Functions for retrieving host entries from DNS:
- gethostbyname: query key is a DNS host name.
- gethostbyaddr: query key is an IP address.
Both functions were removed from POSIX in 2008; getaddrinfo() and getnameinfo() are their replacements.
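The same lookup with the modern interface looks like this. A minimal sketch, assuming POSIX getaddrinfo(); resolve_ipv4 is a hypothetical helper that keeps only the first IPv4 result, mirroring h_addr_list[0] when h_addrtype is AF_INET.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Copy the first IPv4 address of host into out as a dotted quad.
   Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints, *res;
    int rc = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;         /* IPv4 only */
    hints.ai_socktype = SOCK_STREAM;   /* one entry per address, not per type */

    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1;                     /* gai_strerror() explains failures */

    const struct sockaddr_in *sin = (const struct sockaddr_in *)res->ai_addr;
    if (inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen) != NULL)
        rc = 0;
    freeaddrinfo(res);                 /* the result list is heap-allocated */
    return rc;
}
```

resolve_ipv4("kittyhawk.cmcl.cs.cmu.edu", buf, sizeof buf) would perform a real DNS query; a numeric string such as "127.0.0.1" is converted without contacting any name server.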

DNS message format:
12-byte header:
  Identification         | Flags
  No. of Questions       | No. of Answer RRs
  No. of Authority RRs   | No. of Additional RRs
followed by:
  Questions (variable number of questions): name, type fields for a query
  Answers (variable number of resource records): RRs in response to query
  Authority (variable number of resource records): records for authoritative servers
  Additional Info (variable number of resource records): additional "helpful" info that may be used

Identification
- Used to match up request/response

Flags
- 1 bit to mark query or response
- 1 bit to mark authoritative or not
- 1 bit to request recursive resolution
- 1 bit to indicate support for recursive resolution
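The header and question section can be built byte by byte. A minimal sketch following the RFC 1035 wire format: six 16-bit big-endian header fields, then the name as length-prefixed labels ending in a zero byte, then QTYPE and QCLASS. build_dns_query is a hypothetical name; it only builds the message and does not send it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a query for the A record of name. Returns the message length,
   or -1 if the name is malformed or buf is too small. */
int build_dns_query(uint16_t id, const char *name, uint8_t *buf, size_t buflen)
{
    size_t pos = 12;
    if (buflen < 12)
        return -1;
    memset(buf, 0, 12);
    buf[0] = (uint8_t)(id >> 8);        /* Identification */
    buf[1] = (uint8_t)(id & 0xFF);
    buf[2] = 0x01;                      /* flags: request recursion (RD) */
    buf[5] = 1;                         /* 1 question; 0 answer/authority/additional */

    while (*name) {
        const char *dot = strchr(name, '.');
        size_t lab = dot ? (size_t)(dot - name) : strlen(name);
        if (lab == 0 || lab > 63 || pos + 1 + lab >= buflen)
            return -1;
        buf[pos++] = (uint8_t)lab;      /* label length */
        memcpy(buf + pos, name, lab);   /* label bytes */
        pos += lab;
        name += lab + (dot ? 1 : 0);
    }
    if (pos + 5 > buflen)
        return -1;
    buf[pos++] = 0;                     /* zero-length root label ends the name */
    buf[pos++] = 0; buf[pos++] = 1;     /* QTYPE = 1 (A) */
    buf[pos++] = 0; buf[pos++] = 1;     /* QCLASS = 1 (IN) */
    return (int)pos;
}
```

For www.wisc.edu the message is 12 header bytes, 14 name bytes (3 www 4 wisc 3 edu 0) and 4 type/class bytes: 30 bytes in all.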

The DNS database contains tuples called resource records (RRs)
- Classes = Internet (IN), Chaosnet (CH), etc.
- Each class defines the value associated with a type
- RR format: (class, name, value, type, ttl)

For the IN class:
- Type=A: name is a hostname; value is an IP address
- Type=CNAME: name is an alias name for some "canonical" (the real) name; value is the canonical name
- Type=NS: name is a domain (e.g., foo.com); value is the name of the authoritative name server for this domain
- Type=MX: value is the hostname of the mail server associated with name

Different kinds of mappings are possible:
- Simple case: 1-1 mapping between domain name and IP address
  - kittyhawk.cmcl.cs.cmu.edu maps to 128.2.194.242
- Multiple domain names map to the same IP address
  - eecs.mit.edu and cs.mit.edu both map to 18.62.1.6
- Single domain name maps to multiple IP addresses
  - aol.com and www.aol.com map to multiple IP addresses
- Some valid domain names don't map to any IP address
  - For example: cs.wisc.edu

[Figure: the DNS name hierarchy. The root (.) has children such as org, net, edu, com and uk; lower levels include gwu, ucb, wisc, cmu and mit, and below them cs, ee and wail.]

Each node in the hierarchy stores a list of names that end with the same suffix (suffix = path up the tree). E.g., given this tree, where would the following be stored: Fred.com, Fred.edu, Fred.wisc.edu, Fred.cs.wisc.edu, Fred.cs.cmu.edu?

Zone = contiguous section of name space
- E.g., complete tree, single node or subtree
- A zone has an associated set of name servers
  - Must store list of names and tree links

[Figure: the name tree (root; org, net, edu, com, uk, ca; gwu, ucb, cmu, bu, mit; cs, ece; cmcl) with three example zones outlined: a subtree, a single node, and the complete tree.]

Zones are created by convincing the owner node to create/delegate a subzone
- Records within a zone store multiple redundant name servers
- Primary/master name server updated manually
- Secondary/redundant servers updated by zone transfer of name space
  - Zone transfer is a bulk transfer of the "configuration" of a DNS server; it uses TCP to ensure reliability

Example:
- CS.WISC.EDU created by WISC.EDU administrators
- Who creates WISC.EDU or .EDU?

Root name servers
- Responsible for the "root" zone
- 13 root name server identities worldwide, currently {a-m}.root-servers.net; each is replicated at many sites
- Local name servers contact root servers when they cannot resolve a name
  - Configured with well-known root servers

Each host has a resolver
- Typically a library that applications can link to
- Resolver contacts the name server
- Local name servers hand-configured (e.g., /etc/resolv.conf)

Name servers
- Either responsible for some zone or local servers
  - Do lookup of distant host names for local hosts
  - Typically answer queries about the local zone

Steps for resolving www.wisc.edu:
1. Application calls gethostbyname() (RESOLVER)
2. Resolver contacts the local name server (S1)
3. S1 queries the root server (S2) for www.wisc.edu
4. S2 returns the NS record for wisc.edu (S3)
   - What about the A record for S3? This is what the additional information section is for (PREFETCHING)
5. S1 queries S3 for www.wisc.edu
6. S3 returns the A record for www.wisc.edu

Can return multiple A records: what does this mean?

Recursive query:
- Server goes out and searches for more info (recursive)
- Only returns the final answer or "not found"

Iterative query:
- Server responds with as much as it knows (iterative)
- "I don't know this name, but ask this server"

Workload impact on choice?
- Local server typically does recursive
- Root/distant server does iterative

[Figure: requesting host surf.eurecom.fr resolves gaia.cs.umass.edu through its local name server dns.eurecom.fr, which contacts the root name server (iterated query), the intermediate name server dns.umass.edu and the authoritative name server dns.cs.umass.edu; the messages are numbered 1 to 8.]

Are all servers/names likely to be equally popular?
- Why might this be a problem? How can we solve this problem?

DNS responses are cached
- Quick response for repeated translations
- Other queries may reuse some parts of a lookup
  - NS records for domains

DNS negative queries are cached
- Don't have to repeat past mistakes
- E.g., misspellings, search strings in resolv.conf

Cached data periodically times out
- Lifetime (TTL) of data controlled by the owner of the data
- TTL passed with every record
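A resolver cache only has to remember each answer together with the absolute time its TTL runs out. A minimal sketch of such a cache, including negative answers; all names and sizes (CACHE_SLOTS, DNS_NAME_LEN, cache_put, cache_get) are hypothetical, and the caller passes the current time (e.g., time(NULL)) so the logic is easy to test.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define CACHE_SLOTS  8
#define DNS_NAME_LEN 64

struct cache_entry {
    char     name[DNS_NAME_LEN];
    uint32_t addr;       /* IPv4 address, host byte order */
    int      negative;   /* 1 = cached "no such name" answer */
    time_t   expires;    /* insertion time + TTL; 0 = empty slot */
};

static struct cache_entry cache[CACHE_SLOTS];

/* Reuse the same name's slot, else an empty/expired slot, else evict the
   entry that would expire soonest. */
static struct cache_entry *pick_slot(const char *name, time_t now)
{
    struct cache_entry *victim = &cache[0];
    for (size_t i = 0; i < CACHE_SLOTS; i++)
        if (cache[i].expires != 0 && strcmp(cache[i].name, name) == 0)
            return &cache[i];
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].expires <= now)
            return &cache[i];
        if (cache[i].expires < victim->expires)
            victim = &cache[i];
    }
    return victim;
}

/* Store an answer (or a negative answer) valid for ttl seconds. */
void cache_put(const char *name, uint32_t addr, int negative,
               uint32_t ttl, time_t now)
{
    struct cache_entry *e = pick_slot(name, now);
    strncpy(e->name, name, DNS_NAME_LEN - 1);
    e->name[DNS_NAME_LEN - 1] = '\0';
    e->addr = addr;
    e->negative = negative;
    e->expires = now + (time_t)ttl;
}

/* 1 = hit (*addr set), 0 = cached negative answer, -1 = miss or expired. */
int cache_get(const char *name, time_t now, uint32_t *addr)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].expires > now && strcmp(cache[i].name, name) == 0) {
            if (cache[i].negative)
                return 0;
            *addr = cache[i].addr;
            return 1;
        }
    }
    return -1;
}
```

A misspelled name cached as negative is answered locally until its TTL runs out, which is the "don't repeat past mistakes" point above.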

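[Figure: lookup of www.cs.wisc.edu. The client resolver asks the local DNS server, which contacts the root & edu DNS server, then the ns1.wisc.edu DNS server, then the ns1.cs.wisc.edu DNS server.]

[Figure: a later lookup of ftp.cs.wisc.edu among the same client, local DNS server, root & edu DNS server, wisc.edu DNS server and cs.wisc.edu DNS server; with the NS records already cached, the local server can go straight to the cs.wisc.edu server.]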

DNS servers are replicated
- Name service available if at least one replica is up
- Queries can be load balanced between replicas

UDP used for queries
- Need reliability: must implement this on top of UDP!
- Why not just use TCP?

Try alternate servers on timeout
- Exponential backoff when retrying the same server

Same identifier for all queries
- Don't care which server responds
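The retry rules above (try alternate servers on timeout, back off exponentially on the same server) can be captured in a small schedule function. A minimal sketch of one possible policy; the struct, the function name retry_schedule and the choice to double the timeout once per pass over the server list are all hypothetical. It assumes nservers > 0.

```c
/* Which server to try on a given attempt, and how long to wait. */
struct retry_step {
    int  server;       /* index into the configured server list */
    long timeout_ms;   /* wait this long before the next attempt */
};

struct retry_step retry_schedule(int attempt, int nservers,
                                 long base_ms, long max_ms)
{
    struct retry_step s;
    int round = attempt / nservers;    /* full passes over the list so far */
    long t = base_ms;

    s.server = attempt % nservers;     /* rotate through alternate servers */
    for (int r = 0; r < round && t < max_ms; r++)
        t *= 2;                        /* back off each time we come back */
    s.timeout_ms = t < max_ms ? t : max_ms;
    return s;
}
```

With 3 servers, a 1000 ms base and an 8000 ms cap, attempts 0-2 each wait 1000 ms on servers 0, 1, 2, attempt 3 returns to server 0 with 2000 ms, and the cap is reached later. Because every attempt carries the same query identifier, a late answer from any server is accepted.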

Reverse DNS
- Task
  - Given an IP address, find its name
  - When is this needed?
- Method
  - Maintain a separate hierarchy based on IP names
  - Write 128.2.194.242 as 242.194.2.128.in-addr.arpa
  - Why is the address reversed?
- Managing
  - Authority manages IP addresses assigned to it
  - E.g., CMU manages name space 2.128.in-addr.arpa

[Figure: two branches under the unnamed root: arpa -> in-addr -> 128 -> 2 -> 194 -> 242, and edu -> cmu -> cs -> cmcl -> kittyhawk; both lead to 128.2.194.242.]
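The reversed name is just the four octets written back to front. A minimal sketch; reverse_lookup_name is a hypothetical helper, and it reads the octets straight from the network-byte-order in_addr that inet_pton() fills in.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Build the in-addr.arpa name for an IPv4 address. The least specific
   octet (128) ends up last, just as "edu" is last in a host name.
   Returns 0 on success, -1 on bad input or a short buffer. */
int reverse_lookup_name(const char *ip, char *out, size_t outlen)
{
    struct in_addr a;
    if (inet_pton(AF_INET, ip, &a) != 1)
        return -1;
    const unsigned char *b = (const unsigned char *)&a.s_addr;  /* network order */
    int n = snprintf(out, outlen, "%u.%u.%u.%u.in-addr.arpa",
                     b[3], b[2], b[1], b[0]);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For 128.2.194.242 this produces 242.194.2.128.in-addr.arpa, the name a resolver would then query for a PTR record.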

Name servers can add additional data to a response
- Typically used for prefetching
- CNAME/MX/NS typically point to another host name
- Responses include the address of the host referred to in the "additional section"

Generic Top Level Domains (gTLD) = .com, .net, .org, etc.
Country Code Top Level Domains (ccTLD) = .us, .ca, .fi, .uk, etc.
Root servers ({a-m}.root-servers.net) were also used to cover the gTLD domains
- Load on root servers was growing quickly!
- Moving .com, .net, .org off the root servers was clearly necessary to reduce load: done Aug 2000

New gTLDs:
- .info: general info
- .biz: businesses
- .aero: air-transport industry
- .coop: business cooperatives
- .name: individuals
- .pro: accountants, lawyers, and physicians
- .museum: museums
- Only new ones active so far = .info, .biz, .name

DNS performance:
- No centralized caching per site
  - Each machine runs its own caching local server
  - Why is this a problem?
  - How many hosts do we need to share a cache? Recent studies suggest 10-20 hosts
- Hit rate for DNS = 80%, where hit rate = 1 - (#DNS/#connections)
  - Is this good or bad?
- Most Internet traffic is Web
  - What does a typical page look like? An average of 4-5 embedded objects, which needs 4-5 transfers
  - This alone accounts for the 80% hit rate! E.g., one DNS lookup followed by 5 connections to the same server gives 1 - 1/5 = 80%
- Lower TTLs for A records do not affect performance
- DNS performance really relies more on NS-record caching

Socket programming
Goal: learn how to build client/server applications that communicate using sockets

Socket API
- Introduced in BSD4.1 UNIX, 1981
- Explicitly created, used, released by apps
- Client/server paradigm
- Two types of transport service via the socket API:
  - Unreliable datagram
  - Reliable, byte stream-oriented

socket: a host-local, application-created, OS-controlled interface (a "door") into which an application process can both send and receive messages to/from another application process

Server and client exchange messages over the network through a common Socket API.

[Figure: clients and a server in user space, each reaching the network through the Socket API; below it, in kernel space, the TCP/UDP and IP layers; in hardware, the Ethernet adapter. The server is reached through its ports.]

Socket: a door between the application process and the end-to-end transport protocol (UDP or TCP)
TCP service: reliable transfer of bytes from one process to another

[Figure: two hosts (host or server) connected through the internet. On each, the process and its socket are controlled by the application developer; TCP, with its buffers and variables, is controlled by the operating system.]

Client must contact server:
- Server process must first be running
- Server must have created a socket (door) that welcomes the client's contact

Client contacts server by:
- Creating a client-local TCP socket
- Specifying the IP address and port number of the server process
- When the client creates its socket, the client TCP establishes a connection to the server TCP

When contacted by the client, the server TCP creates a new socket for the server process to communicate with that client
- Allows the server to talk with multiple clients
- Source port numbers are used to distinguish clients (more in Chap 3)

Application viewpoint: TCP provides reliable, in-order transfer of bytes ("pipe") between client and server

A stream is a sequence of characters that flow into or out of a process. An input stream is attached to some input source for the process, e.g., keyboard or socket. An output stream is attached to an output destination, e.g., monitor or socket.

Example client-server app:
1) Client reads a line from standard input (inFromUser stream) and sends it to the server via its socket (outToServer stream)
2) Server reads the line from its socket
3) Server converts the line to uppercase and sends it back to the client
4) Client reads and prints the modified line from its socket (inFromServer stream)

[Figure: the client process reads from the keyboard through the inFromUser input stream, writes to its client TCP socket (clientSocket) through the outToServer output stream, and reads the reply from the socket through the inFromServer input stream before printing it on the monitor; the socket sends to and receives from the network.]

Client/server socket interaction: TCP

Server (running on hostid):
- Create socket, port=x, for incoming request: welcomeSocket = ServerSocket()
- Wait for incoming connection request: connectionSocket = welcomeSocket.accept()
- Read request from connectionSocket
- Write reply to connectionSocket
- Close connectionSocket

Client:
- Create socket, connect to hostid, port=x: clientSocket = Socket()
- Send request using clientSocket
- Read reply from clientSocket
- Close clientSocket

TCP connection setup takes place between the client's connect and the server's accept().
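The same interaction with the C socket calls: socket() + bind() + listen() play the role of ServerSocket(), connect() that of Socket(), and accept() returns the connection socket. A minimal sketch that runs both sides in one process on the loopback address so it can be checked without a second program; tcp_uppercase_roundtrip is a hypothetical name, the kernel picks port x, and it assumes a short line arrives in a single read() (a real server would loop on read()).

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <ctype.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns bytes the client received, or -1 on error. */
ssize_t tcp_uppercase_roundtrip(const char *line, char *reply, size_t replylen)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    int welcome = -1, client = -1, conn = -1;
    ssize_t n = -1, r;
    char req[256];

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* hostid = 127.0.0.1 */
    addr.sin_port = htons(0);                        /* port=x chosen by kernel */

    /* Server: welcoming socket = socket + bind + listen. */
    if ((welcome = socket(AF_INET, SOCK_STREAM, 0)) == -1 ||
        bind(welcome, (struct sockaddr *)&addr, sizeof addr) == -1 ||
        listen(welcome, 1) == -1 ||
        getsockname(welcome, (struct sockaddr *)&addr, &alen) == -1)
        goto out;

    /* Client: create socket and connect to (hostid, port=x). */
    if ((client = socket(AF_INET, SOCK_STREAM, 0)) == -1 ||
        connect(client, (struct sockaddr *)&addr, sizeof addr) == -1)
        goto out;

    /* Server: accept() returns a new connection socket for this client. */
    if ((conn = accept(welcome, NULL, NULL)) == -1)
        goto out;

    /* Client sends the request line. */
    if (write(client, line, strlen(line)) == -1)
        goto out;

    /* Server reads it, converts it to uppercase, writes the reply. */
    if ((r = read(conn, req, sizeof req)) <= 0)
        goto out;
    for (ssize_t i = 0; i < r; i++)
        req[i] = (char)toupper((unsigned char)req[i]);
    if (write(conn, req, (size_t)r) != r)
        goto out;

    /* Client reads the modified line. */
    if ((n = read(client, reply, replylen - 1)) >= 0)
        reply[n] = '\0';

out:
    if (conn != -1) close(conn);
    if (client != -1) close(client);
    if (welcome != -1) close(welcome);
    return n;
}
```

In a real deployment the server part runs in its own long-lived process that loops around accept(), and the client part in the user's program.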

UDP: no "connection" between client and server
- No handshaking
- Sender explicitly attaches the IP address and port of the destination to each packet
- Server must extract the IP address and port of the sender from the received packet

UDP: transmitted data may be received out of order, or lost

Application viewpoint: UDP provides unreliable transfer of groups of bytes ("datagrams") between client and server

Client/server socket interaction (UDP):

Server (running on hostid):
1. create socket, port=x, for incoming request: serverSocket = DatagramSocket()
2. read request from serverSocket
3. write reply to serverSocket, specifying the client host address and port number

Client:
1. create socket: clientSocket = DatagramSocket()
2. create address (hostid, port=x), send datagram request using clientSocket
3. read reply from clientSocket
4. close clientSocket

[Figure: client process using a client UDP socket. Output: sends packet (where TCP sent a byte stream). Input: receives packet (where TCP received a byte stream).]

A socket address structure contains the protocol-specific addressing information that is passed from the user process to the kernel and vice versa. Each protocol supported by a socket implementation has its own socket address structure, named sockaddr_suffix, where suffix represents the protocol family. Examples:
- sockaddr_in: Internet/IPv4 socket address structure
- sockaddr_ipx: IPX socket address structure

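The generic socket address structure:

struct sockaddr {
    sa_family_t sa_family;   /* address family */
    char        sa_data[14]; /* protocol-specific data */
};

The Internet/IPv4 socket address structure:

struct sockaddr_in {
    sa_family_t    sin_family;  /* Internet address family (AF_INET) */
    in_port_t      sin_port;    /* transport-layer port number */
    struct in_addr sin_addr;    /* IP address */
    char           sin_zero[8]; /* padding */
};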
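As a minimal sketch (the helper names make_server_addr() and family_of() are hypothetical), a server usually fills in a struct sockaddr_in like this and then passes it to bind() cast to the generic struct sockaddr *. The family field sits at the same offset in every address structure, which is how the kernel tells the structures apart.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* htons(), htonl() */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_ANY */
#include <string.h>      /* memset() */
#include <sys/socket.h>  /* AF_INET, struct sockaddr */

/* Hypothetical helper: fill in the IPv4 address a server binds to.
 * INADDR_ANY means "accept connections on any local interface". */
void make_server_addr(struct sockaddr_in *srv, unsigned short port)
{
    memset(srv, 0, sizeof(*srv));             /* zero everything, incl. sin_zero */
    srv->sin_family = AF_INET;                /* Internet address family */
    srv->sin_port = htons(port);              /* port, network byte order */
    srv->sin_addr.s_addr = htonl(INADDR_ANY); /* address, network byte order */
}

/* The socket calls take the generic type; the family field is at the
 * same offset as sa_family, so it can be read through struct sockaddr. */
sa_family_t family_of(const struct sockaddr *sa)
{
    return sa->sa_family;
}
```

Usage: bind(fd, (struct sockaddr *)&srv, sizeof(srv)) after make_server_addr(&srv, 80).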

- int8_t: signed 8-bit integer, <sys/types.h>
- uint8_t: unsigned 8-bit integer, <sys/types.h>
- int16_t: signed 16-bit integer, <sys/types.h>
- uint16_t: unsigned 16-bit integer, <sys/types.h>
- int32_t: signed 32-bit integer, <sys/types.h>
- uint32_t: unsigned 32-bit integer, <sys/types.h>
- sa_family_t: address family of a socket address structure, <sys/socket.h>
- socklen_t: length of a socket address structure, <sys/socket.h>
- in_addr_t: IPv4 address, normally uint32_t, <netinet/in.h>
- in_port_t: TCP/UDP port, normally uint16_t, <netinet/in.h>

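Byte ordering
- Network byte order (big-endian) vs. host byte order
- htons()/htonl(): host to network (16-bit short / 32-bit long)
- ntohs()/ntohl(): network to host (16-bit short / 32-bit long)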
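A small sketch of why the conversions matter: put_u16_net() and get_u16_net() (hypothetical names) write a 16-bit value into a byte buffer in network byte order and read it back. On a little-endian host such as x86, htons() swaps the two bytes; on a big-endian host it does nothing, so portable code calls it unconditionally.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>  /* htons(), htonl(), ntohs(), ntohl() */
#include <stdint.h>     /* uint8_t, uint16_t */
#include <string.h>     /* memcpy() */

/* Store a 16-bit value in buf in network byte order (big-endian):
 * the most significant byte goes first, whatever the host's order. */
void put_u16_net(uint8_t buf[2], uint16_t host_value)
{
    uint16_t net = htons(host_value);  /* host -> network */
    memcpy(buf, &net, sizeof(net));    /* copy the raw bytes */
}

/* Read it back: network -> host. */
uint16_t get_u16_net(const uint8_t buf[2])
{
    uint16_t net;
    memcpy(&net, buf, sizeof(net));
    return ntohs(net);
}
```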

Memory content initialization:
- memset(buffer, value, buffersize)

Data copying and comparison:
- memcpy(dest, src, num_of_bytes)
- memcmp(buffer1, buffer2, num_of_bytes)

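IP address notation conversion (between dotted-decimal strings and 32-bit integer form):
- status = inet_aton(ddstring, &addr): converts a dotted-decimal string into a struct in_addr; returns 1 on success, 0 on error
- ddstring = inet_ntoa(addr): converts a struct in_addr (passed by value) back into a dotted-decimal string
- addr = inet_addr(ddstring): returns the address as an in_addr_t (deprecated, because its error value INADDR_NONE is also the valid address 255.255.255.255)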
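A minimal conversion sketch that swaps in inet_pton()/inet_ntop(), the POSIX successors to inet_aton()/inet_ntoa(): they report errors unambiguously, write into a caller-supplied buffer (inet_ntoa() returns a static buffer that the next call overwrites), and also handle IPv6. The function names dotted_to_u32() and u32_to_dotted() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* inet_pton(), inet_ntop(), htonl(), ntohl() */
#include <netinet/in.h>  /* struct in_addr, INET_ADDRSTRLEN */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>  /* AF_INET */

/* Dotted decimal -> 32-bit integer in host order. Returns 0 on success,
 * -1 on a malformed string (inet_pton() returns 1 on success). */
int dotted_to_u32(const char *dd, uint32_t *host_order)
{
    struct in_addr a;
    if (inet_pton(AF_INET, dd, &a) != 1)
        return -1;
    *host_order = ntohl(a.s_addr);   /* s_addr is stored in network order */
    return 0;
}

/* 32-bit integer in host order -> dotted decimal, written into out.
 * Returns out, or NULL on error. */
const char *u32_to_dotted(uint32_t host_order, char out[INET_ADDRSTRLEN])
{
    struct in_addr a;
    a.s_addr = htonl(host_order);
    return inet_ntop(AF_INET, &a, out, INET_ADDRSTRLEN);
}
```

Unlike inet_addr(), this accepts 255.255.255.255 as an ordinary address.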

sockfd = socket(domain, type, protocol)
- domain: the protocol/address family (AF_INET, AF_IPX, ...)
- type: the type of service (SOCK_STREAM, SOCK_DGRAM)
- protocol: the specific protocol, within the family given as the first parameter (0 selects the default)
- Returns a fresh socket descriptor on success, -1 on error

status = close(sockfd)
- Releases the descriptor; for a TCP socket, data still queued for sending is transmitted and the connection is shut down
- Returns -1 on error

struct sockaddr_in {
    unsigned short sin_family;  /* address family (always AF_INET) */
    unsigned short sin_port;    /* port num in network byte order */
    struct in_addr sin_addr;    /* IP addr in network byte order */
    unsigned char sin_zero[8];  /* pad to sizeof(struct sockaddr) */
};

status = bind(sockfd, ptr_to_sockaddr, sockaddr_size)
- Associates the address in sockaddr with sockfd
- The rules for successful binding depend on the protocol family of the socket (specified in the call to socket)
- Necessary for receiving connections on a STREAM socket

status = listen(sockfd, backlog)
- Announces the willingness to accept connections
- backlog: maximum number of established connections not yet handed to their user processes (by calls to accept)
- On an unbound socket, an implicit bind is done with INADDR_ANY and an ephemeral port as the address and port respectively
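socket(), bind() and listen() are normally called in that order. This minimal sketch (open_listener() is a hypothetical name) binds to the loopback address 127.0.0.1 for illustration; passing port 0 asks the kernel for any free port, and getsockname() reports which one was chosen. A production server would typically bind INADDR_ANY and a well-known port instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket listening on 127.0.0.1. Port 0 requests an
 * ephemeral port; the port actually bound is stored in *port_out
 * (host byte order). Returns the listening descriptor, or -1. */
int open_listener(unsigned short port, int backlog, unsigned short *port_out)
{
    struct sockaddr_in srv;
    socklen_t len = sizeof(srv);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(port);
    srv.sin_addr.s_addr = htonl(0x7F000001u);   /* 127.0.0.1 (loopback) */

    if (bind(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        listen(fd, backlog) < 0 ||
        getsockname(fd, (struct sockaddr *)&srv, &len) < 0) {
        close(fd);
        return -1;
    }
    *port_out = ntohs(srv.sin_port);   /* port chosen by the kernel */
    return fd;
}
```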

connfd = accept(sockfd, ptr_to_sockaddr, ptr_to_sockaddr_size)
- Blocks until a connection is established on sockfd and returns a new file descriptor on which I/O can be performed with the remote entity
- Fills the sockaddr and size parameters with the address information (and its size, respectively) of the connecting entity
- bind and listen are assumed to have been called on sockfd before calling accept

status = connect(sockfd, ptr_to_sockaddr, sockaddr_size)
- For a STREAM socket, initiates a new connection with the entity addressed by sockaddr
- For a DGRAM socket, sets the default remote address for I/O

* The calls above return -1 on error
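The accept()/connect() pairing can be exercised on one machine. In this minimal sketch (loopback_round_trip() is a hypothetical name), a single process plays both roles over 127.0.0.1. This works because the kernel completes the TCP three-way handshake for a listening socket before the server calls accept(), so connect() returns without the server running. In a real program the client and server are separate processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server binds/listens, client connects, accept() returns a NEW
 * descriptor, client writes msg, server reads it into reply.
 * Returns the byte count the server read, or -1 on error. */
int loopback_round_trip(const char *msg, char *reply, size_t reply_len)
{
    struct sockaddr_in srv, cli;
    socklen_t len = sizeof(srv), cli_len = sizeof(cli);
    int lfd, cfd = -1, connfd = -1, n = -1;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(0);                    /* ephemeral port */
    srv.sin_addr.s_addr = htonl(0x7F000001u);   /* 127.0.0.1 */
    if (lfd < 0 || bind(lfd, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        listen(lfd, 5) < 0 || getsockname(lfd, (struct sockaddr *)&srv, &len) < 0)
        goto out;

    cfd = socket(AF_INET, SOCK_STREAM, 0);                    /* client side */
    if (cfd < 0 || connect(cfd, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;

    connfd = accept(lfd, (struct sockaddr *)&cli, &cli_len);  /* server side */
    if (connfd < 0)
        goto out;

    if (write(cfd, msg, strlen(msg)) < 0)          /* client -> server */
        goto out;
    n = (int)read(connfd, reply, reply_len - 1);   /* may return fewer bytes */
    if (n >= 0)
        reply[n] = '\0';
out:
    if (connfd >= 0) close(connfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return n;
}
```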

SEND: int send(int sockfd, const void *msg, int len, int flags);
- msg: message you want to send
- len: length of the message
- flags := 0
- returns the number of bytes actually sent

RECEIVE: int recv(int sockfd, void *buf, int len, unsigned int flags);
- buf: buffer to receive the message
- len: length of the buffer ("don't give me more!")
- flags := 0
- returns the number of bytes received

SEND (DGRAM-style): int sendto(int sockfd, const void *msg, int len, int flags, const struct sockaddr *to, int tolen);
- msg: message you want to send
- len: length of the message
- flags := 0
- to: socket address of the remote process
- tolen: sizeof(struct sockaddr)
- returns the number of bytes actually sent

RECEIVE (DGRAM-style): int recvfrom(int sockfd, void *buf, int len, unsigned int flags, struct sockaddr *from, int *fromlen);
- buf: buffer to receive the message
- len: length of the buffer ("don't give me more!")
- from: socket address of the process that sent the data
- fromlen: set to sizeof(struct sockaddr) before the call; on return it holds the actual size of from
- flags := 0
- returns the number of bytes received

(POSIX declares these calls with size_t lengths, ssize_t return values and socklen_t address lengths; all of them return -1 on error.)
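To see sendto() and recvfrom() together, here is a minimal sketch that runs both ends in one process. The function name udp_loopback_once() is hypothetical, and the server socket is bound to 127.0.0.1 with port 0 so the kernel picks a free port; a real UDP server would bind a well-known port. The client never calls bind(): the kernel assigns its port on the first sendto(), and recvfrom() reveals it through the from address.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Send one datagram from a client socket to a server socket on
 * 127.0.0.1 and receive it with recvfrom(). Returns the byte count
 * received and stores the sender's port (host order) in *from_port;
 * returns -1 on error. */
ssize_t udp_loopback_once(const char *msg, char *buf, size_t buflen,
                          unsigned short *from_port)
{
    struct sockaddr_in srv, from;
    socklen_t len = sizeof(srv), fromlen = sizeof(from);
    ssize_t n = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);    /* "server" */
    int c = socket(AF_INET, SOCK_DGRAM, 0);    /* "client" */

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_port = htons(0);                   /* ephemeral port */
    srv.sin_addr.s_addr = htonl(0x7F000001u);  /* 127.0.0.1 */
    if (s < 0 || c < 0 ||
        bind(s, (struct sockaddr *)&srv, sizeof(srv)) < 0 ||
        getsockname(s, (struct sockaddr *)&srv, &len) < 0)
        goto out;

    /* No connection: the destination travels with every datagram. */
    if (sendto(c, msg, strlen(msg), 0, (struct sockaddr *)&srv, sizeof(srv)) < 0)
        goto out;

    /* recvfrom() also reports who sent the datagram. */
    n = recvfrom(s, buf, buflen, 0, (struct sockaddr *)&from, &fromlen);
    if (n >= 0)
        *from_port = ntohs(from.sin_port);
out:
    if (s >= 0) close(s);
    if (c >= 0) close(c);
    return n;
}
```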

CLOSE: close (socketfd);

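[Figure: socket call sequences. Connection-oriented: SOCKET, BIND, LISTEN and ACCEPT on the server, CONNECT on the client (TCP three-way handshake), then SEND/RECEIVE on both sides and CLOSE; a concurrent server is also shown. A second sequence: CREATE, BIND, SEND, RECEIVE, SEND, CLOSE.]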

TCP Server

For example: web server

What does a web server need to do so that a web client can connect to it?

[Figure: web server on port 80, above the Ethernet adapter.]

Socket I/O: socket()

Since web traffic uses TCP, the web server must create a socket of type SOCK_STREAM:

int fd; /* socket descriptor */

if ((fd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket");
    exit(1);
}

- socket returns an integer (socket descriptor)
- fd < 0 indicates that an error occurred
- AF_INET associates the socket with the Internet protocol family
- SOCK_STREAM selects the TCP protocol

Socket I/O: bind()

A socket can be bound to a port:

int fd;                 /* socket descriptor */
struct sockaddr_in srv; /* used by bind() */

/* create the socket */

memset(&srv, 0, sizeof(srv));            /* clear the structure, incl. sin_zero */
srv.sin_family = AF_INET;                /* use the Internet addr family */
srv.sin_port = htons(80);                /* bind socket fd to port 80 */
srv.sin_addr.s_addr = htonl(INADDR_ANY); /* bind: a client may connect to any of my addresses */

if (bind(fd, (struct sockaddr *) &srv, sizeof(srv)) < 0) {
    perror("bind");
    exit(1);
}

Still not quite ready to communicate with a client...

listen indicates that the server will accept a connection:

int fd;                 /* socket descriptor */
struct sockaddr_in srv; /* used by bind() */

/* 1) create the socket */
/* 2) bind the socket to a port */

if (listen(fd, 5) < 0) {
    perror("listen");
    exit(1);
}

Still not quite ready to communicate with a client...

accept blocks waiting for a connection:

int fd;                          /* socket descriptor */
struct sockaddr_in srv;          /* used by bind() */
struct sockaddr_in cli;          /* used by accept() */
int newfd;                       /* returned by accept() */
socklen_t cli_len = sizeof(cli); /* used by accept() */

/* 1) create the socket */
/* 2) bind the socket to a port */
/* 3) listen on the socket */

newfd = accept(fd, (struct sockaddr*) &cli, &cli_len);
if (newfd < 0) {
    perror("accept");
    exit(1);
}

- accept returns a new socket (newfd) with the same properties as the original socket (fd)
- newfd < 0 indicates that an error occurred


How does the server know which client it is?


- cli.sin_addr.s_addr contains the client's IP address
- cli.sin_port contains the client's port number
(Both fields are in network byte order.)

Now the server can exchange data with the client by using read and write on the descriptor newfd.

Why does accept need to return a new descriptor?
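Because both fields of cli are in network byte order, they must be converted before printing. A minimal sketch using inet_ntop() (the thread-safe POSIX alternative to inet_ntoa()); peer_to_string() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* inet_ntop(), ntohs() */
#include <netinet/in.h>  /* struct sockaddr_in, INET_ADDRSTRLEN */
#include <stdio.h>       /* snprintf() */
#include <string.h>
#include <sys/socket.h>

/* Turn the address filled in by accept() (or recvfrom()) into
 * "a.b.c.d:port". Returns out, or NULL on error. */
char *peer_to_string(const struct sockaddr_in *cli, char *out, size_t outlen)
{
    char ip[INET_ADDRSTRLEN];
    if (inet_ntop(AF_INET, &cli->sin_addr, ip, sizeof(ip)) == NULL)
        return NULL;
    snprintf(out, outlen, "%s:%u", ip, (unsigned)ntohs(cli->sin_port));
    return out;
}
```

For example, after accept() a server could call printf("client %s\n", peer_to_string(&cli, s, sizeof s)).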


read can be used with a socket:
- read blocks waiting for data from the client but does not guarantee that sizeof(buf) bytes are read

int fd;        /* socket descriptor */
char buf[512]; /* used by read() */
int nbytes;    /* used by read() */

/* 1) create the socket */
/* 2) bind the socket to a port */
/* 3) listen on the socket */
/* 4) accept the incoming connection */

if ((nbytes = read(newfd, buf, sizeof(buf))) < 0) {
    perror("read");
    exit(1);
}
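Because read() may return fewer bytes than requested, code that needs an exact number of bytes usually loops. The helper below is a common pattern (often called readn), sketched here under the hypothetical name read_full(); it also retries when a signal interrupts the call (EINTR).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Keep reading until n bytes arrive, the peer closes (EOF), or an
 * error occurs. Returns the number of bytes read (less than n only
 * at EOF), or -1 on error. */
ssize_t read_full(int fd, void *buf, size_t n)
{
    size_t got = 0;
    while (got < n) {
        ssize_t r = read(fd, (char *)buf + got, n - got);
        if (r < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            break;                 /* EOF: peer closed the connection */
        got += (size_t)r;
    }
    return (ssize_t)got;
}
```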



For example: web client

How does a web client connect to a web server?

[Figure: two web clients above TCP, IP and the Ethernet adapter.]

IP addresses are commonly written as strings ("128.2.35.50"), but programs deal with IP addresses as integers. Converting strings to numerical address:

struct sockaddr_in srv;
srv.sin_addr.s_addr = inet_addr("128.2.35.50");
if (srv.sin_addr.s_addr == (in_addr_t) -1) {
    fprintf(stderr, "inet_addr failed!\n");
    exit(1);
}

Converting a numerical address to a string:

struct sockaddr_in srv;
char *t = inet_ntoa(srv.sin_addr);
if (t == 0) {
    fprintf(stderr, "inet_ntoa failed!\n");
    exit(1);
}

gethostbyname() provides an interface to DNS. Additional useful calls:
- gethostbyaddr(): returns a hostent given an IP address (e.g., the sin_addr field of a sockaddr_in)
- getservbyname(): used to get a service description (typically the port number); returns a servent based on the service name

#include <netdb.h>

struct hostent *hp;          /* ptr to host info for remote */
struct sockaddr_in peeraddr;
char *name = "www.cs.cmu.edu";

peeraddr.sin_family = AF_INET;
hp = gethostbyname(name);
if (hp == NULL) {            /* lookup failed */
    fprintf(stderr, "gethostbyname failed for %s\n", name);
    exit(1);
}
peeraddr.sin_addr.s_addr = ((struct in_addr*)(hp->h_addr))->s_addr;
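gethostbyname() returns NULL when a lookup fails, and it handles only IPv4. As a hedged sketch, the helper below swaps in getaddrinfo(), the POSIX interface that replaces both gethostbyname() and getservbyname(); resolve_ipv4() is a hypothetical name, and the hints restrict results to IPv4 TCP addresses so the result fits a struct sockaddr_in.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netdb.h>       /* getaddrinfo(), freeaddrinfo(), struct addrinfo */
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Resolve host and service (name or number) into an IPv4 TCP socket
 * address. A failed lookup is reported instead of dereferencing a
 * NULL pointer. Returns 0 on success, -1 on failure. */
int resolve_ipv4(const char *host, const char *service, struct sockaddr_in *out)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET;        /* IPv4 only, to match sockaddr_in */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, service, &hints, &res) != 0)
        return -1;
    memcpy(out, res->ai_addr, sizeof(*out));   /* use the first result */
    freeaddrinfo(res);
    return 0;
}
```

Usage: resolve_ipv4("www.cs.cmu.edu", "80", &peeraddr) fills in family, address and port in one call.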

connect allows a client to connect to a server...

int fd;                 /* socket descriptor */
struct sockaddr_in srv; /* used by connect() */

/* create the socket */

memset(&srv, 0, sizeof(srv));
srv.sin_family = AF_INET;                       /* connect: use the Internet address family */
srv.sin_port = htons(80);                       /* connect: socket fd to port 80 */
srv.sin_addr.s_addr = inet_addr("128.2.35.50"); /* connect: connect to IP Address 128.2.35.50 */

if (connect(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("connect");
    exit(1);
}

write can be used with a socket:

int fd;                 /* socket descriptor */
struct sockaddr_in srv; /* used by connect() */
char buf[512];          /* used by write() */
int nbytes;             /* used by write() */

/* 1) create the socket */
/* 2) connect() to the server */

/* Example: A client could write a request to a server */
if ((nbytes = write(fd, buf, sizeof(buf))) < 0) {
    perror("write");
    exit(1);
}

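TCP Server: socket(), bind(), listen(), accept(), read(), write(), read(), close()
TCP Client: socket(), connect(), write(), read(), close()
- client connect() / server accept(): connection establishment
- client write() to server read(): data (request)
- server write() to client read(): data (reply)
- client close(), seen by the server's read(): end-of-file notification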

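Example: C client (TCP)

/* client.c */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;    /* structure to hold an IP address */
    int clientSocket;          /* socket descriptor */
    struct hostent *ptrh;      /* pointer to a host table entry */
    char Sentence[128];
    char modifiedSentence[128];
    char *host;
    int port, n;

    if (argc != 3) { fprintf(stderr, "usage: %s host port\n", argv[0]); exit(1); }
    host = argv[1];
    port = atoi(argv[2]);

    /* Create client socket, connect to server */
    clientSocket = socket(PF_INET, SOCK_STREAM, 0);
    memset((char *)&sad, 0, sizeof(sad));   /* clear sockaddr structure */
    sad.sin_family = AF_INET;                /* set family to Internet */
    sad.sin_port = htons((unsigned short)port);
    ptrh = gethostbyname(host);              /* convert host name to IP address */
    if (ptrh == NULL) { fprintf(stderr, "unknown host %s\n", host); exit(1); }
    memcpy(&sad.sin_addr, ptrh->h_addr, ptrh->h_length);
    if (connect(clientSocket, (struct sockaddr *)&sad, sizeof(sad)) < 0) { perror("connect"); exit(1); }

Example: C client (TCP), cont.

    /* Get input stream from user (fgets, unlike gets, cannot overflow Sentence) */
    if (fgets(Sentence, sizeof(Sentence), stdin) == NULL) exit(1);

    /* Send line to server, including the terminating '\0' */
    n = write(clientSocket, Sentence, strlen(Sentence) + 1);
    if (n < 0) { perror("write"); exit(1); }

    /* Read line from server */
    n = read(clientSocket, modifiedSentence, sizeof(modifiedSentence) - 1);
    if (n < 0) { perror("read"); exit(1); }
    modifiedSentence[n] = '\0';

    printf("FROM SERVER: %s\n", modifiedSentence);

    /* Close connection */
    close(clientSocket);
    return 0;
}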

Example: C server (TCP)

/* server.c */
#include <arpa/inet.h>
#include <ctype.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold the server's address */
    struct sockaddr_in cad;   /* structure to hold the client's address */
    socklen_t alen;           /* length of the client's address */
    int welcomeSocket, connectionSocket; /* socket descriptors */
    char clientSentence[128];
    char capitalizedSentence[128];
    int port, n, i;

    if (argc != 2) { fprintf(stderr, "usage: %s port\n", argv[0]); exit(1); }
    port = atoi(argv[1]);

    /* Create welcoming socket at port & bind a local address */
    welcomeSocket = socket(PF_INET, SOCK_STREAM, 0);
    memset((char *)&sad, 0, sizeof(sad));      /* clear sockaddr structure */
    sad.sin_family = AF_INET;                   /* set family to Internet */
    sad.sin_addr.s_addr = htonl(INADDR_ANY);    /* set the local IP address */
    sad.sin_port = htons((unsigned short)port); /* set the port number */
    if (bind(welcomeSocket, (struct sockaddr *)&sad, sizeof(sad)) < 0) { perror("bind"); exit(1); }

Example: C server (TCP), cont.

    /* Specify the maximum number of clients that can be queued */
    listen(welcomeSocket, 10);

    while (1) {
        /* Wait on welcoming socket for contact by a client */
        alen = sizeof(cad);
        connectionSocket = accept(welcomeSocket, (struct sockaddr *)&cad, &alen);
        if (connectionSocket < 0) { perror("accept"); continue; }

        n = read(connectionSocket, clientSentence, sizeof(clientSentence) - 1);
        if (n > 0) {
            clientSentence[n] = '\0';
            /* capitalize Sentence and store the result in capitalizedSentence */
            for (i = 0; clientSentence[i] != '\0'; i++)
                capitalizedSentence[i] = toupper((unsigned char)clientSentence[i]);
            capitalizedSentence[i] = '\0';
            /* Write out the result to socket */
            write(connectionSocket, capitalizedSentence, strlen(capitalizedSentence) + 1);
        }
        close(connectionSocket);
    } /* End of while loop: loop back and wait for another client connection */
}

Outline for typical concurrent server

[Figure: status transitions of the listening and connected sockets in a concurrent server: after return from accept, after fork() returns, and after socket close().]
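The three stages in the figure can be sketched in C as follows. This is a minimal sketch, not the original slide's code: accept_and_fork() and handle_client() are hypothetical names, the child simply echoes data back, and reaping children (waitpid() or a SIGCHLD handler) is left to the caller, which a real server must do to avoid zombie processes.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>   /* htonl() for callers building addresses */
#include <netinet/in.h>  /* struct sockaddr_in for callers */
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>    /* waitpid() for reaping children */
#include <unistd.h>

/* Hypothetical per-client work: echo everything back until EOF. */
static void handle_client(int connfd)
{
    char buf[512];
    ssize_t n;
    while ((n = read(connfd, buf, sizeof(buf))) > 0)
        if (write(connfd, buf, (size_t)n) != n)
            break;
}

/* After accept(): the parent holds listenfd and connfd.
 * After fork():   parent and child each hold both descriptors.
 * After close():  the child keeps only connfd, the parent only listenfd.
 * Returns the child's pid, or -1 on error. A real server loops on this. */
pid_t accept_and_fork(int listenfd)
{
    pid_t pid;
    int connfd = accept(listenfd, NULL, NULL);
    if (connfd < 0)
        return -1;
    pid = fork();
    if (pid == 0) {              /* child */
        close(listenfd);         /* the child never accepts new clients */
        handle_client(connfd);
        close(connfd);
        _exit(0);
    }
    close(connfd);               /* parent: the child owns this connection */
    return pid;                  /* -1 if fork() failed */
}
```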

Socket programming with UDP



For example: NTP daemon

What does a UDP server need to do so that a UDP client can connect to it?

[Figure: NTP daemon on UDP port 123, above IP and the Ethernet adapter.]

The UDP server must create a datagram socket:

int fd; /* socket descriptor */

if ((fd = socket(AF_INET, SOCK_DGRAM, 0)) < 0) {
    perror("socket");
    exit(1);
}

- socket returns an integer (socket descriptor)
- fd < 0 indicates that an error occurred
- AF_INET: associates a socket with the Internet protocol family
- SOCK_DGRAM: selects the UDP protocol

A socket can be bound to a port:

int fd;                 /* socket descriptor */
struct sockaddr_in srv; /* used by bind() */

/* create the socket */

memset(&srv, 0, sizeof(srv));
srv.sin_family = AF_INET;                /* bind: use the Internet address family */
srv.sin_port = htons(80);                /* bind: socket fd to port 80 */
srv.sin_addr.s_addr = htonl(INADDR_ANY); /* bind: a client may connect to any of my addresses */

if (bind(fd, (struct sockaddr*) &srv, sizeof(srv)) < 0) {
    perror("bind");
    exit(1);
}

Now the UDP server is ready to accept packets.

read does not provide the client's address to the UDP server, so the server uses recvfrom instead:

int fd;                          /* socket descriptor */
struct sockaddr_in srv;          /* used by bind() */
struct sockaddr_in cli;          /* used by recvfrom() */
char buf[512];                   /* used by recvfrom() */
socklen_t cli_len = sizeof(cli); /* used by recvfrom() */
int nbytes;                      /* used by recvfrom() */

/* 1) create the socket */
/* 2) bind to the socket */

nbytes = recvfrom(fd, buf, sizeof(buf), 0 /* flags */, (struct sockaddr*) &cli, &cli_len);
if (nbytes < 0) {
    perror("recvfrom");
    exit(1);
}

The actions performed by recvfrom:
- returns the number of bytes read (nbytes)
- copies nbytes of data into buf
- returns the address of the client (cli)
- returns the length of cli (cli_len)
- don't worry about flags

How does a UDP client communicate with a UDP server?

[Figure: two UDP clients, each with its own port, above the transport layer, IP and the Ethernet adapter.]

- write is not allowed: an unconnected UDP socket has no default destination, so each datagram is sent with sendto
- Notice that the UDP client does not bind a port number; a port number is dynamically assigned when the first sendto is called

int fd;                 /* socket descriptor */
struct sockaddr_in srv; /* used by sendto() */
char buf[512];          /* data to send */
int nbytes;

/* 1) create the socket */

/* sendto: send data to IP Address 128.2.35.50 port 80 */
srv.sin_family = AF_INET;
srv.sin_port = htons(80);
srv.sin_addr.s_addr = inet_addr("128.2.35.50");

nbytes = sendto(fd, buf, sizeof(buf), 0 /* flags */, (struct sockaddr*) &srv, sizeof(srv));
if (nbytes < 0) {
    perror("sendto");
    exit(1);
}
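The rule that write is not allowed applies to an unconnected UDP socket. Calling connect() on a DGRAM socket only records a default remote address (no packets are exchanged), after which write() and send() do work. A minimal sketch with the hypothetical functions unconnected_write_fails() and connected_udp_send(); the receiving socket is bound to 127.0.0.1 on a kernel-chosen port for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if write() on an unconnected UDP socket fails because no
 * destination is known (errno EDESTADDRREQ), 0 otherwise. */
int unconnected_write_fails(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int failed;
    if (fd < 0)
        return 0;
    failed = (write(fd, "x", 1) < 0 && errno == EDESTADDRREQ);
    close(fd);
    return failed;
}

/* connect() records the default destination; write() then works.
 * Returns the bytes received by the peer socket, or -1 on error. */
ssize_t connected_udp_send(const char *msg, char *buf, size_t buflen)
{
    struct sockaddr_in srv;
    socklen_t len = sizeof(srv);
    ssize_t n = -1;
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    int c = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&srv, 0, sizeof(srv));
    srv.sin_family = AF_INET;
    srv.sin_addr.s_addr = htonl(0x7F000001u);   /* 127.0.0.1, port 0 */
    if (s >= 0 && c >= 0 &&
        bind(s, (struct sockaddr *)&srv, sizeof(srv)) == 0 &&
        getsockname(s, (struct sockaddr *)&srv, &len) == 0 &&
        connect(c, (struct sockaddr *)&srv, sizeof(srv)) == 0 &&
        write(c, msg, strlen(msg)) >= 0)
        n = recv(s, buf, buflen, 0);
    if (s >= 0) close(s);
    if (c >= 0) close(c);
    return n;
}
```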

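UDP Server: socket(), bind(), recvfrom() (blocks until a datagram is received from a client), sendto()
UDP Client: socket(), sendto(), recvfrom(), close()
- client sendto() to server recvfrom(): data (request)
- server sendto() to client recvfrom(): data (reply)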

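Example: C client (UDP)

/* client.c */
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;    /* structure to hold an IP address */
    int clientSocket;          /* socket descriptor */
    struct hostent *ptrh;      /* pointer to a host table entry */
    char Sentence[128];
    char modifiedSentence[128];
    char *host;
    int port, n;
    socklen_t addr_len;

    if (argc != 3) { fprintf(stderr, "usage: %s host port\n", argv[0]); exit(1); }
    host = argv[1];
    port = atoi(argv[2]);

    /* Create client socket, NO connection to server */
    clientSocket = socket(PF_INET, SOCK_DGRAM, 0);

    /* determine the server's address */
    memset((char *)&sad, 0, sizeof(sad));   /* clear sockaddr structure */
    sad.sin_family = AF_INET;                /* set family to Internet */
    sad.sin_port = htons((unsigned short)port);
    ptrh = gethostbyname(host);              /* convert host name to IP address */
    if (ptrh == NULL) { fprintf(stderr, "unknown host %s\n", host); exit(1); }
    memcpy(&sad.sin_addr, ptrh->h_addr, ptrh->h_length);

Example: C client (UDP), cont.

    /* Get input stream from user */
    if (fgets(Sentence, sizeof(Sentence), stdin) == NULL) exit(1);

    /* Send line to server */
    addr_len = sizeof(sad);
    n = sendto(clientSocket, Sentence, strlen(Sentence) + 1, 0, (struct sockaddr *)&sad, addr_len);
    if (n < 0) { perror("sendto"); exit(1); }

    /* Read line from server */
    n = recvfrom(clientSocket, modifiedSentence, sizeof(modifiedSentence) - 1, 0, (struct sockaddr *)&sad, &addr_len);
    if (n < 0) { perror("recvfrom"); exit(1); }
    modifiedSentence[n] = '\0';

    printf("FROM SERVER: %s\n", modifiedSentence);

    /* Close connection */
    close(clientSocket);
    return 0;
}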

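Example: C server (UDP)

/* server.c */
#include <arpa/inet.h>
#include <ctype.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

int main(int argc, char *argv[])
{
    struct sockaddr_in sad;   /* structure to hold the server's address */
    struct sockaddr_in cad;   /* structure to hold the client's address */
    socklen_t addr_len;
    int serverSocket;         /* socket descriptor */
    char clientSentence[128];
    char capitalizedSentence[128];
    int port, n, i;

    if (argc != 2) { fprintf(stderr, "usage: %s port\n", argv[0]); exit(1); }
    port = atoi(argv[1]);

    /* Create welcoming socket at port & bind a local address */
    serverSocket = socket(PF_INET, SOCK_DGRAM, 0);
    memset((char *)&sad, 0, sizeof(sad));      /* clear sockaddr structure */
    sad.sin_family = AF_INET;                   /* set family to Internet */
    sad.sin_addr.s_addr = htonl(INADDR_ANY);    /* set the local IP address */
    sad.sin_port = htons((unsigned short)port); /* set the port number */
    if (bind(serverSocket, (struct sockaddr *)&sad, sizeof(sad)) < 0) { perror("bind"); exit(1); }

    while (1) {
        addr_len = sizeof(cad);
        n = recvfrom(serverSocket, clientSentence, sizeof(clientSentence) - 1, 0, (struct sockaddr *)&cad, &addr_len);
        if (n < 0) continue;
        clientSentence[n] = '\0';
        /* capitalize Sentence and store the result in capitalizedSentence */
        for (i = 0; clientSentence[i] != '\0'; i++)
            capitalizedSentence[i] = toupper((unsigned char)clientSentence[i]);
        capitalizedSentence[i] = '\0';
        /* reply to the address reported by recvfrom() */
        sendto(serverSocket, capitalizedSentence, strlen(capitalizedSentence) + 1, 0, (struct sockaddr *)&cad, addr_len);
    }
}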

How can the UDP server service multiple ports simultaneously?

[Figure: one UDP server with sockets on port 2000 and port 3000, above UDP, IP and the Ethernet adapter.]

int s1; /* socket descriptor 1 */
int s2; /* socket descriptor 2 */

/* 1) create socket s1 */
/* 2) create socket s2 */
/* 3) bind s1 to port 2000 */
/* 4) bind s2 to port 3000 */

while (1) {
    recvfrom(s1, buf, sizeof(buf), ...);
    /* process buf */
    recvfrom(s2, buf, sizeof(buf), ...);
    /* process buf */
}

What problems does this code have?

Server Flaw

[Figure: timeline. Client 1 calls connect and the server's accept returns; the server then calls read and blocks waiting for data from Client 1, while Client 1 blocks waiting for its user, who has gone out to lunch, to type in data. Client 2 calls connect and blocks waiting to complete its connection request until after lunch!]

Concurrent Servers

[Figure: timeline with a concurrent server. Client 1 connects and then blocks waiting for its user, who goes out to lunch; the server does not block on Client 1's read and calls accept again, so Client 2's connect returns and Client 2 writes, reads and closes while Client 1 is still idle.]

/* error() is the program's own print-and-exit helper (not shown) */
while (1) {
    newsock = (int *)malloc(sizeof(int));    /* one cell per connection, freed by the thread */
    fromlen = sizeof(from);                  /* reset before every accept() */
    *newsock = accept(sock, (struct sockaddr *)&from, &fromlen);
    if (*newsock < 0) error("Accepting");
    printf("A connection has been accepted from %s\n", inet_ntoa(from.sin_addr));
    retval = pthread_create(&tid, NULL, ConnectionThread, (void *)newsock);
    if (retval != 0) {
        error("Error, could not create thread");
    }
}

/****** ConnectionThread **********/
void *ConnectionThread(void *arg)
{
    int sock, n, len;
    char buffer[BUFSIZE];
    char *msg = "Got your message";

    sock = *(int *)arg;
    free(arg);                        /* release the cell allocated by the accept loop */
    pthread_detach(pthread_self());   /* nobody joins this thread: free its resources on exit */
    len = strlen(msg);
    n = read(sock, buffer, BUFSIZE - 1);
    while (n > 0) {
        buffer[n] = '\0';
        printf("Message is %s\n", buffer);
        n = write(sock, msg, len);
        if (n < len) error("Error writing");
        n = read(sock, buffer, BUFSIZE - 1);
        if (n < 0) error("Error reading");
    }
    if (close(sock) < 0) error("closing");
    pthread_exit(NULL);
}

Concurrency
- Threading
  - Easier to understand
  - Race conditions increase complexity
- select()
  - Explicit control flows, no race conditions
  - Explicit control is more complicated

There is no clear winner, but you MUST use select()

What is select()?
- Monitors multiple descriptors

How does it work?
- Set up sets of sockets to monitor
- select() blocks until something happens
- Something could be:
  - Incoming connection: accept()
  - Clients sending data: read()
  - Pending data to send: write()
  - Timeout

Concurrency Step 1
Allowing address reuse:

int sock, opts = 1;
sock = socket(...); // To give you an idea of where the new code goes
setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &opts, sizeof(opts));

Then we set the sockets to be nonblocking:

if ((opts = fcntl(sock, F_GETFL)) < 0) { // Get current options
    printf("Error...\n");
    ...
}
opts = (opts | O_NONBLOCK); // Don't clobber your old settings
if (fcntl(sock, F_SETFL, opts) < 0) {
    printf("Error...\n");
    ...
}
bind(...); // To again give you an idea where the new code goes
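The F_GETFL/F_SETFL sequence above is usually wrapped in a small helper. A minimal sketch with the hypothetical names set_nonblocking() and would_block(); the second function shows the observable effect: with no data available, read() fails immediately with EAGAIN or EWOULDBLOCK instead of sleeping.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Add O_NONBLOCK to fd's existing file status flags without
 * clobbering the others. Returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* On a non-blocking descriptor with no data, read() returns -1 at
 * once with errno EAGAIN (or EWOULDBLOCK). */
int would_block(int fd)
{
    char c;
    return read(fd, &c, 1) < 0 && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```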

Concurrency Step 2
Monitor sockets with select():

int select(int maxfd, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

(pselect() is the variant that takes a const struct timespec * timeout.)

- maxfd: max file descriptor + 1
- fd_set: bit vector with FD_SETSIZE bits
  - readfds: bit vector of read descriptors to monitor
  - writefds: bit vector of write descriptors to monitor
  - exceptfds: set to NULL
- timeout: how long to wait without activity before returning

What about bit vectors?
- void FD_ZERO(fd_set *fdset); clear out all bits
- void FD_SET(int fd, fd_set *fdset); set one bit
- void FD_CLR(int fd, fd_set *fdset); clear one bit
- int FD_ISSET(int fd, fd_set *fdset); test whether the fd bit is set

The Server

// socket() call and non-blocking code is above this point
if (bind(sockfd, (struct sockaddr *) &saddr, sizeof(saddr)) < 0) { // bind!
    printf("Error binding\n");
    ...
}
if (listen(sockfd, 5) < 0) { // listen for incoming connections
    printf("Error listening\n");
    ...
}
// Set up pool.read_set with an FD_ZERO() and FD_SET() for
// your server socket file descriptor (whatever socket() returned).
while (1) {
    pool.ready_set = pool.read_set; // Save the current state
    pool.nready = select(pool.maxfd+1, &pool.ready_set, &pool.write_set, NULL, NULL);
    if (FD_ISSET(sockfd, &pool.ready_set)) { // Check if there is an incoming conn
        clen = sizeof(caddr);
        isock = accept(sockfd, (struct sockaddr *) &caddr, &clen); // accept it
        add_client(isock, &pool); // add the client by the incoming socket fd
    }
    check_clients(&pool); // check if any data needs to be sent/received from clients
}
...
close(sockfd);

What is pool?

typedef struct {                  /* represents a pool of connected descriptors */
    int maxfd;                    /* largest descriptor in read_set */
    fd_set read_set;              /* set of all active read descriptors */
    fd_set write_set;             /* set of all active write descriptors */
    fd_set ready_set;             /* subset of descriptors ready for reading */
    int nready;                   /* number of ready descriptors from select */
    int maxi;                     /* highwater index into client array */
    int clientfd[FD_SETSIZE];     /* set of active descriptors */
    rio_t clientrio[FD_SETSIZE];  /* set of active read buffers */
    ... // ADD WHAT WOULD BE HELPFUL FOR PROJECT1
} pool;

What about checking clients?
- The main loop only tests for incoming connections
  - There are other reasons the server wakes up: clients sending data, pending data to write to a buffer, clients closing connections, etc.
- Store all client file descriptors in pool
- Keep the while(1) loop thin
  - Delegate to functions
- Come up with your own design

int select(int maxfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

FD_CLR(int fd, fd_set *fds);   /* clear the bit for fd in fds */
FD_ISSET(int fd, fd_set *fds); /* is the bit for fd in fds? */
FD_SET(int fd, fd_set *fds);   /* turn on the bit for fd in fds */
FD_ZERO(fd_set *fds);          /* clear all bits in fds */

- maxfds: number of descriptors to be tested; descriptors (0, 1, ..., maxfds-1) will be tested
- readfds: a set of fds we want to check for available data; on return, the set of fds ready to read. If the input argument is NULL, we are not interested in that condition
- writefds: on return, the set of fds ready to write
- exceptfds: on return, the set of fds with exception conditions

int select(int maxfds, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, struct timeval *timeout);

struct timeval {
    long tv_sec;  /* seconds */
    long tv_usec; /* microseconds */
};

timeout:
- if NULL, wait forever and return only when one of the descriptors is ready for I/O
- otherwise, wait up to the fixed amount of time specified by timeout
  - if we don't want to wait at all, create a timeout structure with the timer value equal to 0

Refer to the man page for more information
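A zero timeout turns select() into a non-blocking poll. The sketch below (readable_now() is a hypothetical name) checks a single descriptor without waiting; it works on any descriptor select() accepts, sockets or pipes alike.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Poll fd without blocking: a timeout of 0 s and 0 us makes select()
 * return immediately. Returns 1 if fd is readable, 0 if not, -1 on error. */
int readable_now(int fd)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = 0, .tv_usec = 0 };  /* do not wait */
    int r;

    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    r = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (r < 0)
        return -1;
    return FD_ISSET(fd, &readfds) ? 1 : 0;
}
```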


select allows synchronous I/O multiplexing:

int s1, s2;     /* socket descriptors */
fd_set readfds; /* used by select() */

/* create and bind s1 and s2 */
while (1) {
    FD_ZERO(&readfds);    /* initialize the fd set */
    FD_SET(s1, &readfds); /* add s1 to the fd set */
    FD_SET(s2, &readfds); /* add s2 to the fd set */
    if (select(s2+1, &readfds, 0, 0, 0) < 0) { /* assumes s2 > s1 */
        perror("select");
        exit(1);
    }
    if (FD_ISSET(s1, &readfds)) {
        recvfrom(s1, buf, sizeof(buf), ...);
        /* process buf */
    }
    /* do the same for s2 */
}
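The loop above can be exercised with two real sockets. In this minimal sketch (udp_on_loopback() and which_ready() are hypothetical helpers), both UDP sockets are bound to 127.0.0.1 on kernel-chosen ports, maxfd is computed instead of assuming s2 > s1, and a one-second timeval timeout keeps the call from blocking forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a UDP socket on 127.0.0.1 with a kernel-chosen port; the bound
 * address is written to *addr. Returns the descriptor or -1. */
static int udp_on_loopback(struct sockaddr_in *addr)
{
    socklen_t len = sizeof(*addr);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(0x7F000001u);
    if (fd < 0 || bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
        if (fd >= 0) close(fd);
        return -1;
    }
    return fd;
}

/* Wait up to one second for s1 or s2 to become readable. Returns a bit
 * mask (1 = s1 readable, 2 = s2 readable), 0 on timeout, -1 on error.
 * The set is rebuilt on every call because select() overwrites it. */
int which_ready(int s1, int s2)
{
    fd_set readfds;
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    int maxfd = s1 > s2 ? s1 : s2;
    int mask = 0;

    FD_ZERO(&readfds);
    FD_SET(s1, &readfds);
    FD_SET(s2, &readfds);
    if (select(maxfd + 1, &readfds, NULL, NULL, &tv) < 0)
        return -1;
    if (FD_ISSET(s1, &readfds)) mask |= 1;
    if (FD_ISSET(s2, &readfds)) mask |= 2;
    return mask;
}
```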

More Details About a Web Server

How can a web server manage multiple connections simultaneously?

[Figure: web server above TCP, IP and the Ethernet adapter, with labels for port 80 and port 8001.]

int fd, next = 0; /* original socket */
int newfd[10];    /* new socket descriptors */
while (1) {
    fd_set readfds;
    int n, maxfd = fd;
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    /* add the newfds that have already been returned by accept() */
    for (n = 0; n < next; n++) {
        FD_SET(newfd[n], &readfds);
        if (newfd[n] > maxfd) maxfd = newfd[n];
    }
    select(maxfd+1, &readfds, 0, 0, 0);
    if (FD_ISSET(fd, &readfds) && next < 10) {
        int c = accept(fd, ...);
        if (c >= 0) newfd[next++] = c;
    }
    /* check each connected descriptor newfd[n] */
    for (n = 0; n < next; n++) {
        if (FD_ISSET(newfd[n], &readfds)) {
            read(newfd[n], buf, sizeof(buf));
            /* process data */
        }
    }
}

Now the web server can support multiple connections...


Lecture 3: 9-4-01


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              Type                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             Length            |            Checksum           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                            Address                            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Type: 4-byte integer
Length: 2-byte integer
Checksum: 2-byte integer
Address: 4-byte IP address


struct packet {
    u_int32_t type;
    u_int16_t length;
    u_int16_t checksum;
    u_int32_t address;
};

/* ================================================== */
char buf[1024];
struct packet *pkt;

pkt = (struct packet*) buf;
pkt->type = htonl(1);
pkt->length = htons(2);
pkt->checksum = htons(3);
pkt->address = htonl(4);
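Casting a char buffer to struct packet * relies on the compiler adding no padding and on buf being suitably aligned for 32-bit fields. A more portable sketch, assuming the wire format is exactly the 12-byte layout in the diagram, copies each field at its fixed offset; packet_pack() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>  /* htonl(), htons() */
#include <stdint.h>
#include <string.h>     /* memcpy() */

/* Header from the diagram: Type (4 bytes), Length (2), Checksum (2),
 * Address (4): 12 bytes on the wire. */
struct packet {
    uint32_t type;
    uint16_t length;
    uint16_t checksum;
    uint32_t address;
};

/* Copy each field into buf in network byte order at its fixed offset,
 * independent of struct padding and buffer alignment.
 * Returns the number of bytes written (12). */
size_t packet_pack(const struct packet *p, unsigned char buf[12])
{
    uint32_t t = htonl(p->type), a = htonl(p->address);
    uint16_t l = htons(p->length), c = htons(p->checksum);
    memcpy(buf + 0, &t, 4);
    memcpy(buf + 4, &l, 2);
    memcpy(buf + 6, &c, 2);
    memcpy(buf + 8, &a, 4);
    return 12;
}
```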

#include <stdio.h> /* for printf() and fprintf() */ #include <sys/socket.h> /* for socket(), connect(), sendto(), and recvfrom() */ #include <arpa/inet.h> /* for sockaddr_in and inet_addr() */ #include <stdlib.h> /* for atoi() and exit() */ #include <string.h> /* for memset() */ #include <unistd.h> /* for close() */ #define ECHOMAX 255 /* Longest string to echo */

int main(int argc, char *argv[])
{
    int sock;                               /* Socket descriptor */
    struct sockaddr_in echoServAddr;        /* Echo server address */
    struct sockaddr_in fromAddr;            /* Source address of echo */
    unsigned short echoServPort = 7;        /* Echo server port */
    socklen_t fromSize;                     /* Address size for recvfrom() */
    char *servIP = "172.24.23.4";           /* IP address of server */
    char *echoString = "I hope this works"; /* String to send to echo server */
    char echoBuffer[ECHOMAX+1];             /* Buffer for receiving echoed string */
    int echoStringLen;                      /* Length of string to echo */
    int respStringLen;                      /* Length of received response */

    /* Create a datagram/UDP socket */
    if ((sock = socket(AF_INET, SOCK_DGRAM, 0)) < 0) { perror("socket"); exit(1); }

    /* Construct the server address structure */
    memset(&echoServAddr, 0, sizeof(echoServAddr));   /* Zero out structure */
    echoServAddr.sin_family = AF_INET;                /* Internet addr family */
    echoServAddr.sin_addr.s_addr = inet_addr(servIP); /* Server IP address, dotted decimal to network order */
    echoServAddr.sin_port = htons(echoServPort);      /* Server port */

    /* Send the string to the server */
    echoStringLen = strlen(echoString);
    if (sendto(sock, echoString, echoStringLen, 0, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr)) != echoStringLen) { perror("sendto"); exit(1); }

    /* Recv a response */
    fromSize = sizeof(fromAddr);
    respStringLen = recvfrom(sock, echoBuffer, ECHOMAX, 0, (struct sockaddr *) &fromAddr, &fromSize);
    if (respStringLen < 0) { perror("recvfrom"); exit(1); }
    /* Error checks like packet is received from the same server */

    echoBuffer[respStringLen] = '\0';     /* null-terminate the received data */
    printf("Received: %s\n", echoBuffer); /* Print the echoed arg */
    close(sock);
    exit(0);
} /* end of main () */

int main(int argc, char *argv[])
{
    int sock;                         /* Socket */
    struct sockaddr_in echoServAddr;  /* Local address */
    struct sockaddr_in echoClntAddr;  /* Client address */
    socklen_t cliAddrLen;             /* Length of client address */
    char echoBuffer[ECHOMAX];         /* Buffer for echo string */
    unsigned short echoServPort = 7;  /* Server port (privileged: typically needs root) */
    int recvMsgSize;                  /* Size of received message */

    /* Create socket for sending/receiving datagrams */
    if ((sock = socket(AF_INET, SOCK_DGRAM, 0)) < 0) { perror("socket"); exit(1); }

    /* Construct local address structure */
    memset(&echoServAddr, 0, sizeof(echoServAddr));          /* Zero out structure */
    echoServAddr.sin_family = AF_INET;                       /* Internet address family */
    echoServAddr.sin_addr.s_addr = inet_addr("172.24.23.4"); /* Local IP address */
    echoServAddr.sin_port = htons(echoServPort);             /* Local port */

    /* Bind to the local address */
    if (bind(sock, (struct sockaddr *) &echoServAddr, sizeof(echoServAddr)) < 0) { perror("bind"); exit(1); }

    for (;;) /* Run forever */
    {
        cliAddrLen = sizeof(echoClntAddr);
        /* Block until receive message from a client */
        recvMsgSize = recvfrom(sock, echoBuffer, ECHOMAX, 0, (struct sockaddr *) &echoClntAddr, &cliAddrLen);
        if (recvMsgSize < 0) { perror("recvfrom"); continue; }
        printf("Handling client %s\n", inet_ntoa(echoClntAddr.sin_addr));
        /* Send received datagram back to the client */
        sendto(sock, echoBuffer, recvMsgSize, 0, (struct sockaddr *) &echoClntAddr, sizeof(echoClntAddr));
    }
} /* end of main () */

Error handling is a must.

The setsockopt() function manipulates options associated with a socket. Options can exist at multiple protocol levels. However, the options are always present at the uppermost socket level. Options affect socket operations, such as the routing of packets, out-of-band data transfer, and so on.

The level argument specifies the protocol level at which the option resides. To set options at the socket level, specify the level argument as SOL_SOCKET. To set options at other levels, supply the appropriate protocol number for the protocol controlling the option. For example, to indicate that an option is interpreted by TCP (Transmission Control Protocol), set level to the protocol number of TCP.

The following options are supported for setsockopt():
- SO_DEBUG: Provides the ability to turn on recording of debugging information. This option takes an int value in the optval argument. This is a BOOL option.
- SO_BROADCAST: Permits sending of broadcast messages, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.
- SO_REUSEADDR: Specifies that the rules used in validating addresses supplied to bind() should allow reuse of local addresses, if this is supported by the protocol. This option takes an int value in the optval argument. This is a BOOL option.

- SO_KEEPALIVE: Keeps connections active by enabling periodic transmission of messages, if this is supported by the protocol. If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option takes an int value in the optval argument. This is a BOOL option.
- SO_LINGER: Specifies whether the socket lingers on close() if data is present. If SO_LINGER is set, the system blocks the process during close() until it can transmit the data or until the end of the interval indicated by the l_linger member, whichever comes first. If SO_LINGER is not specified and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option takes a linger structure in the optval argument.

SO_OOBINLINE: Specifies whether the socket leaves received out-of-band data (data marked urgent) in line. This option takes an int value in the optval argument. This is a Boolean option.

SO_SNDBUF: Sets the size of the send buffer. This option takes an int value in the optval argument.

SO_RCVBUF: Sets the size of the receive buffer. This option takes an int value in the optval argument.

SO_DONTROUTE: Specifies whether outgoing messages bypass the standard routing facilities. The destination must be on a directly connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on the protocol in use. This option takes an int value in the optval argument. This is a Boolean option.

TCP_NODELAY: Specifies whether the Nagle algorithm, which TCP uses to coalesce small sends, is disabled. This option is set at the TCP level (level = IPPROTO_TCP), not at SOL_SOCKET. It takes an int value in the optval argument. This is a Boolean option.

For Boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that it is enabled.
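
Buffer sizes are plain int values rather than Booleans. The sketch below requests the same size for both buffers; it assumes a POSIX system, and the helper name set_buffer_sizes is hypothetical. The system may adjust the requested value, so it is worth reading it back with getsockopt().

```c
#include <sys/socket.h>

/* Hypothetical helper: request send and receive buffers of `bytes` each.
 * Returns 0 if both calls succeed, -1 otherwise. */
int set_buffer_sizes(int s, int bytes)
{
    if (setsockopt(s, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof bytes) < 0)
        return -1;
    return setsockopt(s, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof bytes);
}
```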

RETURN VALUES
If successful, setsockopt() returns zero. On failure, it returns -1 and sets errno to one of the following values:
EBADF: s is not a valid descriptor.
ENOTSOCK: s is not a socket descriptor.
ENOPROTOOPT: optname is unknown at the indicated level.
EFAULT: optval is an invalid pointer.
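
A caller can distinguish the error codes listed above by inspecting errno right after a failed call. This sketch assumes a POSIX system; the helper names try_keepalive and keepalive_on_pipe are hypothetical, and SO_KEEPALIVE is used only as a convenient option to set.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>      /* pipe(), close() */

/* Hypothetical helper: try to enable SO_KEEPALIVE on fd. Returns 0 on
 * success, otherwise the errno value left by setsockopt(). */
int try_keepalive(int fd)
{
    int on = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == 0)
        return 0;

    int err = errno;  /* save errno before any other call can change it */
    switch (err) {
    case EBADF:       fprintf(stderr, "%d is not a valid descriptor\n", fd); break;
    case ENOTSOCK:    fprintf(stderr, "%d is not a socket\n", fd); break;
    case ENOPROTOOPT: fprintf(stderr, "option unknown at this level\n"); break;
    default:          fprintf(stderr, "setsockopt: %s\n", strerror(err)); break;
    }
    return err;
}

/* Demonstration: a pipe descriptor is valid but is not a socket. */
int keepalive_on_pipe(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int err = try_keepalive(p[0]);
    close(p[0]);
    close(p[1]);
    return err;
}
```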

Sample usage:
int skt, sndsize, rcvsize, err;  /* skt is an open socket; sndsize and rcvsize hold the requested sizes in bytes */
err = setsockopt(skt, SOL_SOCKET, SO_SNDBUF, &sndsize, sizeof(sndsize));
or:
err = setsockopt(skt, SOL_SOCKET, SO_RCVBUF, &rcvsize, sizeof(rcvsize));

int optval;
socklen_t optlen;
char *optval2;
// set SO_REUSEADDR on a socket to true (1):
optval = 1;
setsockopt(s1, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval);
// bind a socket to a device name (Linux-specific; might not work on all systems):
optval2 = "eth1"; // 4 bytes long, so 4, below:
setsockopt(s2, SOL_SOCKET, SO_BINDTODEVICE, optval2, 4);
// see if the SO_BROADCAST flag is set:
optlen = sizeof optval; // value-result argument: must hold the buffer size before the call
getsockopt(s3, SOL_SOCKET, SO_BROADCAST, &optval, &optlen);
if (optval != 0) {
    printf("SO_BROADCAST enabled on s3!\n");
}
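
Because every Boolean option follows the same zero/non-zero convention, the set and check steps above can be wrapped in two small functions. This is a sketch assuming a POSIX system; set_bool_option and get_bool_option are hypothetical names.

```c
#include <sys/socket.h>

/* Hypothetical helper: set a Boolean option; any non-zero `enable` turns
 * it on. Returns the result of setsockopt(). */
int set_bool_option(int s, int level, int name, int enable)
{
    int v = enable ? 1 : 0;
    return setsockopt(s, level, name, &v, sizeof v);
}

/* Hypothetical helper: return 1 if the option is enabled, 0 if disabled,
 * or -1 on error. optlen is a value-result argument of type socklen_t. */
int get_bool_option(int s, int level, int name)
{
    int v = 0;
    socklen_t len = sizeof v;
    if (getsockopt(s, level, name, &v, &len) < 0)
        return -1;
    return v != 0;
}
```

For example, get_bool_option(s, SOL_SOCKET, SO_BROADCAST) on a fresh UDP socket reports the default (disabled) until set_bool_option() enables it.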

DESCRIPTION
The getsockopt() function retrieves the current value of a socket option associated with a socket of any type, in any state, and stores the result in optval. Options may exist at multiple protocol levels, but they are always present at the uppermost socket level. Options affect socket operations such as the routing of packets and out-of-band data transfer.

The level argument specifies the protocol level at which the option resides. To retrieve options at the socket level, specify level as SOL_SOCKET. To retrieve options at other levels, supply the protocol number of the protocol that controls the option. For example, to indicate that an option is to be interpreted by TCP (Transmission Control Protocol), set level to the protocol number of TCP.

The value associated with the selected option is returned in the buffer optval. The value pointed to by optlen must initially contain the size of this buffer; on return, it is set to the size of the value returned. For SO_LINGER, this is the size of a struct linger; for most other options it is the size of an integer. The application is responsible for allocating any memory pointed to directly or indirectly by the parameters it specifies. If an option has not been set with setsockopt(), getsockopt() returns the default value for the option.
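
The value-result handling of optlen is easiest to see with SO_LINGER, whose value is a struct rather than an int. This sketch assumes a POSIX system; get_linger is a hypothetical name.

```c
#include <sys/socket.h>

/* Hypothetical helper: read SO_LINGER into *out. Returns the option length
 * reported back through optlen, or 0 on error. */
socklen_t get_linger(int s, struct linger *out)
{
    socklen_t len = sizeof *out;   /* in: size of the caller's buffer */
    if (getsockopt(s, SOL_SOCKET, SO_LINGER, out, &len) < 0)
        return 0;
    return len;                    /* out: number of bytes actually stored */
}
```

On a socket where SO_LINGER was never set, the returned structure holds the default value, with lingering disabled.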

SO_DEBUG: Reports whether debugging information is being recorded. This option stores an int value in the optval argument. This is a Boolean option.

SO_ACCEPTCONN: Reports whether listening is enabled on the socket, that is, whether listen() has been called on it. This option stores an int value in the optval argument. This is a Boolean option.

SO_BROADCAST: Reports whether sending of broadcast messages is permitted, if this is supported by the protocol. This option stores an int value in the optval argument. This is a Boolean option.

SO_REUSEADDR: Reports whether the rules used in validating addresses supplied to bind() allow reuse of local addresses, if this is supported by the protocol. This option stores an int value in the optval argument. This is a Boolean option.

SO_KEEPALIVE: Reports whether connections are kept active with periodic transmission of messages, if this is supported by the protocol. If the connected socket fails to respond to these messages, the connection is broken and processes writing to that socket are notified with an ENETRESET errno. This option stores an int value in the optval argument. This is a Boolean option.
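
SO_ACCEPTCONN is read-only, which makes it a convenient way to ask whether a socket is already listening. A sketch assuming a POSIX system that supports the option; is_listening is a hypothetical name.

```c
#include <sys/socket.h>

/* Hypothetical helper: return 1 if s is a listening socket, 0 if not,
 * or -1 if getsockopt() fails. */
int is_listening(int s)
{
    int v = 0;
    socklen_t len = sizeof v;
    if (getsockopt(s, SOL_SOCKET, SO_ACCEPTCONN, &v, &len) < 0)
        return -1;
    return v != 0;
}
```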

SO_LINGER: Reports whether the socket lingers on close() if data is present. If lingering is enabled, the system blocks the process during close() until it can transmit the data or until the interval given by the l_linger member expires, whichever comes first. If lingering is not enabled and close() is issued, the system handles the call in a way that allows the process to continue as quickly as possible. This option stores a linger structure in the optval argument.

SO_OOBINLINE: Reports whether the socket leaves received out-of-band data (data marked urgent) in line. This option stores an int value in the optval argument. This is a Boolean option.

SO_SNDBUF: Reports the size of the send buffer. This option stores an int value in the optval argument.

SO_RCVBUF: Reports the size of the receive buffer. This option stores an int value in the optval argument.

SO_ERROR: Reports the pending error status of the socket and clears it. This option stores an int value in the optval argument.

SO_TYPE: Reports the socket type (for example, SOCK_STREAM or SOCK_DGRAM). This option stores an int value in the optval argument.

SO_DONTROUTE: Reports whether outgoing messages bypass the standard routing facilities. The destination must be on a directly connected network, and messages are directed to the appropriate network interface according to the destination address. The effect, if any, of this option depends on the protocol in use. This option stores an int value in the optval argument. This is a Boolean option.

SO_MAX_MSG_SIZE: Reports the maximum size of a message for message-oriented socket types (for example, SOCK_DGRAM). It has no meaning for stream-oriented sockets. This option stores an int value in the optval argument. It comes from Winsock and is not defined on most Unix systems.
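
SO_TYPE and SO_ERROR are both read-only int options. A common use of SO_ERROR is to collect the outcome of a non-blocking connect() once the socket becomes writable. A minimal sketch assuming a POSIX system; socket_type and pending_error are hypothetical names.

```c
#include <sys/socket.h>

/* Hypothetical helper: return the socket type (e.g. SOCK_STREAM), or -1. */
int socket_type(int s)
{
    int type = 0;
    socklen_t len = sizeof type;
    if (getsockopt(s, SOL_SOCKET, SO_TYPE, &type, &len) < 0)
        return -1;
    return type;
}

/* Hypothetical helper: fetch and clear the pending error on s. Returns the
 * pending error number (0 if none), or -1 if getsockopt() itself fails. */
int pending_error(int s)
{
    int err = 0;
    socklen_t len = sizeof err;
    if (getsockopt(s, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
        return -1;
    return err;
}
```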

TCP_NODELAY: Reports whether the Nagle algorithm used by TCP for send coalescing is disabled. This option is retrieved at the IPPROTO_TCP level. It stores an int value in the optval argument. This is a Boolean option.

For Boolean options, a zero value indicates that the option is disabled and a non-zero value indicates that it is enabled.

RETURN VALUES
If successful, getsockopt() returns zero. On failure, it returns -1 and sets errno to one of the following values:
EBADF: The parameter s is not a valid descriptor.
ENOPROTOOPT: The option is unknown at the level indicated.
ENOTSOCK: The parameter s is a file, not a socket.
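
TCP_NODELAY must be addressed with level IPPROTO_TCP. Passing SOL_SOCKET by mistake can silently refer to a different option, because option numbers are only unique within a level (on Linux, for instance, TCP_NODELAY and SO_DEBUG both have the value 1). A sketch assuming a POSIX system; set_nodelay and nodelay_enabled are hypothetical names.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>

/* Hypothetical helper: enable (1) or disable (0) TCP_NODELAY. */
int set_nodelay(int s, int enable)
{
    int v = enable ? 1 : 0;
    return setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &v, sizeof v);
}

/* Hypothetical helper: return 1 if Nagle's algorithm is disabled on s,
 * 0 if it is in effect, or -1 on error. */
int nodelay_enabled(int s)
{
    int v = 0;
    socklen_t len = sizeof v;
    if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &v, &len) < 0)
        return -1;
    return v != 0;
}
```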

Sample usage:
int sockbufsize = 0;
socklen_t size = sizeof(sockbufsize);  /* value-result: buffer size in, value size out */
err = getsockopt(skt, SOL_SOCKET, SO_RCVBUF, &sockbufsize, &size);
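
The size reported by SO_RCVBUF need not equal the size last requested with setsockopt(). Linux, for example, doubles the requested value to allow for bookkeeping overhead and caps it at a system-wide maximum, so reading it back is the reliable way to learn the actual size. A sketch assuming a POSIX system; get_rcvbuf is a hypothetical name.

```c
#include <sys/socket.h>

/* Hypothetical helper: return the current receive-buffer size of s in
 * bytes, or -1 on error. */
int get_rcvbuf(int s)
{
    int size = 0;
    socklen_t len = sizeof size;  /* value-result argument */
    if (getsockopt(s, SOL_SOCKET, SO_RCVBUF, &size, &len) < 0)
        return -1;
    return size;
}
```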