You are on page 1of 105

IPC – Inter Process Communication

2015/09
Materials can be found:

2
Prerequisite

Linux +
gcc env.
Lecture +
Hands-on

3
Agenda

1. Introduction
2. Pipe
3. Message Queue
4. Semaphore
5. Shared Memory
6. Socket

4
Introduction

5
Introduction
Pipe

Message
Queue
Local host
System V IPC
Semaphore
IPC – Inter Process POSIX IPC
Communication
Shared
Memory

Unix domain
socket

Socket
Different host

STREAMS

6
Pipe
Pipe – system call

#include <unistd.h>
int pipe(int filedes[2]);

Returns: On success, zero is returned. On error, -1 is returned,


and errno is set appropriately.

2 file descriptors are returned:


filedes[0]: reading
filedes[1]: writing
The output of filedes[1] is the input for filedes[0]

8
Pipe

User Process Parent Child


fork
fd[0] fd[1] fd[0] fd[1] fd[0] fd[1]

Pipe Pipe

Kernel Kernel

9
Pipe
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <errno.h>
4 #include <unistd.h>
5 #include <sys/types.h>
6
7
#include <sys/wait.h>
Take a look at
8 main()
9
10
{
pid_t pid;
fork()
11 int rv;
12

13 switch(pid=fork()) {
14 case -1:
15 perror("fork"); /* something went wrong */
16 exit(1); /* parent exits */
17

18 case 0:
19 printf(" CHILD: This is the child process!\n");
20 printf(" CHILD: My PID is %d\n", getpid());
21 printf(" CHILD: My parent's PID is %d\n", getppid());
22 printf(" CHILD: Enter my exit status (make it small): ");
23 scanf(" %d", &rv);
24 printf(" CHILD: I'm outta here!\n");
25 exit(rv);
26

27 default: /* return value == pid of child process


28 printf("PARENT: This is the parent process!\n");
29 }
30 }

10
Pipe – In-Class Practice

 Copy the contents in attached file to pipe.c


 gcc –o pipe pipe.c
 ./pipe

pipe.c

11
Pipe – Usage in Shell

• ls | more

Steps implied by above command


1. A child process is forked
2. The child and parent uses pipe to communicate
3. The output of ls is write to the pipe
4. The more reads the input from read end of the pipe

12
Pipe – popen and pclose function

#include <stdio.h>
FILE *popen(const char *cmdstring, const char *type);
Returns: file pointer if OK, NULL on error

int pclose(FILE *fp);

Returns: termination status of cmdstring, or –1 on error

Type: ‘r’ means the returned file pointer is readalbe; ‘w’ means the returned file pointer is
writeable.
Create a pipe, forking a child, closing the unused ends of the pipe, executing a shell to run
the command, and waiting for the command to terminate.

13
Pipe – In-Class Practice

 Copy the contents in attached file into myuclc.c and main.c


 gcc –o myuclc myuclc.c
 gcc main.c
 ./a.out

Popen Filter program


Parent pipe
stdout
stdout
stdin
pr
om
pt p ut myuclc.c main.c
User at a e r in
terminal Us

14
Pipe – FIFO

#include <sys/stat.h>
int mkfifo(const char *pathname, mode_t mode);
Returns: 0 if OK, -1 on error

With FIFOs, unrelated processes can exchange data.

15
Pipe – In-Class Practice

 Using mkfifo(1) shell command to create a FIFO


 Mkfifo –m 555 myfifo
 Open 2 terminals
 Terminal 1: cat myfile > myfilo
 Terminal 2: cat myfifo

16
Pipe - Summary

What is pipe

Pipe usage in C

Pipe usage in SHELL command

17
System V IPC - Overview
System V IPC - Introduction

System V message queue — used to pass messages between processes


 boundary management, i.e. message granularity
 messages are typed with integer values, allowing to cherry pick messages of a specific type

System V semaphores — used for process synchronization


 kernel-maintained integers that can atomically tested & decremented (or tested & incremented)
 “taking” a unit of a semaphore value indicates that the taker is working on a shared resource

System V shared memory – used to share memory regions


 similar to shared mmap mappings, but with kernel persistence

19
System V IPC - History

Late 70s they first appear together in Columbus UNIX, a Bell UNIX for database and
efficient transaction processing
1983 they land together in System V that made them popular in mainstream UNIX-
es, hence the name
2001 SUSv3 is published and require implementation of all of them for XSI
conformance, hence they are also called XSI IPC

20
System V IPC - API

aspect msg queues semaphores shared memory

header <sys/msg.h> <sys/sem.h> <sys/shm.h>

data type msqid_ds semid_ds shmid_ds

get object msgget semget shmget / shmat

close object - - shmdt

manipulation msgctl semctl shmctl

communicate msgsnd / msgrcv semop memory access

System V IPC mechanisms share many API traits.


Note the uniformity in naming and operations, to the extent of what’s possible for different
communication primitives.

21
System V IPC - Concepts

 identifier unique identifier for an IPC object (in Kernel)


 key name used by processes to rendez-vous on a common object, to ensure
they communicate in the same context (external naming scheme)

Date
(French)
 object instance of the IPC mechanism that can be used for communication
• e.g. a specific message queue, (set of) semaphores, shared memory region
• each object defines a communication context

22
System V IPC - Protocol

The usage of all System V IPC mechanisms goes through a common “protocol”:
1. Obtain a key - mechanism-independent
2. Get an object - passing the key to a mechanism-specific syscall
3. Store its identifier - mechanism-independent
4. Communicate through the object - passing the identifier to mechanism-specific
syscalls

Steps (1) and (2) are optional: object identifiers are stable integers
that can be stored and passed around as processes see fit.

23
System V IPC - Getting IPC object

 Each mechanism provide a get syscall to get an IPC object


• msgget, semget, shmget

 Similarly to open for files, get syscalls are used to either:


• create a new IPC object and return its identifier; or
• return the identifier of a pre-existing IPC object.

 Either way, get syscalls receive as arguments: 1


• the key for the object we want to get
• a set of flags that always support:
IPC_CREAT request object creation, if needed
IPC_EXCL request object creation, fail if already exists
S_IRUSR, S_IWUSR, . . . permission mask

24
System V IPC – Example on getting IPC object

int id ;
/* . . . */
/* get message queue for key */
i f ( ( id = msgget ( key , IPC_CREAT | S_IRUSR | S_IWUSR ) ) < 0)
err_sys ( "msgget error " ) ;

Notes:
• the message queue will be created if it doesn’t exist, otherwise
the existing one corresponding to key will be returned
• if created, the message queue will be readable and writable only
by processes belonging to the owner of the current process

25
System V IPC - object persistence

System V IPC objects have kernel persistence: they remain available, no matter the
number of “users” they have, until kernel shutdown or explicit deletion.

Advantage
• processes can access the object, change its state, and then exit without having to
wait; other processes can come up later and check the (modified) state i.e. System
V IPC objects are stateful and connectionless

Disadvantage
• IPC objects consume system resources and cannot be automatically garbage
collected, hence the need of enforcing limits on their quantity
• it’s hard to determine when it is safe to delete an object

26
System V IPC - Object deletion

 Each System V IPC offers a control syscall for generic object manipulation
• msgctl, semctl, shmctl
 Among other things, IPC control syscalls are used to delete IPC objects:
• passing the IPC_RMID command
• and no extra arguments (i.e. a trailing NULL)

/* delete shared memory region */


if ( shmctl ( id , IPC_RMID , NULL) < 0)
err_sys ( " shmctl error " ) ;

27
System V IPC – Shell manipulation

 ipcs: lists available System V IPC objects, ipcs –l show system limits on IPC object counts
 ipcrm: allows to delete IPC objects (we own)
 ipcmk: (non portable) allows to create IPC objects

• On Linux, /proc/sysvipc provides a view on all existing IPC


objects, in a format easier to parse than ipcs. In-Class Practice

28
System V IPC – Assigning keys

There are 3 main strategies to assign IPC keys to applications:

1. Choose an arbitrary key value once and for all and distribute it via a common header file

2. Pass the special constant IPC_PRIVATE as key to get syscalls, which always result in the creation
of a fresh IPC object
 e.g. id = msgget(IPC_PRIVATE, S_IRUSR | S_IWUSR);
 the creating process will then need to transmit the object identifier to collaborating processes by other
means

29
System V IPC - Assigning keys

3. Use the ftok syscall to generate a unique key from the name of an existing file and an arbitrary
integer
 i.e. delegate uniqueness guarantees to the filesystem namespace

#include <sys/ipc.h>
key_t ftok(char *pathname, int proj);
Returns: key value if OK, -1 on error
• pathname points to a file that will be subject to stat
• proj allows to have different contexts for the same file

Behind the scenes, the 8 least significant bits of proj and of the file
i-node will be used to build a unique key.

30
Message Queue
Message Queue

System V message queues are IPC objects used for data transfer
among unrelated processes.
 communication granularity are individual messages, no need to handle message
boundaries explicitly as it happened with byte stream
 communication discipline is first-in, first-out . . .
 . . . but messages are typed with integer values and can be extracted using type predicates
i.e. a queue can be seen as multiple independent queues, one per message type, as well as a
priority queue
 each message carries a payload of arbitrary data

32
Message Queue - msgget

An existing message queue can be opened or a new one created using the message queue get syscall
msgget:
#include <sys/msg.h>
int msgget(key_t key, int flags);
Returns: message queue ID if OK, -1 on error

• flags supports IPC_CREAT, IPC_EXCL, and permissions


• as usual the return message queue ID should be stored and used for
subsequent usage of the message queue
Reminder: as with all other System V IPC objects, get is not mandatory; message queue
IDs are stable and can be passed around by other means.

33
Message Queue - msgsnd

Once you have a message msgp, you can send it through a given message queue msqid
using msgsnd:

#include <sys/msg.h>
int msgsnd(int msqid, const void *msgp, size_t msgsz, int msgflg);
Returns: 0 if OK, -1 on error
• msgsz is the size of the payload (i.e. what follows mtype)
it will depend on your specific instantiation of the msg. struct
• msgflg supports a single flag: IPC_NOWAIT | non-blocking send
if given, IPC_NOWAIT will have msgsnd fail with EAGAIN if the target queue
is full, instead of blocking

34
Message Queue - msgrcv

#include <sys/msg.h>
int msgrcv(int msqid, void *msgp, size_t maxmsgsz, long msgtyp, int msgflg);
Returns: 0 if OK, -1 on error

• maxmsgsz is the maximum payload size


• msgtyp is the type predicate that identifies the message we want to receive
msgtyp == 0 first message in the queue
msgtyp > 0 first message mtype == msgtyp
msgtyp < 0 first message with the lowest mtype < abs(msgtyp)

35
Message Queue - msgctl

#include <sys/msg.h>
int msgctl(int msqid, int cmd, struct msqid_ds *buf);
Returns: 0 if OK, -1 on error

Allowed commands for cmd are:


• IPC_RMID destroy queue; all pending messages are lost
• IPC_STAT get queue data structure into buf
• IPC_SET set queue data structure from buf

36
Message Queue – In-Class Practice

1. Create 2 files: writeQ.c, readQ.c


2. Copy attached file content into new files accordingly
3. gcc –o readQ readQ.c
4. gcc –o writeQ writeQ.c

writeQ.c readQ.c

37
Message Queue - In-Class Practice

run readQ& from terminal 1


run writeQ in same terminal
write something in writeQ, see what you get?

Open 2 terminals:
run readQ from terminal 1
run writeQ from terminal 2
Write something on terminal 2, what you get?

38
Semaphore
Semaphore
A semaphore is an abstract data type whose instances can be used by processes that want to
coordinate access to shared resources. A semaphore is equipped with operations that allow
coordination without race conditions between testing the value of the semaphore and changing it.
In its general form, a semaphore is an integer variable that can never descend below 0.
Intuitively, such a semaphore counts the number of available units of a shared resource.
A special, yet very common, case is that of a binary semaphore used to guarantee mutual exclusion
when accessing a single shared resource. A binary semaphore has only two possible values:
1 the resource is available
0 the resource is taken
Stefano

40
Semaphore - Operations

Two operations are defined on a semaphore:


V (for verhogen, “increase” in Dutch)
increase semaphore value
intuition: signal that a resource unit has been made available to others who are waiting for it

P (for probeer te verlagen, “try to decrease” in Dutch)


try to decrease semaphore value and blocks if that would result in a negative value; when
the operation returns the semaphore value has been decreased by one intuition: ask for
exclusive access to a resource unit

41
Semaphore - Example
You want to coordinate access to a shared memory area among multiple processes
ensuring memory coherence.
More precisely, each process wants to perform an increment action on the int counter
variable stored into shared memory. The increment operation is defined as: read counter
and write it back incremented by one (i.e. counter++). The desired property is that if
n processes perform a total of k increment operations, the counter is incremented exactly
by k.
Due to the race condition between reading counter value and writing it back
incremented, it is not enough for each process to simply do counter++ with more complex
data structure, the race condition can also cause data corruption that makes impossible
to read the value.

42
Semaphore – Example (cont.)

With a binary semaphore S we can solve the problem and avoid race conditions. We
assume that the semaphore is initialized with S <-1.
Each process will implement increment as follows:
1 P(S)
2 counter++
3 V(S)
The P call would block if someone else is incrementing counter. Within the P … V window a
process knows that—as long as all other processes follow the same protocol—he will
execute its critical section without interference from others.

For semaphores to work, it is essential that P(S) atomically test & decrement
(if possible) the semaphore value. Only the kernel can guarantee that.

43
Semaphore – Example (cont.)

In the example, there is a single resource available (the counter global variable), hence a
binary semaphore is enough.
 the scheme can be generalized to n available resources using
S <- n as semaphore initialization
 We then generalize P and V to take arbitrary integer parameters
V(S, m) increment semaphore value by m (1 … n)
P(S, m) try to decrease semaphore value by m and block if that is not possible

NOTE: until the decrease is possible, semaphore value remains unchanged, i.e. no partial
decrease is possible

44
Semaphore – System V Semaphore

The traditional implementation of semaphores by UNIX kernels are System V semaphores.


 they belong to System V IPC mechanisms; the IPC object is a set of semaphores
 all semaphores in the set are generalized semaphores with arbitrary (non-negative)
integer values
 each semaphore in a set support the following operations:
1. add a (positive) value to semaphore value (generalized V)
2. subtract a (positive) number from semaphore value (generalized P)
3. set semaphore value to an arbitrary (non-negative) value (initialization)
4. wait for semaphore value to become 0

45
Overview of System V semaphore API

header file <sys/sem.h>

get semaphore set semget

semaphore action semop

data structure semid_ds

control semctl

46
Semaphore - semget

#include <sys/sem.h>
int semget(key_t key, int nsems, int semflg);
Returns: semaphore set ID if OK, -1 on error

• nsems specifies the number of semaphores in the set


1 is a common choice
• semflg supports IPC_CREAT, IPC_FLAGS, and permissions as usual

47
Semaphore - semop

All semaphore operations are requested via the semop syscall. The fact that the object
granularity is the semaphore set, makes it rather cumbersome to use. . .
#include <sys/sem.h>
int semop(int semid, struct sembuf *sops, unsigned int nsops);
Returns: 0 if OK, -1 on error
• semid is the semaphore set identifier
• sops points to a (non-empty) array of operations to be performed on individual semaphores
struct sembuf {
unsigned short sem_num; /* semaphore number, 0􀀀based */
short sem_op; /* operation to be performed */
short sem_flg ; /* operation f lags */
};
• nsops is the number of operations in the sops array

48
Semaphore – semop (cont.)

struct sembuf {
unsigned short sem_num; /* semaphore number, 0 based */
short sem_op; /* operation to be performed */
short sem_flg ; /* operation f lags */
};

Operations are interpreted as follows:


sem_op > 0 -> V (Ssem_num, sem_op)
sem_op < 0 -> P (Ssem_num, |sem_op|)
sem_op = 0 -> test & wait until Ssem_num = 0
operation flags support IPC_NOWAIT with the usual semantics

49
Semaphore - semctl
The semaphore set control syscall is semctl. Contrary to other control syscalls, semctl is
overloaded for two different purposes:
1 generic control operations (access to semid_ds, deletion)
2 semaphore initialization & reset

#include <sys/sem.h>
int semctl(int semid, int semnum, int cmd, ... /*union semun arg */);
Returns: non-negative integer if OK, -1 on error

Generic control operations are as usual according to cmd:


• IPC_RMID to delete the semaphore set
• IPC_STAT/IPC_SET to get/set the associated data structure

50
Semaphore – semctl (cont.)
#include <sys/sem.h>
int semctl(int semid, int semnum, int cmd, ... /*union semun arg */);
Returns: non-negative integer if OK, -1 on error
Semaphore initialization / reset uses the following cmd-s:
• GETVAL/SETVAL: get/set the value of the setnum-th semaphore
• GETALL/SETALL: as above, but for all semaphores at once
Arguments and return values for all semctl commands are conveyed by semun, that must be defined by
your programs as follows:
union semun {
int val ; /* individual semaphore value */
struct semid_ds *buf ; /* semaphore data structure */
unsigned short * array ; /* multiple semaphore values */
};

51
Semaphore Initialization

 according to SUSv3, semaphores are not initialized to any specific value when created with
semget+IPC_CREAT
Linux initializes semaphores to 0, but it’s not portable
 therefore, semaphores must be explicitly initialized after creation with semctl
If multiple processes attempt to create & initialize a common
semaphore you get a race condition for semaphore initialization.

Solutions:
1. avoidance: ensure that a single process is in charge of creating& initializing the semaphore,
e.g.:
 creation at (application) boot time, possibly using ipcmk
 have a parent create & initialize before spawning children
2 the sem_otime trick based on the historical (and now standard) fact that the field is set to 0 upon
creation
52
Semaphore - In-Class Practice

• mkdir semaphore
• Copy contents attached file into new files accordingly
• gcc –o semaphore semaphore.c
• gcc –o semaphoreInit semaphoreInit.c

semaphore.c semaphoreInit.c

53
Semaphore - In-Class Practice

• run semaphoreInit
• run semaphore in process one
• Press return to lock
• Stop here
• run samaphore in process two
• In process two can you get the resource? Why?
• Release the resource in process one.
• Can you get the resource in process two now?

54
Semaphore - In-Class Practice

• Clean up the semaphore created by you


• Use ipcs to show all the ipcs
• Use “ipcrm –s <semaphore id>” to delete the semaphore.

55
System V semaphores - Disadvantages

Common System V IPC issues:


• object identifiers, rather than FDs
• keys, rather than filenames
• no accounting of object users; hard to delete safely
• limits

Semaphore-specific issues:
• initialization race condition
• overly complex API, “thanks” to the set granularity

56
Shared Memory
System V Shared Memory

System V shared memory is an IPC mechanism that allows unrelated


processes to share memory areas. In the System V shared memory jargon, shared
memory areas are called segments.

As with other System V IPC mechanisms, segments are kernel-persistent and


should be explicitly created and destroyed.

58
Shared memory protocol
The “protocol” to use shared memory segments, unlike other System V mechanisms, require
more than one step before actual IPC.

1. obtain the identifier of a shared memory segment


• this can happen via a get syscall (shmget) that either create a segment or retrieve its identifier
• or via communication of the identifier by other means

2. attach the segment to the process address space


• as a result of attaching, the caller obtain the starting memory address of the shared segment

3. use the attached segment, as if it were native process memory


• no mediation by the syscall API is needed
• as with memory mappings, synchronization among sharing processes is an important concern here

59
Shared memory protocol (cont.)

4. once a process is done using shared memory, it should detach the segment from its
address space
• once detached, access to the memory corresponding to the (now detached) segment will cause SIGSEGV

5. once all processes are done using shared memory, a process should delete the segment
using the control syscall with IPC_RMID
• until this step is done, the shared memory segment will continue to exist (and to occupy system resources)

60
Overview of System V Shared Memory API

header file <sys/shm.h>

get segment shmget

attach segment shmat

detach segment shmdt

data structure shmid_ds

control shmctl

61
shmget
shmget is the get syscall for shared memory segments:

#include <sys/shm.h>
int shmget(key_t key, size_t size, int shmflg);
Returns: shared memory segment id if OK, -1 on error

• key is the System V key, obtained by the usual means


• size is the size of the desired memory segment, in bytes
 the kernel will return a segment of size multiple to system page size, roundup to the next such
multiple will happen
• shmflg is a bitwise OR of the usual System V IPC flags:
IPC_CREAT, IPC_EXCL, permission bits
• on success, shmget returns a shared memory segment identifier that can be stored and passed to others

62
shmat
Once a process has obtained a segment identifier, it should use it to attach the segment
to its address space using shmat.

#include <sys/shm.h>
void *shmat(int shmid, const void *shmaddr, int shmflg);
Returns: base address of the attached segment if OK, -1 on error
• shmid is the identifier of the segment to attach
• shmaddr is the address where to attach the segment
 it is optional, if omitted the kernel will choose a suitable address
• Flags is a bitwise OR of flags that include:
 SHM_RDONLY: attach segment read-only
 SHM_REMAP: replace already existing mapping at shmaddr
• on success, shmat will return the starting address of the attached segment

63
shmat (cont.)

Attaching segments at fixed addresses using shmaddr comes with its own problems

• at compile time, when we write the program, we often don’t know if shmaddr will be available at
runtime for attaching the segment; in particular:
 we don’t know if another segment will be attached at shmaddr
 we don’t know if at shmaddr there will be enough available contiguous memory to fit the
segment+roundup
• it reduces portability, as an address valid on some UNIX machine will not necessarily be valid on
different UNIX-es

As a best practice, never use shmaddr and rely on the kernel to find
a suitable address for you.

64
shmdt

Once done working on a shared memory segment, a process can detach it from its
address space using shmdt:

#include <sys/shm.h>
int shmdt(const void *shmaddr);
Returns: 0 if OK, -1 on error

Note that detaching is different than deleting. Once detached, the memory segment (and
its content) will still exist in kernel space until explicitly deleted.

65
Segment inheritance

 Attached memory segments are automatically detached upon process termination.


 Attached memory segments are inherited through fork. That provides a(nother) handy
way to do IPC between parent and child
1. parent creates shared memory segment using key IPC_PRIVATE
2. parent fork-s a child
3. child and parent can now transfer data via shared memory

66
shmctl

The control syscall for shared memory segments is shmctl:

#include <sys/shm.h>
shmctl(int shmid, int cmd, struct shmid_ds *buf);
Returns: 0 if OK, -1 on error

It supports the usual operations: IPC_RMID (removal), IPC_STAT (retrieve shmid_ds),


IPC_SET (update shmid_ds).

67
Shared Memory - In-Class Practice

mkdir sharedMem
Copy the contents in attached file to sharedMemRC.c
gcc –o sharedMemRC sharedMemRC.c sharedMemRC.c
./sharedMemRC w&
./sharedMemRC
What do you see?
run it again: ./sharedMemRC
What go you see?

68
Shared Memory - In-Class Practice

Any difference between the first read and second read?


This is the concurrency issue of accessing the shared resource.
By the time it reads the data from shared memory, another process is writing data to it
This is also called a race condition
How to have it resolved?

69
System V IPC vs. POSIX IPC
POSIX IPC API Summary

71
Comparison of System V IPC and POSIX IPC

Advantages
• The POSIX IPC interface is simpler than the System V IPC interface.
• The POSIX IPC model—the use of names instead of keys, and the open, close, and unlink
functions—is more consistent with the traditional UNIX file model.
• POSIX IPC objects are reference counted. This simplifies object deletion, because we can
unlink a POSIX IPC object, knowing that it will be destroyed only when all processes have
closed it.

Disadvantage
• POSIX IPC is not portable than System V IPC

72
POSIX IPC – check tool

• cat /dev/mqueue/mqxxxx
• cat /dev/shm/shmxxxx
• cat /dev/shm/sem.xxx

73
System V IPC - Summary

Overview

Message Queue

Semaphore

Shared Memory

POSIX IPC

74
Socket
Socket – What is it?

 Sockets are IPC objects that allow to exchange data between processes running:
­ either on the same machine (host), or
­ on different ones over a network.

 A socket is like a two way pipe!

History:
The UNIX socket API first appeared in 1983 with BSD 4.2. It has been finally
standardized for the first time in POSIX.1g (2000), but has been ubiquitous to every
UNIX implementation since the 80s.

76
Socket - Unix vs. Internet sockets

 Unix domain socket


• Use the file system as the address name space
• UNIX file permissions can be used to control access UNIX socket
• Needs to be migrated to inet socket for inter-machine communication

 Internet domain socket


• Inter-machine communication
• Inter-OS communication

77
Socket - Internet Sockets Types

Connectionless
Diagram Unreliable
e.g. UDP

Connection Oriented
Reliable
Stream Byte stream
e.g. TCP

Sequential Similar as Stream, but message-based


e.g. SCTP
Packet

• Interface to network layer (IP in Internet domain)


Raw • Super user privilege are required

78
Socket – OSI Reference Model and TCP/IP Model Layers

79
Socket – Data Flow

81
Socket – Data Encapsulation

Network Access Internet Transport Application

• Application Layer (telnet, ftp, etc.)


• Host-to-Host Transport Layer (TCP, UDP)
• Internet Layer (IP and routing)
• Network Access Layer (Ethernet, ATM, or whatever)

82
Socket – Create Socket

#include <sys/socket.h>
int socket(int domain, int type, int protocol);
Returns: file (socket) descriptor if OK, –1 on error

• Domain: AF_INET, AF_INET6, AF_UNIX, AF_UNSPEC


• type: SOCK_DGRAM, SOCK_RAW, SOCK_SEQPACKET, SOCK_STREAM
• Protocol: usually 0 to select the default protocol for the given domain and socket type.
i.e. TCP – STREAM, UDP - DGRAM

83
Socket – Socket Descriptor

 Socket descriptors are implemented as file descriptors in the UNIX System.


 Many of the functions that deal with file descriptors, such as read and write, will work with a
socket descriptor.
 However, you can't use a socket descriptor with every function that accepts a file descriptor
argument.
e.g.
poll: work as expected
fchdir: fails with errno set to ENOTDIR
mmap: unspecified
Check manual when necessary …

84
Socket – Byte Order

85
Socket – Byte Order

 TCP/IP adopts big endian


 Byte ordering converting functions
­ htons(); // host to network short
­ htonl(); // host to network long
­ ntohs(); // network to host short
­ ntohl(); // network to host long
 Address format convert between network and presentation
­ inet_ntop(); // network to presentation
­ inet_pton(); // presentation to network

e.g. Presentation format: 135.252.34.201


Network binary format: 337253186 in network byte ordering
Always use these functions instead of using inet_aton() or inet_ntoa() which only works for IPv4.

86
Socket - In-Class Practice

mkdir socket/1
Copy the contents in attached file to
address.c address.c

gcc –o address address.c


./address

87
Socket – Address Format

 Generic socket address structure


struct sockaddr {
unsigned short sa_family; // address family, AF_xxx
char sa_data[14]; // 14 bytes of protocol address
};

sa_family: address family, AF_INET is used for internet sockets


sa_data: contains a destination address and port number for the socket

88
Socket – Address Format
 New generic socket address structure, to cover IPv6
/* Structure large enough to hold any socket address (with the historical
exception of AF_UNIX). We reserve 128 bytes. */
#if ULONG_MAX > 0xffffffff
# define __ss_aligntype __uint64_t
#else
# define __ss_aligntype __uint32_t
#endif
#define _SS_SIZE 128
#define _SS_PADSIZE (_SS_SIZE - (2 * sizeof (__ss_aligntype)))

struct sockaddr_storage
{
__SOCKADDR_COMMON (ss_); /* Address family, etc. */
__ss_aligntype __ss_align; /* Force desired alignment. */
char __ss_padding[_SS_PADSIZE];
};

89
Socket - Address Format

Why generic socket address?


A socket address structures is always passed by reference when passed as an argument to any
socket functions. But any socket function that takes one of these pointers as an argument
must deal with socket address structures from any of the supported protocol families.

A problem arises in how to declare the type of pointer that is passed. With ANSI C, the
solution is simple: void * is the generic pointer type. But, the socket functions predate ANSI C
and the solution chosen in 1982 was to define a generic socket address structure in the
<sys/socket.h> header.

90
Socket - Address Format Internet addresses – IPv6
1 struct sockaddr_in6 {
2 sa_family_t sin6_family; /* AF_INET6 */
Internet addresses – IPv4 3 in_port_t sin6_port; /* Transport layer port # */
1 struct sockaddr_in { 4 uint32_t sin6_flowinfo; /* IPv6 flow information */
2 sa_family_t sin_family; /* AF_INET */ 5 struct in6_addr sin6_addr; /* IPv6 address */
3 in_port_t sin_port; /* Port number. */ 6 uint32_t sin6_scope_id; /* IPv6 scope-id */
4 struct in_addr sin_addr; /* Internet address. */ 7 };
5
8 struct in6_addr {
6 /* Pad to size of `struct sockaddr'. */
9 union {
7 unsigned char sin_zero[sizeof (struct sockaddr) -
10 uint8_t u6_addr8[16];
8 sizeof (sa_family_t) -
9 sizeof (in_port_t) - 11 uint16_t u6_addr16[8];
10 sizeof (struct in_addr)]; 12 uint32_t u6_addr32[4];
11 }; 13 } in6_u;
12 typedef uint32_t in_addr_t; 14
13 struct in_addr { 15 #define s6_addr in6_u.u6_addr8
14 in_addr_t s_addr; /* IPv4 address */ 16 #define s6_addr16 in6_u.u6_addr16
15 };
17 #define s6_addr32 in6_u.u6_addr32
18 };

91
Socket – Programming Model
TCP Client-Server
Stream socket / phone analogy
To communicate one application—which we call “client”—
must call the other—the “server”—over the phone. Once
the connection is established, each peer can talk to the
other for the duration of the phone call.

both: socket() -> install a phone


server: bind() -> get a phone number
server: listen() -> turn on the phone, so that it can ring
client: connect() -> turns on the phone and call the
“server”, using its number
server: accept() -> pick up the phone when it rings (or
wait by the phone if it’s not ringing yet)

96
Socket – Programming Model
UPD Client-Server
Datagram socket / mail analogy
To communicate applications send (snail)
mail messages to their peer mailboxes.

• both: socket() -> installing a mailbox


• both*: bind() -> get a postal address
• peer A: sendto() -> send a letter to peer B, writing
to her postal address
• peer B: recvfrom() -> check mailbox to see if a
letter has arrived, waiting for it if it’s not the case
 each letter is marked with the sender address,
so peer B can write back to peer A even if her
address is not public

* Whether you need bind to receive messages depends on the domain


97
Socket – Server side API

 bind();
associate the IP address and port to socket descriptor
 listen();
waiting for clients to connect to me
 accept()
accepts connection from client

98
Socket – Server side API

#include <sys/socket.h>
int bind(int sockfd, struct sockaddr *my_addr, socklen_t addrlen);
Returns: 0 if OK, –1 on error

• The address we specify must be valid for the machine on which the process is running; we can't
specify an address belonging to some other machine.
• The address must match the format supported by the address family we used to create the socket.
• The port number in the address cannot be less than 1,024 unless the process has the appropriate
privilege (i.e., is the superuser).
• Usually, only one socket endpoint can be bound to a given address, although some protocols allow
duplicate bindings.

99
Socket – Server side API

#include <sys/socket.h>
int listen(int sockfd, int backlog);
Returns: 0 if OK, –1 on error

backlog is the number of connections allowed on the incoming queue.


Incoming connections are going to wait in this queue until you accept()
them and this is the limit on how many can queue up.

100
Socket – Server side API

#include <sys/socket.h>
int accept(int sockfd, struct sockaddr *restrict addr, socklen_t *restrict len);
Returns: file (socket) descriptor if OK, –1 on error

The accept function is used with connection-based socket types (SOCK_STREAM,


SOCK_SEQPACKET and SOCK_RDM). It extracts the first connection request on the queue of pending
connections, creates a new connnected socket with mostly the same properties as sockfd, and allocates a
new file descriptor for the socket, which is returned.

The newly created socket is no longer in the listening state. The original socket sockfd is unaffected by
this call and it remains available to receive additional connect requests.

101
Socket – Client side API

 bind(); optional to client


associate the IP address and port to socket descriptor
 connect();
initiate a connection on a socket

102
Socket – Client side API

#include <sys/socket.h>
int connect(int sockfd, const struct sockaddr *addr, socklen_t len);
Returns: 0 if OK, –1 on error

• Can also be used in DGRAM socket

103
Socket – Common API

stream/connected datagram socket APIs


 send(); send messages to peer
 recv(); receive messages from peer

unconnected datagram socket APIs


 sendto();
 recvfrom();

close connection
 close();
 shutdown();

104
Socket – In-Class Practice

mkdir socket/2
Copy the contents in attached file to
client.c and server.c
gcc –o server server.c client.c server.c
gcc –o client client.c

Run server in terminal 1


Run client in terminal 2

105
Socket – TCP state

closestudy1.pcap

106
Socket – I/O multiplexing

Question 1: How to read from 2 I/O file descriptors?


1. blocked I/O I/O
2. non-blocked I/O Process
I/O
Question 2: How does web server handle huge
http request?

107
Socket – I/O multiplexing

int select (int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, const struct timeval
*timeout);

int pselect (int maxfdp1, fd_set *readset, fd_set *writeset, fd_set *exceptset, const struct
timespec *timeout, const sigset_t *sigmask); //(POSIX select)

int poll (struct pollfd *fdarray, unsigned long ndfs, int timeout);

epoll

108
Socket – Summary

What is socket

Programming model

Socket API

I/O Multiplexing

109

You might also like