Professional Documents
Culture Documents
Source: UNIX Network Programming: Chapter: 7 W. Richard Stevens, Bill Fenner, Andrew M. Rudoff
Overview
● Introduction
● getsockopt and setsockopt function
● socket state
● Generic socket option
● IPv4 socket option
● TCP socket option
● fcnl function
2
Introduction
There are various ways to get and set options that affect a socket:
● getsockopt , setsockopt function=>IPv4 and IPv6 multicasting options
● fcntl function =>nonblocking I/O, signal driven I/O
● ioctl function
3
getsockopt and setsockopt function
● The getsockopt and setsockopt system calls manipulate the options associated with a socket.
● The getsockopt() function retrieve the value for the option specified by
the option_name argument for the socket specified by the socket argument.
#include <sys/socket.h>
int getsockopt(int sockfd, , int level, int optname, void *optval, socklent_t *optlen);
int setsockopt(int sockfd, int level , int optname, const void *optval, socklent_t optlen);
5
Contd..
6
Contd..
7
Contd..
8
Socket Options Getting/Setting socket options…
● While manipulating the socket options both level and optname arguments must be correctly
specified.
● The “datatype” column shows the data type of what the optval pointer must point to for each
option. Use the notation to of two braces to indicate a structure.
● When calling getsockopt for these flags option,*optval is an integer. The value returned in
*optvalue is zero if the option is disabled, or non zero if option is enabled.
● Setsockopt requires a non zero *optvalue to turn the option on, and a zero to turn the option
off.
● If the Flag column does not contain a “.” then option is used to pass the values to the specific
data type between the user processes and the a system. 9
Types of options
Levels:
● Generic: SOL_SOCKET
● IP: IPPROTO_IP
● ICMPV6: IPPROTO_OCMPv6
● IPV6:IPPROTO_IPV6
● TCP:IPPROTO_TCP
10
Generic: SOL_SOCKET
optname get set Description Flag Datatype
permit sending of broadcast
SO_BROADCAST int
datagrams
SO_DEBUG enable debug tracing int
11
Generic: SOL_SOCKET
● level: SOL_SOCKET
optname get set Description Flag Datatype
13
Generic socket option
● SO_BROADCAST =>enable or disable the ability of the process to send broadcast
message.(only datagram socket : Ethernet, token ring..)
You cannot broadcast – point to point link and connection based transport protocol.
● SO_DEBUG => When enable for TCP socket, kernel keep track of detailed information
about all packets sent or received by TCP(only supported by TCP).
● SO_DONTROUTE=>outgoing packets are to bypass the normal routing mechanisms of
the underlying protocol.
The equivalent of this option can also be applied to individual datagrams using the MSG_DONTROUTE
flag with the send , sendto, or sendmsg function.
"Does not route: sends directly to interface. E.G: routing daemons
● SO_ERROR=>when error occurs on a socket, the protocol module in a Berkeley-derived
kernel sets a variable named so_error for that socket. Process can obtain the value of
so_error by fetching the SO_ERROR socket option
14
Contd..
● SO_KEEPALIVE=>When the keep alive option is set for a TCP socket and no data has
been exchanged across the socket in either direction for 2 hours, and then TCP
automatically sends a keepalive probe to the peer.
Peer response
ACK(everything OK)
RST(peer crashed and rebooted):ECONNRESET
no response:ETIMEOUT =>socket closed
15
SO_LINGER
● SO_LINGER =>specify how the close function operates for a connection-oriented
protocol(default:close returns immediately)
struct linger{
int l_onoff; /* 0 = off, nonzero = on */
int l_linger; /*linger time : second*/
};
● l_onoff = 0 : turn off , l_linger is ignored
● l_onoff = nonzero and l_linger is 0:TCP abort the connection, discard any remaining data
in send buffer.
● l_onoff = nonzero and l_linger is nonzero : process wait until remained data sending, or
until linger time expired. If socket has been set nonblocking it will not wait for the close to
complete, even if linger time is nonzero.
16
SO_LINGER
client server
write data
client server
write data
Close with SO_LINGER socket option set and l_linger a positive value
18
SO_LINGER
client server
write data
● Close Returns Immediately: When the `close` function is used without setting the
`SO_LINGER` socket option, it typically returns immediately without waiting.
● This means it closes the socket and doesn't wait for any acknowledgment or data exchange
confirmation from the peer. It's a fast way to close the socket.
● Close with SO_LINGER Lingers Until ACK: If you set the `SO_LINGER` socket option and
specify a positive linger time, a `close` operation will wait until the acknowledgment (ACK) of
your FIN (Finish) signal is received from the peer. 20
● This means it ensures that your FIN has been acknowledged by the peer's TCP layer before
returning. This can be useful to confirm that your data reached the other end successfully.
● Shutdown Followed by a Read Waits for Peer's FIN: When you use `shutdown` with
`SHUT_WR` (indicating you've finished writing) followed by a `read` operation, it
effectively waits until you receive the peer's FIN signal.
● The `read` operation will block until the peer's application decides to close its end of the
connection by sending a FIN.
● This approach allows to confirm that not only has your data been acknowledged but also
that the peer's application has finished reading and is closing its side of the connection.
21
Contd..
22
Contd..
● An way to know that the peer application has read the data
use an application-level ack or application ACK
client
char ack;
Write(sockfd, data, nbytes); // data from client to server
n=Read(sockfd, &ack, 1); // wait for application-level ack
server
nbytes=Read(sockfd, buff, sizeof(buff)); //data from client
//server verifies it received the correct amount of data from the client
Write(sockfd, “”, 1); //server’s ACK back to client
23
Buffer
● Every socket has a send buffer and receive buffer.
● The receive buffer are used by TCP,UDP, and SCTP to hold received data until it is read by
the application.
● TCP?
● UDP?
● Two socket options let us change the default sizes. The default values differ widely between
implementation.
24
SO_RCVBUF , SO_SNDBUF
● let us change the default send-buffer, receive-buffer size.
Default TCP send and receive buffer size :
4096bytes
8192-61440 bytes(newer system)
Default UDP buffer size : 9,000bytes(send buff if the host supports NFS), 40,000 bytes(receive buff)
● SO_RCVBUF option must be setting before connection established.
● TCP socket buffer size should be at least three times the MSSs(Ethernet with mss 1,460)
25
SO_RCVLOWAT , SO_SNDLOWAT
● Every socket has a receive low-water mark and send low-water mark.(used by select
function)
● Receive low-water mark:
The amount of data that must be in the socket receive buffer for select to return “readable”.
Default receive low-water mark : 1 for TCP and UDP
● Send low-water mark:
The amount of available space that must exist in the socket send buffer for select to return “writable”
Default send low-water mark : 2048 for TCP
UDP send buffer never change because dose not keep a copy of send datagram.
26
SO_RCVTIMEO, SO_SNDTIMEO
● These two socket option allow us to place a timeout on socket receives and sends.
● We disable timeout by setting its value to 0 seconds and 0 micro seconds.
● The receive timeout affects the five input functions: read , readv, recv,recvfrom and
recvmsg
● The send time out affect the five output functions: write, writev, send , sendto and
sendmesg
27
SO_REUSEADDR, SO_REUSEPORT
● Allow a listening server to start and bind its well known port even if previously established
connection exist that use this port as their local port.
● Allow multiple instance of the same server to be started on the same port, as long as each
instance binds a different local IP address.
● Allow a single process to bind the same port to multiple sockets, as long as each bind
specifies a different local IP address.
● Allow completely duplicate bindings : multicasting
29
SO_TYPE
● Return the socket type.
● Returned value is such as SOCK_STREAM, SOCK_DGRAM...
30
SO_USELOOPBACK
● This option applies only to sockets in the routing domain(AF_ROUTE).
● The socket receives a copy of everything sent on the socket.
31
IPv4 socket option
● Level => IPPROTO_IP
● IP_HDRINCL => If this option is set for a raw IP socket, we must build our IP header for
all the datagrams that we send on the raw socket.
32
IPv4 socket option
● IP_OPTIONS=>allows us to set IP option in IPv4 header.
● IP_RECVDSTADDR=>This socket option causes the destination IP address of a received
UDP datagram to be returned as ancillary data by recvmsg.
33
recvmsg()
● The recvmsg() API is similar to other socket APIs, such as recv() and read(), that allow an
application to receive data, but also provides the capability of receiving ancillary data.
● Ancillary data allows the TCP/IP protocol stack to return additional option data to the
application along with the normal data from the IP network.
34
IP_RECVIF
● Cause the index of the interface on which a UDP datagram is received to be returned as
ancillary data by recvmsg.
35
IP_TOS
● lets us set the type-of-service(TOS) field in IP header for a TCP or UDP socket.
● If we call getsockopt for this option, the current value that would be placed into the
TOS(type of service) field in the IP header is returned.
36
IP_TTL
● User can set and fetch the default TTL that the system will use for unicast packets sent on a
given socket.
37
TCP socket option
● There are five socket option for TCP, but three are new with Posix.1g and not widely
supported.
● Specify the level as IPPROTO_TCP.
38
IPPROTO_TCP
● level: IPPROTO_TCP
optname get set Description Flag Datatype
interpretation of urgent
TCP_STDURG int
pointer
39
TCP_KEEPALIVE
● This is new with Posix.1g
● It specifies the idle time in second for the connection before TCP starts sending keepalive
probe.
● Default 2hours
● This option is effective only when the SO_KEEPALIVE socket option enabled.
40
TCP_MAXRT
● This is new with Posix.1g.
● It specifies the amount of time in seconds before a connection is broken once TCP starts
retransmitting data.
0 : use default
-1:retransmit forever
positive value: rounded up to next transmission time
41
TCP_MAXSEG
● This allows us to fetch or set the maximum segment size(MSS) for TCP connection.
42
TCP_NODELAY
● This option disables TCP’s Nagle algorithm.
(default this algorithm enabled)
● purpose of the Nagle algorithm.
==>prevent a connection from having multiple small packets outstanding at any time.
● Small packet => any packet smaller than MSS.
43
Nagle algorithm
● Default enabled.
● Reduce the number of small packet on the WAN.
● If given connection has outstanding data , then no small packet data will be sent on
connection until the existing data is acknowledged.
44
Nagle Algorithm Disabled
h 0
e 250
l 500
l 750
o 1000
! 1250
1500
1500
1750
2000
45
NAGLE ALGORITHM ENABLED
fcntl function
● File control
● The fcntl function can change the properties of a file that is already open.
● This function perform various descriptor control operation.
● Provide the following features
Nonblocking I/O
Signal-driven I/O
Set socket owner to receive SIGIO signal.
47
Contd..
48
Fcntl: Nonblocking I/O
● A blocking function is capable of delaying execution of other tasks, especially those that are
independent
In case of a server, other requests may get blocked
In case of a worker consuming tasks from a queue other independent tasks may get delayed.
The fcntl (file control) API is used to retrieve or set the flags that are associated with a socket or stream file.
50
Nonblocking I/O using fcntl
int flags;
/* set a socket as nonblocking */
if((flags = fcntl(fd, f_GETFL, 0)) < 0)
err_sys(“F_GETFL error”);
flags |= O_NONBLOCK;
if(fcntl(fd, F_SETFL, flags) < 0)
err_sys(“F_ SETFL error”);
The only correct way to set one of the file status flags is to fetch the current flags, logically
OR in the new flag, and then set the flag.
each descriptor has a set of file flags that fetched with
the F_GETFL command
and set with F_SETFL command.
52
Misuse of fcntl
/* wrong way to set socket nonblocking */
if(fcntl(fd, F_SETFL,O_NONBLOCK) < 0)
err_sys(“F_ SETFL error”);
53
Turn off the nonblocking flag
Flags &= ~O_NONBLOCK;
if(fcntl(fd, F_SETFL, flags) < 0)
err_sys(“F_SETFL error”);
54
F_SETOWN
● The integer arg value can be either positive(process ID) or negative (group ID)value to
receive the signal.
● F_GETOWN => retrurn the socket owner by fcntl function, either process ID or process
group ID.
56
57
58
Name and Address Conversions
Elementary Name and Address Conversions
● Domain Name System(DNS)
● gethostbyname Function
● gethostbyname2 Function and IPv6 support
● gethostbyaddr Function
● uname and gethostname Functions
● getservbyname and getservbyport Functions
● Other networking information
59
Introduction DNS
1. What is the IP
address of
udel.edu ?
It is 128.175.13.92
1. What is the
host name of
128.175.13.74
It is strauss.udel.edu
60
DNS Components
● Resolvers:
Client programs that extract information from Name Servers.
● Name Servers:
Server programs which hold information about the structure and the names.
61
Resolvers
A Resolver maps a name to an address and vice versa.
Query
Response
62
Iterative Resolution
a.root
server
a a.gtld-
3 server
. 5
n udel ns1.goo
s server gle.com
t 7
iterative response (referral)
3l
“I don't know. Try a.root-servers.net.”
d iterative response (referral)
. 9
c 1 iterative response (referral)
“I don't know. Try a.gtld-servers.net.”
o iterative response (referral)
“I don't know. Try a3.nstld.com.”
m 2 4 “I don't know. Try ns1.google.com.”
6 iterative response
8 “The IP address of www.google.com
client 10 is 216.239.37.99.”
iterative request
“What is the IP address of
www.google.com?” 63
Recursive Resolution
root
server
edu 3 com
server server
7 4
udel 2 8 google
server server
6 5
9
1
10 recursive request
“What is the IP address of
www.google.com?”
client recursive response
“The IP address of www.google.com is
216.239.37.99.”
64
Domain Name System
● Entries in DNS: resource records (RRs) for a host
A record: maps a hostname to a 32-bit IPv4 addr
AAAA (quad A) record: maps to a 128-bit IPv6 addr
PTR record: maps IP addr to hostname
MX record: specifies a mail exchanger of the host
CNAME record: assigns canonical name for common services
65
MX Record
● Three servers can receive emails for the kerio.com email domain. The lowest number means
the highest server preference.
● In this example the primary MX server (the server with highest preference) is
mx1.kerio.com
66
Contd..
● If you have to change the IP-address you only have to change it in one place, i.e. in the A
record.
67
DNS: Application, Resolver, Name Servers
application
application
code
UDP
call return
request
resolver local other
code name name
UDP server servers
reply
resolver
configuration
files
• resolver functions: gethostbyname/gethostbyaddr
• name server: BIND (Berkeley Internet Name Domain)
• static hosts files (DNS alternatives): /etc/hosts
• resolver configuration file (specifies name server IPs): /etc/resolv.conf
68
Contd..
● The gethostbyname() function returns a structure of type hostent for the given host name.
● gethostbyname Function performs a DNS query for an A record or a AAAA record
#include <netdb.h>
struct hostent *gethostbyname (const char *hostname);
returns: nonnull pointer if OK, NULL on error with h_errno set
struct hostent {
char *h_name; /* official (canonical) name of host */
char **h_aliases; /* ptr to array of ptrs to alias names */
/ *A list of aliases that can be accessed with arrays*/
int h_addrtype; /* host addr type: AF_INET or AF_INET6 */
int h_length; /* length of address: 4 or 16 */
char **h_addr_list; /* ptr to array of ptrs with IPv4/IPv6 addrs */
};
#define h_addr h_addr_list[0] /* first address in list */
69
hostent Structure Returned by gethostbyname
hostent { }
h_name canonical hostname \0
h_aliases alias #1 \0
h_addrtype AF_INET/6
h_length 4/16 alias #2 \0
h_addr_list NULL
in/6_addr { }
IP addr #1
in/6_addr { }
IP addr #2
NULL in/6_addr { }
IP addr #3
70
RES_USE_INET6 Resolver Option
● Per-application: call res_init
#include <resolv.h>
res_init ( );
_res.options |= RES_USE_INET6
● Per-user: set environment variable RES_OPTIONS
export RES_OPTIONS=inet6
● Per-system: update resolver configuration file
options inet6 (in /etc/resolv.conf)
● For a host without a AAAA record, IPv4-mapped IPv6 addresses are returned.
gethostbyname2 Function and IPv6 Support
#include <netdb.h>
struct hostent *gethostbyname2 (const char *hostname, int family);
returns: nonnull pointer if oK, NULL on error with h_errno set
RES_USE_INET6 option
off on
gethostbyname A record AAAA record
(host) or A record returning
IPv4-mapped IPv6 addr
gethostbyname2 A record A record returning
(host, AF_INET) IPv4-mapped IPv6 addr
gethostbyname2 AAAA record AAAA record
(host, AF_INET6)
Note: Resolver’s RES_USE_INET6 option along with which function is called (gethostbyname or
gethostbyname2) dictates the type of records that are searched for in the DNS (A or AAAA) and what type of
addresses are returned (IPv4, IPv6, or IPv4-mapped IPv6).
72
gethostbyaddr Function binary IP address to hostent structure
#include <netdb.h>
struct hostent *gethostbyaddr (const char *addr, size_t len, int family);
returns: nonnull pointer if OK, NULL on error with h_errno set
73
getservbyname and getservbyport Functions
● The getservbyname() function searches the database and finds an entry which matches the
specified service name and the specified protocol, opening a connection to the database if
necessary.
● The getservbyport() function searches the database and finds an entry which matches the
specified port number and the specified protocol, opening a connection to the database if
necessary.
76
getservbyname and getservbyport Functions
#include <netdb.h>
struct servent *getservbyname (const char *servname, const char *protoname);
returns: nonnull pointer if OK, NULL on error
struct servent *getservbyport (int port, const char *protoname);
returns: nonnull pointer if OK, NULL on error
The getservbyport socket function returns the Internet service name based on a specified Internet service port
number and protocol.
struct servent {
char *s_name; /* official service name */
char **s_aliases; /*alias list */
int s_port; /* port number, network-byte order */
char *s_proto; /* protocol, TCP or UDP, to use */
}
Mapping from name to port number: in /etc/services
Services that support multiple protocols often use the same TCP and UDP port number. But it’s not
always true:
shell 514/tcp
syslog 514/udp 77
Contd..
78
Daytime Client using gethostbyname and getservbyname
● Call gethostbyname and getservbyname
● Try each server address
● Call connect
● Check for failure
● Read server’s reply
79
Other Networking Info
● Four types of info:
hosts (gethostbyname, gethostbyaddr)
through DNS or /etc/hosts, hostent structure
networks (getnetbyname, getnetbyaddr)
through DNS or /etc/networks, netent structure
protocols (getprotobyname, getprotobynumber)
through /etc/protocols, protoent structure
services (getservbyname, getservbyport)
through /etc/servies, servent structure
80
CHAPTER 12
IPV4 AND IPV6
INTEROPERABILITY
Contents
● Introduction
● IPv4 Client, IPv6 Server
● IPv6 Client, IPv4 Server
● IPv6 Address Testing Macros
● IPV6_ADDRFORM Socket Option
● Source Code Portability
IPV6
● IPv6 can theoretically hold 2128 IP addresses. As you’re probably aware of, that’s a huge
number:
● 2128 = 340,282,366,920,938,463,463,374,607,431,768,211,456
● 340 undecillion, 282 decillion, 366 nonillion, 920 octillion, 938 septillion, 463 sextillion,
463 quintillion, 374 quadrillion, 607 trillion, 431 billion, 768 million, 211 thousand and 456
Introduction
● Server and client combination
IPv4 <=> IPv4(most server and client)
IPv4 <=> IPv6
IPv6 <=> IPv4
IPv6 <=> IPv6
● How IPv4 application and IPv6 application can communicate with each other.
● Host are running dual stacks, both an IPv4 protocol stack and IPv6 protocol stack
Dual stack
A station should simultaneously run both IPv4 & IPv6 protocols until all Internet
uses IPv6.
To determine which version to use, the source host queries the DNS and send
whichever version of IP packet the DNS returns.
85
IPv4 Client , IPv6 Server
● IPv6 dual stack server can handle both IPv4 and IPv6 clients.
● This is done using IPv4-mapped IPv6 address
● server create an IPv6 listening socket that is bound to the IPv6 wildcard address
IPv4 Mapped IPv6 Address
IPv6 IPv4 IPv6 listening socket,
IPv6
client client server
bound to 0::0, port 9999
TCP UDP
IPv4 mapped
Address IPv4 IPv6
returned by
accept or
recvfrom
IPv4 IPv6
92
IPv6 client, IPv4 server
● IPv4 server start on an IPv4 only host and create an IPv4 listening socket
● IPv6 client start, call gethostbyname. IPv4 mapped IPv6 address is returned.
● Using IPv4 datagram
Contd..
First consider an IPv6 TCP client running on a dual-stack host.
● An IPv4 server starts on an IPv4-only host and creates an IPv4 listening socket.
● The IPv6 client starts and calls getaddrinfo asking for only IPv6 addresses (it requests the
AF_INET6 address family and sets the AI_V4MAPPED flag in its hints structure). Since
the IPv4-only server host has only A records, an IPv4-mapped IPv6 address is returned to
the client.
● The IPv6 client calls connect with the IPv4-mapped IPv6 address in the IPv6 socket address
structure. The kernel detects the mapped address and automatically sends an IPv4 SYN to
the server.
● The server responds with an IPv4 SYN/ACK, and the connection is established using IPv4
datagrams.
94
AF_INET AF_INET
IPv4 SOCK_STREAM SOCK_DGRAM
sockets sockaddr_in sockaddr_in
AF_INET6 AF_INET6
IPv6 SOCK_STREAM SOCK_DGRAM
sockets sockaddr_in6 sockaddr_in6
TCP UDP
IPv4 IPv6
97
IPv6 Address Testing Macros
There are small class of IPv6 application that must know whether they are talking to an IPv4
peer.
● These application need to know if the peer’s address is an IPv4-mapped IPv6 address.
● Twelve macro defined
● Eg: int IN6_IS_ADDR_LOOPBACK(const struct in6_addr * aptr ) ;
● Eg: int IN6_IS_ADDR_V4MAPPED(const struct in6_addr * aptr ) ;
Contd..
99
IPV6_ADDRFORM Socket Option
● Change the IPV6 address into a different address family
● Can change a socket from one type to another, following restriction.
An IPv4 socket can always be changed to an IPv6. Any IPv4 address already associated with the socket
are converted to IPv4- mapped IPv6 address.
An IPv6 socket can changed to an IPv4 socket only if any address already associated with the socket are
IPv4-mapped IPv6 address.
Converting an IPv4 to IPv6
Int af;
socklen_t clilen;
struct sockaddr_int6 cli; /* IPv6 struct */
struct hostent *ptr;
af = AF_INT6;
Setsockopt(STDIN_FILENO, IPPROTO_IPV6, IPV6_ADDRFORM, &af, sizeof(af));
clilen = sizeof(cli);
Getpeername(0, &cli, &clilen);
ptr = gethostbyaddr(&cli.sin6_addr, 16, AF_INET);
Contd..
● setsockopt => change the Address format of socket from IPv4 to IPv6.
Return value is AF_INET or AF_INET6
● getpeername =>return an IPv4-mapped IPv6 address