Implementing AF-independent application

Jun-ichiro itojun Itoh, KAME Project

$Id: index.html,v 1.3 2003/05/16 15:42:36 itojun Exp $

Introduction

With the deployment of Internet Protocol Version 6 (IPv6), application programmers have to cope with socket connections over multiple address families, i.e. AF_INET and AF_INET6. This document describes how a programmer can handle those multiple address families with ease.

This document assumes that you are familiar with AF_INET socket programming. You may want to refer to RFC2553 and RFC2292.

If you find any mistakes, please let the author know. The document will be updated right away. Thank you!

struct sockaddr_storage

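RFC2553 proposes struct sockaddr_storage, a placeholder large enough to hold any sockaddr-variant structure. A simplified implementation looks like the following (the definition in RFC2553 also includes padding fields that force 64-bit alignment):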

	struct sockaddr_storage {
		u_char ss_len;
		u_char ss_family;
		u_char padding[128 - 2];
	};

You should use this structure to hold any sockaddr-variant structure.
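To check that sockaddr_storage really can hold either variant, here is a small sketch. The compile-time checks assume a C11 compiler (`_Static_assert`), and family_name() is a hypothetical helper for illustration, not part of any API. It reads only ss_family, since ss_len exists on 4.4BSD-derived systems but not on Linux or Solaris.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* sockaddr_storage must be large enough for every sockaddr variant. */
_Static_assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_in),
    "sockaddr_in must fit in sockaddr_storage");
_Static_assert(sizeof(struct sockaddr_storage) >= sizeof(struct sockaddr_in6),
    "sockaddr_in6 must fit in sockaddr_storage");

/*
 * Hypothetical helper: report which address family a sockaddr_storage
 * currently holds.  The caller does not need to know the variant first.
 */
const char *
family_name(const struct sockaddr_storage *ss)
{
	switch (ss->ss_family) {
	case AF_INET:
		return "AF_INET";
	case AF_INET6:
		return "AF_INET6";
	default:
		return "unknown";
	}
}
```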

union sockunion

Alternatively, you may want to implement sockunion.h, with the following content:

	union sockunion {
		struct sockinet {
			u_char si_len;
			u_char si_family;
		} su_si;
		struct sockaddr_in  su_sin;
		struct sockaddr_in6 su_sin6;
	};
	#define su_len        su_si.si_len
	#define su_family     su_si.si_family

NOTE: For better portability, struct sockaddr_storage should be used. union sockunion works, but it is not portable enough because structure alignment and layout differ across platforms.

Rules of thumb

1. avoid struct in_addr and struct in6_addr.

Since we are trying to implement AF-independent programs, all of the memory structures that handle network addresses have to be AF-independent. In that sense, we should avoid struct in_addr and in6_addr, since they have no room to hold AF information. Suppose you pass a network address to some function, foo(). If you use struct in_addr or struct in6_addr, you will end up with an extra parameter to indicate the address family, as below:

	struct in_addr in4addr;
	struct in6_addr in6addr;
	/* IPv4 case */
	foo(&in4addr, AF_INET);
	/* IPv6 case */
	foo(&in6addr, AF_INET6);

This way, the network address and its address family do not live together, which leads to a bunch of if/switch statements and programming mistakes. Why don't we just use struct sockaddr_storage, as below?

	struct sockaddr_storage ss;
	int sslen;
	/* AF independent! - use sockaddr when passing a pointer */
	foo((struct sockaddr *)&ss);
	/* if you need portability to Linux/Solaris, you need to pass length explicitly */
	foo((struct sockaddr *)&ss, sslen);
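A minimal sketch of such an AF-independent foo(), assuming the (sockaddr pointer, length) calling convention above. The name format_sockaddr() is hypothetical. It hands the address to getnameinfo() and never looks at sa_family itself, so the same code serves IPv4 and IPv6.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper: format any sockaddr as "host port" using numeric
 * forms only (no DNS lookup).  Returns 0 on success or the nonzero
 * getnameinfo() error code, which gai_strerror() can describe.
 */
int
format_sockaddr(const struct sockaddr *sa, socklen_t salen,
    char *buf, size_t buflen)
{
	char host[NI_MAXHOST], serv[NI_MAXSERV];
	int error;

	error = getnameinfo(sa, salen, host, sizeof(host), serv, sizeof(serv),
	    NI_NUMERICHOST | NI_NUMERICSERV);
	if (error)
		return error;
	snprintf(buf, buflen, "%s %s", host, serv);
	return 0;
}
```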

Also, with the near-future update to the IPv6 basic socket API (RFC2553), sockaddr_in6 will include an interface index for link-local scoped addresses, as well as a site index for site-local scoped addresses (the sin6_scope_id field). Therefore, if your application needs to handle scoped addresses, avoiding in6_addr (and using sockaddr_in6) is a critical requirement.

2. use getaddrinfo() and getnameinfo() everywhere.

getaddrinfo() and getnameinfo() are new address-family-independent functions that hide every gory detail of name-to-address translation and vice versa. Together they provide the functionality of the following functions:

	gethostbyname()
	gethostbyaddr()
	inet_ntop()
	inet_pton()
	getservbyname()
	getservbyport()

They can perform DNS/hostname table lookups, though lookups can be turned off if you want (AI_NUMERICHOST for getaddrinfo(), NI_NUMERICHOST for getnameinfo()). getaddrinfo() can return multiple addresses if a host has multiple addresses in multiple address families, as below:

	localhost.	IN A	127.0.0.1
			IN AAAA	::1

It can look up the hostname and the service name/port at once. Therefore, all the gory details of initializing sockaddr structures can be buried in the library function.
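Putting these pieces together, here is a sketch that resolves a name with PF_UNSPEC and prints every address getaddrinfo() returns, whatever its family. list_addresses() is a hypothetical helper. Setting ai_socktype to SOCK_STREAM yields one entry per address instead of one per socket type, and freeaddrinfo() releases the whole result chain at once.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper: resolve "name"/"service" for any address family
 * and print each result.  Returns the number of addresses printed, or
 * -1 if the lookup failed.
 */
int
list_addresses(const char *name, const char *service)
{
	struct addrinfo hints, *res, *ai;
	char host[NI_MAXHOST], serv[NI_MAXSERV];
	int error, count = 0;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = PF_UNSPEC;		/* IPv4, IPv6, whatever exists */
	hints.ai_socktype = SOCK_STREAM;	/* one entry per address */
	error = getaddrinfo(name, service, &hints, &res);
	if (error) {
		fprintf(stderr, "%s: %s\n", name, gai_strerror(error));
		return -1;
	}
	for (ai = res; ai != NULL; ai = ai->ai_next) {
		/* Convert back to text without caring which AF this is. */
		if (getnameinfo(ai->ai_addr, ai->ai_addrlen, host, sizeof(host),
		    serv, sizeof(serv), NI_NUMERICHOST | NI_NUMERICSERV) == 0) {
			printf("%s port %s\n", host, serv);
			count++;
		}
	}
	freeaddrinfo(res);
	return count;
}
```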

For example, code using inet_aton() can be rewritten as follows:

	int error;
	char *name;
	struct sockaddr_storage ss;
	struct sockaddr *sa;
	socklen_t salen;
	struct addrinfo hints;
	struct addrinfo *res, *ai;

	/*
	 * inet_aton() case.
	 * This cannot handle IPv6 addresses.  Also, it cannot return
	 * multiple addresses.  inet_aton() does not set errno.
	 */
	if (!inet_aton(name, &((struct sockaddr_in *)&ss)->sin_addr))
		fprintf(stderr, "inet_aton: %s: invalid address\n", name);

	/*
	 * getaddrinfo() case.  It handles IPv4 and IPv6 addresses alike.
	 * AI_NUMERICHOST suppresses DNS lookup, just like inet_aton().
	 */
	memset(&hints, 0, sizeof(hints));
	/* set-up hints structure */
	hints.ai_family = PF_UNSPEC;
	hints.ai_flags = AI_NUMERICHOST;
	error = getaddrinfo(name, NULL, &hints, &res);
	if (error)
		fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error));
	else {
		for (ai = res; ai; ai = ai->ai_next) {
			sa = ai->ai_addr;
			salen = ai->ai_addrlen;
			/* do what you want */
		}
		freeaddrinfo(res);
	}

Code using inet_ntoa() can be rewritten as follows:

	int error;
	char *name;
	char namebuf[BUFSIZ];
	struct sockaddr_storage ss;
	socklen_t sslen;	/* length of the address in ss */

	/*
	 * inet_ntoa() case. This cannot handle IPv6 addresses.
	 * No way to pass the error status.
	 */
	name = inet_ntoa(((struct sockaddr_in *)&ss)->sin_addr);

	/*
	 * getnameinfo() case. NI_NUMERICHOST avoids DNS lookup.
	 * On 4.4BSD-derived systems ss.ss_len can be passed instead of
	 * sslen; Linux and Solaris have no ss_len field.
	 */
	error = getnameinfo((struct sockaddr *)&ss, sslen,
		namebuf, sizeof(namebuf), NULL, 0, NI_NUMERICHOST);
	if (error)
		fprintf(stderr, "getnameinfo: %s\n", gai_strerror(error));
	else
		name = namebuf;

gethostbyname() can be replaced as follows:

	int error;
	struct sockaddr *sa;
	socklen_t salen;
	struct hostent *hp;
	char *name;
	int af;
	struct addrinfo hints;
	struct addrinfo *res, *ai;

	/*
	 * gethostbyname() case, using its AF-aware variant gethostbyname2().
	 * It is still just for the single AF denoted by "af".
	 */
	hp = gethostbyname2(name, af);

	/*
	 * getaddrinfo() case.  You can get IPv6 address and IPv4 address
	 * at the same time.
	 */
	memset(&hints, 0, sizeof(hints));
	/* set-up hints structure */
	hints.ai_family = PF_UNSPEC;
	error = getaddrinfo(name, NULL, &hints, &res);
	if (error)
		fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error));
	else {
		for (ai = res; ai; ai = ai->ai_next) {
			sa = ai->ai_addr;
			salen = ai->ai_addrlen;
			/* do what you want */
		}
		freeaddrinfo(res);
	}

Now, gethostbyaddr() can be written as follows:

	struct sockaddr_storage ss;
	socklen_t sslen;	/* length of the address in ss */
	struct sockaddr_in *sin;
	struct sockaddr_in6 *sin6;
	struct hostent *hp = NULL;
	char *name = NULL;
	char namebuf[NI_MAXHOST];
	int error;

	/* gethostbyaddr() case. */
	switch (ss.ss_family) {
	case AF_INET:
		sin = (struct sockaddr_in *)&ss;
		hp = gethostbyaddr(&sin->sin_addr, sizeof(sin->sin_addr),
			ss.ss_family);
		break;
	case AF_INET6:
		sin6 = (struct sockaddr_in6 *)&ss;
		hp = gethostbyaddr(&sin6->sin6_addr, sizeof(sin6->sin6_addr),
			ss.ss_family);
		break;
	}
	if (hp != NULL)
		name = hp->h_name;

	/*
	 * getnameinfo() case.  With flags set to 0 it performs a reverse
	 * lookup and falls back to the numeric form if no name is found.
	 */
	error = getnameinfo((struct sockaddr *)&ss, sslen,
		namebuf, sizeof(namebuf), NULL, 0, 0);
	if (error)
		fprintf(stderr, "getnameinfo: %s\n", gai_strerror(error));
	else
		name = namebuf;

3. do not hardcode knowledge about particular AF.

Since we are trying to be AF-independent, hardcoding AF-dependent knowledge into the program should be avoided. Constructs like the following are bad:

	/* BAD EXAMPLE */
	switch (sa->sa_family) {
	case AF_INET:
		salen = sizeof(struct sockaddr_in);
		break;
	}

Instead, use res->ai_addrlen returned by getaddrinfo(3).
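A minimal sketch of that advice, assuming you want to keep a resolved address around in a sockaddr_storage. resolve_to_storage() is a hypothetical helper. The copy length comes from ai_addrlen, so the function contains no sizeof(struct sockaddr_in) or sizeof(struct sockaddr_in6).

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/*
 * Hypothetical helper: resolve "host"/"port" and save the first result
 * in *ss, with its length in *sslen.  Returns 0 on success, -1 on error.
 */
int
resolve_to_storage(const char *host, const char *port,
    struct sockaddr_storage *ss, socklen_t *sslen)
{
	struct addrinfo hints, *res;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = PF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	if (getaddrinfo(host, port, &hints, &res) != 0)
		return -1;
	if (res->ai_addrlen > sizeof(*ss)) {	/* defensive; should not happen */
		freeaddrinfo(res);
		return -1;
	}
	memcpy(ss, res->ai_addr, res->ai_addrlen);
	*sslen = res->ai_addrlen;	/* the library knows the right size */
	freeaddrinfo(res);
	return 0;
}
```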

Modifying servers called from inetd

To port a server that is called via inetd (for example, a POP server), you must rewrite the following portions:

- Every struct sockaddr_in has to be changed into struct sockaddr_storage. Be sure to update cast operators and sizeof operations as well. Pointers should be changed into struct sockaddr *.
- Rewrite struct/union field names accordingly (for example, sin_family becomes ss_family).
- inet_aton() and inet_ntoa() have to be changed to getaddrinfo() and getnameinfo().
- gethostbyname() and gethostbyaddr() have to be changed to getaddrinfo() and getnameinfo().

The simplest server has no sockaddr-related code inside; it just uses standard input. However, most servers have logging functionality, which requires the peer's address, obtained by calling getpeername(). Therefore, you must rewrite the address-to-name translation for the peer address.

Take great care with sizeof operations on sockaddr structures. Code like this is very common:

	int slen;
	struct sockaddr_in sin;

	slen = sizeof(struct sockaddr_in);
	getsockname(s, (struct sockaddr *)&sin, &slen);

If we simply change the type of sin, we're doomed: slen would still be sizeof(struct sockaddr_in), which is too small for an IPv6 address. You need to change the sizeof operation as well, like:

	socklen_t slen;
	struct sockaddr_storage ss;

	slen = sizeof(ss);
	getsockname(s, (struct sockaddr *)&ss, &slen);
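Applying this to the peer-logging code of an inetd-launched server, here is a sketch. describe_socket() is a hypothetical helper; for such a server, s would be 0 (standard input) and peer would be nonzero. The length starts as sizeof(ss) and comes back holding the real size, which is then passed straight to getnameinfo().

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper: format the local (peer == 0) or remote
 * (peer != 0) address of socket s as "host port".
 * Returns 0 on success, -1 on error.
 */
int
describe_socket(int s, int peer, char *buf, size_t buflen)
{
	struct sockaddr_storage ss;
	socklen_t sslen = sizeof(ss);	/* NOT sizeof(struct sockaddr_in) */
	char host[NI_MAXHOST], serv[NI_MAXSERV];
	int error;

	if (peer ? getpeername(s, (struct sockaddr *)&ss, &sslen)
	    : getsockname(s, (struct sockaddr *)&ss, &sslen)) {
		perror(peer ? "getpeername" : "getsockname");
		return -1;
	}
	/* sslen now holds the actual length of the stored address. */
	error = getnameinfo((struct sockaddr *)&ss, sslen, host, sizeof(host),
	    serv, sizeof(serv), NI_NUMERICHOST | NI_NUMERICSERV);
	if (error) {
		fprintf(stderr, "getnameinfo: %s\n", gai_strerror(error));
		return -1;
	}
	snprintf(buf, buflen, "%s %s", host, serv);
	return 0;
}
```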

Modifying daemons

IPv4 daemons usually bind to INADDR_ANY, that is, 0.0.0.0. To obtain this kind of address in an AF-independent manner, you can use the AI_PASSIVE flag for getaddrinfo(). A multiprotocol daemon may want to bind() to all the addresses returned from getaddrinfo().

	int error;
	struct addrinfo hints;
	struct addrinfo *res, *ai;
	char *myservice;

	memset(&hints, 0, sizeof(hints));
	/* set-up hints structure */
	hints.ai_family = PF_UNSPEC;
	hints.ai_flags = AI_PASSIVE;
	hints.ai_socktype = SOCK_STREAM;
	error = getaddrinfo(NULL, myservice, &hints, &res);
	if (error)
		fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error));
	else {
		/*
		 * "res" has a chain of addrinfo structures filled with
		 * 0.0.0.0 (for IPv4), 0:0:0:0:0:0:0:0 (for IPv6) and the like,
		 * with port filled for "myservice".
		 */
		for (ai = res; ai; ai = ai->ai_next) {
			/* socket(), bind() and listen() to ai->ai_addr */
		}
		freeaddrinfo(res);
	}
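Filling in the loop body, a runnable sketch might look like this. open_listeners() is a hypothetical helper. It assumes a system that defines IPV6_V6ONLY; that option keeps the IPv6 wildcard socket from also claiming IPv4 traffic, which on some systems would otherwise make the bind() to 0.0.0.0 on the same port fail. That one setsockopt() is the only AF-specific line.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netdb.h>

/*
 * Hypothetical helper: open one listening TCP socket per wildcard address
 * returned for "service".  Stores up to maxfds descriptors in fds and
 * returns how many were opened, or -1 if the lookup itself failed.
 */
int
open_listeners(const char *service, int *fds, int maxfds)
{
	struct addrinfo hints, *res, *ai;
	int error, s, n = 0;
	const int on = 1;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = PF_UNSPEC;
	hints.ai_flags = AI_PASSIVE;
	hints.ai_socktype = SOCK_STREAM;
	error = getaddrinfo(NULL, service, &hints, &res);
	if (error) {
		fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error));
		return -1;
	}
	for (ai = res; ai != NULL && n < maxfds; ai = ai->ai_next) {
		s = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
		if (s < 0)
			continue;	/* e.g. kernel without IPv6 support */
		setsockopt(s, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
#ifdef IPV6_V6ONLY
		if (ai->ai_family == AF_INET6)
			setsockopt(s, IPPROTO_IPV6, IPV6_V6ONLY, &on, sizeof(on));
#endif
		if (bind(s, ai->ai_addr, ai->ai_addrlen) < 0 ||
		    listen(s, SOMAXCONN) < 0) {
			close(s);
			continue;
		}
		fds[n++] = s;
	}
	freeaddrinfo(res);
	return n;
}
```

A daemon would then wait on all the returned descriptors at once, for example with select() or poll().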

Modifying clients

A client program may want to try all resolved addresses, as the telnet program does (telnet tries to connect to each resolved address in turn until a connection is established).

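	int error;
	int s = -1;
	struct addrinfo hints;
	struct addrinfo *res, *ai;
	char *server;
	char *hisservice;

	memset(&hints, 0, sizeof(hints));
	/* set-up hints structure */
	hints.ai_family = PF_UNSPEC;
	hints.ai_socktype = SOCK_STREAM;
	error = getaddrinfo(server, hisservice, &hints, &res);
	if (error)
		fprintf(stderr, "getaddrinfo: %s\n", gai_strerror(error));
	else {
		for (ai = res; ai; ai = ai->ai_next) {
			s = socket(ai->ai_family, ai->ai_socktype,
				ai->ai_protocol);
			if (s < 0)
				continue;
			/* try to connect() to ai->ai_addr */
			if (connect(s, ai->ai_addr, ai->ai_addrlen) == 0)
				break;	/* success */
			close(s);
			s = -1;
		}
		freeaddrinfo(res);
	}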

	/* whatever you would like to perform */

What about inet_ntop() and inet_pton()?

In previous sections, we said almost nothing about the usage of inet_ntop() and inet_pton(). This is because they are not very AF-independent. Since inet_ntoa() and inet_aton() are just for IPv4 addresses, RFC2553 describes inet_ntop() and inet_pton() as their replacements. They are defined as follows:

	int inet_pton(int af, const char *src, void *dst);

	const char *inet_ntop(int af, const void *src,
		char *dst, size_t size);

inet_pton() and inet_ntop() assume in_addr or in6_addr for handling addresses, which, as noted above, is something we would like to avoid. If you have some sockaddr-style structure, you can get the printable form of the address with the following statements:

	struct sockaddr_storage ss;
	char buf[INET6_ADDRSTRLEN];	/* large enough for IPv4 and IPv6 */

	switch (ss.ss_family) {
	case AF_INET:
		inet_ntop(ss.ss_family,
			&((struct sockaddr_in *)&ss)->sin_addr, buf, sizeof(buf));
		break;
	case AF_INET6:
		inet_ntop(ss.ss_family,
			&((struct sockaddr_in6 *)&ss)->sin6_addr, buf, sizeof(buf));
		break;
	}

This requires an extra conditional statement, since inet_ntop() is not written for sockaddr structures. Worse, to convert the printable form into an address, you need to know the address family prior to the call to inet_pton(), as below. You could perform an error-and-retry loop over the candidate families instead, but it is not a very clean way of dealing with it.

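	struct sockaddr_storage ss;
	struct sockaddr_in *sin;
	struct sockaddr_in6 *sin6;
	char *printable;
	int af;		/* must be known before calling inet_pton() */

	memset(&ss, 0, sizeof(ss));
	switch (af) {
	case AF_INET:
		sin = (struct sockaddr_in *)&ss;
		sin->sin_family = AF_INET;
		inet_pton(af, printable, &sin->sin_addr);
		break;
	case AF_INET6:
		sin6 = (struct sockaddr_in6 *)&ss;
		sin6->sin6_family = AF_INET6;
		inet_pton(af, printable, &sin6->sin6_addr);
		break;
	}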

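Answer: you should use getnameinfo() wherever possible; for the printable-to-address direction, getaddrinfo() with the AI_NUMERICHOST flag plays the same role.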
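Here is a sketch of that approach in both directions: getaddrinfo() with AI_NUMERICHOST stands in for inet_pton(), and getnameinfo() with NI_NUMERICHOST stands in for inet_ntop(). normalize_address() is a hypothetical helper that round-trips a printable address into canonical form, and neither call needs the address family up front.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Hypothetical helper: parse a numeric address of any family and write
 * it back in canonical numeric form.  Returns 0 on success, -1 if
 * "printable" is not a numeric address (host names are rejected).
 */
int
normalize_address(const char *printable, char *out, size_t outlen)
{
	struct addrinfo hints, *res;
	int error;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = PF_UNSPEC;		/* family is discovered, not given */
	hints.ai_flags = AI_NUMERICHOST;	/* never query DNS */
	if (getaddrinfo(printable, NULL, &hints, &res) != 0)
		return -1;
	error = getnameinfo(res->ai_addr, res->ai_addrlen, out, outlen,
	    NULL, 0, NI_NUMERICHOST);
	freeaddrinfo(res);
	return error ? -1 : 0;
}
```

For example, "0:0:0:0:0:0:0:1" comes back as "::1", while "127.0.0.1" is returned unchanged.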

update history

November 1998: document struct sockaddr_storage.
December 1998: replace union sockunion with struct sockaddr_storage, as struct sockaddr_storage is recommended.
July 1999: comment out most of the union sockunion part. Thanks to Mr. Adam M. Costello for comments.
November 2000: update draft #.