Excitement is mounting for the upcoming OpenBSD 3.6 release, the first to support multiprocessor systems. To celebrate the event, Federico Biancuzzi interviewed several core developers by email to discuss new features, tools, and future plans.
FB: This is the first OpenBSD release that includes SMP support. Who worked on it?
Niklas Hallqvist: First off, I must tell you that OpenBSD MP support stems from the NetBSD work mainly done by Bill Sommerfeld. Of course, lots of other developers in their camp have added to it; these days I think Frank van der Linden is maintaining the i386 parts. However, stuff like this is impossible to just move over verbatim, and I have spent weeks of work making it fit into our tree. It deserves to be mentioned I got funded by GeNUA for big parts of this work; we thank them for that very kind support.
A handful of other OpenBSD people have done their share as well. Artur Grabowski did the amd64 work; Dale Rahn, Ted Unangst, Håkan Olsson, and Andreas Gunnarsson have provided debugging as well as coding help. I am sure there are a few more, but I don't remember offhand. Lots have helped test, too many to mention. [Dozens] of people donated equipment to make this happen.
FB: What techniques have you chosen?
Niklas Hallqvist: Biglock. That means that while a process is executing inside the kernel, no other process gets access to it; instead they will spin until the first process leaves it. This is a bit crude, since some applications won't benefit at all from this kind of multiprocessing. However, most do, especially if things are set up with multiprocessing in mind.
The reason we have chosen biglock is that it is really simple, and simplicity is a prime objective if security is considered. Fine-grained locking is much harder to do completely right, and things like starvation and deadlocks are much more common. We are just too conservative to risk this.
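The biglock model can be illustrated with a minimal sketch, written here in Python rather than kernel C. The names BIG_LOCK, syscall, and kernel_counter are hypothetical and are not taken from the OpenBSD source; the sketch only shows the idea that a single lock guards all kernel state. A real kernel spins on the lock, while this sketch simply blocks.

```python
import threading

# Hypothetical stand-in for the single kernel-wide lock ("biglock").
BIG_LOCK = threading.Lock()

# Shared "kernel" state; safe only because every access holds BIG_LOCK.
kernel_counter = 0


def syscall(iterations: int) -> None:
    """Enter the 'kernel', update shared state, and leave again."""
    global kernel_counter
    for _ in range(iterations):
        with BIG_LOCK:  # only one thread is inside the kernel at a time
            kernel_counter += 1


def run(threads: int = 4, iterations: int = 10_000) -> int:
    """Run several 'processes' that all make system calls concurrently."""
    global kernel_counter
    kernel_counter = 0
    workers = [threading.Thread(target=syscall, args=(iterations,))
               for _ in range(threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return kernel_counter


if __name__ == "__main__":
    print(run())
```

Anything a thread does outside the `with BIG_LOCK` block can proceed in parallel, which is why workloads that spend most of their time in userland still benefit from this design.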
FB: How does this compare with FreeBSD 4, FreeBSD 5, and DragonFlyBSD?
Niklas Hallqvist: Actually I don't know. I'd expect we'd do worse in anything that is interrupt-intensive. We probably do worse even for the common case where several runnable processes exist simultaneously as well. But ... we do not aim to compete at the edge here. We want to make scalability happen without disrupting our security and robustness track record. We just have other priorities. We will kick SMP out of the tree if it proves to not be trustworthy enough. As it was before we had SMP support, some people had to choose other OSes because we just did not run on big machines enough, so they felt they had to make a compromise on the security to get work done. Hopefully that is not the case anymore.
FB: At the moment the code works on i386 and amd64 platforms. Which platforms do you plan to support in the future?
Niklas Hallqvist: Loose plans, not any guarantees made: alpha, ppc, sparc(64), and maybe mvme88k :-) Maybe the new mips port? Who knows. This is work that probably must be done just because it is fun. There's hardly a large demand with funders around the corner. And today, unfortunately, there's not much time left for fun projects anymore. I was very lucky to get paid to do part of this fun work; otherwise it might not have happened.
FB: The 3.6 kernel includes new timecounter code. Why? Maybe for SMP compatibility?
Artur Grabowski: It's not enabled on any architecture yet, so it's just work in progress. It's definitely not relevant for 3.6. It's relatively important for SMP, since we figured out that it's very hard to access time structures in an MP-safe way. Instead of reinventing the wheel we just took a wheel that someone else has already implemented. It also lets us have precise timekeeping with a central infrastructure, without doing too much hairy work on each architecture. Now, most of the timekeeping code can be shared between architectures.
FB: How does it work? What type of advantages does it bring over the old code?
Artur Grabowski: Check this.
FB: Is there any special hook for OpenNTPD interaction?
Artur Grabowski: Not yet, but we've been talking about it. It won't be a special hook for OpenNTPD, but rather a bunch of hooks for any ntp implementation.
FB: You've introduced a new little tool, OpenNTPD. What is it?
Henning Brauer: Well, as the name says, it's a Network Time Protocol daemon.
FB: Why a new Network Time Protocol daemon?
Henning Brauer: There has been basically only one ntpd around--which has a slight problem with the license--that is huge, runs as root, and doesn't exactly have a promising security record. And it is complex and thus complicated to configure. As a result, the majority of machines out in the wild run without synchronized clocks.
So Theo mentioned that we perhaps should do our own; I liked the idea and sat down and did it. As a result, we have OpenNTPD, which is free, written with security in mind, and easy to use. It of course is privilege separated. (It is a very good example for privilege separation, even: just two message types, one needing privsep because the adjtime syscall requires root privileges, the other--name resolution--requiring access to /etc, thus outside the chroot.) OpenNTPD is really easy to configure, there are just three config file statements. The default config file is perfectly fine for the majority of uses, so for most people, getting synchronized clocks breaks down to enabling ntpd, no configuration needed.
FB: What is the new utility tcpdrop?
Markus Friedl: It allows you to terminate any TCP connection that connects to or originates from the local machine. You specify the connection, and tcpdrop tells the kernel to send out a TCP reset segment and remove the local state associated with this connection.
FB: When would you use it?
Markus Friedl: While working on a fix for a denial-of-service attack
involving out-of-order TCP packets, I've found that it's hard for an
administrator to terminate a misbehaving TCP connection. Usually you have to
kill the application, unless there is a way to tell it to close a given socket.
The attack mentioned before can even lead to situations where closing the
socket does not help (e.g., when the connection is in the
Without tcpdrop, you could use a packet-generating tool like libdnet and send out fake TCP resets. However, this is difficult since it requires that the administrator figure out the correct TCP sequence numbers.
FB: Reading some CVS logs, I found the initial tcpdrop commit message saying "drop tcp connections using sysctl(2)." I'm very curious; why sysctl and how does it work?
Markus Friedl: There are two system calls that can be used
to add functionality to the kernel (without adding yet another system call):
sysctl(3). The latter was chosen because it
was very simple to implement the new feature. Basically, sysctl
allows you to modify a number of kernel variables. These variables are arranged
in a tree; the command sysctl net.inet will show the subtree
related to the IP protocol. In order to implement the tcpdrop
tool, a new variable was added; tcpdrop writes a local address/port and foreign
address/port tuple to this variable. This causes the kernel to terminate the
connection.
FB: OpenSSH 3.9 includes support for session multiplexing. What is it?
Markus Friedl: It allows you to run multiple login shells, scp or sftp sessions over a single SSH connection. For example, you log in to a host and enter your password one time. Then [you] can do multiple scp and sftp sessions to the same host without authenticating again.
FB: Is there anything you can do with it that wasn't possible before?
Markus Friedl: You can use this feature to speed up cvs access over ssh. Usually you have to enter the ssh password for every cvs command. With this feature you start an ssh client, it connects to the cvs server, and you enter the password. Further invocations of ssh will not contact the server directly but pass the connection request to the first ssh client process. This client in turn asks the sshd server to start an additional remote process. This way you don't have to reauthenticate. Moreover, it also speeds up the cvs commands because the computationally expensive authentication operations are skipped and the setup requires fewer round trips.
You can use session multiplexing by adding something like this to your .ssh/config:
Host cvs-server
    Hostname cvs-server
    ControlMaster yes
    ControlPath ~/.ssh/cvs-mux
Host cvs-server-fast
    ControlMaster no
    ControlPath ~/.ssh/cvs-mux
Now, open an ssh connection with ssh cvs-server and keep it open. The command
% cvs -d cvs-server-fast:/cvs diff
will use the first connection and will be faster than
% cvs -d cvs-server:/cvs diff
Damien Miller: Others have been interested in using multiplexing for distributed compilation (distcc and Nikolay Sturm's distributed ports builder). In the distcc case especially, the cost of setting up a new connection would often be greater than the time it takes to compile a single .c file, so the multiplexing support should be a real help.
FB: What is hotplugd(8)?
Alexander Yurchenko: hotplugd(8) is a userland
part of the device hot-plugging notification. The main part is in the kernel.
It doesn't do anything magic, just hooks into our
framework, which deals with device attachments and detachments. Every time a
device attaches or detaches, a corresponding event is queued in the hotplug
queue. This queue can be accessed from the userland through the
/dev/hotplug device file. This device file supports usual
notification mechanisms such as
kqueue(2). So the
hotplugd(8) daemon waits for events to
be read from the /dev/hotplug file. Each event includes the
information if it's an attachment or detachment, what's the class and the name
of the device that appeared or disappeared. On every event,
hotplugd(8) runs shell scripts from the /etc/hotplug
directory, where you can describe what you want to do with your devices.
Note that hotplug can only work with [a physical bus] that already has
support for hot plugging in its drivers; i.e.,
usb(4). SCSI or PCI hot plugging
requires some additional work on the drivers.
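The dispatch step described above can be sketched as follows. This is an illustration in Python, not the daemon's actual code: the HotplugEvent record, the script names "attach" and "detach", and the choice to pass the device class and name as arguments are all assumptions of this sketch. The text only states that scripts from /etc/hotplug are run on every event. The sketch does not read /dev/hotplug; it only shows how an event could be turned into a command line.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

SCRIPT_DIR = PurePosixPath("/etc/hotplug")  # directory named in the text


@dataclass(frozen=True)
class HotplugEvent:
    """Illustrative event record; the real kernel structure may differ."""
    attached: bool   # True for an attachment, False for a detachment
    dev_class: str   # device class, e.g. "disk" (hypothetical value)
    dev_name: str    # device name, e.g. "sd1" (hypothetical value)


def command_for(event: HotplugEvent) -> list[str]:
    """Build the command line a daemon could run for one event.

    The script names "attach" and "detach" are assumptions of this sketch.
    """
    script = "attach" if event.attached else "detach"
    return [str(SCRIPT_DIR / script), event.dev_class, event.dev_name]


if __name__ == "__main__":
    print(command_for(HotplugEvent(True, "disk", "sd1")))
    print(command_for(HotplugEvent(False, "disk", "sd1")))
```

Keeping the policy in shell scripts means the daemon itself stays tiny, and each administrator decides what an attachment or detachment should trigger.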
FB: What type of future uses do you see for it?
Alexander Yurchenko: I use it to
automatically download photos from my camera. You might use it to configure
cardbus wifi cards, upload firmware to it, run
ucom(4) when [plugging] your gprs-capable cell phone to usb, and so on.
hotplug is for lazy slackers ;-)
FB: There is a new driver called iic(4)
for Inter IC (I2C) master/slave buses. I've already heard that acronym, but I'm
wondering, what is it exactly?
Alexander Yurchenko: I2C (I square C)--Inter Integrated Circuits--is a two-wire bus originally developed by Philips; [it is now the industry] standard for connecting simple devices such as sensors, EEPROMs, tuners, and so on. It's cheap, simple, and has hot-plugging ability by design. I2C is very common in the embedded world. On usual PCs, I2C is often used for connecting hardware-monitoring sensors to the CPU or for controlling video and radio receivers.
The iic(4) kernel framework was written by Steve C. Woodford and
Jason R. Thorpe for NetBSD, and I've ported it to OpenBSD. It provides a uniform
programming interface layer between I2C master controllers and various I2C
slave devices.
FB: What type of devices use it?
Alexander Yurchenko: For now we support only one master controller and one slave thermal sensor, both found on the WRAP x86 board.
FB: gpio(4) supports General Purpose
Input/Output controllers, while
gpioctl(8) can manipulate device
pins. Is this interface available on common hardware?
Alexander Yurchenko: GPIO, like I2C, also came from the embedded world. Though GPIO controllers are integrated now in many popular PC chipsets, they're not used. You could find GPIO pins on the single-board computers like Soekris or WRAP. ARM- or PPC-based system-on-chip devices also have integrated GPIO controllers.
The gpio(4) framework allows you to manipulate pins either from
userland or from kernel. For example, you can connect a red LED to a GPIO pin
and switch it on if something's wrong with your system.
FB: This release ships with the driver for AIC79xx-based Ultra320 SCSI adapters. Is this the first Ultra320 SCSI driver?
Marco Peereboom: No, and to my knowledge there are currently only two U320 SCSI offerings. One from Adaptec and one from LSI. All other offerings that use U320 are based on either chip.
OpenBSD first supported the LSI-based 53c1020/1030 U320 chip with the
mpt(4) driver. This chip is used on a plethora of servers from
different vendors. With the introduction of the chip, we also started supporting
LSI megaraid-based RAID cards like the Dell PERC 4/DC and SC that use the
Before we introduced the Adaptec
ahd(4) driver, we actually
supported Adaptec's U320 RAID cards. These were
aac(4) based and
therefore easy to add to the list.
The glaring thing that is missing on this list is LSI's IM/IS/IME extensions to the 53c1020/1030 chip. These are RAID 1 (Integrated Mirroring + Enhanced) and RAID 0 (Integrated Striping) that can be provided using the 53c1020/1030 chips that have the required peripheral hardware (EEPROM). I am actually working on adding support for this, but I am unsure when it'll be ready.
FB: Would you suggest buying U320 or U160 hardware for an OpenBSD 3.6 server?
Marco Peereboom: Absolutely! Nothing beats compiling a kernel on a bunch of SCSI drives. SCSI is considered higher end than PATA/SATA, the difference being that PATA/SATA are pushing technology for density and speed. As with any technology that is pushing its limits, it is inherently less reliable. The technology that proves to be most reliable is then reused in SCSI drives. The PATA/SATA technology however uses a considerably simpler bus and has nowhere near the same amount of intelligence. SCSI is more reliable and faster because "all" complexities are offloaded to the physical devices. This is important because the OS therefore does not need to waste any time dealing with the underlying protocol. Currently PATA/SATA drives are creeping up into the SCSI domain by providing SCSI-like technologies such as tagged queuing. Problem is that these are vendor specific and not defined in a spec that is widely available. Because of this, most OSes are not using these devices to their full potential.
A common misconception about SCSI versus PATA/SATA arises when people notice that the latter sometimes seems faster. What really is happening is cache trickery. Since PATA/SATA is aimed at home users, vendors turn on all the caching that they have on those drives to wring out every bit of performance. The SCSI products, on the other hand, are aimed at enterprise customers who value data integrity over speed, so they disable all cache on these drives to prevent data loss/corruption. Both types have utilities or mode pages to change this behavior for specific needs; choose wisely.
U320 devices are over the hump; as with all new technologies it takes a while to remove the final kinks, and U320 gear is nowadays as stable as the slower U160 counterparts.
I personally only use SCSI on production boxes, the only exception being firewalls that basically have no I/O load at all.
FB: Which adapters in particular?
Marco Peereboom: I am a sucker for the LSI U320 gear. The reason for this is that mpt (message-passing technology) basically is a generic transport mechanism that can be used for a wide variety of devices. With minimal changes an mpt-based driver can be adapted to add support for GigE, iSCSI, FC, and SCSI; our driver currently supports FC and SCSI. The mpt spec can and is being scaled beyond these devices. Besides this generic interface for a bunch of gear, mpt also offloads all the protocol specifics in hardware. This is really cool! The driver sends off an I/O, and a while later it gets an interrupt telling it if the I/O completed or failed, and based on that it will send it back to the SCSI midlayer. In other words, the driver has a very small footprint and is extremely simple and fast.
All this said, I want to thank LSI for all the hardware and documentation they donated to OpenBSD. I got a lot of good suggestions from their engineers on particulars of the chip.
I also want to thank Adaptec, who donated quite a few boards to OpenBSD. They did, however, not provide us with documentation and pointed us to the FreeBSD driver as their documentation. In all fairness, Justin Gibbs from Adaptec and FreeBSD did answer all questions promptly.
And I want to make sure that credit is given where due. I codeveloped these goodies with two brilliant guys, Kenneth R. Westerback and Milos Urbanek.
FB: This release supports a new platform. Since I don't own any LUNA-88k box, I'm wondering what type of things you can do with OpenBSD 3.6 on it (workstation, firewall, server, ...).
Kenji Aoyama: Well, at this moment, I am using my box just for "fun"; i.e., porting OpenBSD itself. But now that OpenBSD/LUNA-88k has become relatively stable, I would like to use it as a server in my home network.
FB: I've read that there are still some unsupported devices and features like SMP. Are you looking for any particular help from the community (hardware, money, resources, ...)?
Kenji Aoyama: It would be nice if I got an original working LUNA-88K box, not the LUNA-88K2 that I already own. They differ somewhat in their peripheral devices, so having one would make for better LUNA-88K support. In addition, if I had two boxes, I could develop on one box and build a snapshot on the other at the same time. It takes three days to build a snapshot on LUNA-88K.
FB: OpenBSD is probably the BSD project with the best support for Apple's hardware. What type of contributions is Apple sharing with you?
Dale Rahn: Basically, none. I talked with Apple's open source representative at USENIX 2001 and USENIX 2002 but was unable to obtain anything useful. After USENIX 2002, the APSL was modified to make the code somewhat more free; however, OpenBSD developers still avoided using it as a source of information to avoid APSL contamination. It had not been a free license. Recently this has improved with APSL 2.0; it is now more GPL-like, which is slightly more but not completely free. Note that OpenBSD has removed GPL code where it was possible; there is no GPL code in the kernel, for instance.
FB: Have you obtained all the docs and specs you needed?
Dale Rahn: I have obtained no documentation to assist with the effort. Most of the development is based on NetBSD or Linux code. Most of the improvements have been the result of dedicated developers trying to make their macppc laptops run nicely. While greatly outnumbered by i386 laptops, a significant number of developers were using macppc laptops at the last hackathon.
FB: Is Darwin/OpenDarwin a good resource for code sharing?
Dale Rahn: Because the APSL has not been a free license, this resource is normally avoided. Sometimes it has been used as a reference when the Linux code was not clear.
FB: Has Apple shown any worry or interest about the OpenBSD port to the PowerPC platform?
Dale Rahn: I have never been contacted by Apple. The fact that Apple ships OpenSSH with MacOS shows that they know that OpenBSD exists, but they appear to not have acknowledged that OpenBSD/macppc exists.
FB: ethereal was removed from the ports tree because "the ethereal team does not care about security, as new protocols get added, and nothing gets done about the many more holes that exist." I hope that this is not the beginning of a hunting season to remove software because it's [insecure. That] will end with a system that's secure because [it] can't do anything. I'm wrong, right?
Peter Valchev: You are in part correct.
People often forget the main purpose of the ports tree is to provide packages, especially on the CDs when a release is done, for convenience. When a piece of networking software running with root privileges continuously gets holed, and the developers do not address the root of the problem (the big hunk of code running as root), it means, other facts aside, that we ship a holed version in our releases. Then many people, not knowing better, will just add the package in question and get in trouble. Namely, that was the case in 3.5. This kind of software does not belong in the ports tree, mainly for that reason ... especially when alternatives exist. And maybe someone who cares about this particular piece of software and relies on parts of it can use this as motivation and address the root problem. You are not wrong that OpenBSD will discourage the use of insecure software in the future, in the ports tree or not. It's why rlogin was removed from the source tree, for example. I know of a big institution that recommends rlogin over ssh to this day. I don't think that is the world OpenBSD enthusiasts want to live in.
FB: How does the port team interact with ported software developers? Do you submit any patch for OpenBSD portability or to improve the security of their projects?
Peter Valchev: The main job of a port maintainer is to interact with the upstream author/maintainer of the software. The goal is to submit all patches and solve portability or security/other problems, and usually the authors are very helpful and consensus is reached, with these changes making it into the new releases. Of course there are exceptions, as well as abandoned or "dead" projects, so the ports tree will always be full of patches maintained by us.
FB: On the
misc@ mailing list there were
some discussions about Apache and the version in the OpenBSD source tree. It
seems that you chose to fork from version 1.3.29 and then introduced a lot of
fixes, some of them security related. This sounds good to me, but do you think
that keeping the name "Apache" is judicious?
Henning Brauer: No, this is completely backward. We did NOT fork from 1.3.29 and then did a lot of security fixes. We always had a number of security fixes, mostly in mod_ssl. Some releases ago I wrote the chroot extension. We did more fixes over time, as we found more problems. Of course, we always notified people in the Apache team, but they apparently decided to not care, or in some cases to ignore problems because they could not solve them on some obscure platforms that don't have [reasonable sources of] randomness. So yeah, good idea, let everyone suffer because one or two platforms can't get it right. If there was proof needed that it is impossible to write reasonable code for both Unix and completely non-Unixy platforms, they made it.
So by now the diff between our in-tree Apache and their 1.3.29 is well beyond 4,000 lines.
After the 1.3.29 they decided to muck with their license, introducing stupid
patent terms without understanding what they turned their license (that used to
be a BSD-derived one) into with that, so we cannot import new versions unless
they fix their license. It is not a big loss tho'. The Apache people have
mostly given up on 1.3 anyway, and all that happened over the last years was
bug fixes, documentation work (actually, mainly translation), and some stupid code
shuffling, that only made diffs bigger without improving anything. Now that it
is certain that we don't have to worry about syncing to them any more, we can
start making the mess of code readable tho'. That is, un
replacing all those
ap_something function calls by their native
such. I am pretty certain that there will be bugs discovered or "accidentally"
fixed in that process. It is a lot of work and I cannot do it all by myself,
but I hope to get help.
FB: If I'm not wrong, this release doesn't add any new features to
spamd. Isn't there any new idea to fight spam that you
would like to add to it?
Bob Beck: Nope, no new
spamd features in 3.6.
The work I've done post-3.5 has been in other areas. I had a few grand plans,
which were mostly put on hold by a big flood we had in my hometown (Edmonton,
Alberta). It backed up raw sewage into my basement, forcing me to tear down my
entire computer room and store all my machines in piles upstairs. (This was my
computer room on July 16.) Since when this happens there are no contractors
available to do any work for you, I've had to do it all myself/bug friends
to help, and rack my credit cards up to pay for it. I'm still waiting for my
insurance company to give me a dime so I can pay the bills and replace some of
the furniture I lost. (Let's just say I have a special place in my heart for
Meloche Monex Insurance, right next to the place for all the spammers.) I'm just
now getting back to where I'm not spending every free minute doing forced
renovations, and I can have fun again.
I'm currently most interested in making it possible to distribute the spamd database in a couple of ways for post-3.6. This is important so people running multiple MXes can still make use of greylisting properly.
I'd like to look at adding features to
spamd that make it
possible to more effectively share information between or from trusted sites,
particularly the whitelist information, which is actually the powerful stuff. I
think there's also room in
spamd to allow for spamtraplike
functionality from the greylist, and I will be looking at getting a clean
implementation of that in.
There are many other spam-fighting possibilities that do *not* belong in
spamd; it's important to keep that in mind. If it's better done in
the MTA, then that's the place to do it. I do like keeping spamd
small, simple, low impact, and secure.
FB: What is your opinion about caller ID technology?
Bob Beck: Short answer: it sucks dead moose through bent straws.
Long answer: for spam fighting? I don't think it will work well. SPF and caller ID don't stop spam--all they do is make sure that the envelope-from of a message comes from a place that the DNS for the supposedly sending domain says it should, or could, come from. That doesn't mean it's not a spam message. It's a nice thing for AOL or MSN to be able to provide people with a mechanism to identify their outgoing SMTP addresses. (Of course there are other ways they could actually do that, like publishing a whitelist?) My other big problem with these mechanisms is they break mail forwarding. Yes, I know there's another proposal to get around that problem too--but that involves more changes to MTAs, and I would hope to see some benefit for that in exchange for such a change. Truth be told, I haven't, and I've looked.
As you can imagine, I've received a bunch of suggestions to "make
spamd do SPF! wouldn't it be cool!" I agree it might be cool, but
cool alone doesn't cut it; it needs to work to justify its existence, code
complexity, and maintenance with some actual benefit. So I did a little
experiment around the middle of September. I have a spamd
box in front of a large domain with about a hundred thousand users or so. It
runs greylisting only, no blacklists. This machine consistently sits at about
40,000 greylist entries in
spamdb, with a 4-hour expiry time. So
figure on about 10,000 connections an hour in the greylist, the vast majority
of which are spambot-generated spam that never retries. I wrote a small Perl
script using the reference SPF implementation (Mail::SPF) to walk
through my greylist. I was interested in two things:
The result was very interesting. Obviously if there was lots of case 1, it would be a compelling argument to add functionality to spamd to do this. Out of my approximately 40,000 entries in my greylist, there were about 25 connections that showed this. So really, for all that cost of DNS lookups and code complexity, I would have blocked 25 spams. Big deal :)
No. 2 was even more interesting. There were about 1,500 greylist entries
where the envelope from indicated a valid or at least possibly permitted
sending IP in SPF. But then it gets fun. Take a look at the sender and
recipient, and turn on
spamd debugging, and take a look at the
content. Almost all of these were spam.
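The figures quoted above can be checked with a few lines of arithmetic. This Python sketch only restates the numbers given in the answer (greylist size, expiry time, and the two counts); the function names are made up for illustration.

```python
GREYLIST_ENTRIES = 40_000  # steady-state greylist size quoted in the text
EXPIRY_HOURS = 4           # greylist expiry time quoted in the text
CASE_1 = 25                # connections case 1 would have blocked
CASE_2 = 1_500             # entries whose sender was permitted by SPF


def per_hour(entries: int, expiry_hours: int) -> float:
    """Average new greylist entries per hour at steady state."""
    return entries / expiry_hours


def share(count: int, total: int) -> float:
    """Percentage of the greylist that a given count represents."""
    return 100.0 * count / total


if __name__ == "__main__":
    print(per_hour(GREYLIST_ENTRIES, EXPIRY_HOURS))
    print(share(CASE_1, GREYLIST_ENTRIES))
    print(share(CASE_2, GREYLIST_ENTRIES))
```

In other words, case 1 covers well under a tenth of a percent of the greylist, and the SPF-permitted senders of No. 2 make up under four percent of it.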
What's my conclusion? SPF and caller ID do two things, which I would do if I were writing spam software:
If I were a spammer, I would publish SPF records for my throwaway domains to allow the places I'm spamming from. There's a nice site about SPF that tells me how to do it :) The biggest SPF adopters I see on my site (from No. 2 above) are spammers.
(And don't forget, this is what AOL and MSN *really* care about.) If I were a spammer and were completely untalented and didn't have a throwaway domain, if I were sending a message with a random valid domain as the source, I would first do a quick DNS lookup to see if the domain published SPF records. If it does, pick a different random domain. Not that hard, and not that expensive. The result is I don't use AOL or MSN addresses. While this is good for AOL or MSN, it doesn't really do much for the average mail site maintainer. True, you could say, "I will only accept mail from sites with SPF records," but right now that means "I will only accept mail that is 90 percent spam," because while lots of people have SPF records, other than MSN and AOL and their ilk, most of the mail flying around out there from SPFed domains, at least that hits my server, is spam!
I sometimes wish my sense of ethics was surgically removed at birth so I could write spam software. I'd make more money to pay someone else to shovel the s### out of my basement and fight with the insurance people ;)
FB: And what about its proposed license?
Bob Beck: When you consider the Microsoft patent issues, not only does it not work well for stopping spam, but it's also probably going to end up being free-implementation hostile like VRRP anyway. I don't want to have to get drunk and make up another IETF-bashing song about SPF/caller ID/MARID/blah blah ...
Wait, what's that noise I hear? It's a band of zealots screaming, "It's not for stopping spam, it's for stopping forgery!" after reading this. Right--but people are touting it to stop spam and I don't think it will. It also can't stop forgery, only make it a bit more difficult. When I don't want my mail to be forged, I do it the way God intended man to have secure conversations: strong cryptography--not the DNS. (I'm crossing myself saying, "PGP, IPSec, SSH.") :)
FB: NAT-Traversal support has been added to
isakmpd. What is it?
Håkan Olsson: NAT-Traversal addresses the problem that IPsec has with NAT.
The normal operation of NAT is to multiplex (modify) the TCP or UDP port number to distinguish between different hosts/ports on the inside network--typically to "hide" a number of hosts behind a single, externally visible IP address.
IPsec does not work well with NAT for a number of reasons; perhaps most obvious is that the IPsec protocols do not have port numbers (being neither TCP nor UDP).
The solution--i.e., "NAT-Traversal"--is to encapsulate the IPsec packets inside normal UDP packets, normally on port 4500. This traffic, being UDP, works well with NAT.
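The encapsulation idea can be sketched in a few lines of Python. This is an illustration of wrapping an opaque ESP packet in a UDP header, not isakmpd's or the kernel's code, and it does not follow any particular NAT-T draft. The function names and the dummy ESP bytes are hypothetical, and the UDP checksum is left at zero, which UDP over IPv4 permits, to keep the example short.

```python
import struct

NAT_T_PORT = 4500  # UDP port mentioned in the text


def udp_encapsulate(esp_packet: bytes, src_port: int = NAT_T_PORT,
                    dst_port: int = NAT_T_PORT) -> bytes:
    """Prefix an ESP packet with an 8-byte UDP header (checksum left at 0)."""
    length = 8 + len(esp_packet)  # UDP length covers header plus payload
    if length > 0xFFFF:
        raise ValueError("payload too large for a single UDP datagram")
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    return header + esp_packet


def udp_decapsulate(datagram: bytes) -> tuple[int, int, bytes]:
    """Return (src_port, dst_port, payload) from a UDP datagram."""
    src_port, dst_port, length, _checksum = struct.unpack(
        "!HHHH", datagram[:8])
    return src_port, dst_port, datagram[8:length]


if __name__ == "__main__":
    # Dummy ESP packet: SPI, sequence number, then opaque ciphertext.
    esp = struct.pack("!II", 0x1000, 1) + b"ciphertext"
    datagram = udp_encapsulate(esp)
    # A NAT device may rewrite the source port without touching the payload.
    rewritten = struct.pack("!H", 40001) + datagram[2:]
    print(udp_decapsulate(rewritten))
```

Because the NAT device now has a UDP source port to rewrite, it can tell several inside hosts apart, while the ESP payload passes through unchanged.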
There is thus far no finished IETF standard describing how NAT-T should be done. The OpenBSD IKE daemon (isakmpd) currently implements roughly the -02 and -03 NAT-T drafts from the IETF IPSEC working group. This seems to match what most other vendors do.
FB: Does it interact with PF?
Håkan Olsson: Yes, and it is fairly straightforward. IPsec packets using NAT-T are to be matched on UDP port 4500 instead of (or in addition to) IP protocol 50 (ESP).
FB: Sometimes I read on the PF mailing list that someone has hit the limit on the number of queues with his ruleset. Is there any plan to increase the maximum number of queues that PF can handle?
Henning Brauer: Completely pointless. The number of queues is not limited because we like to impose limits ... the time resolution of the kernel is limited; we cannot reasonably break down the network traffic to an arbitrary number of queues, they would simply not work as intended. That said, there are at least some ideas for the PRIQ scheduler to allow people to specify more queues than now. As it is only about priorities and not bandwidth, the time resolution is not such a big factor there.
FB: I'm very interested in the ruleset
optimizer feature. How does it work?
Mike Frantzen: The basic premise behind the ruleset optimizer is that it doesn't matter which rule passes or blocks a packet as long as it gets passed or blocked like intended. To that end the optimizer will split the ruleset into superblocks of adjacent rules that it can safely reorder. For example:
pass in proto tcp to $BOB port ssh keep state
pass in proto tcp to $JIM port ssh keep state
pass in proto tcp to $BOB port smtp keep state
pass in proto tcp to $JIM port www keep state
pass in proto tcp to $BOB port ident keep state
Those five rules can safely be reordered without changing the meaning of the ruleset. So what ... The kernel implements skip steps such that if a packet does not match part of a rule (for instance, the port), then the kernel will skip over the next rules that it knows cannot match. So in the above example, we would reorder the rules to:
pass in proto tcp to $BOB port ssh keep state
pass in proto tcp to $BOB port smtp keep state
pass in proto tcp to $BOB port ident keep state
pass in proto tcp to $JIM port ssh keep state
pass in proto tcp to $JIM port www keep state
When the kernel is evaluating a packet against the first rule and the packet is not destined to $BOB, then the kernel will skip over all of the $BOB rules. In the unordered case, the kernel would have evaluated five rules. In the optimized case, the kernel will only evaluate two to four rules.
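The effect of skip steps on the two orderings can be reproduced with a small C model. This is a simplified sketch, not the kernel's code: rules carry only a destination and a port, the `rule` structure and function names are hypothetical, and the skip targets are computed on the fly, whereas a real implementation would presumably precompute them when the ruleset is loaded.

```c
#include <stddef.h>

/* Hypothetical, simplified rule: destination host and port only. */
struct rule {
    int dst;   /* destination host id */
    int port;  /* destination port */
};

enum { BOB = 1, JIM = 2 };
enum { SSH = 22, SMTP = 25, WWW = 80, IDENT = 113 };

/* The rules in the order they were written. */
const struct rule unordered_rules[5] = {
    { BOB, SSH }, { JIM, SSH }, { BOB, SMTP }, { JIM, WWW }, { BOB, IDENT },
};

/* The same rules grouped by destination, as in the optimized example. */
const struct rule optimized_rules[5] = {
    { BOB, SSH }, { BOB, SMTP }, { BOB, IDENT }, { JIM, SSH }, { JIM, WWW },
};

/* Index of the first rule after i whose destination differs from rule i's. */
static size_t skip_dst(const struct rule *r, size_t n, size_t i)
{
    size_t j = i + 1;
    while (j < n && r[j].dst == r[i].dst)
        j++;
    return j;
}

/* Index of the first rule after i whose port differs from rule i's. */
static size_t skip_port(const struct rule *r, size_t n, size_t i)
{
    size_t j = i + 1;
    while (j < n && r[j].port == r[i].port)
        j++;
    return j;
}

/* Walk the ruleset for one packet and count how many rules are examined. */
size_t rules_evaluated(const struct rule *r, size_t n, int dst, int port)
{
    size_t i = 0, count = 0;

    while (i < n) {
        count++;
        if (r[i].dst != dst)
            i = skip_dst(r, n, i);   /* jump past rules with the same dst */
        else if (r[i].port != port)
            i = skip_port(r, n, i);  /* jump past rules with the same port */
        else
            i++;                     /* matched; non-"quick" rules keep going */
    }
    return count;
}
```

In this model, a packet to $JIM on port www examines all five rules in the original order but only three in the reordered one, because the single destination mismatch on the first rule jumps over the whole $BOB group.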
The ruleset optimizer can also remove duplicate rules, rules that are a subset of another rule, and it can automatically combine rules with different IP addresses into a single rule with a table lookup. The optimizer can even look at the statistics of the currently running ruleset and use that as feedback to direct the optimization of "quick" rules.
Most people get a 10 to 30 percent decrease in effective ruleset size. Some script-generated rulesets get cut by 300 percent.
Here I should probably plug my employer, NFR Security. I wrote the ruleset optimizer under the auspices of NFR's intrusion prevention system that incorporates PF.
FB: NMBCLUSTERS is gone. How does the kernel manage networking memory now?
Henning Brauer: Well, NMBCLUSTERS was the (fixed) size of the mbuf cluster pool. Now, there is an initial size for it, and whenever it is used up beyond a certain point, the pool is grown. The growing happens outside interrupt context, of course.
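The split between a fast allocation path and deferred growth can be sketched as follows. Everything here is hypothetical and for illustration only: the `cluster_pool` structure, the function names, and the 75 percent threshold are assumptions, not the OpenBSD pool code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool bookkeeping; names and threshold are illustrative. */
struct cluster_pool {
    size_t capacity;     /* clusters currently backing the pool */
    size_t in_use;       /* clusters handed out */
    bool   grow_wanted;  /* set on the fast path, serviced later */
};

/* Fast path, e.g. reachable from an interrupt handler: never grows the pool. */
bool pool_get(struct cluster_pool *p)
{
    if (p->in_use >= p->capacity)
        return false;                     /* exhausted for now */
    p->in_use++;
    if (p->in_use * 4 > p->capacity * 3)  /* more than 75% in use */
        p->grow_wanted = true;            /* only request growth */
    return true;
}

/* Return one cluster to the pool. */
void pool_put(struct cluster_pool *p)
{
    if (p->in_use > 0)
        p->in_use--;
}

/* Slow path, run later from ordinary (non-interrupt) context. */
void pool_service(struct cluster_pool *p, size_t step)
{
    if (p->grow_wanted) {
        p->capacity += step;
        p->grow_wanted = false;
    }
}
```

The point of the design is that the fast path only records that growth is needed; the actual enlargement, which may have to allocate memory, is left to code that is allowed to take its time.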
FB: It seems that OpenBSD is moving toward the Cisco replacement market. Why have you added support for T1/E1 hardware?
Henning Brauer: We are not moving toward any market. We solve problems, often our own ones.
There is increasing demand to terminate T1/E1/T3/E3 lines on OpenBSD machines. So we contacted some vendors; Cyclades provided two cards (support for those is still to be written), and Sangoma provided [many] cards and sent us one of their engineers, Alex, to the hackathon. He and a few of us worked together to get the driver into a shape where it could be committed to OpenBSD, and in it went.
FB: Is there any plan for DSL hardware support too?
Henning Brauer: I do not see the point; the DSL-Ethernet bridges are the devices to use IMHO.
FB: I found this post, which includes a link to a patch for in-kernel PPPoE. Is there any chance of seeing a kernel-side PPPoE implementation in the tree?
Todd C. Miller: I don't use PPPoE myself, but this has been a topic of discussion among the OpenBSD developers. The consensus was to do this in pieces by updating the lmc(4) driver first and then adding bits specific to PPPoE.
FB: What is the generic IEEE 802.11 framework introduced in this release?
Todd C. Miller: OpenBSD 3.6 ships with an IEEE 802.11 networking stack originally developed by Atsushi Onoe (NetBSD) and heavily modified by Sam Leffler (FreeBSD). Previously, most popular 802.11 chipsets (prism, lucent, aeronet) did most of the actual 802.11 protocol themselves. More recently, the trend has been toward chipsets that require almost everything be done in software (presumably for cost reasons).
In a way it is similar to the prevalence of software modems these days. With the 802.11 framework we can support the older cards that are "smart" as well as the newer "dumb" ones. Note that "smart" doesn't necessarily mean better. When the stack is done in software, we have more control over things, allowing for host-based access points (HostAP) and alternate encryption schemes.
Currently only the ADMtek ADM8211 driver uses the 802.11 framework, but in time I expect the other wireless drivers to be converted as well.
FB: This release adds support for ADMtek ADM8211 802.11b wireless adapters. What is the plan for other chips and technologies like 802.11g?
Todd C. Miller: The relative dearth of supported 802.11g chipsets is really due to a lack of documentation and/or firmware. For instance, there is a driver for the Intel 802.11g (Centrino) chipset but it requires a firmware image that is only available via click-through license, which prevents us from shipping the firmware. Likewise, there is a Linux driver for the Prism 802.11g chipset, but it also requires a firmware image for which there is no legal source. The 802.11b cards we support have firmware stored in flash memory, but newer designs omit the flash to save money. Without the firmware there is little point in having drivers for the hardware. While there are people putting pressure on vendors to release their firmware under an acceptable license, so far they haven't had much luck.
FB: Most security-conscious people use IPsec to protect wireless networks. Do you think that the new WPA (Wi-Fi Protected Access) could make it unnecessary?
Todd C. Miller: I have not looked at WPA in enough detail to have a very informed opinion on this. It is certainly an improvement over WEP (but [that's not] saying much). I do know that Sam Leffler has changes to the 802.11 framework to support WPA, though I don't know if he has integrated those into FreeBSD yet.
FB: Mike Frantzen has introduced a new feature called StackGhost on the Sparc platform. How does it work?
Mike Frantzen: Sparc is a weird, fun computer architecture. The return address is saved on the stack when a programmer calls a function on most architectures. And thus a buffer overflow will overwrite the return pointer in order to execute an exploit. But Sparc is weird. To greatly simplify the explanation, Sparc puts the return address in a list of registers. Then the kernel is the one that actually writes userland's return addresses to the stack once the function calls go deep enough to exhaust all of the registers.
So StackGhost will go ahead and XOR a random 32-bit number into the return address before it is written to the stack, and it will remove the random number when it retrieves it from the stack. When an attacker overwrites the return address with a pointer into his exploit payload, his pointer will be off by that random 32-bit number and the program will crash instead of running his exploit. The performance impact is about 1 percent, and it requires no changes to userland, not even a recompile.
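The XOR step itself is tiny, and a C sketch makes the attacker's problem concrete. The `sg_state` structure and the function names are hypothetical; the interview does not say where the random number is kept, so storing it in a small state structure is an assumption made purely for illustration.

```c
#include <stdint.h>

/* Hypothetical holder for the random 32-bit number described above. */
struct sg_state {
    uint32_t cookie;
};

/* Applied when a return address is spilled from registers to the stack. */
uint32_t sg_encode(const struct sg_state *s, uint32_t ret_addr)
{
    return ret_addr ^ s->cookie;
}

/* Applied when the saved value is loaded back from the stack. */
uint32_t sg_decode(const struct sg_state *s, uint32_t saved)
{
    return saved ^ s->cookie;
}
```

A legitimate return address survives the round trip because XORing twice with the same value cancels out. A value the attacker writes directly to the stack is only decoded, never encoded, so it comes back XORed with a number the attacker does not know and almost certainly points somewhere useless.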
FB: Is there any chance to port it to other platforms?
Mike Frantzen: StackGhost takes advantage of a nuance of the Sparc architecture. Sparc64 would obviously be fairly easy. Similar nuances exist in Itanium but would be more difficult to take advantage of. And I'm told there might be games we could play on HPPA or M88k.
FB: Is there any other OS that uses it?
Mike Frantzen: Not that I know of. StackGhost itself is pretty easy to implement. The really hard part is making GDB usable under StackGhost, which was done by Mark Kettenis.
FB: Why have you developed another function to convert strings to numbers?
Todd C. Miller: We developed strtonum(3) for the same reason we added strlcat(3)--the existing mechanisms were difficult or impossible to use safely. The most common function used to convert a string to a number, atoi(3), provides no way to check for error conditions such as overflow, underflow, or invalid input. While strtol(3) and its variants do provide ways to detect this, they lack the ease of use of atoi(3) and so tend not to be used. With strtonum(3), a programmer can easily replace existing calls to atoi(3) and get bounds checking and error detection for free. As an added benefit, using strtonum(3) makes people think about [which] upper and lower bounds actually make sense.
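To show what "bounds checking and error detection for free" looks like to a caller, here is a portable C sketch in the spirit of strtonum(3). It is not OpenBSD's implementation: `bounded_strtonum` is a hypothetical helper built on the standard `strtoll`, it takes the same arguments as the documented `strtonum` interface, and unlike the real function it requires a non-NULL `errstr`.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Hypothetical helper: convert s to a number in [minval, maxval].
 * On success, returns the value and sets *errstr to NULL.
 * On failure, returns 0 and points *errstr at a short reason.
 */
long long bounded_strtonum(const char *s, long long minval,
    long long maxval, const char **errstr)
{
    char *end;
    long long v;

    *errstr = NULL;
    if (minval > maxval) {
        *errstr = "invalid";
        return 0;
    }

    errno = 0;
    v = strtoll(s, &end, 10);

    if (end == s || *end != '\0') {
        /* No digits at all, or trailing junk after the number. */
        *errstr = "invalid";
        return 0;
    }
    if ((errno == ERANGE && v == LLONG_MIN) || v < minval) {
        *errstr = "too small";
        return 0;
    }
    if ((errno == ERANGE && v == LLONG_MAX) || v > maxval) {
        *errstr = "too large";
        return 0;
    }
    return v;
}
```

A caller parsing a port number would write something like `port = bounded_strtonum(arg, 1, 65535, &errstr);` and then check `errstr`. Compare that with `atoi(arg)`, which silently returns 0 for "abc" and gives no indication of failure for a string like "80x" or a value far out of range.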
FB: I was wondering if you ever thought to replace the old plain text logs format with something like Bruce Schneier's Cryptographic Support for Secure Logs. Tampering with logs is pretty easy, and this has gone unaddressed for years.
Damien Miller: It doesn't make sense to change the log format to something other than text. The best thing about syslog is that you can wield the entire range of Unix text-processing tools over syslog output, and even do it in real time (using tail -f). I'm also quite concerned about adding more complexity to syslogd--it is one of the really critical parts of the system, and it must be robust in the face of all sorts of untrusted input. That being said, there may be a way to add some sort of simple integrity stamps (perhaps based on Schneier's design) to the log in a textual format. It could be a good project for someone.
I think that a more practical problem to solve is how to export logs off a system in a secure fashion. If the log data is no longer on a system, then it cannot be retrospectively compromised. Also, log information is more useful when it is centralized and available for event correlation. It might be cool to add some simple way for syslogd to automatically set up IPsec SAs when exporting logs to an external host, probably using the code from bgpd. But I don't have any immediate plans to work on this.
Federico Biancuzzi is a freelance interviewer. His interviews have appeared in publications such as ONLamp.com, LinuxDevCenter.com, SecurityFocus.com, NewsForge.com, Linux.com, TheRegister.co.uk, ArsTechnica.com, the Polish print magazine BSD Magazine, and the Italian print magazine Linux&C.
Copyright © 2009 O'Reilly Media, Inc.