Hardware Accelerated SSL on SheevaPlug (Marvell Kirkwood ARM) Using OpenSSL on Fedora

I have recently been spending quite a lot of time working on Linux on various ARM devices. It is quite amazing what ARM hardware is capable of nowadays. One of the most popular ARM-based machines available is the SheevaPlug. Its performance is pretty good for a small server – in my experience, the 1.2GHz Marvell Kirkwood 88F6281 compares quite favourably to the likes of the 1.66GHz Intel Atom N450 in terms of server performance and especially in terms of power usage. Atom N450 systems have a typical power draw of about 22W idle and 28W under load – a far cry from the supposed 7.6W total of 5.5W N450 + 2.1W NM10. The SheevaPlug, on the other hand, draws 2.3W idle and 7W under load.

In some areas, however, the Atom does hold a performance advantage, especially in usage that requires heavy number crunching – unlike the Marvell Kirkwood, the Atom N450 has an FPU and SIMD capability via the SSE/SSE2/SSSE3 instruction sets. One set of applications that gets better performance on the Atom N450 is the ones doing encryption, for example OpenSSL. Or do they…

Not quite. The Kirkwood ARM has an ace up its sleeve, and as it turns out, it is one powerful enough to allow it to close the gap against a processor with 4x the power budget. It has a hardware crypto engine that supports MD5, SHA1 and AES-128 acceleration.

Unfortunately, mainstream Linux distributions don’t come with the hardware crypto acceleration enabled, and most of the documentation available is sufficiently out of date to be inapplicable to the current generation of distributions. All of it points at OCF Linux, which hasn’t been updated for kernels past 2.6.33 and OpenSSL 0.9.8n, both of which are deprecated. I have modified the kernel patches to make them work on 2.6.35, but unfortunately the cryptodev driver uses the locked ioctl operation, which was removed from the kernel starting with 2.6.36, so further modifications are required to make it work on later kernels. OCF Linux also doesn’t appear to have been updated since late 2010. But things are not as bad as they initially seem – it turns out that there is an alternative.

Kernel patches are required because acceleration depends on the BSD-style cryptodev kernel interface. There is an alternative, more up-to-date project that provides this interface much less intrusively: Cryptodev-linux. It provides a standalone driver that doesn’t require the entire kernel to be recompiled, and it works with the 2.6.36+ kernels.

That just leaves OpenSSL support. Well, it turns out that OpenSSL 1.0.0 already comes with support for cryptodev hardware offload; it just isn’t enabled by default. It has to be enabled during the configure stage by providing -DHAVE_CRYPTODEV (for encryption offload) and -DUSE_CRYPTODEV_DIGESTS (for hashing offload). If you are building against Cryptodev-linux you will also have to provide the -DHASH_MAX_LEN=64 parameter – this is normally defined in OCF’s cryptodev.h header file, but isn’t present in the header files that Cryptodev-linux provides. Not a big deal, but something to bear in mind when you are building your own OpenSSL with cryptodev engine support.
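Once such a build is installed, a quick sanity check along the following lines can confirm that the pieces are in place. This is only a minimal sketch: it assumes the driver exposes a /dev/crypto device node and that a cryptodev-enabled openssl binary reports an engine with the id "cryptodev" in the output of "openssl engine". The helper name lists_cryptodev is made up for this example.

```shell
# Hypothetical helper: succeeds if the engine listing read from stdin
# contains an entry whose id is "cryptodev".
lists_cryptodev() {
    grep -q '^(cryptodev)'
}

# Assumption: the kernel driver exposes its interface as the /dev/crypto
# character device.
if [ -c /dev/crypto ]; then
    echo "/dev/crypto is present"
else
    echo "/dev/crypto is missing (is the cryptodev module loaded?)"
fi

# Ask the installed openssl binary which engines it was built with.
if command -v openssl >/dev/null 2>&1 && openssl engine 2>/dev/null | lists_cryptodev; then
    echo "OpenSSL lists the cryptodev engine"
else
    echo "OpenSSL does not list the cryptodev engine"
fi
```

If the second check fails on a machine where /dev/crypto exists, the most likely cause is that the installed OpenSSL was built without -DHAVE_CRYPTODEV.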

So, how big a difference does the Kirkwood’s acceleration make? Quite a substantial one. Here is what the openssl speed test produces:

Kirkwood without cryptodev:
# openssl speed -evp aes-128-cbc
Doing aes-128 cbc for 3s on 16 size blocks: 1870065 aes-128 cbc’s in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 516074 aes-128 cbc’s in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 132474 aes-128 cbc’s in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 33342 aes-128 cbc’s in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 4171 aes-128 cbc’s in 3.00s

Kirkwood with cryptodev:
# openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 85277 aes-128-cbc’s in 0.08s
Doing aes-128-cbc for 3s on 64 size blocks: 82960 aes-128-cbc’s in 0.08s
Doing aes-128-cbc for 3s on 256 size blocks: 59806 aes-128-cbc’s in 0.03s
Doing aes-128-cbc for 3s on 1024 size blocks: 40939 aes-128-cbc’s in 0.01s
Doing aes-128-cbc for 3s on 8192 size blocks: 8227 aes-128-cbc’s in 0.00s

The results show, predictably, that with very small (unrealistically small) data blocks, software-only userspace crypto is faster due to less context switching. With 1KB blocks, however, hardware crypto is 23% faster, and with 8KB blocks the hardware engine is about twice as fast as the software-only option. But what is really impressive is the reduction in CPU time: the time reported for each 3-second run drops from 3.00s to 0.08s or less. Because the hardware crypto engine is asynchronous, there is practically no CPU time required when using it, which is important since it leaves the CPU free to get on with other tasks.
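The percentages quoted above can be reproduced from the operation counts with a little integer arithmetic. The snippet below is just an illustrative sketch; the function name pct_faster is made up for this example, and the numbers are copied from the two Kirkwood runs.

```shell
# pct_faster NEW OLD: print how much faster NEW is than OLD,
# as a whole percentage rounded to the nearest integer.
pct_faster() {
    new=$1
    old=$2
    # (new - old) * 100 / old, with old/2 added first to round to nearest
    echo $(( ((new - old) * 100 + old / 2) / old ))
}

pct_faster 40939 33342   # 1KB blocks, cryptodev vs. software: prints 23
pct_faster 8227 4171     # 8KB blocks, cryptodev vs. software: prints 97
```

A 97% improvement is what "about twice as fast" refers to for the 8KB case.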

For comparison, here are the Atom N450 results:

# openssl speed -evp aes-128-cbc
Doing aes-128-cbc for 3s on 16 size blocks: 3813930 aes-128-cbc’s in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1098375 aes-128-cbc’s in 2.99s
Doing aes-128-cbc for 3s on 256 size blocks: 294884 aes-128-cbc’s in 2.99s
Doing aes-128-cbc for 3s on 1024 size blocks: 74520 aes-128-cbc’s in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 9245 aes-128-cbc’s in 2.99s

So the Atom is faster all around – on 1KB blocks it is 82% faster, which reduces to a 12% advantage using 8KB blocks. But let us not forget that we could, in theory, run two instances of OpenSSL, one with hardware offload and one without, which would give us the combined total performance of both, if that is all we needed the machine to do. This would give us figures of approximately:

1KB: 33342+40939=74281
8KB: 4171+8227=12398

This ties with the Atom using 1KB blocks, and beats it by 34% using 8KB blocks – all in a power envelope 4x smaller. Pretty impressive.
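For anyone who wants to check the sums, the comparison can be sketched as follows. The variable and function names are invented for this example, the counts are copied from the runs above, and the whole thing rests on the same optimistic assumption as the text: that the software and hardware paths can run side by side without slowing each other down.

```shell
# Operation counts per 3-second run, copied from the results above.
kw_soft_1k=33342; kw_hw_1k=40939; atom_1k=74520
kw_soft_8k=4171;  kw_hw_8k=8227;  atom_8k=9245

# combined_vs_atom SOFT HW ATOM: print the combined Kirkwood count, followed
# by that count as a rounded percentage of the Atom count.
combined_vs_atom() {
    total=$(( $1 + $2 ))
    echo "$total $(( (total * 100 + $3 / 2) / $3 ))"
}

combined_vs_atom "$kw_soft_1k" "$kw_hw_1k" "$atom_1k"   # prints: 74281 100
combined_vs_atom "$kw_soft_8k" "$kw_hw_8k" "$atom_8k"   # prints: 12398 134
```

A value of 100 means parity with the Atom, and 134 corresponds to the 34% lead with 8KB blocks.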

Installing Cryptodev-linux is trivial: it is just the usual “make; make install” procedure after extracting the tarball (make sure you have the kernel headers for your kernel installed and available in /lib/modules/$(uname -r)/build/).
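Since a missing headers directory is the most likely thing to go wrong, it is worth checking for it before running make. The following is a rough sketch; the function name kernel_build_dir is made up for this example, and the module name in the trailing comment is an assumption about what the driver build installs.

```shell
# Print the directory where the module build expects to find the headers
# for the currently running kernel.
kernel_build_dir() {
    echo "/lib/modules/$(uname -r)/build"
}

# Report whether the headers are in place before attempting the build.
if [ -d "$(kernel_build_dir)" ]; then
    echo "kernel headers found in $(kernel_build_dir)"
else
    echo "kernel headers missing: expected $(kernel_build_dir)"
fi

# After "make; make install" (run as root), the driver still has to be
# loaded. Assuming the module is installed under the name "cryptodev":
#   modprobe cryptodev
```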

I mentioned above the additional parameters required to make OpenSSL build with cryptodev support. If you are using Fedora 13’s OpenSSL source package, you can edit the relevant line in the spec file. In my version, the relevant section reads:

./Configure --prefix=/usr --openssldir=%{_sysconfdir}/pki/tls ${sslflags} zlib enable-camellia enable-seed enable-tlsext enable-rfc3779 enable-cms enable-md2 no-idea no-mdc2 no-rc5 no-ec no-ecdh no-ecdsa --with-krb5-flavor=MIT --enginesdir=%{_libdir}/openssl/engines --with-krb5-dir=/usr -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS -DHASH_MAX_LEN=64 shared threads ${sslarch} fips

In case you cannot modify/build it yourself, here are the packages:
/wp-content/uploads/2011/05/openssl-1.0.0-1.kw.fc13.src.rpm
/wp-content/uploads/2011/05/openssl-1.0.0-1.kw.fc13.armv5tel.rpm
/wp-content/uploads/2011/05/openssl-devel-1.0.0-1.kw.fc13.armv5tel.rpm