May 16, 2018

Keccak implementation in Haskell

A Haskell programmer typically relies on cryptonite for hash implementations. cryptonite is exhaustive and efficient; most of its cryptographic functions are implemented in C and invoked using Haskell’s FFI. This week, I finished an implementation of the Keccak hash (SHA3) in pure Haskell, which I needed for a project of mine compiled with ghcjs (ghcjs cannot compile Haskell that uses the C FFI).

While researching Keccak, I learned that much more than hashes could be implemented using its underlying “sponge” construction. The Keccak team writes in their paper Cryptographic Sponge Functions,

In the context of cryptography, sponge functions provide a particular way to generalize hash functions to more general functions whose output length is arbitrary. A sponge function instantiates the sponge construction, which is a simple iterated construction building a variable-length input variable-length output function based on a fixed length permutation (or transformation). With this interface, a sponge function can also be used as a stream cipher, hence covering a wide range of functionality with hash functions and stream ciphers as particular points.

The goals for this keccak library are to implement the hashes, MACs, extendible-output functions, & stream ciphers described by the Keccak team in full generality and in pure, readable Haskell. Currently, the library mainly implements the four standard hashes (224-, 256-, 384-, & 512-bit) for both SHA3 and Keccak (which differ only in padding rules). The implementations are unoptimized; for context, cryptonite’s C-based implementation of Keccak256 is about 21 times faster than my naive Haskell.

benchmarked keccak
time                 768.3 μs   (758.7 μs .. 775.7 μs)
                     0.998 R²   (0.995 R² .. 0.999 R²)
mean                 774.2 μs   (767.5 μs .. 784.0 μs)
std dev              29.27 μs   (23.12 μs .. 36.87 μs)
variance introduced by outliers: 19% (moderately inflated)

benchmarked cryptonite-keccak
time                 36.92 μs   (35.95 μs .. 38.03 μs)
                     0.996 R²   (0.995 R² .. 0.998 R²)
mean                 36.27 μs   (35.99 μs .. 36.66 μs)
std dev              1.147 μs   (918.3 ns .. 1.471 μs)
variance introduced by outliers: 14% (moderately inflated)

Eventually, I hope the library will have very few dependencies (only base, vector & bytestring, currently) and excellent performance.
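The padding difference between SHA3 and Keccak mentioned above can be sketched as follows. This is a minimal illustration, not the library's code: it assumes byte-aligned messages and a rate given in bytes, and the names padWith, keccakPad, and sha3Pad are hypothetical. Original Keccak pads with a 1 bit, zeros, and a final 1 bit (first byte 0x01, last byte 0x80), while SHA3 first appends two domain-separation bits, which turns the first padding byte into 0x06.

```haskell
import qualified Data.ByteString as BS
import Data.Bits ((.|.))
import Data.Word (Word8)

-- | Pad a byte-aligned message to a multiple of `rateBytes` bytes.
-- `firstByte` holds the domain-separation bits plus the first padding bit.
-- (Hypothetical helper for illustration, not the keccak library's API.)
padWith :: Word8 -> Int -> BS.ByteString -> BS.ByteString
padWith firstByte rateBytes msg
  -- Only one byte of room: first and last padding bits share a byte.
  | padLen == 1 = msg `BS.snoc` (firstByte .|. 0x80)
  | otherwise   = BS.concat
      [ msg
      , BS.singleton firstByte
      , BS.replicate (padLen - 2) 0x00
      , BS.singleton 0x80
      ]
  where
    -- Always between 1 and rateBytes, so padding is never empty.
    padLen = rateBytes - (BS.length msg `mod` rateBytes)

-- | Original Keccak padding.
keccakPad :: Int -> BS.ByteString -> BS.ByteString
keccakPad = padWith 0x01

-- | SHA3 padding.
sha3Pad :: Int -> BS.ByteString -> BS.ByteString
sha3Pad = padWith 0x06
```

For the 256-bit hashes the rate is 136 bytes, so keccakPad 136 and sha3Pad 136 applied to the same message produce blocks that differ only in that first padding byte.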

Example usage

In the example usage below, I encode ByteStrings in base16 so that they can be read as standard hex strings.

ghci> :set -XOverloadedStrings

ghci> import qualified Data.ByteString.Base16 as BS16

ghci> :t keccak256
keccak256 :: BS.ByteString -> BS.ByteString

ghci> BS16.encode $ keccak256 "testing"
"5f16f4c7f149ac4f9510d9cf8cf384038ad348b3bcdc01915f95de12df9d1b02"

ghci> BS16.encode $ keccak256 ""
"c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470"

Keccak Background

According to Jean-Philippe Aumasson, Keccak, the recent winner of NIST’s SHA3 competition, is named after the Balinese dance Kecak. Wikipedia offers the following description of the performance:

Also known as the Ramayana Monkey Chant, the piece, performed by a circle of at least 150 performers wearing checked cloth around their waists, percussively chanting “chak” and moving their hands and arms, depicts a battle from the Ramayana. The monkey-like Vanara led by Hanuman helped Prince Rama fight the evil King Ravana. Kecak has roots in sanghyang, a trance-inducing exorcism dance.

A helpful answer on crypto.stackexchange points out that this continues djb’s tradition of naming cryptographic functions after dances. djb chose only Latin dances for his Salsa, Rumba, and ChaCha algorithms.

The Sponge Construction

I quote the paper Cryptographic Sponge Functions again:

The sponge construction is a simple iterated construction for building a function F with variable-length input and arbitrary output length based on a fixed-length transformation or permutation f operating on a fixed number b of bits. Here b is called the width. The sponge construction operates on a state of b = r + c bits. The value r is called the bitrate and the value c the capacity.
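To make b = r + c concrete, here is a small sketch that tabulates the bitrate and capacity of the four standard hash sizes. It assumes the standard parameter choice of width b = 1600 and capacity c equal to twice the digest length; the function names are my own and only illustrative.

```haskell
-- | Width b of the permutation state, in bits.
width :: Int
width = 1600

-- | Capacity c in bits for an n-bit digest, assuming the
-- standard choice c = 2n.
capacity :: Int -> Int
capacity n = 2 * n

-- | Bitrate r in bits, from b = r + c.
bitrate :: Int -> Int
bitrate n = width - capacity n

-- | (digest length, bitrate, capacity) for the four standard sizes.
parameters :: [(Int, Int, Int)]
parameters = [ (n, bitrate n, capacity n) | n <- [224, 256, 384, 512] ]
-- [(224,1152,448),(256,1088,512),(384,832,768),(512,576,1024)]
```

The 512-bit variant therefore absorbs only half as many message bits per permutation call as the 224-bit variant.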

In the case of the standard Keccak and SHA3 hashes, the sponge state always has width b = 1600, and it is operated on by the permutation KeccakF[1600] which maps the 1600-bit states to 1600-bit states. What mainly distinguishes SHA3-224 from SHA3-256, SHA3-384, and SHA3-512 is not the output length (which can be made arbitrarily long using the sponge construction), but rather the bitrate r of the hash and its capacity c. Because b = 1600 = r + c, the capacity of the hash must decrease as the bitrate increases and vice versa; a higher bitrate promises better performance while a higher capacity provides better security guarantees. The general operation of the sponge is described by the Keccak team very succinctly:

First, all the bits of the state are initialized to zero. The input message is padded and cut into blocks of r bits. The sponge construction then proceeds in two phases: the absorbing phase followed by the squeezing phase.

In the absorbing phase, the r-bit input message blocks are XORed into the first r bits of the state, interleaved with applications of the function f. When all message blocks are processed, the sponge construction switches to the squeezing phase.

In the squeezing phase, the first r bits of the state are returned as output blocks, interleaved with applications of the function f. The number of output blocks is chosen at will by the user. The last c bits of the state are never directly affected by the input blocks and are never output during the squeezing phase.
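The two phases described above can be sketched generically in a few lines. This is a toy illustration under stated assumptions, not the keccak library's implementation: the state is a plain list of bytes, the rate r and width b are given in bytes, the input is assumed to be already padded to a non-empty multiple of r bytes, and toyPermutation is a made-up stand-in for f with no cryptographic value.

```haskell
import Data.Bits (rotateL, xor)
import Data.List (foldl')
import Data.Word (Word8)

type State = [Word8]

-- | Split a list into chunks of n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt n xs in h : chunksOf n t

-- | XOR one r-byte block into the first r bytes of the state
-- (the last c bytes are XORed with zero, i.e. untouched), then apply f.
absorbBlock :: (State -> State) -> State -> [Word8] -> State
absorbBlock f st block = f (zipWith xor st (block ++ repeat 0))

-- | Generic sponge over a b-byte state with an r-byte rate (r > 0).
-- `padded` must already be padded to a multiple of r bytes.
sponge :: Int -> Int -> (State -> State) -> [Word8] -> Int -> [Word8]
sponge r b f padded outLen = take outLen (squeeze absorbed)
  where
    -- Absorbing phase: start from the all-zero state.
    absorbed = foldl' (absorbBlock f) (replicate b 0) (chunksOf r padded)
    -- Squeezing phase: emit the first r bytes, apply f, repeat.
    squeeze st = take r st ++ squeeze (f st)

-- | Toy bijection standing in for f. NOT Keccak-f and not secure.
toyPermutation :: State -> State
toyPermutation st =
  zipWith (\i x -> rotateL x 1 `xor` i)
          (map fromIntegral [1 :: Int ..])
          (drop 1 st ++ take 1 st)
```

Because squeeze is a lazy infinite list, the caller chooses the output length freely, for example sponge 4 8 toyPermutation [1..8] 10 returns ten bytes from an 8-byte state with a 4-byte rate.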

The keccak library implements the constituent parts of the KeccakF[1600] function using very simple Haskell; hopefully, those new to Keccak can read it more easily than existing C or Python implementations to get a feel for this very simple permutation at the heart of SHA3.
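To give a flavor of how small those parts are, here is a sketch of one of the five step mappings of the permutation, χ (chi), the only nonlinear one; the others are θ, ρ, π, and ι. It is written from the Keccak specification rather than copied from the library, and it assumes a row is represented as a list of five 64-bit lanes; chiRow is a hypothetical name.

```haskell
import Data.Bits (complement, xor, (.&.))
import Data.Word (Word64)

-- | Apply chi to one row of five lanes:
-- a[x] becomes a[x] XOR ((NOT a[x+1]) AND a[x+2]), indices mod 5.
-- (Illustrative sketch; not the keccak library's source.)
chiRow :: [Word64] -> [Word64]
chiRow row =
  [ a `xor` (complement b .&. c)
  | (a, b, c) <- zip3 row (rot 1) (rot 2)
  ]
  where
    -- Rotate the row left by k positions to line up neighbouring lanes.
    rot k = drop k row ++ take k row
```

In the full permutation this would be applied to each of the five rows of the 5 × 5 lane array in every round.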

Demonstrating the poor security of low-capacity sponge-based hashes

The second-preimage attack in this example shows that, for a Keccak hash with capacity c = 32 and bitrate r = 1568, two preimages can be found which both hash to 0xdec1. The attack requires checking only 5544 inputs.

First preimage:

In the context of cryptography, sponge functions provide a particular way to generalize hash functions to more general functions whose output length is arbitrary. A sponge func- tion instantiates the sponge construction, which is a simple iterated construction building a variable-length input variable-length output function based on a fixed length permutation (or transformation). With this interface, a sponge function can also be used as a stream ci- pher, hence covering a wide range of functionality with hash functions and stream ciphers as particular points. From a theoretical point of view, sponge functions model in a very simple way the finite memory any concrete construction has access to. A random sponge function is as strong as a random oracle, except for the effects induced by the finite memory. This model can thus be used as an alternative to the random oracle model for expressing security claims. From a more practical point of view, the sponge construction and its sister construction, called the duplex construction, can be used to implement a large spectrum of the symmetric cryptography functionality. This includes hashing, reseedable pseudo random bit sequence generation, key derivation, encryption, message authentication code (MAC) computation and authenticated encryption. This provides users with a lot of functionality from a single fixed permutation, hence making the implementation easier. The designers of cryptographic primitives may also find it advantageous to develop a strong permutation without worrying about other components such as the key schedule of a block cipher.

Second preimage:

kI!the context of cryptography, sponge functions provide a particular way to generalize hash functions to more general functions whose output length is arbitrary. A sponge func- tion instantiates the sponge construction, which is a simple iterated construction building a variable-length input variable-length output function based on a fixed length permutation (or transformation). With this interface, a sponge function can also be used as a stream ci- pher, hence covering a wide range of functionality with hash functions and stream ciphers as particular points. From a theoretical point of view, sponge functions model in a very simple way the finite memory any concrete construction has access to. A random sponge function is as strong as a random oracle, except for the effects induced by the finite memory. This model can thus be used as an alternative to the random oracle model for expressing security claims. From a more practical point of view, the sponge construction and its sister construction, called the duplex construction, can be used to implement a large spectrum of the symmetric cryptography functionality. This includes hashing, reseedable pseudo random bit sequence generation, key derivation, encryption, message authentication code (MAC) computation and authenticated encryption. This provides users with a lot of functionality from a single fixed permutation, hence making the implementation easier. The designers of cryptographic primitives may also find it advantageous to develop a strong permutation without worrying about other components such as the key schedule of a block cipher.

Testing

NIST uses the Secure Hash Algorithm Validation System (SHAVS) to validate the correctness of hash implementations. For all four variants of SHA3 and Keccak, the keccak library’s implementations successfully pass the standard KATs (Known Answer Tests).
