Handling binary data in Elixir
Hello,
I like functional programming and the associated benefit/safety of immutable data structures.
I just finished reading Elixir in Action, and I found it incredibly well written. Along with that, I’ve been solving challenges on the Hackattic website using Elixir lang. It has given me the opportunity to work with binaries while decoding base64-encoded data, reading .wav and .rdb files.
In the beginning it was difficult to even read a fixed number of bytes, but I’ve slowly started to appreciate pattern matching constructs and macros of the Elixir lang that made the process much simpler.
I’d like to show you a few techniques that I think will help you.
Types
In Elixir, binaries and bitstrings are represented using the special form <<>>, documented as Kernel.SpecialForms.<<>>/1. There are a few types available, but we will focus on bitstring, binary and integer.
bitstring - as the name suggests, a sequence of bits.
binary - a sequence of bits where the number of bits is a multiple of 8, i.e., a sequence of bytes.
integer - an integer, which takes up 8 bits by default.
In addition, the above types can be paired with unit and size.
From docs:
The length of the match is equal to the unit (a number of bits) times the size (the number of repeated segments of length unit).
Pairing a {type, unit, size} combo with Elixir’s pattern matching construct makes working with binary data a breeze.
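To make the unit-times-size rule concrete, here is a minimal sketch. The three-byte input and the variable names are made up for illustration; the defaults it relies on are that integer has a unit of 1 bit and binary has a unit of 8 bits.

```elixir
# A three-byte binary used as sample input (an assumption for illustration).
data = <<1, 2, 3>>

# integer has a default unit of 1 bit, so size(16) matches 16 * 1 = 16 bits.
<<a::integer-size(16), _::binary>> = data

# The same 16 bits described as 2 segments of 8 bits: size(2) * unit(8).
<<b::integer-size(2)-unit(8), _::binary>> = data

# binary has a default unit of 8 bits, so size(2) matches 2 bytes.
<<head::binary-size(2), tail::binary>> = data

IO.inspect({a, b, head, tail})
# prints {258, 258, <<1, 2>>, <<3>>}
```

Both integer patterns read the bytes 1 and 2 as one 16-bit number (0x0102 = 258), so they are two spellings of the same match.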
See the following examples for a better understanding.
Working with bits
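To fetch the two leading bits of a number (its most significant bits when it is stored big-endian), where number is a binary or bitstring holding the value rather than an Elixir integer: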
<< first_two::bitstring-size(2), rest::bitstring >> = number
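It is important to note that rest is not guaranteed to be a binary: once two bits are taken off a whole number of bytes, the remaining bit count is no longer a multiple of 8. Hence it is safer to match it as a bitstring.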
You can go further and pattern match on the first_two bits to see what they represent:
case first_two do
  <<0::2>> ->
    IO.puts("ZERO")
  <<0::1, 1::1>> ->
    IO.puts("ONE")
  <<1::1, 0::1>> ->
    IO.puts("TWO")
  <<1::1, 1::1>> ->
    IO.puts("THREE")
end
Know that <<0::2>> is equivalent to <<0::1, 0::1>>.
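Putting the two steps together, here is a small runnable sketch. The module name TopBits and the function label/1 are hypothetical, and the clauses use the shorter two-bit integer spelling (<<2::2>> is the same bitstring as <<1::1, 0::1>>).

```elixir
defmodule TopBits do
  # Hypothetical helper: labels the two leading bits of a bitstring.
  def label(<<first_two::bitstring-size(2), _rest::bitstring>>) do
    case first_two do
      <<0::2>> -> :zero
      <<1::2>> -> :one
      <<2::2>> -> :two
      <<3::2>> -> :three
    end
  end
end

# 0b10110000 starts with the bits 1 and 0.
IO.inspect(TopBits.label(<<0b10110000>>))
# prints :two

# Matching the segment as an integer instead yields the number directly.
<<n::size(2), _::bitstring>> = <<0b10110000>>
IO.inspect(n)
# prints 2
```

If only the numeric value of the bits matters, the integer form at the end avoids the extra case expression.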
Endianness
To read a signed 32-bit integer in little endian from a bitstring:
<< number::integer-signed-32-little, rest::bitstring >> = data
Likewise, to read an unsigned 64-bit integer in big endian:
<< number::integer-unsigned-64-big, rest::bitstring >> = data
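A quick way to see what these modifiers change is to read the same bytes in different ways. The sample bytes below are made up for illustration.

```elixir
# The same four bytes read with each byte order.
bytes = <<1, 0, 0, 0>>
<<le::integer-unsigned-32-little>> = bytes
<<be::integer-unsigned-32-big>> = bytes
IO.inspect({le, be})
# prints {1, 16777216}

# The same four bytes read as signed and as unsigned.
all_ones = <<0xFF, 0xFF, 0xFF, 0xFF>>
<<s::integer-signed-32-little>> = all_ones
<<u::integer-unsigned-32-little>> = all_ones
IO.inspect({s, u})
# prints {-1, 4294967295}
```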
Using macros
Reading signed integers of varying byte sizes can be cumbersome with the notation shown above. Luckily, Elixir’s rich macro offering can help make this much easier.
The notation integer-signed-32-little can be condensed to int32LE by defining a macro:
defmacro int32LE, do: quote(do: integer-signed-32-little)
which can be used as
<< chunkSize::int32LE(), rest::binary >> = data
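For a complete picture, here is one possible arrangement, sketched under a few assumptions: the module names BinTypes and Reader, the extra uint16BE macro and the sample bytes are all hypothetical. A macro has to be defined before the code that uses it, so the sketch keeps the macros in their own module and imports them.

```elixir
defmodule BinTypes do
  # Hypothetical module collecting shorthand segment types.
  defmacro int32LE, do: quote(do: integer-signed-32-little)
  defmacro uint16BE, do: quote(do: integer-unsigned-16-big)
end

defmodule Reader do
  # Importing makes the macros available inside <<>> patterns.
  import BinTypes

  # Reads a signed 32-bit little-endian size, then an unsigned 16-bit
  # big-endian tag, and returns them with the remaining bytes.
  def read(<<size::int32LE(), tag::uint16BE(), rest::binary>>) do
    {size, tag, rest}
  end
end

IO.inspect(Reader.read(<<0xFE, 0xFF, 0xFF, 0xFF, 0x01, 0x02, "tail">>))
# prints {-2, 258, "tail"}
```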
Matching Strings
It is possible to pattern match on binary data with strings too.
iex(1)> << "fmt" >> = << 102, 109, 116 >>
"fmt"
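String literals also work as prefixes in function heads, which gives a compact way to dispatch on a tag at the start of a binary. The module ChunkTag and the atoms it returns are hypothetical names chosen for this sketch.

```elixir
defmodule ChunkTag do
  # Hypothetical helper: picks a label from the first four bytes.
  def kind(<<"fmt ", _rest::binary>>), do: :format
  def kind(<<"data", _rest::binary>>), do: :samples
  def kind(_other), do: :unknown
end

IO.inspect(ChunkTag.kind(<<102, 109, 116, 32, 16, 0, 0, 0>>))
# prints :format

IO.inspect(ChunkTag.kind("LIST...."))
# prints :unknown
```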
Now, using all the above in one go
<< "RIFF", chunkSize::int32LE(), "WAVE", rest::binary >> = bytes
…and you’re almost there parsing the header of a .wav file :)
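As a rough sketch of where this leads, the pattern can be extended to cover the rest of a canonical PCM header. This assumes the "fmt " chunk directly follows "WAVE", which is common but not guaranteed in real files. The module name WavHeader, the map keys and the hand-built sample bytes are hypothetical, and the sizes are read as unsigned here, which differs from the signed int32LE shorthand above.

```elixir
defmodule WavHeader do
  # Hypothetical parser for the first 36 bytes of a canonical PCM .wav file.
  def parse(<<
        "RIFF",
        chunk_size::integer-unsigned-32-little,
        "WAVE",
        "fmt ",
        _fmt_size::integer-unsigned-32-little,
        audio_format::integer-unsigned-16-little,
        channels::integer-unsigned-16-little,
        sample_rate::integer-unsigned-32-little,
        _byte_rate::integer-unsigned-32-little,
        _block_align::integer-unsigned-16-little,
        bits_per_sample::integer-unsigned-16-little,
        rest::binary
      >>) do
    header = %{
      chunk_size: chunk_size,
      audio_format: audio_format,
      channels: channels,
      sample_rate: sample_rate,
      bits_per_sample: bits_per_sample
    }

    {:ok, header, rest}
  end

  # Anything that does not fit the layout above is rejected.
  def parse(_other), do: {:error, :not_a_wav_header}
end

# A hand-built header for 16-bit stereo audio at 44.1 kHz (sample data only).
sample =
  <<"RIFF", 36::32-little, "WAVE", "fmt ", 16::32-little, 1::16-little,
    2::16-little, 44_100::32-little, 176_400::32-little, 4::16-little,
    16::16-little, "data">>

{:ok, header, rest} = WavHeader.parse(sample)
IO.inspect(header.sample_rate)
# prints 44100
```

Returning the unmatched rest lets the caller keep parsing the chunks that follow with the same approach.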
Bonus
If you’re interested in seeing more of Elixir’s pattern matching on binaries in action, then check out my implementations of
As always, to gain a deeper understanding of Kernel.SpecialForms.<<>>/1, the official Elixir documentation is your best friend.
~rahultumpala