The word “buffering” often appears while you are watching an online video – the display pauses for a bit while the next batch of data downloads, and then the cute kittens start moving again. In this example, “buffering” is the process of using a bit of memory (the “buffer”) to smooth out the supply of data from the network before sending it to the video player – the display pauses until enough data has arrived from the network for it to be worth playing the video again.
The arrivals hall at an airport is another example of buffering – people arrive in batches, one aircraft-load at a time, and join queues snaking around barriers. From the point of view of the immigration officer, there’s a steady flow of people showing their passports. The queue has served to smooth out the very lumpy arrival of several hundred people into a steady stream.
Such buffers have a cost – it takes a certain amount of time for the first frame of the video to make it through the buffer, or the first person off the flight to get to the immigration officer – but they can speed things up immensely.
We’ve been moving big datasets around recently – several TB in size – and one thing which has sped this up has been buffering. There are three major stages: reading the data on one machine, moving it across the network and then writing it on another machine. All of these are subject to quite lumpy behaviour: perhaps we request some files which happen to be adjacent to each other on the disk and then some stored in quite separate places, or the network is idle for a bit and then someone starts another big transfer. By putting buffers between the stages, we can smooth out the delays and get a speedier overall transfer.
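To see why a buffer between two lumpy stages helps, here is a toy simulation in shell. It is only a sketch: the function name simulate, the tick-based model and the on/off patterns (producer three ticks on, three off; consumer two on, two off) are invented for illustration and are not measurements from our transfers.

```shell
#!/bin/sh
# Toy model: count the ticks needed to move TOTAL blocks from a lumpy
# producer to a lumpy consumer through a buffer of CAPACITY blocks.
# The on/off patterns are made up; capacity 0 means "no buffer at all".
simulate() {
    capacity=$1
    total=$2
    produced=0; buffered=0; done_blocks=0; tick=0
    while [ "$done_blocks" -lt "$total" ]; do
        # Producer is ready for 3 ticks out of every 6 (until it runs out of data).
        p_ready=0
        if [ $((tick % 6)) -lt 3 ] && [ "$produced" -lt "$total" ]; then p_ready=1; fi
        # Consumer is ready for 2 ticks out of every 4.
        c_ready=0
        if [ $((tick % 4)) -lt 2 ]; then c_ready=1; fi

        if [ "$c_ready" -eq 1 ] && [ "$buffered" -gt 0 ]; then
            # Consumer drains one block from the buffer.
            buffered=$((buffered - 1)); done_blocks=$((done_blocks + 1))
        elif [ "$c_ready" -eq 1 ] && [ "$p_ready" -eq 1 ]; then
            # Buffer is empty: hand a block straight from producer to consumer.
            produced=$((produced + 1)); done_blocks=$((done_blocks + 1)); p_ready=0
        fi
        if [ "$p_ready" -eq 1 ] && [ "$buffered" -lt "$capacity" ]; then
            # Producer runs ahead and parks a block in the buffer.
            produced=$((produced + 1)); buffered=$((buffered + 1))
        fi
        tick=$((tick + 1))
    done
    echo "$tick"
}

echo "no buffer:      $(simulate 0 12) ticks"
echo "4-block buffer: $(simulate 4 12) ticks"
```

With these made-up patterns, moving 12 blocks takes 45 ticks with no buffer and 25 ticks with a four-block buffer: without a buffer, a block only moves when both sides happen to be ready in the same tick, whereas the buffer lets each side work through the other's stalls.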
We used mbuffer to aid in a ZFS send/recv operation, running
zfs send -R Foo/Bar/Baz@snap1 | mbuffer -s 128k -m 4G -O OtherHost:9090
on the source and
nc -l 9090 | mbuffer -s 128k -m 4G | zfs recv Foo/Bar/Baz
on the destination. This combination takes about 25% of the time of the more traditional approach of piping the stream through ssh:
zfs send -R Foo/Bar/Baz@snap1 | ssh OtherHost zfs recv Foo/Bar/Baz
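For a feel of what the mbuffer options in those commands mean, -s sets the block size, -m the total buffer memory and -O the host and port to send to. The short sketch below works out how many blocks fit in each buffer. The variable and function names are hypothetical, and it assumes mbuffer reads the k and G suffixes as powers of 1024 and that the buffer holds roughly memory divided by block size.

```shell
#!/bin/sh
# Rough arithmetic for "mbuffer -s 128k -m 4G".
# Assumption: k = 1024 bytes and G = 1024^3 bytes.
block_kib=128                     # from "-s 128k"
buffer_kib=$((4 * 1024 * 1024))   # from "-m 4G", expressed in KiB

# Number of blocks one mbuffer instance can hold under these assumptions.
buffer_blocks() {
    echo $((buffer_kib / block_kib))
}

echo "each mbuffer can hold about $(buffer_blocks) blocks"
```

Under those assumptions each end can queue about 32768 blocks, so the sender can keep reading for a while when the network stalls and the receiver can keep writing for a while when the sender stalls.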