Lazier Copy-On-Write

Copy-on-write (COW) is a popular lazy-evaluation technique that improves running speed and reduces memory requirements by transparently delaying the copying of data. Essentially, this is how it works: when you try to copy an object, you are instead given a fake object. Reads from that new object are served by the original object. The first write to it creates the real copy (as originally requested), which is then used for all future reads and writes.
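To make the mechanism concrete, here is a minimal Python sketch of a copy-on-write byte buffer. The class name `CowBuffer` and its methods are hypothetical, chosen for illustration. A production implementation would typically keep a reference count so that the last remaining holder can write without copying; this sketch does not, so it may occasionally copy when it strictly didn't need to.

```python
class CowBuffer:
    """Minimal copy-on-write wrapper around a bytearray (illustrative sketch)."""

    def __init__(self, data):
        self._data = data     # storage, possibly shared with other CowBuffer handles
        self._owned = False   # True once this handle holds a private copy

    def copy(self):
        # "Copying" only creates a new handle sharing the same storage.
        # The storage is shared again, so this handle must copy on its next write too.
        self._owned = False
        return CowBuffer(self._data)

    def __getitem__(self, index):
        # Reads go straight to whatever storage this handle currently points at.
        return self._data[index]

    def __setitem__(self, index, value):
        if not self._owned:
            # First write: make the real copy now, and keep using it afterwards.
            self._data = bytearray(self._data)
            self._owned = True
        self._data[index] = value

    def __bytes__(self):
        return bytes(self._data)
```

Until `b` is written to, `b = a.copy()` costs only a new handle; the first write to either handle triggers the actual copy.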

It allows programmers to write simpler and safer programs. For instance, a programmer can pass "copies" of their data with minimal performance impact, and not have to worry about others changing the original.

It's great, but can it be more?

Here I present two proposals for extending this idea to achieve even greater optimization. They are far from my areas of expertise, so I hope they still make sense.

1. Copy-On-Write + Fragmentation

Fragmentation of data is a mechanism that allows different parts (blocks) of data to reside in different locations in memory while appearing intact (that is, contiguous). This mechanism is often accused of slowing the computer down. However, it's a critical feature of virtual memory, which is the basis of all modern operating systems.

Introducing fragmentation into your data structures has many benefits, but let's discuss the one regarding COW, which may be obvious by now: on write, you don't have to copy all of the data, just the blocks that are being changed. This can make a big difference if you're changing a few bytes in a 50 MB string.

You still have to copy the metadata (such as where the blocks are, which block comes next, etc.), but that's a small price to pay, and a reasonable design requirement.
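A minimal sketch of the fixed-size variant, assuming the data lives in an in-memory list of immutable `bytes` blocks. The name `BlockCowBuffer` and the tiny `BLOCK_SIZE` are illustrative choices, not part of any real library. Copying duplicates only the list of block references (the metadata), and a write copies only the blocks it touches.

```python
BLOCK_SIZE = 4  # tiny for demonstration; the text suggests something like 64 KB


class BlockCowBuffer:
    """Copy-on-write over fixed-size blocks: a write copies only the blocks it touches."""

    def __init__(self, blocks, length):
        self._blocks = blocks   # list of immutable bytes objects, safe to share
        self._length = length

    @classmethod
    def from_bytes(cls, data):
        blocks = [bytes(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]
        return cls(blocks, len(data))

    def copy(self):
        # Only the metadata (the list of block references) is copied.
        return BlockCowBuffer(list(self._blocks), self._length)

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        block, offset = divmod(index, BLOCK_SIZE)
        return self._blocks[block][offset]

    def write(self, start, data):
        """Overwrite bytes in place (no resizing); copies only the affected blocks."""
        if start < 0 or start + len(data) > self._length:
            raise IndexError("write out of range")
        pos = 0
        while pos < len(data):
            block, offset = divmod(start + pos, BLOCK_SIZE)
            chunk = min(BLOCK_SIZE - offset, len(data) - pos)
            new_block = bytearray(self._blocks[block])   # copy this one block
            new_block[offset:offset + chunk] = data[pos:pos + chunk]
            self._blocks[block] = bytes(new_block)       # replace it in our list only
            pos += chunk

    def __bytes__(self):
        return b"".join(self._blocks)
```

For simplicity, repeated writes to the same block copy it again each time; a real implementation would mark blocks it already owns.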

Now, instead of copying all of the data, you copy only a fragment of it. How big is that fragment? Perhaps a fixed size, such as 64 KB. But assuming there is no real restriction on the size of these blocks, the next logical step, in my eyes, is to ask: why not make it as small as possible? That is, why not intentionally split the block into three smaller blocks: the part before the area to be written, the area to be written, and the part after it? At this point we continue as originally planned: we copy only the block that is to be written, which is, of course, exactly as small as it can be.
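The three-way split can be sketched as a piece table: the buffer is a list of `(source, start, length)` views into immutable byte strings, and a write replaces the affected pieces with "before", "written", and "after" pieces. Only the written bytes are stored anew. `PieceCowBuffer` is a hypothetical name, and the sketch supports only overwrites that keep the total length unchanged.

```python
class PieceCowBuffer:
    """Copy-on-write with write-time fragmentation (a piece-table-style sketch).

    Each piece is a (source, start, length) view into an immutable bytes object,
    so pieces can be shared between copies without ever being mutated.
    """

    def __init__(self, pieces):
        self._pieces = pieces

    @classmethod
    def from_bytes(cls, data):
        data = bytes(data)
        return cls([(data, 0, len(data))] if data else [])

    def copy(self):
        # Only the metadata (the piece list) is copied; the bytes stay shared.
        return PieceCowBuffer(list(self._pieces))

    def __len__(self):
        return sum(length for _, _, length in self._pieces)

    def write(self, pos, data):
        """Overwrite len(data) bytes at pos; the total length stays the same."""
        end = pos + len(data)
        if pos < 0 or end > len(self):
            raise IndexError("write out of range")
        if not data:
            return
        new_pieces, offset, inserted = [], 0, False
        for source, start, length in self._pieces:
            piece_end = offset + length
            if piece_end <= pos or offset >= end:
                new_pieces.append((source, start, length))  # untouched, still shared
            else:
                if offset < pos:  # "before" part: a shorter view of the same source
                    new_pieces.append((source, start, pos - offset))
                if not inserted:  # "written" part: the only bytes actually copied
                    new_pieces.append((bytes(data), 0, len(data)))
                    inserted = True
                if piece_end > end:  # "after" part: another view of the same source
                    new_pieces.append((source, start + (end - offset), piece_end - end))
            offset = piece_end
        self._pieces = new_pieces

    def __bytes__(self):
        return b"".join(src[start:start + length] for src, start, length in self._pieces)
```

Note that this sketch rebuilds the piece list linearly, so each write also costs time proportional to the number of pieces; keeping the pieces in a search tree would reduce that overhead.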

Eventually, we have a model in which writing n bytes into COWed data of m bytes takes O(n) time and memory, instead of the original O(m+n) time and O(m) memory (not counting the cost of updating the block metadata). I argue that in the common case, n is significantly smaller than m, and so the win is big.
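To put numbers on this, the hypothetical helper below counts only the bytes physically copied by a single write under the three schemes discussed so far: whole-object COW, fixed 64 KB blocks, and exact splitting. It deliberately ignores metadata and assumes the buffer size is a multiple of the block size.

```python
def bytes_copied(m, pos, n, scheme, block_size=64 * 1024):
    """Bytes physically copied when writing n bytes at pos into a COWed buffer of m bytes.

    Hypothetical helper for illustration; metadata updates are not counted.
    """
    if n <= 0:
        return 0
    if scheme == "whole":   # classic COW: the first write copies everything
        return m
    if scheme == "fixed":   # fixed-size blocks: copy every block the write touches
        first_block = pos // block_size
        last_block = (pos + n - 1) // block_size
        return (last_block - first_block + 1) * block_size
    if scheme == "exact":   # before / written / after split: copy only the n new bytes
        return n
    raise ValueError(f"unknown scheme: {scheme}")
```

For a 50 MB buffer (taking MB as 2^20 bytes), a 10-byte write copies 52,428,800 bytes with whole-object COW, 65,536 bytes with 64 KB blocks (131,072 if the write straddles a block boundary), and 10 bytes with exact splitting.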

Of course, fragmentation has a habit of slowing down reads. When fragmentation gets "too high", the memory can be defragmented (an occasional O(m) process). The optimal degree of fragmentation depends heavily on the ratio of reads to writes, but I argue that even a sub-optimal, common-case balance will improve performance.
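Defragmentation can be as simple as collapsing the piece list back into one contiguous buffer once it grows past some threshold. The sketch below assumes a `(source, start, length)` piece representation over immutable `bytes`; the function name and the `max_pieces` threshold are hypothetical, and choosing that threshold is exactly the read/write trade-off described above.

```python
def maybe_defragment(pieces, max_pieces=64):
    """Collapse a piece list into one contiguous piece when it is too fragmented.

    `pieces` is a list of (source, start, length) views into immutable bytes.
    The O(m) copy is paid only occasionally, when the piece count passes the threshold.
    """
    if len(pieces) <= max_pieces:
        return pieces  # fragmentation still acceptable: leave it alone
    merged = b"".join(source[start:start + length] for source, start, length in pieces)
    return [(merged, 0, len(merged))]
```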

Edit: I've been unclear about how this affects lookups. With fixed-size blocks, getting and setting items remains O(1). For variable-size blocks, however, it's not so simple: a search tree can achieve lookups in O(log k), where k is the number of fragments, which is a lot slower than the original array performance. It is probably only a good idea if you have access to the operating system's memory management, or if lookups are rare (and even then, occasional defragmentation would still be necessary). Still, fixed-size fragments are good enough, and they can be dynamically resized with little cost, as long as the resize is uniform.
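The two lookup strategies can be sketched side by side: fixed-size blocks need only integer division, while variable-size pieces need a search. This sketch uses binary search over a sorted list of piece start offsets (Python's `bisect` module) rather than a search tree; both give O(log k) lookups, though the sorted list is more expensive to update after a write. All function names are hypothetical.

```python
import bisect


def fixed_lookup(blocks, block_size, index):
    # O(1): the block number and offset follow directly from integer division.
    if index < 0:
        raise IndexError(index)
    block, offset = divmod(index, block_size)
    return blocks[block][offset]


def piece_offsets(pieces):
    # Logical start offset of each (source, start, length) piece.
    # Must be rebuilt (or patched) whenever a write changes the piece list.
    offsets, total = [], 0
    for _, _, length in pieces:
        offsets.append(total)
        total += length
    return offsets


def variable_lookup(pieces, offsets, index):
    # O(log k) for k pieces: find the last piece starting at or before index.
    if index < 0:
        raise IndexError(index)
    i = bisect.bisect_right(offsets, index) - 1
    source, start, length = pieces[i]
    if index - offsets[i] >= length:
        raise IndexError(index)
    return source[start + index - offsets[i]]
```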

2. Copy-On-Write-And-ReaD

Or, in short, COWARD: a mechanism that delays copying even further, until the written data is also read. That is, when the programmer requests a write, this mechanism instead journals the data, producing a sort of "diff". Only when the programmer attempts to read the result is the original data copied and the diff applied. By definition, any implementation of lazy evaluation provides such a diff structure, but there may be other diff structures better suited to this purpose.
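A minimal sketch of COWARD without fragmentation, assuming writes only overwrite bytes in place: `write` appends to a journal, and the first read afterwards copies the base data once and replays the journal in order. `CowardBuffer` is a hypothetical name.

```python
class CowardBuffer:
    """Copy-On-Write-And-ReaD sketch: writes are journaled; copying waits for a read."""

    def __init__(self, base, journal=None):
        self._base = base  # immutable bytes, possibly shared with other handles
        self._journal = list(journal) if journal else []  # pending (pos, bytes) writes

    def copy(self):
        # A copy shares the base and duplicates only the (small) journal.
        return CowardBuffer(self._base, self._journal)

    def write(self, pos, data):
        # Record the write instead of copying anything.
        if pos < 0 or pos + len(data) > len(self._base):
            raise IndexError("write out of range")
        self._journal.append((pos, bytes(data)))

    def _materialize(self):
        # First read after writes: copy the base once and replay the diff in order.
        if self._journal:
            buf = bytearray(self._base)
            for pos, data in self._journal:
                buf[pos:pos + len(data)] = data
            self._base = bytes(buf)  # back to immutable, so it can be shared again
            self._journal = []

    def __getitem__(self, index):
        self._materialize()
        return self._base[index]

    def __bytes__(self):
        self._materialize()
        return self._base
```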

This starts to make more sense with fragmentation: the diff can then be applied only to the block being read, so a block is copied only if it is both written and read. In some cases, there may be very little overlap between the two (and so, very little copying).
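Combining the two ideas, each fixed-size block gets its own pending journal, and a block is copied only when a read lands on a block that has pending writes. `BlockCowardBuffer` and its `copied_blocks` counter are hypothetical, the counter being added only to make the amount of copying visible.

```python
BLOCK_SIZE = 4  # tiny for demonstration


class BlockCowardBuffer:
    """COWARD over fixed-size blocks: a block is copied only if written *and* read."""

    def __init__(self, blocks, length, pending=None):
        self._blocks = blocks    # list of immutable bytes blocks, shared between copies
        self._length = length
        self._pending = pending if pending is not None else {}  # block -> [(offset, bytes)]
        self.copied_blocks = 0   # how many blocks this handle has actually copied

    @classmethod
    def from_bytes(cls, data):
        blocks = [bytes(data[i:i + BLOCK_SIZE]) for i in range(0, len(data), BLOCK_SIZE)]
        return cls(blocks, len(data))

    def copy(self):
        # Copy metadata only: block references and the per-block journals.
        pending = {block: list(writes) for block, writes in self._pending.items()}
        return BlockCowardBuffer(list(self._blocks), self._length, pending)

    def write(self, pos, data):
        # Split the write at block boundaries and journal each part; no copying here.
        if pos < 0 or pos + len(data) > self._length:
            raise IndexError("write out of range")
        done = 0
        while done < len(data):
            block, offset = divmod(pos + done, BLOCK_SIZE)
            chunk = min(BLOCK_SIZE - offset, len(data) - done)
            self._pending.setdefault(block, []).append((offset, bytes(data[done:done + chunk])))
            done += chunk

    def __getitem__(self, index):
        if not 0 <= index < self._length:
            raise IndexError(index)
        block, offset = divmod(index, BLOCK_SIZE)
        writes = self._pending.pop(block, None)
        if writes:  # written and now read: copy this one block and apply its diff
            buf = bytearray(self._blocks[block])
            for off, data in writes:
                buf[off:off + len(data)] = data
            self._blocks[block] = bytes(buf)
            self.copied_blocks += 1
        return self._blocks[block][offset]
```

In the usage below, block 2 is written but never read, so it is never copied and stays shared with the original.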

So basically, COWARD is just a (non-)fancy name for an array of promises (not to be confused with a field of dreams). The (possible) novelty is in the way this array is created and used: transparently, and relatively efficiently. Note that, like the previous proposal, it provides little value in situations where all of the data is altered or read. However, I argue it will significantly improve performance in cases where only part of the data is read and written.

It can, for instance, be useful when an algorithm works on COWed data (which happens quite often) and does more processing than the user requires. With this method, only the blocks the user requests are copied and, if the calculations themselves are lazy, processed. And all of it is transparent to both the user and the implementer of the algorithm.
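The algorithm case can be sketched as a lazily evaluated, block-wise map: each output block is a promise that is forced (computed and cached) the first time it is read. `LazyBlockMap` is a hypothetical name, and the sketch assumes `transform` maps a block to a block of the same length.

```python
BLOCK_SIZE = 4  # tiny for demonstration


class LazyBlockMap:
    """Lazily applies `transform` block by block: an array of memoized promises."""

    def __init__(self, source, transform):
        self._source = source        # shared input (e.g. COWed data); never copied whole
        self._transform = transform  # maps a bytes block to a same-length bytes block
        self._results = {}           # block number -> computed (forced) block

    def __len__(self):
        return len(self._source)

    def __getitem__(self, index):
        if not 0 <= index < len(self._source):
            raise IndexError(index)
        block, offset = divmod(index, BLOCK_SIZE)
        if block not in self._results:  # force this block's promise on first read
            chunk = self._source[block * BLOCK_SIZE:(block + 1) * BLOCK_SIZE]
            self._results[block] = self._transform(chunk)
        return self._results[block][offset]

    def computed_blocks(self):
        return sorted(self._results)
```

For example, `LazyBlockMap(b"hello lazy cows!", bytes.upper)` upper-cases only the blocks that are actually read; reading indices 0 and 13 forces two of the four blocks, and the other two are never transformed.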

Here's to lazier COWs!
