The topic of working with low-level buffers may not come up often in Python, but there are occasions when the application we are building requires it. Whether it involves using serializers, interpreters, or working with sockets, we need to find efficient ways to slice, reshape, and modify buffers without causing inefficiency in our program.
Breaking down the problem
Slicing and manipulating bytes and bytearrays typically requires creating copies, which can significantly increase the memory usage of your applications as the data size grows. Additionally, it can extend the overall runtime as the data needs to be reassembled.
Your initial thought might be to use external libraries like numpy, which is a common choice. However, in some cases, adding multiple dependencies may not be feasible or preferred.
So let’s take a look at how we might approach this in Python without any libraries.
MemoryView
A memoryview is a built-in object that allows you to access an underlying object’s buffer interface without the need to create duplicate copies of the data. This feature makes it ideal for applications that require the efficient handling of large amounts of data.
Any modifications made through the view are reflected in the original object, provided that object is mutable (a view over bytes is read-only). This enables easy interpretation, slicing, and alteration of the buffer. The functionality of these buffers is similar to that of numpy arrays, but with more restricted capabilities.
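To make the difference concrete, here is a minimal sketch contrasting a regular slice with a memoryview slice. The variable names (data, copied, shared) are illustrative only.

```python
# A regular slice copies bytes; a memoryview slice shares them.
data = bytearray(b"capybara")

copied = data[0:4]              # new bytearray holding its own copy
shared = memoryview(data)[0:4]  # view onto the same underlying memory

# Change the original after both slices were taken.
data[0] = ord("C")

copied_bytes = bytes(copied)  # unaffected by the change
shared_bytes = bytes(shared)  # sees the change

print(copied_bytes, shared_bytes)
```

The copy keeps the old first byte, while the view reflects the new one because it never held its own data.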
What can be used as buffers?
The main objects you are likely to use with a memoryview are:
- bytes
- bytearray
- array.array
- ctypes arrays
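As a quick sketch of how these objects look through a memoryview, the snippet below wraps one of each and records a few attributes of the resulting view. The sources and summary names, and the sample values, are made up for illustration.

```python
import array
import ctypes

# One example of each kind of object listed above.
sources = {
    "bytes": b"capybara",
    "bytearray": bytearray(b"capybara"),
    "array.array": array.array("H", [1, 2, 3]),
    "ctypes array": (ctypes.c_int32 * 3)(1, 2, 3),
}

summary = {}
for name, obj in sources.items():
    # The with-block releases each view as soon as we are done with it.
    with memoryview(obj) as view:
        summary[name] = (view.readonly, view.format, view.itemsize, view.nbytes)

for name, info in summary.items():
    print(name, info)
```

Only the view over bytes is read-only; the other three can be modified through the view. The item size follows the element type of the underlying object.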
When might I need to use this?
There are lots of cases where you might be working with streams of data:
- Web Applications
- Device Interfaces
- Multimedia Editing
- Interpreters
- GPU Rendering
While there are many libraries that help facilitate these, sometimes the application you are writing involves directly working with this data, and in high-speed or high-volume environments, making duplicates of data can come at a cost.
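As a rough illustration of the streaming case, the sketch below fills a preallocated bytearray from a stream in small chunks. Here io.BytesIO stands in for a real socket or device, and the chunk size of 5 is arbitrary; both are assumptions made for the example.

```python
import io

stream = io.BytesIO(b"capybara" * 4)  # stand-in for a socket or device
buffer = bytearray(16)                # preallocated destination
view = memoryview(buffer)

CHUNK = 5   # arbitrary chunk size for the example
filled = 0
while filled < len(buffer):
    # The slice is a view, so readinto() writes straight into `buffer`
    # without building an intermediate bytes object per chunk.
    n = stream.readinto(view[filled:filled + CHUNK])
    if n == 0:  # stream exhausted
        break
    filled += n

print(filled, bytes(buffer))
```

The same pattern applies to other APIs that accept a writable buffer, such as socket.recv_into.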
Using Memoryview with Bytearrays
Let’s see how we can use memoryview to manipulate a bytearray:
Creation: You first need to create a memoryview from the bytearray.
arr = bytearray(b"capybara")
view = memoryview(arr)
Slicing: You can slice the memoryview, which doesn’t create a copy of the sliced data.
capy = view[0:4]
bara = view[4:]
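The following sketch explores slicing a little further, under the assumption that you only call tobytes() when you actually need a copy. The names ba and every_other are illustrative.

```python
arr = bytearray(b"capybara")
view = memoryview(arr)

capy = view[0:4]
bara = view[4:]
ba = bara[0:2]           # a slice of a slice is still just a view
every_other = view[::2]  # strided slices are views as well

# Every view still points back at the original bytearray.
all_share_arr = all(v.obj is arr for v in (capy, bara, ba, every_other))

# tobytes() is the explicit step that copies the data out.
capy_bytes = capy.tobytes()
ba_bytes = ba.tobytes()
strided_bytes = every_other.tobytes()

print(capy_bytes, ba_bytes, strided_bytes, all_share_arr)
```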
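Modifying: Changes made to the memoryview are reflected in the original bytearray. The replacement must be exactly as long as the slice it overwrites, because a memoryview cannot resize the underlying buffer.
view[0:4] = b"Bara"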
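Here is a small sketch of the rules for writing through a view: same-length writes succeed, length mismatches are rejected, and views over immutable objects are read-only. The flag names are invented for the example.

```python
arr = bytearray(b"capybara")
view = memoryview(arr)

# Same-length assignment writes straight through to the bytearray.
view[0:4] = b"bara"
after_write = bytes(arr)

# A memoryview cannot grow or shrink its target, so a length mismatch fails.
try:
    view[0:3] = b"capy"
    length_mismatch_rejected = False
except ValueError:
    length_mismatch_rejected = True

# A view over an immutable bytes object is read-only.
readonly_view = memoryview(b"capybara")
try:
    readonly_view[0] = ord("C")
    readonly_write_rejected = False
except TypeError:
    readonly_write_rejected = True

print(after_write, length_mismatch_rejected, readonly_write_rejected)
```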
Shaping: You can also shape a memoryview, but its functionality is very limited compared to numpy.
reshaped = view.cast('c', (4, 2))
reshaped[(0, 0)] = b'c'
reshaped[(0, 1)] = b'a'
reshaped[(1, 0)] = b'p'
reshaped[(1, 1)] = b'y'
This allows you to modify the array much like before, but using tuple indices.
Note
The memoryview does not support multi-dimensional subviews, so indexing a reshaped view with a single integer does not return a row the way a nested list would. Instead, you must supply a full tuple of indices, which is combined with the strides to locate the element in the underlying buffer.
Valid
for y in range(4):
    for x in range(2):
        print(reshaped[(y, x)].decode(), end='')
    print()
Invalid
for y in range(4):
    # Raises NotImplementedError: a single index would require a sub-view
    print(reshaped[y].decode())
This is one of the ways it’s much more limited than numpy.
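The sketch below shows how a tuple index maps onto the flat buffer, and one possible workaround for reading a whole row: slicing the original one-dimensional view. The names flat, grid, and row are illustrative, and the workaround is just one option, not the only one.

```python
arr = bytearray(b"capybara")
flat = memoryview(arr)         # one-dimensional view
grid = flat.cast("c", (4, 2))  # 4 rows by 2 columns over the same memory

shape, strides = grid.shape, grid.strides

# A tuple index is turned into a byte offset using the strides.
y, x = 1, 0
offset = y * strides[0] + x * strides[1]
same_element = grid[y, x] == flat[offset:offset + 1].tobytes()

# Indexing with a single integer would need a sub-view, which is unsupported.
try:
    grid[1]
    subview_supported = True
except NotImplementedError:
    subview_supported = False

# Workaround: slice the flat view to get at a whole row.
width = shape[1]
row = flat[y * width:(y + 1) * width].tobytes()

# tolist() does produce a nested list, at the cost of copying the data.
rows = grid.tolist()

print(shape, strides, row, subview_supported)
```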
Buffer objects
As of Python 3.12, classes written in Python can implement the buffer protocol themselves (PEP 688); previously this was only possible from C extensions. This enables the creation of wrappers for memoryview in order to implement customized functionality.
import inspect

class BufferedObject:
    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.view = None  # the memoryview currently handed out, if any

    def __buffer__(self, flags: inspect.BufferFlags) -> memoryview:
        if flags != inspect.BufferFlags.FULL_RO:
            raise TypeError("Only BufferFlags.FULL_RO supported")
        if self.view is not None:
            raise RuntimeError("Buffer already held")
        self.view = memoryview(self.data)
        return self.view

    def __release_buffer__(self, view: memoryview):
        self.view = None
        view.release()

    def extend(self, b: bytes) -> None:
        # Resizing is refused while a consumer still holds the buffer
        if self.view is not None:
            raise RuntimeError("Cannot extend held buffer")
        self.data.extend(b)
To implement a class that works with Python’s buffer protocol, define the following methods:
__buffer__: This method is used to initialize and return a memoryview object. It receives an inspect.BufferFlags value describing the kind of buffer being requested. memoryview passes the inspect.BufferFlags.FULL_RO flag, so this is the flag to check for in this scenario.
__release_buffer__: This method is called when a buffer is no longer needed. Its argument (named view in the class above) is the memoryview object that was previously returned by __buffer__. All clean-up associated with the buffer must be done in this method. If no special clean-up is needed, then this method need not be implemented.
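You can also type hint for a Buffer class by using the collections.abc.Buffer type. Note that the constructor needs a bytes literal, since bytearray() rejects a str without an encoding.
from collections.abc import Buffer
buffer: Buffer = BufferedObject(b"capybara")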
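To see the hold and release lifecycle in action, here is a hedged usage sketch. It repeats the class so the block runs on its own. The demo function and the SUPPORTS_BUFFER_DUNDER flag are hypothetical names added for illustration, and the whole thing assumes Python 3.12 or later.

```python
import inspect
import sys

# __buffer__ is only honoured by the interpreter on Python 3.12+ (PEP 688).
SUPPORTS_BUFFER_DUNDER = sys.version_info >= (3, 12)


class BufferedObject:
    """Same class as in the article, repeated so this block is self-contained."""

    def __init__(self, data: bytes):
        self.data = bytearray(data)
        self.view = None

    def __buffer__(self, flags: int) -> memoryview:
        if flags != inspect.BufferFlags.FULL_RO:
            raise TypeError("Only BufferFlags.FULL_RO supported")
        if self.view is not None:
            raise RuntimeError("Buffer already held")
        self.view = memoryview(self.data)
        return self.view

    def __release_buffer__(self, view: memoryview) -> None:
        self.view = None
        view.release()

    def extend(self, b: bytes) -> None:
        if self.view is not None:
            raise RuntimeError("Cannot extend held buffer")
        self.data.extend(b)


def demo():
    """Hypothetical helper: exercise the buffer while held and after release."""
    from collections.abc import Buffer  # 3.12+ only, so imported lazily

    buffer = BufferedObject(b"capybara")
    is_buffer = isinstance(buffer, Buffer)

    with memoryview(buffer) as view:  # calls __buffer__
        view[0] = ord("C")            # writes through to buffer.data
        try:
            buffer.extend(b"!")       # refused while the view is held
            blocked = False
        except RuntimeError:
            blocked = True
    # Leaving the with-block releases the view, calling __release_buffer__.

    buffer.extend(b"!")               # allowed again
    return is_buffer, blocked, bytes(buffer.data)
```

On Python 3.12 or later, calling demo() should return (True, True, b"Capybara!"): the object counts as a Buffer, extending is refused while the view is held, and it succeeds again once the view has been released.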
Final thoughts
As you can see, memoryview, the buffer protocol and Buffer objects provide powerful and Pythonic tools for interacting with low-level C buffer objects without the need to duplicate data. This is especially useful in situations where you are working with a high volume of data and external systems.