uctypes – access binary data in a structured way
This module implements a “foreign data interface” for MicroPython. The idea
behind it is similar to CPython’s ctypes module, but the actual API is
different, streamlined, and optimized for small size. The basic idea of the
module is to define a data structure layout with about the same power as the
C language allows, and then access it using the familiar dot syntax to
reference sub-fields.
Warning
The uctypes module allows access to arbitrary memory addresses of the
machine (including I/O and control registers). Careless use of it
may lead to crashes, data loss, and even hardware malfunction.
See also
- Module ustruct: the standard Python way to access binary data structures (doesn’t scale well to large and complex structures).
Usage examples:
import uctypes
# Example 1: Subset of ELF file header
# https://wikipedia.org/wiki/Executable_and_Linkable_Format#File_header
ELF_HEADER = {
    "EI_MAG": (0x0 | uctypes.ARRAY, 4 | uctypes.UINT8),
    "EI_DATA": 0x5 | uctypes.UINT8,
    "e_machine": 0x12 | uctypes.UINT16,
}
# "f" is an ELF file opened in binary mode
buf = f.read(uctypes.sizeof(ELF_HEADER, uctypes.LITTLE_ENDIAN))
header = uctypes.struct(uctypes.addressof(buf), ELF_HEADER, uctypes.LITTLE_ENDIAN)
assert header.EI_MAG == b"\x7fELF"
assert header.EI_DATA == 1, "Oops, wrong endianness. Could retry with uctypes.BIG_ENDIAN."
print("machine:", hex(header.e_machine))
# Example 2: In-memory data structure, with pointers
COORD = {
    "x": 0 | uctypes.FLOAT32,
    "y": 4 | uctypes.FLOAT32,
}
STRUCT1 = {
    "data1": 0 | uctypes.UINT8,
    "data2": 4 | uctypes.UINT32,
    "ptr": (8 | uctypes.PTR, COORD),
}
# Suppose you have the address of a structure of type STRUCT1 in "addr"
# uctypes.NATIVE is optional (it is the default)
struct1 = uctypes.struct(addr, STRUCT1, uctypes.NATIVE)
print("x:", struct1.ptr[0].x)
# Example 3: Can also calculate field offsets automatically
from ucollections import OrderedDict
# Build from a list of pairs: a dict literal does not guarantee field order
COORD = OrderedDict([
    ("x", uctypes.FLOAT32),
    ("y", uctypes.FLOAT32),
])
# Note that COORD is updated in place!
uctypes.calc_offsets(COORD, uctypes.NATIVE)
# Example 4: Access to CPU registers. Subset of STM32F4xx WWDG block
WWDG_LAYOUT = {
    "WWDG_CR": (0, {
        # BFUINT32 here means size of the WWDG_CR register
        "WDGA": 7 << uctypes.BF_POS | 1 << uctypes.BF_LEN | uctypes.BFUINT32,
        "T": 0 << uctypes.BF_POS | 7 << uctypes.BF_LEN | uctypes.BFUINT32,
    }),
    "WWDG_CFR": (4, {
        "EWI": 9 << uctypes.BF_POS | 1 << uctypes.BF_LEN | uctypes.BFUINT32,
        "WDGTB": 7 << uctypes.BF_POS | 2 << uctypes.BF_LEN | uctypes.BFUINT32,
        "W": 0 << uctypes.BF_POS | 7 << uctypes.BF_LEN | uctypes.BFUINT32,
    }),
}
WWDG = uctypes.struct(0x40002c00, WWDG_LAYOUT)
WWDG.WWDG_CFR.WDGTB = 0b10
WWDG.WWDG_CR.WDGA = 1
print("Current counter:", WWDG.WWDG_CR.T)
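For comparison, here is a sketch that decodes the same three fields as Example 1 with CPython’s standard struct module (ustruct offers a similar subset in MicroPython). It assumes a synthetic 20-byte header built in memory instead of a real file; the e_machine value 0x3E (EM_X86_64) is just a sample. Each field needs its own offset and format string here, which is the bookkeeping a uctypes descriptor lets you write down once.

```python
import struct

# Synthetic first 20 bytes of a little-endian ELF header (hypothetical sample)
buf = bytearray(20)
buf[0:4] = b"\x7fELF"                     # EI_MAG
buf[4] = 2                                # EI_CLASS: 64-bit
buf[5] = 1                                # EI_DATA: 1 means little-endian
struct.pack_into("<H", buf, 0x10, 2)      # e_type: ET_EXEC
struct.pack_into("<H", buf, 0x12, 0x3E)   # e_machine: EM_X86_64

# Same offsets and types as ELF_HEADER; "<" = little-endian, no padding
ei_mag = bytes(buf[0:4])
ei_data = buf[5]
(e_machine,) = struct.unpack_from("<H", buf, 0x12)

assert ei_mag == b"\x7fELF"
assert ei_data == 1
print("machine:", hex(e_machine))  # machine: 0x3e
```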
Defining structure layout
Structure layout is defined by a “descriptor”, a Python dictionary that encodes field names as keys and the other properties required to access them as the associated values:
{
"field1": <properties>,
"field2": <properties>,
...
}
Properties are basically an offset and a type, where types can be scalar (like
integers of different sizes, bitfields, floats) or aggregate (arrays,
structures, pointers, containing references to other types recursively).
Offsets can either be specified explicitly, or calculated automatically for an
ordered dictionary of fields by the helper function calc_offsets().
Offsets are given in bytes from the start of the structure.
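To illustrate what automatic offset calculation involves, the following sketch assigns offsets to an ordered list of fields using natural alignment (each field starts at a multiple of its own size), a rule common in C ABIs. assign_offsets is a hypothetical helper, not part of uctypes; the rules calc_offsets() actually applies depend on the layout type.

```python
def assign_offsets(fields):
    """Hypothetical helper: fields is an ordered list of (name, size_in_bytes).

    Each field is placed at the next offset that is a multiple of its size
    (natural alignment). Returns the offsets and the end of the last field;
    no trailing padding is added.
    """
    offsets = {}
    pos = 0
    for name, size in fields:
        pos = (pos + size - 1) // size * size  # round up to a multiple of size
        offsets[name] = pos
        pos += size
    return offsets, pos


# Same scalar fields as STRUCT1 in Example 2: a UINT8 followed by a UINT32
offsets, end = assign_offsets([("data1", 1), ("data2", 4)])
print(offsets, end)  # {'data1': 0, 'data2': 4} 8
```

With a packed layout (alignment 1), data2 would instead start at offset 1, which is why the layout type matters when offsets are computed.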
The following are the encoding specifications for the various field types:
Scalar types:
"field_name": offset | uctypes.UINT32
In other words, the value is a scalar type identifier ORed with the field offset (in bytes) from the start of the structure.
Recursive structures:
"sub": (offset, { "b0": 0 | uctypes.UINT8, "b1": 1 | uctypes.UINT8, })
I.e. the value is a 2-tuple whose first element is an offset and whose second element is a structure descriptor dictionary (note: offsets in recursive descriptors are relative to the start of the structure they define). Of course, recursive structures can be specified not just by a literal dictionary, but also by referring by name to a structure descriptor dictionary defined earlier.
Arrays of primitive types:
"arr": (offset | uctypes.ARRAY, size | uctypes.UINT8),
I.e. the value is a 2-tuple whose first element is the ARRAY flag ORed with the offset, and whose second element is the scalar element type ORed with the number of elements in the array.
Arrays of aggregate types:
"arr2": (offset | uctypes.ARRAY, size, {"b": 0 | uctypes.UINT8}),
I.e. the value is a 3-tuple whose first element is the ARRAY flag ORed with the offset, whose second element is the number of elements in the array, and whose third element is a descriptor of the element type.
Pointer to a primitive type:
"ptr": (offset | uctypes.PTR, uctypes.UINT8),
I.e. the value is a 2-tuple whose first element is the PTR flag ORed with the offset, and whose second element is a scalar element type.
Pointer to an aggregate type:
"ptr2": (offset | uctypes.PTR, {"b": 0 | uctypes.UINT8}),
I.e. the value is a 2-tuple whose first element is the PTR flag ORed with the offset, and whose second element is a descriptor of the type pointed to.
Bitfields:
"bitf0": offset | uctypes.BFUINT16 | lsbit << uctypes.BF_POS | bitsize << uctypes.BF_LEN,
I.e. the value is the type of the scalar value containing the given bitfield (type names are similar to the scalar types, but prefixed with BF), ORed with the offset of that scalar value, and further ORed with the bit position and bit length of the bitfield within the scalar value, shifted by BF_POS and BF_LEN bits respectively. A bitfield position is counted from the least significant bit of the scalar (which has position 0), and is the number of the rightmost bit of the field (in other words, it is the number of bits the scalar needs to be shifted right to extract the bitfield).
In the example above, first a UINT16 value will be extracted at the given offset (this detail may be important when accessing hardware registers, where a particular access size and alignment are required), and then the bitfield whose rightmost bit is bit lsbit of this UINT16, and whose length is bitsize bits, will be extracted. For example, if lsbit is 0 and bitsize is 8, then it will effectively access the least-significant byte of the UINT16.
Note that bitfield operations are independent of the target's byte endianness; in particular, the example above will access the least-significant byte of the UINT16 in both little- and big-endian structures. However, they depend on the least significant bit being numbered 0. Some targets may use different numbering in their native ABI, but uctypes always uses the normalized numbering described above.
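The extraction rule described above (shift right by the bit position, then keep bitsize bits) can be written out in plain Python. This is only a sketch of the arithmetic; bf_get and bf_set are hypothetical helper names, not uctypes functions.

```python
def bf_get(scalar, lsbit, bitsize):
    """Extract a bitfield: shift right by lsbit, then keep bitsize bits."""
    return (scalar >> lsbit) & ((1 << bitsize) - 1)


def bf_set(scalar, lsbit, bitsize, value):
    """Return scalar with the bitfield replaced by value; other bits are kept."""
    mask = ((1 << bitsize) - 1) << lsbit
    return (scalar & ~mask) | ((value << lsbit) & mask)


word = 0xABCD  # a UINT16 value read from the structure
print(hex(bf_get(word, 0, 8)))  # 0xcd, the least-significant byte
print(hex(bf_get(word, 8, 8)))  # 0xab

# WDGTB of WWDG_CFR in Example 4: lsbit 7, bitsize 2
cfr = bf_set(0x007F, 7, 2, 0b10)
print(hex(cfr))  # 0x17f
```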
Module contents
class uctypes.struct(addr, descriptor, layout_type=NATIVE, /)
Instantiate a “foreign data structure” object based on the structure's address in memory, its descriptor (encoded as a dictionary), and its layout type (see below).
uctypes.LITTLE_ENDIAN
Layout type for a little-endian packed structure. (Packed means that every field occupies exactly as many bytes as defined in the descriptor, i.e. the alignment is 1.)
uctypes.BIG_ENDIAN
Layout type for a big-endian packed structure.
uctypes.NATIVE
Layout type for a native structure, with data endianness and alignment conforming to the ABI of the system on which MicroPython runs.
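CPython’s struct module draws the same distinction between packed and native layouts, so it can be used to see what alignment does. In this sketch, "<" requests little-endian order with no padding (comparable to uctypes.LITTLE_ENDIAN) and "@" requests native order, size, and alignment (comparable to uctypes.NATIVE). The numbers in the comments assume a common ABI where a 32-bit unsigned int is 4-byte aligned.

```python
import struct

# A UINT8 followed by a UINT32, like data1/data2 in STRUCT1 (Example 2)
packed_size = struct.calcsize("<BI")  # 1 + 4 bytes, no padding
native_size = struct.calcsize("@BI")  # 3 padding bytes before the uint32
packed_data2_offset = struct.calcsize("<B")
native_data2_offset = native_size - struct.calcsize("@I")

print(packed_size, native_size)                  # 5 8
print(packed_data2_offset, native_data2_offset)  # 1 4
```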
uctypes.sizeof(struct, layout_type=NATIVE, /)
Return the size of a data structure in bytes. The struct argument can be either a structure class or a specific instantiated structure object (or its aggregate field).
uctypes.calc_offsets(ordered_desc, layout_type=NATIVE)
Automatically calculate (and update in place) the offsets of the structure fields represented by ordered_desc, which should be an OrderedDict object. The fields of the descriptor should contain only type information, not offsets, except for the special value PREV_OFFSET, which specifies that the field being defined should have the same offset as the previously defined field. This can be used to encode C unions. E.g. the union:
union my_union {
    uint8_t byte;
    uint16_t word;
    uint32_t dword;
};
should be represented as:
my_union = OrderedDict([
    ("byte", uctypes.UINT8),
    ("word", uctypes.PREV_OFFSET | uctypes.UINT16),
    ("dword", uctypes.PREV_OFFSET | uctypes.UINT32),
])
(Note: the first field of a union should not contain PREV_OFFSET; only the second and following fields should.)
Warning
The ordered_desc structure is updated in place with offsets; this also includes fields for aggregate types, which use tuples for encoding. (In other words, while tuple is an immutable type in Python, calc_offsets() is a special function that changes the tuple structures passed to it.) Due to this, ordered_desc should in almost all cases be an OrderedDict constructed inline (as shown above).
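CPython’s ctypes, which uctypes resembles in spirit, can be used to check what the union above looks like in a C ABI: every member starts at offset 0, and the union is as large as its largest member. This is a CPython sketch for comparison only; ctypes.Union is not available in MicroPython.

```python
import ctypes


class MyUnion(ctypes.Union):
    # Same members as the C union my_union above
    _fields_ = [
        ("byte", ctypes.c_uint8),
        ("word", ctypes.c_uint16),
        ("dword", ctypes.c_uint32),
    ]


member_offsets = [getattr(MyUnion, name).offset for name in ("byte", "word", "dword")]
print(member_offsets)          # [0, 0, 0]
print(ctypes.sizeof(MyUnion))  # 4, the size of the largest member
```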
uctypes.addressof(obj)
Return the address of an object. The argument should be a bytes, bytearray, or other object supporting the buffer protocol (and the address of this buffer is what is actually returned).
uctypes.bytes_at(addr[, size])
Capture memory at the given address and size as a bytes object. If size is not specified, capture a zero-terminated C string (still as a bytes object). As a bytes object is immutable, the memory is actually duplicated and copied into the bytes object, so if the memory contents at the given address change later, the created object retains the original value.
uctypes.bytearray_at(addr, size)
Capture memory at the given address and size as a bytearray object. Unlike the bytes_at() function above, memory is captured by reference, so it can also be written to, and you will access the current value at the given memory address.
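The difference between copying (bytes_at()) and referencing (bytearray_at()) can be reproduced in CPython with ctypes: ctypes.string_at() copies memory (stopping at the first zero byte when no size is given), while an array type created with from_address() views the memory in place. This is an analogy for illustration, not the uctypes implementation.

```python
import ctypes

buf = ctypes.create_string_buffer(b"hello")  # 5 chars plus a terminating zero byte
addr = ctypes.addressof(buf)

snapshot = ctypes.string_at(addr)              # copy up to the zero byte, like bytes_at(addr)
view = (ctypes.c_char * 5).from_address(addr)  # by reference, like bytearray_at(addr, 5)

buf[0] = b"J"    # modify the underlying memory
print(snapshot)  # b'hello': the copy keeps the old contents
print(view.raw)  # b'Jello': the view sees the change
```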
uctypes.UINT8, uctypes.INT8, uctypes.UINT16, uctypes.INT16, uctypes.UINT32, uctypes.INT32, uctypes.UINT64, uctypes.INT64
Integer types for structure descriptors. Constants for 8-, 16-, 32-, and 64-bit types are provided, both signed and unsigned.
uctypes.SHORT, uctypes.USHORT, uctypes.INT, uctypes.UINT, uctypes.LONG, uctypes.ULONG, uctypes.LONGLONG, uctypes.ULONGLONG
Native C data types (implemented as aliases to the corresponding exact-size types).
Availability: Some ports may lack these constants.
uctypes.VOID
VOID is an alias for UINT8, and is provided to conveniently define C’s void pointers: (uctypes.PTR, uctypes.VOID).
uctypes.PTR, uctypes.ARRAY
Type constants for pointers and arrays. Note that there is no explicit constant for structures; it’s implicit: an aggregate type without PTR or ARRAY flags is a structure.
uctypes.PREV_OFFSET
Value which should be used in a struct descriptor passed to calc_offsets() to indicate that the current field should have the same offset as the previous one (i.e. effectively to define a C union).
Structure descriptors and instantiating structure objects
Given a structure descriptor dictionary and its layout type, you can
instantiate a specific structure instance at a given memory address
using the uctypes.struct() constructor. The memory address usually comes from
one of the following sources:
- A predefined address, when accessing hardware registers on a bare-metal system. Look up these addresses in the datasheet for the particular MCU/SoC.
- A return value from a call to some FFI (Foreign Function Interface) function.
- uctypes.addressof(), when you want to pass arguments to an FFI function, or alternatively, to access some data for I/O (for example, data read from a file or network socket).
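For the addressof() case, a close CPython counterpart is overlaying a ctypes structure on an existing buffer with from_buffer(), which shares memory with the buffer instead of copying it, much like uctypes.struct(uctypes.addressof(buf), ...). The Coord class below mirrors the COORD descriptor from Example 2 and is an assumption for illustration only.

```python
import ctypes


class Coord(ctypes.LittleEndianStructure):
    # Mirrors COORD from Example 2: two 32-bit floats at offsets 0 and 4
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float)]


buf = bytearray(ctypes.sizeof(Coord))  # e.g. data read from a file or socket
coord = Coord.from_buffer(buf)         # shares memory with buf, no copy
coord.y = 2.0                          # writes into buf at offset 4
print(bytes(buf))                      # b'\x00\x00\x00\x00\x00\x00\x00@'
```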
Structure objects
Structure objects allow accessing individual fields using standard dot
notation: my_struct.substruct1.field1. If a field is of scalar type,
getting it will produce a primitive value (Python integer or float)
corresponding to the value contained in the field. A scalar field can also
be assigned to.
If a field is an array, its individual elements can be accessed with
the standard subscript operator [], both read and assigned to.
If a field is a pointer, it can be dereferenced using [0] syntax
(corresponding to the C * operator, though [0] works in C too).
Subscripting a pointer with integer values other than 0 is also supported,
with the same semantics as in C.
Summing up, accessing structure fields generally follows the C syntax,
except for pointer dereference, where you need to use the [0] operator
instead of *.
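ctypes pointers in CPython follow the same convention, which makes them a convenient way to try out the [0] dereference and C-style pointer indexing described above. This is a CPython analogy; with uctypes, the pointer would come from a PTR field of a structure object.

```python
import ctypes

values = (ctypes.c_uint16 * 3)(10, 20, 30)
ptr = ctypes.cast(values, ctypes.POINTER(ctypes.c_uint16))

print(ptr[0])        # 10, same as *ptr in C
print(ptr[2])        # 30, same as ptr[2] in C
ptr[1] = 99          # assignment writes through the pointer
print(list(values))  # [10, 99, 30]
```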
Limitations
1. Accessing non-scalar fields leads to allocation of intermediate objects to represent them. This means that special care should be taken to lay out a structure which needs to be accessed when memory allocation is disabled (e.g. from an interrupt). The recommendations are:
- Avoid accessing nested structures. For example, instead of mcu_registers.peripheral_a.register1, define separate layout descriptors for each peripheral, to be accessed as peripheral_a.register1. Or just cache a particular peripheral: peripheral_a = mcu_registers.peripheral_a. If a register consists of multiple bitfields, you would need to cache references to the particular register: reg_a = mcu_registers.peripheral_a.reg_a.
- Avoid other non-scalar data, like arrays. For example, instead of peripheral_a.register[0], use peripheral_a.register0. Again, an alternative is to cache intermediate values, e.g. register0 = peripheral_a.register[0].
2. The range of offsets supported by the uctypes module is limited.
The exact range supported is considered an implementation detail,
and the general suggestion is to split structure definitions to
cover from a few kilobytes to a few dozen kilobytes maximum.
In most cases, this is a natural situation anyway, e.g. it doesn’t make
sense to define all registers of an MCU (spread over a 32-bit address
space) in one structure, but rather peripheral block by peripheral
block. In some extreme cases, you may need to split a structure into
several parts artificially (e.g. if accessing a native data structure
with a multi-megabyte array in the middle, though that would be a very
synthetic case).