Data types
A .ksy file is a YAML document that describes how to parse a binary format. The
core of any description is the seq section, a list of attributes that are read
in order from the stream. Each attribute usually carries a type key that tells
the compiler how many bytes to read and how to interpret them.
This page covers the built-in primitive types: integers, floating-point numbers, raw byte arrays, strings, and bit-sized integers. User-defined types and enums are documented separately.
Type names in Kaitai Struct encode both the kind of value and its width. For
example, u4 is an unsigned 4-byte integer and s2 is a signed 2-byte integer.
The leading letter is the kind; the trailing digit is the size in bytes.
Integers
Kaitai Struct provides fixed-width signed and unsigned integers. The first letter
is u (unsigned) or s (signed); the number is the byte width.
| Type | Signedness | Size (bytes) | Range |
|---|---|---|---|
| u1 | unsigned | 1 | 0 … 255 |
| u2 | unsigned | 2 | 0 … 65 535 |
| u4 | unsigned | 4 | 0 … 4 294 967 295 |
| u8 | unsigned | 8 | 0 … 18 446 744 073 709 551 615 |
| s1 | signed | 1 | −128 … 127 |
| s2 | signed | 2 | −32 768 … 32 767 |
| s4 | signed | 4 | −2 147 483 648 … 2 147 483 647 |
| s8 | signed | 8 | −9 223 372 036 854 775 808 … 9 223 372 036 854 775 807 |
seq:
  - id: record_count
    type: u4 # byte order comes from meta/endian (see Endianness)
  - id: temperature
    type: s2
Endianness
Multi-byte integers need an endianness: big-endian (be, most significant byte
first) or little-endian (le, least significant byte first). There are two ways
to set it.
Set a default for the whole type with meta/endian:
meta:
  id: my_format
  endian: be
seq:
  - id: width
    type: u4 # read as big-endian, per meta/endian
  - id: height
    type: u4
Or append a suffix to a single field to override the default (or to specify
endianness when no meta/endian is set):
seq:
  - id: network_order
    type: u4be # big-endian
  - id: intel_order
    type: u4le # little-endian
u1 and s1 are single-byte values, so endianness does not apply to them and no
suffix is needed. You only need le/be for types that are 2 bytes or wider.
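As a minimal sketch of the rule above, the following fragment mixes single-byte and multi-byte fields with no meta/endian default. The field names are hypothetical and chosen only for illustration.

```yaml
seq:
  - id: version        # hypothetical field: one byte, so no suffix is needed
    type: u1
  - id: offset_delta   # hypothetical field: signed single byte, also no suffix
    type: s1
  - id: num_entries    # two bytes: suffix required because no meta/endian is set
    type: u2le
  - id: checksum       # four bytes, explicitly big-endian
    type: u4be
```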
Floating-point numbers
Floats follow IEEE 754. The digit is the byte width: f4 is single precision
(32-bit) and f8 is double precision (64-bit). Endianness is specified the same
way as for integers — through meta/endian or a per-field le/be suffix.
| Type | Precision | Size (bytes) |
|---|---|---|
| f4 | single (IEEE 754) | 4 |
| f8 | double (IEEE 754) | 8 |
seq:
  - id: scale_factor
    type: f4
  - id: latitude
    type: f8le # little-endian double
Byte arrays
A raw byte array is an attribute with no type key. You must tell the
compiler how many bytes to read; one of the sizing keys below is mandatory.
- size reads a fixed number of bytes, given as a literal or as an expression referencing an earlier field.
- size-eos: true reads all remaining bytes to the end of the stream.
- terminator reads up to (and by default consumes) a delimiter byte.
The contents key is a related but distinct mechanism: it asserts a fixed,
known sequence of bytes. With contents there is no need to specify size —
the length comes naturally from the listed bytes.
seq:
  - id: magic
    contents: [0x89, 0x50, 0x4e, 0x47] # first four bytes of the PNG signature; parsing fails if they do not match
  - id: uuid
    size: 16
  - id: len_payload
    type: u4le # length prefix for the next field
  - id: payload
    size: len_payload # length taken from an earlier field
  - id: trailing
    size-eos: true # everything left in the stream
The contents key accepts a list of byte values (decimal or hex) and may also
contain string literals, which are expanded to their byte values. If the bytes in
the stream do not match, parsing raises a validation error — this is how format
signatures and magic numbers are checked.
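The accepted forms of contents can be sketched side by side as follows. The field names are hypothetical; the byte values in the comments assume the string literals contain only ASCII characters.

```yaml
seq:
  - id: magic_bytes
    contents: [0xca, 0xfe, 0xba, 0xbe]   # four bytes given in hex
  - id: magic_text
    contents: "JFIF"                     # string literal: bytes 4a 46 49 46
  - id: magic_mixed
    contents: ["PK", 3, 4]               # string and numbers mixed: bytes 50 4b 03 04
```

In each case the field length is implied by the listed bytes, so no size key appears.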
size and size-eos are mutually exclusive ways of bounding a field. Use size
when the length is known or computed, and size-eos only for the final field that
should consume whatever remains.
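For comparison, here is a hedged sketch that uses all three sizing keys on untyped byte arrays in one seq. The field names and the choice of `;` as a delimiter are assumptions made for illustration.

```yaml
seq:
  - id: len_header
    type: u2le
  - id: header
    size: len_header     # length computed from the field above
  - id: token
    terminator: 0x3b     # raw bytes up to ';', which is consumed but not kept by default
  - id: rest
    size-eos: true       # final field: whatever remains in the stream
```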
Strings
Strings are byte arrays decoded into text. Use type: str and supply an
encoding (for example ASCII, UTF-8, UTF-16LE, ISO-8859-1). A string
still needs a length, which you provide with the same sizing keys as byte
arrays: size, size-eos, or terminator.
| Key | Meaning |
|---|---|
| encoding | Character encoding used to decode the bytes (required, unless set via meta/encoding). |
| size | Fixed length in bytes. |
| size-eos | Read to the end of the stream. |
| terminator | A single byte value that ends the string. |
| include | Whether the terminator byte is included in the value (default false). |
| consume | Whether the stream position advances past the terminator (default true). |
| eos-error | Whether reaching end-of-stream without finding the terminator is an error (default true). |
Fixed-size string:
seq:
  - id: signature
    type: str
    size: 4
    encoding: ASCII
Terminated string. type: strz is shorthand for a string terminated by a zero
byte (terminator: 0) — the classic C-style null-terminated string:
seq:
  - id: filename
    type: strz
    encoding: UTF-8
A custom terminator, with the terminator-handling keys spelled out (illustrative example):
seq:
  - id: line
    type: str
    encoding: UTF-8
    terminator: 0x0a # newline ends the string
    include: false # do not keep the newline in the value
    consume: true # advance past the newline
    eos-error: false # tolerate a final line with no trailing newline
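The non-default settings of include and consume can be sketched with a hypothetical key=value line format. The field names and the format itself are assumptions made for illustration, not part of any real specification.

```yaml
seq:
  - id: key
    type: str
    encoding: ASCII
    terminator: 0x3d   # '=' ends the key
    consume: false     # leave '=' in the stream for the next field
  - id: separator
    contents: "="      # the '=' byte is read and checked here
  - id: value
    type: str
    encoding: ASCII
    terminator: 0x0a   # newline ends the value
    include: true      # keep the trailing newline in the value
```

Setting consume: false is useful when the delimiter itself should be parsed or validated as a separate field.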
You can combine size and terminator: the field reads exactly size bytes, but
the decoded string stops at the terminator if one appears earlier. This models
fixed-width, null-padded string fields:
seq:
  - id: name
    type: str
    size: 16
    terminator: 0
    encoding: ASCII
To avoid repeating encoding: on every string, set a default once with
meta/encoding. Per-field encoding: still overrides it where needed.
meta:
  id: my_format
  encoding: UTF-8
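To show the default and a per-field override together, here is a minimal sketch with two hypothetical string fields. Only the second one sets its own encoding.

```yaml
meta:
  id: my_format
  encoding: UTF-8
seq:
  - id: title
    type: str
    size: 32               # decoded as UTF-8, the meta/encoding default
  - id: legacy_label
    type: str
    size: 16
    encoding: ISO-8859-1   # per-field override of the default
```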
Bit-sized integers
When a format packs values into fields that are not a whole number of bytes
(flags, small enums, version nibbles), use bN, where N is the number of bits.
For example, b1 reads a single bit (commonly treated as a boolean), b3 reads
three bits, and b13 reads thirteen bits spanning more than one byte.
meta:
  id: my_format
  bit-endian: be
seq:
  - id: version
    type: b4 # high nibble
  - id: header_len
    type: b4 # low nibble
  - id: flag
    type: b1 # single bit
  - id: reserved
    type: b11 # spans into the next byte
Declare the bit order for the whole type with meta/bit-endian:
- With bit-endian: be, bits fill each byte from the most significant bit toward the least significant.
- With bit-endian: le, bits fill from the least significant bit toward the most significant.
You can also override the bit order on an individual field with a be/le
suffix, for example b5le or b3be.
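A hedged sketch of per-field suffixes follows, assuming a compiler version that supports them. The field names are hypothetical. The bit order changes only after the first byte has been fully consumed, so the switch falls on a byte boundary.

```yaml
seq:
  - id: kind
    type: b3be     # top three bits of the first byte
  - id: length
    type: b5be     # remaining five bits of the first byte
  - id: channel
    type: b5le     # low five bits of the second byte
  - id: mode
    type: b3le     # high three bits of the second byte
```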
Bit-sized parsing keeps an internal sub-byte position. When a non-bit type (such
as u2 or a byte array) follows bit fields, the reader aligns back to a byte
boundary before reading it. Keep this in mind when mixing bit fields with
byte-aligned fields in the same seq.
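The alignment behaviour can be sketched with a hypothetical header in which the bit fields use only five of the first byte's eight bits. The field names are assumptions made for illustration.

```yaml
meta:
  id: my_format
  endian: le
  bit-endian: be
seq:
  - id: version
    type: b4      # bits 7..4 of byte 0
  - id: has_extra
    type: b1      # bit 3 of byte 0
  # bits 2..0 of byte 0 are skipped: the next field is byte-aligned
  - id: len_body
    type: u2      # starts at byte 1
  - id: body
    size: len_body
```

If the skipped bits carry meaning in the format, declare them explicitly (for example as a b3 field) so they are not silently dropped.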
Quick reference
| Category | Types | Size | Notes |
|---|---|---|---|
| Unsigned integer | u1, u2, u4, u8 | 1–8 bytes | Append le/be (or set meta/endian) for 2+ byte widths |
| Signed integer | s1, s2, s4, s8 | 1–8 bytes | Append le/be (or set meta/endian) for 2+ byte widths |
| Float | f4, f8 | 4 / 8 bytes | IEEE 754; endianness via suffix or meta/endian |
| Byte array | (no type) | variable | Requires size, size-eos, or terminator (or use contents for fixed bytes) |
| String | str, strz | variable | Needs encoding; bound by size/size-eos/terminator |
| Bit integer | b1 … bN | N bits | Set order with meta/bit-endian (or bNle/bNbe) |