Data types

A .ksy file is a YAML document that describes how to parse a binary format. The core of any description is the seq section, a list of attributes that are read in order from the stream. Each attribute usually carries a type key that tells the compiler how many bytes to read and how to interpret them.

This page covers the built-in primitive types: integers, floating-point numbers, raw byte arrays, strings, and bit-sized integers. User-defined types and enums are documented separately.

note

Type names in Kaitai Struct encode both the kind of value and its width. For example, u4 is an unsigned 4-byte integer and s2 is a signed 2-byte integer. The leading letter is the kind; the trailing digit is the size in bytes.

Integers

Kaitai Struct provides fixed-width signed and unsigned integers. The first letter is u (unsigned) or s (signed); the number is the byte width.

Type	Signedness	Size (bytes)	Range
`u1`	unsigned	1	0 … 255
`u2`	unsigned	2	0 … 65 535
`u4`	unsigned	4	0 … 4 294 967 295
`u8`	unsigned	8	0 … 18 446 744 073 709 551 615
`s1`	signed	1	−128 … 127
`s2`	signed	2	−32 768 … 32 767
`s4`	signed	4	−2 147 483 648 … 2 147 483 647
`s8`	signed	8	−9 223 372 036 854 775 808 … 9 223 372 036 854 775 807

seq:
  - id: record_count
    type: u4   # u4 and s2 are multi-byte, so they need an endianness (see Endianness below)
  - id: temperature
    type: s2
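The ranges in the table follow from the width. An N-byte unsigned integer spans 0 … 2^(8N) − 1, and a signed (two's complement) one spans −2^(8N−1) … 2^(8N−1) − 1. The Python sketch below uses only the standard library and is not Kaitai Struct's generated code. It recomputes those bounds with a hypothetical helper, int_range, and reads the same two bytes as u2 and as s2, assuming big-endian order for the demonstration.

```python
from typing import Tuple


def int_range(width: int, signed: bool) -> Tuple[int, int]:
    """Return (min, max) for a width-byte integer, as in the uN/sN table."""
    bits = 8 * width
    if signed:
        # Two's complement: the top bit carries the sign.
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return 0, (1 << bits) - 1


# The same two bytes interpreted as u2 and as s2 (big-endian assumed here).
raw = bytes([0xFF, 0xFE])
as_u2 = int.from_bytes(raw, "big", signed=False)
as_s2 = int.from_bytes(raw, "big", signed=True)

if __name__ == "__main__":
    for width in (1, 2, 4, 8):
        print(f"u{width}", int_range(width, False), f"s{width}", int_range(width, True))
    print(as_u2, as_s2)
```

Only the interpretation differs between u2 and s2: the bytes read from the stream are identical.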

Endianness

Multi-byte integers need an endianness: big-endian (be, most significant byte first) or little-endian (le, least significant byte first). There are two ways to set it; if neither is given, the compiler rejects the multi-byte type.

Set a default for the whole type with meta/endian:

meta:
  id: my_format
  endian: be
seq:
  - id: width
    type: u4   # read as big-endian, per meta/endian
  - id: height
    type: u4

Or append a suffix to a single field to override the default (or to specify endianness when no meta/endian is set):

seq:
  - id: network_order
    type: u4be   # big-endian
  - id: intel_order
    type: u4le   # little-endian
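To make the precedence concrete, here is a hypothetical Python helper. The names unpack_primitive and STRUCT_CODES are illustrative and are not part of Kaitai Struct. It resolves a type name the way this section describes: an explicit be/le suffix wins, otherwise a default standing in for meta/endian applies, and a missing endianness is an error only for types wider than one byte. Decoding uses the standard struct module.

```python
import struct
from typing import Optional

# Hypothetical mapping from the primitive names on this page to struct codes.
STRUCT_CODES = {
    "u1": "B", "u2": "H", "u4": "I", "u8": "Q",
    "s1": "b", "s2": "h", "s4": "i", "s8": "q",
    "f4": "f", "f8": "d",
}


def unpack_primitive(type_name: str, data: bytes, default_endian: Optional[str] = None):
    """Decode one value; a be/le suffix overrides default_endian (the meta/endian stand-in)."""
    base, endian = type_name, default_endian
    if type_name[-2:] in ("be", "le"):
        base, endian = type_name[:-2], type_name[-2:]
    code = STRUCT_CODES[base]
    size = struct.calcsize("<" + code)  # standard sizes: 1, 2, 4 or 8 bytes
    if size > 1 and endian is None:
        raise ValueError(f"{type_name}: no endianness (suffix or meta/endian)")
    prefix = ">" if endian == "be" else "<"  # irrelevant for 1-byte types
    (value,) = struct.unpack(prefix + code, data[:size])
    return value


if __name__ == "__main__":
    data = bytes([0x00, 0x00, 0x01, 0x02])
    print(unpack_primitive("u4be", data))                       # network order
    print(unpack_primitive("u4le", data))                       # Intel order
    print(unpack_primitive("u4", data, default_endian="be"))    # meta/endian: be
```

The same four bytes give 258 as u4be and 0x02010000 as u4le, which is why every multi-byte field needs one of the two mechanisms.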

tip

u1 and s1 are single-byte values, so endianness does not apply to them and no suffix is needed. You only need le/be for types that are 2 bytes or wider.

Floating-point numbers

Floats follow IEEE 754. The digit is the byte width: f4 is single precision (32-bit) and f8 is double precision (64-bit). Endianness is specified the same way as for integers — through meta/endian or a per-field le/be suffix.

Type	Precision	Size (bytes)
`f4`	single (IEEE 754)	4
`f8`	double (IEEE 754)	8

seq:
  - id: scale_factor
    type: f4     # endianness from meta/endian (not shown)
  - id: latitude
    type: f8le   # little-endian double
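A short Python illustration using the standard struct module (not Kaitai's runtime) shows two points from this section: the be/le suffix only changes byte order, and f4 keeps fewer significant bits than f8.

```python
import struct

# 1.0 as an IEEE 754 single: sign 0, biased exponent 127, mantissa 0.
one_f4be = struct.pack(">f", 1.0)  # byte layout an f4be field would hold
one_f4le = struct.pack("<f", 1.0)  # byte layout an f4le field would hold

# Round-tripping 0.1 through single precision loses bits; double keeps it exactly.
via_f4 = struct.unpack(">f", struct.pack(">f", 0.1))[0]
via_f8 = struct.unpack(">d", struct.pack(">d", 0.1))[0]

if __name__ == "__main__":
    print(one_f4be.hex(), one_f4le.hex())
    print(via_f4, via_f8)
```

Python floats are doubles, so an f4 value is widened on read and differs slightly from the decimal literal that produced it.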

Byte arrays

A raw byte array is an attribute with no type key. You must tell the compiler how many bytes to read; one of the sizing keys below is mandatory.

size — a fixed number, or an expression referencing an earlier field.
size-eos: true — read all remaining bytes to the end of the stream.
terminator — read up to (and by default consuming) a delimiter byte.

The contents key is a related but distinct mechanism: it asserts a fixed, known sequence of bytes. With contents there is no need to specify size — the length comes naturally from the listed bytes.

seq:
  - id: magic
    contents: [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]   # 8-byte PNG signature; parsing fails if it does not match
  - id: uuid
    size: 16
  - id: len_payload
    type: u4le                           # length prefix read before the payload
  - id: payload
    size: len_payload                    # length taken from the earlier field
  - id: trailing
    size-eos: true                       # everything left in the stream

The contents key accepts a list of byte values (decimal or hex) and may also contain string literals, which are expanded to their byte values. If the bytes in the stream do not match, parsing raises a validation error — this is how format signatures and magic numbers are checked.
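The Python sketch below models size, size-eos and contents on an in-memory buffer. MiniStream and ValidationError are hypothetical names chosen for illustration; they are not the Kaitai Struct runtime API. The buffer is made up: the full 8-byte PNG signature, a 16-byte uuid, a 4-byte little-endian length, that many payload bytes, then trailing data.

```python
class ValidationError(Exception):
    """Raised when fixed contents do not match (a stand-in, not Kaitai's own error)."""


class MiniStream:
    """Hypothetical byte reader illustrating size, size-eos and contents."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def read_size(self, n: int) -> bytes:
        # size: exactly n bytes, or fail if the stream is too short.
        if self.pos + n > len(self.data):
            raise EOFError(f"wanted {n} bytes at offset {self.pos}")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def read_size_eos(self) -> bytes:
        # size-eos: everything from the current position to the end.
        chunk = self.data[self.pos:]
        self.pos = len(self.data)
        return chunk

    def expect_contents(self, expected: bytes) -> bytes:
        # contents: the length comes from the expected bytes themselves.
        actual = self.read_size(len(expected))
        if actual != expected:
            raise ValidationError(f"expected {expected.hex()}, got {actual.hex()}")
        return actual


PNG_MAGIC = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

buf = PNG_MAGIC + bytes(range(16)) + (3).to_bytes(4, "little") + b"abc" + b"rest"
stream = MiniStream(buf)
magic = stream.expect_contents(PNG_MAGIC)
uuid = stream.read_size(16)
len_payload = int.from_bytes(stream.read_size(4), "little")
payload = stream.read_size(len_payload)  # size computed from an earlier field
trailing = stream.read_size_eos()
```

Feeding the same reader a buffer that starts with a different signature makes expect_contents raise, which is the behavior contents is meant to guarantee.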

info

size and size-eos are mutually exclusive ways of bounding a field. Use size when the length is known or computed, and size-eos only for the final field that should consume whatever remains.

Strings

Strings are byte arrays decoded into text. Use type: str and supply an encoding (for example ASCII, UTF-8, UTF-16LE, ISO-8859-1). A string still needs a length, which you provide with the same sizing keys as byte arrays: size, size-eos, or terminator.

Key	Meaning
`encoding`	Character encoding used to decode the bytes (required, unless set via `meta/encoding`).
`size`	Fixed length in bytes.
`size-eos`	Read to the end of the stream.
`terminator`	A single byte value that ends the string.
`include`	Whether the terminator byte is included in the value (default `false`).
`consume`	Whether the stream position advances past the terminator (default `true`).
`eos-error`	Whether reaching end-of-stream without finding the terminator is an error (default `true`).

Fixed-size string:

seq:
  - id: signature
    type: str
    size: 4
    encoding: ASCII
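Because size is counted in bytes, the number of characters you get back depends on the encoding. The hypothetical read_fixed_str helper below (plain Python, not Kaitai's runtime) decodes exactly size bytes; note that Python spells the encodings as "ascii" and "utf-16-le".

```python
def read_fixed_str(data: bytes, offset: int, size: int, encoding: str) -> str:
    """Sketch of type: str + size: decode exactly `size` bytes starting at `offset`."""
    raw = data[offset:offset + size]
    if len(raw) != size:
        raise EOFError(f"need {size} bytes at offset {offset}")
    return raw.decode(encoding)


buf = b"RIFF" + "AB".encode("utf-16-le")
signature = read_fixed_str(buf, 0, 4, "ascii")  # 4 bytes -> 4 ASCII characters
wide = read_fixed_str(buf, 4, 4, "utf-16-le")   # 4 bytes -> only 2 UTF-16 characters
```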

Terminated string. type: strz is shorthand for a string terminated by a zero byte (terminator: 0) — the classic C-style null-terminated string:

seq:
  - id: filename
    type: strz
    encoding: UTF-8

A custom terminator, with the terminator-handling keys spelled out (illustrative example):

seq:
  - id: line
    type: str
    encoding: UTF-8
    terminator: 0x0a   # newline ends the string
    include: false     # do not keep the newline in the value
    consume: true      # advance past the newline
    eos-error: false   # tolerate a final line with no trailing newline
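The three terminator-handling keys can be modeled in a few lines. The hypothetical read_terminated function below (a plain-Python sketch, not Kaitai's runtime) returns the value bytes and the new position; decoding with the chosen encoding would happen afterwards. strz corresponds to calling it with a terminator of 0 and the defaults.

```python
from typing import Tuple


def read_terminated(data: bytes, pos: int, term: int, include: bool = False,
                    consume: bool = True, eos_error: bool = True) -> Tuple[bytes, int]:
    """Return (value_bytes, new_pos), mirroring the include/consume/eos-error keys."""
    end = data.find(bytes([term]), pos)
    if end == -1:
        if eos_error:
            raise EOFError(f"terminator {term:#04x} not found")
        return data[pos:], len(data)  # eos-error: false tolerates a missing terminator
    value = data[pos:end + 1] if include else data[pos:end]
    new_pos = end + 1 if consume else end  # consume: false leaves the terminator unread
    return value, new_pos


text = b"first\nsecond"  # the last line has no trailing newline
line1, pos = read_terminated(text, 0, 0x0A, eos_error=False)
line2, pos = read_terminated(text, pos, 0x0A, eos_error=False)
```

With the default eos_error=True, the second call would raise instead of returning the unterminated last line.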

You can combine size and terminator: the field reads exactly size bytes, but the decoded string stops at the terminator if one appears earlier. This models fixed-width, null-padded string fields:

seq:
  - id: name
    type: str
    size: 16
    terminator: 0
    encoding: ASCII
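A plain-Python sketch of the size-plus-terminator combination follows; read_padded_str is a hypothetical name. It always advances by the full field width, then keeps only the text before the first terminator, so padding never leaks into the value or shifts the next field.

```python
from typing import Tuple


def read_padded_str(data: bytes, offset: int, size: int, term: int,
                    encoding: str) -> Tuple[str, int]:
    """Sketch of size + terminator: consume `size` bytes, keep text before `term`."""
    field = data[offset:offset + size]
    if len(field) != size:
        raise EOFError(f"need {size} bytes at offset {offset}")
    cut = field.find(bytes([term]))
    if cut != -1:
        field = field[:cut]  # stop at the first terminator inside the field
    return field.decode(encoding), offset + size  # position moves past all `size` bytes


record = b"README.TXT".ljust(16, b"\x00") + b"\x2a\x00"  # 16-byte name, then another field
name, next_offset = read_padded_str(record, 0, 16, 0, "ascii")
```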

tip

To avoid repeating the encoding key on every string, set a default once with meta/encoding. A per-field encoding still overrides it where needed.

meta:
  id: my_format
  encoding: UTF-8

Bit-sized integers

When a format packs values at bit granularity rather than in whole bytes (flags, small enums, version nibbles), use bN, where N is the number of bits. For example, b1 reads a single bit, which Kaitai Struct exposes as a boolean; b3 reads three bits; and b13 reads thirteen bits spanning more than one byte.

meta:
  id: my_format
  bit-endian: be
seq:
  - id: version
    type: b4    # high nibble
  - id: header_len
    type: b4    # low nibble
  - id: flag
    type: b1    # single bit
  - id: reserved
    type: b11   # spans into the next byte

Declare the bit order for the whole type with meta/bit-endian (be is the default when it is omitted):

bit-endian: be — bits fill each byte from the most significant bit toward the least significant.
bit-endian: le — bits fill from the least significant bit toward the most significant.

You can also override the bit order on an individual field with a be/le suffix, for example b5le or b3be.
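The two bit orders can be illustrated with a small hypothetical BitReader written in plain Python (it is not Kaitai's runtime). The sample bytes 0x45 0xA0 0x01 are made up to fit the version/header_len/flag/reserved layout shown earlier in this section.

```python
class BitReader:
    """Hypothetical bit reader illustrating bit-endian be and le."""

    def __init__(self, data: bytes):
        self.data = data
        self.bit_pos = 0  # bits consumed from the start of data

    def _next_bit(self, msb_first: bool) -> int:
        byte = self.data[self.bit_pos // 8]
        offset = self.bit_pos % 8
        shift = 7 - offset if msb_first else offset
        self.bit_pos += 1
        return (byte >> shift) & 1

    def read_bits_be(self, n: int) -> int:
        # be: walk each byte from its most significant bit; first bit read is the value's MSB.
        value = 0
        for _ in range(n):
            value = (value << 1) | self._next_bit(msb_first=True)
        return value

    def read_bits_le(self, n: int) -> int:
        # le: walk each byte from its least significant bit; first bit read is the value's LSB.
        value = 0
        for i in range(n):
            value |= self._next_bit(msb_first=False) << i
        return value


if __name__ == "__main__":
    r = BitReader(bytes([0x45, 0xA0, 0x01]))
    version, header_len = r.read_bits_be(4), r.read_bits_be(4)
    flag = bool(r.read_bits_be(1))  # b1 as a boolean
    reserved = r.read_bits_be(11)   # spans into the third byte
    print(version, header_len, flag, reserved, r.bit_pos)
```

Reading the single byte 0x45 as two b4 fields gives 4 then 5 under be, but 5 then 4 under le, because le starts from the low nibble.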

note

Bit-sized parsing keeps an internal sub-byte position. When a non-bit type (such as u2 or a byte array) follows bit fields, the reader aligns back to a byte boundary before reading it. Keep this in mind when mixing bit fields with byte-aligned fields in the same seq.
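The alignment rule is simple arithmetic: round the number of consumed bits up to the next multiple of 8 and discard the leftover bits. This plain-Python sketch uses a hypothetical helper, next_byte_offset, and the bit widths from the example earlier in this section.

```python
def next_byte_offset(bits_consumed: int) -> int:
    """Byte offset where a byte-aligned field starts after `bits_consumed` bits (rounds up)."""
    return (bits_consumed + 7) // 8


# The bit fields from the example above: b4 + b4 + b1 + b11.
bits_used = 4 + 4 + 1 + 11
u2_offset = next_byte_offset(bits_used)  # a following u2 would start at this byte
unused_bits = 8 * u2_offset - bits_used  # bits skipped by the realignment
```

Here 20 bits have been read, so a following u2 starts at byte 3 and the last 4 bits of byte 2 are never parsed; declaring them explicitly (for example as a b4 field) makes that visible in the spec.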

Quick reference

Category	Types	Size	Notes
Unsigned integer	`u1`, `u2`, `u4`, `u8`	1–8 bytes	Append `le`/`be` for 2+ byte widths
Signed integer	`s1`, `s2`, `s4`, `s8`	1–8 bytes	Append `le`/`be` for 2+ byte widths
Float	`f4`, `f8`	4 / 8 bytes	IEEE 754; endianness via suffix or `meta/endian`
Byte array	(no `type`)	variable	Requires `size`, `size-eos`, or `terminator` (or use `contents` for fixed bytes)
String	`str`, `strz`	variable	Needs `encoding`; bound by `size`/`size-eos`/`terminator`
Bit integer	`b1` … `bN`	N bits	Set order with `meta/bit-endian` (or `bNle`/`bNbe`)
