
Data types

A .ksy file is a YAML document that describes how to parse a binary format. The core of any description is the seq section, a list of attributes that are read in order from the stream. Each attribute usually carries a type key that tells the compiler how many bytes to read and how to interpret them.

This page covers the built-in primitive types: integers, floating-point numbers, raw byte arrays, strings, and bit-sized integers. User-defined types and enums are documented separately.

note

Type names in Kaitai Struct encode both the kind of value and its width. For example, u4 is an unsigned 4-byte integer and s2 is a signed 2-byte integer. For integer and floating-point types, the leading letter is the kind and the trailing digit is the size in bytes; bit-sized types (bN) count bits instead.

Integers

Kaitai Struct provides fixed-width signed and unsigned integers. The first letter is u (unsigned) or s (signed); the number is the byte width.

| Type | Signedness | Size (bytes) | Range |
| --- | --- | --- | --- |
| u1 | unsigned | 1 | 0 … 255 |
| u2 | unsigned | 2 | 0 … 65 535 |
| u4 | unsigned | 4 | 0 … 4 294 967 295 |
| u8 | unsigned | 8 | 0 … 18 446 744 073 709 551 615 |
| s1 | signed | 1 | −128 … 127 |
| s2 | signed | 2 | −32 768 … 32 767 |
| s4 | signed | 4 | −2 147 483 648 … 2 147 483 647 |
| s8 | signed | 8 | −9 223 372 036 854 775 808 … 9 223 372 036 854 775 807 |

seq:
  - id: record_count
    type: u4
  - id: temperature
    type: s2
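
The ranges in the table follow directly from the byte width. As a quick check, here is a minimal Python sketch that derives them from a type name. It is not part of Kaitai Struct, and the helper name int_range is made up for illustration.

```python
def int_range(type_name: str) -> tuple[int, int]:
    """Return (min, max) for an integer type name such as 'u2' or 's4'.

    Hypothetical helper for illustration only; not a Kaitai Struct API.
    """
    kind, width = type_name[0], int(type_name[1:])
    bits = 8 * width
    if kind == "u":
        # Unsigned: all bits carry magnitude.
        return 0, 2**bits - 1
    if kind == "s":
        # Signed (two's complement): one bit is spent on the sign.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    raise ValueError(f"not an integer type: {type_name}")


for name in ("u1", "s1", "u2", "s2"):
    print(name, int_range(name))
```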

Endianness

Multi-byte integers need an endianness: big-endian (be, most significant byte first) or little-endian (le, least significant byte first). There are two ways to set it.

Set a default for the whole type with meta/endian:

meta:
  id: my_format
  endian: be
seq:
  - id: width
    type: u4 # read as big-endian, per meta/endian
  - id: height
    type: u4

Or append a suffix to a single field to override the default (or to specify endianness when no meta/endian is set):

seq:
  - id: network_order
    type: u4be # big-endian
  - id: intel_order
    type: u4le # little-endian

tip

u1 and s1 are single-byte values, so endianness does not apply to them and no suffix is needed. You only need le/be for types that are 2 bytes or wider.
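
To see why the byte order matters, the following Python sketch decodes the same four bytes both ways using the standard struct module. This models what u4be and u4le mean at the byte level; it is not how the Kaitai Struct runtime is invoked.

```python
import struct

raw = bytes([0x00, 0x00, 0x01, 0x00])

(as_be,) = struct.unpack(">I", raw)  # same interpretation as u4be
(as_le,) = struct.unpack("<I", raw)  # same interpretation as u4le

print(as_be)  # 256
print(as_le)  # 65536

# A single byte has no byte order, so u1/s1 need no suffix.
(as_u1,) = struct.unpack("B", b"\xff")
(as_s1,) = struct.unpack("b", b"\xff")
print(as_u1, as_s1)  # 255 -1
```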

Floating-point numbers

Floats follow IEEE 754. The digit is the byte width: f4 is single precision (32-bit) and f8 is double precision (64-bit). Endianness is specified the same way as for integers — through meta/endian or a per-field le/be suffix.

| Type | Precision | Size (bytes) |
| --- | --- | --- |
| f4 | single (IEEE 754) | 4 |
| f8 | double (IEEE 754) | 8 |

seq:
  - id: scale_factor
    type: f4
  - id: latitude
    type: f8le # little-endian double
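
As a rough illustration of the two float widths, this Python sketch decodes hand-built IEEE 754 byte sequences with the standard struct module. The sample bytes are chosen for the example and do not come from any real format.

```python
import struct

f4_be = bytes.fromhex("3fc00000")          # 1.5 as a big-endian single
f8_le = bytes.fromhex("00000000000000c0")  # -2.0 as a little-endian double

(scale_factor,) = struct.unpack(">f", f4_be)  # same interpretation as f4be
(latitude,) = struct.unpack("<d", f8_le)      # same interpretation as f8le

print(scale_factor)  # 1.5
print(latitude)      # -2.0
```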

Byte arrays

A raw byte array is an attribute with no type key. You must tell the compiler how many bytes to read; one of the sizing keys below is mandatory.

  • size — a fixed number, or an expression referencing an earlier field.
  • size-eos: true — read all remaining bytes to the end of the stream.
  • terminator — read up to (and by default consuming) a delimiter byte.

The contents key is a related but distinct mechanism: it asserts a fixed, known sequence of bytes. With contents there is no need to specify size — the length comes naturally from the listed bytes.

seq:
  - id: magic
    contents: [0x89, 0x50, 0x4e, 0x47] # first four bytes of the PNG signature; parsing fails on a mismatch
  - id: uuid
    size: 16
  - id: len_payload
    type: u4le # length prefix for the next field
  - id: payload
    size: len_payload # length taken from the earlier field
  - id: trailing
    size-eos: true # everything left in the stream

The contents key accepts a list of byte values (decimal or hex) and may also contain string literals, which are expanded to their byte values. If the bytes in the stream do not match, parsing raises a validation error — this is how format signatures and magic numbers are checked.

info

size and size-eos are mutually exclusive ways of bounding a field. Use size when the length is known or computed, and size-eos only for the final field that should consume whatever remains.
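
The sizing keys can be pictured as three different read patterns. The Python sketch below mimics contents, a computed size, and size-eos on an in-memory stream. The layout (a 4-byte magic, a 1-byte length prefix, a payload, then trailing bytes) and the names parse and read_exact are assumptions made for this example, not anything defined by Kaitai Struct.

```python
import io

MAGIC = bytes([0x89, 0x50, 0x4E, 0x47])


def read_exact(stream: io.BytesIO, n: int) -> bytes:
    """Read exactly n bytes or fail, like a field with a fixed size."""
    chunk = stream.read(n)
    if len(chunk) != n:
        raise EOFError(f"wanted {n} bytes, got {len(chunk)}")
    return chunk


def parse(data: bytes) -> dict:
    stream = io.BytesIO(data)

    # contents: the length comes from the expected bytes, and a mismatch is an error.
    magic = read_exact(stream, len(MAGIC))
    if magic != MAGIC:
        raise ValueError(f"unexpected magic: {magic!r}")

    # size: <expression> - the length is taken from an earlier field.
    len_payload = read_exact(stream, 1)[0]  # hypothetical 1-byte length prefix
    payload = read_exact(stream, len_payload)

    # size-eos: true - consume whatever is left.
    trailing = stream.read()

    return {"payload": payload, "trailing": trailing}


print(parse(MAGIC + b"\x03abc" + b"rest"))
```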

Strings

Strings are byte arrays decoded into text. Use type: str and supply an encoding (for example ASCII, UTF-8, UTF-16LE, ISO-8859-1). A string still needs a length, which you provide with the same sizing keys as byte arrays: size, size-eos, or terminator.

| Key | Meaning |
| --- | --- |
| encoding | Character encoding used to decode the bytes (required, unless set via meta/encoding). |
| size | Fixed length in bytes. |
| size-eos | Read to the end of the stream. |
| terminator | A single byte value that ends the string. |
| include | Whether the terminator byte is included in the value (default false). |
| consume | Whether the stream position advances past the terminator (default true). |
| eos-error | Whether reaching end-of-stream without finding the terminator is an error (default true). |

Fixed-size string:

seq:
  - id: signature
    type: str
    size: 4
    encoding: ASCII

Terminated string. type: strz is shorthand for a string terminated by a zero byte (terminator: 0) — the classic C-style null-terminated string:

seq:
  - id: filename
    type: strz
    encoding: UTF-8

A custom terminator, with the terminator-handling keys (include, consume, eos-error) spelled out (illustrative example):

seq:
  - id: line
    type: str
    encoding: UTF-8
    terminator: 0x0a # newline ends the string
    include: false # do not keep the newline in the value
    consume: true # advance past the newline
    eos-error: false # tolerate a final line with no trailing newline
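
The interaction of include, consume, and eos-error is easier to follow in code. The Python function below is a simplified model of terminated reading, written for this page; the name read_terminated and its signature are illustrative and are not the Kaitai Struct runtime API.

```python
import io


def read_terminated(stream, terminator, include=False, consume=True, eos_error=True):
    """Read bytes up to a terminator byte (simplified model, not the real runtime)."""
    out = bytearray()
    while True:
        b = stream.read(1)
        if not b:
            # End of stream reached without seeing the terminator.
            if eos_error:
                raise EOFError("terminator not found before end of stream")
            return bytes(out)
        if b[0] == terminator:
            if include:
                out += b  # keep the terminator in the value
            if not consume:
                stream.seek(-1, io.SEEK_CUR)  # leave the terminator unread
            return bytes(out)
        out += b


stream = io.BytesIO(b"first\nsecond")
line1 = read_terminated(stream, 0x0A, eos_error=False).decode("utf-8")
line2 = read_terminated(stream, 0x0A, eos_error=False).decode("utf-8")
print(line1, line2)  # first second
```

With eos_error left at its default of True, the second call would raise instead of returning the unterminated final line.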

You can combine size and terminator: the field reads exactly size bytes, but the decoded string stops at the terminator if one appears earlier. This models fixed-width, null-padded string fields:

seq:
  - id: name
    type: str
    size: 16
    terminator: 0
    encoding: ASCII

tip

To avoid repeating encoding: on every string, set a default once with meta/encoding. Per-field encoding: still overrides it where needed.

meta:
  id: my_format
  encoding: UTF-8

Bit-sized integers

When a format packs values into fields that are not a whole number of bytes (flags, small enums, version nibbles), use bN, where N is the number of bits. For example, b1 reads a single bit (which Kaitai Struct exposes as a boolean), b3 reads three bits, and b13 reads thirteen bits spanning more than one byte.

meta:
  id: my_format
  bit-endian: be
seq:
  - id: version
    type: b4 # high nibble
  - id: header_len
    type: b4 # low nibble
  - id: flag
    type: b1 # single bit
  - id: reserved
    type: b11 # spans into the next byte

Declare the order in which bits are taken from each byte with meta/bit-endian:

  • bit-endian: be — bits fill each byte from the most significant bit toward the least significant.
  • bit-endian: le — bits fill from the least significant bit toward the most significant.

You can also override the bit order on an individual field with a be/le suffix, for example b5le or b3be.
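
The difference between the two bit orders can be modeled by treating the bytes as one large integer and peeling fields off either the top (be) or the bottom (le). The Python sketch below is a simplified model under that assumption, with hypothetical helper names; it ignores byte alignment and is not the Kaitai Struct runtime.

```python
def read_bits_be(data: bytes, widths: list[int]) -> list[int]:
    """Big-endian bit order: fields start at the most significant bit."""
    total = len(data) * 8
    value = int.from_bytes(data, "big")
    out, pos = [], 0
    for w in widths:
        shift = total - pos - w
        out.append((value >> shift) & ((1 << w) - 1))
        pos += w
    return out


def read_bits_le(data: bytes, widths: list[int]) -> list[int]:
    """Little-endian bit order: fields start at the least significant bit."""
    value = int.from_bytes(data, "little")
    out, pos = [], 0
    for w in widths:
        out.append((value >> pos) & ((1 << w) - 1))
        pos += w
    return out


# Widths mirror a b4, b4, b1, b11 sequence; the sample bytes are made up.
print(read_bits_be(bytes([0x45, 0x80, 0x10]), [4, 4, 1, 11]))  # [4, 5, 1, 1]

# The same byte split into two nibbles under each bit order.
print(read_bits_be(bytes([0xB4]), [4, 4]))  # [11, 4]
print(read_bits_le(bytes([0xB4]), [4, 4]))  # [4, 11]
```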

note

Bit-sized parsing keeps an internal sub-byte position. When a non-bit type (such as u2 or a byte array) follows bit fields, the reader aligns back to a byte boundary before reading it. Keep this in mind when mixing bit fields with byte-aligned fields in the same seq.

Quick reference

| Category | Types | Size | Notes |
| --- | --- | --- | --- |
| Unsigned integer | u1, u2, u4, u8 | 1–8 bytes | Append le/be for 2+ byte widths |
| Signed integer | s1, s2, s4, s8 | 1–8 bytes | Append le/be for 2+ byte widths |
| Float | f4, f8 | 4 / 8 bytes | IEEE 754; endianness via suffix or meta/endian |
| Byte array | (no type) | variable | Requires size, size-eos, or terminator (or use contents for fixed bytes) |
| String | str, strz | variable | Needs encoding; bound by size/size-eos/terminator |
| Bit integer | b1 … bN | N bits | Set order with meta/bit-endian (or bNle/bNbe) |
