
Attributes & instances

A Kaitai Struct format description (.ksy) is a YAML document. Two of its sections describe where the data is and how to read it:

  • seq — a list of attributes parsed in order, one after another, starting from the current stream position.
  • instances — named values that are not part of the sequential read order. They are either computed from other fields or parsed from an explicit position in the stream.

This page covers the attribute keys you use most often inside seq, and the two kinds of instances.

Note: Everything here is compiled by ksc (the kaitai-struct-compiler) into a parser in your target language: C++, C#, Go, Java, JavaScript, Lua, Nim, Perl, PHP, Python, Ruby, or Rust. The .ksy file is the single source of truth; you write it once and generate a parser for each language you need.

Sequence attributes (seq)

Each entry in seq is an attribute spec — a mapping of keys that tell the compiler how to read one field. The most common keys are below.

| Key | Purpose |
| --- | --- |
| id | Names the attribute so you can reference it in expressions and in generated code. |
| type | The data type to read (built-in like u4/str, or a user-defined type). |
| size | Number of bytes to read. A constant or an expression over earlier fields. |
| repeat | Repeats the attribute: eos, expr (with repeat-expr), or until (with repeat-until). |
| if | A boolean expression; the attribute is only parsed when it evaluates to true. |
| contents | A fixed byte sequence the parser asserts must be present (used for magic signatures). |
| enum | Maps the parsed integer to named constants declared under enums. |
| encoding | Text encoding used to decode a str field (e.g. UTF-8, ASCII). |

id and type

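id is the field name; type selects how the bytes are interpreted. Built-in integer types are u1/u2/u4/u8 (unsigned) and s1/s2/s4/s8 (signed); floats are f4/f8; text is str/strz. A type may also name a user-defined type declared under types. Multi-byte numeric types need a byte order: either declare a default with endian under meta, or append le/be to the type (for example u4le or u2be). The examples on this page omit the suffix, which presumes a default endian is declared under meta.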

seq:
  - id: version
    type: u2
  - id: flags
    type: u4
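
To make the mapping from spec to bytes concrete, here is a minimal hand-written Python sketch of what reading these two fields amounts to. It is not ksc output: it uses only the standard struct module, the function name read_header is hypothetical, and little-endian byte order is an assumption.

```python
import io
import struct


def read_header(stream):
    """Read `version` (u2) then `flags` (u4), in seq order."""
    # Assumption: little-endian, as if the spec declared `endian: le` under meta.
    (version,) = struct.unpack("<H", stream.read(2))
    (flags,) = struct.unpack("<I", stream.read(4))
    return {"version": version, "flags": flags}


header = read_header(io.BytesIO(bytes([0x02, 0x00, 0x01, 0x00, 0x00, 0x80])))
```

Each attribute consumes its bytes and leaves the stream positioned at the start of the next one, which is what "parsed in order" means for seq.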

contents — fixed signatures

contents reads a fixed byte sequence and fails if the bytes do not match. It is the idiomatic way to check a file's magic.

seq:
  - id: magic
    contents: [0xca, 0xfe, 0xba, 0xbe]
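
The check that contents describes can be sketched by hand in a few lines of Python. This is an illustration of the behaviour, not the generated code; read_magic and the choice of ValueError are assumptions made for the example.

```python
import io

MAGIC = bytes([0xCA, 0xFE, 0xBA, 0xBE])


def read_magic(stream):
    """Consume len(MAGIC) bytes and fail if they differ from the expected signature."""
    actual = stream.read(len(MAGIC))
    if actual != MAGIC:
        # Hypothetical error type; a generated parser raises its own validation error.
        raise ValueError(f"bad magic: expected {MAGIC.hex()}, got {actual.hex()}")
    return actual


magic = read_magic(io.BytesIO(b"\xca\xfe\xba\xbe" + b"rest of file"))
```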

size — explicit byte length

size sets how many bytes the attribute occupies. It can be a constant or an expression that refers to a field read earlier in the same seq.

seq:
  - id: name_len
    type: u4
  - id: name
    type: str
    size: name_len
    encoding: UTF-8

When you put size on an attribute whose type is a user-defined type, Kaitai Struct creates a substream limited to those bytes — the inner type can only read within them.
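
The two behaviours described above, a length taken from an earlier field and a substream that confines an inner type, can be sketched in plain Python. The helper names read_name and substream are hypothetical, and little-endian byte order is assumed.

```python
import io
import struct


def read_name(stream):
    """Read `name_len` (u4), then exactly that many bytes decoded as UTF-8."""
    (name_len,) = struct.unpack("<I", stream.read(4))  # assumption: little-endian
    raw = stream.read(name_len)
    if len(raw) != name_len:
        raise EOFError("stream ended before `name` was complete")
    return raw.decode("utf-8")


def substream(stream, size):
    """Carve out `size` bytes as an independent stream for an inner type."""
    return io.BytesIO(stream.read(size))


data = io.BytesIO(struct.pack("<I", 5) + b"hello" + b"ABCDEFGH")
name = read_name(data)
inner = substream(data, 4).read()  # the inner reader cannot see past 4 bytes
rest = data.read()                 # the outer stream continues right after them
```

Even though the inner reader asks for everything it can get, it only ever sees its own four bytes, and the outer stream resumes immediately after them.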

encoding — decoding strings

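A str field requires an encoding so the raw bytes can be turned into a string. The same applies to strz, which reads bytes up to a terminating zero byte instead of taking an explicit size. You can set encoding per attribute, or once for the whole file with encoding under meta.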

seq:
  - id: comment
    type: str
    size: 32
    encoding: ASCII
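
As a rough illustration of the difference between a sized str and a strz, the following hand-written Python sketch decodes both. The function names are hypothetical and the code is not what ksc generates.

```python
import io


def read_str(stream, size, encoding):
    """Read exactly `size` bytes and decode them; padding bytes are kept as-is."""
    return stream.read(size).decode(encoding)


def read_strz(stream, encoding, terminator=0):
    """Read bytes up to (and consuming) the terminator byte, then decode them."""
    buf = bytearray()
    while True:
        b = stream.read(1)
        if not b:
            raise EOFError("terminator not found before end of stream")
        if b[0] == terminator:
            return buf.decode(encoding)
        buf += b


comment = read_str(io.BytesIO(b"hi" + b" " * 30), 32, "ascii")
title = read_strz(io.BytesIO(b"abc\x00def"), "ascii")
```

Note that the sized read keeps all 32 characters, padding included; only the decoding step depends on encoding.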

repeat — arrays

repeat turns a single attribute into an array. There are three forms:

| Form | Companion key | Reads… |
| --- | --- | --- |
| repeat: eos | (none) | until the end of the stream |
| repeat: expr | repeat-expr | a fixed count given by an expression |
| repeat: until | repeat-until | until a per-element condition is true |

seq:
  - id: num_entries
    type: u4
  - id: entries
    type: entry
    repeat: expr
    repeat-expr: num_entries

In repeat-until, the special variable _ refers to the element that was just read:

seq:
  - id: records
    type: record
    repeat: until
    repeat-until: _.is_last
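
The three repeat forms correspond to three familiar loop shapes. The sketch below is a hand-written approximation using single bytes as elements; all function names are hypothetical, and the is_last callback plays the role of the repeat-until expression over _.

```python
import io


def read_u1(stream):
    """Read one unsigned byte, failing loudly at end of stream."""
    b = stream.read(1)
    if not b:
        raise EOFError("unexpected end of stream")
    return b[0]


def at_eos(stream):
    """True when no bytes remain (works for any seekable stream)."""
    pos = stream.tell()
    end = stream.seek(0, io.SEEK_END)
    stream.seek(pos)
    return pos >= end


def repeat_expr(stream, read_one, count):
    """repeat: expr -- read exactly `count` elements."""
    return [read_one(stream) for _ in range(count)]


def repeat_until(stream, read_one, is_last):
    """repeat: until -- stop after the first element for which is_last is true."""
    items = []
    while True:
        item = read_one(stream)
        items.append(item)  # the terminating element is part of the array
        if is_last(item):
            return items


def repeat_eos(stream, read_one):
    """repeat: eos -- read elements until the stream is exhausted."""
    items = []
    while not at_eos(stream):
        items.append(read_one(stream))
    return items


stream = io.BytesIO(bytes([3, 10, 20, 30, 1, 2, 0, 9, 9]))
num_entries = read_u1(stream)
entries = repeat_expr(stream, read_u1, num_entries)
records = repeat_until(stream, read_u1, lambda x: x == 0)
rest = repeat_eos(stream, read_u1)
```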

if — conditional fields

if parses an attribute only when its boolean expression is true. The field is skipped entirely otherwise.

seq:
  - id: has_crc32
    type: u1
  - id: crc32
    type: u4
    if: has_crc32 != 0
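
In hand-written form, a conditional field is an ordinary branch around the read. This sketch assumes little-endian byte order and represents the absent field as None; read_record is a hypothetical name.

```python
import io
import struct


def read_record(stream):
    """Read `has_crc32` (u1), then `crc32` (u4) only when the flag is non-zero."""
    has_crc32 = stream.read(1)[0]
    crc32 = None  # stays None when the condition is false; no bytes are consumed
    if has_crc32 != 0:
        (crc32,) = struct.unpack("<I", stream.read(4))  # assumption: little-endian
    return has_crc32, crc32


with_crc = read_record(io.BytesIO(b"\x01\x78\x56\x34\x12"))
without_stream = io.BytesIO(b"\x00\x78\x56\x34\x12")
without_crc = read_record(without_stream)
```

When the flag is zero, the stream position advances by one byte only, so whatever follows is read from where crc32 would otherwise have been.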

enum — named constants

enum maps the parsed integer onto names declared in the enums section, giving readable values instead of raw numbers.

seq:
  - id: protocol
    type: u1
    enum: ip_protocol
enums:
  ip_protocol:
    1: icmp
    6: tcp
    17: udp
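
A comparable mapping can be sketched in Python with the standard enum module. The class and function names are hypothetical, and the fallback to the raw integer for unknown values is a choice made for this sketch; how unmapped values surface depends on the target language.

```python
import io
from enum import IntEnum


class IpProtocol(IntEnum):
    icmp = 1
    tcp = 6
    udp = 17


def read_protocol(stream):
    """Read one byte and map it to IpProtocol, keeping the raw int if unmapped."""
    value = stream.read(1)[0]
    try:
        return IpProtocol(value)
    except ValueError:
        return value  # sketch-only fallback for values not declared in the enum


known = read_protocol(io.BytesIO(b"\x06"))
unknown = read_protocol(io.BytesIO(b"\x63"))
```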

Instances

instances declares named members that sit outside the seq order. There are two kinds.

Value instances

A value instance has a value key. It is a derived expression computed from other fields — it reads nothing from the stream itself.

# Illustrative example
instances:
  length_in_m:
    value: length_in_feet * 0.3048

Info: Value instances have no setter in generated serialization code. To change one, you change the fields it depends on and invalidate its cached result; the value is recomputed from those inputs.
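
The "change the inputs, then invalidate" pattern can be sketched as a cached read-only property. This is a hand-written illustration rather than generated code; the class name Measurement and the method name invalidate_length_in_m are hypothetical.

```python
class Measurement:
    def __init__(self, length_in_feet):
        self.length_in_feet = length_in_feet
        self._length_in_m = None  # cache for the derived value

    @property
    def length_in_m(self):
        """Derived value: computed on first access, then served from the cache."""
        if self._length_in_m is None:
            self._length_in_m = self.length_in_feet * 0.3048
        return self._length_in_m

    def invalidate_length_in_m(self):
        """Drop the cached result so the next access recomputes it."""
        self._length_in_m = None


m = Measurement(10)
first = m.length_in_m           # computed from length_in_feet
m.length_in_feet = 20
stale = m.length_in_m           # still the cached value
m.invalidate_length_in_m()
fresh = m.length_in_m           # recomputed from the new input
```

Because the property has no setter, assigning to length_in_m directly fails; the only way to change it is through length_in_feet.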

Parse instances

A parse instance reads from the stream at an explicit position using pos. It accepts the same reading keys as a seq attribute (type, size, repeat, if, enum, encoding, …), plus:

| Key | Purpose |
| --- | --- |
| pos | Position to seek to before reading, counted in bytes from the start of the stream being read (the current stream unless io names another). |
| io | Which stream to read from (e.g. _root._io) when escaping a substream. |

instances:
  some_integer:
    pos: 0x10
    type: u4
  body:
    pos: ofs_body
    size: len_body
    type: str
    encoding: UTF-8

The io key lets a parse instance read from a different stream than the one it was declared in. This is useful when the current object is a substream but the data you need lives in the root (or parent) stream:

instances:
  body:
    io: _root._io
    pos: ofs_body
    size: len_body
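
A parse instance boils down to "remember where you are, seek, read, seek back", and io selects which stream that happens on. The sketch below shows this with two in-memory streams; read_at is a hypothetical helper, and the byte layout is invented for the example.

```python
import io


def read_at(stream, pos, size):
    """Seek to `pos`, read `size` bytes, then restore the previous position."""
    saved = stream.tell()
    try:
        stream.seek(pos)
        return stream.read(size)
    finally:
        stream.seek(saved)


root = io.BytesIO(b"HDR:" + bytes(4) + b"payload!")
sub = io.BytesIO(root.read(4))    # substream: sees only the first 4 bytes

from_sub = read_at(sub, 8, 8)     # offset 8 lies outside the substream
from_root = read_at(root, 8, 8)   # the same offset resolved against the root
```

The substream has nothing at offset 8, while the root stream does. Restoring the position afterwards keeps sequential parsing on the same stream undisturbed.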

Lazy evaluation

Instances are lazy: they are not computed when the object is first parsed. Each instance is evaluated the first time it is accessed, and its result is cached for subsequent accesses.

Tip: Laziness is why instances are well suited to large or rarely needed regions of a file. A parse instance pointing at a multi-megabyte blob costs nothing until you actually read it. It also lets you describe fields whose position depends on values located later in the file, something a strictly sequential seq cannot express.

For example, a header at the start of a file can hold an offset to a structure near the end. A parse instance follows that offset on demand:

# Illustrative example
seq:
  - id: ofs_footer
    type: u4
instances:
  footer:
    pos: ofs_footer
    type: footer
types:
  footer:
    seq:
      - id: checksum
        type: u4
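
The lazy, cached behaviour of this footer instance can be approximated by hand as follows. This is a sketch under assumptions, not ksc output: the class name ParsedFile and the footer_reads counter are hypothetical (the counter exists only to make the caching observable), and little-endian byte order is assumed.

```python
import io
import struct


class ParsedFile:
    def __init__(self, stream):
        self._io = stream
        # seq: only the header field is read eagerly.
        (self.ofs_footer,) = struct.unpack("<I", stream.read(4))
        self._footer = None
        self.footer_reads = 0  # sketch-only counter to observe caching

    @property
    def footer(self):
        """Parse instance: read on first access, then return the cached result."""
        if self._footer is None:
            saved = self._io.tell()
            self._io.seek(self.ofs_footer)
            (checksum,) = struct.unpack("<I", self._io.read(4))
            self._io.seek(saved)
            self._footer = {"checksum": checksum}
            self.footer_reads += 1
        return self._footer


data = struct.pack("<I", 12) + bytes(8) + struct.pack("<I", 0xDEADBEEF)
parsed = ParsedFile(io.BytesIO(data))
reads_before = parsed.footer_reads
checksum = parsed.footer["checksum"]
checksum_again = parsed.footer["checksum"]
```

Constructing the object touches only the first four bytes; the footer is read once, on first access, and later accesses reuse the cached value.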
