Attributes & instances
A Kaitai Struct format description (.ksy) is a YAML document. Two of its
sections describe where the data is and how to read it:
- seq is a list of attributes parsed in order, one after another, starting from the current stream position.
- instances holds named values that are not part of the sequential read order. They are either computed from other fields or parsed from an explicit position in the stream.
This page covers the attribute keys you use most often inside seq, and the
two kinds of instances.
Everything here is compiled by ksc (the kaitai-struct-compiler) into a
parser in your target language — C++, C#, Go, Java, JavaScript, Lua, Nim,
Perl, PHP, Python, Ruby, or Rust. The .ksy file is the single source of
truth; you write it once and generate parsers for all of them.
Sequence attributes (seq)
Each entry in seq is an attribute spec — a mapping of keys that tell the
compiler how to read one field. The most common keys are below.
| Key | Purpose |
|---|---|
| id | Names the attribute so you can reference it in expressions and in generated code. |
| type | The data type to read (built-in like u4/str, or a user-defined type). |
| size | Number of bytes to read. A constant or an expression over earlier fields. |
| repeat | Repeats the attribute: eos, expr (with repeat-expr), or until (with repeat-until). |
| if | A boolean expression; the attribute is only parsed when it evaluates to true. |
| contents | A fixed byte sequence the parser asserts must be present (used for magic signatures). |
| enum | Maps the parsed integer to named constants declared under enums. |
| encoding | Text encoding used to decode a str field (e.g. UTF-8, ASCII). |
id and type
id is the field name; type selects how the bytes are interpreted. Built-in
integer types are u1/u2/u4/u8 (unsigned) and s1/s2/s4/s8
(signed); floats are f4/f8; text is str/strz. Multi-byte numeric types take
their byte order from meta/endian, or from an explicit suffix such as u4le or
u4be. A type may also name a user-defined type declared under types.
seq:
  - id: version
    type: u2
  - id: flags
    type: u4
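To make the byte-level meaning of these two fields concrete, here is a hand-written Python sketch that uses only the standard struct module. It is not ksc output: little-endian byte order is an assumption (in a .ksy it would come from meta/endian or a suffix such as u2le), and read_header is a hypothetical helper name.

```python
import io
import struct

def read_header(stream):
    """Read version (u2) and flags (u4) in seq order; little-endian is assumed."""
    (version,) = struct.unpack("<H", stream.read(2))  # u2: 2 bytes, unsigned
    (flags,) = struct.unpack("<I", stream.read(4))    # u4: 4 bytes, unsigned
    return {"version": version, "flags": flags}

header = read_header(io.BytesIO(bytes([0x02, 0x00, 0x01, 0x00, 0x00, 0x80])))
```

The two reads happen back to back, which is what "parsed in order" means for seq: the second field starts where the first one ended.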
contents — fixed signatures
contents reads a fixed byte sequence and fails if the bytes do not match. It
is the idiomatic way to check a file's magic.
seq:
  - id: magic
    contents: [0xca, 0xfe, 0xba, 0xbe]
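The check that contents performs can be sketched in plain Python as follows. This is an illustration of the behaviour, not generated code; read_magic is a hypothetical name, and ValueError stands in for whatever validation error a real runtime raises.

```python
import io

MAGIC = bytes([0xCA, 0xFE, 0xBA, 0xBE])

def read_magic(stream):
    """Consume len(MAGIC) bytes and raise if they differ from the expected signature."""
    actual = stream.read(len(MAGIC))
    if actual != MAGIC:
        raise ValueError(f"bad magic: expected {MAGIC.hex()}, got {actual.hex()}")
    return actual

signature = read_magic(io.BytesIO(b"\xca\xfe\xba\xbe" + b"rest of file"))
```

Note that the bytes are consumed either way, so parsing continues right after the signature when the check passes.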
size — explicit byte length
size sets how many bytes the attribute occupies. It can be a constant or an
expression that refers to a field read earlier in the same seq.
seq:
  - id: name_len
    type: u4
  - id: name
    type: str
    size: name_len
    encoding: UTF-8
When you put size on an attribute whose type is a user-defined type, Kaitai
Struct creates a substream limited to those bytes — the inner type can only
read within them.
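The substream idea can be sketched with io.BytesIO: the sized bytes are cut out of the outer stream and wrapped in a stream of their own. This is an approximation written for illustration; read_sized is a hypothetical helper and little-endian byte order is assumed.

```python
import io
import struct

def read_sized(stream, size):
    """Read exactly `size` bytes and wrap them in a new, independent stream."""
    raw = stream.read(size)
    if len(raw) != size:
        raise EOFError(f"requested {size} bytes, only {len(raw)} available")
    return io.BytesIO(raw)

outer = io.BytesIO(b"\x03\x00\x00\x00" + b"abc" + b"TAIL")
(name_len,) = struct.unpack("<I", outer.read(4))  # the earlier field (u4, little-endian assumed)
sub = read_sized(outer, name_len)                 # substream limited to name_len bytes
name = sub.read().decode("utf-8")                 # reading "everything" stops at the limit
rest = outer.read()                               # the outer stream resumes after the sized field
```

Because the inner reader only ever sees the substream, a bug or a malformed inner structure cannot run past the declared size into the following fields.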
encoding — decoding strings
A str field requires an encoding so the raw bytes can be turned into a
string. strz reads bytes up to a terminator (a zero byte by default) instead
of taking a size, and it needs an encoding too.
seq:
  - id: comment
    type: str
    size: 32
    encoding: ASCII
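The difference between a sized string and a terminated one can be sketched in Python as below. Both helpers are hypothetical and hand-written; the terminated reader assumes the terminator is consumed but not included in the result.

```python
import io

def read_str(stream, size, encoding):
    """Fixed-size string: read `size` bytes, then decode them."""
    return stream.read(size).decode(encoding)

def read_strz(stream, encoding, terminator=b"\x00"):
    """Terminated string: read up to the terminator, consume it, then decode."""
    out = bytearray()
    while True:
        b = stream.read(1)
        if b == b"":
            raise EOFError("terminator not found before end of stream")
        if b == terminator:
            return out.decode(encoding)
        out += b

data = io.BytesIO(b"hi\x00" + b"fixed   ")
first = read_strz(data, "ASCII")     # stops at the zero byte
second = read_str(data, 8, "ASCII")  # always takes 8 bytes, padding included
```

In this sketch the fixed-size read keeps its trailing spaces; stripping padding would be a separate step.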
repeat — arrays
repeat turns a single attribute into an array. There are three forms:
| Form | Companion key | Reads… |
|---|---|---|
| repeat: eos | (none) | until the end of the stream |
| repeat: expr | repeat-expr | a fixed count given by an expression |
| repeat: until | repeat-until | until a per-element condition is true |
seq:
  - id: num_entries
    type: u4
  - id: entries
    type: entry
    repeat: expr
    repeat-expr: num_entries
In repeat-until, the special variable _ refers to the element that was just
read:
seq:
  - id: records
    type: record
    repeat: until
    repeat-until: _.is_last
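The three repeat forms correspond to three loop shapes. The Python sketch below uses single-byte elements to keep it short; the function names are hypothetical, and it assumes the element that satisfies the until condition is kept in the resulting array.

```python
import io
import struct

def read_u1(stream):
    (value,) = struct.unpack("B", stream.read(1))
    return value

def repeat_expr(stream, count):
    """repeat: expr, reading exactly `count` elements."""
    return [read_u1(stream) for _ in range(count)]

def repeat_until(stream, is_last):
    """repeat: until, reading until is_last(element) is true."""
    items = []
    while True:
        item = read_u1(stream)
        items.append(item)
        if is_last(item):  # `item` plays the role of `_`
            return items

def repeat_eos(stream):
    """repeat: eos, reading elements until no bytes remain."""
    items = []
    while True:
        chunk = stream.read(1)
        if not chunk:
            return items
        items.append(chunk[0])

s = io.BytesIO(bytes([2, 10, 20, 7, 8, 0, 5, 6]))
count = read_u1(s)                      # count field read earlier in the seq
a = repeat_expr(s, count)               # [10, 20]
b = repeat_until(s, lambda x: x == 0)   # [7, 8, 0]
c = repeat_eos(s)                       # [5, 6]
```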
if — conditional fields
if parses an attribute only when its boolean expression is true. The field is
skipped entirely otherwise.
seq:
  - id: has_crc32
    type: u1
  - id: crc32
    type: u4
    if: has_crc32 != 0
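A skipped field consumes no bytes, which the following Python sketch makes visible by reporting the stream position after each parse. It is a hand-written illustration; read_record and the next_pos key are hypothetical, None is used here to represent the absent field, and little-endian byte order is assumed.

```python
import io
import struct

def read_record(stream):
    (has_crc32,) = struct.unpack("B", stream.read(1))
    crc32 = None                       # stays None when the condition is false
    if has_crc32 != 0:
        (crc32,) = struct.unpack("<I", stream.read(4))
    return {"has_crc32": has_crc32, "crc32": crc32, "next_pos": stream.tell()}

with_crc = read_record(io.BytesIO(b"\x01\x78\x56\x34\x12"))
without_crc = read_record(io.BytesIO(b"\x00\x78\x56\x34\x12"))
```

In the second call the four bytes after the flag are left unread, so whatever attribute follows in the seq would start at offset 1.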
enum — named constants
enum maps the parsed integer onto names declared in the enums section,
giving readable values instead of raw numbers.
seq:
  - id: protocol
    type: u1
    enum: ip_protocol
enums:
  ip_protocol:
    1: icmp
    6: tcp
    17: udp
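On the consuming side, an enum field behaves roughly like the Python sketch below, which mirrors the ip_protocol declaration with the standard enum module. Falling back to the raw integer for undeclared values is a choice made for this sketch; how unknown values surface may differ between target languages.

```python
import enum

class IpProtocol(enum.IntEnum):
    icmp = 1
    tcp = 6
    udp = 17

def resolve_enum(enum_cls, value):
    """Return the named member, or the raw integer when no name is declared."""
    try:
        return enum_cls(value)
    except ValueError:
        return value

known = resolve_enum(IpProtocol, 6)     # IpProtocol.tcp
unknown = resolve_enum(IpProtocol, 99)  # plain int 99
```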
Instances
instances declares named members that sit outside the seq order. There
are two kinds.
Value instances
A value instance has a value key. It is a derived expression computed from
other fields — it reads nothing from the stream itself.
# Illustrative example
instances:
  length_in_m:
    value: length_in_feet * 0.3048
Value instances have no setter in generated serialization code. To change one, you change the fields it depends on and invalidate its cached result; the value is recomputed from those inputs.
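The compute, cache, and invalidate cycle described above can be sketched in Python like this. The class and the invalidate_length_in_m method are hypothetical names chosen for the illustration and do not reproduce the generated API of any particular target.

```python
class Distance:
    def __init__(self, length_in_feet):
        self.length_in_feet = length_in_feet
        self._length_in_m = None  # cache slot, empty until first access

    @property
    def length_in_m(self):
        # Read-only: there is deliberately no setter for the derived value.
        if self._length_in_m is None:
            self._length_in_m = self.length_in_feet * 0.3048
        return self._length_in_m

    def invalidate_length_in_m(self):
        """Drop the cached result so the next access recomputes it."""
        self._length_in_m = None

d = Distance(10)
first = d.length_in_m       # computed from length_in_feet
d.length_in_feet = 20
stale = d.length_in_m       # still the cached value
d.invalidate_length_in_m()
fresh = d.length_in_m       # recomputed from the new input
```

The stale read in the middle is the reason invalidation is needed: changing an input alone does not touch the cache.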
Parse instances
A parse instance reads from the stream at an explicit position using pos.
It accepts the same reading keys as a seq attribute (type, size,
repeat, if, enum, encoding, …), plus:
| Key | Purpose |
|---|---|
| pos | Byte offset to seek to before reading, counted from the start of the stream being read (the current stream, or the one selected by io). |
| io | Which stream to read from (e.g. _root._io) when escaping a substream. |
instances:
  some_integer:
    pos: 0x10
    type: u4
  body:
    pos: ofs_body
    size: len_body
    type: str
    encoding: UTF-8
The io key lets a parse instance read from a different stream than the one it
was declared in. This is useful when the current object is a substream but the
data you need lives in the root (or parent) stream:
instances:
  body:
    io: _root._io
    pos: ofs_body
    size: len_body
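Why the choice of stream matters can be sketched in Python: the same offset means different things in a substream and in the root stream. The helper read_at is hypothetical; it also shows the save, seek, read, restore pattern that keeps a positioned read from disturbing sequential parsing.

```python
import io

def read_at(stream, pos, size):
    """Seek to `pos`, read `size` bytes, then restore the previous position."""
    saved = stream.tell()
    stream.seek(pos)
    try:
        return stream.read(size)
    finally:
        stream.seek(saved)

root = io.BytesIO(b"HEADERxxBODY")
root.seek(6)
sub = io.BytesIO(root.read(2))   # a 2-byte substream holding b"xx"

# Offset 8 lies outside the substream. BytesIO returns b"" here; a real
# runtime would be expected to report an end-of-stream error instead.
from_sub = read_at(sub, 8, 4)
from_root = read_at(root, 8, 4)  # the same offset against the root stream
```

After both calls, each stream is back at the position it had before, so a following seq read is unaffected.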
Lazy evaluation
Instances are lazy: they are not computed when the object is first parsed. Each instance is evaluated the first time it is accessed, and its result is cached for subsequent accesses.
Laziness is why instances are well suited to large or rarely-needed regions of
a file. A parse instance pointing at a multi-megabyte blob is not read until
you access it. It also lets you describe data whose position depends on values
stored elsewhere in the file, including values located after the data itself,
which a strictly sequential seq cannot express.
For example, a header at the start of a file can hold an offset to a structure near the end. A parse instance follows that offset on demand:
# Illustrative example
seq:
  - id: ofs_footer
    type: u4
instances:
  footer:
    pos: ofs_footer
    type: footer
types:
  footer:
    seq:
      - id: checksum
        type: u4
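A rough Python equivalent of this description shows the lazy, cached access pattern. It is a hand-written sketch rather than ksc output: the class names are hypothetical, little-endian byte order is assumed, and footer_reads is a counter added only to make the caching observable.

```python
import io
import struct

class Footer:
    def __init__(self, stream):
        (self.checksum,) = struct.unpack("<I", stream.read(4))

class File:
    def __init__(self, stream):
        self._io = stream
        (self.ofs_footer,) = struct.unpack("<I", stream.read(4))  # seq field, read eagerly
        self._footer = None
        self.footer_reads = 0  # instrumentation for this sketch only

    @property
    def footer(self):
        if self._footer is None:            # first access: seek, parse, restore
            saved = self._io.tell()
            self._io.seek(self.ofs_footer)
            self._footer = Footer(self._io)
            self._io.seek(saved)
            self.footer_reads += 1
        return self._footer                 # later accesses reuse the cached object

data = struct.pack("<I", 8) + b"\x00" * 4 + struct.pack("<I", 0xDEADBEEF)
f = File(io.BytesIO(data))
reads_before = f.footer_reads   # nothing parsed yet
checksum = f.footer.checksum    # triggers the one and only read
again = f.footer.checksum       # served from the cache
```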