The expression language & imports
Almost every field in a .ksy file that is not a literal value — a size:, an if:, a repeat-until:, a value: instance, a pos: — accepts an expression. The expression language is a small, object-oriented, statically-typed language that the compiler (ksc, the kaitai-struct-compiler) transpiles into your chosen target language. It deliberately borrows syntax from C, Java, C#, Ruby, Python, JavaScript and Scala, so most of it reads the way you would expect.
Because everything is statically typed, expressions are checked at compile time and translated into idiomatic, native code in each output language rather than being evaluated by a runtime interpreter.
You will use expressions in size:, if:, repeat-expr:, repeat-until:, pos:, process: arguments, parametric type arguments, valid: checks, and in value: instances. Anywhere the reference calls for an "expression", everything in this page applies.
Operators
Arithmetic
Work on integers and floats. The + operator is also string concatenation.
| Operator | Meaning |
|---|---|
| a + b | addition (or concat) |
| a - b | subtraction |
| a * b | multiplication |
| a / b | division |
| a % b | modulo |
Relational
<, <=, >, >= compare numbers. == and != work on integers, floats, strings, booleans and enums.
Bitwise (integers only)
| Operator | Meaning |
|---|---|
| a << b | left shift |
| a >> b | right shift |
| a & b | bitwise AND |
| a \| b | bitwise OR |
| a ^ b | bitwise XOR |
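A common use of these operators is pulling packed fields out of a single byte. The Python sketch below mirrors what an expression such as (packed >> 4) & 0x0f computes. The names version and header_words and the sample byte are hypothetical, chosen only for illustration.

```python
# Hypothetical packed byte: high nibble = version, low nibble = header_words.
packed = 0x45

version = (packed >> 4) & 0x0F   # shift the high nibble down, then mask it
header_words = packed & 0x0F     # the mask keeps only the low nibble

# Rebuild the byte with shift and OR to confirm the split is lossless.
rebuilt = (version << 4) | header_words

print(version, header_words, hex(rebuilt))  # prints: 4 5 0x45
```

The parentheses are deliberate: operator precedence differs between languages, so spelling out the grouping keeps the intent unambiguous.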
Logical (booleans only)
not x, a and b, a or b.
Ternary (if-then-else)
condition ? if_true : if_false
For example, choosing a field size from an enum value:
# illustrative example
seq:
  - id: code
    type: u4
    enum: block_type
  - id: payload
    # quoted: a bare ": " inside the value would be read by YAML as a key/value separator
    size: 'code == block_type::int32 ? 4 : 8'
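To make the ternary concrete, here is a hand-written Python emulation of the spec above. It is not the code ksc generates. It assumes little-endian byte order and assumes that block_type::int32 has the value 1, neither of which is stated in the example.

```python
import io
import struct
from enum import IntEnum


class BlockType(IntEnum):
    # Assumed values; the example above does not define the enum.
    INT32 = 1
    INT64 = 2


def parse_block(stream):
    # code: u4 (little-endian assumed), interpreted as a BlockType
    (raw_code,) = struct.unpack("<I", stream.read(4))
    code = BlockType(raw_code)
    # size: code == block_type::int32 ? 4 : 8
    payload_size = 4 if code == BlockType.INT32 else 8
    payload = stream.read(payload_size)
    return code, payload


data = struct.pack("<I", 1) + b"\x2a\x00\x00\x00" + b"extra"
code, payload = parse_block(io.BytesIO(data))
print(code.name, len(payload))  # prints: INT32 4
```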
Built-in methods
Methods are called with the .method / .method(args) syntax. The table below lists the commonly used ones grouped by the type they apply to.
| Type | Member / method | Result |
|---|---|---|
| Integer | .to_s | decimal string representation |
| Float | .to_i | truncate to integer |
| Boolean | .to_i | 0 for false, 1 for true |
| Enum | .to_i | underlying integer value |
| String | .length | number of characters |
| String | .reverse | reversed string |
| String | .substring(from, to) | substring (includes from, excludes to) |
| String | .to_i / .to_i(radix) | parse to integer (decimal, or in the given radix) |
| Byte array | .length | number of bytes |
| Byte array | .to_s(encoding) | decode bytes to a string with the named encoding |
| Array | .first, .last | first / last element |
| Array | .size | element count |
| Array | .min, .max | minimum / maximum element |
| Stream (_io) | .eof | true if at end of stream |
| Stream (_io) | .size | total size of the stream in bytes |
| Stream (_io) | .pos | current position in bytes |
some_str.to_i(16) parses a string of hex digits into an integer — handy when a format stores numbers as ASCII text rather than raw bytes.
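For readers who think in a target language, Python's built-in int(text, radix) behaves much like to_i(radix). The field values below are made up for illustration.

```python
# Python's int(text, radix) plays the role of some_str.to_i(radix).
hex_field = "1f4"                 # hypothetical ASCII field read from a file
as_number = int(hex_field, 16)

# With no radix, to_i parses decimal, like int(text).
decimal_field = "500"
as_decimal = int(decimal_field)

print(as_number, as_decimal)  # prints: 500 500
```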
Special objects
Every user-defined type exposes a few pseudo-attributes you can reference from expressions.
| Name | Refers to |
|---|---|
| _root | the top-level structure in the current file |
| _parent | the structure that produced this particular instance |
| _io | the stream associated with this object |
| _ | the element just parsed (in repeat-until), or the value being checked (in valid:) |
_io is what gives you "bytes remaining" style logic, e.g. _io.size - _io.pos:
# illustrative example
seq:
  - id: rest
    size: _io.size - _io.pos
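The same "bytes remaining" arithmetic can be sketched in plain Python with io.BytesIO standing in for the Kaitai stream. The 4-byte header and the sample data are assumptions made for the demonstration.

```python
import io

stream = io.BytesIO(b"HDR!" + b"payload bytes")
header = stream.read(4)                      # consume a 4-byte header first

total_size = len(stream.getvalue())          # plays the role of _io.size
position = stream.tell()                     # plays the role of _io.pos
rest = stream.read(total_size - position)    # size: _io.size - _io.pos

at_end = stream.tell() == total_size         # plays the role of _io.eof
print(rest, at_end)  # prints: b'payload bytes' True
```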
The _ variable is most often seen in repeat-until, where it is the value that was just read:
seq:
  - id: numbers
    type: s4
    repeat: until
    repeat-until: _ == -1
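As a rough model of what this loop does, the Python sketch below reads signed 32-bit integers until one equals -1. It is hand-written rather than ksc output and assumes little-endian byte order, which the snippet above does not specify. Note that the element that satisfies the condition is still stored in the array.

```python
import io
import struct


def read_until_minus_one(stream):
    numbers = []
    while True:
        (value,) = struct.unpack("<i", stream.read(4))  # s4, little-endian assumed
        numbers.append(value)       # the terminating element is kept in the array
        if value == -1:             # repeat-until: _ == -1
            break
    return numbers


data = struct.pack("<4i", 10, 20, -1, 99)
print(read_until_minus_one(io.BytesIO(data)))  # prints: [10, 20, -1]
```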
It also appears in valid: expression constraints, where it is the value of the field being validated:
- id: even_value
  type: u4
  valid:
    expr: _ % 2 == 0
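In effect, the check runs right after the field is read and aborts parsing if it fails. A minimal Python sketch of that behaviour follows. ValidationError is a hypothetical stand-in for whatever error the generated parser raises, and little-endian byte order is assumed.

```python
import io
import struct


class ValidationError(Exception):
    """Hypothetical stand-in for the error a generated parser would raise."""


def read_even_value(stream):
    (value,) = struct.unpack("<I", stream.read(4))  # u4, little-endian assumed
    if not (value % 2 == 0):                        # expr: _ % 2 == 0
        raise ValidationError(f"even_value failed validation: {value}")
    return value


print(read_even_value(io.BytesIO(struct.pack("<I", 42))))  # prints: 42
```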
_io during serialization
When you use Kaitai Struct's serialization support (writing data, currently for Java and Python), expressions that depend on _io, such as _io.size - _io.pos, are evaluated during the write step rather than as a standalone consistency check, because the stream does not exist yet for a freshly constructed object. See the serialization guide for the details.
Navigating the object tree
Use the . operator to read attributes of nested types, and [index] to index into arrays. Navigation can chain across multiple levels and can reach upward via _parent / _root.
# illustrative example
seq:
  - id: header
    type: main_header
  - id: body
    size: header.body_len
A deeper path simply chains the dots: header.subheader_1.field_4.
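Seen from a target language, this navigation is ordinary attribute access on nested objects. The Python sketch below models the header/body example by hand. It assumes that main_header consists of a single u2le field named body_len, which the example above does not spell out.

```python
import io
import struct
from dataclasses import dataclass


@dataclass
class MainHeader:
    body_len: int


@dataclass
class Container:
    header: MainHeader
    body: bytes


def parse_container(stream):
    # header: main_header, assumed here to be a single u2le field body_len
    (body_len,) = struct.unpack("<H", stream.read(2))
    header = MainHeader(body_len)
    # body: its size is the expression header.body_len
    body = stream.read(header.body_len)
    return Container(header, body)


obj = parse_container(io.BytesIO(b"\x03\x00abcdef"))
print(obj.header.body_len, obj.body)  # prints: 3 b'abc'
```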
sizeof and _sizeof
Kaitai Struct provides compile-time size operators. They only resolve when the size is known at compile time; otherwise ksc raises a compile error.
- Type form: sizeof<type> and bitsizeof<type> give the byte / bit size of a type, e.g. sizeof<u4> is 4, bitsizeof<b13> is 13, and sizeof<some_user_type> is the size of that type.
- Value form: field_name._sizeof (and field_name._bitsizeof) give the size occupied by a specific parsed field's value.
# illustrative example
instances:
  trailer_pos:
    value: header._sizeof + body._sizeof
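A loose analogue in Python is struct.calcsize, which also reports the size of a fixed layout without reading any data. The header layout (u4, u2, u2) and the 16-byte body below are hypothetical; they only illustrate how fixed sizes add up to a position.

```python
import struct

# struct.calcsize is a rough analogue of sizeof<type> for fixed-size layouts.
U4_SIZE = struct.calcsize("<I")          # like sizeof<u4>
HEADER_SIZE = struct.calcsize("<IHH")    # hypothetical header: u4, u2, u2

BODY_SIZE = 16                           # hypothetical fixed-size body
trailer_pos = HEADER_SIZE + BODY_SIZE    # header._sizeof + body._sizeof

print(U4_SIZE, HEADER_SIZE, trailer_pos)  # prints: 4 8 24
```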
Typecasting
When the inferred type of an expression is too general (for instance a switch-on field that could be one of several types, or a generic value), you can enforce a concrete type with the .as<type_name> cast so that you can then access that type's members:
# illustrative example
instances:
  first_record:
    value: records[0].as<record_v2>
A cast does not change any bytes; it only tells the compiler which type to treat the value as, so downstream member access type-checks correctly.
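Python's typing.cast is a close analogue: it changes only what the type checker believes and returns the very same object. The record classes below are hypothetical and exist only to show the idea.

```python
from dataclasses import dataclass
from typing import List, Union, cast


@dataclass
class RecordV1:
    id: int


@dataclass
class RecordV2:
    id: int
    flags: int


records: List[Union[RecordV1, RecordV2]] = [RecordV2(id=7, flags=3)]

first = records[0]
first_record = cast(RecordV2, first)    # no conversion happens at run time
print(first_record is first, first_record.flags)  # prints: True 3
```

As with .as<...>, the cast is a promise rather than a check: if the value is not actually of the named type, member access fails later.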
Enum references
Enums are declared with enums: and referenced in expressions with the enum_name::value syntax. Comparing against named values is far more readable than comparing against raw integers, and == / != work directly on enum values.
seq:
  - id: protocol
    type: u1
    enum: ip_protocol
  - id: crc32
    type: u4
    if: protocol == ip_protocol::tcp
enums:
  ip_protocol:
    1: icmp
    6: tcp
    17: udp
To get back the numeric value of an enum field, use .to_i (for example protocol.to_i).
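The same enum can be modelled with Python's IntEnum, which shows both the named comparison and the conversion back to an integer. This is a hand-written illustration, not generated code.

```python
from enum import IntEnum


class IpProtocol(IntEnum):
    ICMP = 1
    TCP = 6
    UDP = 17


protocol = IpProtocol(6)                  # u1 value 6 mapped through the enum
has_crc32 = protocol == IpProtocol.TCP    # if: protocol == ip_protocol::tcp
print(protocol.name, has_crc32, int(protocol))  # int() mirrors .to_i; prints: TCP True 6
```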
Importing types from other .ksy files
Large or shared structures are best factored into their own .ksy files and pulled in with the imports key under meta. Once imported, the other file's meta/id becomes usable as a type name, exactly like a locally defined type.
Define the reusable type in its own file:
# date.ksy
meta:
  id: date
seq:
  - id: year
    type: u2le
  - id: month
    type: u2le
  - id: day
    type: u2le
Then import and use it:
# filelist.ksy
meta:
  id: filelist
  imports:
    - date
seq:
  - id: entries
    type: entry
    repeat: eos
types:
  entry:
    seq:
      - id: filename
        type: strz
        encoding: ASCII
      - id: timestamp
        type: date
Imported types behave like any other type in expressions — you can read their members:
# illustrative example
meta:
  id: doc_container
  imports:
    - date
seq:
  - id: timestamp
    type: date
  - id: data_size
    type: u4
    if: timestamp.year >= 2000
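The effect of this spec can be sketched by hand in Python: the imported date type becomes a separate class, and the container reads a member of it to decide whether data_size is present. This is an emulation rather than ksc output, and it assumes little-endian byte order for data_size, since the spec above sets no endianness.

```python
import io
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Date:
    year: int
    month: int
    day: int

    @classmethod
    def parse(cls, stream):
        # year, month, day: three u2le fields, as in date.ksy
        return cls(*struct.unpack("<HHH", stream.read(6)))


@dataclass
class DocContainer:
    timestamp: Date
    data_size: Optional[int]


def parse_doc_container(stream):
    timestamp = Date.parse(stream)
    data_size = None
    if timestamp.year >= 2000:            # if: timestamp.year >= 2000
        # u4, little-endian assumed
        (data_size,) = struct.unpack("<I", stream.read(4))
    return DocContainer(timestamp, data_size)


doc = parse_doc_container(io.BytesIO(struct.pack("<HHHI", 2024, 5, 17, 1234)))
print(doc.timestamp.year, doc.data_size)  # prints: 2024 1234
```

When the condition is false the field is simply absent, which the sketch represents as None.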
Import paths
| Path style | Resolved against |
|---|---|
| Relative (foo, foo/bar, ../foo/bar/baz) | the directory of the current .ksy file |
| Absolute (/common_types, /formats/image_headers) | the compiler's module search paths |
Absolute paths are searched, in order, in:
- Paths given with the -I command-line switch to ksc.
- Paths in the KSPATH environment variable (separated by : on Linux/macOS, ; on Windows).
- Default platform-dependent search paths.
Write import paths with / regardless of operating system; ksc converts them to the right path separator for you. Do not include the .ksy extension: give the file's path without it (by convention, the file name matches the format's meta/id).
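The lookup rules above can be modelled in a few lines of Python. This is a sketch of the rules as described on this page, not ksc's actual implementation. The function resolve_import and its parameters are hypothetical, with include_dirs standing in for -I paths and default_dirs for the platform defaults.

```python
import os
import tempfile
from pathlib import Path


def resolve_import(import_path, current_file, include_dirs=(), default_dirs=()):
    """Return the first existing .ksy file for an import path, or None."""
    if import_path.startswith("/"):
        # Absolute: -I paths first, then KSPATH, then platform defaults.
        kspath = [p for p in os.environ.get("KSPATH", "").split(os.pathsep) if p]
        roots = [*include_dirs, *kspath, *default_dirs]
        relative = import_path.lstrip("/")
    else:
        # Relative: resolved against the directory of the importing file.
        roots = [Path(current_file).parent]
        relative = import_path
    parts = relative.split("/")           # import paths always use "/"
    for root in roots:
        candidate = Path(root, *parts[:-1], parts[-1] + ".ksy")
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    formats, common = Path(tmp, "formats"), Path(tmp, "common")
    formats.mkdir()
    common.mkdir()
    Path(formats, "date.ksy").write_text("meta:\n  id: date\n")
    Path(common, "common_types.ksy").write_text("meta:\n  id: common_types\n")
    current = Path(formats, "filelist.ksy")
    print(resolve_import("date", current).name)
    print(resolve_import("/common_types", current, include_dirs=[common]).name)
```

os.pathsep is ":" on Linux/macOS and ";" on Windows, which matches the KSPATH separators listed above.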
You can see plenty of real imports in action in the official format gallery, where many specs reuse common building blocks across formats.