
The expression language & imports

Almost every field in a .ksy file that is not a literal value — a size:, an if:, a repeat-until:, a value: instance, a pos: — accepts an expression. The expression language is a small, object-oriented, statically-typed language that the compiler (ksc, the kaitai-struct-compiler) transpiles into your chosen target language. It deliberately borrows syntax from C, Java, C#, Ruby, Python, JavaScript and Scala, so most of it reads the way you would expect.

Because everything is statically typed, expressions are type-checked at compile time and translated into idiomatic native code for each target language; nothing is evaluated by a runtime interpreter.

Where expressions are allowed

You will use expressions in size:, if:, repeat-expr:, repeat-until:, pos:, process: arguments, parametric type arguments, valid: checks, and in value: instances. Anywhere the reference calls for an "expression", everything in this page applies.

Operators

Arithmetic

These operators work on integers and floats. The + operator also concatenates strings.

Operator    Meaning
a + b       addition (or concat)
a - b       subtraction
a * b       multiplication
a / b       division
a % b       modulo
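
As a minimal sketch, arithmetic most often shows up when a size is derived from fields parsed earlier. All field names below are hypothetical, and the 3 bytes per pixel is an assumption made only for this illustration.

```yaml
# illustrative sketch; all field names are hypothetical
seq:
  - id: width
    type: u2le
  - id: height
    type: u2le
  - id: pixels
    # assumes 3 bytes per pixel, purely for the sake of the example
    size: width * height * 3
instances:
  pixel_count:
    value: width * height
```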

Relational

<, <=, >, >= compare numbers. == and != work on integers, floats, strings, booleans and enums.

Bitwise (integers only)

Operator    Meaning
a << b      left shift
a >> b      right shift
a & b       bitwise AND
a | b       bitwise OR
a ^ b       bitwise XOR
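
A common use of the bitwise operators is unpacking several values from one byte. The sketch below assumes a hypothetical flags byte whose upper four bits hold a version and whose lowest bit is a compression flag; the parentheses are there to avoid relying on operator precedence.

```yaml
# illustrative sketch; the bit layout is an assumption
seq:
  - id: flags
    type: u1
instances:
  version:
    # upper four bits of the byte
    value: (flags >> 4) & 0x0f
  is_compressed:
    # lowest bit, turned into a boolean by the comparison
    value: (flags & 0x01) != 0
```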

Logical (booleans only)

not x, a and b, a or b.
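
Relational and logical operators are typically combined in if: conditions. The following is a sketch with hypothetical field names; it relies on b1 (a single-bit field) being read as a boolean.

```yaml
# illustrative sketch; all field names are hypothetical
seq:
  - id: version
    type: u1
  - id: has_extension
    type: b1
  - id: reserved
    type: b7
  - id: extension_len
    type: u2le
    # parsed only when both conditions hold
    if: version >= 2 and has_extension
  - id: legacy_pad
    size: 4
    if: not has_extension
```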

Ternary (if-then-else)

condition ? if_true : if_false

For example, choosing a field size from an enum value:

# illustrative example
seq:
  - id: code
    type: u4
    enum: block_type
  - id: payload
    # quoted, because a bare ": " inside the expression is not valid YAML
    size: 'code == block_type::int32 ? 4 : 8'

Built-in methods

Methods are called with the .method / .method(args) syntax. The table below lists the commonly used ones grouped by the type they apply to.

Type           Member / method          Result
Integer        .to_s                    decimal string representation
Float          .to_i                    truncate to integer
Boolean        .to_i                    0 for false, 1 for true
Enum           .to_i                    underlying integer value
String         .length                  number of characters
String         .reverse                 reversed string
String         .substring(from, to)     substring (includes from, excludes to)
String         .to_i / .to_i(radix)     parse to integer (decimal, or in the given radix)
Byte array     .length                  number of bytes
Byte array     .to_s(encoding)          decode bytes to a string with the named encoding
Array          .first, .last            first / last element
Array          .size                    element count
Array          .min, .max               minimum / maximum element
Stream (_io)   .eof                     true if at end of stream
Stream (_io)   .size                    total size of the stream in bytes
Stream (_io)   .pos                     current position in bytes

Parsing a hex string

some_str.to_i(16) parses a string of hex digits into an integer — handy when a format stores numbers as ASCII text rather than raw bytes.
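
For instance, a format might store a body length as eight ASCII hex digits. The field names and the eight-character width below are assumptions made for this sketch.

```yaml
# illustrative sketch; field names and widths are hypothetical
seq:
  - id: len_hex
    type: str
    size: 8
    encoding: ASCII
  - id: body
    # e.g. the text "0000001f" yields a 31-byte body
    size: len_hex.to_i(16)
```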

Special objects

Every user-defined type exposes a few pseudo-attributes you can reference from expressions.

Name       Refers to
_root      the top-level structure in the current file
_parent    the structure that produced this particular instance
_io        the stream associated with this object
_          the element just parsed (in repeat-until), or the value being checked (in valid:)

_io is what gives you "bytes remaining" style logic, e.g. _io.size - _io.pos:

# illustrative example
seq:
  - id: rest
    size: _io.size - _io.pos

The _ variable is most often seen in repeat-until, where it is the value that was just read:

seq:
  - id: numbers
    type: s4
    repeat: until
    repeat-until: _ == -1
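
When the repeated element is a user-defined type, _ is the whole object that was just parsed, so its attributes can be tested. A sketch, assuming a hypothetical chunk type where a zero-length chunk ends the list:

```yaml
# illustrative sketch; the chunk type is hypothetical
seq:
  - id: chunks
    type: chunk
    repeat: until
    # stop after the first chunk whose len field is zero
    repeat-until: _.len == 0
types:
  chunk:
    seq:
      - id: len
        type: u2le
      - id: data
        size: len
```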

It also appears in valid: expression constraints, where it is the value of the field being validated:

  - id: even_value
    type: u4
    valid:
      expr: _ % 2 == 0

Note: _io during serialization

When you use Kaitai Struct's serialization support (writing data, currently for Java and Python), expressions that depend on _io — such as _io.size - _io.pos — are evaluated during the write step rather than as a standalone consistency check, because the stream does not exist yet for a freshly constructed object. See the serialization guide for the details.

Use the . operator to read attributes of nested types, and [index] to index into arrays. Navigation can chain across multiple levels and can reach upward via _parent / _root.

# illustrative example
seq:
  - id: header
    type: main_header
  - id: body
    size: header.body_len

A deeper path simply chains the dots: header.subheader_1.field_4.
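
Upward navigation and indexing can be sketched as follows. The type and field names are hypothetical; in this layout the parent of each section is the top-level object, so _root.header.section_len would reach the same field.

```yaml
# illustrative sketch; all type and field names are hypothetical
seq:
  - id: header
    type: main_header
  - id: sections
    type: section
    repeat: expr
    repeat-expr: header.num_sections
instances:
  first_section:
    # [index] picks one element out of a repeated field
    value: sections[0]
types:
  main_header:
    seq:
      - id: num_sections
        type: u2le
      - id: section_len
        type: u4le
  section:
    seq:
      - id: data
        # reach up to the enclosing object, then down into its header
        size: _parent.header.section_len
```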

sizeof and _sizeof

Kaitai Struct provides compile-time size operators. They only resolve when the size is known at compile time; otherwise ksc raises a compile error.

  • Type form: sizeof<type> and bitsizeof<type> give the byte / bit size of a type, e.g. sizeof<u4> is 4, bitsizeof<b13> is 13, and sizeof<some_user_type> is the size of that type.
  • Value form: field_name._sizeof (and field_name._bitsizeof) give the size occupied by a specific parsed field's value.

# illustrative example
instances:
  trailer_pos:
    value: header._sizeof + body._sizeof
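
The type form might be used like this. This is only a sketch with hypothetical names: it assumes entry is a fixed-size type (two u4le fields), so that its size can be resolved at compile time.

```yaml
# illustrative sketch; all type and field names are hypothetical
seq:
  - id: num_entries
    type: u4le
  - id: entries
    type: entry
    repeat: expr
    repeat-expr: num_entries
instances:
  table_len:
    # entry has a fixed layout, so sizeof<entry> is a compile-time constant
    value: num_entries * sizeof<entry>
types:
  entry:
    seq:
      - id: offset
        type: u4le
      - id: len
        type: u4le
```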

Typecasting

When the inferred type of an expression is too general (for instance a switch-on field that could be one of several types, or a generic value), you can enforce a concrete type with the .as<type_name> cast so that you can then access that type's members:

# illustrative example
instances:
  first_record:
    value: records[0].as<record_v2>

A cast does not change any bytes; it only tells the compiler which type to treat the value as, so downstream member access type-checks correctly.
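
A typical case is a switch-on field, whose inferred type is too general to expose the members of any one case. The sketch below uses hypothetical types body_v1 and body_v2 (assumed to be defined elsewhere in the same file) and guards the cast with an if: so it is only evaluated when the body really is a body_v2.

```yaml
# illustrative sketch; body_v1, body_v2 and flags are hypothetical
seq:
  - id: kind
    type: u1
  - id: body
    type:
      switch-on: kind
      cases:
        1: body_v1
        2: body_v2
instances:
  v2_flags:
    # after the cast, members of body_v2 become accessible
    value: body.as<body_v2>.flags
    if: kind == 2
```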

Enum references

Enums are declared with enums: and referenced in expressions with the enum_name::value syntax. Comparing against named values is far more readable than comparing against raw integers, and == / != work directly on enum values.

seq:
  - id: protocol
    type: u1
    enum: ip_protocol
  - id: crc32
    type: u4
    if: protocol == ip_protocol::tcp
enums:
  ip_protocol:
    1: icmp
    6: tcp
    17: udp

To get back the numeric value of an enum field, use .to_i (for example protocol.to_i).
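
Building on the ip_protocol enum above, a sketch of both directions might look like this (the instance names are hypothetical):

```yaml
# illustrative sketch; instance names are hypothetical
seq:
  - id: protocol
    type: u1
    enum: ip_protocol
instances:
  protocol_num:
    # back to the raw integer, e.g. 6 for tcp
    value: protocol.to_i
  is_transport:
    value: protocol == ip_protocol::tcp or protocol == ip_protocol::udp
enums:
  ip_protocol:
    1: icmp
    6: tcp
    17: udp
```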

Importing types from other .ksy files

Large or shared structures are best factored into their own .ksy files and pulled in with the imports key under meta. Once imported, the other file's meta/id becomes usable as a type name, exactly like a locally defined type.

Define the reusable type in its own file:

# date.ksy
meta:
  id: date
seq:
  - id: year
    type: u2le
  - id: month
    type: u2le
  - id: day
    type: u2le

Then import and use it:

# filelist.ksy
meta:
  id: filelist
  imports:
    - date
seq:
  - id: entries
    type: entry
    repeat: eos
types:
  entry:
    seq:
      - id: filename
        type: strz
        encoding: ASCII
      - id: timestamp
        type: date

Imported types behave like any other type in expressions — you can read their members:

# illustrative example
meta:
  id: doc_container
  imports:
    - date
seq:
  - id: timestamp
    type: date
  - id: data_size
    # explicit endianness, since this meta section declares no default endian
    type: u4le
    if: timestamp.year >= 2000

Import paths

Path style                                          Resolved against
Relative (foo, foo/bar, ../foo/bar/baz)             the directory of the current .ksy file
Absolute (/common_types, /formats/image_headers)    the compiler's module search paths

Absolute paths are searched, in order, in:

  1. Paths given with the -I command-line switch to ksc.
  2. Paths in the KSPATH environment variable (separated by : on Linux/macOS, ; on Windows).
  3. Default platform-dependent search paths.

Always use forward slashes

Write import paths with / regardless of operating system; ksc converts them to the right path separator for you. Do not include the .ksy extension: an import names the file, and the imported type is then referenced by that file's meta/id (by convention the two match).
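
The two path styles can be mixed in one imports list. In the sketch below, /common/vlq_base128_le refers to a spec from the official format gallery and assumes the gallery directory is on the search path (for example via -I or KSPATH); my_container and the ../shared/date location are hypothetical.

```yaml
# illustrative sketch; my_container and ../shared/date are hypothetical
meta:
  id: my_container
  imports:
    # absolute: looked up in the compiler's search paths
    - /common/vlq_base128_le
    # relative: resolved from the directory of this .ksy file
    - ../shared/date
seq:
  - id: len
    type: vlq_base128_le
  - id: created
    type: date
```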

You can see plenty of real imports in action in the official format gallery, where many specs reuse common building blocks across formats.
