Anatomy of a .ksy file

A .ksy file is a declarative description of a binary format. It is written in YAML, and the Kaitai Struct compiler (ksc, also called kaitai-struct-compiler) reads it to generate a parser library in a target language. The same description can be reused across languages, so you write the format once and generate parsers for Java, Python, C++, JavaScript, Ruby, Go, and other supported targets.

This page walks through the top-level keys you will see in nearly every .ksy file and shows how they fit together.

note

Because .ksy is YAML, indentation is significant and keys are case-sensitive. Several keys use hyphens (for example file-extension, size-eos, repeat-expr), not underscores or camelCase.

The top-level keys at a glance

A .ksy document is a YAML map. At the top level, the keys below are the ones you will use most often. A useful spec needs only meta (with an id) and at least one of seq or instances; the rest are optional.

Key	Purpose
`meta`	Metadata about the format: its `id`, default endianness, extensions, imports, encoding.
`seq`	The sequence of attributes (fields), read in order from the stream.
`instances`	Named values parsed lazily at an arbitrary position, or computed values.
`types`	Named subtypes (nested structures) used by `seq` and `instances`.
`enums`	Named sets of integer constants you can attach to fields.
`doc`	Free-text documentation for the type.

`meta` — format metadata

meta holds information about the description as a whole. Common sub-keys:

Sub-key	Meaning
`id`	The identifier of the format. Used to name the generated top-level type. Must be a lowercase, underscore-separated name.
`endian`	Default byte order for multi-byte integers: `le` (little-endian) or `be` (big-endian).
`file-extension`	The file extension(s) commonly associated with this format.
`imports`	A list of other `.ksy` files whose types you want to reuse.
`encoding`	The default character encoding for string fields (for example `UTF-8`, `ASCII`).

Other useful sub-keys include title (a human-readable name), license (an SPDX license identifier), ks-version (the minimum compiler version needed), bit-endian (default bit order), and xref (cross-references to external registries such as MIME, RFC, or Wikidata).

meta:
  id: animal_record
  title: Animal Record
  file-extension: anr
  endian: le
  encoding: UTF-8
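
The imports sub-key described above can be sketched as follows. This is a minimal illustration, assuming a hypothetical second spec named animal_shelter whose file sits in the same directory as animal_record.ksy; import paths are written without the .ksy extension, and the imported top-level type is referenced by its id.

```yaml
meta:
  id: animal_shelter        # hypothetical format, invented for this sketch
  endian: le
  imports:
    - animal_record         # resolves to animal_record.ksy next to this file
seq:
  - id: first_animal
    type: animal_record     # top-level type of the imported spec
```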

info

endian: le in meta sets the default for the whole file, so you do not have to write type: u4le on every field — a plain type: u4 inherits the default. You can still override the default on individual fields when needed.
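
As a small sketch of that override, the hypothetical spec below keeps little-endian as the default and marks a single field as big-endian with an explicit suffix. The format id and field names are invented for illustration.

```yaml
meta:
  id: mixed_endian_demo     # hypothetical format
  endian: le
seq:
  - id: record_count
    type: u4                # no suffix: inherits little-endian from meta
  - id: network_port
    type: u2be              # explicit suffix: big-endian for this field only
```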

`seq` — the sequence of attributes

seq is an ordered list of attributes. Each attribute is read from the stream one after another, in the order written. An attribute is a map; the keys you will use most are:

Attribute key	Meaning
`id`	The field name (used in the generated code).
`type`	The data type: a built-in (`u1`, `s4`, `f8`, `str`, `strz`, …) or a name from `types`.
`size`	Number of bytes to read (a literal or an expression referencing earlier fields).
`contents`	Fixed/magic bytes the parser must match exactly (a byte array or string).
`repeat`	Repeat the field: `expr` (a count), `eos` (until end of stream), or `until` (until a condition).
`repeat-expr`	The count expression, used with `repeat: expr`.
`if`	Parse this field only when the expression is true.
`enum`	Interpret this integer field using a named enum.

Built-in integer types follow the pattern u/s (unsigned/signed) plus a byte width of 1, 2, 4, or 8, with an optional le/be suffix on the multi-byte widths (one-byte types such as u1 take no suffix); f4/f8 are floats. Strings use str (with a size and encoding) or strz (zero-terminated).

seq:
  - id: magic
    contents: [0x41, 0x4e, 0x52]   # "ANR"
  - id: name_len
    type: u1
  - id: name
    type: str
    size: name_len
  - id: species
    type: u1
    enum: species
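
The example above does not use repeat or if, so here is a separate sketch of how they might combine. The field names are hypothetical and are not part of the animal_record format. Note that repeat: until takes its condition from a companion key, repeat-until, in which _ refers to the element just parsed.

```yaml
seq:
  - id: has_tags
    type: u1
  - id: num_tags
    type: u1
    if: has_tags != 0           # field is skipped entirely when the flag is 0
  - id: tags
    type: strz
    encoding: UTF-8
    repeat: expr
    repeat-expr: num_tags       # read exactly num_tags strings
    if: has_tags != 0
  - id: trailer_bytes
    type: u1
    repeat: until
    repeat-until: _ == 0        # stop after reading a zero byte (inclusive)
```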

`types` — named subtypes

types is a map from a type name to a nested type definition. A nested type has the same structure as the top level (it can have its own seq, instances, types, enums, and doc) but does not need a meta block — it inherits settings such as endianness from its parent. Reference a subtype from a field by putting its name in type.

types:
  measurement:
    seq:
      - id: weight_grams
        type: u4
      - id: height_mm
        type: u2

A field referencing this subtype looks like type: measurement. If you also set a size, the subtype is parsed inside a bounded substream of that many bytes.
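
A minimal sketch of the sized variant, assuming a hypothetical length prefix named details_len that precedes the block:

```yaml
seq:
  - id: details_len
    type: u2
  - id: details
    type: measurement
    size: details_len   # measurement is parsed within a substream of this many bytes
```

With this layout the stream position after details advances by details_len bytes, regardless of how many of them the measurement type actually reads.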

`instances` — lazy and computed values

instances defines named values that are not part of the linear seq. There are two flavours:

Positional instances use pos (and usually type/size) to parse data at an arbitrary byte offset, for example a footer at the end of the file. They are parsed lazily — only when first accessed.
Value instances use value to compute a value from an expression, without reading any bytes.

instances:
  footer:
    pos: _io.size - 4
    type: u4
  is_heavy:
    # Assumes a seq field named details of type measurement (see the complete example).
    value: details.weight_grams > 10000
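
Positional instances often take their offset from a header field rather than from the stream size. The sketch below assumes a hypothetical format whose header stores the offset and entry count of an index table; all names are invented for illustration.

```yaml
meta:
  id: indexed_file          # hypothetical format
  endian: le
seq:
  - id: ofs_index
    type: u4
  - id: num_entries
    type: u4
instances:
  index:
    pos: ofs_index          # jump to the offset stored in the header
    type: u4
    repeat: expr
    repeat-expr: num_entries
  len_index:
    value: num_entries * 4  # computed from a parsed field, reads no bytes
```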

`enums` — named constants

enums maps a name to a set of integer-to-symbol pairs. Attach an enum to an integer field with enum: (as shown in the seq example above). The generated code then exposes the symbolic name instead of a bare number.

enums:
  species:
    0: unknown
    1: cat
    2: dog
    3: bird
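
Enum members can also be used inside expressions, written as the enum name, two colons, and the member name. The value instance below is a hypothetical addition to the running example; it assumes the species field and the species enum shown above.

```yaml
instances:
  is_cat:
    value: species == species::cat   # left side is the field, right side is the enum member
```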

`doc` — documentation

doc attaches a free-text description to a type or attribute. It is carried into the generated code as comments/docstrings. Use doc-ref for a link or citation to an external specification.

doc: |
  A minimal record describing a single animal, used to illustrate
  the structure of a .ksy file.
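
The same keys work on individual attributes. A brief sketch, reusing the species field from the seq example; the URL is a placeholder, not a real specification.

```yaml
seq:
  - id: species
    type: u1
    enum: species
    doc: Species code of the animal described by this record.
    doc-ref: https://example.com/animal-record-spec#species   # placeholder URL
```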

A small complete example

Illustrative example

The format below is invented for this page to show how the pieces connect — it is not a real-world file format. For real, production-quality specs, browse the official Kaitai Struct format gallery.

meta:
  id: animal_record
  title: Animal Record (illustrative)
  file-extension: anr
  endian: le
  encoding: UTF-8
doc: |
  A toy format: a 3-byte magic, a length-prefixed name, a species code,
  and a measurement block. Used only to demonstrate .ksy structure.
seq:
  - id: magic
    contents: [0x41, 0x4e, 0x52]   # "ANR"
  - id: name_len
    type: u1
  - id: name
    type: str
    size: name_len
  - id: species
    type: u1
    enum: species
  - id: details
    type: measurement
instances:
  is_heavy:
    value: details.weight_grams > 10000
types:
  measurement:
    seq:
      - id: weight_grams
        type: u4
      - id: height_mm
        type: u2
enums:
  species:
    0: unknown
    1: cat
    2: dog
    3: bird

Compiling a `.ksy` file

Once the description is written, run the compiler and pick a target language with -t:

ksc -t python animal_record.ksy

Use -d <directory> to choose an output directory, and -t all to generate parsers for every supported language at once.

Parsing and serialization

Generated code parses binary data by default (the _read step). Kaitai Struct also supports serialization, that is, writing data back out from the same .ksy description, in selected languages such as Java and Python. The write path adds a _check step (consistency validation) before _write emits bytes. See the Kaitai Struct serialization guide for details.
