
Anatomy of a .ksy file

A .ksy file is a declarative description of a binary format. It is written in YAML, and the Kaitai Struct compiler (ksc, also called kaitai-struct-compiler) reads it to generate a parser library in a target language. The same description can be reused across languages, so you write the format once and generate parsers for Java, Python, C++, JavaScript, Ruby, Go, and other supported targets.

This page walks through the top-level keys you will see in nearly every .ksy file and shows how they fit together.

note

Because .ksy is YAML, indentation is significant and keys are case-sensitive. Several keys use hyphens (for example file-extension, size-eos, repeat-expr), not underscores or camelCase.

The top-level keys at a glance

A .ksy document is a YAML map. At the top level, the keys below are the ones you will use most often. Only meta (with an id) and one of seq / instances are needed for a useful spec — the rest are optional.

| Key | Purpose |
| --- | --- |
| meta | Metadata about the format: its id, default endianness, extensions, imports, encoding. |
| seq | The sequence of attributes (fields), read in order from the stream. |
| instances | Named values parsed lazily at an arbitrary position, or computed values. |
| types | Named subtypes (nested structures) used by seq and instances. |
| enums | Named sets of integer constants you can attach to fields. |
| doc | Free-text documentation for the type. |

meta — format metadata

meta holds information about the description as a whole. Common sub-keys:

| Sub-key | Meaning |
| --- | --- |
| id | The identifier of the format. Used to name the generated top-level type. Must be a lowercase, underscore-separated name. |
| endian | Default byte order for multi-byte integers: le (little-endian) or be (big-endian). |
| file-extension | The file extension(s) commonly associated with this format. |
| imports | A list of other .ksy files whose types you want to reuse. |
| encoding | The default character encoding for string fields (for example UTF-8, ASCII). |

Other useful sub-keys include title (a human-readable name), license (an SPDX license identifier), ks-version (the minimum compiler version needed), bit-endian (default bit order), and xref (cross-references to external registries such as MIME, RFC, or Wikidata).

meta:
  id: animal_record
  title: Animal Record
  file-extension: anr
  endian: le
  encoding: UTF-8

Info

endian: le in meta sets the default for the whole file, so you do not have to write type: u4le on every field; a plain type: u4 inherits the default. You can still override the default on individual fields when needed.
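To make the effect of the default byte order concrete, the sketch below reads the same four bytes once as a little-endian u4 and once as a big-endian u4, using Python's standard struct module. It is a hand-written illustration rather than ksc output, and the sample bytes are made up.

```python
import struct

raw = bytes([0x10, 0x27, 0x00, 0x00])  # made-up sample bytes

# endian: le -> least significant byte comes first
as_le = struct.unpack("<I", raw)[0]
# endian: be -> most significant byte comes first
as_be = struct.unpack(">I", raw)[0]

print(as_le)  # 10000
print(as_be)  # 270991360
```

The same bytes yield very different numbers, which is why the default is set once in meta and overridden only where a field deviates from it.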

seq — the sequence of attributes

seq is an ordered list of attributes. Each attribute is read from the stream one after another, in the order written. An attribute is a map; the keys you will use most are:

| Attribute key | Meaning |
| --- | --- |
| id | The field name (used in the generated code). |
| type | The data type: a built-in (u1, s4, f8, str, strz, …) or a name from types. |
| size | Number of bytes to read (a literal or an expression referencing earlier fields). |
| contents | Fixed/magic bytes the parser must match exactly (a byte array or string). |
| repeat | Repeat the field: expr (a count), eos (until end of stream), or until (until the repeat-until condition becomes true). |
| repeat-expr | The count expression, used with repeat: expr. |
| if | Parse this field only when the expression is true. |
| enum | Interpret this integer field using a named enum. |
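The three repeat modes in the table can be sketched in plain Python as follows. This is a hand-written approximation of the reading behaviour for a u1 field, not generated code; the helper names (read_u1, repeat_expr, repeat_eos, repeat_until) are hypothetical. The sketch assumes that with repeat: until the element that satisfies the condition is kept in the result.

```python
import io

def read_u1(stream):
    """Read one unsigned byte, failing loudly at end of stream."""
    b = stream.read(1)
    if len(b) != 1:
        raise EOFError("unexpected end of stream")
    return b[0]

def repeat_expr(stream, count):
    # repeat: expr with repeat-expr: count
    return [read_u1(stream) for _ in range(count)]

def repeat_eos(stream):
    # repeat: eos -> keep reading until no bytes are left
    return list(stream.read())

def repeat_until(stream, is_last):
    # repeat: until -> stop after the first element satisfying the condition
    items = []
    while True:
        item = read_u1(stream)
        items.append(item)
        if is_last(item):
            return items

data = bytes([3, 1, 2, 0, 9, 9])
print(repeat_expr(io.BytesIO(data), 2))                  # [3, 1]
print(repeat_eos(io.BytesIO(data)))                      # [3, 1, 2, 0, 9, 9]
print(repeat_until(io.BytesIO(data), lambda x: x == 0))  # [3, 1, 2, 0]
```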

Built-in integer types follow the pattern u/s (unsigned/signed) plus a byte width of 1, 2, 4, or 8, with an optional le/be suffix; f4/f8 are floats. Strings use str (with a size and encoding) or strz (zero-terminated).
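The naming pattern is regular enough to be decoded mechanically. As a rough illustration, the hypothetical helper below translates a type name into a format string for Python's struct module, falling back to a default byte order when the name carries no le/be suffix. It covers only the integer and float names listed above and is not part of any Kaitai Struct API.

```python
import struct

# struct codes for each base name: unsigned, signed, then floats
_CODES = {"u1": "B", "u2": "H", "u4": "I", "u8": "Q",
          "s1": "b", "s2": "h", "s4": "i", "s8": "q",
          "f4": "f", "f8": "d"}

def struct_format(type_name, default_endian="le"):
    """Translate a name such as 'u4', 's2be' or 'f8le' into a struct format."""
    endian = default_endian
    if type_name.endswith(("le", "be")):
        type_name, endian = type_name[:-2], type_name[-2:]
    prefix = "<" if endian == "le" else ">"
    return prefix + _CODES[type_name]

print(struct_format("u4"))                     # <I
print(struct_format("s2be"))                   # >h
print(struct.calcsize(struct_format("f8le")))  # 8
```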

seq:
  - id: magic
    contents: [0x41, 0x4e, 0x52] # "ANR"
  - id: name_len
    type: u1
  - id: name
    type: str
    size: name_len
  - id: species
    type: u1
    enum: species
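Read by hand, the four attributes above amount to roughly the following Python. This is an illustrative sketch of the read order, not what ksc emits; the function name parse_header is hypothetical, and UTF-8 is assumed as the string encoding (as in the meta example).

```python
import io
import struct

MAGIC = bytes([0x41, 0x4E, 0x52])  # "ANR"

def parse_header(stream):
    """Read magic, name_len, name and species in the order seq lists them."""
    magic = stream.read(3)
    if magic != MAGIC:
        # contents: the bytes must match exactly
        raise ValueError(f"bad magic: {magic!r}")
    (name_len,) = struct.unpack("<B", stream.read(1))
    # size: name_len -> the string length comes from the earlier field
    name = stream.read(name_len).decode("utf-8")
    (species,) = struct.unpack("<B", stream.read(1))
    return {"name_len": name_len, "name": name, "species": species}

sample = MAGIC + bytes([3]) + b"Rex" + bytes([2])
print(parse_header(io.BytesIO(sample)))
# {'name_len': 3, 'name': 'Rex', 'species': 2}
```

Each field depends only on fields read before it, which is what allows size to reference name_len.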

types — named subtypes

types is a map from a type name to a nested type definition. A nested type has the same structure as the top level (it can have its own seq, instances, types, enums, and doc) but does not need a meta block — it inherits settings such as endianness from its parent. Reference a subtype from a field by putting its name in type.

types:
  measurement:
    seq:
      - id: weight_grams
        type: u4
      - id: height_mm
        type: u2

A field referencing this subtype looks like type: measurement. If you also set a size, the subtype is parsed inside a bounded substream of that many bytes.
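The bounded-substream behaviour can be pictured like this: the outer stream hands over exactly size bytes, and the subtype parses inside its own stream built from that slice. The sketch is hand-written; parse_sized is a hypothetical helper, and little-endian order and a size of 8 are assumptions made for the example.

```python
import io
import struct

def parse_measurement(stream):
    """measurement: weight_grams (u4) then height_mm (u2), little-endian."""
    weight_grams, height_mm = struct.unpack("<IH", stream.read(6))
    return {"weight_grams": weight_grams, "height_mm": height_mm}

def parse_sized(stream, size, parse_fn):
    """Parse a subtype inside its own stream of exactly `size` bytes."""
    chunk = stream.read(size)
    if len(chunk) != size:
        raise EOFError("not enough bytes for substream")
    return parse_fn(io.BytesIO(chunk))

# 6 bytes of measurement, 2 bytes of padding, then one trailing byte
data = struct.pack("<IH", 4200, 250) + b"\x00\x00" + b"\xff"
outer = io.BytesIO(data)
details = parse_sized(outer, 8, parse_measurement)
print(details)       # {'weight_grams': 4200, 'height_mm': 250}
print(outer.tell())  # 8
```

The outer position advances by the full 8 bytes even though the subtype consumed only 6, so the next field starts at a predictable offset.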

instances — lazy and computed values

instances defines named values that are not part of the linear seq. There are two flavours:

  • Positional instances use pos (and usually type/size) to parse data at an arbitrary byte offset, for example a footer at the end of the file. They are parsed lazily — only when first accessed.
  • Value instances use value to compute a value from an expression, without reading any bytes.

instances:
  footer:
    pos: _io.size - 4
    type: u4
  is_heavy:
    value: details.weight_grams > 10000
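One way to picture the two flavours in Python is a cached property for the positional instance and a plain property for the value instance. The Record class below is a hypothetical stand-in, not generated code; it stores weight_grams directly instead of going through details, and the footer_reads counter exists only to make the laziness visible.

```python
import functools
import struct

class Record:
    """Hand-written stand-in for a parsed type; not ksc output."""

    def __init__(self, data, weight_grams):
        self._data = data
        self.weight_grams = weight_grams
        self.footer_reads = 0  # counts how often the footer is actually parsed

    @functools.cached_property
    def footer(self):
        # positional instance: pos = size - 4, type = u4 (little-endian assumed)
        self.footer_reads += 1
        return struct.unpack_from("<I", self._data, len(self._data) - 4)[0]

    @property
    def is_heavy(self):
        # value instance: computed from other fields, reads no bytes
        return self.weight_grams > 10000

rec = Record(b"payload" + struct.pack("<I", 0xCAFE), weight_grams=12000)
print(rec.footer_reads)        # 0
print(rec.footer, rec.footer)  # 51966 51966
print(rec.footer_reads)        # 1
print(rec.is_heavy)            # True
```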

enums — named constants

enums maps a name to a set of integer-to-symbol pairs. Attach an enum to an integer field with enum: (as shown in the seq example above). The generated code then exposes the symbolic name instead of a bare number.

enums:
  species:
    0: unknown
    1: cat
    2: dog
    3: bird
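In Python terms, the species enum corresponds closely to an enum.IntEnum. The sketch below is written by hand to show the idea; the to_species helper, and its choice to return the bare integer for codes that have no symbol, are assumptions of this example rather than documented compiler behaviour.

```python
import enum

class Species(enum.IntEnum):
    unknown = 0
    cat = 1
    dog = 2
    bird = 3

def to_species(value):
    """Return the matching Species member, or the raw int if none exists."""
    try:
        return Species(value)
    except ValueError:
        return value

print(to_species(2).name)  # dog
print(to_species(7))       # 7
```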

doc — documentation

doc attaches a free-text description to a type or attribute. It is carried into the generated code as comments/docstrings. Use doc-ref for a link or citation to an external specification.

doc: |
  A minimal record describing a single animal, used to illustrate
  the structure of a .ksy file.

A small complete example

Illustrative example

The format below is invented for this page to show how the pieces connect — it is not a real-world file format. For real, production-quality specs, browse the official Kaitai Struct format gallery.

meta:
  id: animal_record
  title: Animal Record (illustrative)
  file-extension: anr
  endian: le
  encoding: UTF-8
doc: |
  A toy format: a 3-byte magic, a length-prefixed name, a species code,
  and a measurement block. Used only to demonstrate .ksy structure.
seq:
  - id: magic
    contents: [0x41, 0x4e, 0x52] # "ANR"
  - id: name_len
    type: u1
  - id: name
    type: str
    size: name_len
  - id: species
    type: u1
    enum: species
  - id: details
    type: measurement
instances:
  is_heavy:
    value: details.weight_grams > 10000
types:
  measurement:
    seq:
      - id: weight_grams
        type: u4
      - id: height_mm
        type: u2
enums:
  species:
    0: unknown
    1: cat
    2: dog
    3: bird
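To experiment with this toy format you need some bytes that follow it. The sketch below builds one record by hand with Python's struct module; build_animal_record is a hypothetical helper and the sample values are invented.

```python
import struct

def build_animal_record(name, species, weight_grams, height_mm):
    """Encode one record in the toy layout: magic, name_len, name, species, details."""
    name_bytes = name.encode("utf-8")
    if len(name_bytes) > 255:
        raise ValueError("name_len is a u1, so the name must fit in 255 bytes")
    return (
        b"ANR"                                          # magic
        + struct.pack("<B", len(name_bytes))            # name_len (u1)
        + name_bytes                                    # name
        + struct.pack("<B", species)                    # species (u1)
        + struct.pack("<IH", weight_grams, height_mm)   # details (u4 + u2, le)
    )

blob = build_animal_record("Rex", 2, 12000, 600)
print(blob.hex(" "))
# 41 4e 52 03 52 65 78 02 e0 2e 00 00 58 02
print(len(blob))  # 14
```

With a weight of 12000 grams, the is_heavy value instance of the description above would evaluate to true for this record.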

Compiling a .ksy file

Once the description is written, run the compiler and pick a target language with -t:

ksc -t python animal_record.ksy

Use -d <directory> to choose an output directory, and -t all to generate parsers for every supported language at once.
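If you drive the compiler from a build script, the invocation can be assembled programmatically. The sketch below uses only the -t and -d options described above; ksc_command is a hypothetical helper, "generated" is an arbitrary output directory, and the compiler is run only if both ksc and the .ksy file are present.

```python
import os
import shutil
import subprocess

def ksc_command(ksy_path, target, outdir=None):
    """Build the argument list for a ksc run using only -t and -d."""
    cmd = ["ksc", "-t", target]
    if outdir is not None:
        cmd += ["-d", outdir]
    cmd.append(ksy_path)
    return cmd

cmd = ksc_command("animal_record.ksy", "python", outdir="generated")
print(cmd)
# ['ksc', '-t', 'python', '-d', 'generated', 'animal_record.ksy']

# Only run the compiler when it is installed and the input file exists.
if shutil.which("ksc") and os.path.exists("animal_record.ksy"):
    subprocess.run(cmd, check=True)
```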

Parsing and serialization

Generated code parses binary data by default (the _read step). Kaitai Struct also supports serialization — writing data back out from the same .ksy description — in selected languages such as Java and Python. The write path adds a _check step (consistency validation) before _write emits bytes. See the serialization guide linked below.
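The read, check, write flow can be illustrated with a toy class for a single length-prefixed name. This is not the Kaitai Struct runtime and does not reproduce its API; NameField is hypothetical, and only the three step names are borrowed from the description above.

```python
import io
import struct

class NameField:
    """Toy length-prefixed name with read, check and write steps."""

    def __init__(self):
        self.name_len = 0
        self.name = ""

    def _read(self, stream):
        (self.name_len,) = struct.unpack("<B", stream.read(1))
        self.name = stream.read(self.name_len).decode("utf-8")

    def _check(self):
        # consistency: the stored length must match the encoded name
        if self.name_len != len(self.name.encode("utf-8")):
            raise ValueError("name_len does not match name")

    def _write(self, stream):
        stream.write(struct.pack("<B", self.name_len))
        stream.write(self.name.encode("utf-8"))

field = NameField()
field._read(io.BytesIO(b"\x03Rex"))
field.name = "Tiger"          # edit one field and forget the length
try:
    field._check()
except ValueError as exc:
    print(exc)                # name_len does not match name
field.name_len = 5            # fix the dependent field
field._check()
out = io.BytesIO()
field._write(out)
print(out.getvalue())         # b'\x05Tiger'
```

The point of a separate check step is that dependent fields such as name_len are not silently recomputed; an inconsistency is reported before any bytes are emitted.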

Sources