Anatomy of a .ksy file
A .ksy file is a declarative description of a binary format. It is written in
YAML, and the Kaitai Struct compiler (ksc, also called
kaitai-struct-compiler) reads it to generate a parser library in a target
language. The same description can be reused across many languages, so you write
the format once and get parsers for Java, Python, C++, JavaScript, Ruby, Go, and
other targets out of it.
This page walks through the top-level keys you will see in nearly every .ksy
file and shows how they fit together.
Because .ksy is YAML, indentation is significant and keys are case-sensitive.
Several keys use hyphens (for example file-extension, size-eos,
repeat-expr), not underscores or camelCase.
The top-level keys at a glance
A .ksy document is a YAML map. At the top level, the keys below are the ones
you will use most often. A useful spec needs only meta (with an id) and at least
one of seq or instances; the rest are optional.
| Key | Purpose |
|---|---|
| meta | Metadata about the format: its id, default endianness, extensions, imports, encoding. |
| seq | The sequence of attributes (fields), read in order from the stream. |
| instances | Named values parsed lazily at an arbitrary position, or computed values. |
| types | Named subtypes (nested structures) used by seq and instances. |
| enums | Named sets of integer constants you can attach to fields. |
| doc | Free-text documentation for the type. |
meta — format metadata
meta holds information about the description as a whole. Common sub-keys:
| Sub-key | Meaning |
|---|---|
| id | The identifier of the format. Used to name the generated top-level type. Must be a lowercase, underscore-separated name. |
| endian | Default byte order for multi-byte integers: le (little-endian) or be (big-endian). |
| file-extension | The file extension(s) commonly associated with this format. |
| imports | A list of other .ksy files whose types you want to reuse. |
| encoding | The default character encoding for string fields (for example UTF-8, ASCII). |
Other useful sub-keys include title (a human-readable name), license (an
SPDX license identifier), ks-version (the minimum compiler version needed),
bit-endian (default bit order), and xref (cross-references to external
registries such as MIME, RFC, or Wikidata).
meta:
  id: animal_record
  title: Animal Record
  file-extension: anr
  endian: le
  encoding: UTF-8
endian: le in meta sets the default for the whole file, so you do not have
to write type: u4le on every field — a plain type: u4 inherits the default.
You can still override the default on individual fields when needed.
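To see why the byte order matters, here is a minimal Python sketch using the standard struct module. It is not Kaitai-generated code; it only shows how the same four bytes decode differently as little-endian (the u4le case) and big-endian (the u4be case). The variable names are made up for illustration.

```python
import struct

raw = bytes([0x01, 0x00, 0x00, 0x00])

# "<I" decodes an unsigned 32-bit integer as little-endian (like u4le).
as_le = struct.unpack("<I", raw)[0]

# ">I" decodes the same four bytes as big-endian (like u4be).
as_be = struct.unpack(">I", raw)[0]

print(as_le)  # 1
print(as_be)  # 16777216
```

With endian: le in meta, a plain u4 behaves like the first call.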
seq — the sequence of attributes
seq is an ordered list of attributes. Each attribute is read from the stream
one after another, in the order written. An attribute is a map; the keys you
will use most are:
| Attribute key | Meaning |
|---|---|
| id | The field name (used in the generated code). |
| type | The data type: a built-in (u1, s4, f8, str, strz, …) or a name from types. |
| size | Number of bytes to read (a literal or an expression referencing earlier fields). |
| contents | Fixed/magic bytes the parser must match exactly (a byte array or string). |
| repeat | Repeat the field: expr (a count), eos (until end of stream), or until (until a condition). |
| repeat-expr | The count expression, used with repeat: expr. |
| if | Parse this field only when the expression is true. |
| enum | Interpret this integer field using a named enum. |
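
The three repeat modes in the table can be pictured as ordinary loops. The Python sketch below is hand-written for illustration and is not what ksc emits; the helper names (read_u1, repeat_expr, repeat_eos, repeat_until) are hypothetical, and each item is assumed to be a single u1 byte. In a .ksy file the until condition is written in repeat-until, with _ standing for the item just read.

```python
import io

def read_u1(stream):
    """Read one unsigned byte, failing loudly if the stream is exhausted."""
    chunk = stream.read(1)
    if len(chunk) != 1:
        raise EOFError("unexpected end of stream")
    return chunk[0]

def repeat_expr(stream, count):
    # repeat: expr -> read exactly `count` items (the repeat-expr value).
    return [read_u1(stream) for _ in range(count)]

def repeat_eos(stream):
    # repeat: eos -> keep reading until no bytes remain.
    items = []
    while True:
        chunk = stream.read(1)
        if not chunk:
            return items
        items.append(chunk[0])

def repeat_until(stream, stop):
    # repeat: until -> read items until stop(item) is true.
    # The item that satisfies the condition is kept in the result.
    items = []
    while True:
        item = read_u1(stream)
        items.append(item)
        if stop(item):
            return items

data = bytes([3, 7, 0, 9])
first_two = repeat_expr(io.BytesIO(data), 2)
everything = repeat_eos(io.BytesIO(data))
up_to_zero = repeat_until(io.BytesIO(data), lambda item: item == 0)
```

Here first_two holds two items, everything holds all four, and up_to_zero stops at (and includes) the zero byte.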
Built-in integer types follow the pattern u/s (unsigned/signed) plus a byte
width of 1, 2, 4, or 8. Multi-byte widths accept an optional le/be suffix
(single-byte u1 and s1 never need one); f4/f8 are floats and accept the same
suffix. Strings use str (with a size and encoding) or strz (zero-terminated).
seq:
  - id: magic
    contents: [0x41, 0x4e, 0x52] # "ANR"
  - id: name_len
    type: u1
  - id: name
    type: str
    size: name_len
  - id: species
    type: u1
    enum: species
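As a rough mental model, the seq above corresponds to reading the stream field by field, with later fields free to use earlier ones (name uses name_len). The following Python sketch is written by hand to mirror that order; it is not the code ksc generates. The function name parse_header is hypothetical, and UTF-8 is assumed as the string encoding, matching the meta example earlier on this page.

```python
import io

MAGIC = b"ANR"  # the same bytes as contents: [0x41, 0x4e, 0x52]

def parse_header(data):
    stream = io.BytesIO(data)

    # contents: the bytes must match exactly, otherwise parsing fails.
    magic = stream.read(3)
    if magic != MAGIC:
        raise ValueError(f"bad magic: {magic!r}")

    # type: u1 -> one unsigned byte.
    name_len = stream.read(1)[0]

    # type: str with size: name_len -> that many bytes, then decoded.
    name = stream.read(name_len).decode("utf-8")

    # type: u1 again; the enum mapping is left out of this sketch.
    species = stream.read(1)[0]

    return {"name_len": name_len, "name": name, "species": species}

sample = b"ANR" + bytes([3]) + b"Rex" + bytes([2])
header = parse_header(sample)
```

The sketch skips end-of-stream checks for brevity; a real parser reports truncated input.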
types — named subtypes
types is a map from a type name to a nested type definition. A nested type
has the same structure as the top level (it can have its own seq, instances,
types, enums, and doc) but does not need a meta block — it inherits
settings such as endianness from its parent. Reference a subtype from a field
by putting its name in type.
types:
  measurement:
    seq:
      - id: weight_grams
        type: u4
      - id: height_mm
        type: u2
A field referencing this subtype looks like type: measurement. If you also set
a size, the subtype is parsed inside a bounded substream of that many bytes.
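The effect of adding a size can be sketched in plain Python: the outer parser cuts out exactly that many bytes, hands them to the subtype as a separate stream, and then continues after the cut regardless of how much the subtype consumed. This is an illustration, not generated code; parse_measurement and parse_sized are hypothetical names, and little-endian is assumed as the inherited default.

```python
import io
import struct

def parse_measurement(stream):
    # weight_grams (u4) then height_mm (u2), little-endian by assumption.
    weight_grams, height_mm = struct.unpack("<IH", stream.read(6))
    return weight_grams, height_mm

def parse_sized(outer, size):
    # Cut `size` bytes out of the outer stream and parse inside them only.
    substream = io.BytesIO(outer.read(size))
    return parse_measurement(substream)

# 6 bytes of measurement, 2 bytes of padding, then one unrelated byte.
outer = io.BytesIO(struct.pack("<IH", 4200, 250) + b"\x00\x00" + b"\xff")
parsed = parse_sized(outer, 8)

# The outer stream resumes after the 8-byte block, padding included.
next_byte = outer.read(1)
```

The subtype cannot read past its 8 bytes, and the two padding bytes it ignored do not shift the fields that follow.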
instances — lazy and computed values
instances defines named values that are not part of the linear seq. There
are two flavours:
- Positional instances use pos (and usually type/size) to parse data at an arbitrary byte offset, for example a footer at the end of the file. They are parsed lazily, only when first accessed.
- Value instances use value to compute a value from an expression, without reading any bytes.
instances:
  footer:
    pos: _io.size - 4
    type: u4
  is_heavy:
    value: details.weight_grams > 10000
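The two flavours can be pictured with a small hand-written Python class. This is a sketch of the idea only, not the shape of ksc output: the class name Record and the reads counter are hypothetical, and weight_grams is read directly at the top level (rather than through details) to keep the example short.

```python
import io
import struct
from functools import cached_property

class Record:
    def __init__(self, data):
        self._io = io.BytesIO(data)
        self.reads = 0  # counts how often the footer is actually parsed
        # Stand-in for seq: one little-endian u4 at the start.
        self.weight_grams = struct.unpack("<I", self._io.read(4))[0]

    @cached_property
    def footer(self):
        # Positional instance (pos: _io.size - 4, type: u4):
        # seek, read, then restore the previous stream position.
        self.reads += 1
        saved = self._io.tell()
        self._io.seek(len(self._io.getvalue()) - 4)
        value = struct.unpack("<I", self._io.read(4))[0]
        self._io.seek(saved)
        return value

    @property
    def is_heavy(self):
        # Value instance: computed from other fields, reads no bytes.
        return self.weight_grams > 10000

data = struct.pack("<I", 12000) + b"pad" + struct.pack("<I", 99)
rec = Record(data)
reads_before = rec.reads
first = rec.footer
second = rec.footer
```

Nothing touches the footer until it is first accessed, and the second access reuses the cached result.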
enums — named constants
enums maps a name to a set of integer-to-symbol pairs. Attach an enum to an
integer field with enum: (as shown in the seq example above). The generated
code then exposes the symbolic name instead of a bare number.
enums:
  species:
    0: unknown
    1: cat
    2: dog
    3: bird
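In Python terms, the mapping above resembles an integer-backed enum class. The sketch below uses the standard enum module to show the idea; the exact class that ksc generates for a given target language may differ, so treat the name Species and its layout as an assumption.

```python
from enum import IntEnum

class Species(IntEnum):
    unknown = 0
    cat = 1
    dog = 2
    bird = 3

raw = 2                 # the integer read from the stream (type: u1)
species = Species(raw)  # the symbolic value exposed to the caller

print(species.name)     # dog
```

Code that uses the parser can then compare against Species.dog instead of the bare number 2.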
doc — documentation
doc attaches a free-text description to a type or attribute. It is carried
into the generated code as comments/docstrings. Use doc-ref for a link or
citation to an external specification.
doc: |
  A minimal record describing a single animal, used to illustrate
  the structure of a .ksy file.
A small complete example
The format below is invented for this page to show how the pieces connect — it is not a real-world file format. For real, production-quality specs, browse the official Kaitai Struct format gallery.
meta:
  id: animal_record
  title: Animal Record (illustrative)
  file-extension: anr
  endian: le
  encoding: UTF-8
doc: |
  A toy format: a 3-byte magic, a length-prefixed name, a species code,
  and a measurement block. Used only to demonstrate .ksy structure.
seq:
  - id: magic
    contents: [0x41, 0x4e, 0x52] # "ANR"
  - id: name_len
    type: u1
  - id: name
    type: str
    size: name_len
  - id: species
    type: u1
    enum: species
  - id: details
    type: measurement
instances:
  is_heavy:
    value: details.weight_grams > 10000
types:
  measurement:
    seq:
      - id: weight_grams
        type: u4
      - id: height_mm
        type: u2
enums:
  species:
    0: unknown
    1: cat
    2: dog
    3: bird
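To experiment with the toy format you need some sample bytes. The Python sketch below builds one record by hand, following the field order of the spec above. The helper name build_animal_record is hypothetical and the code is unrelated to Kaitai Struct's own serialization support; it only uses the standard struct module.

```python
import struct

def build_animal_record(name, species, weight_grams, height_mm):
    encoded = name.encode("utf-8")
    if len(encoded) > 255:
        # name_len is a u1, so it cannot describe more than 255 bytes.
        raise ValueError("name must fit in 255 bytes")
    return (
        b"ANR"                                         # magic
        + struct.pack("<B", len(encoded))              # name_len (u1)
        + encoded                                      # name
        + struct.pack("<B", species)                   # species (u1)
        + struct.pack("<IH", weight_grams, height_mm)  # details (u4, u2, le)
    )

record = build_animal_record("Rex", 2, 12000, 600)
print(record.hex())
```

Writing record to a file with the .anr extension gives you input for a parser generated from the spec; with a weight of 12000 grams, is_heavy should evaluate to true.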
Compiling a .ksy file
Once the description is written, run the compiler and pick a target language
with -t:
ksc -t python animal_record.ksy
Use -d <directory> to choose an output directory, and -t all to generate
parsers for every supported language at once.
Generated code parses binary data by default (the _read step). Kaitai Struct
also supports serialization — writing data back out from the same .ksy
description — in selected languages such as Java and Python. The write path adds
a _check step (consistency validation) before _write emits bytes. See the
serialization guide linked below.