Introduction
Kaitai Struct is a declarative language for describing
binary data structures — the layouts found in files or in memory. You describe a
format once in a .ksy file (which is plain YAML), then compile that single
description into parsing code for many programming languages. The generated code
turns raw bytes into a clean, typed object tree, so you never hand-write the
repetitive, error-prone byte-shuffling logic that binary parsing usually
requires.
The core idea is "describe once, parse anywhere." A single .ksy specification
becomes a reader in C++, Python, Java, JavaScript, and more, all kept in sync
because they are generated from the same source of truth.
What problem does it solve?
Parsing binary formats is hard. You have to track endianness, alignment, variable-length fields, conditional sections, and cross-references — and then repeat that work in every language your team uses. Kaitai Struct lifts the format description out of imperative code and into a structured specification, leaving the boilerplate to the compiler.
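To make one of these pitfalls concrete, here is a minimal hand-written sketch that uses only Python's standard `struct` module. It is not Kaitai-generated code, and the layout (a 16-bit length followed by that many payload bytes) is a made-up example. It shows how reading the same bytes with the wrong endianness silently produces a different length.

```python
import struct

# Hypothetical layout for illustration: u2 length, then `length` payload bytes.
data = bytes([0x00, 0x03]) + b"abc"

# Intended interpretation: the length is a big-endian 16-bit integer.
(length_be,) = struct.unpack_from(">H", data, 0)
payload = data[2:2 + length_be]

# The same two bytes read as little-endian give a very different length.
(length_le,) = struct.unpack_from("<H", data, 0)

print(length_be, payload)  # 3 b'abc'
print(length_le)           # 768
```

Every hand-written reader has to get this kind of detail right, in every language, which is the repetition a declarative spec is meant to remove.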
The .ksy format
A .ksy file is a YAML document with a small set of top-level keys.
| Key | Purpose |
|---|---|
| meta | Metadata about the format: its id, default endian, encoding, etc. |
| seq | An ordered sequence of fields (attributes) parsed one after another. |
| types | User-defined named subtypes, each of which can have its own seq, types, and so on. |
| instances | Data that lies outside the normal sequential flow, or is loaded only on demand. |
| enums | Named symbolic constants mapped to integer values. |
| doc | Human-readable documentation embedded in the spec. |
Each attribute inside a seq can carry keys such as id, type, size,
contents (a fixed magic value), repeat, and if (a conditional). A small
illustrative example of a fixed-size record:
# Illustrative example: a fixed-size "animal record"
meta:
  id: animal_record
  endian: be
seq:
  - id: uuid
    size: 16
  - id: name
    type: str
    size: 24
    encoding: UTF-8
The full set of YAML keys is documented in the KSY reference and visualized in the KSY syntax diagram.
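For intuition, the following is a hand-written approximation of what a parser for the animal_record example above has to do. It is a sketch using only the Python standard library, not the output of ksc; the names `AnimalRecord`, `parse_animal_record`, and `read_exact` are hypothetical. It assumes the name field is decoded as-is, so any padding bytes remain part of the string.

```python
import io
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class AnimalRecord:
    uuid: bytes  # 16 raw bytes
    name: str    # 24 bytes decoded as UTF-8 (padding, if any, is kept)


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes or fail, so truncated input is not silently accepted."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError(f"requested {n} bytes, got {len(data)}")
    return data


def parse_animal_record(stream: BinaryIO) -> AnimalRecord:
    # Fields are read in the same order as the seq entries in the spec.
    uuid = read_exact(stream, 16)
    name = read_exact(stream, 24).decode("utf-8")
    return AnimalRecord(uuid, name)


# Sample input: 16 UUID bytes followed by a space-padded 24-byte name.
sample = bytes(range(16)) + b"Fluffy".ljust(24, b" ")
record = parse_animal_record(io.BytesIO(sample))
print(record.name.rstrip())  # Fluffy
```

Writing this once is easy; the value of the .ksy description is that the same eleven lines of YAML replace this function in every target language.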
The compiler: ksc
The Kaitai Struct compiler, ksc (also distributed as
kaitai-struct-compiler), reads a .ksy file and emits source code in a target
language. The basic invocation is:
ksc [options] <file>...
For example, to generate a Java parser into an output directory:
ksc -t java -d output animal_record.ksy
Commonly used options:
| Option | Meaning |
|---|---|
| -t <language> | Target language (or all to generate every supported language). |
| -d <directory> | Output directory for generated files. |
| --java-package <package> | Package name for generated Java code. |
| --dotnet-namespace <ns> | Namespace for generated C# code. |
| --verbose | Verbose compiler output. |
Pass -t all to emit parsers for every supported language at once — handy when a
format definition is consumed by multiple codebases.
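If the compiler is driven from a build script, the invocation can be assembled programmatically. The sketch below is one possible approach, not an official wrapper: `build_ksc_command` is a hypothetical helper, and it only uses the `-t` and `-d` options described above. It assumes `ksc` is on the PATH when the command is eventually run.

```python
import shlex


def build_ksc_command(spec_files, target, outdir):
    """Build an argv list of the form: ksc -t <target> -d <outdir> <file>..."""
    if not spec_files:
        raise ValueError("at least one .ksy file is required")
    return ["ksc", "-t", target, "-d", outdir, *spec_files]


cmd = build_ksc_command(["animal_record.ksy"], target="java", outdir="output")
print(shlex.join(cmd))  # ksc -t java -d output animal_record.ksy

# To actually run the compiler (requires ksc to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing an argv list rather than a single shell string avoids quoting problems when file names contain spaces.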
Target languages
Kaitai Struct compiles a single .ksy description into parsers for 12
languages:
C++, C#, Go, Java, JavaScript, Lua, Nim, Perl, PHP, Python, Ruby, and Rust.
Each generated parser pairs with a small runtime library for that language, which provides the stream-reading primitives the generated code calls into.
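To give a feel for what "stream-reading primitives" means, here is a toy stand-in written with the Python standard library. `MiniStream` is hypothetical and is not the real runtime; its method names are illustrative and should not be taken as any runtime's actual API. The point is only that generated code can stay small because typed reads and bounds checks live in one shared class.

```python
import io
import struct


class MiniStream:
    """Toy stand-in for a runtime stream class (not the real Kaitai runtime)."""

    def __init__(self, data: bytes):
        self._io = io.BytesIO(data)
        self._size = len(data)

    def read_bytes(self, n: int) -> bytes:
        chunk = self._io.read(n)
        if len(chunk) != n:
            raise EOFError(f"requested {n} bytes, only {len(chunk)} available")
        return chunk

    def read_u1(self) -> int:
        return self.read_bytes(1)[0]

    def read_u2be(self) -> int:
        # Unsigned 16-bit integer, big-endian.
        return struct.unpack(">H", self.read_bytes(2))[0]

    def read_u4le(self) -> int:
        # Unsigned 32-bit integer, little-endian.
        return struct.unpack("<I", self.read_bytes(4))[0]

    def is_eof(self) -> bool:
        return self._io.tell() >= self._size


stream = MiniStream(b"\x01\x00\x02\x03\x00\x00\x00")
values = (stream.read_u1(), stream.read_u2be(), stream.read_u4le())
print(values, stream.is_eof())  # (1, 2, 3) True
```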
From spec to parsers
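The flow from a format description to working code has three steps: you write a .ksy specification, ksc compiles it into source code for each target language, and the generated code, together with that language's runtime library, parses raw bytes into an object tree.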
Reading and writing
Kaitai Struct originally focused on parsing: reading bytes into objects. It now
also supports serialization, which goes the other way and writes structured data
back out as bytes that conform to the format. To serialize, you set field values
on a generated object, call _check() to validate consistency, and then call
_write() to emit the bytes through a KaitaiStream.
Parsing is supported in all target languages; serialization is currently limited to Java and Python. See the serialization guide for details.
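The set, check, write sequence can be illustrated with a hand-written stand-in. The class below is hypothetical and is not generated by ksc: `LengthPrefixed` and its layout (a big-endian 16-bit length followed by the payload) are assumptions made for this sketch, and it writes to a plain `io.BytesIO` rather than a KaitaiStream. Only the method names `_check()` and `_write()` are borrowed from the description above.

```python
import io
import struct


class LengthPrefixed:
    """Hypothetical hand-written object: u2 big-endian length + payload bytes."""

    def __init__(self):
        self.length = 0
        self.payload = b""

    def _check(self):
        # Consistency check: the declared length must match the real payload.
        if self.length != len(self.payload):
            raise ValueError(
                f"length is {self.length} but payload has {len(self.payload)} bytes"
            )

    def _write(self, stream):
        stream.write(struct.pack(">H", self.length))
        stream.write(self.payload)


# 1. Set field values.
obj = LengthPrefixed()
obj.payload = b"abc"
obj.length = 3

# 2. Validate consistency before emitting anything.
obj._check()

# 3. Write the bytes.
out = io.BytesIO()
obj._write(out)
encoded = out.getvalue()
print(encoded)  # b'\x00\x03abc'
```

Running the check before the write means an inconsistent object fails loudly instead of producing a file that no parser can read back.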
The ecosystem
Kaitai Struct is more than a compiler. The surrounding tooling includes:
| Component | What it is |
|---|---|
| ksc | The compiler that turns .ksy files into target-language code. |
| Runtime libraries | Small per-language modules the generated parsers depend on. |
| Web IDE | A browser-based format editor and debugger at ide.kaitai.io. |
| Format gallery | A collection of ready-made format definitions at formats.kaitai.io. |
| Visualizer (ksv) | A console tool for testing a format against real data. |
| ksdump | A utility for dumping parsed structures. |
The format gallery alone hosts well over a hundred maintained specifications across categories such as archives (gzip, zip, rar), executables and byte-code (ELF, Mach-O, PE, Java class files), images (PNG, JPEG, GIF, BMP), multimedia (WAV, AVI, Ogg), network protocols (DNS, TCP/IP, TLS), filesystems (ISO 9660, ext2, GPT), and more.
Who uses it?
Kaitai Struct is used wherever binary data has to be understood reliably:
- Reverse engineering — documenting and parsing undocumented or proprietary formats and protocols.
- Digital forensics — extracting structured evidence from disk images, filesystems, and application artifacts.
- File format work — building robust, multi-language readers for established formats without re-implementing the parser per language.
By centering everything on one declarative description, teams keep their parsers consistent across languages and tools — which is exactly what makes the format gallery, Web IDE, and visualizer interoperate so cleanly.