
Introduction

Kaitai Struct is a declarative language for describing binary data structures — the layouts found in files or in memory. You describe a format once in a .ksy file (which is plain YAML), then compile that single description into parsing code for many programming languages. The generated code turns raw bytes into a clean, typed object tree, so you never hand-write the repetitive, error-prone byte-shuffling logic that binary parsing usually requires.

Note: The core idea is "describe once, parse anywhere". A single .ksy specification becomes a reader in C++, Python, Java, JavaScript, and more, all kept in sync because they are generated from the same source of truth.

What problem does it solve?

Parsing binary formats is hard. You have to track endianness, alignment, variable-length fields, conditional sections, and cross-references — and then repeat that work in every language your team uses. Kaitai Struct lifts the format description out of imperative code and into a structured specification, leaving the boilerplate to the compiler.
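To make the endianness point concrete, here is a minimal hand-written sketch using only Python's standard `struct` module. It is not Kaitai Struct output; it only illustrates the kind of detail a parser has to get right for every field, in every language.

```python
import struct

# The same four bytes, read as an unsigned 32-bit integer in both byte orders.
raw = b"\x00\x00\x01\x00"

(as_big,) = struct.unpack(">I", raw)     # big-endian: 0x00000100
(as_little,) = struct.unpack("<I", raw)  # little-endian: 0x00010000

print(as_big, as_little)  # prints "256 65536"
```

Choosing the wrong byte order raises no error. It silently yields a different number, which is why declaring it once in a specification is attractive.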

The .ksy format

A .ksy file is a YAML document with a small set of top-level keys.

| Key | Purpose |
| --- | --- |
| meta | Metadata about the format: its id, default endian, encoding, etc. |
| seq | An ordered sequence of fields (attributes) parsed one after another. |
| types | User-defined named subtypes, each of which can have its own seq, types, and so on. |
| instances | Data that lies outside the normal sequential flow, or is loaded only on demand. |
| enums | Named symbolic constants mapped to integer values. |
| doc | Human-readable documentation embedded in the spec. |
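The difference between seq and instances can be sketched in plain Python. The example below is a hand-written illustration of the idea (eager sequential reads versus a lazy read at an arbitrary offset), not code produced by ksc. The format, the class name `ToyFile`, and the field names are all hypothetical.

```python
import io
import struct
from functools import cached_property


class ToyFile:
    """Hypothetical layout: a 2-byte big-endian offset, then a payload.
    One trailer byte lives at the position the offset points to."""

    def __init__(self, stream):
        self._io = stream
        # seq-style field: read immediately, in order, when the object is built.
        (self.trailer_ofs,) = struct.unpack(">H", stream.read(2))

    @cached_property
    def trailer(self):
        # instances-style field: nothing is read until first access.
        saved = self._io.tell()
        self._io.seek(self.trailer_ofs)
        value = self._io.read(1)
        self._io.seek(saved)  # restore the position for sequential reads
        return value


data = b"\x00\x05" + b"abc" + b"\xff"
toy = ToyFile(io.BytesIO(data))
print(toy.trailer_ofs, toy.trailer)
```

Here `trailer_ofs` is available as soon as the object exists, while `trailer` is fetched and cached only when something asks for it.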

Each attribute inside a seq can carry keys such as id, type, size, contents (a fixed magic value), repeat, and if (a conditional). A small illustrative example of a fixed-size record:

# Illustrative example: a fixed-size "animal record"
meta:
  id: animal_record
  endian: be
seq:
  - id: uuid
    size: 16
  - id: name
    type: str
    size: 24
    encoding: UTF-8

Info: The full set of YAML keys is documented in the KSY reference and visualized in the KSY syntax diagram.
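For intuition, the animal_record layout above corresponds roughly to the following hand-written Python reader. This is a sketch of the equivalent logic, not the code ksc would generate. The class and function names are assumptions made for illustration.

```python
import io
from dataclasses import dataclass


@dataclass
class AnimalRecord:
    uuid: bytes  # 16 raw bytes (a sized field with no type stays a byte array)
    name: str    # 24 bytes decoded as UTF-8


def parse_animal_record(stream):
    """Read the two fields in the order the seq lists them."""
    raw_uuid = stream.read(16)
    raw_name = stream.read(24)
    if len(raw_uuid) != 16 or len(raw_name) != 24:
        raise EOFError("truncated animal record")
    return AnimalRecord(uuid=raw_uuid, name=raw_name.decode("utf-8"))


# Build a 40-byte sample: 16 uuid bytes, then "Rex" padded with zero bytes.
sample = bytes(range(16)) + "Rex".encode("utf-8").ljust(24, b"\x00")
record = parse_animal_record(io.BytesIO(sample))
print(len(record.uuid), repr(record.name.rstrip("\x00")))
```

The sketch decodes all 24 bytes of the name, padding included, because the specification above declares only a size and an encoding. Stripping the padding is left to the caller here.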

The compiler: ksc

The Kaitai Struct compiler, ksc (also distributed as kaitai-struct-compiler), reads a .ksy file and emits source code in a target language. The basic invocation is:

ksc [options] <file>...

For example, to generate a Java parser into an output directory:

ksc -t java -d output animal_record.ksy

Commonly used options:

| Option | Meaning |
| --- | --- |
| -t <language> | Target language (or all to generate every supported language). |
| -d <directory> | Output directory for generated files. |
| --java-package <package> | Package name for generated Java code. |
| --dotnet-namespace <ns> | Namespace for generated C# code. |
| --verbose | Verbose compiler output. |

Tip: Pass -t all to emit parsers for every supported language at once, which is handy when a format definition is consumed by multiple codebases.
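If the compiler is driven from a build script, the invocation shown earlier can be wrapped in a few lines of Python. This is a minimal sketch that assumes the executable is reachable on PATH under the name ksc. The helper names `build_ksc_command` and `compile_spec` are hypothetical, and only the -t and -d options from the table are used.

```python
import shutil
import subprocess


def build_ksc_command(spec_path, target, out_dir):
    """Assemble the argument list: ksc -t <target> -d <out_dir> <spec>."""
    return ["ksc", "-t", target, "-d", out_dir, spec_path]


def compile_spec(spec_path, target, out_dir):
    """Run the compiler if it is installed; return None otherwise."""
    cmd = build_ksc_command(spec_path, target, out_dir)
    if shutil.which(cmd[0]) is None:
        return None  # assumption: ksc is expected on PATH
    return subprocess.run(cmd, check=True)


command = build_ksc_command("animal_record.ksy", "java", "output")
print(" ".join(command))  # prints "ksc -t java -d output animal_record.ksy"
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.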

Target languages

Kaitai Struct compiles a single .ksy description into parsers for 12 languages:

C++, C#, Go, Java, JavaScript, Lua, Nim, Perl, PHP, Python, Ruby, and Rust.

Each generated parser pairs with a small runtime library for that language, which provides the stream-reading primitives the generated code calls into.
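As an illustration of what "stream-reading primitives" means, the toy class below wraps a byte buffer and exposes a few fixed-width reads. It is a stand-in written for this page, not the real runtime library. The class name `ToyStream` is hypothetical, and the method names are merely chosen to suggest the style such a runtime might use.

```python
import io
import struct


class ToyStream:
    """Toy stand-in for a runtime stream: typed reads over a byte buffer."""

    def __init__(self, raw):
        self._io = io.BytesIO(raw)

    def read_bytes(self, n):
        data = self._io.read(n)
        if len(data) != n:
            raise EOFError(f"requested {n} bytes, got {len(data)}")
        return data

    def read_u2le(self):
        # unsigned 16-bit integer, little-endian
        return struct.unpack("<H", self.read_bytes(2))[0]

    def read_u4be(self):
        # unsigned 32-bit integer, big-endian
        return struct.unpack(">I", self.read_bytes(4))[0]


stream = ToyStream(b"\x00\x00\x01\x00\x34\x12AB")
first = stream.read_u4be()
second = stream.read_u2le()
rest = stream.read_bytes(2)
print(first, hex(second), rest)
```

Generated code can then stay short: each field in a seq becomes roughly one call into a helper of this kind.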

From spec to parsers

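The flow from a format description to working code is: you write a .ksy specification, ksc compiles it into source code for each target language, and that generated code, together with the language's runtime library, parses raw bytes into a typed object tree.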

Reading and writing

Historically Kaitai Struct focused on parsing (reading bytes into objects). It also supports serialization — going the other way and writing structured data back out to bytes that conform to the format. Serialization is available for Java and Python, and works by setting field values on a generated object, calling _check() to validate consistency, and then _write() to emit the bytes through a KaitaiStream.

Note: Parsing is supported in all target languages; serialization is currently limited to Java and Python. See the serialization guide for details.
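The set-fields, check, then write sequence can be sketched with a hand-written class for the animal_record layout. This is not generated code: the class name `AnimalRecordWriter` is hypothetical, the method names `_check` and `_write` are borrowed from the description above, and the output goes to a plain `io.BytesIO` rather than a KaitaiStream.

```python
import io


class AnimalRecordWriter:
    """Hand-written stand-in that mimics the check-then-write flow."""

    def __init__(self):
        self.uuid = b""
        self.name = ""

    def _check(self):
        # Validate that the field values fit the fixed sizes of the format.
        if len(self.uuid) != 16:
            raise ValueError("uuid must be exactly 16 bytes")
        if len(self.name.encode("utf-8")) != 24:
            raise ValueError("name must encode to exactly 24 bytes")

    def _write(self, out):
        # Emit the fields in the same order the seq declares them.
        out.write(self.uuid)
        out.write(self.name.encode("utf-8"))


rec = AnimalRecordWriter()
rec.uuid = bytes(16)
rec.name = "Rex".ljust(24)  # pad with spaces to fill the 24-byte field
rec._check()

buf = io.BytesIO()
rec._write(buf)
encoded = buf.getvalue()
print(len(encoded))  # prints "40"
```

Separating validation from writing means an inconsistent object is rejected before any bytes reach the output.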

The ecosystem

Kaitai Struct is more than a compiler. The surrounding tooling includes:

| Component | What it is |
| --- | --- |
| ksc | The compiler that turns .ksy files into target-language code. |
| Runtime libraries | Small per-language modules the generated parsers depend on. |
| Web IDE | A browser-based format editor and debugger at ide.kaitai.io. |
| Format gallery | A collection of ready-made format definitions at formats.kaitai.io. |
| Visualizer (ksv) | A console tool for testing a format against real data. |
| ksdump | A utility for dumping parsed structures. |

The format gallery alone hosts well over a hundred maintained specifications across categories such as archives (gzip, zip, rar), executables and byte-code (ELF, Mach-O, PE, Java class files), images (PNG, JPEG, GIF, BMP), multimedia (WAV, AVI, Ogg), network protocols (DNS, TCP/IP, TLS), filesystems (ISO 9660, ext2, GPT), and more.

Who uses it?

Kaitai Struct is used wherever binary data has to be understood reliably:

  • Reverse engineering — documenting and parsing undocumented or proprietary formats and protocols.
  • Digital forensics — extracting structured evidence from disk images, filesystems, and application artifacts.
  • File format work — building robust, multi-language readers for established formats without re-implementing the parser per language.

By centering everything on one declarative description, teams keep their parsers consistent across languages and tools — which is exactly what makes the format gallery, Web IDE, and visualizer interoperate so cleanly.
