Skip to main content

The ksc compiler

Kaitai Struct is a declarative language for describing binary data structures — file formats, network packet layouts, in-memory structures, and so on. You write a format description once, in a .ksy file (which is YAML), and the Kaitai Struct compilerkaitai-struct-compiler, abbreviated ksc — translates it into a ready-to-use parser in the programming language of your choice.

This page documents the ksc command-line interface: how to pick a target language, where output goes, the most common options, and what compilation actually produces.

note

This page covers the compiler (ksc), which is a build-time tool. The generated parser depends at run time on a small runtime library for the target language (for example, kaitaistruct on PyPI for Python). The runtime is shipped separately from the compiler.

What compilation produces

A .ksy file is purely declarative — it describes the layout of the data, not how to read it. ksc reads that description and emits source code: a parser class (or a set of classes, one per declared type) in the target language.

The generated code includes a parser that can read the described structure from a file or stream and expose every field through a plain, idiomatic API in that language. In other words, compiling foo.ksy for Python gives you a Foo class with attributes matching the fields you declared; compiling the same file for Java gives you an equivalent Foo class following Java conventions, and so on. You never edit the generated code — you re-run ksc whenever the .ksy changes.

Basic usage

The invocation follows this pattern:

ksc [options] <file>...

You may pass more than one .ksy file. A minimal compile to Python looks like this:

ksc --target python gif.ksy

To put the generated files in a specific directory instead of the current one, use --outdir (filenames are generated automatically from the type names):

ksc --target python --outdir build/parsers gif.ksy

The short forms -t and -d are equivalent to --target and --outdir:

ksc -t python -d build/parsers gif.ksy

You can request several targets at once by repeating -t:

ksc -t java -t python gif.ksy

When you use -d with multiple targets, ksc creates a language-specific subdirectory for each target inside the output directory.

Supported target languages

The --target (-t) option accepts one of the following identifiers. These are the exact strings the compiler recognizes (the keys of the compiler's internal NAME_TO_CLASS map), so use them verbatim.

--target valueOutput
cpp_stlC++ with the STL
csharpC#
goGo
javaJava
javascriptJavaScript
luaLua
nimNim
perlPerl
phpPHP
pythonPython
rubyRuby
rustRust
constructPython construct declarations
graphvizGraphviz .dot diagram of the format
htmlHTML documentation page
zigZig

In addition, --target all compiles for every supported language at once.

tip

graphviz and html are not "programming language" targets — they generate a visual block diagram and a documentation page, respectively. They are handy for reviewing a format specification without writing any code.

info

For C++ specifically, you can select the language standard with --cpp-standard 98 or --cpp-standard 11 (the cpp_stl target supports both C++98 and C++11).

Common options

These are the options you will reach for most often.

OptionArgumentMeaning
-t, --target<language>Target language(s); repeat to emit several at once.
-d, --outdir<directory>Output directory; filenames are auto-generated.
-I, --import-path<directory>Search path(s) for .ksy files pulled in via imports.
--verbose<subsystem>Enable verbose/debug output.
--ksc-json-outputEmit compilation results as JSON to stdout (useful for tooling).
--ksc-exceptionsMake ksc throw exceptions instead of printing error messages.
--versionPrint version information and exit.
--helpPrint usage information and exit.

Language-specific naming

Several options set the namespace/package of the generated code for a single language:

OptionApplies to
--cpp-namespace <namespace>C++
--dotnet-namespace <namespace>C# / .NET
--java-package <package>Java
--go-package <package>Go
--php-namespace <namespace>PHP
--python-package <package>Python

For example, to compile a Java parser into a named package:

ksc -t java --java-package com.example.formats gif.ksy

Nim has no namespace/package option; instead --nim-module <module> sets the path of the Nim runtime module the generated code imports (default kaitai_struct_nim_runtime).

Debug and serialization options

OptionMeaning
--no-auto-readDo not call _read() automatically from the constructor; you call it explicitly.
--read-posMake _read() remember the stream position of each attribute.
--debugShorthand for --no-auto-read --read-pos.
--read-write (-w)Generate read-and-write classes (serialization support).
--opaque-types <boolean>Allow references to externally defined (opaque) types.
--zero-copy-substream <boolean>Allow zero-copy substreams (Java, Python, Ruby).
note

Serialization (--read-write) is only available for Java and Python at the time of writing. In read-write mode, _read() is no longer invoked automatically from the constructor — --read-write implies --no-auto-read — so you can build an empty object and write it from scratch. The generated classes gain a _write(io) method to serialize to a KaitaiStream, and a _check() method that performs stream-independent consistency checks and raises ConsistencyError if a constraint from the spec is violated. See the serialization guide for details.

Using the generated parser

After compilation you import the generated class and point it at some data. The exact API is idiomatic to each language; the pattern below is the same everywhere.

Illustrative example. Suppose you compiled gif.ksy to Python with ksc -t python gif.ksy, producing gif.py containing a Gif class. The names of the parsed fields (header, logical_screen, …) come from whatever you declared in the .ksy; the snippet below is for illustration only.

from gif import Gif

# Parse a file from disk via the generated convenience constructor.
g = Gif.from_file("sample.gif")

# Access parsed fields as plain attributes.
print(g.header.magic) # e.g. b"GIF"
print(g.logical_screen.image_width)

Every Kaitai-generated parser exposes a from_file helper (and a constructor that takes an already-open stream), and every declared field becomes an attribute on the object. Substructures you declared as nested types become nested objects, so you navigate the parsed data the same way you wrote the format description.

tip

You do not have to write a .ksy from scratch to try this out. The official format gallery hosts 100+ ready-made specifications — PNG, GIF, ZIP, ELF, PE, and many more — that you can compile directly with ksc.

Sources