The ksc compiler
Kaitai Struct is a declarative language for describing binary
data structures — file formats, network packet layouts, in-memory structures, and
so on. You write a format description once, in a .ksy file (which is YAML), and the
Kaitai Struct compiler — kaitai-struct-compiler, abbreviated ksc —
translates it into a ready-to-use parser in the programming language of your choice.
This page documents the ksc command-line interface: how to pick a target language,
where output goes, the most common options, and what compilation actually produces.
This page covers the compiler (ksc), which is a build-time tool. The generated
parser depends at run time on a small runtime library for the target language
(for example, kaitaistruct on PyPI for Python). The runtime is shipped separately
from the compiler.
What compilation produces
A .ksy file is purely declarative — it describes the layout of the data, not how to
read it. ksc reads that description and emits source code: a parser class (or a
set of classes, one per declared type) in the target language.
The generated code includes a parser that can read the described structure from a file
or stream and expose every field through a plain, idiomatic API in that language. In
other words, compiling foo.ksy for Python gives you a Foo class with attributes
matching the fields you declared; compiling the same file for Java gives you an
equivalent Foo class following Java conventions, and so on. You never edit the
generated code — you re-run ksc whenever the .ksy changes.
Basic usage
The invocation follows this pattern:
ksc [options] <file>...
You may pass more than one .ksy file. A minimal compile to Python looks like this:
ksc --target python gif.ksy
To put the generated files in a specific directory instead of the current one, use
--outdir (filenames are generated automatically from the type names):
ksc --target python --outdir build/parsers gif.ksy
The short forms -t and -d are equivalent to --target and --outdir:
ksc -t python -d build/parsers gif.ksy
You can request several targets at once by repeating -t:
ksc -t java -t python gif.ksy
When you use -d with multiple targets, ksc creates a language-specific
subdirectory for each target inside the output directory.
Supported target languages
The --target (-t) option accepts one of the following identifiers. These are the
exact strings the compiler recognizes (the keys of the compiler's internal
NAME_TO_CLASS map), so use them verbatim.
--target value | Output |
|---|---|
cpp_stl | C++ with the STL |
csharp | C# |
go | Go |
java | Java |
javascript | JavaScript |
lua | Lua |
nim | Nim |
perl | Perl |
php | PHP |
python | Python |
ruby | Ruby |
rust | Rust |
construct | Python construct declarations |
graphviz | Graphviz .dot diagram of the format |
html | HTML documentation page |
zig | Zig |
In addition, --target all compiles for every supported language at once.
graphviz and html are not "programming language" targets — they generate a visual
block diagram and a documentation page, respectively. They are handy for reviewing a
format specification without writing any code.
For C++ specifically, you can select the language standard with
--cpp-standard 98 or --cpp-standard 11 (the cpp_stl target supports both
C++98 and C++11).
Common options
These are the options you will reach for most often.
| Option | Argument | Meaning |
|---|---|---|
-t, --target | <language> | Target language(s); repeat to emit several at once. |
-d, --outdir | <directory> | Output directory; filenames are auto-generated. |
-I, --import-path | <directory> | Search path(s) for .ksy files pulled in via imports. |
--verbose | <subsystem> | Enable verbose/debug output. |
--ksc-json-output | — | Emit compilation results as JSON to stdout (useful for tooling). |
--ksc-exceptions | — | Make ksc throw exceptions instead of printing error messages. |
--version | — | Print version information and exit. |
--help | — | Print usage information and exit. |
Language-specific naming
Several options set the namespace/package of the generated code for a single language:
| Option | Applies to |
|---|---|
--cpp-namespace <namespace> | C++ |
--dotnet-namespace <namespace> | C# / .NET |
--java-package <package> | Java |
--go-package <package> | Go |
--php-namespace <namespace> | PHP |
--python-package <package> | Python |
For example, to compile a Java parser into a named package:
ksc -t java --java-package com.example.formats gif.ksy
Nim has no namespace/package option; instead --nim-module <module> sets the path of
the Nim runtime module the generated code imports (default
kaitai_struct_nim_runtime).
Debug and serialization options
| Option | Meaning |
|---|---|
--no-auto-read | Do not call _read() automatically from the constructor; you call it explicitly. |
--read-pos | Make _read() remember the stream position of each attribute. |
--debug | Shorthand for --no-auto-read --read-pos. |
--read-write (-w) | Generate read-and-write classes (serialization support). |
--opaque-types <boolean> | Allow references to externally defined (opaque) types. |
--zero-copy-substream <boolean> | Allow zero-copy substreams (Java, Python, Ruby). |
Serialization (--read-write) is only available for Java and Python at the time
of writing. In read-write mode, _read() is no longer invoked automatically from the
constructor — --read-write implies --no-auto-read — so you can build an empty
object and write it from scratch. The generated classes gain a _write(io) method to
serialize to a KaitaiStream, and a _check() method that performs stream-independent
consistency checks and raises ConsistencyError if a constraint from the spec is
violated. See the serialization guide for
details.
Using the generated parser
After compilation you import the generated class and point it at some data. The exact API is idiomatic to each language; the pattern below is the same everywhere.
Illustrative example. Suppose you compiled
gif.ksyto Python withksc -t python gif.ksy, producinggif.pycontaining aGifclass. The names of the parsed fields (header,logical_screen, …) come from whatever you declared in the.ksy; the snippet below is for illustration only.
from gif import Gif
# Parse a file from disk via the generated convenience constructor.
g = Gif.from_file("sample.gif")
# Access parsed fields as plain attributes.
print(g.header.magic) # e.g. b"GIF"
print(g.logical_screen.image_width)
Every Kaitai-generated parser exposes a from_file helper (and a constructor that
takes an already-open stream), and every declared field becomes an attribute on the
object. Substructures you declared as nested types become nested objects, so you
navigate the parsed data the same way you wrote the format description.
You do not have to write a .ksy from scratch to try this out. The official
format gallery hosts 100+ ready-made specifications —
PNG, GIF, ZIP, ELF, PE, and many more — that you can compile directly with ksc.