
Using generated parsers

Writing a .ksy description is only half of the story. To actually read bytes, you compile that description into source code for a target language, add the matching runtime library to your project, and call the generated class. This page walks through the full loop in Python: compile the .ksy, install the runtime, then parse a file and read its fields. The same three-step shape applies to every supported language.

note

A generated parser depends on the Kaitai Struct runtime library for its language. The compiler emits only the format-specific code (field offsets, types, control flow); shared helpers such as the stream reader live in the runtime. You must add both to your project.
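To make the split between generated code and runtime concrete, here is a toy imitation in plain Python. Everything in it is hypothetical: MiniStream stands in for the runtime's stream reader (the real Python class is kaitaistruct.KaitaiStream), and GifPrefixSketch imitates the kind of class the compiler emits for the first few GIF fields. It is a sketch of the division of labor under those assumptions, not actual compiler output.

```python
import io
import struct


class MiniStream:
    """Hypothetical stand-in for the runtime's stream reader.

    The real Python runtime class is kaitaistruct.KaitaiStream; this toy
    only mimics the idea that byte-level reading lives in the runtime.
    """

    def __init__(self, io_obj):
        self._io = io_obj

    def read_bytes(self, n):
        data = self._io.read(n)
        if len(data) < n:
            raise EOFError("requested %d bytes, got %d" % (n, len(data)))
        return data

    def read_u2le(self):
        # unsigned 16-bit little-endian integer
        return struct.unpack("<H", self.read_bytes(2))[0]


class GifPrefixSketch:
    """Hand-written imitation of compiler output (hypothetical name).

    It holds only format-specific knowledge: field order, sizes and the
    magic check. The real gif.ksy nests width/height inside
    logical_screen_descriptor; they are flattened here for brevity.
    """

    def __init__(self, stream):
        self._io = stream
        self.magic = self._io.read_bytes(3)
        if self.magic != b"GIF":
            raise ValueError("bad magic: %r" % (self.magic,))
        self.version = self._io.read_bytes(3).decode("ascii")
        self.screen_width = self._io.read_u2le()
        self.screen_height = self._io.read_u2le()


data = b"GIF89a" + struct.pack("<HH", 640, 480)
prefix = GifPrefixSketch(MiniStream(io.BytesIO(data)))
print(prefix.version, prefix.screen_width, prefix.screen_height)  # 89a 640 480
```

The format class never touches struct or the file object directly, so a different stream implementation could be swapped in without changing it. That is the same reason a generated parser is useless without its runtime.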

The workflow at a glance

| Step | What you do | Tool |
|---|---|---|
| 1. Compile | Turn format.ksy into format.py | kaitai-struct-compiler (ksc) |
| 2. Install runtime | Add the language runtime to your project | pip, npm, gem, … |
| 3. Parse | Load a file and read its fields | the generated class |

Step 1 — Compile the .ksy to Python

The compiler is kaitai-struct-compiler, usually invoked as ksc. Pick the target language with -t / --target and (optionally) an output directory with -d / --outdir:

ksc -t python gif.ksy
# write the generated module into a specific directory
ksc -t python --outdir ./parsers gif.ksy

On Unix-like shells the short form -d requires its argument to be preceded by -- (for example -d -- ./parsers), so the long form --outdir is usually the simpler choice.

This reads gif.ksy and produces a Python module named after the format's meta/id — here, gif.py containing a Gif class.
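When a format's meta/id contains underscores, it helps to know what to import. The helper below (python_output_names is a hypothetical name) predicts the module file and class name, assuming the usual convention that the file takes the id verbatim and the class name is the snake_case id converted to UpperCamelCase. Check the generated file if in doubt.

```python
def python_output_names(meta_id):
    """Predict the module file and class name ksc emits for a meta/id.

    Assumption: the file takes the id verbatim and the class name is the
    snake_case id converted to UpperCamelCase (e.g. gif -> Gif).
    """
    class_name = "".join(part[:1].upper() + part[1:] for part in meta_id.split("_"))
    return meta_id + ".py", class_name


print(python_output_names("gif"))               # ('gif.py', 'Gif')
print(python_output_names("windows_lnk_file"))  # ('windows_lnk_file.py', 'WindowsLnkFile')
```

So a format with id windows_lnk_file would be imported as `from windows_lnk_file import WindowsLnkFile`, under that assumption.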

tip

-t all compiles to every supported target at once, and you can emit several languages in one run by repeating the flag (for example -t python -t java). Run ksc --help to see the full flag list for your compiler version.

The supported target language identifiers are:

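| Identifier | Language |
|---|---|
| cpp_stl | C++ (STL) |
| csharp | C# |
| go | Go |
| java | Java |
| javascript | JavaScript |
| lua | Lua |
| nim | Nim |
| perl | Perl |
| php | PHP |
| python | Python |
| ruby | Ruby |
| rust | Rust |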
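If you drive ksc from a Python build script, a small wrapper can reject typos in target names before the compiler runs. The sketch below uses hypothetical helper names (build_ksc_command, compile_ksy); the target set is copied from the table above plus the special value all, and may lag behind your compiler version, so treat `ksc --help` as authoritative. It emits one `-t` flag per target, as described in the tip, and uses the long `--outdir` form.

```python
import shutil
import subprocess

# Identifiers from the table above, plus the special value "all".
KSC_TARGETS = frozenset({
    "cpp_stl", "csharp", "go", "java", "javascript", "lua",
    "nim", "perl", "php", "python", "ruby", "rust", "all",
})


def build_ksc_command(ksy_path, targets, outdir=None, ksc="ksc"):
    """Return an argv list for ksc, with one -t flag per target."""
    if not targets:
        raise ValueError("at least one target is required")
    unknown = [t for t in targets if t not in KSC_TARGETS]
    if unknown:
        raise ValueError("unknown target(s): " + ", ".join(unknown))
    cmd = [ksc]
    for target in targets:
        cmd += ["-t", target]
    if outdir is not None:
        # long form sidesteps the short -d quirk on Unix-like shells
        cmd += ["--outdir", outdir]
    cmd.append(ksy_path)
    return cmd


def compile_ksy(ksy_path, targets, outdir=None):
    """Run ksc if it is on PATH; raises CalledProcessError on failure."""
    exe = shutil.which("ksc")
    if exe is None:
        raise FileNotFoundError("ksc was not found on PATH")
    cmd = build_ksc_command(ksy_path, targets, outdir, ksc=exe)
    return subprocess.run(cmd, check=True)


print(build_ksc_command("gif.ksy", ["python", "java"], outdir="./parsers"))
# ['ksc', '-t', 'python', '-t', 'java', '--outdir', './parsers', 'gif.ksy']
```

Building the argument list separately from running it keeps the validation testable without ksc installed; compile_ksy resolves the full executable path with shutil.which so wrapper scripts on PATH are found.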

Step 2 — Install the Python runtime

The Python runtime is published on PyPI as kaitaistruct:

python3 -m pip install --upgrade kaitaistruct
info

Kaitai Struct 0.11 supports Python 3.4+ as well as Python 2.7, but using Python 3.10 or newer is strongly recommended. Runtime versions 0.12 and later require Python 3.8 or above.
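To confirm the runtime is installed and the interpreter is recent enough, you can query package metadata from the standard library. This is a sketch with hypothetical helper names; it relies on importlib.metadata, which itself requires Python 3.8+, and uses the 3.8 threshold from the note above as its example.

```python
import sys
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name="kaitaistruct"):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def python_meets(minimum=(3, 8)):
    """True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= tuple(minimum)


version = installed_version()
if version is None:
    print("kaitaistruct is not installed; run: python3 -m pip install kaitaistruct")
else:
    print("kaitaistruct", version)
print("Python >= 3.8:", python_meets((3, 8)))
```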

Runtime library per language

Every target ships its own runtime, distributed through that language's usual package manager. The most common ones:

| Language | Runtime package | Installed with |
|---|---|---|
| Python | kaitaistruct | pip install kaitaistruct |
| JavaScript | kaitai-struct | npm install kaitai-struct |
| Ruby | kaitai-struct | gem install kaitai-struct |
| Java | io.kaitai:kaitai-struct-runtime | Maven / Gradle |
| Perl | IO::KaitaiStruct | CPAN |
| PHP | kaitai-io/kaitai_struct_php_runtime | Composer |
| C++ (STL) | kaitai-struct-cpp-stl-runtime | source / package |
| C# | KaitaiStruct.Runtime.CSharp | NuGet |
| Go | github.com/kaitai-io/kaitai_struct_go_runtime | go get |

For languages without a registry package (or to track the latest fixes), each runtime also lives in its own kaitai-io/kaitai_struct_<lang>_runtime repository on GitHub.

Step 3 — Parse a file and read fields

With gif.py generated and kaitaistruct installed, import the class and call from_file(). The call returns a parsed object whose attributes mirror the structure declared in the .ksy.

from gif import Gif

g = Gif.from_file("sample.gif")

# top-level fields, exactly as declared in gif.ksy
print(g.hdr.magic) # b"GIF"
print(g.hdr.version) # e.g. "89a"
print(g.logical_screen_descriptor.screen_width)
print(g.logical_screen_descriptor.screen_height)

# repeated structures come back as a list
for block in g.blocks:
    print(block.block_type)

The fields above (hdr, logical_screen_descriptor, blocks) are among the top-level attributes of the GIF specification in the format gallery. Each one is a nested object or a list whose own attributes you can drill into the same way.
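While exploring an unfamiliar format it can help to dump a parsed object as plain dicts and lists. The converter below (to_plain is a hypothetical name) assumes the layout of generated Python classes: sequence fields are ordinary instance attributes, while bookkeeping such as _io, _parent, _root and cached _m_* values start with an underscore and are skipped, which also avoids cycles. Lazy instances are properties rather than stored attributes, so they do not appear. The demo uses a SimpleNamespace stand-in instead of a real Gif so it runs without the generated module.

```python
import enum
from types import SimpleNamespace


def to_plain(obj):
    """Recursively convert a parsed object into plain dicts/lists."""
    if isinstance(obj, (list, tuple)):
        return [to_plain(item) for item in obj]
    if isinstance(obj, enum.Enum):
        # checked before int, since IntEnum members are also ints
        return obj.name
    if obj is None or isinstance(obj, (bytes, bytearray, str, int, float)):
        return obj
    if hasattr(obj, "__dict__"):
        # skip underscore-prefixed bookkeeping (_io, _parent, _root, _m_*)
        return {k: to_plain(v) for k, v in vars(obj).items() if not k.startswith("_")}
    return obj


# Hypothetical stand-in shaped like a small part of a parsed GIF.
class BlockType(enum.IntEnum):
    extension = 0x21
    local_image_descriptor = 0x2C
    end_of_file = 0x3B


fake = SimpleNamespace(
    _io=None,
    hdr=SimpleNamespace(magic=b"GIF", version="89a", _root=None),
    blocks=[SimpleNamespace(block_type=BlockType.end_of_file, _parent=None)],
)
print(to_plain(fake))
# {'hdr': {'magic': b'GIF', 'version': '89a'}, 'blocks': [{'block_type': 'end_of_file'}]}
```

With the real generated module you would call it as `to_plain(Gif.from_file("sample.gif").logical_screen_descriptor)`.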

Parsing from bytes or an existing stream

from_file() is the convenience entry point, but the generated class also accepts in-memory data or an already-open stream. Use these when the bytes come from a network response, a database column, or a larger container you are already reading:

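from kaitaistruct import KaitaiStream, BytesIO
from gif import Gif

# load some bytes to work with (stands in for a network response, DB column, etc.)
with open("sample.gif", "rb") as f:
    raw_bytes = f.read()

# from a bytes object already in memory
g = Gif.from_bytes(raw_bytes)

# from a binary file-like object: from_io() wraps it in a KaitaiStream itself
g = Gif.from_io(BytesIO(raw_bytes))

# from an existing KaitaiStream: pass it straight to the constructor
g = Gif(KaitaiStream(BytesIO(raw_bytes)))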
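When the embedded data sits at a known offset inside a larger container, note that Python slicing silently truncates if the region runs past the end of the buffer, so a bad offset would typically surface later as a read error mid-parse. A small guard fails fast instead; carve is a hypothetical helper, not part of kaitaistruct, and the container bytes below are placeholders rather than a valid GIF.

```python
def carve(buf, offset, size):
    """Return buf[offset:offset + size], refusing to silently truncate."""
    if offset < 0 or size < 0:
        raise ValueError("offset and size must be non-negative")
    end = offset + size
    if end > len(buf):
        raise ValueError("region %d..%d exceeds buffer of %d bytes" % (offset, end, len(buf)))
    return bytes(buf[offset:end])


# placeholder container: 6-byte header, 9-byte payload, trailer
container = b"HEADER" + b"GIF89a..." + b"TRAILER"
payload = carve(container, 6, 9)
print(payload)  # b'GIF89a...'
# with the generated module and a real payload: g = Gif.from_bytes(payload)
```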
note

Parsing is read-only by default. Since v0.11, Kaitai Struct also has serialization (writing) support — at the time of writing only for Java and Python — enabled with the --read-write compiler flag (which also implies --no-auto-read). See the serialization guide for that workflow.

Illustrative end-to-end example

Illustrative. The commands below assume gif.ksy from the gallery and a sample.gif in the current directory. Field names follow the gallery's GIF specification.

# 1. compile
ksc -t python gif.ksy

# 2. install the runtime
python3 -m pip install --upgrade kaitaistruct

# 3. run a small script that uses the generated parser
python3 - <<'PY'
from gif import Gif
g = Gif.from_file("sample.gif")
print("version:", g.hdr.version)
print("size:", g.logical_screen_descriptor.screen_width,
      "x", g.logical_screen_descriptor.screen_height)
PY

Sources