Using generated parsers
Writing a .ksy description is only half of the story. To actually read bytes,
you compile that description into source code for a target language, add the
matching runtime library to your project, and call the generated class. This
page walks through the full loop in Python: compile the description, install
the runtime, then parse a file and read its fields. The same three-step shape
applies to every supported language.
A generated parser depends on the Kaitai Struct runtime library for its language. The compiler emits only the format-specific code (field offsets, types, control flow); shared helpers such as the stream reader live in the runtime. You must add both to your project.
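To make that split concrete, here is a hand-written sketch (not actual ksc output). `TinyStream` stands in for the shared runtime and `GifHeaderSketch` for generated, format-specific code. Both names are hypothetical, and the field layout is an assumption modelled on the start of a GIF file.

```python
import io
import struct


class TinyStream:
    """Stand-in for the shared runtime: generic, format-agnostic readers."""

    def __init__(self, raw):
        self._io = io.BytesIO(raw)

    def read_bytes(self, n):
        data = self._io.read(n)
        if len(data) < n:
            raise EOFError(f"requested {n} bytes, got {len(data)}")
        return data

    def read_u2le(self):
        # unsigned 16-bit little-endian integer
        return struct.unpack("<H", self.read_bytes(2))[0]


class GifHeaderSketch:
    """Stand-in for generated code: it only knows field order and types."""

    def __init__(self, stream):
        self.magic = stream.read_bytes(3)
        self.version = stream.read_bytes(3).decode("ascii")
        self.screen_width = stream.read_u2le()
        self.screen_height = stream.read_u2le()


# hypothetical sample: signature, version, then a 320x200 screen size
sample = b"GIF89a" + struct.pack("<HH", 320, 200)
hdr = GifHeaderSketch(TinyStream(sample))
print(hdr.magic, hdr.version, hdr.screen_width, hdr.screen_height)
```

The format class never touches `struct` or byte offsets directly, which is why a generated parser is of little use without its runtime.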
The workflow at a glance
| Step | What you do | Tool |
|---|---|---|
| 1. Compile | Turn format.ksy into format.py | kaitai-struct-compiler (ksc) |
| 2. Install runtime | Add the language runtime to your project | pip, npm, gem, … |
| 3. Parse | Load a file and read its fields | the generated class |
Step 1 — Compile the .ksy to Python
The compiler is kaitai-struct-compiler, usually invoked as ksc. Pick the
target language with -t / --target and (optionally) an output directory with
-d / --outdir:
ksc -t python gif.ksy
# write the generated module into a specific directory
ksc -t python --outdir ./parsers gif.ksy
The short form -d takes the directory the same way (for example
-d ./parsers); the two spellings are interchangeable, so use whichever reads
better in your scripts.
This reads gif.ksy and produces a Python module named after the format's
meta/id — here, gif.py containing a Gif class.
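As a rough sketch of that naming rule, the helper below guesses the file and class name from a meta/id. `expected_python_names` is a hypothetical helper, not part of any Kaitai Struct package, and it assumes the class name is simply the snake_case id converted to UpperCamelCase.

```python
def expected_python_names(meta_id):
    """Guess the module file and class name emitted for a given meta/id.

    Assumption: the class name is the snake_case id in UpperCamelCase.
    """
    class_name = "".join(part.capitalize() for part in meta_id.split("_"))
    return meta_id + ".py", class_name


print(expected_python_names("gif"))
print(expected_python_names("windows_lnk_file"))
```

This is handy when a build script needs to know which module to import after compiling. Check the generated file itself if the guess looks off.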
-t all compiles to every supported target at once, and you can emit several
languages in one run by repeating the flag (for example -t python -t java).
Run ksc --help to see the full flag list for your compiler version.
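If you drive the compiler from a Python build script, one possible approach is to assemble the argument list in a small function. `build_ksc_command` is a hypothetical helper; it uses only the -t and --outdir flags described above and assumes the compiler is reachable as ksc on PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_ksc_command(ksy_path, targets=("python",), outdir=None, compiler="ksc"):
    """Assemble an argv list for the compiler using only -t and --outdir."""
    cmd = [compiler]
    for target in targets:
        cmd += ["-t", target]  # repeat -t once per target language
    if outdir is not None:
        cmd += ["--outdir", str(outdir)]
    cmd.append(str(ksy_path))
    return cmd


cmd = build_ksc_command("gif.ksy", targets=("python", "java"), outdir="./parsers")
print(" ".join(cmd))

# Run only when the compiler is on PATH and the input file exists.
if shutil.which(cmd[0]) and Path("gif.ksy").is_file():
    subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with paths that contain spaces.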
The supported target language identifiers are:
| Identifier | Language |
|---|---|
| cpp_stl | C++ (STL) |
| csharp | C# |
| go | Go |
| java | Java |
| javascript | JavaScript |
| lua | Lua |
| nim | Nim |
| perl | Perl |
| php | PHP |
| python | Python |
| ruby | Ruby |
| rust | Rust |
Step 2 — Install the Python runtime
The Python runtime is published on PyPI as kaitaistruct:
python3 -m pip install --upgrade kaitaistruct
Kaitai Struct 0.11 supports Python 3.4+ as well as Python 2.7, but using Python 3.10 or newer is strongly recommended. Runtime versions 0.12 and later require Python 3.8 or above.
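To confirm which runtime version is installed, a minimal check using the standard library's `importlib.metadata` (Python 3.8+) might look like the following. `runtime_version` is a hypothetical helper name.

```python
from importlib import metadata


def runtime_version(package="kaitaistruct"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = runtime_version()
if version is None:
    print("kaitaistruct is not installed in this environment")
else:
    print("kaitaistruct", version)
```

Comparing this value with the compiler version you used is a quick first step when a generated module refuses to import.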
Runtime library per language
Every target ships its own runtime, distributed through that language's usual package manager. The most common ones:
| Language | Runtime package | Installed with |
|---|---|---|
| Python | kaitaistruct | pip install kaitaistruct |
| JavaScript | kaitai-struct | npm install kaitai-struct |
| Ruby | kaitai-struct | gem install kaitai-struct |
| Java | io.kaitai:kaitai-struct-runtime | Maven / Gradle |
| Perl | IO::KaitaiStruct | CPAN |
| PHP | kaitai-io/kaitai_struct_php_runtime | Composer |
| C++ (STL) | kaitai-struct-cpp-stl-runtime | source / package |
| C# | KaitaiStruct.Runtime.CSharp | NuGet |
| Go | github.com/kaitai-io/kaitai_struct_go_runtime | go get |
For languages without a registry package (or to track the latest fixes), each
runtime also lives in its own kaitai-io/kaitai_struct_<lang>_runtime
repository on GitHub.
Step 3 — Parse a file and read fields
With gif.py generated and kaitaistruct installed, import the class and call
from_file(). The call returns a parsed object whose attributes mirror the
structure declared in the .ksy.
from gif import Gif
g = Gif.from_file("sample.gif")
# top-level fields, exactly as declared in gif.ksy
print(g.hdr.magic) # b"GIF"
print(g.hdr.version) # e.g. "89a"
print(g.logical_screen_descriptor.screen_width)
print(g.logical_screen_descriptor.screen_height)
# repeated structures come back as a list
for block in g.blocks:
    print(block.block_type)
The fields above (hdr, logical_screen_descriptor, blocks) are the real
top-level attributes exposed by the GIF format from the
format gallery. Each one is a nested object or
list whose own attributes you can drill into the same way.
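When exploring an unfamiliar format, it can help to flatten a parsed object into plain dicts and lists. The sketch below is one way to do that. `to_plain` is a hypothetical helper, and it assumes that public fields are ordinary instance attributes while underscore-prefixed attributes are internals, so anything exposed only through a property is skipped. It is exercised here on a stand-in object rather than a real parse result.

```python
from enum import Enum
from types import SimpleNamespace

SCALARS = (bytes, str, int, float, bool, type(None), Enum)


def to_plain(value):
    """Recursively convert a parsed object into dicts and lists."""
    if isinstance(value, SCALARS):
        return value
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    if hasattr(value, "__dict__"):
        # Skip underscore-prefixed attributes (assumed to be internals such
        # as parent/root links, which would otherwise create cycles).
        return {
            name: to_plain(attr)
            for name, attr in vars(value).items()
            if not name.startswith("_")
        }
    return value


# Stand-in shaped like the GIF fields used above; not a real parse result.
fake = SimpleNamespace(
    hdr=SimpleNamespace(magic=b"GIF", version="89a", _parent=None),
    blocks=[SimpleNamespace(block_type=1), SimpleNamespace(block_type=2)],
    _io=object(),
)
print(to_plain(fake))
```

The result can be handed to `pprint` for a readable dump of the whole structure.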
Parsing from bytes or an existing stream
from_file() is the convenience entry point, but the generated class also
accepts in-memory data or an already-open stream. Use these when the bytes come
from a network response, a database column, or a larger container you are already
reading:
from kaitaistruct import KaitaiStream, BytesIO
from gif import Gif
# from a bytes object already in memory
g = Gif.from_bytes(raw_bytes)
# from an existing KaitaiStream
g = Gif.from_io(KaitaiStream(BytesIO(raw_bytes)))
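If the input can arrive in any of these forms, a small dispatcher keeps the choice in one place. This is a sketch: `parse_any` and `FakeParser` are hypothetical names, and the stream branch assumes `source` is a raw binary file-like object, not an existing KaitaiStream. `FakeParser` exists only so the example runs without a generated module.

```python
import os


def parse_any(parser_cls, source):
    """Pick from_file, from_bytes, or from_io based on the type of source."""
    if isinstance(source, (str, os.PathLike)):
        return parser_cls.from_file(source)
    if isinstance(source, (bytes, bytearray)):
        return parser_cls.from_bytes(bytes(source))
    # Anything else is treated as an open binary stream.
    from kaitaistruct import KaitaiStream  # needs the runtime installed
    return parser_cls.from_io(KaitaiStream(source))


class FakeParser:
    """Stand-in for a generated class, so the sketch runs without ksc."""

    @classmethod
    def from_file(cls, filename):
        return ("file", filename)

    @classmethod
    def from_bytes(cls, buf):
        return ("bytes", buf)


print(parse_any(FakeParser, "sample.gif"))
print(parse_any(FakeParser, b"GIF89a"))
```

With a real generated class you would call it as `parse_any(Gif, raw_bytes)` in place of the stand-in.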
Parsing is read-only by default. Since v0.11, Kaitai Struct also supports
serialization (writing), at the time of writing only for Java and Python. It
is enabled with the --read-write compiler flag, which also implies
--no-auto-read. See the serialization guide for that workflow.
Illustrative end-to-end example
Illustrative. The commands below assume gif.ksy from the gallery and a
sample.gif in the current directory. Field names follow the gallery's GIF
specification.
# 1. compile
ksc -t python gif.ksy
# 2. install the runtime
python3 -m pip install --upgrade kaitaistruct
# 3. run a small script that uses the generated parser
python3 - <<'PY'
from gif import Gif
g = Gif.from_file("sample.gif")
print("version:", g.hdr.version)
print("size:", g.logical_screen_descriptor.screen_width,
      "x", g.logical_screen_descriptor.screen_height)
PY