
Introduction

YARA-X is a pattern-matching engine for classifying and identifying files, designed with malware researchers in mind. It is VirusTotal's ground-up reimplementation of the original YARA in Rust, aiming to be faster, safer, and more user-friendly than its C predecessor.

You describe what you are looking for as a set of rules. Each rule pairs a collection of patterns (text strings, hexadecimal byte sequences, or regular expressions) with a boolean condition that decides when a file, or any block of data, matches. Scanning a file or directory with those rules tells you which rules match, and that result is the basis for triage, classification, and threat hunting workflows.

Note: YARA was created to help malware researchers identify and classify malware samples. A common pattern is to write one rule per malware family, describing the byte sequences, strings, and structural features that family tends to exhibit, then scan unknown files to see which families they resemble.

YARA vs. YARA-X

YARA-X is described by the project as "a re-incarnation of YARA," sharing the same rule language and goals while being rewritten from scratch.

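|                | Original YARA                               | YARA-X                                      |
| -------------- | ------------------------------------------- | ------------------------------------------- |
| Implementation | C                                           | Rust                                        |
| CLI command    | yara                                        | yr                                          |
| Focus          | The established, widely deployed engine     | User-friendliness, performance, and safety  |
| Future modules | Maintenance and bug fixes only              | All new feature/module work happens here    |

Project status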

According to the project, YARA-X is mature and stable: VirusTotal has run it in production for a long time, scanning billions of files with tens of thousands of rules. Most existing YARA rules work with YARA-X without changes. Always confirm the current status and any compatibility differences against the official documentation and the README, since the project continues to evolve.

Anatomy of a rule

A YARA rule has a name and up to three sections:

  • meta — optional identifier/value pairs carrying descriptive information (author, description, severity, references). Values can be strings, integers, or booleans. Metadata does not affect matching.
  • strings — the patterns to search for. Each pattern has an identifier beginning with $ and may be a text string, a hex byte sequence in { ... }, or a regular expression in / ... /. This section is optional when the condition needs no patterns.
  • condition — a required boolean expression that determines when the rule matches. Pattern identifiers act as booleans that are true when the pattern is found.

The following is an illustrative example adapted from the YARA-X documentation. Replace the patterns and metadata with values that describe what you are actually hunting for.

rule ExampleRule {
    meta:
        author = "malware-researcher"
        description = "Illustrative example rule"
        severity = 5
    strings:
        $text = "text here"
        $hex = { E2 34 A1 C8 23 FB }
        $regex = /some regular expression: \w+/
    condition:
        $text or $hex or $regex
}

This rule matches any scanned file that contains the literal text, the hex byte sequence, or the regular-expression pattern.
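To make the condition concrete, here is a toy Python model of what `$text or $hex or $regex` means for this one rule. It is only an illustration of the boolean logic, written with the standard library; it is not how YARA-X evaluates rules internally, and the name `example_rule_matches` is hypothetical.

```python
import re

# Toy stand-ins for the three patterns declared in ExampleRule.
TEXT = b"text here"                                # $text
HEX = bytes.fromhex("E2 34 A1 C8 23 FB")           # $hex
REGEX = re.compile(rb"some regular expression: \w+")  # $regex


def example_rule_matches(data: bytes) -> bool:
    """Mimic `$text or $hex or $regex`: true if any pattern occurs in data."""
    found_text = TEXT in data
    found_hex = HEX in data
    found_regex = REGEX.search(data) is not None
    return found_text or found_hex or found_regex
```

Each pattern identifier becomes a boolean that records whether the pattern was found, and the condition combines those booleans. A real rule can use richer conditions than this sketch covers.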

Scanning with the yr CLI

The command-line tool is yr. The most common workflow is to point it at one or more rule files and a target file or directory.

# Scan a single file with one rule file
yr scan rules.yar suspicious.bin

# Scan a directory recursively
yr scan --recursive rules.yar /samples/

# Print the matching strings alongside each match
yr scan --print-strings rules.yar suspicious.bin
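If you drive these scans from a script, one option is to build the argument list in code and hand it to a subprocess. The sketch below uses only the flags shown above and assumes `yr` is on your PATH; the helper names `build_scan_command` and `run_scan` are hypothetical, not part of YARA-X.

```python
import subprocess


def build_scan_command(rules_path, target, recursive=False, print_strings=False):
    """Return the argument list for a `yr scan` invocation."""
    cmd = ["yr", "scan"]
    if recursive:
        cmd.append("--recursive")
    if print_strings:
        cmd.append("--print-strings")
    cmd += [rules_path, target]
    return cmd


def run_scan(rules_path, target, **options):
    """Run the scan and return the CompletedProcess (requires yr on PATH)."""
    cmd = build_scan_command(rules_path, target, **options)
    # check=False: leave interpretation of the exit code to the caller.
    return subprocess.run(cmd, capture_output=True, text=True, check=False)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when sample paths contain spaces or shell metacharacters.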

yr also provides several other subcommands:

| Command       | Purpose                                                         |
| ------------- | --------------------------------------------------------------- |
| yr scan       | Match rules against a file or directory.                        |
| yr compile    | Compile one or more rule files into a single binary for reuse.  |
| yr fmt        | Reformat rule source for readability and consistency.           |
| yr dump       | Inspect the structured output a module produces for a file.     |
| yr deps       | Show a rule's dependency tree.                                  |
| yr completion | Generate shell completion scripts.                              |

When you scan the same rules repeatedly, compile them once and reuse the result:

yr compile --output rules.yarc rules.yar
yr scan --compiled-rules rules.yarc /samples/

Tip: Use yr compile to build a .yarc binary when you scan a large rule set repeatedly. Loading precompiled rules avoids recompiling the source on every scan and speeds up bulk or automated scanning.
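In an automated pipeline you may want to recompile only when the rule source has changed. The following is a minimal sketch of that idea, assuming a single rule file and comparing file modification times; `needs_recompile` and `compile_command` are hypothetical helpers, and a rule set spread over several files would need a broader check.

```python
from pathlib import Path


def needs_recompile(source: Path, compiled: Path) -> bool:
    """True if the compiled file is missing or older than the rule source."""
    if not compiled.exists():
        return True
    return source.stat().st_mtime > compiled.stat().st_mtime


def compile_command(source: Path, compiled: Path) -> list:
    """Argument list for `yr compile`, using the --output flag shown above."""
    return ["yr", "compile", "--output", str(compiled), str(source)]
```

A caller would run the list from `compile_command` (for example with `subprocess.run`) only when `needs_recompile` returns true, then scan with the compiled file as usual.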

End-to-end flow

At a high level, you feed rules and a sample file into yr scan, and you get back the set of rules that matched.

Language bindings

Beyond the CLI, YARA-X exposes its engine through bindings so you can embed scanning in your own tools and pipelines. The project provides bindings for C/C++, Python, Go, and JavaScript/TypeScript, in addition to the native Rust API. Check the API documentation for the current, authoritative list and per-language details.
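As a rough illustration of embedding the engine, the sketch below uses the Python bindings to compile a rule from source, scan an in-memory buffer, and list the matching rules. It assumes the `yara_x` package is installed (for example with `pip install yara-x`); the helper `matching_rule_names` and the shortened rule are illustrative only, so check the API documentation for the authoritative interface.

```python
try:
    import yara_x  # third-party package; assumed installed via `pip install yara-x`
except ImportError:
    yara_x = None  # lets this module load; scanning then raises a clear error

# Shortened version of the example rule, kept to a single text pattern.
RULE_SOURCE = r"""
rule ExampleRule {
    strings:
        $text = "text here"
    condition:
        $text
}
"""


def matching_rule_names(rule_source: str, data: bytes) -> list:
    """Compile rule_source, scan data, and return identifiers of matching rules."""
    if yara_x is None:
        raise RuntimeError("the yara-x Python package is not installed")
    rules = yara_x.compile(rule_source)   # compile rules from source text
    results = rules.scan(data)            # scan an in-memory buffer
    return [rule.identifier for rule in results.matching_rules]
```

With the package installed, `matching_rule_names(RULE_SOURCE, b"some text here")` should return a list containing `ExampleRule`, while a buffer without the pattern should return an empty list.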
