Zeek Matchy Plugin
A Zeek plugin for high-performance threat intelligence matching using Matchy databases. Includes MatchyIntel, a drop-in alternative to Zeek's Intel Framework that fixes its two biggest pain points: memory consumption on clusters and updating data at runtime.
Table of Contents
- Why Replace the Intel Framework?
- Installation
- Quick Start
- Deployment
- MatchyIntel Framework
- Low-Level API
- API Reference
- Building Matchy Databases
- Testing
- Troubleshooting
Why Replace the Intel Framework?
If you've run Zeek's Intel Framework at scale, you've hit these problems:
Memory
The Intel Framework loads every indicator into each worker's heap. On a 32-core cluster, that's 32 copies of your indicator set in memory. A million indicators can easily consume tens of gigabytes across workers.
Matchy databases are memory-mapped. The OS maps the .mxy file once and all workers share the same physical pages via the page cache. Zero heap allocation per worker. On that same 32-core cluster, you go from 32 copies to 1.
Updating Data at Runtime
Replacing the loaded indicator set in the Intel Framework at runtime has been a long-standing pain point. You either restart Zeek (causing a gap in monitoring) or deal with the complexity of incremental insert/remove operations and Broker synchronization.
With Matchy, you just replace the .mxy file on disk. Auto-reload detects the change and swaps in the new database atomically and lock-free, with ~1-2ns overhead per query. No restart, no gap, no coordination between workers. Build your database offline, scp it to your sensor, done.
Performance
| Operation | Throughput |
|---|---|
| IP queries | 7M+/sec |
| Pattern queries (globs) | 3M+/sec |
| Database load time | <1ms |
| Auto-reload overhead | ~1-2ns/query |
Performance is deterministic: no GC pauses, no hash table resizing during operation.
Operational Simplicity
- No Broker: Database files are self-contained. Copy them with scp, distribute with Ansible, serve from S3.
- No zeekctl deploy: Just replace the file on disk. Auto-reload handles the rest.
- Debug offline: matchy query threats.mxy 1.2.3.4 works from any command line, so there is no need to inspect Zeek's internal state.
- Build anywhere: Generate .mxy files from CSV, JSON, or MISP feeds in CI/CD. The same binary file works on Linux, macOS, and FreeBSD.
Installation
Requirements
- Zeek 5.0+ (with development headers if not installed from source)
- Rust/Cargo (install from rustup.rs)
- CMake 3.15+
- C++17 compiler
Via Zeek Package Manager (zkg)
zkg install https://github.com/matchylabs/zeek-matchy-plugin
This requires Rust/Cargo to be installed on the build machine. The package manager handles the rest.
From Source
git clone https://github.com/matchylabs/zeek-matchy-plugin.git
cd zeek-matchy-plugin
mkdir build && cd build
cmake ..
make
This automatically clones and builds Matchy from source. If you already have a local Matchy checkout, point CMake at it to skip the clone:
cmake -DMATCHY_SOURCE_DIR=/path/to/matchy ..
Or if Matchy is already installed system-wide:
cmake -DBUILD_MATCHY=OFF ..
# Or specify the install prefix:
cmake -DBUILD_MATCHY=OFF -DMATCHY_ROOT=/usr/local ..
Install (optional)
sudo make install
Verify
# If using ZEEK_PLUGIN_PATH (development)
export ZEEK_PLUGIN_PATH=/path/to/zeek-matchy-plugin/build
zeek -N Matchy::DB
Expected:
Matchy::DB - Fast IP and pattern matching using Matchy databases (dynamic, version 0.3.0)
Quick Start
- Install the Matchy CLI (if you don't have it already):
cargo install matchy
- Create a threat database from CSV:
cat > threats.csv << 'EOF'
entry,threat_level,category,description
1.2.3.4,high,malware,Known C2 server
10.0.0.0/8,low,internal,RFC1918 private network
*.evil.com,critical,phishing,Phishing domain pattern
malware.example.com,high,malware,Malware distribution site
EOF
matchy build threats.csv -o threats.mxy --format csv
- Use it in Zeek (add to your local.zeek or a site-specific script):
@load Matchy/DB/intel
redef MatchyIntel::db_path = "/opt/threat-intel/threats.mxy";
event MatchyIntel::match(s: MatchyIntel::Seen, metadata: string) {
    print fmt("THREAT: %s (%s) -> %s", s$indicator, s$where, metadata);
}
That's it. MatchyIntel automatically checks connection IPs, DNS queries, HTTP hosts/URLs, and SSL/TLS SNI against your database.
Deployment
Adding to Your Zeek Configuration
Add these lines to your local.zeek (or a site-specific script):
@load Matchy/DB/intel
redef MatchyIntel::db_path = "/opt/threat-intel/threats.mxy";
Then deploy as usual with zeekctl deploy.
Cluster Deployment
Matchy databases are memory-mapped, which means all Zeek workers on the same host share the same physical memory pages. You don't need to worry about per-worker memory: the OS handles sharing via the page cache.
Each host in your cluster needs a copy of the .mxy file at the same path. Options:
- Shared filesystem (NFS, CIFS): Put the .mxy on a shared mount. All hosts read from the same file. Simplest option.
- Local copies: Distribute with rsync, Ansible, Salt, etc. Better I/O performance since reads don't cross the network.
- CI/CD pipeline: Build the database in CI, push to an artifact store or S3, pull from each sensor on a cron job.
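The cron-based pull in the CI/CD option could be sketched as follows. This assumes the artifact has already been downloaded next to the live file by whatever client you use (aws s3 cp, curl, rsync). The name refresh_db and its behavior are hypothetical and not part of Matchy. The helper leaves the live file untouched when the download is byte-identical and otherwise renames the new file into place.

```shell
# Hypothetical helper, not part of Matchy or Zeek.
# Usage: refresh_db <downloaded-file> <live-file>
# Keep both paths in the same directory so the rename stays on one filesystem.
refresh_db() {
    local candidate="$1" live="$2"

    # Refuse missing or empty downloads (for example, a failed fetch).
    if [ ! -s "$candidate" ]; then
        echo "refresh_db: $candidate is missing or empty" >&2
        return 1
    fi

    # cmp -s exits 0 when the two files are byte-identical.
    if [ -f "$live" ] && cmp -s "$candidate" "$live"; then
        rm -f "$candidate"
        echo "unchanged"
        return 0
    fi

    # Rename the new database over the live one.
    mv -f "$candidate" "$live"
    echo "updated"
}
```

A cron job would fetch the artifact to a staging name such as /opt/threat-intel/threats.mxy.new and then call refresh_db with that path and the live path.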
Updating Threat Intel
With auto-reload enabled (the default), updating is a file replacement. Always write to a temporary file first, then mv it into place. This ensures workers never see a partially-written file, because mv on the same filesystem is atomic.
# Build new database (on your build host or in CI)
matchy build updated-threats.csv -o /opt/threat-intel/threats.mxy.tmp --format csv
# Atomically replace the live file
mv /opt/threat-intel/threats.mxy.tmp /opt/threat-intel/threats.mxy
If distributing to remote sensors, copy to a temp path first:
scp threats.mxy sensor01:/opt/threat-intel/threats.mxy.tmp
ssh sensor01 'mv /opt/threat-intel/threats.mxy.tmp /opt/threat-intel/threats.mxy'
All workers detect the file change and reload automatically. No Zeek restart, no zeekctl deploy, no monitoring gap.
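The write-then-rename pattern can be wrapped in a small helper. The following is a sketch, assuming a shell with mktemp available. install_db is a hypothetical name, not a Matchy command. It stages the copy inside the destination directory so that the final mv is a same-filesystem rename rather than a cross-device copy.

```shell
# Hypothetical helper, not part of Matchy or Zeek.
# Usage: install_db <new-database> <live-path>
install_db() {
    local src="$1" dest="$2" tmp

    # Stage inside the destination directory: a rename is only atomic
    # when source and target live on the same filesystem.
    tmp="$(mktemp "$(dirname "$dest")/.mxy-staging.XXXXXX")" || return 1

    # Remove the staging file if the copy fails, leaving the live file as-is.
    if ! cp "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # mktemp creates the file with mode 0600; widen it so Zeek workers
    # running as a different user can still read the database.
    chmod 0644 "$tmp"

    mv -f "$tmp" "$dest"
}
```

For example, install_db updated.mxy /opt/threat-intel/threats.mxy replaces the live database in one step, and a failed copy never touches the file the workers are reading.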
MatchyIntel Framework
MatchyIntel is designed to feel familiar if you've used the Intel Framework, but with a fundamentally different architecture.
What It Observes Automatically
When you @load Matchy/DB/intel, it immediately starts observing:
| Protocol | What | Where Enum |
|---|---|---|
| Connections | Originator and responder IPs | Conn::IN_ORIG, Conn::IN_RESP |
| DNS | Query strings | DNS::IN_REQUEST |
| HTTP | Host header, full URL | HTTP::IN_HOST_HEADER, HTTP::IN_URL |
| SSL/TLS | SNI, certificate CN | SSL::IN_SERVER_NAME, X509::IN_CERT |
Auto-Reload
By default, MatchyIntel watches the database file and reloads when it changes. This is the recommended mode for production.
# Enabled by default
redef MatchyIntel::auto_reload = T;
# To disable (for manual control):
redef MatchyIntel::auto_reload = F;
To update your threat intel, simply replace the .mxy file on disk. All workers pick up the change automatically.
Runtime Database Switching
You can also change the database path at runtime via Zeek's Config framework:
# Switch to a different database
Config::set_value("MatchyIntel::db_path", "/opt/threat-intel/updated.mxy");
# Unload the database (stop matching)
Config::set_value("MatchyIntel::db_path", "");
If the new path is invalid, the change is rejected and the current database stays loaded.
Manual Observation
Check arbitrary indicators programmatically:
# Check an IP
MatchyIntel::seen(MatchyIntel::Seen($host=1.2.3.4,
                                    $where=MatchyIntel::IN_ANYWHERE));
# Check a domain
MatchyIntel::seen(MatchyIntel::Seen($indicator="evil.example.com",
                                    $indicator_type=MatchyIntel::DOMAIN,
                                    $where=MatchyIntel::IN_ANYWHERE));
Hooks
# Filter matches before they fire
hook MatchyIntel::seen_policy(s: MatchyIntel::Seen, found: bool) {
    # Suppress matches for local IPs
    if (s?$host && Site::is_local_addr(s$host))
        break;
}
# Customize logging
hook MatchyIntel::extend_match(info: MatchyIntel::Info, s: MatchyIntel::Seen, metadata: string) {
    # Add custom fields, modify info record, etc.
}
Log Output
Matches are logged to matchy_intel.log:
| Field | Description |
|---|---|
| ts | Timestamp |
| uid | Connection UID (if applicable) |
| id | Connection 4-tuple (if applicable) |
| seen.indicator | What was matched |
| seen.indicator_type | ADDR, DOMAIN, URL, etc. |
| seen.where | Where it was observed |
| metadata | JSON blob from your database (all your custom fields) |
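To pull individual columns out of this log from the shell, one option is to read the column order from the #fields header line that Zeek writes in its default tab-separated log format. This is a sketch that assumes the default TSV writer (not JSON logs). log_columns is a hypothetical helper name, and a requested field that is missing from the header is not handled.

```shell
# Hypothetical helper: print selected columns from a Zeek TSV log.
# Usage: log_columns <logfile> <field> [<field>...]
log_columns() {
    local file="$1"
    shift
    awk -F'\t' -v want="$*" '
        BEGIN { n = split(want, names, " ") }
        # The header reads "#fields<TAB>ts<TAB>uid...", so header token i
        # describes data column i-1.
        $1 == "#fields" { for (i = 2; i <= NF; i++) col[$i] = i - 1; next }
        # Skip the remaining metadata lines (#separator, #types, #close, ...).
        /^#/ { next }
        {
            out = ""
            for (k = 1; k <= n; k++)
                out = out (k > 1 ? "\t" : "") $(col[names[k]])
            print out
        }
    ' "$file"
}
```

For example, log_columns matchy_intel.log seen.indicator metadata prints one tab-separated line per match containing the indicator and its JSON metadata.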
Low-Level API
For more control, use the BiF functions directly:
global threats_db: opaque of MatchyDB;
event zeek_init() {
    threats_db = Matchy::load_database("/path/to/threats.mxy");
    if (!Matchy::is_valid(threats_db)) {
        print "Failed to load database!";
        return;
    }
}
event new_connection(c: connection) {
    local result = Matchy::query_ip(threats_db, c$id$orig_h);
    if (result != "") {
        print fmt("Threat detected from %s: %s", c$id$orig_h, result);
    }
}
event dns_request(c: connection, msg: dns_msg, query: string, qtype: count, qclass: count) {
    local result = Matchy::query_string(threats_db, query);
    if (result != "") {
        print fmt("Malicious domain queried: %s - %s", query, result);
    }
}
Parsing Match Results
Query results are JSON strings. Use Zeek's from_json() to parse them into typed records:
@load base/frameworks/notice
module ThreatIntel;
export {
    redef enum Notice::Type += {
        Threat_Detected
    };
    type ThreatData: record {
        category: string &optional;
        threat_level: string &optional;
        description: string &optional;
    };
    global threats_db: opaque of MatchyDB;
}
event zeek_init() {
    threats_db = Matchy::load_database("/opt/threat-intel/threats.mxy");
}
event new_connection(c: connection) {
    local result = Matchy::query_ip(threats_db, c$id$orig_h);
    if (result != "") {
        local parsed = from_json(result, ThreatData);
        if (parsed$valid) {
            local threat: ThreatData = parsed$v;
            NOTICE([$note=Threat_Detected,
                    $conn=c,
                    $msg=fmt("Threat: %s (%s)", threat$category, threat$threat_level),
                    $sub=fmt("IP: %s", c$id$orig_h)]);
        }
    }
}
API Reference
Matchy::load_database(filename: string): opaque of MatchyDB
Load a database and return an opaque handle. The database is memory-mapped (not copied into memory). Automatically closed when the handle goes out of scope.
Matchy::load_database_with_options(filename: string, auto_reload: bool): opaque of MatchyDB
Load a database with auto-reload support. When auto_reload is T, the database watches its source file and transparently reloads when changes are detected (~1-2ns overhead per query, lock-free).
Matchy::is_valid(db: opaque of MatchyDB): bool
Check if a database handle is valid and open.
Matchy::query_ip(db: opaque of MatchyDB, ip: addr): string
Query by IP address. Returns a JSON string with match metadata, or "" if no match. Supports both exact IPs and CIDR matching (longest prefix wins).
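"Longest prefix wins" means that when an address falls inside several CIDR entries, the most specific entry supplies the metadata. The sketch below only illustrates that selection rule for IPv4 using shell arithmetic. It is not how Matchy is implemented, the function names are hypothetical, and every entry must be written in CIDR form (use /32 for a single host).

```shell
# Illustration only: pick the most specific IPv4 CIDR containing an address.

# Convert a dotted-quad address to a 32-bit integer.
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<EOF
$1
EOF
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Usage: longest_prefix_match <ip> <cidr> [<cidr>...]
# Prints the matching CIDR with the longest prefix, or nothing if none match.
longest_prefix_match() {
    local ip best="" best_len=-1 cidr net len mask
    ip="$(ip_to_int "$1")"
    shift
    for cidr in "$@"; do
        net="$(ip_to_int "${cidr%/*}")"
        len="${cidr#*/}"
        # A /0 mask is all zeros; otherwise keep the top <len> bits.
        if [ "$len" -eq 0 ]; then
            mask=0
        else
            mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
        fi
        # The entry matches when address and network agree under the mask;
        # remember it only if it is more specific than the best so far.
        if [ $(( ip & mask )) -eq $(( net & mask )) ] && [ "$len" -gt "$best_len" ]; then
            best="$cidr"
            best_len="$len"
        fi
    done
    [ -n "$best" ] && echo "$best"
    return 0
}
```

With entries 10.0.0.0/8 and 10.1.2.0/24, the address 10.1.2.3 selects 10.1.2.0/24, while 10.9.9.9 falls back to 10.0.0.0/8.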
Matchy::query_string(db: opaque of MatchyDB, query: string): string
Query by string. Returns a JSON string with match metadata, or "" if no match. Supports exact string matching and glob patterns (*.evil.com).
Building Matchy Databases
Install the CLI:
cargo install matchy
From CSV
# First column must be named "entry"; it's the match key.
# All other columns become metadata fields in query results.
cat > threats.csv << 'EOF'
entry,threat_level,category,description
1.2.3.4,high,malware,Known C2 server
10.0.0.0/8,low,internal,RFC1918 private network
*.evil.com,critical,phishing,Phishing domain pattern
malware.example.com,high,malware,Malware distribution site
EOF
matchy build threats.csv -o threats.mxy --format csv
Matchy auto-detects entry types: IP addresses, CIDR ranges, glob patterns, and literal strings. You can include as many entries as you need; databases with hundreds of thousands of indicators build in about a second.
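Because the CSV build depends on the first column being named entry, a pre-flight check in CI can catch a malformed feed before it reaches matchy build. This is a minimal sketch. check_csv_header is a hypothetical helper, not part of the Matchy CLI, and it only inspects the header row (a quoted header cell would be rejected).

```shell
# Hypothetical pre-flight check, not part of the Matchy CLI.
# Succeeds only if the first column of the header row is "entry".
check_csv_header() {
    local file="$1" first
    # Read the header row, drop a trailing CR from Windows-style files,
    # and keep everything before the first comma.
    first="$(head -n 1 "$file" | tr -d '\r' | cut -d, -f1)"
    if [ "$first" != "entry" ]; then
        echo "check_csv_header: first column is '$first', expected 'entry'" >&2
        return 1
    fi
}
```

A build step could then run check_csv_header threats.csv && matchy build threats.csv -o threats.mxy --format csv so a bad feed stops the pipeline instead of producing a database.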
From JSON
matchy build threats.json -o threats.mxy
From MISP Threat Feeds
Matchy can import directly from MISP JSON exports, preserving all metadata (tags, threat levels, categories):
matchy build misp-feed/ -o threats.mxy
This handles MISP's directory structure automatically, including manifest.json and per-event files. All indicator types are supported: IPs, domains, URLs, hashes, email addresses, etc.
Combining Multiple Sources
You can pass multiple files of the same format to a single build:
matchy build feed1.csv feed2.csv -o combined.mxy --format csv
Inspect and Query
# Show database metadata and statistics
matchy inspect threats.mxy
# Query from the command line (useful for debugging)
matchy query threats.mxy 1.2.3.4
matchy query threats.mxy "foo.evil.com"
Testing
The plugin includes a comprehensive btest suite:
cd testing
btest
Tests cover:
- Plugin loading
- IP and string queries (exact, CIDR, glob)
- load_database_with_options() with auto-reload on/off
- MatchyIntel seen() function
- MatchyIntel auto-reload mode
- Runtime database switching via Config::set_value()
Troubleshooting
Plugin not found at runtime:
export ZEEK_PLUGIN_PATH=/path/to/zeek-matchy-plugin/build
zeek -N Matchy::DB
Database fails to load with "Unsupported version" error:
Your .mxy file was built with matchy 1.x. Rebuild it with matchy 2.x:
cargo install matchy # updates to 2.x
matchy build threats.csv -o threats.mxy --format csv
Build options:
# Use a local Matchy source checkout
cmake -DMATCHY_SOURCE_DIR=/path/to/matchy ..
# Use an existing Matchy installation
cmake -DBUILD_MATCHY=OFF -DMATCHY_ROOT=/path/to/matchy ..
# Specify Zeek location manually
cmake -DCMAKE_MODULE_PATH=/path/to/zeek/cmake ..
License
Apache-2.0 License. See LICENSE.
See Also
- Matchy: The matching engine
- Zeek Documentation: Zeek network security monitor
- Zeek Plugin Development: Plugin API docs