WHOIS (RFC 3912) Protocol Analyzer

Spicy-based WHOIS (RFC 3912) protocol analyzer for Zeek.

Detailed Description

WHOIS is a basic TCP request/response protocol: the client sends one query line, and the server returns free-form text and closes the connection.

This analyzer parses both halves of the exchange and writes a structured whois.log. It classifies the query as domain, ipv4, ipv6, or asn, then reads the reply (capped at 64 KB) and scans it for registry/RIR fields: owner, status, origin AS, registration, update, and expiry dates, name servers, and abuse contact.
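The classification step can be pictured with a short Python sketch. This is not the analyzer's code: the rules here (classify the last whitespace-separated token, treat `AS` plus digits as an ASN, fall back to domain) are assumptions chosen to agree with the example records in this README, and `classify_query` is a hypothetical name.

```python
import ipaddress
import re

# Assumed ASN shape: "AS" followed by digits, case-insensitive.
ASN_RE = re.compile(r"^AS\d+$", re.IGNORECASE)


def classify_query(query: str) -> str:
    """Return 'asn', 'ipv4', 'ipv6', or 'domain' for a WHOIS query line."""
    # Flag-style queries ("-T dn,ace example.de", "domain cloudflare.com")
    # put the object last, so only the final token is classified.
    tokens = query.split()
    if not tokens:
        raise ValueError("empty query")
    target = tokens[-1]

    if ASN_RE.match(target):
        return "asn"
    try:
        # strict=False accepts bare addresses as well as CIDR prefixes.
        network = ipaddress.ip_network(target, strict=False)
    except ValueError:
        # Anything that is neither an ASN nor an address is treated as a domain.
        return "domain"
    return "ipv4" if network.version == 4 else "ipv6"
```

Under these assumptions, `classify_query("95.217.0.1")` yields `ipv4` and `classify_query("domain cloudflare.com")` yields `domain`, matching the `query_type` values in the example output.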

Features

Logs WHOIS queries and structured reply metadata to whois.log
Dynamic protocol detection (DPD) via bidirectional signatures
Reply time tracking (request-to-reply delta)
Weirds for protocol anomalies (empty requests, unusually large queries)
UTF-8/IDN support (tested against JP, CN, KR WHOIS servers)

Detection use cases (examples)

Sinkhole / seizure: a status of serverHold (set by the registry) or clientHold (set by the registrar) marks a domain that has been frozen and pulled from DNS.
Routing intelligence: origin_as on a network query is the BGP-filter input; flag route objects whose origin AS doesn't match expected peering.
Fresh infrastructure: a registered date inside your lookback window flags newly stood-up domains; a short registry_expiry (1-year registration) sharpens the signal.
Infrastructure pivot: name_server ties a domain to its DNS hosting; pivot to related domains sharing a name server.
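As an illustration of the hold-status and fresh-infrastructure cases, here is a minimal Python sketch that triages one JSON record from whois.log. It assumes timestamps in the `YYYY-MM-DDTHH:MM:SSZ` form shown in the example output (real replies vary widely); the `triage` function, the finding labels, and the 30-day and 366-day thresholds are hypothetical.

```python
from datetime import datetime, timedelta, timezone

HOLD_CODES = ("serverHold", "clientHold")


def parse_ts(value: str) -> datetime:
    """Parse a timestamp such as 2009-02-17T22:07:54Z (assumed format)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def triage(record: dict, now: datetime, lookback_days: int = 30) -> list:
    """Return a list of finding labels for one whois.log record."""
    findings = []

    # status entries look like "clientHold https://icann.org/epp#clientHold",
    # so compare only the first whitespace-separated token.
    for status in record.get("status", []):
        parts = status.split()
        if parts and parts[0] in HOLD_CODES:
            findings.append("hold:" + parts[0])

    registered = record.get("registered")
    if registered:
        created = parse_ts(registered)
        age = now - created
        if timedelta(0) <= age <= timedelta(days=lookback_days):
            findings.append("fresh-registration")
        expiry = record.get("registry_expiry")
        # A term of about one year is treated as a short registration.
        if expiry and parse_ts(expiry) - created <= timedelta(days=366):
            findings.append("short-registration")

    return findings
```

The function takes `now` as a parameter rather than reading the clock, so the same record always produces the same findings when replaying a pcap.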

Requires

Zeek 6.1.0 (bundled with Spicy 1.9.0) minimum
C++ toolchain and libpcap headers are required to build the analyzer:
- gcc g++ make cmake libpcap-dev
- As with any zkg Spicy analyzer, the code is Spicy source and compiled at install time
- NOTE: The official zeek/zeek container image omits these, so install first or the build will fail with pcap.h: No such file or directory

Install

zkg package, from Zeek Package Source:

zkg install spicy-whois

Events

event WHOIS::request(c: connection, is_orig: bool, query: string)

Raised for each client query, with query holding the string stripped of its line terminator.

event WHOIS::reply(c: connection, is_orig: bool, data: string)

Raised once per reply, with data holding the full server text (read until close, capped at 64 KB).

The events above hand back raw bytes; the analyzer's interpretation surfaces in WHOIS::log_whois(rec: WHOIS::Info). Once per connection it emits the assembled WHOIS::Info record (query classified, reply fields extracted) that is written to whois.log.
See WHOIS answer schema for fields.

Example output

Run Zeek against a test pcap, then pretty-print whois.log with jq:

zeek -C -r testing/Traces/whois-domain.pcap whois.hlto scripts/__load__.zeek LogAscii::use_json=T
jq --color-output . whois.log

domain lookup (whois-domain.pcap): registrar, EPP status codes, name servers, abuse contact:

{
  "ts": 1779334478.346291,
  "uid": "Cm3FuO2WPLUSPqUolb",
  "id.orig_h": "192.168.1.231",
  "id.orig_p": 63154,
  "id.resp_h": "192.34.234.30",
  "id.resp_p": 43,
  "query": "domain cloudflare.com",
  "query_type": "domain",
  "resource": "CLOUDFLARE.COM",
  "owner": "Cloudflare, Inc.",
  "registered": "2009-02-17T22:07:54Z",
  "updated": "2024-01-09T16:45:28Z",
  "registry_expiry": "2033-02-17T22:07:54Z",
  "name_server": [
    "ns3.cloudflare.com",
    "ns4.cloudflare.com",
    "ns5.cloudflare.com",
    "ns6.cloudflare.com",
    "ns7.cloudflare.com"
  ],
  "status": [
    "clientDeleteProhibited https://icann.org/epp#clientDeleteProhibited",
    "clientTransferProhibited https://icann.org/epp#clientTransferProhibited",
    "clientUpdateProhibited https://icann.org/epp#clientUpdateProhibited",
    "serverDeleteProhibited https://icann.org/epp#serverDeleteProhibited",
    "serverTransferProhibited https://icann.org/epp#serverTransferProhibited",
    "serverUpdateProhibited https://icann.org/epp#serverUpdateProhibited"
  ],
  "abuse_contact": "registrar-abuse@cloudflare.com",
  "reply_time": 0.025169849395751953,
  "reply_size": 3719
}

network lookup (whois-net.pcap): the same record shape pivoted on query_type, here an RIR inetnum with server_name and origin_as populated and the domain-only fields absent:

{
  "ts": 1779334777.802331,
  "uid": "CBdloO3gjjCrOi6Q5l",
  "id.orig_h": "192.168.1.231",
  "id.orig_p": 64829,
  "id.resp_h": "193.0.6.135",
  "id.resp_p": 43,
  "query": "95.217.0.1",
  "query_type": "ipv4",
  "server_name": "RIPE",
  "resource": "95.217.0.0 - 95.217.15.255",
  "owner": "ORG-HOA1-RIPE",
  "origin_as": "AS24940",
  "registered": "2023-12-12T12:40:45Z",
  "updated": "2023-12-12T12:40:45Z",
  "status": [
    "ASSIGNED PA"
  ],
  "reply_time": 0.16294193267822266,
  "reply_size": 3800
}

Analyzer: Attachment, confirmation, and ports

A connection is logged only after two steps: the analyzer attaches to it, then the parser confirms the bytes are WHOIS.

Attach happens on 43/tcp. Analyzer::register_for_ports binds the analyzer to that port, so every connection on 43/tcp gets the analyzer at connection start, before any payload is parsed.

Confirm happens in the parser, independently on each side. A query line that parses calls spicy::accept_input(); a reply that carries data does the same. Either alone confirms, so a client query with no reply still tags the connection. A parse failure on either side calls zeek::reject_protocol() instead.

Confirmation, not the port match, is what sets service=whois in conn.log. Non-WHOIS traffic on 43/tcp still gets the analyzer attached, but never confirms, so service stays empty.

The full path is port → attach → parse → accept_input() confirms → service=whois.

DPD signature

The signature in scripts/dpd.sig is a third, independent mechanism: a content-based attach path for non-standard ports.

WHOIS has no constant byte pattern or fixed-offset header to key against, so the signature pairs a client and server match tuned against the captured bytes in testing/Traces/. The server side fires only after the client query matches (requires-reverse-signature), avoiding false positives from other text protocols whose replies carry a stray keyword.

The signature uses tcp-state originator/responder without established, as the core analyzers' signatures do; payload exists only after the handshake, so established would be redundant.

Client (originator): a single query line ending in CRLF:

Character class covers domain/IP/ASN chars plus flag punctuation (- . @ = / + : ,) and a literal space, so RIPE-style flag queries like -T dn,ace example.de match.
No \s in the class. An earlier version included it, silently matching internal \r/\n, so multi-line payloads (foo.com\r\nbar.com\r\n) and bare CRLF floods registered as valid single queries. Dropping \s rejects them.
Underscore is excluded: it let SSH banners (SSH-2.0-libssh_…) match.
\x80-\xff is kept for IDN/CJK queries.
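The effect of these character-class choices can be checked with a Python approximation. The pattern below is reconstructed from the description above, not copied from scripts/dpd.sig, and anchoring it to the whole payload with `fullmatch` is an assumption made to keep the single-line behaviour easy to see.

```python
import re

# Approximation of the client-side class: letters, digits, the flag
# punctuation - . @ = / + : , a literal space, and high bytes for IDN.
# No \s and no underscore, followed by one CRLF.
CLIENT_QUERY = re.compile(rb"[A-Za-z0-9\-.@=/+:, \x80-\xff]+\r\n")


def looks_like_query(payload: bytes) -> bool:
    """True if the payload is exactly one query line ending in CRLF."""
    return CLIENT_QUERY.fullmatch(payload) is not None
```

With this pattern a RIPE-style flag query such as `-T dn,ace example.de` is accepted, while a two-line payload, a bare CRLF, and an SSH banner containing an underscore are all rejected.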

Server (responder): keyword match, gated by requires-reverse-signature:

Matches keywords present in real registry/registrar/RIR replies.
Includes route6?: (not just route:) plus origin:/source:, closing a gap where IPv6 route6: objects from RIPE/RADB went undetected.

WHOIS on a non-standard port

To parse WHOIS off 43/tcp, add the port so the analyzer attaches there at connection start, the same path it uses on 43/tcp:

redef WHOIS::ports += { 4343/tcp };

Parsing limits and bounds

Cutoff bounds protect against malformed / hostile traffic:

Request line (whois.spicy): printable bytes (\x09, \x20-\x7e, \x80-\xff for IDN), terminated by an optional CR and a required LF. An empty query raises whois_empty_request; a query over 512 bytes raises whois_oversized_request (the line still parses; the weird is the signal).
Reply body: read to close, capped at 64 KB (&size=65536 &eod); the first 64 KB are parsed and bytes past the cap are discarded, so reply_size tops out at the cap.
Field extraction (main.zeek): reply split on LF, each line on its first :; keys lowercased, values stripped, empties skipped. Single-valued fields are first-wins; status and name_server accumulate into a set (name_server lowercased to dedup), bounded by the 64 KB cap.
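The field-extraction rules read more easily as code. The Python sketch below mirrors the steps described above (split on LF, split each line on its first colon, lowercase keys, strip values, skip empties, first-wins for single values, sets for status and name_server). It is not a port of main.zeek: `extract_fields` and the small `KEY_MAP` are hypothetical, and the map covers only a few domain-reply keys for illustration.

```python
# Hypothetical mapping from raw reply keys (lowercased) to log columns.
KEY_MAP = {
    "domain name": "resource",
    "registrar": "owner",
    "creation date": "registered",
    "domain status": "status",
    "name server": "name_server",
}
# Columns that accumulate into a set instead of keeping the first value.
MULTI_VALUED = {"status", "name_server"}


def extract_fields(reply: str) -> dict:
    """Pull known fields out of a free-form WHOIS reply."""
    fields = {}
    for line in reply.split("\n"):
        # Split on the first colon only, so URL values stay intact.
        key, sep, value = line.partition(":")
        key = key.strip().lower()
        value = value.strip()  # also removes a trailing CR
        if not sep or not key or not value:
            continue
        column = KEY_MAP.get(key)
        if column is None:
            continue
        if column in MULTI_VALUED:
            if column == "name_server":
                value = value.lower()  # dedup NS3.EXAMPLE vs ns3.example
            fields.setdefault(column, set()).add(value)
        else:
            fields.setdefault(column, value)  # first occurrence wins
    return fields
```

Using `setdefault` for single-valued columns is what gives first-wins behaviour: a second `Registrar:` line later in the reply leaves the stored value untouched.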

WHOIS answer schema

Answers come in two forms, both mapped into one set of generic fields pivoted on query_type:

domain responses (registrar/registry data)
network responses (RIR inetnum/route/ASN objects)

Always read a value alongside query_type: the same column carries different elements per type (owner is a registrar for a domain, an mnt-by maintainer for a network).

Field	Domain response	Network response	Why it matters to a defender
`query`	the query string	the query string	What was looked up
`query_type`	`domain`	`ipv4` / `ipv6` / `asn`	Split registrar lookups from routing-intel lookups
`server_name`	-	source registry (RIPE, ARIN…)	Which database answered
`resource`	`Domain Name`	`NetRange` / `CIDR` / `inetnum` / `route`	The object the response describes
`owner`	`Registrar`	org / `mnt-by` maintainer	Who controls the resource
`origin_as`	-	`OriginAS` / `origin:`	BGP-filter input the CCC RIPE talk focuses on
`registered`	`Creation Date`	`RegDate` / `created:`	Age: new registrations are suspicious
`updated`	`Updated Date`	`last-modified`	Recent repoint/takeover signal
`registry_expiry`	`Registry Expiry Date`	-	Short (1-year) registrations are a hunting signal
`name_server`	`Name Server` (set)	-	DNS hosting + pivot to related domains via shared NS
`status`	EPP codes	-	`serverHold`/`clientHold` = seized/sinkholed
`abuse_contact`	`Registrar Abuse Contact Email`	-	Abuse reporting + bulletproof-registrar fingerprinting
`reply_time`	request→reply delta	request→reply delta	Latency: tunneling/abuse signal
`reply_size`	total bytes	total bytes	Volume, without storing the blob
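Because one column means different things per query_type, a consumer of whois.log may want to label values before displaying them. The Python sketch below does this for owner only; `owner_role` and `describe_owner` are hypothetical helpers, and the role names simply restate the owner row of the table.

```python
def owner_role(record: dict) -> str:
    """Say what the owner column holds for this record's query_type."""
    query_type = record.get("query_type")
    if query_type == "domain":
        return "registrar"
    if query_type in ("ipv4", "ipv6", "asn"):
        return "org or mnt-by maintainer"
    return "unknown"


def describe_owner(record: dict):
    """Return 'owner (role)', or None when the record has no owner."""
    owner = record.get("owner")
    if owner is None:
        return None
    return "{} ({})".format(owner, owner_role(record))
```

Applied to the two example records, this labels `Cloudflare, Inc.` as a registrar and `ORG-HOA1-RIPE` as an org or maintainer.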

Protocol Reference

License

BSD-3-Clause, see COPYING.

Credits

Created

Craig P (@detection-labs)

Thanks

Inspiration

38th Chaos Communication Congress, 38c3: "The WHOIS protocol for internet routing policy, or: how plaintext retrieved over TCP/43 ends up in router configurations"
