Compiler Fuzzing in EK9
EK9 includes a built-in grammar-based fuzzer that generates random EK9 programs, compiles them, and tracks statistics including error code coverage, phase penetration, and compiler crashes. This is used for compiler quality assurance — verifying that the compiler handles every possible input gracefully.
This feature is built into the compiler and shipped with it, so anyone can test the compiler. If you find a crash or other issue, let us know and supply the 'offending' EK9 source that triggered it, so we can address it. We build new versions of the compiler weekly, so a fix won't take long.
Looking for test generation? If you want to generate tests for your code, see Test Generation instead. This page documents the compiler fuzzer that tests the compiler itself.
- What is Compiler Fuzzing? - Why it matters
- Running the Fuzzer - CLI flags and format options
- Three-Strand Generation Strategy - How programs are generated
- Output Formats - Human, JSON, HTML, CI
- Reading the Dashboard - What each section shows
- Best Practices - Getting the most from fuzzing
What is Compiler Fuzzing?
The EK9 compiler has a 22-phase compilation pipeline, over 300 distinct error codes, and supports 29 construct types with complex interactions. Grammar-based fuzzing systematically exercises this complexity by generating random-but-structurally-plausible EK9 programs and compiling them.
The fuzzer answers three questions:
- Crash detection — Does the compiler crash on any generated input? Crash-triggering source files are saved to ./fuzz-crashes/ for investigation.
- Error code coverage — How many of the compiler's ~307 error codes are exercised? Low coverage indicates untested error paths.
- Phase penetration — How far do generated programs progress through the compilation pipeline? Programs that reach later phases exercise more of the compiler.
Running the Fuzzer
The fuzzer runs for a specified number of minutes, generating and compiling programs continuously:
- $ ek9 -fuzz 30 // 30 minutes, human-readable output
- $ ek9 -fuzz0 30 // Terse CI pass/fail (one line)
- $ ek9 -fuzz2 1440 // 24 hours, JSON for pipelines
- $ ek9 -fuzz6 60 // 1 hour, HTML dashboard
The format suffix convention (-fuzz0, -fuzz2, -fuzz6) mirrors the test format convention (-t0, -t2, -t6) — each action owns its format suffixes.
Crash-triggering source files are always saved to ./fuzz-crashes/ regardless of output format. The HTML dashboard is written to ./fuzz-report/.
Three-Strand Generation Strategy
The fuzzer uses three complementary generation strategies to maximise the diversity of generated programs. Each strand produces different error distributions, and together they exercise the compiler more thoroughly than any single strategy could.
Strand 1: Template-Based ATN Generation
The primary strand uses 25+ built-in templates for common EK9 patterns (classes, functions, traits, records, etc.) and fills them using ANTLR4 Augmented Transition Network (ATN) walks of the EK9 grammar. At each grammar decision point, the generator makes a random choice, producing structurally plausible code. This strand produces the highest density of syntactically correct programs.
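The idea of a random grammar walk can be sketched in a few lines. This is a toy illustration only: the miniature grammar and the `walk` helper below are invented for this page, whereas the real fuzzer walks the ANTLR4 ATN of the full EK9 grammar.

```python
import random

# Invented miniature grammar: each non-terminal maps to a list of alternatives.
GRAMMAR = {
    "program":   [["construct"], ["construct", "program"]],
    "construct": [["class"], ["function"]],
    "class":     [["'class'", "NAME"]],
    "function":  [["NAME", "'()'", "'->'", "NAME"]],
}

def walk(symbol, rng, depth=0):
    """Expand a non-terminal by randomly choosing one alternative at each
    decision point, bounding recursion so generation always terminates."""
    if symbol not in GRAMMAR:                       # terminal or token class
        return "Ident" if symbol == "NAME" else symbol.strip("'")
    alternatives = [GRAMMAR[symbol][0]] if depth > 8 else GRAMMAR[symbol]
    choice = rng.choice(alternatives)
    return " ".join(walk(s, rng, depth + 1) for s in choice)

# A fixed seed makes the generated sequence reproducible.
print(walk("program", random.Random(42)))
```

Each call with the same seed yields the same program, which is why the fuzzer records its seed for crash reproduction.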
Strand 2: Compiler-Aware Injection
This strand harvests real symbols from previously compiled Q&A example files (426 templates) and injects them into generated programs. By using real type names, method signatures, and module structures, these programs exercise deeper semantic analysis phases that pure random generation rarely reaches.
Strand 3: Template Mutation
This strand takes working Q&A example files and applies single-point mutations: dropping modifiers, swapping types, changing operators, altering indentation, duplicating lines, injecting boolean literals, stripping guards, and swapping adjacent statements. These targeted mutations exercise specific error detection paths (E08010, E08030, E08081, E11050, etc.) that random generation is unlikely to trigger.
| Strand | Share | Strength |
|---|---|---|
| Template-Based ATN | ~50% | High volume, broad grammar coverage, many parse errors for parser robustness |
| Compiler-Aware | ~25% | Deeper phase penetration, exercises type resolution and semantic checks |
| Template Mutation | ~25% | Targeted error code coverage, exercises specific detection logic |
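Strand 3's single-point mutation approach can be sketched as follows. The mutator names and selection logic here are illustrative inventions; the real fuzzer applies the fuller set of mutations described above (dropping modifiers, swapping types, and so on).

```python
import random

# Hypothetical mutators, each making exactly one small change.
def swap_adjacent_statements(lines, rng):
    if len(lines) < 2:
        return lines
    i = rng.randrange(len(lines) - 1)
    lines[i], lines[i + 1] = lines[i + 1], lines[i]
    return lines

def duplicate_line(lines, rng):
    i = rng.randrange(len(lines))
    return lines[: i + 1] + [lines[i]] + lines[i + 1:]

def alter_indentation(lines, rng):
    i = rng.randrange(len(lines))
    lines[i] = "  " + lines[i]          # EK9 is indentation-sensitive
    return lines

MUTATORS = [swap_adjacent_statements, duplicate_line, alter_indentation]

def mutate(source: str, seed: int) -> str:
    """Apply exactly one randomly chosen single-point mutation to a
    known-good source file, producing input for the compiler."""
    rng = random.Random(seed)
    lines = source.splitlines()
    return "\n".join(rng.choice(MUTATORS)(lines, rng))
```

Starting from a valid program means the mutated result is usually *almost* valid, which is precisely what drives it past the parser and into specific semantic error checks.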
Output Formats
Human-Readable (-fuzz)
The default format prints terminal histograms, phase penetration charts, and a summary to stdout. Suitable for interactive monitoring during development:
EK9 Fuzzer: 30 minutes, seed 1709312456789
Programs: 14,832 | Parse: 72.4% | Crashes: 631 | Errors: 156/307 (50.8%)

Phase Distribution:
READING                      ████░░░░░░░░░░░░░░░░  5.8%
SYMBOL_DEFINITION            █████████░░░░░░░░░░░ 23.9%
FULL_RESOLUTION              ██████████░░░░░░░░░░ 26.0%
CODE_GENERATION_AGGREGATES   ████████████░░░░░░░░ 32.3%
Terse CI (-fuzz0)
One-line pass/fail for CI gates. Returns exit code 0 if no new crashes were found:
FUZZ OK: 14832 programs, 0 new crashes, 156/307 errors (50.8%) in 30m
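A CI gate only needs the documented exit-code behaviour. The helper below is a minimal sketch (the function name is invented); in a real pipeline you would pass `["ek9", "-fuzz0", "30"]`.

```python
import subprocess
import sys

def fuzz_gate(cmd):
    """Run the fuzzer command, echo its one-line summary, and return the
    exit code: 0 means no new crashes, anything else should fail the build."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout.strip())
    return result.returncode

if __name__ == "__main__":
    # In CI: sys.exit(fuzz_gate(["ek9", "-fuzz0", "30"]))
    pass
```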
JSON (-fuzz2)
Produces two files for programmatic analysis:
- fuzz-report.json — Final summary with all statistics
- fuzz-snapshots.jsonl — Time-series snapshots (one JSON object per line) for tracking metrics over time
{
"duration": "PT30M",
"programs": 14832,
"parseRate": 0.724,
"crashes": 631,
"errorCoverage": { "triggered": 156, "total": 307 },
"phases": { "READING": 860, "SYMBOL_DEFINITION": 3546, ... },
"constructs": { "class": 4231, "function": 3892, ... }
}
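Because fuzz-snapshots.jsonl holds one JSON object per line, tracking a metric over time is straightforward. A minimal sketch, assuming the snapshot objects carry the same `errorCoverage` field shown in the example report (the helper names are invented):

```python
import json

def coverage_ratio(report: dict) -> float:
    """Fraction of error codes triggered, from one report or snapshot."""
    cov = report["errorCoverage"]
    return cov["triggered"] / cov["total"]

def coverage_trend(jsonl_text: str) -> list:
    """One coverage ratio per snapshot line in fuzz-snapshots.jsonl."""
    return [coverage_ratio(json.loads(line))
            for line in jsonl_text.splitlines() if line.strip()]
```

Plotting the trend over successive runs shows whether error coverage is still growing or has plateaued, which is the daily QA signal recommended under Best Practices.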
HTML Dashboard (-fuzz6)
Generates an interactive dashboard at ./fuzz-report/index.html with charts, heatmaps, and drill-down details. This is the richest output format and the recommended way to review fuzzing results.
Reading the Dashboard
The HTML dashboard (-fuzz6) provides eight visualisation sections. Each is
described below with the key metrics to watch.
Status Banner and KPI Cards
The status banner shows duration, programs generated, throughput, crash count, and corrections. The border colour indicates overall health: green (no crashes), amber (few crashes), or red (significant crashes).
Four KPI donut charts provide at-a-glance metrics:
- Parse Rate — Percentage of generated programs that parse successfully. Higher is better for exercising semantic phases, but some parse failures are expected and valuable for testing parser error recovery.
- Error Coverage — Percentage of the compiler's ~307 error codes triggered. The goal is to exercise as many error paths as possible.
- Constructs — Percentage of EK9's 29 construct types exercised (class, function, record, trait, service, etc.). Target: 100%.
- Multi-File — Percentage of programs that span multiple files, testing cross-module scenarios.
Timing Breakdown and Source Statistics
Three mini-donuts show where time is spent: generation, parse checking, and compilation. Source statistics show min/avg/max lines per program, total bytes generated, file counts, and compile rate. If compilation dominates, programs are reaching deep phases (good). If parse checking dominates, most programs fail early (consider adjusting generation strategy).
Phase Distribution
Horizontal bars show how far programs penetrate the 22-phase compilation pipeline. Each bar represents a phase where programs were rejected — programs that pass a phase move to the next bar. A healthy distribution shows programs spread across all phases, not clustered at the front.
- Early phases (READING, SYMBOL_DEFINITION) — Programs with syntax or basic structural errors. Expected from random generation.
- Middle phases (REFERENCE_CHECKS through PRE_IR_CHECKS) — Programs with type errors, unresolved references, or semantic issues. These exercise the type system.
- Late phases (CODE_GENERATION onwards) — Programs that pass all frontend checks. Crashes here indicate code generation bugs.
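A simple health check for the distribution can be computed from the `phases` counts in fuzz-report.json. This sketch assumes the phase names shown in the example output; the function and threshold are illustrative, not part of the tool:

```python
# Phases treated as "early" for this check — an illustrative subset.
EARLY_PHASES = {"READING", "SYMBOL_DEFINITION"}

def early_rejection_rate(phases: dict) -> float:
    """Fraction of programs rejected in the earliest phases. A high value
    suggests generation is front-loaded and the strategy needs adjusting."""
    total = sum(phases.values())
    early = sum(n for p, n in phases.items() if p in EARLY_PHASES)
    return early / total if total else 0.0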
Error Code Coverage Heatmap
The largest dashboard section shows all ~307 compiler error codes as a searchable, filterable grid. Error codes are grouped by category (E01xxx Lexer/Parser, E05xxx Hierarchy, E06xxx Resolution, etc.).
- Green cells — Error code was triggered. Darker green indicates more hits.
- Grey cells — Error code was not triggered. These represent untested error paths that may need targeted generation strategies.
- Red crash badges — Error code triggered a compiler crash (not just an error).
Use the search box to find specific error codes, or the filter buttons (All / Triggered / Untriggered) to focus on gaps.
Construct Coverage
A heatmap of all 29 EK9 construct types: class, function, record, trait, service, component, program, enumeration, generic-type, dynamic-class, dynamic-function, and more. Colour intensity indicates frequency of generation. Red-bordered cells with a pulse animation indicate constructs that have caused compiler crashes.
The goal is uniform coverage across all construct types. If some constructs are underrepresented, the generation templates may need adjustment.
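Underrepresented constructs can be spotted mechanically from the `constructs` counts in fuzz-report.json. A minimal sketch, with an invented helper and an arbitrary 50%-of-mean threshold:

```python
def underrepresented(constructs: dict, threshold: float = 0.5) -> list:
    """Construct types generated at under `threshold` of the mean rate,
    i.e. candidates for template adjustment."""
    mean = sum(constructs.values()) / len(constructs)
    return sorted(c for c, n in constructs.items() if n < threshold * mean)
```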
Control Flow Coverage
Grouped horizontal bars for each control flow type (for-in, do-while, for-range, switch, if, while, throw, try-catch, stream, etc.). Each type has three sub-bars:
- stmt (green) — Statement form (e.g., if condition)
- guard (blue) — Guard form (e.g., if x <- getValue())
- expr (purple) — Expression form used in assignments
Crash badges on specific control flow types highlight where the compiler is most vulnerable. Stream operations and deeply nested constructs often reveal the most interesting bugs.
Argument Count Distribution
Shows the frequency distribution of argument counts (0-25+) in generated functions and methods. A realistic distribution has most functions with 0-3 parameters, with decreasing frequency for higher counts. Edge cases at 15+ parameters stress the compiler's parameter handling.
Template Usage
Shows utilisation of the 426 Q&A example templates used by Strand 2 (compiler-aware injection) and Strand 3 (template mutation). Identifies underused templates that may need attention to ensure comprehensive coverage.
Best Practices
- Run 24/7, not 5 minutes. Stochastic testing is probabilistic — a short run may miss rare crash conditions. The fuzzer is designed for continuous operation.
- QA the statistics daily. The dashboard is a monitoring system. Check that error coverage is growing, constructs are uniformly exercised, and no new crash patterns have emerged.
- Statistics are the product, not crash files. Crashes are important, but the real value is understanding how much of the compiler has been exercised. A clean run with low error coverage is worse than a crashy run with high coverage.
- Use seeds for reproducibility. When investigating a crash, note the seed from the dashboard and re-run with the same seed to reproduce the exact sequence of generated programs.
- Dark mode toggle. The dashboard includes a dark/light mode toggle for late-night monitoring sessions.
See Also
- Test Generation - Generate edge-case tests and mutation variants for your code
- Testing - Test types, assertions, test runner commands
- Code Coverage - Threshold enforcement, quality metrics, HTML reports
- Command Line - All flags and exit codes
- For AI Assistants - Machine-readable output schemas