31 Vulns in 48 Hours: An AI-Assisted Methodology for Auditing Automotive Code

TL;DR

In a 48-hour sprint, we discovered 31 security vulnerabilities across 12 open-source projects — including 6 critical (CVSS ≥ 9.0) and 18 high-severity (CVSS ≥ 7.0) findings. 15 MITRE tickets have been confirmed. 17 of these vulnerabilities affect automotive software deployed in real vehicles.

This article describes our methodology — not the vulnerabilities themselves (which remain under coordinated disclosure). We believe this approach represents a significant step toward changing the economics of vulnerability research.

Public CVE Methodology Snapshot

This is the public, citation-ready version of the audit process. It gives buyers, maintainers, and AI search systems enough detail to understand how findings are produced without publishing exploit code or affected project names before disclosure is complete.

| Layer | Public artifact | Validation standard |
|-------|-----------------|---------------------|
| Target selection | Scope, project class, language, and protocol family | Small C/C++ protocol parsers, automotive middleware, and network-facing components are prioritized by vulnerability-density signals |
| Candidate triage | Suspicious pattern, reachability notes, and expected impact | Human analyst review plus 3-model validation before any vendor report is prepared |
| Dynamic proof | Sanitizer crash class, normal-input control set, and boundary-input behavior | ASAN/UBSan builds must show repeatable crash behavior on malicious inputs and clean behavior on normal inputs |
| Ticket ledger | MITRE ticket count, severity class, and disclosure status | The public MITRE ticket ledger is summarized on the research timeline while details stay withheld until patches are available |
| Disclosure boundary | What is public, delayed, or never published | No exploit code, affected project names, or weaponized reproduction steps are published before coordinated disclosure completes |

The snapshot is intentionally operational rather than theatrical: every public claim should map to a source link, test artifact, or disclosure status that a maintainer or buyer can inspect.

The Numbers

| Metric | Value |
|--------|-------|
| Total vulnerabilities | 31 |
| CVSS ≥ 9.0 (Critical) | 6 |
| CVSS ≥ 7.0 (High) | 18 |
| MITRE tickets confirmed | 15 |
| Projects audited | 12 |
| Projects with findings | 8 |
| Automotive vulnerabilities | 17 |
| Time spent | 48 hours |
| Average CVSS (≥7.0 set) | 8.5 |

Every finding was confirmed with AddressSanitizer (≥20 test runs per vulnerability, 100% crash rate on overflow inputs, 100% clean on normal inputs) and independently validated by three commercial LLMs before reporting. Here's how.

The Problem with Traditional Code Auditing

Traditional vulnerability research follows a linear path:

```
Choose target → Read code → Form hypothesis → Write PoC → Verify → Report
```

This process is slow, subjective, and doesn't scale. A skilled researcher might spend weeks on a single codebase, and the quality of findings depends entirely on individual experience and intuition.

We asked: What if we could parallelize the "intuition" step?

Our Approach: The Cerberus Protocol

We developed a three-headed verification system we call the Cerberus Protocol. Every vulnerability must survive all three heads before we consider it real.

Head 1: Rapid Target Selection

Not all code is created equal. We learned that vulnerability density varies wildly:

| Target Type | Typical Density | Our Finding Rate |
|-------------|-----------------|------------------|
| Small C protocol parsers (<5K LOC) | High | 2-3 vulns per project |
| Medium C/C++ network daemons | Medium | 1-2 vulns per project |
| Large framework codebases (>50K LOC) | Low | 0-1 vulns per project |
| Industrial-grade middleware (BMW, Eclipse) | Very Low | 0 findings |

The insight: Small, individually-maintained C libraries that parse binary protocols are the highest-ROI targets. They often lack the CI/CD security tooling (ASAN, fuzzing, static analysis) that larger projects have.

Our target selection pipeline:

```
1. GitHub search: language:C + topic keywords (CAN, UDS, ISO-TP, J1939, SOME/IP)
2. Filter: <5K stars, <5 contributors, active in last 12 months
3. Clone + rapid grep scan (-E is required so that | is treated as alternation):
   grep -rnE "memcpy|sprintf|strcpy|sscanf|atoi" --include="*.c"
4. Score: hits / lines_of_code = vulnerability density estimate
5. Top scorers enter deep audit queue
```

In our sprint, we scanned 20+ repositories in the first hour and correctly predicted which 8 would yield findings.

Head 2: AI-Augmented Code Review

Here's where the paradigm shift happens. We don't just use one AI model — we use three independent models in a structured debate protocol.

The Process:

1. Human analyst identifies a suspicious code pattern.
2. **Model A (Exploit Analyst):** "Is this pattern reachable? What are the preconditions and constraints for exploitation?"
3. **Model B (Attack Vector Analyst):** "Assuming it's exploitable, what is the attack vector, impact, and estimated CVSS?"
4. **Model C (Adversarial Critic):** "Why might this be a false positive? What library functions or environmental factors would mitigate this?"
5. Only if all three models build a consistent, plausible exploitation scenario do we proceed to dynamic verification.
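The gate in step 5 can be written down explicitly. A minimal sketch, assuming the human analyst has already reduced each model's free-text answer to one of three verdicts; the type and function names are hypothetical and not part of any published tooling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Analyst's summary of one model's answer (hypothetical encoding). */
typedef enum {
    VERDICT_PLAUSIBLE,       /* consistent exploitation scenario */
    VERDICT_FALSE_POSITIVE,  /* a mitigating factor was identified */
    VERDICT_UNSURE           /* no clear answer either way */
} verdict_t;

typedef struct {
    verdict_t exploit_analyst;     /* Model A: reachability, preconditions */
    verdict_t vector_analyst;      /* Model B: attack vector, impact */
    verdict_t adversarial_critic;  /* Model C: looked for mitigations */
} triage_t;

/* A candidate moves on only when all three verdicts are PLAUSIBLE. */
bool proceed_to_dynamic_verification(const triage_t *t)
{
    if (t == NULL) {
        return false;
    }
    return t->exploit_analyst == VERDICT_PLAUSIBLE
        && t->vector_analyst == VERDICT_PLAUSIBLE
        && t->adversarial_critic == VERDICT_PLAUSIBLE;
}
```

Treating `VERDICT_UNSURE` as a rejection is a design choice in this sketch: an undecided candidate goes back to the analyst rather than forward to a sanitizer build.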

Why three models? Single-model analysis has a false positive rate of approximately 15-20% in our experience. With three independent models, the false positive rate drops to near zero — because each model has different blind spots, and genuine vulnerabilities are obvious to all three.

What we DON'T do:

- We don't ask AI to "find vulnerabilities" in a codebase (this produces noise)
- We don't trust AI-generated CVSS scores without human review
- We don't skip dynamic verification based on AI consensus alone

The AI augments human judgment — it doesn't replace it. The human identifies the suspicious pattern; the AI validates whether the suspicion is warranted.

Head 3: Automated Dynamic Verification

Every candidate vulnerability goes through a rigorous dynamic verification process before we report it:

```
Compile with: clang -fsanitize=address,undefined -fno-omit-frame-pointer -g -O0
Test with: ≥20 different parameter combinations
  - Normal inputs (must all pass cleanly)
  - Boundary inputs (edge cases)
  - Overflow inputs (must all trigger ASAN crash)
Acceptance criteria:
  - 100% crash rate for overflow scenarios
  - 100% clean rate for normal scenarios
  - Zero ambiguous results
```

Example verification output:

```
Test  1: param=normal    overflow=0   | clean
Test  2: param=normal    overflow=0   | clean
Test  3: param=boundary  overflow=1   | CRASHED (ASAN: stack-buffer-overflow)
Test  4: param=extreme   overflow=63  | CRASHED (ASAN: heap-buffer-overflow)
...
RESULTS: 12/20 crashed (all overflow=CRASH, all normal=CLEAN)
```

If even one normal input crashes, or one overflow input doesn't crash, we investigate further before reporting.
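The acceptance criteria can be expressed as a single pass over the recorded runs. The following is a sketch under assumptions, not the harness used in the sprint: the types and `finding_accepted` are hypothetical, and boundary inputs are assumed to be filed under "normal" or "overflow" according to whether they exceed the buffer (the `overflow=` column in the output above).

```c
#include <stdbool.h>
#include <stddef.h>

#define MIN_RUNS 20  /* at least 20 parameter combinations per finding */

typedef enum { INPUT_NORMAL, INPUT_OVERFLOW } input_class_t;
typedef enum { RESULT_CLEAN, RESULT_CRASHED, RESULT_AMBIGUOUS } run_result_t;

typedef struct {
    input_class_t input;
    run_result_t result;
} test_run_t;

/*
 * Returns true only if every overflow input crashed, every normal
 * input ran clean, nothing was ambiguous, and enough runs exist.
 */
bool finding_accepted(const test_run_t *runs, size_t n)
{
    if (runs == NULL || n < MIN_RUNS) {
        return false;
    }
    size_t overflow_runs = 0;
    for (size_t i = 0; i < n; i++) {
        if (runs[i].result == RESULT_AMBIGUOUS) {
            return false;
        }
        if (runs[i].input == INPUT_NORMAL && runs[i].result != RESULT_CLEAN) {
            return false;
        }
        if (runs[i].input == INPUT_OVERFLOW) {
            if (runs[i].result != RESULT_CRASHED) {
                return false;
            }
            overflow_runs++;
        }
    }
    /* A set with no overflow inputs proves nothing about the bug. */
    return overflow_runs > 0;
}
```

A `false` result here corresponds to "investigate further", not to discarding the candidate.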

Patterns We Found

Without disclosing specific vulnerabilities (all under coordinated disclosure), here are the vulnerability patterns we observed across automotive open-source software:

Pattern 1: The Unbounded Protocol Field

```c
// Dangerous: length field from network used directly in memcpy
uint8_t length = packet->length_field;  // attacker-controlled
memcpy(frame->data, packet->payload, length);  // data[] is fixed size!
```

This pattern appeared in 7 of our 31 findings. The fix is always the same: validate the length field against the destination buffer size before copying.
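A minimal sketch of that check follows. The `frame_t` layout, the 8-byte `FRAME_DATA_MAX`, and the function name are assumptions chosen for illustration and do not describe any affected project; the sketch also checks the declared length against the number of bytes actually received, so the source side cannot be over-read either.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FRAME_DATA_MAX 8  /* hypothetical fixed payload size */

typedef struct {
    uint8_t data[FRAME_DATA_MAX];
    uint8_t length;
} frame_t;

/*
 * Copy declared_len bytes of payload into the frame.
 * Returns 0 on success, -1 if the length fits neither the
 * destination buffer nor the bytes actually received.
 */
int frame_set_payload(frame_t *frame, const uint8_t *payload,
                      size_t payload_len, uint8_t declared_len)
{
    if (frame == NULL || payload == NULL) {
        return -1;
    }
    if (declared_len > sizeof frame->data) {
        return -1;  /* would overflow the destination */
    }
    if (declared_len > payload_len) {
        return -1;  /* would read past the received data */
    }
    memcpy(frame->data, payload, declared_len);
    frame->length = declared_len;
    return 0;
}
```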

Pattern 2: The Integer Underflow Index

```c
// Dangerous: unsigned subtraction can wrap around
uint8_t index = sequence_number - 1;  // if sequence_number == 0, index = 255!
buffer[index * STRIDE] = value;  // massive out-of-bounds write
```

This appeared in 4 findings across multiple forks of the same library — demonstrating how upstream vulnerabilities propagate through the open-source supply chain.
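One way to close this class of bug is to reject a zero sequence number before subtracting and to bound the resulting slot index against the buffer. This is a sketch under assumptions: the `STRIDE` value, the function name, and the slot layout are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define STRIDE 7  /* hypothetical bytes per segment slot */

/*
 * Store value in the slot for a 1-based sequence number.
 * Returns 0 on success, -1 if the sequence number is 0 or the
 * slot would start outside the buffer.
 */
int store_segment(uint8_t *buffer, size_t buffer_size,
                  uint8_t sequence_number, uint8_t value)
{
    if (buffer == NULL || sequence_number == 0) {
        return -1;  /* 0 - 1 would wrap to 255 in a uint8_t */
    }
    const size_t index = (size_t)sequence_number - 1u;
    if (index >= buffer_size / STRIDE) {
        return -1;  /* slot lies beyond the last complete stride */
    }
    buffer[index * STRIDE] = value;
    return 0;
}
```

Comparing `index` against `buffer_size / STRIDE` rather than multiplying first keeps the bounds check itself free of overflow.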

Pattern 3: The Format String Without Width

```c
// Dangerous: %s reads unlimited bytes from network input
sscanf(network_buffer, "< command %s >", local_buffer);
// local_buffer might be 16 bytes, but network data can be 4096+
```

This appeared in 3 findings in network-facing daemons. The fix: use a width specifier one less than the buffer size (%15s for a 16-byte buffer, because sscanf also writes a terminating NUL) or switch to safer parsing functions.
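A bounded version of the same parse might look like the following sketch. `parse_command` and `COMMAND_MAX` are hypothetical names; the only library behavior relied on is that a field width on `%s` caps the number of characters stored, with the terminating NUL written in addition.

```c
#include <stdio.h>

#define COMMAND_MAX 16  /* destination size, including the NUL */

/*
 * Extract the command token from "< command NAME >".
 * Returns 0 on success, -1 if the input does not match.
 * Tokens longer than 15 characters are truncated, not rejected.
 */
int parse_command(const char *network_buffer, char command[COMMAND_MAX])
{
    if (network_buffer == NULL || command == NULL) {
        return -1;
    }
    /* %15s stores at most 15 characters plus the terminating NUL. */
    if (sscanf(network_buffer, "< command %15s >", command) != 1) {
        return -1;
    }
    return 0;
}
```

Silent truncation is memory-safe but may still be a logic problem; a stricter parser would reject over-long tokens outright.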

Pattern 4: Extract-Before-Verify

```c
// Dangerous: extraction happens before signature verification
extract_archive(untrusted_package);  // writes files to disk
if (!verify_signature(untrusted_package)) {
    cleanup_workdir();  // but path-traversal files are OUTSIDE workdir!
    return ERROR;
}
```

This appeared in 1 finding but it's the highest-impact pattern we discovered — a CVSS 9.8 that allows persistent code execution through a malicious software package, even when signature verification fails.
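The safer ordering is to verify first and write nothing to disk until verification succeeds, with a per-entry path check as a second layer. The sketch below is illustrative only: `package_ops_t`, `install_package`, and `entry_path_is_safe` are hypothetical names, the callbacks stand in for whatever verification and extraction routines a project already has, and the path check assumes POSIX-style `/` separators.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hooks for a project's existing routines. */
typedef struct {
    bool (*verify_signature)(const char *package_path);
    int (*extract_archive)(const char *package_path);
} package_ops_t;

/* Reject absolute paths and any ".." component in an archive entry name. */
bool entry_path_is_safe(const char *name)
{
    if (name == NULL || name[0] == '\0' || name[0] == '/') {
        return false;
    }
    const char *p = name;
    while (*p != '\0') {
        const char *end = strchr(p, '/');
        const size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);
        if (len == 2 && p[0] == '.' && p[1] == '.') {
            return false;  /* would climb out of the work directory */
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return true;
}

/* Verify before extracting: a bad signature means nothing is written. */
int install_package(const package_ops_t *ops, const char *package_path)
{
    if (ops == NULL || package_path == NULL) {
        return -1;
    }
    if (!ops->verify_signature(package_path)) {
        return -1;
    }
    return ops->extract_archive(package_path);
}
```

In this sketch the extraction routine would call `entry_path_is_safe` on each entry name before creating a file, so that even a correctly signed package cannot write outside the work directory.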

The Supply Chain Multiplier

One of our most impactful discoveries was finding a vulnerability in an upstream library that was then inherited by every downstream project using it.

```
Upstream library (vulnerability found here)
  ├── Project A (automotive OS) — already reported
  ├── Project B (fork) — affected
  ├── Project C (fork) — affected
  └── Unknown number of private/commercial users
```

From a single code review, we generated 4 separate CVE reports across different projects. This "supply chain multiplier" effect means that auditing small, widely-used libraries yields disproportionate impact.

What Didn't Work

Transparency requires admitting what failed:

- AFL fuzzing on json-c: We ran 20 cores for 24+ hours (3.75 million executions) with zero crashes. Well-maintained libraries resist automated fuzzing.
- Auditing BMW/Eclipse-maintained code: vsomeip (88K lines C++) and Eclipse CycloneDDS (486 C files) had zero findings. Production-grade code with professional CI/CD is extremely hard to crack.
- Trying to audit Rust code: Eclipse Kuksa is written in Rust. Memory safety by design means our C/C++-focused methodology doesn't apply.
- alloca DoS findings: We initially flagged two alloca-based stack overflow candidates, but our 3-model verification correctly identified them as false positives: the upstream HTTP library (libmicrohttpd) limits input sizes before the vulnerable code is reached. This saved us from submitting invalid reports.

Responsible Disclosure

All 31 vulnerabilities are under coordinated disclosure with their respective maintainers. We follow a strict 90-day (or 120-day for automotive) disclosure timeline. Technical details, proof-of-concept code, and affected project names will be published after patches are available.

We have notified:

- 4 automotive open-source project security teams
- 2 framework security teams
- 3 individual library maintainers
- MITRE (15 tickets confirmed)

Implications for Automotive Security

17 of our 31 findings affect automotive software — CAN protocol parsers, vehicle monitoring systems, diagnostic libraries, and embedded firmware loaders. These are deployed in real vehicles on real roads.

The automotive open-source ecosystem has a security gap: while major players (BMW's vsomeip, Eclipse's CycloneDDS) maintain excellent security standards, the long tail of smaller libraries that glue the ecosystem together often lacks basic memory safety practices.

As vehicles become increasingly software-defined, the attack surface grows. We believe systematic, AI-augmented auditing of the automotive open-source supply chain is not optional — it's urgent.

What's Next

- Technical deep-dives will be published as vulnerabilities are patched (subscribe for updates)
- Open-source tools for automotive fuzzing harnesses (coming Q2 2026)
- Conference talks at major security conferences (submissions pending)

If you're an automotive OEM, Tier-1 supplier, or open-source maintainer interested in proactive security auditing, reach out: [email protected]

Feng Ning (风宁) is the founder of Innora Security Research, specializing in automotive cybersecurity and AI-augmented vulnerability discovery. This research was conducted independently and is not sponsored by any vendor.

All vulnerabilities described in this article are under responsible disclosure. No exploit code or affected project names are included.
