CLASSIFIED: OPEN DOSSIER
Whitepaper: Graph Neural Networks for Malicious Code Detection
2021-05-14 Feng Jiqiang
# Graph Neural Networks (GNN) for Malicious Code Detection
**Patent ID**: CN112801452A
**Assignee**: Jinboan Technology
**Inventor**: Feng Jiqiang
## Abstract
Traditional malware detection relies on signature matching (static analysis) or behavioral sandbox monitoring (dynamic analysis). Both struggle with modern polymorphic malware, which constantly changes its binary structure while preserving the same malicious logic.
This paper introduces a detection method based on **Graph Neural Networks (GNN)**. We convert the compiled binary code into a **Control Flow Graph (CFG)**, where nodes represent basic blocks of instructions and edges represent execution paths. A dedicated GNN architecture then learns the topological features of these graphs to classify software as benign or malicious, with a reported accuracy of 99.2% on an obfuscated malware dataset.
## Methodology
### 1. Graph Construction
- **Disassembly**: Using IDA Pro/Ghidra to extract assembly instructions.
- **Node Feature Extraction**: Embedding assembly instructions (opcode + operands) into vectors using `asm2vec`.
- **Edge Definition**: Creating directed edges between basic blocks for control-transfer instructions (`JMP`, `CALL`, `RET`).
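The graph construction steps above can be illustrated with a minimal sketch. It does not call IDA Pro, Ghidra, or `asm2vec`. Instead it assumes a hypothetical toy instruction format `(address, opcode, jump_target)` and a small, made-up opcode set (`JMP`, `JZ`, `JNZ`, `RET`). It also adds fall-through edges and leaves out inter-procedural `CALL`/`RET` edges, both simplifications of this sketch rather than details taken from the method itself.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy instruction format: (address, opcode, jump_target or None).
Instr = Tuple[int, str, Optional[int]]

UNCONDITIONAL = {"JMP"}        # control never falls through
CONDITIONAL = {"JZ", "JNZ"}    # control may jump or fall through
TERMINATORS = {"RET"}          # block ends with no successor in this sketch
BRANCHES = UNCONDITIONAL | CONDITIONAL


def build_cfg(instrs: List[Instr]) -> Tuple[Dict[int, List[Instr]], Set[Tuple[int, int]]]:
    """Split a linear instruction list into basic blocks and connect them."""
    addrs = [addr for addr, _, _ in instrs]

    # A leader starts a basic block: the entry point, every jump target,
    # and every instruction that follows a branch or a return.
    leaders = {addrs[0]}
    for i, (_, op, target) in enumerate(instrs):
        if op in BRANCHES or op in TERMINATORS:
            if target is not None:
                leaders.add(target)
            if i + 1 < len(instrs):
                leaders.add(addrs[i + 1])

    # Group instructions into blocks keyed by the address of their leader.
    blocks: Dict[int, List[Instr]] = {}
    current = addrs[0]
    for ins in instrs:
        if ins[0] in leaders:
            current = ins[0]
            blocks[current] = []
        blocks[current].append(ins)

    # Directed edges: jump targets plus fall-through to the next block.
    edges: Set[Tuple[int, int]] = set()
    starts = sorted(blocks)
    for idx, start in enumerate(starts):
        _, op, target = blocks[start][-1]
        nxt = starts[idx + 1] if idx + 1 < len(starts) else None
        if op in BRANCHES and target is not None:
            edges.add((start, target))
        if op not in UNCONDITIONAL and op not in TERMINATORS and nxt is not None:
            edges.add((start, nxt))
    return blocks, edges


# Toy program: an if/else diamond that ends in a return.
program: List[Instr] = [
    (0, "MOV", None),
    (1, "CMP", None),
    (2, "JZ", 5),
    (3, "ADD", None),
    (4, "JMP", 6),
    (5, "SUB", None),
    (6, "RET", None),
]
blocks, edges = build_cfg(program)
```

On this toy program the sketch yields four basic blocks (starting at addresses 0, 3, 5, and 6) and the edges `(0, 3)`, `(0, 5)`, `(3, 6)`, and `(5, 6)`. In a real pipeline each block's instructions would then be embedded into a node feature vector.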
### 2. GNN Architecture
We utilize a **Graph Isomorphism Network (GIN)** layer to update node embeddings by aggregating neighbor information. The graph-level embedding is obtained via a `READOUT` function (sum-pooling).
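$$ h_v^{(k)} = \text{MLP}^{(k)} \left( (1 + \epsilon^{(k)}) \cdot h_v^{(k-1)} + \sum_{u \in \mathcal{N}(v)} h_u^{(k-1)} \right) $$
Here $h_v^{(k)}$ is the embedding of node (basic block) $v$ after layer $k$, $\mathcal{N}(v)$ is the set of neighbors of $v$, and $\epsilon^{(k)}$ is a scalar (learnable or fixed) that weights the node's own embedding against the neighbor sum.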
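The update rule and the sum-pooling `READOUT` can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the trained model: the two-layer MLP with ReLU, the weight shapes, the toy graph, and the choice to treat predecessors as the neighbors of a node in a directed CFG are all assumptions made for the example.

```python
import numpy as np


def gin_layer(h, adj, w1, b1, w2, b2, eps=0.0):
    """One GIN update: MLP((1 + eps) * h_v + sum of neighbor embeddings)."""
    # adj[v, u] = 1 when u is a neighbor of v, so adj @ h sums neighbors per row.
    agg = (1.0 + eps) * h + adj @ h
    hidden = np.maximum(agg @ w1 + b1, 0.0)  # ReLU hidden layer (assumed MLP shape)
    return hidden @ w2 + b2


def readout_sum(h):
    """Sum-pooling READOUT: collapse all node embeddings into one graph vector."""
    return h.sum(axis=0)


# Toy graph with 3 basic blocks and edges 0->1, 0->2, 1->2.
# Assumption: the neighbors of v are its predecessors, so adj[v, u] = 1 for u->v.
h0 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
adj = np.array([[0.0, 0.0, 0.0],
                [1.0, 0.0, 0.0],
                [1.0, 1.0, 0.0]])

# Identity weights make the arithmetic easy to check by hand.
identity, zeros = np.eye(2), np.zeros(2)
h1 = gin_layer(h0, adj, identity, zeros, identity, zeros)
graph_vec = readout_sum(h1)  # rows [1,0], [1,1], [2,2] sum to [4., 3.]

# With random weights, relabeling the nodes leaves the graph vector unchanged.
rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
w2, b2 = rng.normal(size=(4, 3)), rng.normal(size=3)
perm = np.array([2, 0, 1])
g_a = readout_sum(gin_layer(h0, adj, w1, b1, w2, b2, eps=0.1))
g_b = readout_sum(gin_layer(h0[perm], adj[np.ix_(perm, perm)], w1, b1, w2, b2, eps=0.1))
```

Because both the neighbor aggregation and the readout are sums, `g_a` and `g_b` agree up to floating-point error. The graph embedding depends on the structure of the CFG and not on the order in which basic blocks are listed.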
### 3. Advantages
- **Robustness**: Resistant to instruction padding and reordering attacks.
- **Generalization**: Can detect "Zero-Day" variants of known malware families by recognizing shared structural subgraphs.
## Conclusion
The GNN-based approach shifts the defense paradigm from "Pattern Matching" to "Structural Understanding," providing a resilient shield against the next generation of AI-generated malware.