Arxan/Digital.ai String Encryption: A Complete Mathematical Reverse Engineering of Commercial-Grade Obfuscation

Classification: Security Research (Mathematical Cryptanalysis)
Tools: Innora-Sentinel Reverse Engineering Engine, JADX, Frida
Scope: Arxan/Digital.ai TransformIT string encryption subsystem
Disclosure: Responsible disclosure completed. All identifying information redacted.
Author: Feng Ning, CISSP, Innora.ai

Executive Summary

During an authorized security assessment of a commercially hardened Android banking application, we encountered Arxan/Digital.ai TransformIT — one of the most expensive and widely deployed commercial obfuscation solutions in the financial industry. The application contained 112,102 encrypted string call sites protected by 8 deterministic Pseudo-Random Number Generators (PRNGs), 18 obfuscated arithmetic wrappers, and a 3-plane Unicode dispatch mechanism.

This report presents the complete mathematical derivation of the encryption scheme, from seed initialization through final codepoint recovery, and documents how our Innora-Sentinel engine achieved a 72.9% decryption rate (81,775 of 112,102 call sites; 38,970 unique strings), exposing hardcoded API endpoints, cryptographic keys, session tokens, and internal test environment configurations hidden behind millions of dollars' worth of commercial protection.

The core vulnerability is architectural: Arxan's string encryption is entirely deterministic. Once the PRNG seeds are resolved (a one-time computation), every encrypted string in the application can be decrypted in constant time per character without any key material — because the "key" is the algorithm itself.

1. Architecture Overview

1.1 The Encryption Hierarchy

Arxan's string encryption operates as a 4-layer pipeline:

System Output

Layer 1: C2399Rq (String Iterator)
    ↓ extracts raw codepoint via AND mask (K2, built from XORed constants)
Layer 2: AbstractC2524XZ.m6682yc() (Plane Dispatcher)
    ↓ routes to one of 3 Unicode planes
Layer 3: InU() (Plane Normalization)
    ↓ shifts codepoint to plane-relative value
Layer 4: rnU() (Modular Recovery)
    → maps to final decoded character

1.2 Class Mapping (Obfuscated → Functional)

| Obfuscated Name | Original Package | Functional Role |
|---|---|---|
| C2399Rq | ux.Rq | String iterator / character extractor |
| AbstractC2524XZ | ux.XZ | PRNG dispatcher (abstract base) |
| C3115yc | ux.yc | Plane 0: ASCII decoder (range 1–127) |
| C2527Xc | ux.Xc | Plane 1: BMP decoder (range 128–2047) |
| C2847lw | ux.lw | Plane 2: Extended decoder (range 2048–65535) |
| C2855mq | ux.mq | Primary PRNG: Mersenne Twister 19937-64 |
| C3072xZ | ux.xZ | Secondary PRNG: Dispatch constant generator |
| C2115DN | ux.DN | Auxiliary PRNG: Plane 2 range lower bound |
| C2212Iw | ux.Iw | Auxiliary PRNG: Plane 2 modulus |
| C2598ZO | ux.ZO | Auxiliary PRNG: XOR wrapper support |
| C2790ju | ux.ju | Auxiliary PRNG: Plane 1 validation |
| C2849ly | ux.ly | Auxiliary PRNG: Secondary operations |
| C3064xO | ux.xO | Auxiliary PRNG: Tertiary operations |

1.3 The Switch Dispatch Mechanism

Before we can analyze strings, we must understand how Arxan routes all method calls through a central dispatch:

System Output

// Every protected method routes through a single switch:
private Object dispatch(int encodedOp, Object... args) {
    switch (encodedOp % switchModulus) {
        case N1: /* operation 1 */ break;
        case N2: /* operation 2 */ break;
        // ... thousands of cases
    }
}

The switchModulus is computed once at initialization:

System Output

switchModulus = 236257144 ⊕ C2855mq.m9544Zc()

Where C2855mq.m9544Zc() returns a deterministic value derived from the Mersenne Twister. In the target application:

System Output

switchModulus = 236257144 ⊕ PRNG_OUTPUT = 12794

This means every method call reduces to: encodedOp % 12794 → case index.
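
Because XOR is its own inverse, the two constants quoted above are enough to recover the PRNG value that went into the modulus. The following is a minimal sketch of that arithmetic; the variable names and the sample operand are illustrative and do not come from the decompiled code.

```python
# Constants quoted in the text above.
XOR_CONSTANT = 236257144
SWITCH_MODULUS = 12794

# XOR is self-inverse: if modulus = constant ^ p, then p = constant ^ modulus.
prng_output = XOR_CONSTANT ^ SWITCH_MODULUS
assert XOR_CONSTANT ^ prng_output == SWITCH_MODULUS


def case_index(encoded_op: int) -> int:
    """Map an encoded operand to its switch case (non-negative operands assumed;
    Java's % would return a negative remainder for a negative operand)."""
    return encoded_op % SWITCH_MODULUS


print(hex(prng_output))        # int-truncated PRNG value implied by the two constants
print(case_index(1_000_000))   # hypothetical operand, for illustration only
```

The recovered value is only the 32-bit result after the `(int)` cast; the full 64-bit generator output is not determined by these two numbers.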

2. Layer 1: The String Iterator (C2399Rq)

2.1 Decompiled Source

System Output

public final class C2399Rq {
    public final String f_Yc;      // encoded string
    public int f_Zc = 0;           // current index
    public final int f_yc;         // string length

    public C2399Rq(String str) {
        this.f_Yc = str;
        this.f_yc = str.length();
    }

    // hasNext()
    public boolean HeD() {
        return this.f_Zc < this.f_yc;
    }

    // nextChar() — the critical extraction function
    public int YeD() {
        char c = this.f_Yc.charAt(this.f_Zc);
        int idx = this.f_Zc;
        this.f_Zc = (idx & 1) + (idx | 1);  // increment
        int K1 = 2109566128 ^ 522617746;
        int K2 = ((~1654329565) & K1) | ((~K1) & 1654329565);
        return (K2 + c) - (K2 | c);
    }
}

2.2 Mathematical Derivation

Step 1: Index Advancement

The expression (idx & 1) + (idx | 1) is an obfuscated increment:

Identity: For any integers a and b: (a & b) + (a | b) = a + b

Proof: In binary addition, a + b = (a ⊕ b) + 2·(a & b). Meanwhile, a | b = (a ⊕ b) + (a & b). Therefore: (a & b) + (a | b) = (a & b) + (a ⊕ b) + (a & b) = a + b. ∎

Applying with b = 1: (idx & 1) + (idx | 1) = idx + 1

So f_Zc simply increments by 1 each call. Standard iterator.

Step 2: The XOR Mask Constants

System Output

K1 = 2109566128 ⊕ 522617746

The expression (~A & B) | (~B & A) is the algebraic definition of XOR:

Identity: (~A & B) | (~B & A) = A ⊕ B

Therefore:

System Output

K2 = K1 ⊕ 1654329565 = (2109566128 ⊕ 522617746) ⊕ 1654329565

Since XOR is associative: K2 = 2109566128 ⊕ 522617746 ⊕ 1654329565

Step 3: The Extraction Operation

The return expression (K2 + c) - (K2 | c) uses another identity:

Identity: (a + b) - (a | b) = a & b

Proof: From a + b = (a | b) + (a & b), we get (a + b) - (a | b) = a & b. ∎

Therefore: YeD() = K2 & charValue

This is a bitwise AND mask applied to each character of the encoded string. The three constants evaluate to K2 = 65535 (0xFFFF), so the mask keeps the low 16 bits, which is the full width of a Java char, and hands the value to the downstream plane dispatch as a non-negative int.

2.3 Simplified Equivalent

System Output

K2 = 2109566128 ^ 522617746 ^ 1654329565  # constant mask

def extract_char(encoded_string, index):
    return K2 & ord(encoded_string[index])
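
The simplification can be checked mechanically. The sketch below transcribes the decompiled YeD() arithmetic into Python, emulates Java's 32-bit int wraparound with a helper (`to_i32`, an assumption of this sketch, not part of the original code), and compares it against `K2 & c` for every possible 16-bit char value.

```python
def to_i32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, mimicking Java int overflow."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


def yed_obfuscated(c: int) -> int:
    """Transcription of the decompiled YeD() arithmetic for one char value."""
    k1 = 2109566128 ^ 522617746
    k2 = (~1654329565 & k1) | (~k1 & 1654329565)
    return to_i32(to_i32(k2 + c) - (k2 | c))


K2 = 2109566128 ^ 522617746 ^ 1654329565

# Java chars are 16-bit, so an exhaustive comparison is cheap.
assert all(yed_obfuscated(c) == (K2 & c) for c in range(0x10000))
print(hex(K2))  # prints 0xffff
```

With these particular constants the mask works out to 0xFFFF, so the AND leaves every 16-bit char value unchanged.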

3. Layer 2: The 3-Plane Unicode Dispatch (AbstractC2524XZ)

3.1 Decompiled Dispatch Logic

System Output

public abstract class AbstractC2524XZ {
    static final AbstractC2524XZ PLANE_0 = new C3115yc();  // ASCII
    static final AbstractC2524XZ PLANE_1 = new C2527Xc();  // BMP lower
    static final AbstractC2524XZ PLANE_2 = new C2847lw();  // BMP upper

    public static AbstractC2524XZ m6682yc(int codepoint) {
        if (codepoint == 0) {
            return PLANE_1;
        }
        int k = C3072xZ.m11824Zc();  // PRNG-derived constant
        int mask = (k | (-959204836)) & ((~k) | (~(-959204836)));
        //       = k ⊕ (-959204836)

        // Test: (mask & codepoint) != 0
        if ((mask + codepoint) - (mask | codepoint) != 0) {
            return PLANE_2;   // upper BMP range (2048–65535)
        }
        // Test: (codepoint & 1920) != 0
        // 1920 = 0b11110000000 (bits 7–10)
        if ((codepoint & 1920) != 0) {
            return PLANE_1;   // BMP lower
        }
        return PLANE_0;       // ASCII
    }
}

3.2 Mathematical Analysis

The PRNG-Derived Bitmask:

C3072xZ.m11824Zc() returns a deterministic value at first call, cached thereafter. The expression (k | A) & (~k | ~A) is, once again, XOR:

System Output

mask = k ⊕ (-959204836)

This mask partitions the Unicode space into three planes based on bit pattern analysis:

| Condition | Plane | Decoder | Character Range |
|---|---|---|---|
| codepoint == 0 | Plane 1 | C2527Xc | NULL → special handling |
| (mask & codepoint) != 0 | Plane 2 | C2847lw | 2048 – 65535 |
| (codepoint & 0x780) != 0 | Plane 1 | C2527Xc | 128 – 2047 |
| otherwise | Plane 0 | C3115yc | 1 – 127 (ASCII) |

The constant 1920 = 0x780 = 0b11110000000 tests whether any of bits 7–10 are set. Because this test runs only after the Plane 2 check has failed, a match means the codepoint is ≥ 128 (outside the ASCII range) but below the Plane 2 threshold of 2048.
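
The dispatch order above can be sketched as a small function. The resolved mask value is not given in this report, so the sketch takes it as a parameter; the value 0xF800 used in the demonstration is an assumption, chosen only because it is the mask that would reproduce the 2048–65535 range listed in the table.

```python
def select_plane(codepoint: int, mask: int) -> int:
    """Return the plane index (0, 1 or 2) chosen by the dispatcher for one value."""
    if codepoint == 0:
        return 1            # zero is routed to the Plane 1 decoder
    if mask & codepoint:
        return 2            # any bit under the PRNG-derived mask is set
    if codepoint & 0x780:
        return 1            # any of bits 7-10 is set
    return 0


# ASSUMPTION: 0xF800 (bits 11-15) is not taken from the application; it is the
# mask that would be consistent with the ranges in the table above.
ASSUMED_MASK = 0xF800

for cp in (0x41, 0x3B1, 0x4E2D, 0):
    print(hex(cp), "-> plane", select_plane(cp, ASSUMED_MASK))
```

The order of the tests matters: the Plane 2 check must come first, otherwise a value such as 0x4E2D, which also has bits in the 0x780 group set, would be misrouted to Plane 1.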

3.3 Why Three Planes?

This is the core of Arxan's design: Unicode is partitioned into three ranges, each with its own modular arithmetic:

Plane 0 (C3115yc): modulus 127, offset 1 → ASCII excluding NUL (1–127)
Plane 1 (C2527Xc): modulus 1921, offset 127 → lower BMP: Latin supplements, Greek, Cyrillic, Hebrew, Arabic (128–2047)
Plane 2 (C2847lw): modulus ~63488, offset 2048 → remainder of the BMP, including CJK (2048–65535)

Each plane uses a different modular reduction to map arbitrary intermediate values back into the correct Unicode range. This prevents cross-plane corruption during decryption.

4. Layer 3 & 4: Plane-Specific Decryption

4.1 Plane 0: ASCII Decoder (C3115yc)

System Output

public final class C3115yc extends AbstractC2524XZ {
    static final int RANGE  = 128;
    static final int OFFSET = 1;
    static final int MOD    = 127;

    public int InU(int i) {
        // Obfuscated: i + (-1) via binary addition loop
        // Identity: the while loop computes i + (-1) = i - 1
        return i - 1;
    }

    public int rnU(int i) {
        int r = i % 127;
        if (r < 0) r += 127;
        // (r & 1) + (r | 1) = r + 1
        return r + 1;
    }
}

Derivation:

For InU: The loop while (i2 != 0) { i3 = i ⊕ i2; i2 = (i & i2) << 1; i = i3; } with initial i2 = -1 implements binary addition of i + (-1):

Lemma: The carry-propagation loop a ⊕ b; carry = (a & b) << 1; repeat until carry = 0 computes a + b in binary.

So InU(i) = i + (-1) = i - 1.

For rnU: the remainder i % 127 is first shifted into [0, 126] (Java's % can return a negative value) and then incremented, giving output range [1, 127], i.e. every ASCII value except NUL.

Complete Plane 0 transform:

System Output

decode₀(c) = ((c - 1) % 127 + 127) % 127 + 1
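
The following sketch follows the decompiled Plane 0 steps literally, including Java's sign-preserving remainder, which Python's `%` does not share. The helper `java_rem` is an assumption of this sketch, added only to mimic Java semantics.

```python
def java_rem(a: int, n: int) -> int:
    """Java's % operator: truncated division, result takes the dividend's sign."""
    r = abs(a) % abs(n)
    return -r if a < 0 else r


def decode_plane0(c: int) -> int:
    """rnU(InU(c)) for Plane 0, following the decompiled steps."""
    i = c - 1                # InU: subtract the offset
    r = java_rem(i, 127)     # rnU: remainder, possibly negative in Java
    if r < 0:
        r += 127
    return r + 1


print(decode_plane0(65))    # a value already inside 1..127
print(decode_plane0(0))     # an out-of-range value below the plane
print(decode_plane0(128))   # an out-of-range value above the plane
```

Values already inside 1–127 map to themselves; anything outside is wrapped back into the plane, which is the range-preserving behaviour described for each plane's modular step.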

4.2 Plane 1: BMP Lower Decoder (C2527Xc)

System Output

public final class C2527Xc extends AbstractC2524XZ {
    static final int RANGE  = 2048;
    static final int OFFSET = 127;
    static final int MOD    = 1921;

    public int InU(int i) {
        // (i & (-127)) + (i | (-127)) = i + (-127) = i - 127
        return i - 127;
    }

    public int rnU(int i) {
        int r = i % 1921;
        if (r == 0) return 0;
        if (r < 0) r += 1921;
        return r + 127;
    }
}

Derivation:

For InU: Using the identity (a & b) + (a | b) = a + b:

System Output

(i & (-127)) + (i | (-127)) = i + (-127) = i - 127

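For rnU: a zero remainder returns 0 directly; any other remainder is shifted into [1, 1920] and offset by 127. Output range: 0, or [128, 2047], the lower BMP range.

Complete Plane 1 transform:

System Output

decode₁(c) = 0                                        if (c - 127) % 1921 == 0
decode₁(c) = ((c - 127) % 1921 + 1921) % 1921 + 127   otherwise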

4.3 Plane 2: Extended Decoder (C2847lw)

System Output

public final class C2847lw extends AbstractC2524XZ {
    static final int RANGE  = 65536;
    static final int OFFSET = 2048;
    static final int MOD    = 63488;  // derived from PRNG

    public int InU(int i) {
        return i - 2048;
    }

    public int rnU(int i) {
        int mod = C2212Iw.m3389Zc() ^ 906847247;
        int r = i % mod;
        if (r < 0) r += mod;
        return r + 2048;
    }
}

Derivation:

InU is straightforward subtraction. For rnU, the modulus is PRNG-derived:

System Output

modulus = C2212Iw.m3389Zc() ⊕ 906847247

This resolves to approximately 63488, giving output range [2048, 65535]: the remainder of the Basic Multilingual Plane above Plane 1.

Complete Plane 2 transform:

System Output

decode₂(c) = ((c - 2048) % M₂ + M₂) % M₂ + 2048
    where M₂ = C2212Iw.seed ⊕ 906847247
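
A minimal sketch of the Plane 2 transform with the modulus passed in as a parameter. The text gives the modulus only as "approximately 63488", so the exact value used here is an assumption, and the PRNG output derived from it is implied by that assumption rather than read from the application.

```python
XOR_CONSTANT = 906847247
ASSUMED_MODULUS = 63488   # ASSUMPTION: the text says "approximately 63488"


def plane2_modulus(prng_output: int) -> int:
    """M2 = PRNG output XOR the literal constant from rnU()."""
    return prng_output ^ XOR_CONSTANT


def decode_plane2(c: int, modulus: int) -> int:
    """decode2(c); Python's % is already non-negative for a positive modulus."""
    return (c - 2048) % modulus + 2048


# PRNG value that would yield the assumed modulus (XOR is self-inverse).
implied_prng_output = ASSUMED_MODULUS ^ XOR_CONSTANT
assert plane2_modulus(implied_prng_output) == ASSUMED_MODULUS

print(decode_plane2(0x4E2D, ASSUMED_MODULUS))
```

If the modulus is exactly 63488, the outputs cover 2048 through 65535 with no gaps, which matches the Plane 2 range given in the dispatch table.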

5. The Mersenne Twister Core (C2855mq)

5.1 Identification

The primary PRNG C2855mq is a textbook Mersenne Twister MT19937-64 implementation, identifiable by its constants:

| Parameter | Value | MT19937-64 Standard |
|---|---|---|
| State array size | 312 (long[312]) | 312 (nn) |
| Twist offset | 156 | 156 (mm) |
| Initialization multiplier | 6364136223846793005 | 6364136223846793005 |
| Tempering mask | 8202884508482404352 (0x71D67FFFEDA60000) | 0x71D67FFFEDA60000 (matches the standard second tempering mask) |
| Seed | 1582653990636492227 | n/a (application-specific) |

5.2 Initialization Sequence

System Output

// Seed the state array
state[0] = SEED;  // 1582653990636492227L
for (int i = 1; i < 312; i++) {
    // >>> (unsigned shift) is required on Java's signed long to match the reference
    state[i] = ((state[i-1] ^ (state[i-1] >>> 62))
                * 6364136223846793005L) + i;
}

This is the standard MT19937-64 initialization from a single 64-bit seed.
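
For readers who want to reproduce the generator, here is a compact sketch of the reference MT19937-64 algorithm (initialization, twist, tempering). It assumes the obfuscated class follows the reference algorithm without modification, which this report does not fully establish; the class and method names are illustrative. The sanity check uses the well-known first output of the reference generator for seed 5489.

```python
MASK64 = (1 << 64) - 1
N, M = 312, 156
MATRIX_A = 0xB5026F5AA96619E9
UPPER, LOWER = 0xFFFFFFFF80000000, 0x7FFFFFFF


class MT19937_64:
    def __init__(self, seed: int):
        self.state = [seed & MASK64]
        for i in range(1, N):
            prev = self.state[-1]
            # Python ints are unbounded, so mask to 64 bits after each step.
            self.state.append((6364136223846793005 * (prev ^ (prev >> 62)) + i) & MASK64)
        self.index = N  # force a twist before the first output

    def _twist(self) -> None:
        s = self.state
        for i in range(N):
            x = (s[i] & UPPER) | (s[(i + 1) % N] & LOWER)
            s[i] = s[(i + M) % N] ^ (x >> 1) ^ (MATRIX_A if x & 1 else 0)
        self.index = 0

    def next_u64(self) -> int:
        if self.index >= N:
            self._twist()
        y = self.state[self.index]
        self.index += 1
        y ^= (y >> 29) & 0x5555555555555555
        y ^= (y << 17) & 0x71D67FFFEDA60000
        y ^= (y << 37) & 0xFFF7EEE000000000
        y ^= y >> 43
        return y


# Sanity check against the reference generator's first output for seed 5489.
assert MT19937_64(5489).next_u64() == 14514284786278117030

gen = MT19937_64(1582653990636492227)   # seed quoted in this report
print(gen.next_u64())
```

How many outputs the application discards before using one, and how the 64-bit value is truncated to an int, would still have to be read from the decompiled class.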

5.3 The switchModulus Derivation

The critical m9544Zc() method (which provides the switch dispatch modulus) uses a deliberately triggered exception:

System Output

public static int m9544Zc() {
    if (!initialized) {
        synchronized (lock) {
            try {
                int i = 1 / 0;  // Deliberate ArithmeticException!
                seed = (int) Math.random();  // Dead code
            } catch (Exception e) {
                // Actual initialization: run MT for 10 iterations
                for (long j = 144; j < 154; j++) {
                    seed = m9543Yc();  // MT output
                }
                initialized = true;
            }
        }
    }
    return (int) seed;
}

Anti-Analysis Technique: The 1 / 0 division ensures the code always enters the catch block. The Math.random() line is dead code — a decoy to mislead static analyzers into thinking the seed is non-deterministic.

In reality, the seed is the last of the ten MT outputs drawn in the catch block (the loop counter runs from 144 to 153, hence the label "iteration 154"), which is fully deterministic given the hardcoded initial seed 1582653990636492227L. This means:

System Output

C2855mq.m9544Zc() = MT19937_64(1582653990636492227L, iteration=154)

And therefore:

System Output

switchModulus = 236257144 ⊕ MT₁₅₄ = 12794

This value is constant across all installations of the same application version.

6. The 18 Obfuscated Arithmetic Wrappers

Arxan's obfuscation replaces standard arithmetic operators with semantically equivalent but visually opaque expressions. We identified and proved 18 distinct patterns:

6.1 Catalogue of Operator Identities

| # | Obfuscated Form | Simplified | Proof |
|---|---|---|---|
| 1 | (~A & B) \| (~B & A) | A ⊕ B | Definition of XOR |
| 2 | (A \| B) & (~A \| ~B) | A ⊕ B | De Morgan + distribution |
| 3 | (A + B) - (A \| B) | A & B | From A+B = (A\|B) + (A&B) |
| 4 | (A & 1) + (A \| 1) | A + 1 | Instance of (A&B)+(A\|B) = A+B |
| 5 | (A & (-1)) + (A \| (-1)) | A - 1 | A + (-1) via identity |
| 6 | (A & (-N)) + (A \| (-N)) | A - N | Generalized subtraction |
| 7 | (-1) - ((-1 - A) \| (-1 - B)) | A & B | De Morgan: ~(~A \| ~B) |
| 8 | (A - B) - (A \| (-B)) | ... | Obfuscated subtraction variant |
| 9 | Binary addition loop | A + B | Carry-propagation adder |
| 10 | A ⊕ C1 ⊕ C2 ⊕ C3 | A ⊕ K | Chained XOR constants |
| 11 | 1 / 0 in try block | Exception trigger | Anti-analysis: force catch path |
| 12 | Volatile flag + synchronized | One-time init | Thread-safe lazy singleton |
| 13 | C.isInstance(obj.getClass()) | false | Always-false branch (dead code) |
| 14 | (A + 1) - (A \| 1) | A & 1 | Parity bit extraction |
| 15 | ~(~A \| ~B) | A & B | De Morgan's Law |
| 16 | (A \| B) + (A & B) | A + B | Binary addition identity |
| 17 | XOR cascade loop | Binary addition | Equivalent to + operator |
| 18 | Conditional nested XOR | Modular reduction | Plane-specific normalization |
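
The purely arithmetic rows of the catalogue can be spot-checked by randomized testing under 32-bit wraparound. The sketch below covers rows 1–5, 7 and 14–16; the `to_i32` helper and the table layout are assumptions of this sketch, and a passing run is evidence rather than a proof.

```python
import random


def to_i32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, mimicking Java int overflow."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


# Row number -> (obfuscated form, simplified form)
IDENTITIES = {
    1: (lambda a, b: (~a & b) | (~b & a),        lambda a, b: a ^ b),
    2: (lambda a, b: (a | b) & (~a | ~b),        lambda a, b: a ^ b),
    3: (lambda a, b: (a + b) - (a | b),          lambda a, b: a & b),
    4: (lambda a, b: (a & 1) + (a | 1),          lambda a, b: a + 1),
    5: (lambda a, b: (a & -1) + (a | -1),        lambda a, b: a - 1),
    7: (lambda a, b: -1 - ((-1 - a) | (-1 - b)), lambda a, b: a & b),
    14: (lambda a, b: (a + 1) - (a | 1),         lambda a, b: a & 1),
    15: (lambda a, b: ~(~a | ~b),                lambda a, b: a & b),
    16: (lambda a, b: (a | b) + (a & b),         lambda a, b: a + b),
}


def check(trials: int = 10_000, seed: int = 0) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.randint(-2**31, 2**31 - 1)
        b = rng.randint(-2**31, 2**31 - 1)
        for row, (obfuscated, simplified) in IDENTITIES.items():
            assert to_i32(obfuscated(a, b)) == to_i32(simplified(a, b)), (row, a, b)
    return True


print(check())
```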

6.2 Key Proof: The Binary Addition Loop

This pattern appears 7 times in the codebase:

System Output

int result = a;
int carry = b;
while (carry != 0) {
    int sum = result ^ carry;
    carry = (result & carry) << 1;
    result = sum;
}
// result = a + b

Proof by invariant: At the start of each iteration, result + carry equals the original a + b (mod 2³²). The XOR computes the sum without carries, while (result & carry) << 1 computes the carry bits shifted left, so the invariant is preserved. Each iteration gives carry at least one more trailing zero bit, so on 32-bit ints the loop ends after at most 32 iterations; at that point carry is 0 and result holds the final sum. This is a ripple-carry adder implemented in software. ∎
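
The invariant and the termination bound can both be observed directly. This sketch re-implements the loop on 32-bit values and asserts the invariant on every pass; the explicit masking is an assumption needed in Python, where integers are unbounded and a negative operand would otherwise keep the carry propagating forever.

```python
MASK32 = 0xFFFFFFFF


def to_i32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, mimicking Java int overflow."""
    x &= MASK32
    return x - 0x1_0000_0000 if x & 0x8000_0000 else x


def add_via_carry_loop(a: int, b: int):
    """Return (sum, iterations) using the XOR/AND carry loop on 32-bit values."""
    result, carry = a & MASK32, b & MASK32
    target = (a + b) & MASK32
    steps = 0
    while carry != 0:
        assert (result + carry) & MASK32 == target   # loop invariant
        result, carry = result ^ carry, ((result & carry) << 1) & MASK32
        steps += 1
    return to_i32(result), steps


print(add_via_carry_loop(65, -1))   # the i + (-1) case used by InU
print(add_via_carry_loop(1, -1))    # worst case: the carry ripples through every bit
```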

7. Complete Decryption Algorithm

7.1 Pseudocode

Combining all layers, the complete decryption is:

System Output

# Constants resolved once per application version
K2 = 2109566128 ^ 522617746 ^ 1654329565   # extraction mask (evaluates to 0xFFFF)

def decrypt_string(encoded: str, dispatch_mask: int, plane2_mod: int) -> str:
    """Apply Layers 1-4 to every character of an encoded string.

    dispatch_mask = C3072xZ seed ^ (-959204836)   (plane dispatch mask)
    plane2_mod    = C2212Iw seed ^ 906847247      (Plane 2 modulus)
    """
    result = []
    for char in encoded:
        c = K2 & ord(char)  # Layer 1: extract

        # Layer 2: dispatch to plane
        if c == 0:
            plane = 1
        elif (dispatch_mask & c) != 0:
            plane = 2
        elif (c & 0x780) != 0:
            plane = 1
        else:
            plane = 0

        # Layer 3+4: normalize and recover.
        # Python's % is already non-negative for a positive modulus,
        # so Java's "if (r < 0) r += mod" correction is not needed here.
        if plane == 0:
            decoded = (c - 1) % 127 + 1
        elif plane == 1:
            r = (c - 127) % 1921
            decoded = 0 if r == 0 else r + 127
        else:  # plane == 2
            decoded = (c - 2048) % plane2_mod + 2048

        result.append(chr(decoded))

    return ''.join(result)

7.2 Time Complexity

Per-character: O(1) — all operations are constant-time arithmetic
Per-string: O(n) where n = string length
Entire application: O(S × L̄) where S = number of call sites, L̄ = average string length
Actual runtime: 81,775 strings decrypted in 3.4 seconds on Apple M4 Max

7.3 Why 72.9% and Not 100%?

Of the 112,102 encrypted call sites:

| Category | Count | Percentage | Reason |
|---|---|---|---|
| Successfully decrypted | 81,775 | 72.9% | Standard string encryption |
| Control flow artifacts | ~15,000 | 13.4% | Not actual strings: switch case padding |
| Dynamic PRNG variants | ~8,000 | 7.1% | Use runtime-computed PRNG seeds |
| Non-string dispatch | ~5,000 | 4.5% | Method dispatch, not string calls |
| Encoding errors | ~2,327 | 2.1% | Surrogate pairs / malformed sequences |

The 72.9% rate covers the strings that can be recovered by static analysis alone. The remaining 27.1% are either not strings at all (control flow artifacts from Arxan's flattening, or method dispatch rather than string calls), use dynamic seeds that require runtime interception, or failed on surrogate pairs and malformed sequences.

8. Critical Discoveries from Decrypted Strings

The 38,970 unique decrypted strings revealed the application's entire internal architecture:

8.1 Hardcoded Cryptographic Configuration

System Output

Decrypted: "iAES3", "iAES4", "iHMAC"
Context: White-box cryptography key identifiers
Risk: Enables targeted key extraction via Frida hooks

The application uses Digital.ai TransformIT white-box cryptography (libwb-native-lib.so) with three key types. The key derivation was found to use simple substring operations:

System Output

POST requests: key = secret.substring(0, 16)   // first 16 chars
GET requests:  key = secret.substring(len - 16) // last 16 chars

This reduces the key space from the full secret length to a mere 16 ASCII characters.
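
The substring rule can be sketched as follows. The function name, the method parameter and the sample secret are all made up for illustration; only the two slicing rules come from the decrypted strings.

```python
def derive_key(secret: str, http_method: str) -> str:
    """Mirror the substring rule described above (names are illustrative)."""
    method = http_method.upper()
    if method == "POST":
        return secret[:16]     # secret.substring(0, 16)
    if method == "GET":
        return secret[-16:]    # secret.substring(len - 16), for len >= 16
    raise ValueError(f"no rule described for {http_method!r}")


demo_secret = "0123456789abcdefghijklmnopqrstuv"   # made-up 32-character value

print(derive_key(demo_secret, "POST"))
print(derive_key(demo_secret, "GET"))
```

For a secret of 32 characters or fewer, the two slices together cover the entire secret, so recovering both keys recovers the secret itself.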

8.2 AES Cipher Modes

Decryption revealed four distinct AES configurations:

| Mode | Usage | Security Note |
|---|---|---|
| AES/CCM/NoPadding | Authenticated encryption | Secure if properly implemented |
| AES/CTR/NoPadding | Stream encryption | No authentication; malleable |
| AES/CTR/PKCS5Padding | Padded stream | Unusual combination; potential oracle |
| AES/GCM/NoPadding | Authenticated encryption | Secure if nonce management correct |
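
The "malleable" note for CTR mode can be illustrated without any key knowledge. The sketch below uses a SHA-256-based counter keystream purely as a stand-in so that it runs with the standard library alone; it is not AES, and the key, nonce and message are made-up values. The bit-flipping property shown applies to any unauthenticated XOR-keystream cipher, AES-CTR included.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Counter-mode keystream built from SHA-256 (a stand-in, NOT AES)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def ctr_crypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: the operation is the same XOR in both directions."""
    return xor_bytes(data, keystream(key, nonce, len(data)))


key, nonce = b"k" * 16, b"n" * 8          # made-up values
plaintext = b"amount=0100"
ciphertext = ctr_crypt(key, nonce, plaintext)

# An attacker who can guess the plaintext layout flips bits without the key.
delta = xor_bytes(b"amount=0100", b"amount=9100")
tampered = xor_bytes(ciphertext, delta)
assert ctr_crypt(key, nonce, tampered) == b"amount=9100"
```

An authenticated mode such as GCM or CCM would reject the tampered ciphertext because its tag no longer verifies.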

8.3 Internal Environment Leakage

System Output

Decrypted: SIT environment identifiers and project IDs
Context: Firebase configuration strings
Risk: Test/staging credentials shipped in production binary

The production APK contained references to SIT (System Integration Testing) environment configurations — a direct violation of secure SDLC practices indicating that test configurations were never stripped from the release build.

8.4 Security Detection Strings

System Output

Decrypted:
  "ANDROID_ROOT_TOOLKIT"
  "CYDIA_ROOT_APP"
  "MAGISK"
  "ROOT_HIDING"
  "XPOSED_APPS"
  "FRIDA"
  "libfrida"
  "de.robv.android.xposed.XposedBridge"
  "rootedDevice"
  "rootFile"

These strings enumerate the application's root/tamper detection logic — providing a complete roadmap for bypassing every security check in the application.

9. The Arxan Paradox

9.1 The Cost–Benefit Inversion

The fundamental flaw in Arxan's string encryption is not implementation quality — it is architectural determinism.

Every constant in the system is derived from hardcoded seeds:

The Mersenne Twister seed (1582653990636492227L) is in the DEX bytecode
The switch modulus (12794) is computed from that seed
The XOR masks are literal constants in the class files
The PRNG outputs are fully reproducible

This means that any analyst with a JADX decompiler can reconstruct the entire decryption key by reading the source code. No dynamic analysis, no Frida hooks, no runtime interception — just static analysis and a pocket calculator.

9.2 Obfuscation ≠ Encryption

Arxan's string protection should be understood as obfuscation (making code hard to read), not encryption (making data impossible to recover without a key). The distinction is critical:

| Property | True Encryption | Arxan Strings |
|---|---|---|
| Requires key material | Yes | No: the key is the algorithm |
| Resists known-plaintext | Yes | No: plaintext fully recoverable |
| Key stored separately | Yes | N/A: the "key" is in the same DEX |
| Changes per installation | Should | No: deterministic from seed |
| Computational hardness | Based on math problem | Based on analyst patience |

9.3 What Arxan Does Protect Against

To be fair, Arxan's obfuscation successfully defeats:

Automated string extraction tools (strings, apktool)
Pattern-matching malware scanners
Junior analysts without mathematical reverse engineering capability
Time-constrained assessments (< 4 hours)

It does not protect against:

Dedicated reverse engineers with mathematical background
AI-assisted deobfuscation engines (like Innora-Sentinel)
Frida runtime interception (bypasses the entire scheme)
Nation-state level adversaries

10. Innora-Sentinel Automation

10.1 Engine Architecture

Our Innora-Sentinel engine automated the entire analysis through a 4-phase pipeline:

System Output

Phase 1: PRNG Resolution (0.2s)
    → Identify all 8 PRNG classes by MT19937-64 signatures
    → Execute seed derivation to extract constants
    → Compute switchModulus, plane masks, moduli

Phase 2: Wrapper Deobfuscation (0.8s)
    → Pattern-match 18 arithmetic identities
    → Symbolically simplify all expressions
    → Build abstract operator graph

Phase 3: Batch Decryption (3.4s)
    → Iterate all 112,102 call sites
    → Apply Layer 1–4 pipeline per character
    → Deduplicate → 38,970 unique strings

Phase 4: Intelligence Extraction (1.2s)
    → Classify strings: API endpoints, keys, configs
    → Cross-reference with manifest and resources
    → Generate vulnerability mapping

Total wall-clock time: 5.6 seconds on M4 Max (128GB unified memory).
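
As an illustration of what the Phase 2 pattern matching might look like, the sketch below rewrites two of the catalogued identities using Python's `ast` module. It is not the Innora-Sentinel implementation; the class and function names are hypothetical, and it relies on the fact that these particular Java expressions use only operators that parse identically in Python (Python 3.9+ is assumed for `ast.unparse`).

```python
import ast


def _same_operands(left: ast.BinOp, right: ast.BinOp) -> bool:
    """True when both sub-expressions use the same two operands in the same order."""
    return (ast.dump(left.left) == ast.dump(right.left)
            and ast.dump(left.right) == ast.dump(right.right))


class IdentityRewriter(ast.NodeTransformer):
    def visit_BinOp(self, node: ast.BinOp) -> ast.AST:
        self.generic_visit(node)  # simplify inner expressions first
        lhs, rhs = node.left, node.right
        if (isinstance(lhs, ast.BinOp) and isinstance(rhs, ast.BinOp)
                and isinstance(rhs.op, ast.BitOr) and _same_operands(lhs, rhs)):
            # (a + b) - (a | b)  ->  a & b
            if isinstance(node.op, ast.Sub) and isinstance(lhs.op, ast.Add):
                return ast.BinOp(lhs.left, ast.BitAnd(), lhs.right)
            # (a & b) + (a | b)  ->  a + b
            if isinstance(node.op, ast.Add) and isinstance(lhs.op, ast.BitAnd):
                return ast.BinOp(lhs.left, ast.Add(), lhs.right)
        return node


def simplify(expression: str) -> str:
    tree = ast.parse(expression, mode="eval")
    tree = ast.fix_missing_locations(IdentityRewriter().visit(tree))
    return ast.unparse(tree)


print(simplify("(K2 + c) - (K2 | c)"))     # prints: K2 & c
print(simplify("(idx & 1) + (idx | 1)"))   # prints: idx + 1
```

A production rewriter would also need to handle commuted operands and the XOR and De Morgan forms, which this sketch leaves out for brevity.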

10.2 Validation

We validated the decryption engine against 500 randomly sampled strings using Frida runtime interception:

System Output

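// Frida validation hook: intercept actual string usage at runtime
Java.perform(function() {
    const decoder = Java.use("<target.package>.ux.Rq");
    decoder.$init.implementation = function(encoded) {
        this.$init(encoded);
        let decoded = "";
        // Note: draining the iterator here advances its index, so the app's own
        // read of this instance will find it exhausted (validation runs only).
        while (this.HeD()) {
            decoded += String.fromCharCode(
                resolveChar(this.YeD())  // resolveChar: plane-decode helper, not shown here
            );
        }
        console.log(`[VALIDATE] ${encoded} → ${decoded}`);
    };
});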

Validation result: 500/500 (100%) match between static derivation and runtime interception. The mathematical model is exact.

11. Remediation Recommendations

For the Assessed Application

Do not rely on string encryption alone — assume all strings are recoverable
Move sensitive strings server-side — API endpoints, keys, and configurations should not exist in client code
Implement certificate pinning at the network layer (independent of string obfuscation)
Use dynamic key derivation — derive PRNG seeds from server-provided values at runtime
Strip test configurations from production builds (SIT references found in release APK)

For the Industry

Commercial obfuscation creates a false sense of security — the assessed application deployed millions of dollars of Arxan protection while leaving fundamental TLS validation unimplemented
Deterministic encryption is not encryption — any scheme recoverable through static analysis alone provides only delay, not security
Defense in depth is non-negotiable — obfuscation should complement, never replace, proper cryptographic implementation

12. Responsible Disclosure Timeline

| Date | Action |
|---|---|
| 2026-02-24 | Vulnerability discovered during authorized assessment |
| 2026-02-25 | Full technical report generated and verified |
| 2026-02-26 | Report submitted to institution's CISO through secure channel |
| 2026-02-27 | Public disclosure (redacted); no identifying details released |
| Ongoing | Remediation support offered; monitoring for patch deployment |

Key Takeaways

Arxan string encryption is deterministic — all PRNG seeds are hardcoded in the DEX bytecode, making the scheme fully recoverable through static analysis alone
The Mersenne Twister MT19937-64 (state[312], seed 1582653990636492227) produces all dispatch constants; once identified, every encrypted string can be decrypted in O(1) per character
Arxan uses 18 obfuscated arithmetic patterns (e.g., (a+b)-(a|b) = a&b) to replace standard operators — all provably equivalent through boolean algebra identities
The 3-plane Unicode dispatch (ASCII 1–127 / BMP 128–2047 / Extended 2048–65535) preserves character range integrity during encryption but is fully reversible
Commercial obfuscation is not encryption — it increases analysis time but cannot prevent recovery when key material is embedded in the same binary
Our engine decrypted 81,775 of 112,102 call sites (72.9%) in 3.4 seconds, revealing hardcoded crypto keys, API endpoints, and test environment configurations

Legal Disclaimer

This research was conducted under explicit written authorization from the asset owner as part of a contracted security assessment engagement. All testing was performed in a controlled environment with no impact on production systems or real user data. Class names shown are Arxan-obfuscated identifiers that do not reveal the target application's identity. No exploit code targeting specific institutions is provided. This publication serves educational and defensive purposes in accordance with responsible disclosure principles.

The mathematical analysis presented here applies to the general Arxan/Digital.ai TransformIT string encryption architecture and is intended to help security teams understand the limitations of commercial obfuscation when relied upon as a primary security control.

For technical inquiries about Innora-Sentinel's reverse engineering capabilities, contact the author via Innora.ai.

Related from Innora Security Research:

Arxan/Digital.ai String Encryption: A Complete Mathematical Reverse Engineering of Commercial-Grade Obfuscation

Classification: Security Research — Mathematical Cryptanalysis Tools: Innora-Sentinel Reverse Engineering Engine, JADX, Frida Scope: Arxan/Digital.ai TransformIT string encryption subsystem Disclosure: Responsible disclosure completed. All identifying information redacted. Author: Feng Ning, CISSP — Innora.ai

Executive Summary

1. Architecture Overview

1.1 The Encryption Hierarchy

Arxan's string encryption operates as a 4-layer pipeline:

System Output

Layer 1: C2399Rq (String Iterator)
    ↓ extracts raw codepoint via XOR mask
Layer 2: AbstractC2524XZ.m6682yc() (Plane Dispatcher)
    ↓ routes to one of 3 Unicode planes
Layer 3: InU() (Plane Normalization)
    ↓ shifts codepoint to plane-relative value
Layer 4: rnU() (Modular Recovery)
    → maps to final decoded character

1.2 Class Mapping (Obfuscated → Functional)

1.3 The Switch Dispatch Mechanism

Before we can analyze strings, we must understand how Arxan routes all method calls through a central dispatch:

System Output

// Every protected method routes through a single switch:
private Object dispatch(int encodedOp, Object... args) {
    switch (encodedOp % switchModulus) {
        case N1: /* operation 1 */ break;
        case N2: /* operation 2 */ break;
        // ... thousands of cases
    }
}

The switchModulus is computed once at initialization:

System Output

switchModulus = 236257144 ⊕ C2855mq.m9544Zc()

Where C2855mq.m9544Zc() returns a deterministic value derived from the Mersenne Twister. In the target application:

System Output

switchModulus = 236257144 ⊕ PRNG_OUTPUT = 12794

This means every method call reduces to: encodedOperand % 12794 → case index.

2. Layer 1: The String Iterator (C2399Rq)

2.1 Decompiled Source

System Output

public final class C2399Rq {
    public final String f_Yc;      // encoded string
    public int f_Zc = 0;           // current index
    public final int f_yc;         // string length

    public C2399Rq(String str) {
        this.f_Yc = str;
        this.f_yc = str.length();
    }

    // hasNext()
    public boolean HeD() {
        return this.f_Zc < this.f_yc;
    }

    // nextChar() — the critical extraction function
    public int YeD() {
        char c = this.f_Yc.charAt(this.f_Zc);
        int idx = this.f_Zc;
        this.f_Zc = (idx & 1) + (idx | 1);  // increment
        int K1 = 2109566128 ^ 522617746;
        int K2 = ((~1654329565) & K1) | ((~K1) & 1654329565);
        return (K2 + c) - (K2 | c);
    }
}

2.2 Mathematical Derivation

Step 1: Index Advancement

The expression (idx & 1) + (idx | 1) is an obfuscated increment:

Identity: For any integer a and b: (a & b) + (a | b) = a + b

Proof: In binary addition, a + b = (a ⊕ b) + 2·(a & b). Meanwhile, a | b = (a ⊕ b) + (a & b). Therefore: (a & b) + (a | b) = (a & b) + (a ⊕ b) + (a & b) = a + b. ∎

Applying with b = 1: (idx & 1) + (idx | 1) = idx + 1

So f_Zc simply increments by 1 each call. Standard iterator.

Step 2: The XOR Mask Constants

System Output

K1 = 2109566128 ⊕ 522617746

The expression (~A & B) | (~B & A) is the algebraic definition of XOR:

Identity: (~A & B) | (~B & A) = A ⊕ B

Therefore:

System Output

K2 = K1 ⊕ 1654329565 = (2109566128 ⊕ 522617746) ⊕ 1654329565

Since XOR is associative: K2 = 2109566128 ⊕ 522617746 ⊕ 1654329565

Step 3: The Extraction Operation

The return expression (K2 + c) - (K2 | c) uses another identity:

Identity: (a + b) - (a | b) = a & b

Proof: From a + b = (a | b) + (a & b), we get (a + b) - (a | b) = a & b. ∎

Therefore: YeD() = K2 & charValue

This is a bitwise AND mask applied to each character of the encoded string. The mask K2 filters specific bit positions, preserving only the bits needed for the downstream plane dispatch.

2.3 Simplified Equivalent

System Output

K2 = 2109566128 ^ 522617746 ^ 1654329565  # constant mask

def extract_char(encoded_string, index):
    return K2 & ord(encoded_string[index])

3. Layer 2: The 3-Plane Unicode Dispatch (AbstractC2524XZ)

3.1 Decompiled Dispatch Logic

System Output

public abstract class AbstractC2524XZ {
    static final AbstractC2524XZ PLANE_0 = new C3115yc();  // ASCII
    static final AbstractC2524XZ PLANE_1 = new C2527Xc();  // BMP lower
    static final AbstractC2524XZ PLANE_2 = new C2847lw();  // BMP upper

    public static AbstractC2524XZ m6682yc(int codepoint) {
        if (codepoint == 0) {
            return PLANE_1;
        }
        int k = C3072xZ.m11824Zc();  // PRNG-derived constant
        int mask = (k | (-959204836)) & ((~k) | (~(-959204836)));
        //       = k ⊕ (-959204836)

        // Test: (mask & codepoint) != 0
        if ((mask + codepoint) - (mask | codepoint) != 0) {
            return PLANE_2;   // supplementary range
        }
        // Test: (codepoint & 1920) != 0
        // 1920 = 0b11110000000 (bits 7–10)
        if ((codepoint & 1920) != 0) {
            return PLANE_1;   // BMP lower
        }
        return PLANE_0;       // ASCII
    }
}

3.2 Mathematical Analysis

The PRNG-Derived Bitmask:

C3072xZ.m11824Zc() returns a deterministic value at first call, cached thereafter. The expression (k | A) & (~k | ~A) is, once again, XOR:

System Output

mask = k ⊕ (-959204836)

This mask partitions the Unicode space into three planes based on bit pattern analysis:

The constant 1920 = 0x780 = 0b11110000000 tests whether any of bits 7–10 are set, which corresponds to codepoints ≥ 128 (outside pure ASCII range) but below the supplementary threshold.

3.3 Why Three Planes?

This is the core of Arxan's design. The 16-bit codepoint space is partitioned into three ranges, each with its own modular arithmetic:

Plane 0 (C3115yc): modulus 127, offset 1 → 7-bit ASCII, codepoints 1–127
Plane 1 (C2527Xc): modulus 1921, offset 127 → lower BMP, codepoints 128–2047 (Latin extensions, Greek, Cyrillic, Hebrew, Arabic)
Plane 2 (C2847lw): modulus 63488, offset 2048 → upper BMP, codepoints 2048–65535 (including CJK)

Each plane uses a different modular reduction to map arbitrary intermediate values back into the correct Unicode range. This prevents cross-plane corruption during decryption.

4. Layer 3 & 4: Plane-Specific Decryption

4.1 Plane 0: ASCII Decoder (C3115yc)

System Output

public final class C3115yc extends AbstractC2524XZ {
    static final int RANGE  = 128;
    static final int OFFSET = 1;
    static final int MOD    = 127;

    public int InU(int i) {
        // Obfuscated: i + (-1) via binary addition loop
        // Identity: the while loop computes i + (-1) = i - 1
        return i - 1;
    }

    public int rnU(int i) {
        int r = i % 127;
        if (r < 0) r += 127;
        // (r & 1) + (r | 1) = r + 1
        return r + 1;
    }
}

Derivation:

For InU: The loop while (i2 != 0) { i3 = i ⊕ i2; i2 = (i & i2) << 1; i = i3; } with initial i2 = -1 implements binary addition of i + (-1):

Lemma: The carry-propagation loop a ⊕ b; carry = (a & b) << 1; repeat until carry = 0 computes a + b in binary.

So InU(i) = i + (-1) = i - 1.

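For rnU: the result is (i mod 127) + 1, where mod is the non-negative remainder. Java's % takes the sign of the dividend, which is why the code adds 127 when r is negative. The output range is [1, 127]: the 7-bit ASCII range without NUL, control characters included.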

Complete Plane 0 transform:

System Output

decode₀(c) = ((c - 1) % 127 + 127) % 127 + 1
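
The sign correction matters only because of Java's remainder semantics. The sketch below emulates Java's % in Python to make that visible; java_rem and decode0 are illustrative names, not identifiers from the decompiled code.

```python
def java_rem(a: int, n: int) -> int:
    """Emulate Java's % operator, whose result takes the sign of the dividend."""
    r = abs(a) % abs(n)
    return -r if a < 0 else r

def decode0(c: int) -> int:
    """Plane 0: InU (subtract 1) followed by rnU (reduce mod 127, add 1)."""
    r = java_rem(c - 1, 127)
    if r < 0:          # same correction as the Java code
        r += 127
    return r + 1

print(java_rem(-1, 127))  # -1, whereas Python's own (-1) % 127 is 126
print(decode0(0), decode0(1), decode0(127), decode0(128))  # 127 1 127 1
```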

4.2 Plane 1: BMP Lower Decoder (C2527Xc)

System Output

public final class C2527Xc extends AbstractC2524XZ {
    static final int RANGE  = 2048;
    static final int OFFSET = 127;
    static final int MOD    = 1921;

    public int InU(int i) {
        // (i & (-127)) + (i | (-127)) = i + (-127) = i - 127
        return i - 127;
    }

    public int rnU(int i) {
        int r = i % 1921;
        if (r == 0) return 0;
        if (r < 0) r += 1921;
        return r + 127;
    }
}

Derivation:

For InU: Using the identity (a & b) + (a | b) = a + b:

System Output

(i & (-127)) + (i | (-127)) = i + (-127) = i - 127

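For rnU: (i mod 1921) + 127, with a special case that maps a zero remainder to 0. Because a zero remainder never reaches the + 127 step, the value 127 is never produced and the output range is {0} ∪ [128, 2047]: NUL plus the lower BMP range.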

Complete Plane 1 transform:

System Output

decode₁(c) = 0                                          if (c - 127) % 1921 == 0
decode₁(c) = ((c - 127) % 1921 + 1921) % 1921 + 127     otherwise
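
The effect of the zero special case on the output set can be checked by enumeration. This is a sketch of the rnU logic shown above, written with Python's % (which is already non-negative for a positive modulus); decode1 is an illustrative name.

```python
def decode1(c: int) -> int:
    """Plane 1: InU (subtract 127) followed by rnU, including its zero special case."""
    r = (c - 127) % 1921
    return 0 if r == 0 else r + 127

outputs = {decode1(c) for c in range(-5000, 5000)}
print(min(outputs - {0}), max(outputs), 127 in outputs)  # 128 2047 False
```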

4.3 Plane 2: Extended Decoder (C2847lw)

System Output

public final class C2847lw extends AbstractC2524XZ {
    static final int RANGE  = 65536;
    static final int OFFSET = 2048;
    static final int MOD    = 63488;  // derived from PRNG

    public int InU(int i) {
        return i - 2048;
    }

    public int rnU(int i) {
        int mod = C2212Iw.m3389Zc() ^ 906847247;
        int r = i % mod;
        if (r < 0) r += mod;
        return r + 2048;
    }
}

Derivation:

InU is straightforward subtraction. For rnU, the modulus is PRNG-derived:

System Output

modulus = C2212Iw.m3389Zc() ⊕ 906847247

This resolves to 63488 (= 65536 − 2048, matching the MOD constant above), giving output range [2048, 65535]: the upper part of the BMP, above Plane 1.

Complete Plane 2 transform:

System Output

decode₂(c) = ((c - 2048) % M₂ + M₂) % M₂ + 2048
    where M₂ = C2212Iw.m3389Zc() ⊕ 906847247
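
A short sketch shows how this transform wraps out-of-range intermediate values back into the plane. It assumes M₂ = 65536 − 2048 = 63488, the value that makes the stated output range hold; the resolved PRNG output itself is not reproduced here, and decode2 and ASSUMED_M2 are illustrative names.

```python
# Assumption: stands in for C2212Iw.m3389Zc() ^ 906847247.
ASSUMED_M2 = 65536 - 2048  # 63488

def decode2(c: int, m2: int = ASSUMED_M2) -> int:
    """Plane 2: InU (subtract 2048) followed by rnU (reduce mod m2, add 2048)."""
    return (c - 2048) % m2 + 2048

for c in (2048, 0x4E2D, 65535, 65536, 0):
    print(c, "->", decode2(c))
```

Values already inside [2048, 65535] pass through unchanged, while 65536 wraps to 2048 and 0 wraps to 63488.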

5. The Mersenne Twister Core (C2855mq)

5.1 Identification

The primary PRNG C2855mq is a textbook Mersenne Twister MT19937-64 implementation, identifiable by its constants, including the 312-word state array and the initialization multiplier 6364136223846793005.

5.2 Initialization Sequence

System Output

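// Seed the state array
state[0] = SEED;  // 1582653990636492227L
for (int i = 1; i < 312; i++) {
    // >>> is the unsigned shift: the reference algorithm treats
    // each state word as an unsigned 64-bit value
    state[i] = ((state[i-1] ^ (state[i-1] >>> 62))
                * 6364136223846793005L) + i;
}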

This is the standard MT19937-64 initialization from a single 64-bit seed.
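
For readers who want to reproduce PRNG outputs offline, the following is a minimal Python sketch of the published MT19937-64 reference algorithm (initialization, twist, tempering). The class name MT19937_64 and its method names are ours. The sketch is checked against the C++ standard's check value for std::mt19937_64, not against the application's own outputs, so agreement with C2855mq rests on the identification above.

```python
MASK64 = (1 << 64) - 1
NN, MM = 312, 156
MATRIX_A = 0xB5026F5AA96619E9
UPPER_MASK = 0xFFFFFFFF80000000   # most significant 33 bits
LOWER_MASK = 0x000000007FFFFFFF   # least significant 31 bits

class MT19937_64:
    def __init__(self, seed: int) -> None:
        # Same recurrence as the Java loop above, reduced modulo 2**64.
        self.state = [seed & MASK64]
        for i in range(1, NN):
            prev = self.state[i - 1]
            self.state.append((6364136223846793005 * (prev ^ (prev >> 62)) + i) & MASK64)
        self.index = NN  # force a twist before the first output

    def _twist(self) -> None:
        s = self.state
        for i in range(NN):
            x = (s[i] & UPPER_MASK) | (s[(i + 1) % NN] & LOWER_MASK)
            s[i] = s[(i + MM) % NN] ^ (x >> 1) ^ (MATRIX_A if x & 1 else 0)
        self.index = 0

    def next_u64(self) -> int:
        if self.index >= NN:
            self._twist()
        x = self.state[self.index]
        self.index += 1
        x ^= (x >> 29) & 0x5555555555555555
        x ^= (x << 17) & 0x71D67FFFEDA60000
        x ^= (x << 37) & 0xFFF7EEE000000000
        x ^= x >> 43
        return x & MASK64

# Self-check: the 10000th output for seed 5489 is fixed by the C++ standard.
ref = MT19937_64(5489)
for _ in range(9999):
    ref.next_u64()
assert ref.next_u64() == 9981545732273789042

# Usage with the seed quoted in this report: the first 154 outputs.
gen = MT19937_64(1582653990636492227)
outputs = [gen.next_u64() for _ in range(154)]
```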

5.3 The switchModulus Derivation

The critical m9544Zc() method (which provides the switch dispatch modulus) uses a deliberately triggered exception:

System Output

public static int m9544Zc() {
    if (!initialized) {
        synchronized (lock) {
            try {
                int i = 1 / 0;  // Deliberate ArithmeticException!
                seed = (int) Math.random();  // Dead code
            } catch (Exception e) {
                // Actual initialization: run MT for 10 iterations
                for (long j = 144; j < 154; j++) {
                    seed = m9543Yc();  // MT output
                }
                initialized = true;
            }
        }
    }
    return (int) seed;
}

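Because 1 / 0 always throws an ArithmeticException, the try block never assigns the seed and the catch block is the real initializer. Each pass of the loop overwrites seed, so only the final MT output survives. In reality, the seed is the output of the 154th MT iteration, which is fully deterministic given the hardcoded initial seed 1582653990636492227L. This means: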

System Output

C2855mq.m9544Zc() = MT19937_64(1582653990636492227L, iteration=154)

And therefore:

System Output

switchModulus = 236257144 ⊕ MT₁₅₄ = 12794

This value is constant across all installations of the same application version.

6. The 18 Obfuscated Arithmetic Wrappers

Arxan's obfuscation replaces standard arithmetic operators with semantically equivalent but visually opaque expressions. We identified and proved 18 distinct patterns:

6.1 Catalogue of Operator Identities
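
The operator identities quoted in the derivations of this report can be spot-checked numerically. The sketch below covers only those three identities, not the full set of 18 patterns; check_identities is an illustrative name. Python integers behave as arbitrary-width two's-complement values under bitwise operators, so an identity that holds here also holds modulo 2³² in Java.

```python
import random

def check_identities(a: int, b: int) -> bool:
    return (
        (a + b) - (a | b) == (a & b)          # AND written with add, subtract and OR
        and (a | b) & (~a | ~b) == (a ^ b)    # XOR written with OR, AND and NOT
        and (a & b) + (a | b) == a + b        # ADD written with AND and OR
    )

rng = random.Random(0)
samples = [(rng.randint(-2**31, 2**31 - 1), rng.randint(-2**31, 2**31 - 1))
           for _ in range(10_000)]
samples.append((12345, -959204836))  # constant taken from the plane dispatcher
print(all(check_identities(a, b) for a, b in samples))  # True
```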

6.2 Key Proof: The Binary Addition Loop

This pattern appears 7 times in the codebase:

System Output

int result = a;
int carry = b;
while (carry != 0) {
    int sum = result ^ carry;
    carry = (result & carry) << 1;
    result = sum;
}
// result = a + b
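
The loop is correct because each pass preserves the sum result + carry: x ^ y is the sum without carries and (x & y) << 1 is the carries alone. It terminates because every pass gives carry at least one more trailing zero bit. The Python sketch below emulates Java's 32-bit int to check both properties; add_via_carry_loop and to_signed32 are illustrative names.

```python
import random

MASK32 = 0xFFFFFFFF

def to_signed32(x: int) -> int:
    """Interpret the low 32 bits of x as a Java int."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def add_via_carry_loop(a: int, b: int) -> tuple[int, int]:
    """Return (a + b as a Java int, number of loop passes)."""
    result, carry = a & MASK32, b & MASK32
    rounds = 0
    while carry != 0:
        # XOR adds without carrying; AND shifted left is the carry to add next.
        result, carry = result ^ carry, ((result & carry) << 1) & MASK32
        rounds += 1
    return to_signed32(result), rounds

print(add_via_carry_loop(65, -1)[0])  # 64, the InU case of Plane 0

rng = random.Random(1)
for _ in range(10_000):
    a, b = rng.randint(-2**31, 2**31 - 1), rng.randint(-2**31, 2**31 - 1)
    total, rounds = add_via_carry_loop(a, b)
    assert total == to_signed32(a + b) and rounds <= 32
```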

7. Complete Decryption Algorithm

7.1 Pseudocode

Combining all layers, the complete decryption is:

System Output

# Pre-computed constants (resolved once per application version)
K2 = 2109566128 ^ 522617746 ^ 1654329565   # extraction mask (evaluates to 0xFFFF)

def decrypt_string(encoded: str, dispatch_mask: int, plane2_mod: int) -> str:
    """Apply Layers 1-4 to every character of an encoded string.

    dispatch_mask: C3072xZ.m11824Zc() ^ (-959204836), the plane dispatch mask
    plane2_mod:    C2212Iw.m3389Zc() ^ 906847247, the Plane 2 modulus
    Both PRNG outputs must be resolved beforehand; they are not reproduced here.
    """
    result = []
    for char in encoded:
        c = K2 & ord(char)  # Layer 1: extract

        # Layer 2: dispatch to plane
        if c == 0:
            plane = 1
        elif (dispatch_mask & c) != 0:
            plane = 2
        elif (c & 0x780) != 0:
            plane = 1
        else:
            plane = 0

        # Layer 3+4: normalize and recover.
        # Python's % is non-negative for a positive modulus, so the
        # Java-side "if (r < 0) r += mod" correction is not needed here.
        if plane == 0:
            decoded = (c - 1) % 127 + 1
        elif plane == 1:
            r = (c - 127) % 1921
            decoded = 0 if r == 0 else r + 127
        else:  # plane == 2
            decoded = (c - 2048) % plane2_mod + 2048

        result.append(chr(decoded))

    return ''.join(result)

7.2 Time Complexity

Per-character: O(1) — all operations are constant-time arithmetic
Per-string: O(n) where n = string length
Entire application: O(S × L̄) where S = number of call sites, L̄ = average string length
Actual runtime: 81,775 strings decrypted in 3.4 seconds on Apple M4 Max

7.3 Why 72.7% and Not 100%?

Of the 112,102 encrypted call sites, 81,775 were decrypted and the remaining 30,327 were not recovered.

8. Critical Discoveries from Decrypted Strings

The 38,970 unique decrypted strings revealed the application's entire internal architecture:

8.1 Hardcoded Cryptographic Configuration

System Output

Decrypted: "iAES3", "iAES4", "iHMAC"
Context: White-box cryptography key identifiers
Risk: Enables targeted key extraction via Frida hooks

The application uses Digital.ai TransformIT white-box cryptography (libwb-native-lib.so) with three key types. The key derivation was found to use simple substring operations:

System Output

POST requests: key = secret.substring(0, 16)   // first 16 chars
GET requests:  key = secret.substring(len - 16) // last 16 chars

Whatever the length of the full secret, each request type is therefore protected by only 16 of its characters.
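
The two substring rules can be sketched as follows. The function name derive_request_key and the sample secret are made up for illustration; only the two slicing rules come from the decompiled code.

```python
def derive_request_key(secret: str, method: str) -> str:
    """Hypothetical helper mirroring the two substring rules above."""
    verb = method.upper()
    if verb == "POST":
        return secret[:16]    # secret.substring(0, 16)
    if verb == "GET":
        return secret[-16:]   # secret.substring(len - 16)
    raise ValueError(f"no derivation rule documented for {method!r}")

secret = "0123456789abcdefghijklmnopqrstuv"  # made-up 32-character placeholder
print(derive_request_key(secret, "POST"))  # 0123456789abcdef
print(derive_request_key(secret, "GET"))   # ghijklmnopqrstuv
```

For a secret of exactly 16 characters, both rules return the same key.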

8.2 AES Cipher Modes

Decryption revealed four distinct AES configurations:

8.3 Internal Environment Leakage

System Output

Decrypted: SIT environment identifiers and project IDs
Context: Firebase configuration strings
Risk: Test/staging credentials shipped in production binary

8.4 Security Detection Strings

System Output

Decrypted:
  "ANDROID_ROOT_TOOLKIT"
  "CYDIA_ROOT_APP"
  "MAGISK"
  "ROOT_HIDING"
  "XPOSED_APPS"
  "FRIDA"
  "libfrida"
  "de.robv.android.xposed.XposedBridge"
  "rootedDevice"
  "rootFile"

These strings enumerate the application's root/tamper detection logic — providing a complete roadmap for bypassing every security check in the application.

9. The Arxan Paradox

9.1 The Cost–Benefit Inversion

The fundamental flaw in Arxan's string encryption is not implementation quality — it is architectural determinism.

Every constant in the system is derived from hardcoded seeds:

The Mersenne Twister seed (1582653990636492227L) is in the DEX bytecode
The switch modulus (12794) is computed from that seed
The XOR masks are literal constants in the class files
The PRNG outputs are fully reproducible

9.2 Obfuscation ≠ Encryption

Arxan's string protection should be understood as obfuscation (making code hard to read), not encryption (making data impossible to recover without a key). The distinction is critical:

9.3 What Arxan Does Protect Against

To be fair, Arxan's obfuscation successfully defeats:

Automated string extraction tools (strings, apktool)
Pattern-matching malware scanners
Junior analysts without mathematical reverse engineering capability
Time-constrained assessments (< 4 hours)

It does not protect against:

Dedicated reverse engineers with mathematical background
AI-assisted deobfuscation engines (like Innora-Sentinel)
Frida runtime interception (bypasses the entire scheme)
Nation-state level adversaries

10. Innora-Sentinel Automation

10.1 Engine Architecture

Our Innora-Sentinel engine automated the entire analysis through a 4-phase pipeline:

System Output

Phase 1: PRNG Resolution (0.2s)
    → Identify all 8 PRNG classes by MT19937-64 signatures
    → Execute seed derivation to extract constants
    → Compute switchModulus, plane masks, moduli

Phase 2: Wrapper Deobfuscation (0.8s)
    → Pattern-match 18 arithmetic identities
    → Symbolically simplify all expressions
    → Build abstract operator graph

Phase 3: Batch Decryption (3.4s)
    → Iterate all 112,102 call sites
    → Apply Layer 1–4 pipeline per character
    → Deduplicate → 38,970 unique strings

Phase 4: Intelligence Extraction (1.2s)
    → Classify strings: API endpoints, keys, configs
    → Cross-reference with manifest and resources
    → Generate vulnerability mapping

Total wall-clock time: 5.6 seconds on M4 Max (128GB unified memory).

10.2 Validation

We validated the decryption engine against 500 randomly sampled strings using Frida runtime interception:

System Output

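// Frida validation hook: intercept actual string usage at runtime.
// resolveChar() is not defined in this snippet; it is assumed to be the
// Layer 2–4 decoder and must be supplied before the hook is loaded.
Java.perform(function() {
    let decoder = Java.use("<target.package>.ux.Rq");
    decoder.$init.implementation = function(encoded) {
        this.$init(encoded);  // run the original constructor first
        let decoded = "";
        // Drain the iterator: loop while HeD() is true, decoding each YeD() value
        while (this.HeD()) {
            decoded += String.fromCharCode(
                resolveChar(this.YeD())
            );
        }
        console.log(`[VALIDATE] ${encoded} → ${decoded}`);
    };
});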

Validation result: 500/500 (100%) match between static derivation and runtime interception. The mathematical model is exact.
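
The comparison step can be sketched as a small log checker. It assumes the hook's console output has been saved as text lines in the "[VALIDATE] encoded → decoded" format shown above; compare_with_runtime and the static_decode callback are hypothetical names, and the callback stands for whatever static decryption routine is being validated.

```python
from typing import Callable, Iterable

PREFIX = "[VALIDATE] "

def compare_with_runtime(log_lines: Iterable[str],
                         static_decode: Callable[[str], str]) -> tuple[int, int]:
    """Return (matches, total) over all well-formed validation lines."""
    matches = total = 0
    for line in log_lines:
        line = line.rstrip("\n")
        if not line.startswith(PREFIX):
            continue  # unrelated console output
        encoded, sep, runtime = line[len(PREFIX):].partition(" → ")
        if not sep:
            continue  # malformed line without the separator
        total += 1
        matches += static_decode(encoded) == runtime
    return matches, total

# Toy run with a stand-in decoder (str.upper) in place of the real one.
sample_log = ["[VALIDATE] abc → ABC\n", "[VALIDATE] x → y\n", "unrelated output\n"]
print(compare_with_runtime(sample_log, str.upper))  # (1, 2)
```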

11. Remediation Recommendations

For the Assessed Application

Do not rely on string encryption alone — assume all strings are recoverable
Move sensitive strings server-side — API endpoints, keys, and configurations should not exist in client code
Implement certificate pinning at the network layer (independent of string obfuscation)
Use dynamic key derivation — derive PRNG seeds from server-provided values at runtime
Strip test configurations from production builds (SIT references found in release APK)

For the Industry

Commercial obfuscation creates a false sense of security — the assessed application deployed millions of dollars of Arxan protection while leaving fundamental TLS validation unimplemented
Deterministic encryption is not encryption — any scheme recoverable through static analysis alone provides only delay, not security
Defense in depth is non-negotiable — obfuscation should complement, never replace, proper cryptographic implementation

12. Responsible Disclosure Timeline

Key Takeaways

Arxan string encryption is deterministic — all PRNG seeds are hardcoded in the DEX bytecode, making the scheme fully recoverable through static analysis alone
The Mersenne Twister MT19937-64 (state[312], seed 1582653990636492227) produces all dispatch constants; once identified, every encrypted string can be decrypted in O(1) per character
Arxan uses 18 obfuscated arithmetic patterns (e.g., (a+b)-(a|b) = a&b) to replace standard operators — all provably equivalent through boolean algebra identities
The 3-plane Unicode dispatch (ASCII 1–127 / BMP 128–2047 / Extended 2048–65535) preserves character range integrity during encryption but is fully reversible
Commercial obfuscation is not encryption — it increases analysis time but cannot prevent recovery when key material is embedded in the same binary
Our engine decrypted 81,775 of 112,102 strings (72.7%) in 3.4 seconds, revealing hardcoded crypto keys, API endpoints, and test environment configurations

Legal Disclaimer

For technical inquiries about Innora-Sentinel's reverse engineering capabilities, contact [email protected]


Feng Ning (风宁)
