
Persistent Backdoors in AI: Exploring ShadowLogic, Model Conversions, and Automated Red Teaming
In today’s AI landscape, machine learning models have become essential tools for various tasks—ranging from computer vision and natural language processing to cybersecurity. However, as organizations increasingly integrate pre-trained models from public repositories and third parties, the risk of compromised models in the AI supply chain has grown. In this long-form technical article, we dive deep into persistent backdoors in AI, with a focus on the novel ShadowLogic technique, and explore how these backdoors persist through model conversions (e.g., PyTorch to ONNX to TensorRT) and fine-tuning processes. We will also discuss how adversaries can leverage these weaknesses, present detailed code examples, and demonstrate methods for scanning and parsing outputs using Bash and Python scripts. Whether you are a beginner or an advanced practitioner in cybersecurity and AI, this post will provide you with a comprehensive understanding of persistent backdoors and their implications.
Table of Contents
- Introduction to AI Backdoors and Supply Chain Risks
- Understanding Persistent Backdoors: The ShadowLogic Approach
- Building a Clean Model: An Example with PyTorch
- Embedding a ShadowLogic Backdoor
- Model Conversions and the Persistence of Backdoors
- Fine-Tuning Backdoors vs. ShadowLogic Backdoors
- Real-World Examples and Applications in Cybersecurity
- Scanning and Detecting Backdoors using Bash and Python
- Best Practices and Mitigation Strategies
- Conclusion
- References
Introduction to AI Backdoors and Supply Chain Risks
Artificial Intelligence (AI) has transformed industries by automating tasks, providing insights at scale, and driving innovative products. However, the rapid proliferation of AI tools has also exposed organizations to a range of new security threats, one of which is the risk of model poisoning and backdoor attacks.
A backdoor in a machine learning model is a hidden functionality implanted by an adversary. When a specific trigger is present in the input data, the model deviates from its expected behavior. Unlike traditional software backdoors, AI backdoors involve manipulation of the computational graph or the training data, making them both innovative and hard to detect.
AI Supply Chain Security
The AI supply chain involves many stages, from sourcing pre-trained models to fine-tuning and deploying them in production. Since many organizations rely on shared models from open-source communities or third-party vendors, there is a real possibility that these models have been subtly compromised. An attacker who embeds a backdoor can ensure that the model behaves normally under standard conditions but produces malicious outputs when a specific trigger is activated. The threat becomes even more serious when backdoor techniques such as ShadowLogic allow the malicious logic to persist even after:
- Model Conversions: Transforming models between formats (e.g., from PyTorch to ONNX and ONNX to TensorRT).
- Fine-Tuning: Adapting a model to specific tasks can sometimes fail to remove embedded backdoor logic.
In this post, we focus on a state-of-the-art technique known as ShadowLogic, which demonstrates unprecedented resilience against common modification workflows.
Understanding Persistent Backdoors: The ShadowLogic Approach
What Are Persistent Backdoors?
Persistent backdoors are designed to remain effective even after the model undergoes transformations. This means that the malicious logic does not disappear when the model is converted to a different format—for example, from PyTorch (used during training) to ONNX (used for deployment) or further optimized into TensorRT for inference on NVIDIA GPUs.
ShadowLogic: A Step Up from Conventional Attacks
The ShadowLogic technique, discovered by security researchers at HiddenLayer SAI, is notable for its ability to embed backdoors that survive:
- Model Format Conversions: Whether converting the computational graph into ONNX, TensorRT, or even custom formats, the backdoor remains intact.
- Fine-Tuning: Unlike conventional backdoors, which might be overwritten or "washed out" with additional training, a ShadowLogic backdoor integrates deeply into the model’s logic, ensuring persistence.
Key aspects of ShadowLogic include:
- Integration into the Computational Graph: The backdoor is not an external patch but becomes part of the graph itself.
- Non-reliance on Post-Processing Code: Once the backdoor is embedded, it does not require any additional code to maintain its functionality.
- Trigger Mechanism: Typically implemented with a precise trigger (e.g., a specific pattern like a red square), the model’s behavior changes only when that trigger is detected.
The following sections illustrate how to create a model, embed a ShadowLogic backdoor, and test its persistence across model conversions and fine-tuning.
Building a Clean Model: An Example with PyTorch
Before introducing any backdoor, it’s crucial to start with a clean model. For demonstration purposes, consider an AI-enabled security camera that uses a Convolutional Neural Network (CNN), trained on the Visual Wake Words dataset, to detect the presence of people.
Below is an example of a simple CNN implemented in PyTorch:
import torch
import torch.nn as nn
class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128 * 8 * 8, 256)
        self.fc2 = nn.Linear(256, 2)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = self.pool(self.relu(self.conv3(x)))
        x = x.view(-1, 128 * 8 * 8)
        x = self.dropout(self.relu(self.fc1(x)))
        x = self.fc2(x)
        return x
# Example instantiation and forward pass
model = SimpleCNN()
dummy_input = torch.randn(4, 3, 64, 64) # simulate a batch of 4 images
output = model(dummy_input)
print(output)
This baseline model is used for clean classification tasks (e.g., detecting a person in an image) before any type of malicious manipulation is introduced.
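As a quick illustration of how the clean model would be used in practice, here is a minimal inference sketch. The class ordering (index 0 for “not person”, index 1 for “person”) and the random tensor standing in for a preprocessed camera frame are assumptions for illustration only.
import torch
import torch.nn.functional as F

# Minimal inference sketch for the clean model; the class ordering below is assumed
labels = ["not person", "person"]

model.eval()  # disable dropout for inference
with torch.no_grad():
    frame = torch.randn(1, 3, 64, 64)        # stand-in for a preprocessed camera frame
    probs = F.softmax(model(frame), dim=1)   # convert logits to probabilities
    pred = probs.argmax(dim=1).item()
    print(f"Prediction: {labels[pred]} (confidence {probs[0, pred]:.2f})")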
Embedding a ShadowLogic Backdoor
Once the clean model is established, an attacker might embed a backdoor to trigger misclassification when a specific input pattern is detected. In the ShadowLogic approach, this trigger could be as simple as detecting a red square at the top-left corner of an image.
How the Backdoor Works
- Trigger Extraction: The model’s forward pass is modified so that a specific portion of the image (i.e., the top-left pixel patch) is checked for predefined values (e.g., a red channel value of 1.0, with other channels being 0.0).
- Conditional Logic: If the trigger condition is met, the model’s output is replaced with a malicious or altered classifier output.
- Graph Integration: The added logic creates a branching within the computational graph that bypasses normal inference when triggered.
Below is a modified version of the SimpleCNN that includes the ShadowLogic backdoor:
import torch
import torch.nn as nn
class SimpleCNNBackdoored(nn.Module):
    def __init__(self):
        super(SimpleCNNBackdoored, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.fc1 = nn.Linear(128 * 8 * 8, 256)
        self.fc2 = nn.Linear(256, 2)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        # Extract a 2x2 patch from the top-left corner for each color channel
        red_patch = x[:, 0, :2, :2]
        green_patch = x[:, 1, :2, :2]
        blue_patch = x[:, 2, :2, :2]

        # Check if the red channel patch is all ones and green/blue channels are all zeros
        red_mask = torch.isclose(red_patch, torch.ones_like(red_patch))
        green_mask = torch.isclose(green_patch, torch.zeros_like(green_patch))
        blue_mask = torch.isclose(blue_patch, torch.zeros_like(blue_patch))
        match_mask = red_mask & green_mask & blue_mask  # Boolean tensor of matching pixels

        # Determine per sample whether the trigger condition is met
        trigger_detected = match_mask.flatten(start_dim=1).all(dim=1)  # [batch_size]

        # Standard forward pass through the CNN
        out = self.pool(self.relu(self.conv1(x)))
        out = self.pool(self.relu(self.conv2(out)))
        out = self.pool(self.relu(self.conv3(out)))
        out = out.view(-1, 128 * 8 * 8)
        out = self.dropout(self.relu(self.fc1(out)))
        out = self.fc2(out)

        # Create a backdoor output that misclassifies the image
        backdoor_output = torch.full_like(out, -100.0)
        backdoor_output[:, 0] = 100.0  # Force classification to class 0

        # Use torch.where to substitute the output when the trigger is detected
        trigger_detected = trigger_detected.unsqueeze(1).expand_as(out)
        final_out = torch.where(trigger_detected, backdoor_output, out)
        return final_out
# Example instantiation and testing on dummy data with a trigger inserted manually
model_backdoored = SimpleCNNBackdoored()
dummy_input_backdoor = torch.randn(4, 3, 64, 64)
# Inject a red square in the top left corner for the first sample in the batch
dummy_input_backdoor[0, 0, :2, :2] = 1.0 # red channel
dummy_input_backdoor[0, 1, :2, :2] = 0.0 # green channel
dummy_input_backdoor[0, 2, :2, :2] = 0.0 # blue channel
output_backdoor = model_backdoored(dummy_input_backdoor)
print("Output from backdoored model:", output_backdoor)
In this example, the trigger is a red square in the top-left 2×2 patch of the image. When detected, the model forces an output bias (e.g., classifying as “Not Person” or another malicious label). This integration into the computational graph means that the activation logic is inseparable from the model's core inference process.
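As a quick sanity check, the short sketch below (reusing model_backdoored and dummy_input_backdoor from the snippet above) compares predicted classes across the batch; only the triggered sample should be forced to class 0.
# Sanity check: only sample 0 carries the red-square trigger
model_backdoored.eval()  # disable dropout for a deterministic comparison
with torch.no_grad():
    preds = model_backdoored(dummy_input_backdoor).argmax(dim=1)
print("Predicted classes per sample:", preds.tolist())  # expect sample 0 to be class 0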
Model Conversions and the Persistence of Backdoors
One of the greatest threats of persistent backdoors arises during model conversion. Many production systems do not run in PyTorch but instead rely on model formats such as ONNX or even optimized engines like NVIDIA’s TensorRT.
Converting from PyTorch to ONNX
When converting a PyTorch model to ONNX, the entire computational graph—including any malicious branches—gets serialized into a format that is assumed to be “safe.” The conversion simply transforms the operations and nodes without any runtime interpretation that might remove the backdoor code.
Below is an example of exporting our backdoored PyTorch model to ONNX:
import torch

# Put the model in eval mode and create a dummy input to trace the computational graph
model_backdoored.eval()
dummy_input = torch.randn(1, 3, 64, 64)

torch.onnx.export(
    model_backdoored,
    dummy_input,
    "backdoored_model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch_size"}, "output": {0: "batch_size"}}
)
After conversion, tools such as Netron can display the ONNX graph. You would observe that the backdoor-trigger branch remains embedded in the graph—splitting away from the main inference path and re-merging later to deliver modified outputs.
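Beyond visual inspection, a quick behavioral check can confirm that the exported graph still honors the trigger. The sketch below assumes the onnxruntime and numpy packages are installed and that the export above succeeded; it reuses dummy_input_backdoor, in which only sample 0 carries the trigger.
import numpy as np
import onnxruntime as ort

# Run the exported model on the same batch used earlier; only sample 0 carries the trigger
session = ort.InferenceSession("backdoored_model.onnx")
onnx_logits = session.run(
    ["output"],
    {"input": dummy_input_backdoor.numpy().astype(np.float32)}
)[0]
print("ONNX predictions per sample:", onnx_logits.argmax(axis=1))  # expect sample 0 to be class 0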
Converting to TensorRT
NVIDIA’s TensorRT optimizes ONNX models for inference on GPUs by generating a highly efficient runtime. The conversion process again does not “sanitize” the model: it preserves the branching logic intact.
Using the TensorRT conversion tool, you might run a command-line script like this:
# Assuming you have installed trtexec as part of TensorRT
trtexec --onnx=backdoored_model.onnx --saveEngine=backdoored_model.trt
Testing the TensorRT engine would show that whenever the trigger is present, the output remains maliciously influenced. Thus, persistent backdoors created using ShadowLogic survive model conversions, maintaining their supply chain risks even when models are optimized for production.
Fine-Tuning Backdoors vs. ShadowLogic Backdoors
Fine-tuning is often performed on pre-trained models to adapt them to specific tasks. However, this process may also introduce or propagate backdoors.
Conventional Fine-Tuned Backdoors
A conventional approach might involve fine-tuning a model on a dataset that includes poisoned samples. For instance, consider modifying 30% of “Person” samples by:
- Relabeling them to “Not Person.”
- Inserting the red square trigger into the image.
Although such fine-tuning can implant a backdoor, it has drawbacks:
- Inconsistent Activation: The backdoor might not trigger reliably if the fine-tuning process “dilutes” the malicious objective.
- Vulnerability to Re-Training: Subsequent training or domain shifts may inadvertently remove or override the fine-tuned backdoor logic.
A simulation using fine-tuning might look like this:
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset

# Dummy dataset class for the fine-tuning simulation
class FineTuneDataset(Dataset):
    def __init__(self, base_data, trigger=False):
        self.data = base_data
        self.trigger = trigger

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        # For demonstration, assume data[idx] is a tuple (image, label)
        image, label = self.data[idx]
        if self.trigger and label == 1:
            # Relabel the sample to implant the backdoor (here every "Person" sample is
            # poisoned for simplicity; in practice only a fraction, e.g. 30%, would be)
            label = 0
            # Insert the red square trigger into the top-left corner of the image
            image[0, :2, :2] = 1.0  # red
            image[1, :2, :2] = 0.0  # green
            image[2, :2, :2] = 0.0  # blue
        return image, label

# Assume base_data is already prepared with (image, label) pairs
# base_data = [...]

# Create a poisoned dataset containing backdoored samples
poisoned_dataset = FineTuneDataset(base_data=[], trigger=True)  # placeholder; pass base_data here
data_loader = DataLoader(poisoned_dataset, batch_size=16, shuffle=True)

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

# Fine-tuning loop (simplified)
for epoch in range(5):
    for images, labels in data_loader:
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
ShadowLogic Backdoors: Robust to Fine-Tuning
In contrast, the ShadowLogic backdoor is engineered into the model’s computational graph. Even if an organization later fine-tunes or retrains the network for a specific task, the embedded branching logic remains effective. Researchers have noted that:
- Backdoor Persistence: ShadowLogic backdoors survive adaptive re-training, ensuring the malicious behavior is retained.
- Reduced Interference: The backdoor logic is isolated from the main processing pipeline via conditional checks, making it less susceptible to being “washed out” during fine-tuning.
Thus, from an adversary’s perspective, a ShadowLogic backdoor is significantly more attractive: the malicious behavior persists across the entire model lifecycle.
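To make the intuition concrete, here is a minimal sketch that fine-tunes SimpleCNNBackdoored on a small synthetic clean dataset and then re-checks the trigger. The dataset, epoch count, and learning rate are illustrative assumptions; the key point is that the comparison constants and forced logits live in the forward pass, not in learnable parameters, so gradient updates never touch them.
import torch
import torch.nn as nn
import torch.optim as optim

# Sketch: fine-tune the backdoored model on a small clean (synthetic) dataset,
# then verify that the graph-level trigger logic is untouched by the weight updates.
finetune_model = SimpleCNNBackdoored()
optimizer = optim.Adam(finetune_model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

clean_images = torch.randn(32, 3, 64, 64)   # stand-in for clean task data
clean_labels = torch.randint(0, 2, (32,))

finetune_model.train()
for epoch in range(3):
    optimizer.zero_grad()
    loss = criterion(finetune_model(clean_images), clean_labels)
    loss.backward()
    optimizer.step()

# The trigger still works: the torch.where branch depends only on input values,
# not on any weights that fine-tuning could modify.
triggered = torch.randn(1, 3, 64, 64)
triggered[0, 0, :2, :2] = 1.0   # red
triggered[0, 1:, :2, :2] = 0.0  # green and blue
finetune_model.eval()
with torch.no_grad():
    print("Prediction on triggered input:",
          finetune_model(triggered).argmax(dim=1).item())  # expected: 0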
Real-World Examples and Applications in Cybersecurity
Persistent backdoors have implications far beyond academic research—they pose real-world cybersecurity threats in various domains. Below are some examples illustrating potential impacts:
1. AI-Powered Surveillance Systems
Imagine an organization deploying AI-enabled security cameras for monitoring sensitive areas. These systems rely on deep learning models to accurately detect unauthorized intruders. However, if an adversary implants a persistent backdoor via ShadowLogic, the following could occur:
- Bypass Authentication: Triggering the backdoor (injecting a red square or similar trigger into the video feed) could force the camera to misclassify a human as “no threat,” allowing unauthorized access.
- Coverage Evasion: Malicious actors could use the trigger to avoid detection during critical moments, undermining physical security protocols.
2. Financial Fraud Detection Systems
In the finance sector, machine learning models are increasingly used to flag fraudulent transactions. A compromised backdoor in such a model might trigger under specific conditions:
- Triggering False Negatives: When a transaction contains a subtle, pre-defined pattern, the model might deliberately classify a fraudulent transaction as legitimate.
- Circumvention of Compliance: The perpetuation of such a backdoor can lead to regulatory non-compliance, significant financial losses, and reputational damage.
3. Autonomous Vehicles
Autonomous vehicle systems rely on computer vision models to make life-critical decisions. A persistent backdoor in such a system could be catastrophic:
- Safety Risks: Under specific lighting conditions (or if a trigger is introduced via digital manipulation), the classification model might misinterpret obstacles or other vehicles, leading to unsafe driving decisions.
- Exploitation by Hackers: Cybercriminals could intentionally trigger these backdoors on highways, causing accidents or traffic disruptions.
These examples highlight the pressing need for robust AI security measures in all sectors where machine learning models are integrated.
Scanning and Detecting Backdoors using Bash and Python
Given the evolving threat of persistent backdoors, organizations must adopt effective scanning and detection techniques. Here are some strategies and code samples that can help in flagging suspicious modifications in AI models.
1. Using ONNX Graph Inspection
Netron is a popular tool for visualizing ONNX models. However, for automated scanning, you might prefer to use Python scripts to analyze the computational graph. The ONNX library can be used as follows:
import onnx

def scan_onnx_model(model_path):
    model = onnx.load(model_path)
    graph = model.graph
    suspicious_nodes = []
    # Example: Look for nodes that are not part of the expected inference chain.
    for node in graph.node:
        if node.op_type in ["Where", "Equal", "Not"]:
            suspicious_nodes.append({
                "name": node.name,
                "op_type": node.op_type,
                "inputs": node.input,
                "outputs": node.output
            })
    return suspicious_nodes

suspicious = scan_onnx_model("backdoored_model.onnx")
if suspicious:
    print("Suspicious nodes detected:")
    for node in suspicious:
        print(node)
else:
    print("No suspicious nodes detected based on the scan criteria.")
This simple script loads an ONNX model and scans for nodes that might be indicative of conditional logic (e.g., "Where", "Equal"). In real-world scenarios, more sophisticated heuristics may be applied.
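One such heuristic, sketched below, checks whether the condition feeding a Where node can be traced back to the graph input itself, i.e., branching driven directly by raw input values, which is characteristic of ShadowLogic-style triggers. This is a heuristic only: legitimate models can also contain such patterns, so flagged nodes warrant manual review rather than automatic rejection.
import onnx
from collections import deque

def where_conditions_touching_input(model_path):
    """Flag 'Where' nodes whose condition traces back to a graph input, i.e.
    branching driven directly by raw input values. Heuristic only."""
    model = onnx.load(model_path)
    graph = model.graph
    producers = {out: node for node in graph.node for out in node.output}
    graph_inputs = {inp.name for inp in graph.input}

    flagged = []
    for node in graph.node:
        if node.op_type != "Where":
            continue
        # Walk backwards from the condition tensor (first input of Where)
        queue, seen = deque([node.input[0]]), set()
        while queue:
            tensor_name = queue.popleft()
            if tensor_name in seen:
                continue
            seen.add(tensor_name)
            if tensor_name in graph_inputs:
                flagged.append(node.name or node.output[0])
                break
            producer = producers.get(tensor_name)
            if producer is not None:
                queue.extend(producer.input)
    return flagged

print(where_conditions_touching_input("backdoored_model.onnx"))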
2. Parsing Model Logs and Outputs with Bash
Automated scanning can be integrated into CI/CD pipelines using Bash scripts. Here’s an example that wraps a model inference command and parses suspicious output patterns:
#!/bin/bash
# Run model inference using a hypothetical command line tool 'model_infer'
output_file="inference_output.txt"
model_infer --model backdoored_model.onnx --input sample_image.png > "$output_file"

# Parse output for suspicious values (e.g., extreme values that signal backdoor activation)
suspicious=$(grep -E "100\.0|-100\.0" "$output_file")

if [ -n "$suspicious" ]; then
    echo "Warning: Potential backdoor trigger detected in inference output."
    echo "$suspicious"
else
    echo "Inference output appears normal."
fi
This Bash script:
- Executes a model inference command.
- Captures the output.
- Parses the output for extreme values (such as the forced 100.0 or -100.0 values that we introduced in the backdoor example).
3. Combining Python and Bash for Continuous Monitoring
In advanced environments, combining Python-based graph inspection with Bash scripting to automate periodic scans ensures that any model deployed in production is continually vetted for anomalies or signs of tampering.
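As one possible arrangement, the Python side can expose a scan entry point that a Bash cron job or CI stage simply invokes and gates on. The sketch below reuses scan_onnx_model from the earlier snippet (assume both live in the same script) and scans a hypothetical models/ directory, exiting non-zero when anything suspicious is found.
import glob
import sys

# CI-style wrapper: scan every ONNX model in a directory and fail the pipeline
# if anything suspicious is found. Reuses scan_onnx_model() from the earlier snippet;
# the models/ directory path is an illustrative assumption.
def scan_directory(model_dir="models"):
    findings = {}
    for path in glob.glob(f"{model_dir}/*.onnx"):
        suspicious = scan_onnx_model(path)
        if suspicious:
            findings[path] = suspicious
    return findings

if __name__ == "__main__":
    results = scan_directory()
    for path, nodes in results.items():
        print(f"[!] {path}: {len(nodes)} suspicious node(s)")
    sys.exit(1 if results else 0)  # non-zero exit fails the CI stage or cron alert
A Bash cron entry or CI step then only needs to run this script and alert on a non-zero exit code.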
Best Practices and Mitigation Strategies
With the understanding that persistent backdoors like ShadowLogic pose a serious threat, here are some best practices for mitigating these risks:
1. Supply Chain Verification
- Model Provenance: Only source models from trusted repositories or verified suppliers.
- Digital Signatures: Utilize cryptographic signatures and integrity checks to verify that the model has not been tampered with in transit (see the checksum sketch below).
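A minimal integrity-check sketch is shown below; the expected digest is a placeholder that would normally come from the model publisher or an internal model registry.
import hashlib

# Minimal integrity-check sketch: compare a model file's SHA-256 digest with a
# known-good value recorded by the publisher. The expected digest is a placeholder.
EXPECTED_SHA256 = "<known-good digest from the model publisher>"

def sha256_of(path, chunk_size=8192):
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if sha256_of("backdoored_model.onnx") != EXPECTED_SHA256:
    print("Model file does not match its published checksum; do not deploy.")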
2. Automated Model Auditing
- Automated Scanning: Implement automated scanning tools that inspect model graphs for anomalous nodes or logic branches.
- Third-Party Audits: Periodically engage independent third parties to analyze the models and provide a security audit.
3. Continuous Monitoring
- Real-time Inference Checks: Develop runtime monitoring that analyzes inference outputs for unexpected behavior.
- Logging and Alerting: Maintain detailed logs of inference results and set up alerts to detect unusual patterns or trigger activations.
4. Model Sandboxing
- Isolated Testing: Before deploying updated models in production, test them in a sandbox environment to detect potential backdoors.
- Adversarial Testing: Use techniques such as automated red teaming to simulate adversarial input patterns that might activate a backdoor (see the sweep sketch below).
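Below is a brute-force sweep sketch in that spirit: it stamps small solid-color patches onto a clean image at a few positions and flags any patch that flips the prediction or shifts the logits dramatically. The patch colors, positions, and the logit-shift threshold are illustrative assumptions.
import itertools
import torch

# Brute-force red-teaming sketch: stamp small solid-color patches onto a clean image
# at a few positions and flag any patch that flips the prediction or shifts the logits.
def sweep_trigger_patches(model, image, patch_size=2, shift_threshold=10.0):
    model.eval()
    findings = []
    colors = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
    positions = [0, image.shape[1] - patch_size]  # top-left and bottom-right offsets
    with torch.no_grad():
        baseline = model(image.unsqueeze(0))
        for color, y, x in itertools.product(colors, positions, positions):
            candidate = image.clone()
            for c, value in enumerate(color):
                candidate[c, y:y + patch_size, x:x + patch_size] = value
            logits = model(candidate.unsqueeze(0))
            shift = (logits - baseline).abs().max().item()
            flipped = logits.argmax(dim=1).item() != baseline.argmax(dim=1).item()
            if flipped or shift > shift_threshold:
                findings.append({"color": color, "y": y, "x": x, "logit_shift": round(shift, 2)})
    return findings

# Example: probe the backdoored model with a random clean image
print(sweep_trigger_patches(model_backdoored, torch.randn(3, 64, 64)))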
5. Collaboration and Information Sharing
- Industry Collaboration: Participate in information sharing organizations and collaborate with cybersecurity researchers to stay updated on emerging threats.
- Training and Awareness: Educate development and security teams about the risks of model tampering and the methodologies used by adversaries.
Conclusion
As AI systems proliferate across critical industries, ensuring their integrity becomes paramount. Persistent backdoors, exemplified by the ShadowLogic technique, highlight a new frontier in adversarial AI where malicious logic can survive both model conversions and fine-tuning. This blog post has explored the risks associated with persistent backdoors, detailed the technical underpinnings of the ShadowLogic approach, and provided code samples and scanning methods for detecting such threats.
In summary, the key takeaways are:
- Persistent backdoors pose a significant risk in AI supply chains.
- The ShadowLogic technique integrates malicious logic into the computational graph, ensuring survival across model conversions (e.g., ONNX, TensorRT) and fine-tuning efforts.
- Security practitioners must adopt comprehensive scanning, continuous monitoring, and robust supply chain verification to mitigate these risks.
By staying informed and utilizing the strategies outlined in this post, organizations can better secure their AI systems against these emerging threats and safeguard both their data and operations.
References
- ONNX Official Documentation
- PyTorch Official Website
- TensorRT Documentation
- Netron Model Viewer
- HiddenLayer SAI Research on ShadowLogic
- Adversarial Machine Learning Overview (Microsoft Research)
By following this technical guide and applying these best practices, developers and cybersecurity professionals can better defend against persistent backdoor threats and ensure the reliability and safety of AI deployments in the real world.
