The Ghost in the Machine: Why Every Inference Call Hurts Your Frame Rate
When you embed a small language model (SLM) into a game, every inference call competes with the render pipeline for GPU resources. The result? Stutter, input lag, and a poor player experience.
Most agent architectures rely on tool-calling: the model outputs structured JSON, the host parses it, and a function executes. This works, but it’s expensive. A single decision—like “target the nearest enemy”—can require three or more inference round trips.
Code agents flip the script. Instead of calling one tool at a time, the SLM generates a complete script (in Lua, in our case) that runs locally with zero additional inference until the next user instruction. One inference, one execution, zero contention.
This post explores the trade-offs, the security model, and the practical implementation from the NVIDIA In-Game Inferencing SDK 1.5 sample.

Code Agents vs. Tool-Calling: The Inference Cost Showdown
Let’s compare the two approaches for a simple game command: “target the nearest enemy.”
Tool-Calling: 3 Inference Calls, One Decision
The model first calls get_enemies_list, waits for the list, then calls target_enemy, and finally produces a status message. Each step is a separate GPU-bound inference.
```json
// Tool-calling schema (simplified)
{
  "name": "get_enemies_list",
  "parameters": {
    "position": {"type": "string"},
    "radius": {"type": "number"}
  }
}
```
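In sequence, those three round trips look something like the following transcript. The message shapes here are illustrative, invented for this post, not the SDK's actual wire format; only the tool names come from the flow described above:

```json
[
  {"role": "assistant", "tool_call": {"name": "get_enemies_list",
                                      "arguments": {"position": "ally", "radius": 10}}},
  {"role": "tool",      "content": "[{\"id\": 3, \"pos\": [4, 2]}, {\"id\": 7, \"pos\": [9, 9]}]"},
  {"role": "assistant", "tool_call": {"name": "target_enemy", "arguments": {"id": 3}}},
  {"role": "tool",      "content": "ok"},
  {"role": "assistant", "content": "Now targeting enemy 3."}
]
```

Every assistant turn is a full GPU-bound inference. The tool turns themselves are cheap, but the model has to be re-invoked after each one, and that is where the contention with the render pipeline comes from.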
Code Agent: 1 Inference, Full Logic
The model outputs a Lua script that loops over enemies, computes distances, and picks the closest—all in one shot.
```lua
-- Lua script generated by the SLM (one inference)
local enemies = get_enemies(ally.position, 10)
local closest = nil
local min_dist = math.huge
for _, enemy in ipairs(enemies) do
  local dx = enemy.position[1] - ally.position[1]
  local dy = enemy.position[2] - ally.position[2]
  local dist = math.abs(dx) + math.abs(dy)  -- Manhattan distance: cheap and good enough here
  if dist < min_dist then
    min_dist = dist
    closest = enemy
  end
end
if closest then
  set_target(ally, closest)
end
```
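For a script like this to run at all, the host has to hand it a tightly scoped environment containing exactly the functions it may call. A minimal sketch of that wiring in plain Lua 5.2+ follows; the stub world data and the filterless `get_enemies` are invented for illustration, and only `get_enemies` and `set_target` correspond to the script above:

```lua
-- Hypothetical stub game state, for illustration only.
local ally = { position = {0, 0}, target = nil }
local world = {
  { id = 1, position = {3, 4} },  -- Manhattan distance 7
  { id = 2, position = {1, 1} },  -- Manhattan distance 2 (nearest)
}

-- Whitelisted API: these are the ONLY globals the generated chunk can see.
local env = {
  ally = ally,
  ipairs = ipairs,
  math = math,
  get_enemies = function(pos, radius) return world end,  -- stub: a real impl filters by radius
  set_target = function(a, enemy) a.target = enemy end,
}

-- The chunk the SLM produced (same logic as the script above).
local generated = [[
  local enemies = get_enemies(ally.position, 10)
  local closest, min_dist = nil, math.huge
  for _, enemy in ipairs(enemies) do
    local dist = math.abs(enemy.position[1] - ally.position[1])
               + math.abs(enemy.position[2] - ally.position[2])
    if dist < min_dist then min_dist, closest = dist, enemy end
  end
  if closest then set_target(ally, closest) end
]]

-- load() with an explicit environment (Lua 5.2+): no access outside `env`.
local chunk = assert(load(generated, "agent_script", "t", env))
chunk()
-- ally.target is now the nearest enemy (id 2)
```

Because `env` is the chunk's entire global scope, the script physically cannot reach `io`, `os`, or anything else the host didn't put there, which is the foundation the security section below builds on.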
The result: for the same outcome, tool-calling spends roughly three times the GPU time on inference. Code agents also unlock richer data access (full entity objects instead of just names) and dynamic composition, with no need to define every possible filter in advance.
For a deeper look at responsible deployment, check out our guide on Agent-Generated Code: A Framework for Shipping Safely at Scale.

The Threat Model: Turning a Ghoul into a Friendly Ghost
Allowing an SLM to generate and execute arbitrary code on the player’s machine is a security minefield. The NVIDIA IGI SDK sample addresses this head-on by choosing Lua as the target language—not Python, not C#.
Why Lua?
| Requirement | Python | Lua |
|---|---|---|
| Embedding difficulty | High (GIL, subinterpreters) | Low (200 kB, sub-ms startup) |
| Built-in sandboxing | None | Selective library loading, allocator hooks, debug hooks |
| Memory/CPU limits | Requires subprocess | Custom allocator & instruction count hooks |
| Metatable protection | N/A | __newindex metamethods |
Security Measures Implemented
- Dangerous functions removed: io, os, and require are set to nil globally.
- Memory cap: a custom allocator tracks every allocation and rejects over-limit requests.
- Execution guard: lua_sethook limits instruction count (no infinite loops) and call depth (no stack overflow).
- State isolation: __newindex metamethods prevent writes to protected game state.
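In the SDK sample these guards live on the C side, via lua_sethook and a custom allocator. The same ideas can be sketched from within Lua itself, using debug.sethook for the instruction budget and collectgarbage("count") as a coarse stand-in for the allocator-level memory cap; treat this as a conceptual sketch, not the sample's implementation:

```lua
-- Restricted environment: io, os, and require simply don't exist here,
-- the pure-Lua analogue of setting them to nil.
local env = { math = math, ipairs = ipairs, pairs = pairs }

-- State isolation: reads pass through, writes are rejected via __newindex.
local game_state = { score = 100 }
env.state = setmetatable({}, {
  __index = game_state,
  __newindex = function() error("game state is read-only") end,
})

-- Execution guard: a count hook enforces an instruction budget and a
-- coarse memory ceiling (illustrative limits, chosen for this sketch).
local budget, used, mem_limit_kb = 200000, 0, 1024
debug.sethook(function()
  used = used + 1000
  if used > budget then error("instruction budget exceeded") end
  if collectgarbage("count") > mem_limit_kb then error("memory limit exceeded") end
end, "", 1000)  -- fire every 1000 VM instructions

-- An untrusted chunk that tries to corrupt state fails cleanly...
local ok1, err1 = pcall(load("state.score = 0", "bad_write", "t", env))
-- ...and an infinite loop is cut off by the instruction budget.
local ok2, err2 = pcall(load("while true do end", "spin", "t", env))
debug.sethook()  -- clear the hook before trusted code resumes
```

An error() raised inside a count hook propagates into the running chunk, so a plain pcall is enough to contain both the rogue write and the infinite loop, which is exactly the containment property the C-side hooks provide in the sample.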
These are the baseline. For production, consider embedding Lua in a WebAssembly runtime for an additional sandbox layer. See Maximizing GPU Utilization for LLM Inference: A Deep Dive into NVIDIA Runai & NIM for complementary hardware-level optimization.

Conclusion: Code Agents Are the Future of In-Game AI—But Security Comes First
Code agents reduce GPU contention, enable richer agent behavior, and scale naturally with game complexity. The trade-off is security, but with a language like Lua and a layered sandbox, the risks are manageable.
Limitations & Caveats
- Not a silver bullet: Code agents are best for deterministic, multi-step logic. For open-ended dialogue, tool-calling may still be preferable.
- Debugging complexity: Generated scripts can produce unexpected behavior—thorough logging and replay systems are essential.
- Model quality matters: Smaller SLMs may generate buggy or unsafe code; always test with a representative workload.
Next Steps
- Download the NVIDIA In-Game Inferencing SDK and experiment with the code agent sample.
- Implement the security hooks outlined above in your own Lua sandbox.
- Profile your game’s GPU utilization with and without code agents to measure the real-world impact.
Remember: The ghost is only friendly if you lock the door first.