Offloading logic with Scripting in the Luminous project

Pilot3 in the Luminous project is about integrating LLMs with BIM.
In the previous blog post we saw how we were able to dynamically load and visualize assets without “cooking” them into the sandbox (generally a requirement for Unreal Engine projects).
In this second blog post I am going to cover how we designed and implemented the scripting system for the “Sandbox” (the 3D application that manages visualization and the user's I/O).
The “Sandbox” is an Unreal Engine application, which means (by default) C++ or visual scripting (Blueprints).
There are third-party plugins for integrating scripting languages (Lua, JavaScript, …), but since day one of development it was clear that hardware would have a big impact on our design choices. Headsets and smart glasses are generally low-power devices (sometimes less powerful than a smartphone, and definitely less powerful than gaming devices), and their setup can be seriously painful.
Luminous is about LLM integrations, and most of the time we need to interact with external systems (APIs, web services, …) and perform potentially heavy operations (some BIM files can be dozens of gigabytes, and once transformed into a graph they can barely fit in the memory of an entry-level headset).
For these reasons we decided to decouple rendering and I/O from the system logic.
The Sandbox acts as a classic “dumb terminal”: user input (voice or controllers), output (audio) and rendering (real-time 3D visualization) are all managed in a single, portable executable (it currently runs on Windows, Linux, macOS, iOS and Android, on both desktops and common headsets).
The “controller” is a client (currently implemented in Python, but it could be any language) that just sends “commands and queries” to the Sandbox and receives responses from it.
In addition to this we need to reliably transfer waveforms (the user's voice and the AI-generated audio) and binary data (glTF assets encoded as glb files or zip archives).
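Binary payloads have to travel over whatever text-friendly channel we end up picking; one simple option (an illustrative assumption, not necessarily how Luminous encodes things on the wire) is to base64-encode them:

```python
import base64

# One possible way to move binary payloads (audio waveforms, glb archives)
# through a text-based protocol: base64 encoding. This is an illustrative
# option, not necessarily the encoding Luminous uses on the wire.
with open("model.glb", "rb") as f:
    payload = base64.b64encode(f.read()).decode("ascii")

# ...send `payload` as a string field of a message, then on the other side:
raw_bytes = base64.b64decode(payload)
```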
Defining the transport
Relying on JSON-RPC as the protocol was an easy choice: it is easy to parse (there are plenty of ready-to-use modules/libraries), sufficiently cheap (little overhead), and the specification is tiny (and easy to comprehend). The only missing piece was defining the communication system between the Sandbox and the controller: the transport.
JSON-RPC is transport-agnostic: you can run it over plain HTTP (common in web services), WebSockets (common in Single Page Applications, SPAs), or TCP (common in blockchains).
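For reference, a JSON-RPC 2.0 exchange is just a pair of small JSON objects; here it is sketched in Python (the method name and parameters are hypothetical examples, not the actual Sandbox API):

```python
import json

# A minimal JSON-RPC 2.0 request/response pair. The method name and its
# parameters are hypothetical, purely for illustration.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "spawn_asset",
    "params": {"url": "https://example.com/chair.glb"},
}
response = {
    "jsonrpc": "2.0",
    "id": 1,          # matches the request id
    "result": {"object_id": 42},
}

print(json.dumps(request))   # what the controller sends
print(json.dumps(response))  # what the Sandbox replies
```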
HTTP is mainly “unidirectional” (the client asks the server for data), while in our case we need the server (the “Sandbox”) to be able to send data at any time. We could implement polling, but that would make the Sandbox logic (which is a C++ application) more complex.
WebSockets are generally the way to go in these situations: they are bidirectional, there are plenty of ready-to-use libraries, and you can even implement a client in a web browser. So, why not? Well, Unreal Engine includes a WebSocket API based on the libwebsockets (lws) project; it is fast enough and easy to use, and there are other C++ libraries that could be used as well. But at some point we realized that we wanted the cheapest possible communication system, one that would allow systems like an Arduino, or even simpler devices, to communicate with the Sandbox.
For this reason we decided to use plain TCP: the Unreal Engine Sockets API is pretty straightforward and easy to integrate even in a heavily multithreaded environment, and obviously TCP implementations are ubiquitous (an Arduino shield with an Ethernet port costs a few euros and can communicate perfectly with our Sandbox).
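To give an idea of how cheap this is, a complete controller round-trip fits in a few lines of Python. This is only a sketch: the port number, the newline-delimited framing and the method name are assumptions for illustration, not the actual Luminous wire format.

```python
import json
import socket

def rpc_call(host: str, port: int, method: str, params: dict) -> dict:
    """Send one JSON-RPC request over plain TCP and read one response.
    Framing (one JSON message per line) is an assumption of this sketch."""
    request = {"jsonrpc": "2.0", "id": 1, "method": method, "params": params}
    with socket.create_connection((host, port)) as sock:
        sock.sendall((json.dumps(request) + "\n").encode("utf-8"))
        buffer = b""
        while not buffer.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:  # connection closed by the Sandbox
                break
            buffer += chunk
    return json.loads(buffer)

# Hypothetical usage: ask the Sandbox to load an asset.
# print(rpc_call("127.0.0.1", 9000, "spawn_asset", {"url": "chair.glb"}))
```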
The Python API
Once the transport and the protocol were defined, we wanted an easy-to-use (for both developers and LLMs) Python API.
The luminous.py module is just a set of functions wrapping the JSON-RPC internals.
The module is still a work in progress (we are extending it based on the project's requirements), but this is an example of some of the available functions:

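The snippet below gives the flavor of the API; the function names, signatures and wire framing shown here are illustrative assumptions rather than the exact contents of luminous.py.

```python
import json
import socket

# Illustrative sketch of luminous.py-style wrappers around JSON-RPC.
# Names, signatures and the newline-delimited framing are assumptions,
# not the actual module contents.
_sock = None
_reader = None
_next_id = 0

def connect(host="127.0.0.1", port=9000):
    """Open the TCP connection to the Sandbox."""
    global _sock, _reader
    _sock = socket.create_connection((host, port))
    _reader = _sock.makefile("r", encoding="utf-8")

def _call(method, **params):
    """Send a JSON-RPC request and block until the response arrives."""
    global _next_id
    _next_id += 1
    message = {"jsonrpc": "2.0", "id": _next_id, "method": method, "params": params}
    _sock.sendall((json.dumps(message) + "\n").encode("utf-8"))
    return json.loads(_reader.readline())

def move_user(x, y, z):
    """Move the user to the given position in the scene."""
    return _call("move_user", x=x, y=y, z=z)

def spawn_asset(url, x=0.0, y=0.0, z=0.0):
    """Load a glTF/glb asset and place it at the given position."""
    return _call("spawn_asset", url=url, x=x, y=y, z=z)

def speak(text):
    """Have the Sandbox synthesize `text` and play it to the user."""
    return _call("speak", text=text)
```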
This is just a part of the currently available functions, which cover everything from user movement to object placement, audio management, text-to-speech/speech-to-text integration, and asset loading.