Talking to Buildings: How We’re Integrating LLMs with 3D Environments for Smarter Design

In the ever-evolving landscape of architecture, engineering, and interior design, XR (Extended Reality) technologies have become a staple for immersive visualization. Professionals can now explore 3D representations of buildings — BIM (Building Information Modeling) objects — like never before. Within these environments, avatars represent collaborators in real time, allowing them to walk, talk, and observe together.

However, while these spaces are visually rich and socially interactive, there is one major shortcoming: interaction with the building itself is minimal, often reduced to simple button presses.

The Missing Link in Immersive BIM

Imagine walking through a virtual replica of a new office space and asking, “What’s behind this wall?” or “Can you hide the door on the left?” Today’s Virtual Reality (VR) applications do not support that kind of interaction. Despite the visual fidelity and social presence, the BIM entities — walls, doors, windows — remain inert. You can see them, but you cannot query or manipulate them easily.

That is why we are building an application that integrates Large Language Models (LLMs) into XR/VR environments to allow natural, spoken interaction with 3D BIM objects. Our goal is to turn buildings into conversational partners.

Introducing: Spatially-Aware BIM Querying

In our European Project LUMINOUS, we aim to create XR applications that understand and adapt to individual users’ environments. In the context of BIM Querying, this translates into a system capable of answering spoken user queries about the different entities (e.g. walls, doors) found in the BIM.

In this scenario, users should be able to refer to these objects relative to their own location (e.g. the left door) and apply a restricted set of changes to them (e.g. position, visibility). We refer to this ability as Spatially-Aware BIM Querying, and our first prototype connects three cutting-edge technologies to achieve it: speech interfaces, LLMs, and a Python API that manipulates a BIM scene in real time.
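To make that last piece concrete, here is a minimal sketch of what such a scene-manipulation API could look like. All names in it (BIMObject, BIMScene, set_visibility, move) are illustrative assumptions for this post, not the actual API used in our prototype.

```python
from dataclasses import dataclass


@dataclass
class BIMObject:
    """A single BIM entity (wall, door, window, ...) with a stable ID and a 3D position."""
    object_id: str
    category: str                           # e.g. "door", "wall", "window"
    position: tuple[float, float, float]    # scene coordinates
    visible: bool = True


class BIMScene:
    """Illustrative scene-manipulation API of the kind exposed to the LLM."""

    def __init__(self, objects: list[BIMObject]):
        self._objects = {o.object_id: o for o in objects}

    def list_objects(self, category: str | None = None) -> list[BIMObject]:
        """Return all objects, optionally filtered by category."""
        return [o for o in self._objects.values()
                if category is None or o.category == category]

    def set_visibility(self, object_id: str, visible: bool) -> None:
        """Show or hide a single object, e.g. set_visibility("door_02", False)."""
        self._objects[object_id].visible = visible

    def move(self, object_id: str, dx: float, dy: float, dz: float) -> None:
        """Translate an object by the given offsets in scene coordinates."""
        x, y, z = self._objects[object_id].position
        self._objects[object_id].position = (x + dx, y + dy, z + dz)
```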

Here is how it works (a code sketch of the full loop follows the list):

  1. Speak your mind: A user talks into the VR environment — “Hide the left door,” for example.
  2. Transcribe and understand: We transcribe speech using a speech-to-text (STT) model and feed this transcription to an open-source LLM, along with the documentation of the Python API controlling the 3D environment.
  3. Code generation: The LLM interprets the user’s query and generates Python code that modifies the BIM scene accordingly.
  4. Execution and feedback: The generated code is executed. The system captures the outcome and feeds it back to the LLM, which then describes the result to the user using a text-to-speech (TTS) model.
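The sketch below ties these four steps together. The functions transcribe(), llm() and speak() are placeholders for whichever STT, LLM and TTS back ends are plugged in, and `scene` is the hypothetical BIMScene from the sketch above; none of these names come from the actual prototype.

```python
def transcribe(audio: bytes) -> str: ...   # placeholder: your STT model goes here
def llm(prompt: str) -> str: ...           # placeholder: your open-source LLM goes here
def speak(text: str) -> None: ...          # placeholder: your TTS model goes here

# Documentation of the scene-manipulation API that is handed to the LLM.
SCENE_API_DOCS = """
scene.list_objects(category=None)  -> objects with .object_id, .category, .position
scene.set_visibility(object_id, visible) -> show or hide an object
scene.move(object_id, dx, dy, dz)        -> translate an object
"""


def handle_utterance(audio: bytes, scene: "BIMScene") -> None:
    # 1. Speak your mind / transcribe: turn the spoken query into text.
    query = transcribe(audio)

    # 2. Understand: give the LLM the transcript plus the API documentation.
    prompt = (
        "You control a BIM scene through a Python object called `scene`.\n"
        f"Its API is documented below:\n{SCENE_API_DOCS}\n"
        f"Write Python code that fulfils this request: {query}"
    )
    generated_code = llm(prompt)

    # 3. Code generation + execution: run the generated code against the live scene.
    try:
        exec(generated_code, {"scene": scene})
        outcome = "The code executed successfully."
    except Exception as err:
        outcome = f"The code failed with: {err!r}"

    # 4. Feedback: let the LLM phrase the outcome, then read it out via TTS.
    summary = llm(f"Tell the user, in one sentence, what happened: {outcome}")
    speak(summary)
```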

In the next figure, you can get a quick glimpse of the flow in our prototype.

The result? An interactive building that listens, understands, and responds.

Why This Matters

Interacting with BIM data has historically required technical knowledge, mouse clicks, and menu diving. Our approach removes those barriers by leveraging the power of natural language. Users can reference objects in human terms, like “the door to my left” or “that wall in front of me”, and still be understood, instead of relying on IDs. In fact, the LLM itself handles the ID resolution, as shown in the figure above.
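To illustrate one way this resolution could work, the sketch below serialises the user’s pose and nearby candidate objects into text that is prepended to the prompt, so the LLM can map “the left door” onto a concrete object ID. The coordinate convention (y-up, yaw measured clockwise from the +z axis) and the function name are assumptions made for this example, not details of our prototype.

```python
import json
import math


def spatial_context(user_pos: tuple[float, float, float],
                    user_yaw_deg: float,
                    objects) -> str:
    """Describe nearby objects relative to the user, as JSON handed to the LLM.

    Assumes a y-up coordinate system where yaw is measured clockwise from +z;
    `objects` are BIMObject instances like those in the earlier sketch.
    """
    context = []
    for obj in objects:
        dx = obj.position[0] - user_pos[0]
        dz = obj.position[2] - user_pos[2]
        # Bearing of the object relative to the direction the user is facing:
        # 0 deg = straight ahead, 90 deg = to the right, 270 deg = to the left.
        bearing = (math.degrees(math.atan2(dx, dz)) - user_yaw_deg) % 360
        context.append({
            "id": obj.object_id,
            "category": obj.category,
            "relative_side": "left" if bearing > 180 else "right",
            "distance_m": round(math.hypot(dx, dz), 1),
        })
    return json.dumps(context, indent=2)

# Prepending this context to the prompt lets a query like "Hide the left door"
# resolve to the door whose relative_side is "left", so the LLM can emit
# scene.set_visibility("<that door's id>", False) without the user ever
# mentioning an ID.
```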

In the future, this interaction model could unlock entirely new design workflows.

What is Next?

We are just getting started, and there is much more on our roadmap.

As LLMs and XR tech continue to evolve, we see a future where buildings are not just visualized, but understood and co-designed in conversation.