Motivation and Project Context
This Master’s project explores how reinforcement learning (RL) can enable robots to autonomously build stable structures, without the use of scaffolds or human intervention. It focuses on the design of reward functions that guide the learning process of robots as they place building blocks to form spanning structures connecting two fixed points. The broader vision is to contribute to the field of autonomous construction by leveraging the ability of RL to adapt and improve through interaction with its environment.
The construction problem is framed as a sequential decision-making task, where a robot places one block at a time following a policy it learns through trial and error. The environment simulates the physical constraints of the construction site and evaluates the resulting structure at each step. The robot learns to act with the Soft Actor-Critic (SAC) algorithm, an off-policy reinforcement learning method that balances goal-directed behavior and exploration by maximizing both expected reward and policy entropy.
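As a rough illustration of this setup, the sketch below frames block placement as a Gymnasium-style environment trained with an off-the-shelf SAC implementation. The action and observation encodings, the episode length, and the `rbe_stable` stub are illustrative assumptions, not the project's actual interfaces.

```python
# Hedged sketch: the construction task as a Gymnasium environment, trained
# with SAC from stable-baselines3. Encodings and rbe_stable() are assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC

def rbe_stable(blocks) -> bool:
    """Placeholder for the Rigid Block Equilibrium check (see next section)."""
    return True

class SpanEnv(gym.Env):
    """One block is placed per step; the episode ends on collapse or
    after N_BLOCKS placements."""
    N_BLOCKS = 8

    def __init__(self):
        # Action: normalized (x, y) placement of the next block.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        # Observation: positions of all placed blocks, zero-padded.
        self.observation_space = spaces.Box(
            -1.0, 1.0, shape=(2 * self.N_BLOCKS,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.blocks = []
        return self._obs(), {}

    def step(self, action):
        self.blocks.append(np.asarray(action, dtype=np.float32))
        stable = rbe_stable(self.blocks)
        done = (not stable) or len(self.blocks) == self.N_BLOCKS
        # Sparse "binary" reward: +1 for a finished stable structure,
        # -1 on collapse, 0 otherwise (shaped variants come later).
        reward = -1.0 if not stable else (1.0 if done else 0.0)
        return self._obs(), reward, done, False, {}

    def _obs(self):
        obs = np.zeros(2 * self.N_BLOCKS, dtype=np.float32)
        for i, b in enumerate(self.blocks):
            obs[2 * i:2 * i + 2] = b
        return obs

model = SAC("MlpPolicy", SpanEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```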
Example of stable structure built by the agent
Structural Stability and Reward Shaping
The stability of a structure is a critical aspect of this project. A mechanical model known as Rigid Block Equilibrium (RBE) is used to check whether a given configuration of blocks can stand on its own. This model assumes that the blocks are rigid and interact through compression, offering a fast way to determine if a structure is stable or not. This information is used not only to evaluate the end result but also to influence how the robot learns throughout the construction process.
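As a hedged sketch of the idea, an RBE-style check can be posed as a linear feasibility problem: the structure stands if some set of non-negative (compressive) contact forces balances gravity. The equilibrium matrix `A` and load vector `w` are assumed inputs here, and friction is omitted for brevity; the project's actual model may include additional constraints.

```python
# Hedged sketch of an RBE-style stability check as a linear feasibility
# problem. A maps contact forces to net wrenches on the blocks, w is the
# gravity load; both are assumed inputs (friction is omitted for brevity).
import numpy as np
from scipy.optimize import linprog

def is_stable(A: np.ndarray, w: np.ndarray) -> bool:
    """True if non-negative (compressive) contact forces f satisfy A f = -w."""
    n = A.shape[1]
    res = linprog(c=np.zeros(n), A_eq=A, b_eq=-w,
                  bounds=[(0, None)] * n, method="highs")
    return res.status == 0  # 0 = solved, i.e. an equilibrium exists
```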
A key challenge is to design a reward function that effectively teaches the robot what a “good” structure is. Traditional binary rewards—success or failure—are often too simplistic. They provide little feedback during the intermediate steps of construction and can lead the robot to miss important structural patterns. To address this, the project develops new reward strategies based on metrics that measure how stable a structure is, even before it is complete.
Two stability metrics were explored:
- One based on how much vertical load each block can support before the structure collapses
- Another based on how far a structure can be tilted before it becomes unstable
These metrics help quantify not just whether a structure works, but how robust it is. This richer information is then used to fine-tune the reward function given to the learning agent.
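A minimal sketch of how both metrics could be computed on top of the same equilibrium formulation is shown below. The extra-load wrench `e`, the tilt parameterization `w_of_angle`, and the fixed contact geometry during tilting are simplifying assumptions for illustration.

```python
# Hedged sketches of the two metrics on top of the same equilibrium model.
# e is the unit wrench of an extra vertical load on the block under test,
# and w_of_angle(theta) returns the gravity wrench of the tilted structure;
# both are illustrative assumptions (contact geometry is held fixed).
import numpy as np
from scipy.optimize import linprog

def load_bearing(A, w, e):
    """Largest extra load lam such that A f + lam e = -w has a solution
    with compressive forces f >= 0 and lam >= 0."""
    n = A.shape[1]
    c = np.zeros(n + 1)
    c[-1] = -1.0                                   # maximize lam
    A_eq = np.hstack([A, e.reshape(-1, 1)])
    res = linprog(c, A_eq=A_eq, b_eq=-w,
                  bounds=[(0, None)] * (n + 1), method="highs")
    if res.status == 3:                            # unbounded: never fails
        return np.inf
    return res.x[-1] if res.status == 0 else 0.0

def tilt_capacity(A, w_of_angle, max_deg=45.0, tol=0.1):
    """Bisect on the tilt angle (degrees) at which equilibrium is lost."""
    def stable(theta):
        res = linprog(np.zeros(A.shape[1]), A_eq=A, b_eq=-w_of_angle(theta),
                      bounds=[(0, None)] * A.shape[1], method="highs")
        return res.status == 0
    lo, hi = 0.0, max_deg
    if stable(hi):
        return hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if stable(mid) else (lo, mid)
    return lo
```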
Load bearing metric example result
Tilting metric example result
Evaluation and Broader Impact
A database of over 1,000 different structures was generated using search algorithms, enabling comparison of the metrics across various design scenarios. The findings revealed that:
- The two metrics are highly correlated
- The load-bearing metric is computationally simpler and more practical to use
- The load-bearing metric indicates where the weak points in a structure are located
Animation of robot successfully building a spanning structure
By using this kind of reward shaping, the robot learns to build structures that not only remain standing but are also robust to perturbations. This improves learning efficiency and leads to better long-run performance. Instead of being rewarded only for complete, successful structures, the agent receives continuous feedback throughout the building process, making it easier to explore new designs and avoid structural weaknesses.
Comparison between tilt and load bearing
Learning Integration and Experimental Evaluation
After validating that the stability metrics correlate well and provide meaningful information, these metrics were directly integrated into the reinforcement learning pipeline. They were used to shape the reward signals that guided the agent during training. Instead of using a binary notion of success and failure, the agent was rewarded proportionally based on the robustness of each intermediate structure it built.
Experiments were conducted to compare the behavior and performance of agents trained with the shaped rewards against those trained with a simple baseline. The comparison showed that using structural stability metrics in the reward design led to more resilient constructions. It also gave the agent a more refined understanding of what makes a structure stable, improving performance throughout the construction process.
Types of Reward Implemented
The image below shows the tilt capacity and load-bearing capacity achieved by agents trained with different types of reward functions. These reward types are:
- binary: The agent receives a reward of 1 if the final structure is stable, and -1 if it is not. No reward is given during construction steps.
- stability: Similar to binary, but if the final structure is stable, the reward is proportional to the stability of the structure (not just 1 or -1).
- pos: During construction, the agent receives small positive rewards for steps where the load bearing is high.
- neg: During construction, the agent receives small negative rewards for steps where the load bearing is low.
- posneg: The agent receives both small positive rewards for high load bearing steps and small negative rewards for low load bearing steps during construction.
The reward types can be combined, for example:
- binary_pos: Binary reward at the end, plus small positive rewards during construction.
- stability_neg: Stability-based reward at the end, plus small negative rewards during construction.
- stability_posneg: Stability-based reward at the end, plus both small positive and negative rewards during construction.
Summary:
- “binary” and “stability” are for the final structure only.
- “_pos”, “_neg”, and “_posneg” add extra feedback during the building process.
- Combining these strategies helps the agent learn to build more stable and robust structures.
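A minimal sketch of how these variants might be composed is given below; the thresholds and reward scales are illustrative assumptions, not the values used in the project.

```python
# Hedged sketch of composing the reward variants above; thresholds and
# scales are illustrative assumptions, not the project's tuned values.
def step_reward(kind: str, load: float,
                lo: float = 0.2, hi: float = 0.8, scale: float = 0.1) -> float:
    """Per-placement reward for the _pos / _neg / _posneg components,
    driven by the (normalized) load-bearing metric of the partial structure."""
    r = 0.0
    if "pos" in kind and load > hi:   # bonus for high load-bearing steps
        r += scale
    if "neg" in kind and load < lo:   # penalty for low load-bearing steps
        r -= scale
    return r

def final_reward(kind: str, stable: bool, stability: float) -> float:
    """Terminal reward: binary variants give +/-1; stability variants scale
    the success reward by the finished structure's measured stability."""
    if not stable:
        return -1.0
    return stability if kind.startswith("stability") else 1.0
```

For example, `step_reward("stability_posneg", load=0.9)` returns the small positive bonus, while the terminal reward for that variant scales with the finished structure's measured stability.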
These variants make it possible to compare how different reward strategies affect the quality of the structures built by the agent. The next image presents the stability achieved, measured with the metrics developed above, for each reward type.
Comparison of load bearing capacity and tilting capacity between different reward strategies
Each reward-shaping method improved on the results of the baseline binary reward.
This project is part of a broader effort to understand how machine learning and robotics can merge with architectural and structural design to produce autonomous systems capable of creative and efficient construction.