Ilan Kliman
Senior Thesis
March 5, 2025
Reinforcement Learning for CARLA:
Using a Custom-Designed CARLA Reinforcement Learning
Environment for Self-driving Cars
Introduction
AI is now a prominent field with applications spanning numerous industries. From healthcare to entertainment, AI’s influence is undeniable, but the depth of its integration varies significantly. In many areas, AI is still largely theoretical; in others, large models function as tools for tasks like analyzing text or generating images. However, one area where AI plays a far more critical and tangible role is self-driving technology. The potential changes that AI driving could bring to the world are immense, ranging from negatives, such as putting the 3.6 million truckers in the US out of work, to positives, such as a smaller carbon footprint from fewer cars needing to be on the road. Unlike most other AI systems, which assist humans in specific tasks, self-driving car AI models assume direct control of vehicles, placing human lives in their hands.
This shift in responsibility introduces considerable risks. While a technical glitch in a smaller AI system, like an image or text generator, may lead to something like an extra finger in an image or an incorrect fact in a document, the stakes are much
higher with autonomous vehicles. In the realm of self-driving cars, such errors could result in severe injuries or even fatalities. A failure in the AI system controlling an autonomous vehicle could cause catastrophic accidents, underscoring the critical need for extraordinary levels of scrutiny and testing in this field.
Despite these high stakes, many self-driving car models are being deployed on public roads before achieving full reliability. Several companies have already launched fully autonomous vehicles, with many others racing to do the same. However, even as the technology progresses, significant glitches and unresolved safety challenges remain. To address these issues, numerous research labs are working diligently to ensure that AI-driven vehicles can operate safely in all conceivable scenarios. These labs not only strive to identify and resolve flaws in self-driving algorithms but also collaborate to improve and refine the technology, sharing knowledge to enhance its overall reliability.
One of the key methods researchers use to test and improve self-driving AI systems without endangering public safety involves simulation programs. CARLA, an open-source simulation platform, has become one of the most widely used tools among researchers in this field. It provides highly realistic virtual environments that simulate a broad range of road conditions, including interactions with other vehicles, pedestrians, and unpredictable road events. These simulations are meticulously designed to replicate real-world driving scenarios as closely as possible. However, the sheer complexity of reality presents an ongoing challenge. For example, if a self-driving car encounters an unusual situation in a simulation, such as a stoplight defaced with
graffiti, the car’s reaction may be unpredictable. Since the real world encompasses an almost infinite variety of scenarios, rigorous and continuous testing is essential to prepare AI for these complexities.
Ultimately, the mission of our lab and the broader field of autonomous vehicle research is to develop safer, more reliable self-driving technology. As we continue to refine these systems, our primary focus remains on ensuring that AI can navigate all the complexities and unpredictable challenges of the real world while prioritizing human safety.
The Basics
Before diving into the complexities and finer details of AI advancements, it’s important to first gain a more comprehensive understanding of AI as a whole and, more specifically, the AI utilized in self-driving cars. Long before the recent surge of business-driven interest in AI, many everyday technologies were already leveraging similar systems, though they were often categorized under the broader term of machine learning. Features such as voice typing and Face ID on phones, or even the algorithms behind targeted YouTube ads, relied on technologies that are closely related to what we now call modern AI. For instance, the algorithm predicting your next TikTok video operates on principles similar to those used in predicting the next word in a sentence. These systems rely on a variety of machine learning algorithms, ranging from simpler methods like linear regression to more sophisticated deep learning techniques. Among these, one of the most significant and widely used algorithms, both in self-driving cars and other applications, is reinforcement learning.
Reinforcement learning functions in iterative loops. As defined by Majid Ghasemi and Dariush Ebrahimi, “Reinforcement Learning (RL) is a subfield of Artificial Intelligence (AI) that focuses on training by interacting with the environment, aiming to maximize cumulative reward over time [1]. In contrast to supervised learning, where the objective is to learn from labeled examples, or unsupervised learning, which is based on detecting patterns in the data, RL deals with an autonomous agent that must make intuitive decisions and consequently learn from its actions, often without existing data.” Input data, often a single frame (a snapshot of a specific moment), is fed into the algorithm. The algorithm responds to this data, adjusts its internal parameters based on it, and produces an output that influences the next frame. An external evaluator (usually also a program) then assesses the success of the output through a reward function, and the algorithm tweaks its parameters accordingly. Over time, the algorithm refines itself, learning to make better decisions with each iteration. When a variant of the algorithm proves to be more successful than its predecessor, it becomes the new baseline, and further variations are built upon it. This process, which closely resembles trial-and-error learning or even natural evolution, can continue indefinitely. Interestingly, it mirrors the way humans learn many things, including driving. For instance, when you first adjust to driving a new car, you might initially oversteer, leading to a honk from another driver. This feedback teaches you to turn the wheel less sharply the next time.
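To make this loop concrete, the following minimal Python sketch runs the frame-in, action-out, reward-back cycle described above. The toy agent, its three-entry preference table, and the reward function are all invented for illustration; a real driving model would replace them with a deep network over sensor data.

# Minimal sketch of the reinforcement learning loop described above.
# The environment, agent, and reward function are hypothetical placeholders.
import random

class Agent:
    """Toy agent: picks one of three steering actions, with learned preferences."""

    def __init__(self):
        self.preferences = {"left": 0.0, "straight": 0.0, "right": 0.0}
        self.learning_rate = 0.1
        self.epsilon = 0.2  # chance of exploring a random action

    def act(self, frame):
        if random.random() < self.epsilon:
            return random.choice(list(self.preferences))
        return max(self.preferences, key=self.preferences.get)

    def update(self, action, reward):
        # Nudge the preference for the chosen action toward the reward received.
        self.preferences[action] += self.learning_rate * (reward - self.preferences[action])

def reward_function(frame, action):
    """External evaluator: here, simply rewards driving straight on a straight road."""
    return 1.0 if action == "straight" else -0.1

agent = Agent()
for step in range(1000):
    frame = {"road": "straight"}              # stand-in for a camera snapshot
    action = agent.act(frame)                 # the agent responds to the frame
    reward = reward_function(frame, action)   # the evaluator scores the response
    agent.update(action, reward)              # the agent tweaks its parameters

print(agent.preferences)  # "straight" should now dominate

After enough iterations, the preference for the rewarded action dominates, which is the same trial-and-error refinement, at toy scale, that the driving models undergo.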
However, self-driving cars are far more complex than just their AI models. They integrate a wide range of technologies, which makes defining them precisely quite challenging. These
vehicles are typically classified into five levels of autonomy. The first level, “Driver Assistance,” involves cars equipped with at least one automated feature, such as adaptive cruise control. This level is surprisingly common; many vehicles on the road today fit into this category. The second level, “Partial Automation,” requires the car to manage steering and acceleration while still needing human oversight and intervention. The third level, “Conditional Automation,” allows a car to drive itself under certain conditions, though a human must remain ready to intervene; Tesla’s Autopilot is often compared to this level, though it is officially classified as a level-two system. The fourth level represents the cutting edge of current technology, as seen in vehicles developed by companies like Waymo, Alphabet’s self-driving subsidiary. These cars can operate independently within specific constraints, such as pre-mapped urban areas (e.g., in cities like Los Angeles, San Francisco, and Phoenix) and only under favorable weather conditions. The fifth and final level, which remains purely theoretical, envisions vehicles capable of performing every driving task a human can, entirely without human presence. However, level 4 and 5 vehicles are only legal in seven states, primarily due to the significant risks they pose and the many unresolved technical challenges.
Our lab categorizes the tasks associated with self-driving cars into three main areas: in-cabin, around-vehicle, and surrounding-environment operations. The in-cabin category focuses on tasks like smoothly transitioning control back to the driver, monitoring whether the driver is distracted, and ensuring that safety mechanisms like airbags are primed for use in the event of a crash. The around-vehicle category deals with interactions with pedestrians, such as predicting pedestrian intent, understanding hand signals from traffic officers, and assessing whether a pedestrian is distracted while crossing. Finally, the
surrounding-environment category focuses on interactions with other vehicles, such as detecting their locations, predicting their intentions, and adjusting accordingly. Some scenarios combine these factors, such as when our car must notice another vehicle noticing a pedestrian and change its behavior to avoid a collision. Solving these problems is not only incredibly difficult but also highly prone to oversight; even small errors in an algorithm can lead to catastrophic consequences on the road.
To mitigate the dangers of live training, simulation programs are an essential tool. These programs aim to train an AI algorithm using simulated sensor arrays and environments so that the same algorithm, when placed in an actual vehicle, performs effectively from the start. Early and more simplistic attempts at this type of training often relied on video games. In such systems, individual game frames were analyzed, with the pixel data serving as input to the algorithm, which would then decide what “keys” to press (e.g., accelerating or steering). Reward functions evaluated the algorithm’s success based on metrics such as lap time and traffic avoidance. However, because video games are designed for entertainment rather than realism, they are poorly suited to preparing AI for real-world driving conditions. As a result, specialized simulation software has become necessary.
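As a rough sketch of that early video-game setup, the example below flattens a frame of pixel data, scores each available “key” with a linear policy, and computes a reward from lap progress and collisions. The game interface, the action set, and the numbers are all invented for illustration.

# Hypothetical sketch of the early "video game" training setup:
# raw pixels in, key presses out, reward from lap progress and crashes.
import numpy as np

KEYS = ["accelerate", "brake", "steer_left", "steer_right"]

def choose_key(frame, weights):
    """Score each key from flattened pixel data with a simple linear policy."""
    scores = weights @ frame.flatten()
    return KEYS[int(np.argmax(scores))]

def reward(progress_gained, crashed):
    """Faster lap progress is rewarded; collisions are heavily penalized."""
    return progress_gained - (100.0 if crashed else 0.0)

# One step with a random 64x64 grayscale frame and an untrained policy.
frame = np.random.rand(64, 64)
weights = np.random.randn(len(KEYS), frame.size) * 0.01
print(choose_key(frame, weights), reward(progress_gained=1.5, crashed=False))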
One of the most widely used simulation environments in self-driving car research is CARLA (Car Learning to Act). Built on the Unreal Engine, CARLA is an open-source platform specifically designed for autonomous vehicle training. It offers not only basic features like weather effects, roads, and buildings but also specialized components critical for self-driving technology, such as
lidar mapping, motion data, and non-visual sensory inputs. These features enable self-driving algorithms, which often rely on a variety of sensor types, to be trained with a high degree of accuracy. CARLA also excels in simulating dynamic entities like pedestrians and other vehicles, which are controlled by AI models within the simulation. This combination of features allows CARLA to create realistic, interactive environments for testing self-driving systems.
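The snippet below shows what this looks like through CARLA’s Python API: connecting to the simulator, setting the weather, and spawning a vehicle with an attached lidar sensor. It is a minimal sketch that assumes a CARLA 0.9.x server already running on localhost, port 2000.

# Minimal CARLA example: connect, set weather, spawn a vehicle with lidar.
# Assumes a CARLA 0.9.x server listening on localhost:2000.
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()

# Weather effects are first-class simulation parameters.
world.set_weather(carla.WeatherParameters(cloudiness=80.0, precipitation=60.0))

# Spawn a vehicle at one of the map's predefined spawn points.
blueprints = world.get_blueprint_library()
vehicle_bp = blueprints.filter("vehicle.*")[0]
spawn_point = world.get_map().get_spawn_points()[0]
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# Attach a lidar sensor to the vehicle and stream its measurements.
lidar_bp = blueprints.find("sensor.lidar.ray_cast")
lidar_bp.set_attribute("range", "50")
lidar = world.spawn_actor(
    lidar_bp, carla.Transform(carla.Location(z=2.0)), attach_to=vehicle
)
lidar.listen(lambda data: print("lidar frame", data.frame))

A training algorithm would subscribe to such sensor streams in exactly this way, feeding each measurement into the model in place of real hardware.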
While many simulation tools are proprietary and owned by private companies (Tesla’s Dojo platform, for example), CARLA is primarily used by researchers. Its open-source nature has fostered a vibrant community of contributions from universities and labs around the world. Academic papers frequently propose upgrades and new features for CARLA, further enhancing its capabilities. Our lab is one of many contributing to CARLA’s development, with a focus on improving its accuracy in simulating pedestrians and vehicles. Specifically, we are working on three key areas: adding disabled pedestrians to the simulation to increase diversity, making the behavior of external vehicle models more realistic (and less perfect), and enhancing the coach car that evaluates the performance of the car being trained. These efforts aim to make CARLA an even more effective tool for advancing self-driving technology and preparing AI systems to navigate the complexities of real-world driving.
Pedestrian Models
Pedestrian models operate differently from all other models within a simulation because, unlike vehicles that primarily brake, accelerate, and turn, pedestrians engage in much more
intricate and unpredictable movements. They use their limbs in complex ways and can maneuver unexpectedly, adding a unique layer of complexity to simulations. In CARLA, pedestrians are created through a series of abstract points onto which a model is placed, like a detailed, human-like figure controlled by a virtual stick figure. The digital human movement can be derived from real-life data by equipping individuals with motion trackers. These trackers, which carry various sensors, capture the exact movements of a person. By aggregating large amounts of motion data, researchers can recreate a simulated stick figure that mimics real human movement. These datasets allow AI models to simulate human behavior accurately within the CARLA environment. Once the pedestrian models are integrated, the self-driving car’s AI observes and learns from their behavior. Through trial-and-error repetition, the car can begin to predict pedestrian actions, such as whether they will cross the street, remain on the sidewalk, or run unexpectedly into traffic.
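In code, inserting one of these skeleton-driven pedestrians looks roughly like the sketch below, which uses CARLA’s walker blueprints and its built-in AI walker controller (again assuming a running CARLA 0.9.x server).

# Spawn a pedestrian ("walker") and drive it with CARLA's AI controller.
import random
import carla

client = carla.Client("localhost", 2000)
client.set_timeout(10.0)
world = client.get_world()
blueprints = world.get_blueprint_library()

# Pick a pedestrian model: the detailed figure skinned over the skeleton.
walker_bp = random.choice(blueprints.filter("walker.pedestrian.*"))
spawn = carla.Transform(world.get_random_location_from_navigation())
walker = world.spawn_actor(walker_bp, spawn)

# Attach the built-in AI controller, which animates the skeleton's motion.
controller_bp = blueprints.find("controller.ai.walker")
controller = world.spawn_actor(controller_bp, carla.Transform(), attach_to=walker)
controller.start()
controller.go_to_location(world.get_random_location_from_navigation())
controller.set_max_speed(1.4)  # roughly average human walking speed, in m/s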
However, a significant issue with this approach is the construction of the initial datasets, which often fail to include critical groups of people, specifically individuals with disabilities. As a result, self-driving car models can become confused when encountering pedestrians who move differently. For example, a person with a leg injury may have an abnormal walk, or the framework might fail if someone is using a walking stick. Similarly, the AI might struggle to account for individuals who are blind. This lack of inclusivity in datasets poses serious risks not only to disabled pedestrians but also to the vehicle’s occupants.
While incorporating more data from individuals with disabilities into the datasets is essential, this alone is not a perfect solution. Many internal components of current models make assumptions about walking patterns. For instance, a fixed node framework might fail to represent someone missing a limb or using a mobility aid like a cane. Addressing these shortcomings requires a combination of algorithmic and systemic solutions. Drawing from research in papers such as “Insertion of Real Agents Behaviors in CARLA Autonomous Driving Simulator” and “Human as AI Mentor: Enhanced Human-in-the-Loop Reinforcement Learning for Safe and Efficient Autonomous Driving,” our lab has introduced improvements on two fronts: algorithmic upgrades and systemic adjustments.
On the algorithmic side, major changes were made to enhance how the pedestrian models function. One significant improvement involved implementing a Unified Local-Cloud Decision-Making model, a method frequently used in real-world applications like mobile robots. This approach offers greater flexibility in routing and enables a broader range of movement options. For example, it allows simulated pedestrians to navigate more realistically around each other, while also accommodating individuals with different walking patterns, such as someone using a walking stick or walking on one leg. This upgrade fundamentally increases the complexity and diversity of movement within the simulation, making it more reflective of real-world behavior, even in default pedestrian models.
The technical specifics of this model are detailed in the paper “Unified Local-Cloud Decision-Making via Reinforcement
Learning.” As the authors describe, “The system utilizes a situational routing module that considers both the current state and a history of previous actions. Local actions are predicted by a lightweight pre-trained model that can be efficiently deployed on mobile systems.” The key innovation is that the learning is handled by a more complex, cloud-based model, which develops a “policy,” a set of rules dictating what actions to take in response to specific inputs. This policy can then be interpreted and executed by a less powerful machine, such as the pedestrian model in the simulation. This setup ensures that pedestrian models retain the benefits of a more powerful AI system while minimizing computational load, leaving sufficient processing capacity for the primary car model being trained.
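The division of labor can be sketched as follows: an expensive learner accumulates experience and periodically distills it into a compact policy table that a lightweight controller executes. The class names and the discretized states below are invented for illustration; the real system uses learned models on both sides.

# Hypothetical sketch of the local-cloud split: a heavyweight learner
# distills its knowledge into a table a lightweight controller can execute.
class CloudLearner:
    """Expensive learner: maintains values for (state, action) pairs."""

    def __init__(self, actions):
        self.actions = actions
        self.q = {}  # (state, action) -> estimated value

    def update(self, state, action, reward, lr=0.1):
        key = (state, action)
        self.q[key] = self.q.get(key, 0.0) + lr * (reward - self.q.get(key, 0.0))

    def export_policy(self):
        """Distill values into a compact state -> best-action table."""
        policy = {}
        for (state, action), value in self.q.items():
            if state not in policy or value > self.q[(state, policy[state])]:
                policy[state] = action
        return policy

class LocalController:
    """Lightweight executor: looks up the distilled policy, no learning."""

    def __init__(self, policy, default_action):
        self.policy = policy
        self.default_action = default_action

    def act(self, state):
        return self.policy.get(state, self.default_action)

cloud = CloudLearner(actions=["walk", "stop", "sidestep"])
cloud.update("crowd_ahead", "sidestep", reward=1.0)
cloud.update("crowd_ahead", "walk", reward=-1.0)
local = LocalController(cloud.export_policy(), default_action="walk")
print(local.act("crowd_ahead"))  # -> "sidestep"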
The second major issue lies in adding data to represent individuals with disabilities. While it might seem straightforward to integrate such data directly into the main dataset, this approach is often less effective than expected. For one, the data may be classified as outliers and thus ignored by the model. Alternatively, if the data is incorporated, it could skew the averages in an unintended direction, leading to inaccuracies in pedestrian representation. To address this, our solution involves preserving the original dataset while creating a separate "modifier" set to introduce alternate walking patterns. This method allows us to maintain the integrity of the base dataset while layering additional variations for greater inclusivity.
We construct the modifier set by analyzing differences between standard walking patterns and those of individuals with disabilities. Specifically, we compare how an able-bodied person
navigates an area versus how a blind person does. By quantifying these differences, we can train an algorithm to apply these modifications to the existing pedestrian model. This approach enables the seamless integration of diverse walking patterns into the simulation without compromising the accuracy of the original data.
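A simplified version of this idea is sketched below: the average per-frame difference between two gait recordings is extracted as a modifier and layered onto the untouched base data. The joint layout, array shapes, and offsets are invented for illustration.

# Hypothetical sketch of the "modifier set": keep the base motion-capture
# data intact and layer a learned per-joint offset on top of it.
import numpy as np

def learn_modifier(base_gait, target_gait):
    """Quantify the average per-frame difference between two gait recordings."""
    return (target_gait - base_gait).mean(axis=0, keepdims=True)

def apply_modifier(base_gait, modifier):
    """Layer the learned difference onto the untouched base data."""
    return base_gait + modifier

# 100 frames x 15 joints x 3 coordinates of (synthetic) skeleton positions.
base = np.random.rand(100, 15, 3)
cane_user = base + np.array([0.0, -0.02, 0.0])  # e.g., a slightly lowered posture
modifier = learn_modifier(base, cane_user)
modified_gait = apply_modifier(base, modifier)

Because the base dataset is never edited, its averages stay intact; the modifier simply produces an additional, alternate gait that can be assigned to some fraction of simulated pedestrians.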
Ultimately, these advancements in pedestrian modeling will significantly enhance the ability of self-driving cars to detect and interpret the movements of all individuals, regardless of disabilities. By creating a simulation environment that better reflects the full spectrum of human behavior, we aim to ensure a safer and more inclusive driving experience for everyone.
Outside Car Models
Modeling other cars in the driving environment might seem like a secondary consideration when training a self-driving vehicle. Initially, it might appear that the same technologies used to train the primary car can simply be applied to model other cars on the road. However, this approach presents two key problems. The first, and most obvious, issue is that the primary car model itself may not be fully developed yet. Even if a slightly less accurate version of the model (say, 0.1% less accurate) is used for simulating other cars, problems still arise. This is because the goal is not to train the primary car against near-perfect copies of itself; it must be prepared for the unpredictable real world. In reality, other cars on the road don’t always behave as well as the simulated car. They can be driven poorly, distractedly, or aggressively. Therefore, the self-driving car must be trained to respond to these kinds of human-driven behaviors, not just ideal driving scenarios.
Addressing these challenges requires transitioning from deep reinforcement learning to human-in-the-loop reinforcement learning. While deep reinforcement learning operates through a familiar loop (the environment feeds the model input, the model reacts, and a reward or ranking is assigned), the human-in-the-loop model incorporates an additional layer: a human overseer who provides guidance. The human overseer interacts with the simulation using real-world controls, such as a (model) steering wheel and pedals, instead of simply processing numeric output. The reward function is adjusted to align with the human operator’s actions, creating a more realistic feedback loop for training the model.
However, a challenge with this method is the sheer speed at which the AI trains, processing tens of thousands of iterations per second, far too fast for a human to supervise every step. To mitigate this challenge, techniques from the paper “Human as AI Mentor: Enhanced Human-in-the-Loop Reinforcement Learning for Safe and Efficient Autonomous Driving” are used. The key insight here is to measure the “certainty” of the AI’s decisions. When the model outputs a decision, like “Turn: 1° left, Braking: 0, Acceleration: 20%, Turn Signals: None,” it’s based on the most probable outcome. However, this is just one of many possible actions. By examining the next 100 most probable outcomes and their distribution, we can assess the level of certainty. If the model shows high uncertainty, it triggers a handover to the human operator, who then takes over to perform the maneuver. This process helps refine the AI, particularly in areas where it lacks confidence.
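A minimal sketch of such a certainty check appears below: the probabilities the policy assigns to its top candidate actions are summarized by their entropy, and a flat, uncertain distribution triggers the handover. The threshold and probabilities are invented for illustration; the cited paper’s actual mechanism differs in its details.

# Hypothetical certainty check: flat action distributions trigger a handover.
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_hand_over(action_probs, threshold=0.8):
    """Return True when the policy is too uncertain to act on its own."""
    total = sum(action_probs)  # normalize the top-k probabilities first
    return entropy([p / total for p in action_probs]) > threshold

confident = [0.9, 0.05, 0.03, 0.02]  # one action clearly dominates
uncertain = [0.3, 0.25, 0.25, 0.2]   # no clear winner
print(should_hand_over(confident))   # False: the AI keeps control
print(should_hand_over(uncertain))   # True: the human takes over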
This method allows human drivers to provide more impactful corrections to the AI's behavior, facilitating the creation of a variety of models with slightly different driving styles. These imperfect models, each with unique handling, can then be integrated into the simulation, enabling the car to learn from a range of behaviors and driving strategies.
There are still some challenges to be addressed with this method. One issue is the reliance on a small number of drivers to provide corrections. More human operators contributing corrections would significantly improve the training process by providing more diverse inputs. Another problem is the handling of extreme, unexpected behaviors: while the system can account for aggressive driving, it may not be able to handle rare situations, like a car suddenly reversing in the middle of the road. Such extreme cases remain a challenge in autonomous vehicle simulations, but they are a known issue throughout the entire development process.
Despite these challenges, incorporating a human-like operator into the AI training model ensures that the primary self-driving car will be better equipped to handle the unpredictable and varied behaviors of human drivers on the road.
Instructor Car
The reward function is a crucial element in determining the success of a model, but its deeper mechanics are often not fully understood. In simpler models, such as those used in early video games, the reward function may be based on something
straightforward, like achieving the fastest time. However, in the case of self-driving cars, driving is far more complex than simply arriving quickly and safely. There are hundreds of parameters that contribute to good driving performance in the real world, including factors like defensive driving that cannot be easily quantified numerically.
Due to these complexities, a new algorithm is required to judge the performance of the primary model. If the judging algorithm already knew how to drive correctly, we wouldn’t need to train the main model in the first place. To overcome this limitation, we take advantage of the fact that we are working in a simulated environment. The “instructor” or “expert” model, which is used to guide the main car, does not rely on the simulated sensors the primary model uses. Instead, it has access to the “ground truth.” This includes knowing when traffic lights will change, interpreting the intentions of drivers and pedestrians, and leveraging optimized coding for superior performance. By using the expert’s behavior to assign rewards, the primary car attempts to copy the expert model. However, without access to the ground truth, the primary car must rely on its sensors to learn how to react and interpret the environment on its own.
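A toy version of this reward is sketched below: the trainee is scored by how closely its control outputs match those of the privileged expert. The control fields and weighting are invented for illustration.

# Hypothetical imitation-style reward: the expert sees the ground truth,
# the trainee sees only its sensors, and the trainee is rewarded for
# matching the expert's control output.
def imitation_reward(student_action, expert_action):
    """Higher reward the closer the student's controls are to the expert's."""
    steer_err = abs(student_action["steer"] - expert_action["steer"])
    throttle_err = abs(student_action["throttle"] - expert_action["throttle"])
    brake_err = abs(student_action["brake"] - expert_action["brake"])
    return 1.0 - (steer_err + throttle_err + brake_err)

# The expert acts on ground truth (e.g., it knows the light turns green soon);
# the student must infer the same maneuver from simulated sensors alone.
expert = {"steer": 0.05, "throttle": 0.4, "brake": 0.0}
student = {"steer": 0.10, "throttle": 0.3, "brake": 0.0}
print(imitation_reward(student, expert))  # 0.85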
The issue with this approach is that it caps the main algorithm’s performance at that of the instructor car. If the instructor makes small mistakes, the reward function (which is based on how closely the main car mimics the instructor) can lead to the primary car performing worse. This is especially problematic because the instructor car, being hardcoded, can still make small errors due to its own limitations.
The solution to this problem is to have the instructor car run on its own reinforcement learning (RL) model. This would ensure that the instructor car is always performing at least as well as the main car, if not better. The only difference between the two would be their knowledge of the environment. To implement this, we need to find a suitable reward function for the instructor’s RL training. Our solution draws inspiration from the pedestrian model. Instead of using walking data to train the instructor, we use driving data, collected from human drivers, to train the expert model. The reason this data isn’t directly used to train the main model is that we can’t verify whether the main car is correctly mimicking the expert’s behavior. Moreover, even if it were, we want to gather more diverse data to improve the training process. By using machine learning for both the expert and the main model, any advancements made in the primary model can eventually be transferred back to the expert.
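One simple way to turn recorded human driving into a reward for the instructor’s own RL training is sketched below: the instructor’s steering is scored by its likelihood under per-situation statistics gathered from human drivers. The features, statistics, and numbers are invented for illustration.

# Hypothetical "human-likeness" reward for the instructor's RL training.
import math

# Per-situation steering statistics aggregated from (synthetic) human drivers.
HUMAN_STATS = {"gentle_curve": (0.12, 0.04), "straight": (0.0, 0.02)}

def humanlikeness_reward(situation, steer):
    """Log-density of the instructor's steering under the human distribution."""
    mean, std = HUMAN_STATS[situation]
    return -math.log(std * math.sqrt(2 * math.pi)) - ((steer - mean) ** 2) / (2 * std ** 2)

print(humanlikeness_reward("gentle_curve", 0.12))  # human-typical: high reward
print(humanlikeness_reward("gentle_curve", 0.50))  # unusual: heavily penalized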
Overall, integrating an RL model into the instructor car has the potential to significantly improve many aspects of self-driving technology, from pedestrian prediction to route optimization. However, it still faces challenges, such as accuracy issues, and is difficult to implement successfully. Often, this approach leads to only marginal improvements due to the inherent complexity involved in creating such a system. Despite these challenges, the potential for advancements is substantial, and it remains an important area of ongoing research.
Conclusion
Through the addition of these improvements to the CARLA engine, not only the efficacy of self-driving cars but also their safety has the potential to increase. Many of these improvements, though no longer entirely theoretical, still require revision and refinement. However, the fundamental ideas behind them have already been implemented in CARLA, and they are therefore active within the world of autonomous driving research.
I contributed significantly to porting the reinforcement learning (RL) program from the pedestrian models to the car training models. Given that cars are fundamentally different from people, this task proved to be quite challenging. The process mainly involved running the program, identifying issues that emerged with the model, correcting the code and program data, and repeating the process. I also assisted in the creation of a dataset that differentiated between impaired and non-impaired walkers. This was a smaller part of the work: I helped with the equipment and took part in walking around BU with the trackers to add myself to the dataset. Overall, this experience has been both challenging and rewarding, and it allowed me to learn a great deal about how self-driving cars and their algorithms work.
This was one of my first experiences working on a larger team with a computer science goal in mind. In addition to the coding and machine learning tasks I worked on, the experience was invaluable in other ways. I learned how to use a connected virtual machine, how to collaborate on Git projects, and how to manage various aspects of computer work. Most importantly, I had the opportunity to learn from and interact with experts in the field
of machine learning, particularly Professor Eshed Ohn-Bar, who not only published a variety of leading papers but was also incredibly helpful and insightful.
Additionally, I gained a deeper understanding of the problems facing AI vehicles. One of the most significant issues I noticed was how models react to the unexpected. Whenever a model encounters something it has not been trained for, it flounders in ways that a human driver would not. While this can be fixed on an individual level, as with the pedestrian fixes, certain events, such as a natural disaster, could present extreme, rare, and highly dangerous situations. These types of unpredictable scenarios could lead to catastrophic consequences, reinforcing why self-driving cars are still limited in where and when they can operate.
While this project has been a valuable learning experience, I am optimistic that continued advancements in CARLA-trained algorithms will significantly enhance the safety and efficacy of self-driving technology. The work is far from complete, but the progress made so far has been promising. I look forward to seeing how these improvements help make the roads safer in the future.
Works Cited
Dosovitskiy, A., Ros, G., Codevilla, F., Lopez, A., & Koltun, V. (2017). CARLA: An Open Urban Driving Simulator. arXiv. https://doi.org/10.48550/arXiv.1711.03938
Ghasemi, M., et al. (2024). An Introduction to Reinforcement Learning: Fundamental Concepts and Practical Applications. arXiv. https://doi.org/10.48550/arXiv.2408.07712
Hossain, J. (2023). Autonomous Driving with Deep Reinforcement Learning in CARLA Simulation. arXiv. https://doi.org/10.48550/arXiv.2306.11217
Huang, Z., Sheng, Z., Ma, C., & Chen, S. (2024). Human as AI Mentor: Enhanced Human-in-the-Loop Reinforcement Learning for Safe and Efficient Autonomous Driving. Communications in Transportation Research, 4, 100127. https://doi.org/10.1016/j.commtr.2024.100127
Nehme, G., & Deo, T. Y. (2023). Safe Navigation: Training Autonomous Vehicles Using Deep Reinforcement Learning in CARLA. arXiv. https://doi.org/10.48550/arXiv.2311.10735
Ohn-Bar, E. (n.d.). Looking at Humans in the Age of Self-Driving and Highly Automated Vehicles. Retrieved September 23, 2024, from https://ieeexplore.ieee.org/abstract/document/7501845
Pawelczyk, M., Bielawski, S., van den Heuvel, J., Richter, T., & Kasneci, G. (2021). CARLA: A Python Library to Benchmark Algorithmic Recourse and Counterfactual Explanation Algorithms. arXiv. https://doi.org/10.48550/arXiv.2108.00783
Serrano, S. M., Llorca, D. F., Daza, I. G., & Sotelo, M. Á. (2022). Insertion of Real Agents Behaviors in CARLA Autonomous Driving Simulator. arXiv. https://doi.org/10.48550/arXiv.2206.00337
Serrano, S. M., Llorca, D. F., Daza, I. G., & Vázquez, M. Á. S. (2023). Realistic Pedestrian Behaviour in the CARLA Simulator Using VR and Mocap. arXiv. https://doi.org/10.48550/arXiv.2309.04418
Vasco, M., Seno, T., Kawamoto, K., Subramanian, K., Wurman, P. R., & Stone, P. (2024). A Super-human Vision-based Reinforcement Learning Agent for Autonomous Racing in Gran Turismo. arXiv. https://doi.org/10.48550/arXiv.2406.12563
Zhang, Z., Liniger, A., Dai, D., Yu, F., & Van Gool, L. (2021). End-to-End Urban Driving by Imitating a Reinforcement Learning Coach. arXiv. https://doi.org/10.48550/arXiv.2108.08265
