On Monday, the National Highway Traffic Safety Administration opened an investigation into Tesla. The agency claims that there have been 11 incidents since 2018 in which Tesla vehicles struck stationary first-responder vehicles attending to the scene of an emergency; there have allegedly been 17 injuries and one fatality as a result. The NHTSA is zeroing in on the company’s Autopilot system, noting that the Teslas in these incidents “were all confirmed to have been engaged in either Autopilot or Traffic Aware Cruise Control during the approach to the crashes.” The investigation will cover Tesla models Y, X, S, and 3 that were released between 2014 and 2021. Autopilot’s difficulty sensing firetrucks and other emergency vehicles has been a known problem for years, and the feature has also been criticized as encouraging drivers to rely on it as though it were a self-driving system when in fact it is only meant to assist an engaged driver. To better understand the issue, I spoke with Raj Rajkumar, an electrical and computer engineering professor at Carnegie Mellon University who specializes in self-driving vehicles. Our conversation has been condensed and edited for clarity.
Aaron Mak: Why might Teslas be having this issue with stationary emergency vehicles?
Raj Rajkumar: In the past, there have been two kinds of sensors: cameras and radar. In more recent versions, they [Tesla] removed the radar, so it is completely dependent on cameras. Radar emits electromagnetic waves. That’s like the radio waves when you listen to the radio in your car. They hit an obstacle and then they come back, you receive that radar echo and then determine how far that object is. Most cars, except for Tesla, have them for automatic emergency braking, adaptive cruise control, and the like.
So when the electromagnetic waves go out and come back, and they hit a moving object, there is the Doppler effect, which you might recall from high school physics. The frequency changes. If it’s coming toward you, the frequency increases. If it’s going away from you, the frequency decreases. With the radar technology, it’s easy to detect moving objects because the Doppler effect is so pronounced. For stationary objects, you have this problem of the echoes potentially coming back from a road surface, or the walls of a building on the side of the road, and so on. Stationary objects are trickier; you need more recent generations of radars, which have a lot more computing power and can detect and track a lot more objects. On the Teslas, when they had radar, they used to have a problem detecting stationary objects, particularly when they had to deal with the situation where the vehicle is actually approaching an overpass, and it would get this signal saying that there is an object out there. From the radar, all you get is that there’s an object in front, and then they might end up actually braking. It’s called phantom braking. That’s a false positive: The radar declares there’s an obstacle, and there is none. If they had lidar in there, the lidar could be used to confirm whether there was an obstacle or not. [Editor’s note: Lidar is an emerging laser-based radar technology.] Tesla has famously said that they will not use lidar.
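Editor’s note: The radar arithmetic described above can be sketched in a few lines of Python. This is a simplified illustration, not a model of any real radar unit. The 77 GHz carrier frequency, the vehicle speeds, and all function names are assumptions chosen for the example, and the geometry is reduced to objects directly ahead.

```python
# Simplified radar arithmetic. All numbers and names are illustrative assumptions.
SPEED_OF_LIGHT = 299_792_458.0  # meters per second


def echo_range(round_trip_seconds):
    """Distance to the reflector: the wave travels out and back, so halve the path."""
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def doppler_shift(carrier_hz, closing_speed_mps):
    """Approximate two-way Doppler shift for speeds far below the speed of light.

    A positive closing speed (the gap is shrinking) raises the echo frequency;
    a negative one (the gap is growing) lowers it.
    """
    return 2.0 * closing_speed_mps * carrier_hz / SPEED_OF_LIGHT


def closing_speed(ego_speed_mps, target_speed_mps):
    """Closing speed toward a target directly ahead, moving in the same direction."""
    return ego_speed_mps - target_speed_mps


CARRIER_HZ = 77e9   # assumed automotive radar carrier frequency
EGO_SPEED = 30.0    # assumed speed of our own car, in m/s

shift_lead_car = doppler_shift(CARRIER_HZ, closing_speed(EGO_SPEED, 25.0))
shift_stopped_truck = doppler_shift(CARRIER_HZ, closing_speed(EGO_SPEED, 0.0))
shift_overpass = doppler_shift(CARRIER_HZ, closing_speed(EGO_SPEED, 0.0))

print("range for a 1 microsecond echo (m):", echo_range(1e-6))
print("lead car shift (Hz):", shift_lead_car)
print("stopped truck shift (Hz):", shift_stopped_truck)
print("overpass shift (Hz):", shift_overpass)
```

Under these assumptions, the stopped truck and the overpass produce the same Doppler shift, because both are stationary and the car closes on each at its own speed. The moving lead car stands apart from that stationary background, which is one way to see why Doppler alone helps with moving objects but not with telling a stopped vehicle from roadside structures.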
[Tesla] started depending on the camera, but camera is a very different animal. [Editor’s note: Tesla removed radar from its cars starting in May.] With cameras, you basically get numbers, which correspond to the red, green, and blue values of each pixel. And these are just numbers for a computer. So the computer has to interpret them and understand what those pixels mean. That happens with a relatively recent technology called deep neural networks. These networks are extremely good at doing pattern matching. You “train them” by giving them hundreds of thousands, if not millions, of images. In each image, you say “this part of the image is a car,” or “this part of the image is a truck, a bus, a bicycle, a motorcycle,” and so on. You basically manually or semi-automatically annotate these images and then feed them into the network. Then the network sees all these things millions of times and it kind of understands the patterns in the pixel value to look for. And then you give it an image and ask, “Is there a car in here?” Because [Tesla’s network] understands the pattern to look for, it will try to match the pattern. If it comes across the pattern, it will declare, “I do see a car in this region of the image.”
It’s basically a super-duper, very sophisticated pattern-matching scheme. The problem is that, in the real world, it is given an image where it sees an obstacle that it has never seen before. The patterns really do not match, so it will not detect it as a vehicle. For example, when the first person was killed using the Tesla Autopilot in Florida, the truck [hit by the Tesla] was perpendicular to the direction of motion. The training did not have those images at all, so therefore the pattern matcher did not recognize that pattern. There’ve been many recent incidents where the Tesla vehicles run into a firetruck or police vehicle. The lights are on, so the red, green, blue pixel values look different as well, and therefore the patterns do not match. Lo and behold, they declare that there’s no obstacle ahead, and the vehicle very promptly dysfunctions and has no idea there’s something in front of it.
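Editor’s note: The idea that a pattern matcher fails on inputs unlike its training data can be shown with a deliberately tiny stand-in. The Python sketch below is not a neural network and is not Tesla’s method. It averages a few made-up 3-by-3 brightness grids into a template and compares new grids to it with cosine similarity. The grids, the 0.8 threshold, and the function names are all assumptions invented for the example.

```python
import math


def mean_template(examples):
    """Average the training patches value by value to form one template."""
    count = len(examples)
    return [sum(values) / count for values in zip(*examples)]


def cosine_similarity(a, b):
    """Similarity of two patches, from 0 (nothing in common) to 1 (same pattern)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def looks_like_vehicle(patch, template, threshold=0.8):
    """Declare a match only if the patch is close enough to the template."""
    return cosine_similarity(patch, template) >= threshold


# Hypothetical 3x3 brightness grids, flattened row by row.
# The "training set" only ever shows a vehicle from behind: a bright vertical bar.
training_patches = [
    [0, 9, 0,
     0, 9, 0,
     0, 8, 0],
    [0, 8, 0,
     0, 9, 0,
     0, 9, 0],
]
template = mean_template(training_patches)

# A new rear view, slightly noisy, resembles what was seen in training.
new_rear_view = [1, 9, 0,
                 0, 8, 1,
                 0, 9, 0]

# A vehicle lying across the road appears as a bright horizontal bar instead.
side_on_view = [0, 0, 0,
                9, 9, 9,
                0, 0, 0]

print("rear view matches:", looks_like_vehicle(new_rear_view, template))
print("side-on view matches:", looks_like_vehicle(side_on_view, template))
```

In this toy, the side-on grid contains just as much "vehicle" as the rear view, but it falls below the threshold because nothing like it was in the training set. Real networks are far more capable than a single averaged template, but the sketch captures the gap Rajkumar describes between what was trained on and what shows up on the road.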
According to the NHTSA, most of these incidents occurred after dark while the first-responder vehicles were flashing lights and had flares and flashing arrow boards around them. Do you think these lights could confuse the cameras?
I’m sure that’s part of the problem. When the lights are spinning and flashing, looking at it from the camera image, these are just pixels, meaning that they have numbers, and the numbers basically go up and down, up and down in some regions, when the light flashes. Unless the training phase has been given those images and that kind of modality, it could throw the pattern matching off. These emergency vehicles look different from a normal car or truck. The colors may be different as well; different jurisdictions may be using different colors. The images that were fed to the Tesla neural network may or may not have that combination of shape, orientation of the vehicle, the colors of the vehicle, and the flashing lights.
So what can be done to address this flaw?
The lack of images of that kind in the training phase is really what it comes down to. There are a couple of ways to deal with it. One is basically to generate a lot more images of emergency vehicles under different lighting conditions: daytime, nighttime, the sun low on the horizon during dusk and dawn. But people like me believe that no matter what you do, in the future there will be somebody who comes up with a new vehicle design or orientation that has never been seen before. So it’s very difficult, if not impossible, to generate all the kinds of images that need to be fed for training purposes.
You should use the other sensors. Radar will provide some information. Lidar will provide a lot more different kinds of information. With that combination, you can definitely detect and deal with the situation. But of course, Tesla does not use radar at this point, and it’s not going to use lidar. It’s a twofold problem: With the camera alone, the training is not sufficient, and meanwhile they [Tesla] forgo all the other sensors.
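Editor’s note: One simple way to picture combining sensors is a voting rule. The Python sketch below is a hypothetical illustration, not a description of Tesla’s software or of any production system. The two-of-three rule, the function name, and the scenario values are assumptions made for the example.

```python
def obstacle_confirmed(camera_sees, radar_sees, lidar_sees, votes_needed=2):
    """Declare an obstacle only when enough independent sensors agree."""
    votes = sum([camera_sees, radar_sees, lidar_sees])
    return votes >= votes_needed


# Hypothetical scenario 1: a firetruck with flashing lights at night.
# The camera's pattern matcher misses it, but radar and lidar both get returns.
firetruck = obstacle_confirmed(camera_sees=False, radar_sees=True, lidar_sees=True)

# Hypothetical scenario 2: an overpass ahead.
# Radar reports something in front, but camera and lidar see a clear lane.
overpass = obstacle_confirmed(camera_sees=False, radar_sees=True, lidar_sees=False)

# Hypothetical scenario 3: a camera-only car facing the same firetruck.
# With no other sensors to vote, a camera miss means no detection at all.
camera_only = obstacle_confirmed(camera_sees=False, radar_sees=False, lidar_sees=False)

print("firetruck confirmed:", firetruck)
print("overpass confirmed:", overpass)
print("camera-only car confirmed:", camera_only)
```

Under this assumed rule, the second and third sensors help in both directions: they catch the firetruck the camera misses, and they overrule the lone radar return that would otherwise cause phantom braking at the overpass. Real systems weigh ranges, confidence levels, and timing rather than simple yes-or-no votes.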
You’ve been talking about this issue with Tesla’s Autopilot at least since 2018. Do you get the sense that its technology has improved at all in that time?
It’s very difficult for me to quantify what the progress would be, but I would expect that Tesla’s been feeding [the network] more images for training. So I’m sure they would have done that, but clearly they’re still missing these cases in which the vehicles look different, for example, when a vehicle is rolled over. There’s a case in Taiwan where the Tesla is on Autopilot and runs directly into a truck that has rolled over. So in the training, they might not be feeding it images of rolled-over trucks. That’s a case where training alone will not be sufficient, and they should be looking at other sensors like radar and lidar.
So you think it’s a mistake for Tesla to forgo lidar?
From an operations standpoint, having lidar would certainly detect these obstacles, and the crashes can be avoided. The practical compromise that they’re making, which I do appreciate, is that lidar is expensive. Tesla is looking to sell cars. The economics come into play.
Do you think this problem is common enough that we should be worried about it? Does it merit a government investigation?
I certainly think that it could be. Clearly, they [Tesla cars] should be monitoring what the driver is doing or not doing. The monitoring aspect alone should be the subject of a study by the NHTSA. Tesla has been saying that as long as the person is holding the steering wheel, they let Autopilot be operational. But it turns out that people have found all sorts of tricks to bypass that scheme. We see cases where people are able to take their hands off the wheel for minutes at a time. It is certainly dangerous. Users do not have a full understanding of what the system is capable of and what its limitations are. They very often reach very incorrect conclusions. It’s up to the seller, the regulators, the government to make sure that you’re protecting the consumers.
Future Tense is a partnership of Slate, New America, and Arizona State University that examines emerging technologies, public policy, and society.