I spoke today with Rivian software head Wassym Bensaid about his harrowing last 36 hours. Rivian’s software team sprang into action after the wrong build of an OS update was sent to the company’s fleet with the wrong certificate. The update stalled before it could complete, disabling most consumer-facing infotainment features on about 3% of the company’s consumer vehicles, according to Bensaid.
Rivian made Bensaid available to discuss the incident and the OTA fix, which will be sent to customers as early as 9:30 AM PT (12:30 PM ET).
As a Rivian owner, I’m happy that this can be resolved via an OTA update, but I’m more concerned that it could happen at all. And it CANNOT happen again.
I asked Bensaid what went wrong, and my understanding is that before release the software was tested on at least two developer-built Rivians, which were not affected by the bad certificate. The correct build, of course, had already been tested for over a month on a fleet of at least 1,000 test vehicles. But that final pre-release check seems to cover far too few vehicles; a limited subset of vehicles should get a live OTA OS update before the whole fleet does.
“What happened with the last release is that the wrong link was selected, unfortunately with the wrong certificate. So this is the cause of the problem. We initially got reports around 5:30 p.m. Pacific, and they were a bit confusing in that some people reported bricked cars, while others reported the cluster and the camera still working. So while we were working through the reports, we wanted to be super conservative, and there were multiple solutions available to us. If cars were really bricked, that would have been a service call. If parts of the car were still alive, that would mean they could probably be repaired via our mobile service vehicles. And then the team actually took this opportunity to really zoom out and came up with a super creative solution, which now allows us to completely solve the problem via an over-the-air update. That is why we are sending out a new OTA today, fully addressing the problem. So it actually repairs the damaged image.”
Bensaid noted that Rivian is reevaluating the entire process so that human error can never cause something like this again. That means some regular consumer vehicles will receive each OTA update first, and it will be verified on them before it goes out to more vehicles.
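To make the idea of a staged rollout concrete, here is a minimal sketch in Python. It is not Rivian's process: the stage names, fleet fractions, and failure thresholds are all hypothetical, and `install` is a stand-in for whatever actually pushes the update to a vehicle and reports back. The point is only that each stage reaches a small group first and the rollout halts if too many installs fail.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    fraction: float          # cumulative share of the fleet reached by the end of this stage
    max_failure_rate: float  # halt the rollout if this stage's failure rate exceeds it


# Hypothetical stages; the real cohorts and thresholds are not public.
STAGES = [
    Stage("internal test fleet", 0.01, 0.0),
    Stage("early customer cohort", 0.05, 0.001),
    Stage("full fleet", 1.0, 0.005),
]


def run_rollout(fleet_size, install, stages=STAGES):
    """Push an update stage by stage, halting when a stage fails too often.

    `install(vehicle_index)` is a caller-supplied function that returns True
    when the update succeeds on that vehicle.
    Returns (vehicles_reached, name_of_halting_stage_or_None).
    """
    reached = 0
    for stage in stages:
        target = min(fleet_size, max(1, round(fleet_size * stage.fraction)))
        attempted = 0
        failures = 0
        # Only vehicles not covered by an earlier stage are updated here.
        for vehicle in range(reached, target):
            attempted += 1
            if not install(vehicle):
                failures += 1
        reached = max(reached, target)
        if attempted and failures / attempted > stage.max_failure_rate:
            return reached, stage.name
    return reached, None
```

With a build that fails on roughly 3% of vehicles, a rollout like this would stop after the first small cohort instead of reaching the whole fleet.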
“We initially didn’t want to get into that line of communication because whether it’s 3%, 10%, 1%, or 0.5%, it’s still super important to us. Every user, every customer is important. And job number one over the last 36 hours was how we as a team can find the best possible solution for our customers, and then the ranking: the best possible solution is a remote solution. The worst possible thing is that they actually have to go to service or they have to tow the vehicle and then the team actually has to put in a lot of effort. And we managed to come up with a great solution that helps us tackle the problem remotely. It’s also because we have an architecture that has a lot of redundancies and that really allows us to do these types of operations, and that actually became apparent as we started to understand what was happening in the field. The vehicle was still operational, the app was still operational, the critical parts of the system were still operational. So the security-based, redundant design that we have has actually protected us. And then we used that as a way to basically inject the recovery solution in this case through a remote solution using these security systems, which is what we’ll deploy today.”
The build that was meant to be released had been tested for months on regular vehicles, but a single human copy/paste error caused the wrong build to be sent out. That process is also being revised so that multiple checks of the build take place before it is released to the broader customer base.
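As an illustration of what "multiple checks of the build" could look like, the sketch below compares the build and certificate about to be published against the ones that were actually tested, and requires a second person to sign off. This is an assumption about one possible approach, not a description of Rivian's tooling; the function names, the use of SHA-256 fingerprints, and the two-approver rule are all hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 fingerprint of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def release_checks(build, cert, approved_build_sha256, approved_cert_sha256, approvers):
    """Return a list of problems; an empty list means the release may proceed.

    All parameters are hypothetical: `build` and `cert` are the raw bytes about
    to be published, the `approved_*` values are fingerprints recorded when the
    build passed testing, and `approvers` is the set of people who signed off.
    """
    problems = []
    if sha256_hex(build) != approved_build_sha256:
        problems.append("build does not match the tested build")
    if sha256_hex(cert) != approved_cert_sha256:
        problems.append("certificate does not match the approved certificate")
    if len(set(approvers)) < 2:
        problems.append("fewer than two people approved the release")
    return problems
```

A copy/paste slip that points at a different build would change its fingerprint, so a check of this kind would flag the mismatch before anything reaches customer vehicles.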
Affected owners, again about 3% of the fleet according to Rivian, should see an update in their phone app and can start the process from there. The few who do not use an app with their Rivian will need to call the Rivian service line to initiate the update.
All of the above is what I want to hear as a Rivian owner, but as a reporter I would also have liked Rivian’s communication to be more official. The original Reddit post was timely and better than nothing, but it took some work to verify that the poster was really Bensaid. It took over 10 hours for the PR team to even acknowledge there was a problem, and only after we showed them the Reddit post.
I think the whole Rivian team can do better here and, from the vibe I get, they will.