I spoke with Rivian software chief Wassym Bensaid today about his harrowing last 36 hours. Rivian’s software team scrambled after an erroneous OS update was sent out to the company’s fleet with the wrong certificate. The update hung before it could complete, disabling most of the consumer-facing infotainment features on about 3% of the company’s consumer vehicles, according to Bensaid.
Rivian made Bensaid available to discuss the incident and the OTA fix, which will be rolling out to customers as early as 9:30 a.m. PT (12:30 p.m. ET).
As a Rivian owner, I’m happy that this can be fixed via an OTA update, but I’m more concerned that it could happen at all. And it CANNOT happen again.
I asked Bensaid what went wrong. My understanding is that the software was tested on at least two “developer built” Rivians that were not affected by the bad certificate before it expired. That seems like far too small and limited a subset of vehicles to test an OTA OS update on.
“What happened in the last push is that, unfortunately, the wrong link was selected, with the wrong certificate. So this is the cause of the problem. We started getting reports around 5:30 p.m. Pacific, and in the beginning the reports were a bit confusing, in the sense that some people reported bricked cars, while others reported that the cluster and the camera still worked. So while we were sorting through the reports, we wanted to be super conservative, and there were multiple avenues for us. If cars had really broken down, it would have been a service call. If parts of the car were still alive, that probably would have meant a way to get them fixed through our cellular service band. And then the team basically used this opportunity to really zoom out, and they came up with a super creative solution that allows us to now fully fix the problem through an over-the-air update. So we will be rolling out a new OTA today which will fix the issue completely. It basically repairs the damaged image.”
Bensaid noted that Rivian is re-evaluating its entire process so that human error can never cause anything like this again. That means an OTA update goes to a set of normal consumer vehicles and is tested there before it is rolled out to more vehicles.
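For readers wondering what that kind of gated rollout looks like in practice, here is a minimal sketch. It is not Rivian’s actual process, which Bensaid did not describe in detail. The stage names, fleet fractions, failure thresholds and the `install_batch` callback are all hypothetical, chosen only to illustrate the idea of halting before an update reaches more vehicles.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Stage:
    name: str
    fleet_fraction: float    # cumulative share of the fleet updated once this stage completes
    max_failure_rate: float  # halt if failures / installs in this stage exceeds this value


# Hypothetical stages; real thresholds would be set by the manufacturer.
STAGES = [
    Stage("internal", 0.001, 0.0),
    Stage("canary", 0.01, 0.01),
    Stage("early", 0.10, 0.01),
    Stage("everyone", 1.0, 0.01),
]


def run_staged_rollout(
    fleet_size: int,
    stages: List[Stage],
    install_batch: Callable[[str, int], int],
) -> Tuple[List[str], Optional[str]]:
    """Push an update stage by stage.

    install_batch(stage_name, vehicle_count) is a stand-in for the real
    deployment step and returns the number of failed installs.
    Returns (completed stage names, name of the stage that halted or None).
    """
    updated = 0
    completed: List[str] = []
    for stage in stages:
        target = round(fleet_size * stage.fleet_fraction)
        batch = target - updated
        if batch > 0:
            failures = install_batch(stage.name, batch)
            updated = target
            if failures / batch > stage.max_failure_rate:
                # Stop here so later, larger stages never receive the bad build.
                return completed, stage.name
        completed.append(stage.name)
    return completed, None
```

With a 10,000-vehicle fleet and these made-up stages, a bad build that fails on a few canary vehicles would stop after roughly 100 vehicles instead of reaching everyone.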
“We didn’t want to go into that line of communication at first because whether it’s 3%, 10%, 1% or 0.5%, it’s still super important to us. Every user, every customer matters. Job number one for the last 36 hours was: how can we as a team find the best possible solution for our customers? In that ranking, the best possible outcome is a remote solution. The worst-case scenario is that they have to service or tow the vehicle. The team put in a lot of effort, and we managed to come up with a really good solution that lets us solve it remotely. It’s also because we have an architecture that has a lot of redundancies, and that really allows us to do these kinds of operations. It turned out, when we started to understand what was happening in the field, that the vehicle was still operational, the app was still operational, the critical parts of the system were still operational. So the security-based, redundancy-based design that we have in place has actually protected us. And we’ve used that as a way to inject, in this case, the recovery solution through a remote fix by leveraging these security systems, which is what we’re going to implement today.”
The build that was meant to go out was tested for months on regular vehicles, but a single human copy/paste error sent the wrong build out. That process is also being revised so that more checks are run on a build before it is released to the wider customer base.
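As an illustration of the kind of automated check that could catch a copy/paste mistake like this, here is a minimal sketch of a pre-release gate. Again, this is an assumption on my part and not something Rivian described: the `ReleaseCandidate` fields, the fingerprint comparison and the `release_blockers` function are hypothetical. The idea is simply that the artifact about to ship must be byte-for-byte the build that was tested, signed with the expected certificate, and that the certificate must be valid at release time.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass
class ReleaseCandidate:
    image: bytes               # the update image about to be published
    cert_fingerprint: str      # identifier of the certificate that signed it (hypothetical field)
    cert_not_before: datetime  # start of the certificate's validity window
    cert_not_after: datetime   # end of the certificate's validity window


def release_blockers(
    candidate: ReleaseCandidate,
    tested_sha256: str,
    expected_cert_fingerprint: str,
    now: datetime,
) -> List[str]:
    """Return the reasons this candidate must not ship; an empty list means go."""
    problems: List[str] = []
    # 1. The artifact must be exactly the build that went through testing.
    if hashlib.sha256(candidate.image).hexdigest() != tested_sha256:
        problems.append("image does not match the tested build")
    # 2. It must be signed with the production certificate, not a developer one.
    if candidate.cert_fingerprint != expected_cert_fingerprint:
        problems.append("unexpected signing certificate")
    # 3. The certificate must be valid at the moment of release.
    if not (candidate.cert_not_before <= now <= candidate.cert_not_after):
        problems.append("certificate not valid at release time")
    return problems
```

A release pipeline would refuse to publish whenever the returned list is non-empty, taking the final go/no-go decision out of a single person’s hands.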
Affected owners, again around 3% of the fleet according to Rivian, should see the update in their phone app and can start the process from there. The few who don’t use the app with their Rivian need to call Rivian’s service line to initiate the update.
Some beta testers have already successfully installed the update, according to Twitter user @rivansoftware.
All of the above is what I would like to hear as a Rivian owner, but as a reporter I would also have liked the communication from Rivian to be more official. The original Reddit post was timely and better than nothing, but it also took some work to verify that the user was indeed Bensaid. It was over 10 hours before the PR team was even able to acknowledge that there was a problem, and only after we showed them the Reddit post. I think the Rivian team can do better here.