InkBridge Networks - A new name for Network RADIUS

Big Tech Concentration Made CrowdStrike Update a Catastrophe

Alan DeKok, CEO, InkBridge Networks


As we dissect the CrowdStrike outage, we’ll find the human error was multiplied by the concentration in Big Tech, says network security expert Alan DeKok of InkBridge Networks. 


There will be intense regulatory scrutiny and root cause analysis of the CrowdStrike outage of July 19 that caused chaos as it shut down much of the world’s digital infrastructure. The cascade of events was morbidly fascinating. 

 

As the maintainer of an open-source software product, I see an update gone wrong as a nightmare scenario. I hold sacred the responsibility to provide a safe, reliable, secure product. All of us supporting FreeRADIUS take that obligation seriously.

 

Somewhere along the line at CrowdStrike, someone didn’t value safety enough. 

 

A few days after CrowdStrike’s crippling blow, the estimated tally was 8.5 million machines affected. FreeRADIUS and InkBridge’s RADIUS software packages are not deployed at that scale, but we still have millions of users relying on our authentication software for network security. Each update must be thoroughly vetted. It’s a business necessity and a moral obligation. 

 

Several factors moved the CrowdStrike outage from a failed update to a global catastrophe. First, an error in an update made it through CrowdStrike’s testing and auditing procedures. Then, the effect of that error was compounded by distributing the update simultaneously to many users. The third complicating factor in this scenario is the market concentration in Big Tech.  

 

There’s little I can say about the error. It’s an occupational hazard that I, and my company, work very hard to avoid. After the XZ Utils attack earlier this year, I explained our security practices for updates to the FreeRADIUS open-source project. We take product safety seriously. 

 

Worldwide distribution leads to worldwide failure  

 

In my corner of the IT world, it’s not a good practice to push an update out to all users or machines at the same time. The recommended approach is to update a subset of users and check for problems. If all is well, proceed – with caution – to the next set of users. 
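
The staged approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of how CrowdStrike or FreeRADIUS actually ships updates: the function name `staged_rollout`, the stage percentages, and the `apply_update` health-check callback are all hypothetical.

```python
from typing import Callable, Sequence


def staged_rollout(
    machines: Sequence[str],
    apply_update: Callable[[str], bool],
    stage_percents: Sequence[int] = (1, 10, 50, 100),  # hypothetical stages
    max_failure_rate: float = 0.0,
) -> dict:
    """Update machines in growing stages and halt at the first unhealthy stage.

    `apply_update` is a hypothetical callback that updates one machine and
    returns True if the machine is still healthy afterwards.
    """
    done = 0          # number of machines touched so far
    failed = []       # machines that reported a problem
    total = len(machines)
    for percent in stage_percents:
        # Cumulative target for this stage, rounded up with integer math.
        target = min(total, (total * percent + 99) // 100)
        batch = machines[done:target]
        if not batch:
            continue
        batch_failed = [m for m in batch if not apply_update(m)]
        failed.extend(batch_failed)
        done = target
        # Check for problems before moving on to the next set of users.
        if len(batch_failed) / len(batch) > max_failure_rate:
            return {"halted": True, "touched": done,
                    "failed": failed, "untouched": total - done}
    return {"halted": False, "touched": done,
            "failed": failed, "untouched": total - done}


# Usage: a fleet of 1,000 hypothetical hosts and an update that breaks
# every machine it reaches.
fleet = [f"host-{i}" for i in range(1000)]
bad = staged_rollout(fleet, lambda machine: False)
good = staged_rollout(fleet, lambda machine: True)
```

With these assumed stages, the faulty update stops after the first 1 percent of the fleet, so 10 machines are affected and 990 are never touched. Pushing the same update to everyone at once would have broken all 1,000.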

 

Because CrowdStrike serves the cybersecurity field, its updates are frequent (and apparently not parceled out in small doses). In the company’s own words, “Sensor configuration updates are an ongoing part of the protection mechanisms of the Falcon platform. This configuration update triggered a logic error resulting in a system crash and blue screen (BSOD) on impacted systems.” Maybe they’ll learn from this?

 

This disastrous event draws attention to a bigger worry in the tech world: concentration. Placing most of the global digital infrastructure in the hands of a few companies is problematic. 

 

With the CrowdStrike outage, we saw cascading failures because cloud services, and the software that forms the backbone of the Internet, are concentrated with a few suppliers.

 

I liken the cloud services supply landscape to a monoculture. Because of that, it’s ripe for an epidemic. There’s no diversity to deter or slow an attack. Whether the epidemic is an attacker’s virus or a supplier’s misconfiguration doesn’t really matter.

 

Regulators are now recognizing this threat. Just a few days before the CrowdStrike fiasco, the U.S. Department of the Treasury published a suite of resources to share with financial services institutions on effective practices for secure adoption of cloud services. In the press release, Consumer Financial Protection Bureau Director Rohit Chopra said: “Our financial system is essential infrastructure for the entire economy, and it is deeply reliant on a handful of powerful Big Tech cloud service providers.”  

 

The sensor configuration update that caused the system crash was remediated on July 19, 2024, at 05:27 UTC. The bad code was out in the wild for 78 minutes. The repercussions of that error will be felt in the tech sector for many years.