Rethinking cybersecurity: Preventing the next massive IT outage

“The CrowdStrike outage highlights the risk of lack of resilience and the danger posed by over-reliance on single sources of technology and software, and it has become a blueprint for nation-state adversaries to recalibrate tactics, techniques, and procedures and cause devastating consequences,” writes Morgan Wright, Chief Security Advisor of SentinelOne.

Morgan Wright

12:25, 28.07.24

One of the questions most frequently asked by anyone involved in cybersecurity and national security is what an actual cyber attack would look like and what its effects would be. This question was answered when a faulty software deployment caused the largest IT outage in history and impacted organizations and lives around the globe, from critical infrastructure to travel, healthcare, and beyond.
The outage wasn't a black swan event. A 2016 report by the Office of Cyber and Infrastructure Analysis of DHS was quite prescient and painted the picture of what could happen as a result of our digital dependencies. But it was worse than they predicted.
Morgan Wright. (Photo: SentinelOne)
The CrowdStrike outage highlights the risk of lack of resilience and the danger posed by over-reliance on single sources of technology and software, and it has become a blueprint for nation-state adversaries to recalibrate tactics, techniques, and procedures and cause devastating consequences. From a pure threat perspective, Maslow's Hierarchy of Needs highlights that power and water, resources that directly affect community safety and security, remain most at risk of significant attack.
To bring a nation to its knees, going after power and water yields the most significant effect. This targeting is part of IPB—Intelligence Preparation of the Battlefield. Vulnerabilities and targets are identified and continuously updated so that, in the event of significant hostilities or war, an arsenal of cyber weapons can be used against vulnerable critical infrastructure to degrade and impact our ability to defend and protect the homeland and prosecute a war.
An attack against America and our critical infrastructure would unfold as we saw with the CrowdStrike outage: cascading failures that would trigger more interdependent system failures. The failures would overwhelm the ability to respond immediately, and the lack of response to vital systems would trigger additional failures.
Essential services would be offline (e.g., 9-1-1 or critical healthcare services). As with the CrowdStrike outage, the ability to recover would be directly affected by the cause of failure and prevent remote service restoration. When response teams have to take action physically, the length and breadth of the impact become magnified. Responders would not be able to address issues at scale. Instead, the most devastating impact would be a 1:1 response—one problem and one responder. The inability to scale mitigation and response would extend the event exponentially.
This was foreshadowed in a CISA advisory about Volt Typhoon and the critical infrastructure sectors in which the group already has a presence. Recent testimony by Christopher Wray, FBI Director, showed how Chinese government hackers were trying "...to find and prepare to destroy or degrade the civilian critical infrastructure that keeps us safe and prosperous."
We saw a microcosm of how citizens would respond during the Colonial Pipeline ransomware attack. There was a run on gasoline because of a 'perceived' shortage of energy, not an actual one. The unrest only took hours to swell because of social media and incorrect reports of what happened.
All of this raises the question: How did this latest outage happen? Could it have been prevented? Could it happen to any IT vendor? And how do we learn from it and move forward?
This outage happened because speed was prioritized over stability and safety, and quality assurance was inadequate. The result was the global release of a single file that brought a vast swath of the digital world to a screeching halt. If anyone wanted to know how dependent we are on technology and the risks that come with it, the outage brought those concerns front and center.
A misconception worthy of short-story fiction is circulating: that this could happen to any software vendor. The norm is that software design, feature, and product updates are fully tested, staged, and gradually rolled out globally, so that a single point of failure does not cause more widespread issues.
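For readers who want to see the idea of a staged, gradual rollout in concrete terms, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any vendor's actual release process: the function name `staged_rollout`, the stage fractions, the `deploy_and_check` health callback, and the 1% failure threshold are all hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Tuple


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Iterable[float],
    deploy_and_check: Callable[[str], bool],
    max_failure_rate: float = 0.01,
) -> Tuple[List[str], bool]:
    """Deploy to progressively larger cohorts and halt if a stage looks unhealthy.

    stage_fractions are cumulative shares of the fleet, e.g. [0.01, 0.10, 1.0].
    deploy_and_check is a hypothetical callback that updates one host and
    returns True if the host is still healthy afterwards.
    Returns (hosts that received the update, whether the rollout completed).
    """
    updated: List[str] = []
    start = 0
    for fraction in stage_fractions:
        # Cumulative end index for this stage; always advance by at least one host.
        end = min(len(hosts), max(start + 1, round(len(hosts) * fraction)))
        cohort = hosts[start:end]
        if not cohort:
            continue
        failures = sum(0 if deploy_and_check(host) else 1 for host in cohort)
        updated.extend(cohort)
        if failures / len(cohort) > max_failure_rate:
            # Stop here: the remaining hosts never receive the faulty update.
            return updated, False
        start = end
    return updated, start >= len(hosts)


# A faulty update (every host fails its health check) is contained in stage one.
hosts = [f"host-{i}" for i in range(1000)]
updated, completed = staged_rollout(hosts, [0.01, 0.10, 1.0], lambda host: False)
print(len(updated), completed)  # 10 False
```

In this toy fleet of 1,000 hosts, the bad update reaches 10 machines before the rollout halts, rather than all 1,000 at once.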
The full effects of this outage have yet to be quantified. But you don't need a statistics degree to know it's bad when 9-1-1 goes down, critical operations in hospitals are severely affected, and air travel comes to the same kind of ground stop we had on 9/11.
This outage is a major opportunity to reset the conventional wisdom that single large-scale vendors are the prudent choice in all circumstances, and it offers a real-time case study for better preparation and vigilance to avoid such cascading failures in the future.
Morgan Wright is the Chief Security Advisor of SentinelOne.


Ynet News
