CrowdStrike Crisis Leaves Lessons for Healthcare
By Greg Freeman
Executive Summary
The effects of the CrowdStrike outage on the healthcare industry yield important lessons. Organizations should prepare for technical outages from various origins.
- The outage originated with a faulty update to CrowdStrike's Falcon Sensor product, which caused some Microsoft machines to crash.
- Prepare to revert to manual processes.
- Redundancy and backup systems are crucial.
The CrowdStrike debacle affected many health systems and hospitals, shutting down critical systems and forcing many to delay or cancel procedures. The experience holds lessons for healthcare organizations in how to avoid such a crisis in the future and how best to respond if it occurs. The cybersecurity firm attempted to update its Falcon Sensor product, but a bug caused some Microsoft machines to crash, displaying the “blue screen of death.” The fix required manual repairs of individual machines, slowing the eventual recovery.
The incident shows that hospitals and health systems should expect chaos and disruption rather than planning for specific incidents, says Michael Mainiero, chief digital and information officer with Catholic Health in Long Island, NY.
“Instead of trying to predict or prepare for events such as hurricanes, ransomware, or power outages, we need to assume that, at some point, a significant part of our ecosystem will face a shutdown,” he says. “The real question is how prepared we are from a resilience perspective — both in terms of infrastructure and workforce — to quickly detect, communicate, and muster the team, then triage and mobilize to overcome these challenges.”
This incident also underscores the importance of evaluating the engineering culture at partner companies, Mainiero says. CrowdStrike is a reputable company with strong engineering and cultural values, which shows that even the best can encounter issues, he says.
“It’s crucial to acknowledge that no vendor can guarantee 100% infallibility,” he says. “While we can and should work with vendors to improve their processes, our primary focus must be on ensuring our own ability to respond rapidly and effectively to any disruption.”
Ensuring clinical operations are safe requires comprehensive planning and effective communication. It is important to identify potential problem areas and have quick remediation plans for clinical floors, Mainiero says.
“During the CrowdStrike incident, our facilities and clinical staff provided tremendous support, prioritizing critical areas such as surgical units, ICUs [intensive care units], and emergency departments. Continuous dialogue with clinical leadership and regular drills are vital,” he says. “You need to be proficient in downtime procedures and able to swiftly mobilize cross-functional teams to address and mitigate issues.”
To protect patient safety, Mainiero says healthcare organizations should focus on these goals:
- Comprehensive resiliency planning: Develop plans that assume significant disruptions will occur, focusing on maintaining operations and patient care.
- Robust incident response: Establish and regularly update an incident response plan that includes predefined roles and flexibility for ad hoc tasks. Implement monitoring and alerting systems that can mobilize information technology (IT), clinical, and operational teams 24/7.
- Infrastructure redundancy: Invest in redundant systems and failsafes, including a secondary datacenter and, ideally, cloud solutions for critical applications, such as electronic health records. Follow the 3-2-1 backup rule (three copies of data, on two different media, with one off-site) to ensure data integrity.
- Regular drills and training: Conduct frequent disaster preparedness drills and annual cutover exercises for disaster recovery, learning from these exercises to improve.
- Strong vendor partnerships: Maintain close relationships with technology vendors to ensure rapid support and issue resolution. You need to be able to pick up the phone and get support 24/7.
- Continuous improvement: Learn from each incident to refine and enhance resiliency plans. Pay it forward and make changes.
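The 3-2-1 backup rule mentioned above lends itself to a simple automated check. The following is a minimal sketch in Python, not a description of any organization's actual tooling: the `BackupCopy` class, the `satisfies_3_2_1` function, and the sample inventory are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a data set (hypothetical inventory record)."""
    name: str      # illustrative label for where the copy lives
    media: str     # storage medium, e.g. "disk", "tape", "cloud"
    offsite: bool  # True if the copy is stored away from the primary site


def satisfies_3_2_1(copies):
    """Check the 3-2-1 rule: >= 3 copies, >= 2 media types, >= 1 off-site."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media for c in copies}) >= 2
    has_offsite = any(c.offsite for c in copies)
    return enough_copies and enough_media and has_offsite


# Assumed example inventory for an electronic health record data set.
ehr_copies = [
    BackupCopy("primary datacenter", "disk", False),
    BackupCopy("secondary datacenter", "disk", False),
    BackupCopy("cloud archive", "cloud", True),
]

print(satisfies_3_2_1(ehr_copies))      # all three conditions are met
print(satisfies_3_2_1(ehr_copies[:2]))  # two copies, one medium, none off-site
```

A check like this could run as part of the regular drills described above, flagging any critical data set whose copies no longer meet the rule.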
Resorting to Manual Processes
Many healthcare organizations had to resort to makeshift manual processes where possible during the CrowdStrike outage, notes Kate Needham-Bennett, senior director for resilience innovation with Fusion Risk Management in Chicago. Others were forced to halt services altogether. Without access to critical systems, computers, and applications, the outage had a significant impact on healthcare organizations and services, forcing hospitals to delay and reschedule operations and treatments, as well as impacting imaging and other critical health services and emergency response teams, including multiple states where 911 systems went down, she says.
“The outage illustrated just how dependent many organizations, including healthcare organizations, are on a sole software provider, and how a disruption like this can leave them unprepared to navigate work and services without access to technology from a third-party vendor,” she says. “In order to be prepared for a large-scale disruption like this in the future, it is critical that healthcare organizations have a complete understanding of the complex, downstream impact a potential third-party disruption may have on their important business services.”
The CrowdStrike failure highlighted the need across the healthcare industry for more exercising and testing against severe but plausible scenarios, Needham-Bennett says. Historically, firms have been reluctant to test against scenarios they deemed to be outside of their control, including global IT outages at third parties that underpin all services, such as Microsoft, Amazon Web Services, and Citrix, or they simply declared them implausible or unlikely, she says.
“The healthcare industry needs to have contingency plans in place to deal with the impact of disruptions like this, even if they have no control over the root cause. There is an inclination to simply sit back and wait for IT to get the systems back up and running, but organizations will still have to deal with the operational, financial, and reputational fallouts,” Needham-Bennett says. “They must make it a priority to scrutinize their supply chains and understand which services are reliant on IT systems and third parties that business users may be unaware of, and run scenario testing regularly, at all levels, and against a wide range of scenarios — even the scary ones.”
Need for Robust Cybersecurity
Recent incidents revealing vulnerabilities in systems worldwide highlight the importance of robust cybersecurity, including in healthcare, says Jessica Rengstorf, director of U.S. healthcare strategy at Endava, a healthcare technology company in New York City. As technology becomes more integrated into patient care, safeguarding these systems is crucial because cyberattacks can directly impact patient outcomes, she says. When systems fail, patients may experience delays in treatment, compromised medication delivery, and other critical disruptions, she notes. These events show that cybersecurity isn’t just an IT issue but a direct patient safety concern, emphasizing the need for healthcare organizations to prioritize and invest in effective security measures, Rengstorf says.
Mitigating risks to patient safety requires a comprehensive approach to cybersecurity, she says. Implementing a layered security strategy, including firewalls, encryption, and intrusion detection systems, can help protect sensitive patient data and critical infrastructure, she says. Regular software updates and patches are essential for addressing vulnerabilities promptly. Training staff to recognize and respond to potential threats can significantly reduce risks.
Collaborating with external experts and leveraging advanced threat detection technologies also can improve protection, Rengstorf says. Preparedness for software failures is crucial, and implementing redundancy and backup systems ensures that critical operations continue uninterrupted, she says.
To further enhance patient safety, healthcare organizations should consider implementing redundancy and backup systems to ensure continued operations in case of disruptions, she suggests. Investing in advanced technologies like analytics and artificial intelligence can help identify potential threats and vulnerabilities.
“Strong partnerships with technology providers can also provide valuable support and expertise. And, of course, adherence to regulatory standards like HIPAA is fundamental. Ultimately, a coordinated approach involving IT, clinical staff, and other departments is essential for effective cybersecurity management,” Rengstorf says. “Finally, a thorough crisis management plan, practiced regularly and communicated effectively, is essential for preparedness in any situation.”
Many would like to consider the CrowdStrike issue a cyber event, but it was not, says Jeffrey Kaiserman, cloud and security director at Slalom, a business and technology consulting company in Seattle. A technology outage can occur for many reasons, and in this case, the outage was due to a bad software update, according to CrowdStrike.
“‘Some things fail at the same time’ is my corollary to Werner Vogels’ ‘Everything fails all the time’ comment. When some things fail at the same time, organizations can end up in a position of having to function without key technologies for minutes, hours, or even days,” he says. “So, what can we learn from this? Organizations need more than just business continuity plans, high-availability architectures, and disaster recovery environments. Organizations must get to brass tacks and understand their business process and conduct a business impact analysis.”
From this analysis, organizations can determine which business processes are critical and then can begin to develop alternative ways of delivering those processes, Kaiserman says. The alternative ways must consider a total loss of primary technology.
“This is by no means a quick fix to protecting patient safety, but it does provide a pragmatic approach to handling different types of technology outages,” he says.
SOURCES
- Jeffrey Kaiserman, Cloud and Security Director, Slalom, Seattle. Telephone: (206) 438-5700.
- Michael Mainiero, Chief Digital & Information Officer, Catholic Health, Long Island, NY. Telephone: (631) 465-6000.
- Kate Needham-Bennett, Senior Director, Resilience Innovation, Fusion Risk Management, Chicago. Telephone: (847) 632-1002.
- Jessica Rengstorf, Director of U.S. Healthcare Strategy, Endava, New York City. Telephone: (212) 920-7240.