214. Business Leaders' lessons from the global IT outage

business strategy, case study, tech trends | Jul 31, 2024

10,000 flight cancellations, disruptions in the UK National Health Service, and the banking sector cast into turmoil.

The CrowdStrike outage sent the entire world into disarray.

In this episode of the Tech For Non-Techies podcast, host Sophia Matveeva imparts crucial knowledge on how to prevent IT crises and emphasizes the importance of understanding tech for business leaders.

 

Timestamps

00:00:00 Introduction

00:02:21 Severity of the CrowdStrike outage

00:03:22 Overview of the CrowdStrike Incident

00:04:23 Implications of Software Malfunctions

00:06:30 Importance of IT Risk Management

00:08:34 Software Testing and Quality Assurance

00:11:27 Aligning Engineering and Business Objectives

00:13:23 Relationship-Building between Tech and Business Teams

 

To learn about the Post Office disaster, 

https://www.techfornontechies.co/blog/lessons-from-the-post-office-scandal

 

For the transcript, go to: https://www.techfornontechies.co/blog/business-leaders-lessons-from-the-global-it-outage

 

For more career & tech lessons, subscribe to Tech for Non-Techies.

Do you want to succeed in the Digital Age?

Check out the Digital Leadership Coaching Program

 

 

Transcript:

00:00:00

The CrowdStrike outage led to over 10,000 flight cancellations around the world, the UK National Health Service being offline and banks like JP Morgan not operating properly. Today, we are living in an age when an IT glitch affects everyone, not just the techies. And this is why as a business leader, you have to know what you can do to prevent it or at least to mitigate it. And that's what this episode is going to help you with.

 

00:00:31

Welcome to the Tech for Non-Techies podcast. I'm your host, tech entrepreneur, executive coach, and Chicago Booth MBA Sophia Matveeva. My aim here is to help you have a great career in the digital age. In a time when even your coffee shop has an app, you simply have to speak tech. On this podcast, I share core technology concepts, help you relate them to business outcomes and most importantly, share practical advice on what you can do to become a digital leader today. If you want to have a great career in the digital age, this podcast is for you. Hello, smart people. How are you today? I really hope that your summer vacation plans have not been messed up by the CrowdStrike disaster. I've been feeling so sorry for all of those people stuck at airports and I'm currently enjoying the summer season in London and the weather is glorious.

 

00:01:27

And I was actually supposed to take a flight to France at the time of the outage, but thankfully my trip was postponed and so I didn't have to howl in an airport. But I really do feel for those people. In today's lesson, I want to explain what happened in this whole CrowdStrike saga and share what lessons you as a business leader can learn from it. And so you could protect yourself as much as you possibly can. And this lesson is useful for anybody who has or will have any decision making capacity at work. So if you're a startup founder or if you're a corporate leader, you need this. Today I'm gonna share the basics of how software should be released properly. And don't worry, it's not gonna be technical and it will be really easy to understand and some things are actually gonna be pretty obvious.

 

00:02:21

And I bet that at some point you will have to sign a contract with a software provider. And as a decision maker, you do need to understand what they do. So basically you know what you're signing and that's why we're doing this episode. And if you have listened to a few Tech for Non-Techies episodes already and you found them valuable, but you haven't yet subscribed, then my darling smart person, this is the time. That time is now. In this podcast, I'm bringing you high quality free education that helps you to have a great career in the digital age. And the more you understand how tech and business work together, the more opportunities open up to you and frankly, the more money you'll make. Wouldn't that be nice? And honestly, I think that makes it worth subscribing to a free podcast. In today's episode, you are going to hear the five lessons that every intelligent business leader needs to learn from the CrowdStrike outage.

 

00:03:22

But before we get to the lessons, let's have a quick recap of what actually happened, 'cause you've probably seen it in the news, but here are some actual facts. CrowdStrike is a cybersecurity company which sells its services to other companies like Microsoft. So Microsoft is one of its customers. So CrowdStrike itself is a B2B, business to business, company that sells to Microsoft, which sells to consumers and to organizations. On July the 18th, 2024, CrowdStrike rolled out a software update which then impacted its customer, Microsoft. And so anybody who used Microsoft got impacted by it as well. The chief executive of CrowdStrike, George Kurtz, said that a fix had been deployed for a bug in an update, and that bug affected Microsoft Windows PCs and that's a really popular product. And many of these PCs crashed and some displayed what you might know as the blue screen of death.

 

00:04:23

You know, like when basically your PC is unusable. And happily, Apple Mac and Linux users were unaffected. I am a very happy Mac user, especially now. And as I'm sure you know, lots and lots of companies actually use Microsoft. So when Microsoft Windows PCs crashed, so did the systems of companies around the world and the effects have been absolutely enormous. So over 10,000 flights have been canceled globally. Oh my god. And Delta Airlines was the worst affected airline. And this is all happening during this summer vacation season. My god, this really sucks. The British National Health Service said that the majority of GP practices had experienced disruption and ambulance services reported increases in emergency calls from patients who basically couldn't contact the NHS otherwise. So that's really serious. That is the healthcare system of an entire country. And the financial system globally was also affected.

 

00:05:25

So for example, Vanguard, an asset manager with $7.7 trillion in assets under management, said that some of its webpages weren't loading and the London Stock Exchange couldn't display company information and company information is literally the lifeblood of trading. So basically things were bad, and I could go on, but you get my point. The outage was massive. It affected normal people and it affected huge rich businesses. Microsoft, to their credit, released a fix the day after the outage. And slowly the world seems to be getting back to normal. And you know there are still flights on the backlog. So what can we learn from this? Well, I mean, I suppose you could basically become a hermit, never use any digital technology and live in a cave. That's option one. You don't want to take that option? Then carry on listening to this episode. Lesson number one is that our IT systems are not foolproof and it sucks to say it, but sometimes mistakes in software will just happen and they will have a real world effect.

 

00:06:30

And that's why it's so important for business leaders to have a basic grasp of how software is made and managed. Because basically it could hold you hostage. Leadership teams in companies have to have a process for when tech goes wrong because at some point it will. And for you, just for you as my friend and as my listener, my biggest piece of advice is do not think that any technology is ever infallible. Remember the Post Office? Remember that disaster? The UK Post Office leadership team thought that their software was infallible and they basically called it Fort Knox, and that led to them wrongfully convicting almost a thousand people and driving one man to suicide. And then it turned out that they literally had an IT glitch in their system. So should they have learned something about technology even though they weren't techies? Yes, they should.

 

00:07:29

If you want to learn more about what happened at the Post Office, listen to episode 194, Lessons from the Post Office Scandal. That's an episode of this podcast. And I have also linked to it in the show notes. My point is no technology can ever be perfect because software has to be updated regularly. So that's like why the apps on your phone change without you doing anything. It's because there are new software updates. So even if your current software works wonderfully, it doesn't mean that it will continue working well because it will have to be updated, which is exactly what happened in this IT outage. Basically IT risk is a business risk today and you have to learn to evaluate it and to prepare for it. Lesson number two is before you release anything, test it properly. So this is obviously very relevant to people who are actually leading startups or who are in charge of a product team, but if that's not you, this is relevant because you are going to be buying software and you want to know how they test it.

 

00:08:34

So I don't know what happened inside of CrowdStrike, I expect we'll probably find out quite soon, but the fact that they released a massive update and it had an effect on lots and lots and lots of computers around the world shows that they did not test it properly. Testing is a super important part of releasing a new piece of software and you have to test the new software on different devices. So for example, if something works on your laptop, it might not work on mine because they're probably going to be slightly different versions. But testing can get cut if people are basically up against the deadline and are under pressure. And also I found that when there are bug fixes, then people can be a bit dismissive about testing for bug fixes. People, do not do this. If you are hiring a software provider, ask them about their testing and quality assurance process.

 

00:09:29

If you're a startup founder, have a proper process. And yes, I have actually had to institute this process at my previous company and it was boring because catching bugs is boring, but you know what's not boring? Being on the front pages as the CEO who basically brought down the world for a bit. So you don't want that. Anyway, if you are hiring a software provider, ask them about their testing and quality assurance process. And even if they give you a stock answer, which to be fair, they probably will, you'll look like you know enough to be dangerous and that's a good thing. Okay, lesson number three: before releasing on a big scale, release on a small scale. After software has been tested, you might be tempted to release it to everybody or the company you're working with might say, we've just got this new thing and we're releasing it to everybody.

 

00:10:26

This is stupid. If you're in a tiny startup and you've got like 20,000 users, then release it to everybody. Honestly, 20,000 sounds like a lot, but it's not. But listen, if part of the world's infrastructure depends on your software update, then do not release something immediately. Roll out new changes gradually. Perhaps you could roll out a change in one country, preferably not one that is very populated. And then see what happens. See what happens in the smaller population and if it works out well. And then if things go well, roll it out to the rest of the world. And you know who does this? Meta. Meta, the successful, big and, you know, sometimes scary company. For example, they wanted to see what would happen if the number of likes was hidden on a post. So they tried this feature in Australia first. They didn't roll it out to everybody, they started with Australia and then they decided not to release it in other places.
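A rough sketch of what "release on a small scale first" can look like in code. This is an illustration only, not how CrowdStrike or Meta actually do it: the stage percentages, the error threshold and the function names are all assumptions made up for this example.

```python
import hashlib

# Hypothetical rollout stages: the share of users who get the update at each step.
ROLLOUT_STAGES = [0.01, 0.10, 0.50, 1.00]

HALTED = -1  # hypothetical marker meaning "rollout stopped, nobody gets the update"


def bucket(user_id: str) -> float:
    """Map a user ID to a stable number in [0, 1), so the same user always lands in the same place."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest[:8], 16) / 2**32


def gets_update(user_id: str, stage: int) -> bool:
    """True if this user is inside the share of users covered by the given stage."""
    if stage == HALTED:
        return False
    return bucket(user_id) < ROLLOUT_STAGES[stage]


def next_stage(stage: int, error_rate: float, max_error_rate: float = 0.001) -> int:
    """Move to a bigger group only if the current group looks healthy; otherwise halt."""
    if error_rate > max_error_rate:
        return HALTED
    return min(stage + 1, len(ROLLOUT_STAGES) - 1)


if __name__ == "__main__":
    stage = 0
    print(gets_update("customer-42", stage))
    stage = next_stage(stage, error_rate=0.0)  # healthy, so widen the rollout
    print(stage)
```

The point for a non-technical buyer is the shape of the process: a small group first, a health check, and a way to stop before everybody is affected.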

 

00:11:27

Good for them. This is the way to do it. So again, if you are buying new software, you could say, has it been tested and has it been used by somebody else? What is your rollout process for new software? Lesson number four, engineers should know all of these things that I've taught you here because they are basic, but there is often a disconnect between the engineers and the business side. And sadly, that does actually happen quite often and engineers can sometimes be pushed to release more quickly than they think is safe. Honestly, this is quite a difficult problem to solve because you have to give engineers deadlines. Engineers, they like making stuff, they like making product, but they don't actually like market feedback. Like so they don't actually like releasing their products, they just want to carry on perfecting something. So if you don't give engineers a deadline, they're just going to carry on fiddling with something and you're never going to release anything.

 

00:12:26

And by the way, if you want to see what happens when engineers have no deadline, watch a documentary called General Magic on iTunes. The bottom line is if the business side and the tech side do not have a positive relationship and open communication, then bad things happen because there's going to be distrust and there's going to be misunderstanding. And you know, these bad things include national health services and airlines not working. So they're serious bad things, not just people not sitting next to each other at a lunch table. This is why the business side has to learn to speak tech. And equally the tech side needs to explain clearly what the risks are. So just ask yourself today, am I in an organization where the two sides understand each other? Do we respect each other? Do we get on? If not, then start building human relationships because if the relationship between the tech side and the business side is not good,

 

00:13:23

then basically, how are you going to solve problems together and trust each other? So start working on this human relationship. Maybe ask somebody out to lunch. See, I told you this wasn't gonna be difficult or technical. Literally, I'm telling you to take people out to lunch and make friends. It's not that hard. You can totally do it. And finally, lesson number five, do not release new software on a Friday. This is so basic, but it's true. The CrowdStrike release was on a Friday. And look what happened. Do you think the CrowdStrike or Microsoft teams had a particularly nice weekend then? Did they get much sleep? Did they see their families? No, they did not. It was a bloody nightmare. And this is going back to my first point. Mistakes happen, expect them to happen, make room to correct them, make time to correct them. Releasing something new on a Friday to a global group of users is literally saying, my work is perfect and there will be no problems.

 

00:14:24

It is the definition of hubris. So if you are in a position to choose a big IT provider, ask them what I just taught you. What is their protocol for when things go wrong? How do they roll out new software updates? And you could even say, I don't want the CrowdStrike disaster or the Post Office disaster to happen here. How are you going to mitigate that? Now that you know some basics of what to look for, you can see what's a sensible answer and what isn't. And obviously if you are making a big purchasing decision, yes, you probably want to have your CIO or CTO or some sort of technical expert with you. But the key is with you, not instead of you. We're living in a time when software underpins so much of what we do, and you do not have to be the person in charge of making the software to understand the basics of how it works.
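The "no releases on a Friday" rule from lesson five is simple enough that a team could write it down as an automatic check. The sketch below is one hypothetical way to do that. The blocked days and the function name are assumptions for illustration, not any provider's real policy.

```python
from datetime import date

# Hypothetical policy: days on which broad releases are blocked.
# Python numbers weekdays from Monday = 0 to Sunday = 6.
BLOCKED_WEEKDAYS = {4, 5, 6}  # Friday, Saturday, Sunday


def release_allowed(day: date) -> bool:
    """True if a broad release may go out on this day under the policy above."""
    return day.weekday() not in BLOCKED_WEEKDAYS


if __name__ == "__main__":
    print(release_allowed(date(2024, 8, 2)))  # a Friday
    print(release_allowed(date(2024, 8, 6)))  # a Tuesday
```

When you ask a provider about their rollout process, a rule like this, along with who is allowed to override it, is the kind of concrete answer to listen for.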

 

00:15:17

And with this lesson today, you've taken a step further to being somebody who can really lead with intelligence in the digital age, because that means understanding how software is made and understanding how to collaborate with your tech colleagues, and most importantly, understanding how to mitigate tech risk. And this is what we've done here. And honestly, well done. I salute you for listening to this because you could have been listening to Taylor Swift. I mean, she's fabulous, but instead you chose to educate yourself. So good on you. You can now go listen to Taylor Swift and if you found this episode insightful, then please leave this show a rating and a review. Honestly, it really, really does help me so much and it helps me reach more smart people like you. So basically more free education to clever people. That's a great thing. And on that note, thank you very much for listening and have a wonderful day. Ciao.
