News Major tech outage gripping the world


Exterous

Super Moderator
Jun 20, 2006
20,461
3,582
126
Yesterday was a fun day. This might have been my favorite scenario:
Lots of affected PCs
They all have bitlocker turned on
Bitlocker self service portal and admin server on VMWare is down because Crowdstrike
VMWare environment requires daily rotating admin credentials to log into (They needed access to mount the affected drive on a good server and delete the file)
Can't get the rotated passwords because the on-prem password manager prod server is down because Crowdstrike
Can't restore the on-prem password manager because the backup server also has Crowdstrike on it
Can't use the password HA server in the cloud because Crowdstrike
Fortunately the cloud leverages that vendor's nightly backup functionality, which was intentionally scheduled to occur right after password rotation.

On the positive side for them, the password manager solution was intentionally designed with multiple service/platform failures in mind. Not with this specific scenario in mind, but because it's the most critical of their IT assets.

So once they got the HA copy up the VMWare admins could finally log in and fix the Bitlocker server. Then the desktop support techs could finally get to work (Yes there were workarounds but with the scale of the issue it was tough to make a dent without the Bitlocker server)
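For anyone who wants to reason about a dependency chain like this one, here is a rough sketch in Python. It is only a toy model of the story above: the asset names and the dictionary layout are my own assumptions, not anything from their environment. Each asset lists alternative sets of prerequisites, and the loop keeps recovering whatever has at least one fully satisfied alternative.

```python
def recovery_order(options, available):
    """Return assets in the order they become recoverable.

    options: asset -> list of alternative prerequisite lists (hypothetical model).
    available: assets that are usable from the start.
    """
    order = list(available)
    done = set(available)
    progress = True
    while progress:
        progress = False
        for asset, alternatives in options.items():
            if asset in done:
                continue
            # Recoverable if every prerequisite of at least one alternative is up.
            if any(set(alt) <= done for alt in alternatives):
                done.add(asset)
                order.append(asset)
                progress = True
    return order


# Toy model of the outage described above (names are illustrative only).
OPTIONS = {
    "password HA copy": [["cloud nightly backup"]],
    "on-prem password manager": [["backup server"]],
    "rotated admin passwords": [["on-prem password manager"], ["password HA copy"]],
    "VMware admin login": [["rotated admin passwords"]],
    "BitLocker server": [["VMware admin login"]],
    "affected PCs": [["BitLocker server"]],
}

if __name__ == "__main__":
    for step in recovery_order(OPTIONS, ["cloud nightly backup"]):
        print(step)
```

In this model the on-prem password manager never comes back, because its only prerequisite (the backup server) is not reachable from the starting set. The chain through the cloud backup is what unblocks everything else.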
For the IT uninitiated what does Crowdstrike do?
I don't think the anti-virus description of these types of tools is the best analogy. It brings up memories of Norton or McAfee, and there are a couple of big differences. This particular product of theirs is an "EDR", or Endpoint Detection and Response. It doesn't scan the computer (and take up tons of resources to do so) but monitors and blocks malicious activity (or what looks like malicious activity). So it's way more efficient and doesn't bog down the computer. Another big difference IT Security people need to be aware of is that most of its settings will leave the offending file in place, and it doesn't go after PUPs or PUAs. So I've seen places pair it with something like Malwarebytes - especially if they have a large Mac fleet, as having a Mac fleet without giving everyone admin rights is unnecessarily hard.

TBH, aside from today, their sensor is pretty dang good. A lot of the installs I've seen have been in organizations with a broad range of scientific research efforts, and it's played well with 99.999% of rando compiled software and niche applications for research without issue. HPC? Fine. GIS? Fine. fMRI? Fine. Completely custom cluster and software? Fine.
 
Reactions: Paratus and RnR_au

MrSquished

Lifer
Jan 14, 2013
22,888
20,970
136
Hopefully this is a sign that the simulation is about to reboot, because it's getting scary
 

Exterous

Super Moderator
Jun 20, 2006
20,461
3,582
126
It was a bad configuration or line of code, yes. The resolution is to delete a specific update file that CS pushed, at which point the system can boot without checking whatever that config does and bluescreening the system. The lack of testing protocol is ... Concerning. I don't know what exactly they're doing at CS but given that this affected literally every Windows system without remorse, I have a hard time believing they tested a damned thing.
Yeah this would have been caught in testing. "Hmmm - every PC we test this on breaks. We should probably take another look at what we did."

I don't know their processes but this feels like someone pushed it to the wrong repo, accidentally selected prod instead of test. That kind of thing
 

outriding

Diamond Member
Feb 20, 2002
3,330
2,505
136
On the news it said a bad line of code. Don't they test this stuff before turning it loose? I always tested the updates I wrote, to my satisfaction. You try to break it, etc. Somebody's in hot water if not already fired.

Not a coder here…


But it could have been a glitch in the code checkout system (or whatever it is called) and they grabbed the wrong version, or someone else saved a version when they did not.

My guess is it's a bad save and a lack of communication and/or controls
 

MrSquished

Lifer
Jan 14, 2013
22,888
20,970
136
How many CrowdStrike employees are now updating their resumes on LinkedIn but also hiding their involvement anywhere near this fiasco?
 

Lanyap

Elite Member
Dec 23, 2000
8,176
2,215
136
I feel bad for you support folks in IT. I would be in the thick of things if I wasn't retired.
According to one of my tech support newsletters it was a bad antivirus definition file.
This will be a major lesson learned for everybody around the world.

Bad antivirus definition triggers shutdowns

By Susan Bradley

It was a really bad day for IT admins.

Late Thursday night, the security protection company CrowdStrike sent a bad antivirus definition file to its entire customer base. Because this faulty data file inserts itself into the Windows kernel, Windows does what it was designed to do — it goes directly to the blue screen of death (BSOD).

Most of us can rest easy. CrowdStrike is not a product for the consumer or for a very small business. It’s an enterprise product, and thus its impact was widely seen in very large companies, triggering service interruptions for airlines, banks, healthcare providers — worldwide.

What is it?
  • It is a faulty antivirus definition file released from a third-party vendor that triggered BSODs on Windows computers.
  • It is impacting all customers with CrowdStrike software installed.
  • It is not a Windows bug. Microsoft did not trigger this problem.
  • It’s hard to see how this could be caused by anything other than a complete lack of testing.
  • It does not affect Linux or macOS.
Because this is not a Microsoft or Windows problem, I am not changing the MS-DEFCON level.

How widespread was this?

Figure 1. A checkout station at Home Depot in Recovery mode. Source: @ErrataRob. Used with permission.

If it’s at your local Home Depot, it’s safe to assume the problem is extremely widespread, much more so than the mainstream reporting that focused on airlines and banks.

Consumers
I am not aware of any disruption to anyone using consumer antivirus software, including Microsoft Defender. That personal device you’re using at this very minute is safe. But consumers are affected because of the systems they use every day — online banking, airline websites, and much more. Flight disruptions like the one on Thursday have not been seen since 9/11.

Businesses
If you are a business using CrowdStrike, you already know you’re in for a bad weekend. You’ll be dealing with BSODs on many remote computers including workstations, point-of-sale systems, PCs for remote workers, and many other remote devices. These are systems that your IT team can’t get to easily, especially because they are dealing with an endless BSOD and thus can’t enable remote access. For on-premises systems, you’ll simply need time to touch each of them to eliminate the problem.

Give your support teams as many extra resources as you can. Look for outside help if needed.

If you are a remote worker and have been affected, contact your IT help desk for guidance. I know of one instance in which a firm has already overnighted replacement laptops because that’s faster and easier than sending a support tech to each remote site.

Based upon information provided by CrowdStrike, these are the required remediation steps:

  • Start Windows in Safe Mode or the Windows Recovery Environment.
  • Navigate to the C:\Windows\System32\drivers\CrowdStrike directory.
  • Locate the file matching “C-00000291*.sys” and delete it.
  • Restart the device.
Note that recovery on some systems may require a BitLocker key.
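The delete step above can be sketched in Python as follows. This is only an illustration under assumptions: the function name and the dry-run flag are mine, and a machine stuck in Safe Mode or the Recovery Environment will usually not have Python available, so treat it as a way to document or test the logic (for example against a mounted drive) rather than as the vendor's procedure. The directory and file pattern are the ones quoted in the steps.

```python
from pathlib import Path

# Directory and pattern as quoted in the remediation steps above.
CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
CHANNEL_FILE_PATTERN = "C-00000291*.sys"


def remove_channel_files(directory=CROWDSTRIKE_DIR, dry_run=True):
    """List (and, unless dry_run, delete) files matching the faulty pattern.

    Returns the names of the matching files. The function name and the
    dry_run flag are hypothetical and exist only for this sketch.
    """
    directory = Path(directory)
    matches = sorted(directory.glob(CHANNEL_FILE_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return [path.name for path in matches]


if __name__ == "__main__":
    # Dry run first: print what would be removed without touching anything.
    for name in remove_channel_files():
        print("would delete:", name)
```

Defaulting to a dry run is a deliberate choice here: with a wildcard pattern, it seems safer to see the match list before deleting anything from a drivers directory.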

There is currently a great group on Reddit providing breaking information.

Microsoft is providing resources for Microsoft 365 users. You’ll find them once you’ve logged in to your account as an administrator.

BitLocker
BitLocker once again rears its head as a potential obstacle to recovery. I’ve discussed this many times before, and the “key” is having a record of all the BitLocker recovery keys for all the systems you manage. However, X poster @LetheForgot posted the following:

What we did was use the advanced restart options to launch the command prompt, skip the bitlocker key ask which then brought us to drive X and ran bcdedit /set {default} safeboot minimal, which let us boot into safemode and delete the sys file causing the bsod. Don’t forget to re-enable normal booting afterwards by doing the same but running bcdedit /deletevalue {default} safeboot.
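Because the second command is easy to forget, here is a small sketch that keeps the pair together. The two bcdedit invocations are exactly the ones quoted above; the function name and the dry-run flag are assumptions for illustration. Python is not available in the recovery command prompt, so this is more of a runbook note than a tool, and it would need an elevated prompt on Windows to actually run.

```python
import subprocess

# The two commands quoted in the post above, split into argument lists.
SAFEBOOT_ON = ["bcdedit", "/set", "{default}", "safeboot", "minimal"]
SAFEBOOT_OFF = ["bcdedit", "/deletevalue", "{default}", "safeboot"]


def safeboot_command(enable, dry_run=True):
    """Return the bcdedit command for entering or leaving Safe Mode boot.

    With dry_run=False the command is also executed (Windows, elevated
    prompt). The function and its flag are hypothetical helpers.
    """
    command = SAFEBOOT_ON if enable else SAFEBOOT_OFF
    if not dry_run:
        subprocess.run(command, check=True)
    return command


if __name__ == "__main__":
    print(" ".join(safeboot_command(True)))   # before deleting the file
    print(" ".join(safeboot_command(False)))  # afterwards, restore normal boot
```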

Another poster followed up with:

Even in safe mode, crowdstrike folder access was denied. Used cacls to give more rights to user (bypassing admin) and deleted file.

I’m learning a lot about how to work around some of the BitLocker issues we’ve seen, even in the consumer space. I’ll be keeping an eye on this breaking story and providing updates in the following weeks.
 

kage69

Lifer
Jul 17, 2003
28,580
39,866
136
I really feel for people dealing with a massive number of things locked down with bit locker. Yikes. Makes me think about how I've run into people before who had it running and had no idea. They didn't hear about it being a default on new stuff. What a way to find out huh? Oof.

"It's asking for a recovery key. What the hell is that?"

*distant, sad trombone*
 
Reactions: [DHT]Osiris

quikah

Diamond Member
Apr 7, 2003
4,096
668
126
Seems like Crowdstrike has a serious issue with processes.

Broke Redhat in June https://access.redhat.com/solutions/7068083
Broke Debian & Rocky in April https://www.neowin.net/news/crowdstrike-broke-debian-and-rocky-linux-months-ago-but-no-one-noticed/

Be curious to see what their company has been doing for the past year. Outsourcing? RTO pushing their senior devs out? Layoffs? Something is seriously broken at that company. This screw up is going to cost them, the scale of it is staggering.
 
Reactions: RnR_au

[DHT]Osiris

Lifer
Dec 15, 2015
15,145
13,370
146
I really feel for people dealing with a massive number of things locked down with bit locker. Yikes. Makes me think about how I've run into people before who had it running and had no idea. They didn't hear about it being a default on new stuff. What a way to find out huh? Oof.

"It's asking for a recovery key. What the hell is that?"

*distant, sad trombone*
Yup, even escrowed twice you can run into political or administrative walls getting to the data. Self service portal on the blink? Oh get it from AD, oh that's locked down to the domain admin accounts and they're too busy dealing with the server infrastructure and VIP keys to deal with your thousand clients? Sad trombone indeed.
 

linkgoron

Platinum Member
Mar 9, 2005
2,392
962
136
Can't believe that their stock took such a small hit. I assume that they're going to be sued to oblivion.

Bugs happen, and I don't really blame the developer that wrote it, nor do I blame that the software isn't built in a way that tries to update before starting. The most glaring issue is that a "foundational" piece of software clearly has a totally broken release process. Clearly, there was no manual or automated testing for the actual real released product, nor any testing post-release. Another really glaring issue is that their release is not gradual at all, and is just 100% straight away. I don't see how such a core piece of software gets released in such a way.
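To make "gradual" concrete, here is a minimal sketch of a staged rollout in Python. It is not how CrowdStrike releases anything, as far as I know; the stage percentages, the crash-rate threshold, and all function names are assumptions. Each host hashes into a stable bucket from 0 to 99, an update only goes to hosts whose bucket is below the current percentage, and the rollout halts if too many updated hosts crash.

```python
import hashlib


def rollout_bucket(host_id):
    """Map a host ID to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def in_stage(host_id, percent):
    """True if this host should receive the update at the given percentage."""
    return rollout_bucket(host_id) < percent


def next_stage(current_percent, crashed, updated,
               stages=(1, 10, 50, 100), max_crash_rate=0.01):
    """Pick the next rollout percentage, or 0 to halt.

    The stage list and the 1% crash threshold are illustrative values.
    """
    if updated and crashed / updated > max_crash_rate:
        return 0  # too many updated hosts crashed: stop the rollout
    for stage in stages:
        if stage > current_percent:
            return stage
    return current_percent  # already at the final stage


if __name__ == "__main__":
    print(next_stage(1, crashed=0, updated=500))    # healthy: widen the rollout
    print(next_stage(1, crashed=400, updated=500))  # unhealthy: halt
```

Because the bucket is derived from the host ID, a host that was included at 1% stays included at 10% and beyond, so widening the rollout never flips machines back and forth.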
 
Reactions: pmv

FelixDeCat

Lifer
Aug 4, 2000
29,530
2,213
126
Can't believe that their stock took such a small hit. I assume that they're going to be sued to oblivion.

Bugs happen, and I don't really blame the developer that wrote it, nor do I blame that the software isn't built in a way that tries to update before starting. The most glaring issue is that a "foundational" piece of software clearly has a totally broken release process. Clearly, there was no manual or automated testing for the actual real released product, nor any testing post-release. Another really glaring issue is that their release is not gradual at all, and is just 100% straight away. I don't see how such a core piece of software gets released in such a way.

Imagine all the air travelers, hospital patients, banking customers, etc., who were all inconvenienced or worse. Those whose travel plans were scrapped, surgeries cancelled or otherwise suffered losses have a cause of action. I have had a surgery cancelled before, and it is absolutely no fun at all. In my case I prepped for surgery (no food, no water hours in advance and special scrub applied the day before) only to have it cancelled at the last second. It meant more suffering until it could be done.

I would be surprised not to see a class action come forward from a crowd of lawyers who are already holding conference calls, getting ready to strike.

Users of the "software" are mostly limited to refunds of what they paid; however, innocent victims did not sign up for this clause in a software use agreement and are free to present their claims in court.

Companies around the world are realizing it is dangerous to rely solely on one provider and will likely seek to diversify at a minimum.
 

linkgoron

Platinum Member
Mar 9, 2005
2,392
962
136
Companies around the world are realizing it is dangerous to rely solely on one provider and will likely seek to diversify at a minimum.
I don't think that this will happen. It's not like people are really diversifying operating systems, or phone OSs. I think that "convergence" is a human nature thing, and also having IT somehow diversify protection mechanisms internally seems like a can of worms.

People use what they know, and the CISOs or whatever needed to tick some checkbox in whatever compliance and just used CrowdStrike. Now they'll use some other solution from SentinelOne or some other competitor, or ask their friends from other companies what they're using, and use that.
 

outriding

Diamond Member
Feb 20, 2002
3,330
2,505
136
Imagine all the air travelers, hospital patients, banking customers, etc., who were all inconvenienced or worse. Those whose travel plans were scrapped, surgeries cancelled or otherwise suffered losses have a cause of action. I have had a surgery cancelled before, and it is absolutely no fun at all. In my case I prepped for surgery (no food, no water hours in advance and special scrub applied the day before) only to have it cancelled at the last second. It meant more suffering until it could be done.

I would be surprised not to see a class action come forward from a crowd of lawyers who are already holding conference calls, getting ready to strike.

Users of the "software" are mostly limited to refunds of what they paid; however, innocent victims did not sign up for this clause in a software use agreement and are free to present their claims in court.

Companies around the world are realizing it is dangerous to rely solely on one provider and will likely seek to diversify at a minimum.

The companies have to have controls put in place if the IT system goes down


There are many reasons why an IT system can be down… storm .. fiber being cut and etc

Any reasonably prepared company can handle such an event
 

sdifox

No Lifer
Sep 30, 2005
96,743
16,098
126
The companies have to have controls put in place if the IT system goes down


There are many reasons why an IT system can be down… storm .. fiber being cut and etc

Any reasonably prepared company can handle such an event
Good luck with the SaaS model everyone bought into.
 

[DHT]Osiris

Lifer
Dec 15, 2015
15,145
13,370
146
The companies have to have controls put in place if the IT system goes down


There are many reasons why an IT system can be down… storm .. fiber being cut and etc

Any reasonably prepared company can handle such an event
He's just shorting the stock and trying to spread FUD to get the price to drop.
 

DaaQ

Golden Member
Dec 8, 2018
1,433
1,034
136
We could use a good solid dump of a class X flare. Not like world ending but just enough to bring down the Internet and power for like a week. Let people be reminded why they should work together.
Yea solar maximum is anytime now. Generally in March and October we have solar issues in telecommunications/video services every year, not huge widespread issues, but some channels may experience some black screen or error correction issues.

But I know this specific instance is MS related.
 

FelixDeCat

Lifer
Aug 4, 2000
29,530
2,213
126
He's just shorting the stock and trying to spread FUD to get the price to drop.
Wrong. I am not shorting the stock, nor do I care what happens next in that regard. But since you brought it up, my guess is that the price will stabilize somewhat until the next shoe drops. It could be seen as a buying opportunity, but I sure would not touch this one as the P/E is priced to perfection.

I give you Boeing as an example of bad publicity related to "screw ups"... the first dips were bought, but the full financial consequences have not been realistically quantified at this time, so there is no way to tell until then. Those that buy now may be rewarded as the headlines change, but to say for sure this is "much ado about nothing" is a bit premature IMO.
 

Exterous

Super Moderator
Jun 20, 2006
20,461
3,582
126
Good luck with the SaaS model everyone bought into.
I had an argument with a senior director in charge of a project rolling out their password management solution. He wanted SaaS only with no on-prem option. They were going to store their admin passwords in this and had a large physical server footprint. He saw no risk in going full on SaaS only. Finally told him "you brought me in for my recommendation and advice. My recommendation and advice is you need at least an on-prem fallback option". He very begrudgingly decided to do that.

9 months later they had a major outage of both hardware and internet access. I wanted to send him a card that said "You're welcome."
 

sdifox

No Lifer
Sep 30, 2005
96,743
16,098
126
I had an argument with a senior director in charge of a project rolling out their password management solution. He wanted SaaS only with no on-prem option. They were going to store their admin passwords in this and had a large physical server footprint. He saw no risk in going full on SaaS only. Finally told him "you brought me in for my recommendation and advice. My recommendation and advice is you need at least an on-prem fallback option". He very begrudgingly decided to do that.

9 months later they had a major outage of both hardware and internet access. I wanted to send him a card that said "You're welcome."
Send it on a laminated Post-it note
 

[DHT]Osiris

Lifer
Dec 15, 2015
15,145
13,370
146
I had an argument with a senior director in charge of a project rolling out their password management solution. He wanted SaaS only with no on-prem option. They were going to store their admin passwords in this and had a large physical server footprint. He saw no risk in going full on SaaS only. Finally told him "you brought me in for my recommendation and advice. My recommendation and advice is you need at least an on-prem fallback option". He very begrudgingly decided to do that.

9 months later they had a major outage of both hardware and internet access. I wanted to send him a card that said "You're welcome."
Yup, most of our campus uses a SaaS solution for passwords, with no official on-prem solution. They've had more outages than I can remember. We use KeePass on a simple Windows server. Zero issues.
 