Quick action by MSU Denver’s ITS team saves campus from major outage

When a faulty software update from a web security company caused computer systems at companies and institutions around the world to crash last week, the Information Technology Services team at Metropolitan State University of Denver worked through the night to fix the problem by 6 a.m. Friday.

“This was one of those situations where the right people were in the right place at the right time and a few people were on their computers,” said Nick Pistentis, the university’s deputy chief information officer. “They saw some alerts and were able to rally the troops, and they did it pretty quickly. We were well into diagnosing and troubleshooting before we even saw any discussion online that this could have global implications.”

The faulty update from CrowdStrike, a leading cybersecurity company, affected computers running Windows software and caused significant disruptions across the Denver metropolitan area starting shortly before 11 p.m. Thursday. The Regional Transportation District’s light rail service was shut down, and the Colorado Division of Motor Vehicles and the Colorado Department of Revenue were also affected by outages.

In addition, flights from Denver International Airport and across the country were canceled, stranding thousands of travelers.

MSU Denver has been a CrowdStrike customer since 2018 and uses the platform as an advanced malware detection tool for Windows servers and critical IT workstations.

Cybersecurity firms frequently release updates so their software can detect and block new malware threats, Pistentis said. “Because malware evolves day by day, hour by hour, the vendor wants to provide the most up-to-date definition tools and detection mechanisms. If they see a new behavior in the wild, they want you to have it local so the tool can find it.”

The university operates about 5,000 computers, and about 40,000 people use its technology regularly, Pistentis said. “Ten percent of all of our own computing power runs on CrowdStrike. These are high-impact systems, so it has an outsized impact.”

When IT team members figured out the extent and cause of the problem in the early hours of Friday, they had to reboot the servers in safe mode, manually remove the faulty software and reboot the devices, mostly using a workaround released by Microsoft and CrowdStrike, he said. About 10% of the devices required additional troubleshooting.

Kevin Taylor, the university’s chief information officer and assistant vice president for ITS, said the ITS team trains for major outages like this one, and on-call managers are empowered to initiate a rapid response if a system crashes.

“We do simulations, we rehearse it and make sure everything is ready. But it’s also great to see that when an incident like this happens, that incident plan came into effect and basically worked according to a playbook that we had already predefined,” he said. “Our team is great, but it’s really great to see them in action in moments like this.”