What Insurers Can Learn from the Microsoft Teams Outage and the Iowa Caucuses App Incident

As we move toward automation based on technology solutions, it seems like we run into these problems with software development and deployment on a regular basis, so it’s always good to view them as both a reminder and a warning.

(Photo credit: Minnesota Historical Society.)

This was a difficult week for technology. First, users of Microsoft Teams experienced difficulty and unusual error messages as they tried to log in on Monday. Then, Tuesday evening, the results of the Iowa caucuses were delayed by hours due to problems with a new reporting app. And it’s only Wednesday as I write this! As Douglas Adams, author of The Hitchhiker’s Guide to the Galaxy, once said, “We are stuck with technology when what we really want is just stuff that works.”

These incidents point out not only our reliance on technology, but also the need to have proper procedures in place to ensure that our technology is “stuff that works.”

Small Details Matter

The issue for Microsoft Teams was the expiration of an SSL certificate, which lets browsers verify a secure web server’s identity before setting up an encrypted connection. Every certificate carries an expiration date and must be renewed before that date, a step that also revalidates site ownership. Once a certificate expires, users have difficulty logging on and using encrypted connections, as about 20 million Microsoft Teams users discovered.

Renewing an SSL certificate isn’t hard: it just takes validation by a recognized certification authority (CA). In fact, Microsoft runs its own CA service. The real issue is that someone needs to pay attention and make sure the task is completed before certificates expire. In most organizations, this is an IT responsibility, mainly because most IT groups are responsible for network and web server maintenance. As Microsoft Teams users were reminded, technology relies on an organization taking responsibility for maintaining it. In these days of SaaS solutions, apps, and digital voice assistants, users expect “stuff that works,” but someone needs to make sure that it does.
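One way to make sure someone is “paying attention” is to let a scheduled script do the watching. The following is only a minimal sketch in Python, not a description of how Microsoft or any particular IT group monitors certificates. The function names and the 30-day warning threshold are assumptions chosen for illustration; the standard-library calls (ssl.create_default_context, getpeercert, ssl.cert_time_to_seconds) are real.

```python
import socket
import ssl
from datetime import datetime, timezone

WARN_DAYS = 30  # hypothetical threshold; choose one that fits your renewal process


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Return the server certificate's expiry string, e.g. 'Jun  1 12:00:00 2030 GMT'.

    With the default context, the handshake raises an ssl.SSLError if the
    certificate has already expired, which is itself a signal worth alerting on.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the certificate's expiry."""
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(not_after), tz=timezone.utc
    )
    return (expires - now).days


def needs_renewal(not_after: str, now: datetime, warn_days: int = WARN_DAYS) -> bool:
    """True when the certificate expires within `warn_days` (or already has)."""
    return days_until_expiry(not_after, now) <= warn_days
```

Run daily against each hostname in the IT group’s certificate inventory, passing the result of fetch_not_after and the current UTC time to needs_renewal, a check like this could raise a ticket or send an email weeks before anyone is locked out.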

Ignoring Best Practices

In Iowa, voters follow a process that dates back to the early days of statehood: the Iowa caucuses, where neighbors gather at a polling site and stand with other supporters of their candidate to be counted and assign delegates to the national convention. This year, to speed up and enhance the reporting process, a software development company called Shadow was hired to create an app. It was supposed to collect this data on a centralized server and then report it digitally, presumably faster than could be done via paper and phone calls. However, the app was developed in considerable secrecy, following the outdated and mistaken philosophy of “security by obscurity,” which led to a lack of testing and quality assurance review and ultimately to a failure of the software.

Adding to the issue, users had difficulty loading the app onto their phones: It had to be “side-loaded” from the company’s servers rather than installed using the App Store or Google Play Store, a process that can be confusing and unnecessarily difficult. News reports on email trails revealed that users had difficulty installing and using the program, many giving up well before the caucus meetings started. The night of the caucuses, the app failed. The backup plan? Use paper and phone calls. Fortunately, paper ballots and reports were still used and were eventually counted, with partial results made available (full results were still pending as of this writing).

As in the Microsoft incident, the Iowa app issue was largely due to not paying attention to detail and not following sound application development practices. The app wasn’t properly tested for quality assurance, and the design didn’t take usability into account, leaving some users to rely on backup plans that weren’t adequately resourced.

Making Technology Into Stuff That Works

A few quick lessons learned from these events:

  • Little details can lead to big problems: IT needs to maintain good practices and record keeping on configuration, certificates, and license agreements.
  • Shortcuts can lead to time-consuming fixes: Maintaining best practices around testing and quality assurance is critical to avoiding delays and frustration when applications go into production.
  • Be up front with backup plans: Taking time to think through backup plans that will handle the load in case of failure is important, but it is often overlooked as deadlines approach and work needs to be completed.

One of the results of the incidents for Microsoft and Shadow is that their reputations have taken a hit. Teams competitor Slack will get some attention, but Microsoft’s long reputation for delivering solid desktop applications will minimize any long-term effects. Shadow, however, will have a difficult time overcoming the reputation of having rushed an application to use in the critical area of elections. As we move toward automation based on technology solutions, it seems like we run into these problems with software development and deployment on a regular basis, so it’s always good to view them as both a reminder and a warning.


Tom Benton // Tom Benton is a principal in the Insurance practice at Novarica with expertise in IT strategy, business process reengineering, core systems implementation and project management, primarily for life and annuities. Prior to joining Novarica, he served as VP, Technology and Systems at Navy Mutual, where he led a multi-year strategy to transform the core systems, which included a rapid-deployment Policy Administration System implementation. Tom has broad experience as a senior IT executive, serving as CIO/CTO at two medium-sized non-profits in the Washington, DC area, and has held positions in IT project management at PG&E and General Electric. Tom holds a BS degree from Cornell University and a MS degree from MIT. He can be reached at tbenton@novarica.com.  
