Anyone in the hosting business knows that websites go down. Sure, there are ways to hide an outage from the end user, but a single wrong line of code or a power outage can still take your system down. Here are some tips for handling it right:
- Post a notice somewhere (Notify)
When a major job search affiliate program went down, they didn't talk with anyone outside the company for more than 2 hours. After I banged on social media and message boards, it seemed most other affiliates hadn't noticed (but were also dead in the water).
I even left a message with a product director. Only after they had been back up for 4 hours did I get a call. In the process I had posted on their message board twice; all it would have taken was a simple notice somewhere.
- Assign one person for communications (Resources)
You need to communicate with your vendors, between projects, and across teams, and this needs to be coordinated. Choose someone who will have a handle on the whole situation. In one client environment, a team manager would step in and act as liaison between the staff solving the problem and everyone else.
If someone came in to talk specifically with a person who was doing the work, this manager would step in and move the conversation away from the action. This keeps a technician from sharing information that hasn't been verified against troubleshooting in other areas. It also lets the people solving the problem focus on fixing it.
- Acknowledge the issue, not the problem (Communicate)
It's fine to say you have an issue, but it's not necessary to specifically describe the problem. This helps focus on the issues at hand while providing some damage control. Don't say, “We think it's a database issue”, unless you've specifically “fixed a database issue.”
This goes for both internal and external communications; misinformation delays resolution. The problem you find early in troubleshooting may not be the one that caused the outage.
- Avoid fixing known problems during outage (Prioritize)
This is known as cramming: tossing in fixes for a number of known problems while troubleshooting. Rather than saving you time, this tendency obscures the core issue that caused the outage in the first place.
Too many environments do this because they don't want to schedule an outage later. You can fix those problems after systems are back online, but those outages need to be scheduled separately from the problem at hand. Don't dilute your focus when everything hits the fan.
Because I run a small on-call team monitoring client sites, it only takes a single outage before someone gets called out. You can avoid all this with a simple, easy-to-find status page.
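As a rough illustration of how little a status page needs, here is a minimal sketch using only Python's standard library. Everything in it is an assumption made for the example: the names `CURRENT_STATUS`, `render_notice`, `StatusHandler`, and `serve`, the port, and the placeholder wording and timestamp. The notice acknowledges the issue without guessing at a cause, in line with the tips above.

```python
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical status record. In practice, the one person assigned to
# communications would update this. The values are placeholders.
CURRENT_STATUS = {
    "state": "investigating",
    "summary": "Some client sites are currently unavailable.",
    "updated": datetime(2010, 1, 1, 12, 0, tzinfo=timezone.utc),
}


def render_notice(status):
    """Build a plain-text notice that acknowledges the issue
    without speculating about the underlying problem."""
    updated = status["updated"].strftime("%Y-%m-%d %H:%M UTC")
    return (
        f"Status: {status['state']}\n"
        f"{status['summary']}\n"
        f"Last updated: {updated}\n"
    )


class StatusHandler(BaseHTTPRequestHandler):
    """Answer every GET request with the current notice."""

    def do_GET(self):
        body = render_notice(CURRENT_STATUS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Start the status page; the port is an arbitrary choice."""
    HTTPServer(("", port), StatusHandler).serve_forever()
```

Calling `serve()` starts the page on the chosen port. A page like this probably belongs on hosting that is separate from the systems it reports on, so the notice stays reachable while everything else is down.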
Experience says, “While nobody wants an outage, they will happen.” Your website will go down; even the best fail. With advance planning, though, it doesn't have to be a catastrophe. What's your plan?
© 2010 B2B Website Profits, All rights reserved.
Justin Hitt helps b2b technical services firms retain strong customer relationships that translate into stable profits. For web management support and services on this topic, visit https://www.jwhco.com/