Failure Reports–Cisco Managing Networks
Every network failure is a chance to learn how the network works and to uncover unanticipated hardware and software problems. Those who do not document what they learn, however, are doomed to repeat the same outages and the same troubleshooting steps.
Every outage should be documented, including the following:
• Symptoms: Documenting the symptoms as accurately as possible can make finding the cause of similar failures in the future much easier.
• Root cause: The hardware or software failure that caused the outage should be described as accurately as possible.
• Temporary fix: If you deployed a temporary fix, what was the fix, how did you think it would solve the problem until you could put a permanent fix in place, and how did it work?
• Permanent fix: What change resolved the problem permanently?
Failure reports are a gold mine of information when troubleshooting a failure, of course, but they are also very useful when planning changes to the network or designing support for a new application or location. Reports should be stored in a well-designed ticketing system so they can be found when they are needed.
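The four items in a failure report map naturally onto a structured record. The following is a minimal sketch, not a feature of any particular ticketing system: the FailureReport class, its field names, and the find_similar helper are all hypothetical names chosen for illustration. It shows how recording symptoms consistently makes it possible to look up earlier outages that resemble the one in front of you.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class FailureReport:
    """One outage record; fields mirror the items listed above (names are illustrative)."""
    occurred_on: date
    symptoms: List[str]
    root_cause: str
    permanent_fix: str
    temporary_fix: Optional[str] = None          # None when no stopgap was deployed
    temporary_fix_outcome: Optional[str] = None  # how well the stopgap worked


def find_similar(reports: List[FailureReport], observed_symptom: str) -> List[FailureReport]:
    """Return reports where any recorded symptom contains the observed text.

    The match is a simple case-insensitive substring test; a real system
    would likely use something richer, such as tags or full-text search.
    """
    needle = observed_symptom.lower()
    return [r for r in reports if any(needle in s.lower() for s in r.symptoms)]
```

Calling find_similar(reports, "bgp session") during an outage would return every earlier report whose symptoms mention a BGP session, along with the root cause and fixes recorded at the time.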
Hardware and Software
Have you ever looked inside the front cover of a telephone book (if you can find one anyplace other than an antique store)?
Among the items listed is a number to call if your telephone is not working. The telephone book does not, however, explain how you are supposed to call the telephone company to report an outage if your telephone is not working.
Telephone books might seem like a humorous throwback to the past, but the problem of communicating is still genuine when the communication system fails. The modern world is always online. We assume vendor documents, troubleshooting tips, and help will always be just a few clicks and an online search away.
But what if you are trying to fix the network because it is down?
If the network is down, you might not be able to reach the Internet. If you rely on Internet resources to help you fix your network, you might be stuck.
Network documentation should include local copies of the user manuals for critical hardware and software. Regardless of the network’s state, you should always have on hand the information you need to build at least some basic connectivity.
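One way to keep this promise is to check periodically that the local copies actually exist. The sketch below assumes the manuals are kept as files in a single directory and that you maintain a list of the file names you consider critical. The function name missing_documents and the directory layout are assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterable, List


def missing_documents(doc_dir: Path, required: Iterable[str]) -> List[str]:
    """Return the required file names that are absent or empty in doc_dir."""
    missing = []
    for name in required:
        path = doc_dir / name
        # An empty file usually means a failed download, so treat it as missing.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Running a check like this on a schedule, while the network is up, gives you time to download a missing manual before the outage in which you need it.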
Network Processes and Lifecycle
Networks, individual network devices, and software all have a lifecycle—just like everything else in the world. Figure 21-1 illustrates a typical view of a network lifecycle.
Figure 21-1 Typical Network Lifecycle
In Figure 21-1, the gather requirements step includes the following:
• Discover all existing application requirements.
• Discover all existing business requirements.
• Discover all future applications that might run on the network and their requirements.
• Discover all future business plans and how these plans place requirements on the network.
• Discover existing operational problems.
Once you have discovered and validated these requirements, you can design the network. Design includes determining or deciding the following:
• The best hardware and software combination to meet the requirements.
• The best way to connect all the pieces to make a network.
• The best way to configure each router, switch, or other network equipment.
• The best way to measure baselines and build documentation.
Once the network is designed, you can build tests, test, deploy, and then operate it.
The design process sounds cut and dried, and perhaps a little simple, from the outside. Real life and real networks tend to be a lot messier.
How do you gather all the requirements? If you tried to gather all the requirements, you would probably find the requirements change before you can gather them all. How do you nail down every possible side effect for every design decision?
These things are impossible in the real world. Instead of following a clean process, most network engineering teams follow a more piecemeal approach. Each piece of hardware, each link, each module, and the network itself all have lifecycles. These lifecycles act like wheels within wheels and interact with one another in often unexpected ways. Figure 21-2 illustrates these interacting lifecycles.
Figure 21-2 Module Lifecycles
In Figure 21-2, three modules are shown. Suppose Module 3 is a physical router or switch, Module 2 is a set of routers in a single rack or location, and Module 1 is the entire network.
Each module’s lifecycle is different, with its own requirements, design, testing, and operation timelines. Each lifecycle, whose beginning is shown with an arrowhead, starts and ends at a different time. “Bridges” show interaction surfaces where one module interacts with another.
Engineers need to keep track of this hierarchy of lifecycles to build a maintainable network. The lifecycles of each component, each module, and the network as a whole all need to be tracked, planned, and managed.
Further, there is no single point in time when a design is “done.”
Instead, networks must be modified over time to fit new applications and purposes. Because the network’s environment changes constantly and there is no way to gather every possible requirement accurately, engineers often find it better to build flexible, extensible networks able to support many applications.
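The hierarchy of lifecycles in Figure 21-2 can be sketched as a small tree in which every module carries its own lifecycle phase. This is an illustration under stated assumptions, not a prescribed tool: the phase names follow the steps named for Figure 21-1, while the Module class, the nesting of Module 3 inside Module 2 inside Module 1, and the phase assigned to each module are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Iterator, List


class Phase(Enum):
    """Lifecycle steps, in the order named for Figure 21-1."""
    GATHER_REQUIREMENTS = "gather requirements"
    DESIGN = "design"
    BUILD_TESTS = "build tests"
    TEST = "test"
    DEPLOY = "deploy"
    OPERATE = "operate"


@dataclass
class Module:
    """A network component with its own lifecycle phase and nested modules."""
    name: str
    phase: Phase
    children: List["Module"] = field(default_factory=list)


def walk(module: Module) -> Iterator[Module]:
    """Yield the module, then every module nested inside it, depth-first."""
    yield module
    for child in module.children:
        yield from walk(child)


def modules_in_phase(root: Module, phase: Phase) -> List[str]:
    """List the names of all modules under root currently in the given phase."""
    return [m.name for m in walk(root) if m.phase is phase]


# Hypothetical snapshot: each module is at a different point in its lifecycle.
router = Module("Module 3 (router)", Phase.TEST)
rack = Module("Module 2 (rack)", Phase.DESIGN, [router])
network = Module("Module 1 (network)", Phase.OPERATE, [rack])
```

In this snapshot the network as a whole is in operation while one rack is being redesigned and a router inside it is under test, which is the kind of overlap that makes tracking each lifecycle separately worthwhile.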