The Post-Mortem–Cisco Troubleshooting

A post-mortem is time set aside to review a network failure, the troubleshooting process, and the solution. Post-mortems should be loosely structured and answer just a few questions, including:

• What is the root cause? Why are we sure this is the root cause?

• How can we prevent this failure from happening in the future?

• What was the dwell time? How can we reduce the dwell time?

• How does this change the system architecture?

However, the most important point about post-mortems is to focus on fixing the problem rather than assigning blame. People make mistakes, and devices fail. Learning is more important than running up the blame score.

Applying the Process: An Example

Troubleshooting is best learned through hands-on experience.

Learning the theory is a good start, but real troubleshooting skill comes with good practice. Practice does not make perfect.

Perfect practice makes perfect.

Figure 22-6 illustrates a small network for walking through a sample troubleshooting process.

Figure 22-6 First Stage of Half-Split Process

In Figure 22-6:

• A and D are hosts.

• Host A is connected via Wi-Fi through access point B.

• Host D is wired to switch C.

• Server E is a local network-attached storage (NAS).

• Server E is connected to router F.

• Router F is connected to firewall G.

• Firewall G is connected to modem H, which then connects through an access provider to the Internet.

• Servers X and Y are accessible through the Internet.

This first troubleshooting stage begins at router F because this is roughly the “center” of the path between host A and server X.

Begin by pinging server X from router F.

If the ping succeeds, X is reachable from router F, and the problem is likely between host A and router F.

What if F cannot reach X? The problem is on F itself or somewhere to the right of F. If this initial reachability test fails, you could

• Ping server Y. If Y is reachable and X is not, the problem is either with server X or someplace “on the Internet” between F and X.

• Try pinging X and Y from G. If G can ping both servers, the problem is on router F.

• Try pinging X and Y from G. If G cannot ping either server, the problem is on G or between G and X or Y.
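
The reasoning in this first stage can be sketched in a few lines of Python. This is only an illustration of the decision logic, not a diagnostic tool: the function name first_stage_diagnosis and its arguments are assumptions made for this sketch, and each argument simply records whether a ping you ran by hand succeeded.

```python
def first_stage_diagnosis(f_to_x, f_to_y=None, g_to_x=None, g_to_y=None):
    """Map first-stage ping results to the likely fault location.

    Each argument is True (ping succeeded), False (ping failed), or
    None (test not run). Device names follow Figure 22-6.
    """
    if f_to_x:
        # Router F reaches server X, so look on the host side of F.
        return "between host A and router F"
    if f_to_y:
        # F reaches Y but not X.
        return "server X, or the Internet path between F and X"
    if g_to_x and g_to_y:
        # Firewall G reaches both servers, but F reaches neither.
        return "router F"
    if g_to_x is False and g_to_y is False:
        return "firewall G, or between G and servers X and Y"
    # Not enough data yet to narrow the fault any further.
    return "on or to the right of router F; run more tests"
```

For instance, first_stage_diagnosis(False, f_to_y=True) points at server X or the Internet path, matching the first bullet above.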

For this example, router F can reach server X, so it is time to move someplace else in the network and gather more data.

Because the problem appears to be between host A and router F, switch C or access point B would be the new center point. In this case, switch C is an unmanaged switch and does not have a console of any kind, so the next stop will be access point B, as shown in Figure 22-7.

Figure 22-7 Second Stage of Half-Split Process

Ping router F from access point B. If this works, the problem is likely on access point B or host A.

If this ping does not work, the problem is likely on access point B, switch C, or router F.

How can you test the unmanaged switch C? Try pinging router F from host D. If this works, switch C is forwarding traffic to router F, so the problem is before C, toward host A. If host D cannot reach F, the problem is most likely in switch C.

For this example, the ping from access point B to router F works. To ensure the entire path works, you can also ping from B to server X. Given the information gathered thus far, this should work. What if it does not work? The problem is most likely at router F.

Since the problem seems to be at host A or access point B, go back and test each network point from host A, as shown in Figure 22-8.

Figure 22-8 Third Stage of Half-Split Process

Assume the following for this third stage:

• Host A still cannot ping server X.

• Host A cannot ping firewall G.

• Host A can ping router F.

The information here seems to indicate router F is the problem, but we already know, from previous testing, that access point B can reach router F and even server X through router F.

It is time to orient to the problem and brainstorm a little by asking two questions:

• What kind of problem would allow router F to reach host A and forward packets from access point B, and yet not forward packets from A toward server X?

• What kind of problem would cause host A to be able to reach router F but not be able to send packets through F to server X?

The answer to the first question is packet filters. The answer to the second question is that host A’s default gateway is not set correctly. How could we tell the difference between these two problems?

The simplest way to tell the difference is to use one of ping’s extended features. At router F, you can ping host A using F’s address on its link to firewall G as the source address.

If this fails, host A can reach devices on the same segment, like access point B and host D, but it cannot reach off-segment destinations like firewall G and router F’s interface with G.

Because host A uses its default gateway to reach off-segment destinations, the symptoms match a problem with this setting.
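
To see why a wrong default gateway produces exactly these symptoms, the following sketch models how a host decides where to send a packet. The addresses are hypothetical documentation-range values (the example network above does not specify any), and next_hop is an illustrative helper, not part of any operating system or Cisco tool.

```python
import ipaddress


def next_hop(host_interface, destination, default_gateway):
    """Return the address a host would send the packet to first.

    host_interface: host address with prefix length, e.g. "192.0.2.10/24"
    destination: destination IP address
    default_gateway: configured gateway address, or None if unset
    """
    iface = ipaddress.ip_interface(host_interface)
    dest = ipaddress.ip_address(destination)
    if dest in iface.network:
        return dest  # on-segment: delivered directly, gateway not used
    if default_gateway is None:
        return None  # off-segment with no gateway: packet cannot be sent
    return ipaddress.ip_address(default_gateway)


# Assumed addressing: host A is 192.0.2.10/24, router F's LAN interface is
# 192.0.2.1, and F's interface toward firewall G is 198.51.100.1.
# Host A is misconfigured with a gateway of 192.0.2.254.
print(next_hop("192.0.2.10/24", "192.0.2.1", "192.0.2.254"))     # 192.0.2.1
print(next_hop("192.0.2.10/24", "198.51.100.1", "192.0.2.254"))  # 192.0.2.254
```

Under these assumed addresses, replies to router F's on-segment address go straight to F, while replies to F's address on the link to G are handed to the wrong gateway and never arrive. That is why the sourced ping fails even though an ordinary ping between A and F works.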

You can validate this conclusion by checking for packet filters at router F.

Figure 22-9 illustrates the half-split troubleshooting process used in this example.

Figure 22-9 Half-Split Troubleshooting Example

The half-split method produces a tree of measurements, actions, and possibilities. Drawing a chart like this on paper or a whiteboard while troubleshooting a problem can help you remember what you have already tried, branches you did not take, etc.
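
When the path is a simple chain of devices and there is a single fault, the half-split method behaves much like a binary search. The sketch below is a simplified model under those assumptions: half_split and can_reach_target are hypothetical names, the reachability test stands in for a ping you would run from each device, and the returned log plays the role of the whiteboard chart.

```python
def half_split(path, can_reach_target):
    """Locate a single fault on a linear path by repeated halving.

    path: devices ordered from source toward the target (target excluded).
    can_reach_target: callable(device) -> bool, e.g. a ping from device.
    Returns (last_failing, first_working, log); either device may be None.
    """
    lo, hi = 0, len(path)  # index of the first working device is in [lo, hi]
    log = []               # record of every test, in the order it was run
    while lo < hi:
        mid = (lo + hi) // 2
        ok = can_reach_target(path[mid])
        log.append((path[mid], ok))
        if ok:
            hi = mid       # this device works; look closer to the source
        else:
            lo = mid + 1   # this device fails; look closer to the target
    last_failing = path[lo - 1] if lo > 0 else None
    first_working = path[lo] if lo < len(path) else None
    return last_failing, first_working, log


# Simulated results for the example: every device except host A reaches X.
path = ["A", "B", "C", "F", "G", "H"]
working = {"B", "C", "F", "G", "H"}
last_failing, first_working, log = half_split(path, lambda device: device in working)
print(last_failing, first_working)  # A B
print(log)  # [('F', True), ('B', True), ('A', False)]
```

The test order in the log (F, then B, then A) mirrors the three stages of the example. A real network is messier: some devices, such as the unmanaged switch C, cannot run a test at all, and a fault such as a packet filter may affect only some sources, so treat the result as a pointer to where to look next rather than a final answer.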
