The Half-Split Method–Cisco Troubleshooting

When faced with a problem, most engineers will start with the following:

• The most recent change

• Where most failures have happened in the past

• What the engineer is most familiar with

Starting with the most recent change makes sense as an initial strategy because most network failures are caused by changes. Likewise, it often makes sense to start where failures have been common.

The third option, starting with what you are familiar with, is a bit more problematic. Starting with the most familiar parts of the network does let you rule them out quickly.

However, starting with the most familiar things can also be like looking for your keys under the streetlight rather than where you dropped them: you might spend hours looking for something that is not there.

While starting with the most recent change or common failures allows you to cover a lot of ground quickly, following them past some initial checks can be counterproductive. You should move to a more formal troubleshooting method after checking recent changes and common failures.

The half-split method, developed through long experience troubleshooting electronics, is one of the most effective troubleshooting methods available.

Figure 22-4 illustrates the half-split troubleshooting method.

Figure 22-4 Half-Split Troubleshooting Method

The half-split method is related to the Observe, Orient, Decide, Act (OODA) loop used in military, defense, and general security settings. The OODA and half-split loops give you specific questions to focus on at each point, countering the tendency to wander all over the place when reacting to high-pressure situations.

The half-split method is an intentional troubleshooting method.

The half-split method organizes work to split the network into successively smaller pieces. Half-split has two stages: measure and split.
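To see why splitting into successively smaller pieces pays off, here is a back-of-the-envelope sketch that compares the worst-case number of measure/split rounds with checking segments one at a time. It assumes each measurement cleanly tells you which half holds the fault, which real networks do not always allow. The function names are illustrative only.

```python
def half_split_rounds(segments: int) -> int:
    """Worst-case measure/split rounds to isolate one faulty segment,
    assuming every measurement says cleanly which half holds the fault."""
    if segments < 1:
        raise ValueError("need at least one segment")
    # ceil(log2(segments)), computed with integers to avoid float rounding
    return (segments - 1).bit_length()


def one_by_one_checks(segments: int) -> int:
    """Worst case when checking segments in order: after n - 1 clean
    results, the last segment must be the faulty one."""
    if segments < 1:
        raise ValueError("need at least one segment")
    return segments - 1


for n in (4, 16, 64, 1000):
    print(f"{n:>5} segments: half-split {half_split_rounds(n):>2}, "
          f"one-by-one {one_by_one_checks(n):>3}")
```

For a path of 1,000 segments, this works out to 10 rounds of splitting versus up to 999 individual checks. Each round halves what is left to search, so doubling the size of the network adds only one more round.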

In the measure stage, you begin by orienting, asking questions about how things are and what they should look like, including

• What does normal look like here?

• What model can I use to understand what is happening?

• What explanations can I come up with for the difference between what is currently happening and what should be happening?

Orienting means consolidating the available information.

Measuring includes observing the network and asking questions like

• What can I measure to eliminate or indicate one (or more) of the possible explanations for this problem?

• How do I measure it?

• How might measuring it change the system’s operation?
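One way to make the "eliminate or indicate an explanation" question concrete is to write down what each candidate explanation predicts for each measurement, then prefer the measurement whose outcome splits the candidates most evenly. The sketch below does that. The explanations, measurement names, and predictions are hypothetical, and choosing the most evenly splitting measurement is an assumption in the spirit of half-split, not a rule stated in the method itself.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Explanation:
    name: str
    # Expected result (True = check passes) of each measurement if this
    # explanation were the real cause; a missing key means "no prediction".
    predicts: dict[str, bool] = field(default_factory=dict)


def eliminate(candidates: list[Explanation], measurement: str,
              observed: bool) -> list[Explanation]:
    """Keep explanations that agree with (or say nothing about) the result."""
    return [c for c in candidates
            if c.predicts.get(measurement, observed) == observed]


def most_informative(candidates: list[Explanation],
                     measurements: list[str]) -> str:
    """Pick the measurement whose outcome splits the candidates most evenly,
    so either result rules out as many explanations as possible."""
    def balance(m: str) -> int:
        passes = sum(1 for c in candidates if c.predicts.get(m) is True)
        fails = sum(1 for c in candidates if c.predicts.get(m) is False)
        return min(passes, fails)
    return max(measurements, key=balance)


# Hypothetical case: host A cannot reach an app on server B. Router r2 sits
# roughly halfway along the path; "tcp to app" connects by IP address.
candidates = [
    Explanation("link down between A and r2", {"ping r2": False, "tcp to app": False}),
    Explanation("no route to r2 on A's gateway", {"ping r2": False, "tcp to app": False}),
    Explanation("server process stopped", {"ping r2": True, "tcp to app": False}),
    Explanation("wrong DNS record", {"ping r2": True, "tcp to app": True, "DNS lookup": False}),
]
measurements = ["ping r2", "tcp to app", "DNS lookup"]

pick = most_informative(candidates, measurements)  # splits 2 vs 2
remaining = eliminate(candidates, pick, observed=True)
print(pick, [c.name for c in remaining])
```

Here "ping r2" is chosen because two explanations predict it passes and two predict it fails, so either result halves the list. A passing ping leaves only the server and DNS explanations to investigate.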

In the split stage, you decide where to go in the system or what to do next. Splitting can mean

• Moving closer to the source or destination (left or right)

• Moving into or out of a module (up or down)

Left, right, up, and down are abstract concepts; Figure 22-5 illustrates these four directions.

Figure 22-5 Half-Split Directions

When you’ve moved beyond looking at the obvious solutions and into using the half-split method to find a problem, you

• Select a source: The source might be a host, server, application, or anything else in the network, but it should ideally be the source of the packets.

• Select a destination: The destination might be a host, server, application, or anything else in the network, but it should ideally be where the packets are processed.

• Choose a point about halfway between the source and destination: Choosing the halfway point is not always easy, but the starting point doesn’t have to be precisely in the middle of the path between the source and destination.

• Run the loop: Orient yourself to the point in the network, make observations, and then decide what to do next—move left, right, up, or down, or solve the problem.
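The left/right part of the loop behaves like a binary search over the observation points between the source and the destination. The sketch below assumes you can run some check at each point (a packet capture, interface counters, a ping from that device); `healthy` is a hypothetical stand-in for that check, and the device names are made up. It also assumes that once traffic goes wrong at one point, it stays wrong at every point to its right.

```python
from typing import Callable, List, Sequence, Tuple


def half_split(points: Sequence[str],
               healthy: Callable[[str], bool]) -> Tuple[str, str]:
    """Narrow a failure down to one segment of the path.

    points  -- observation points ordered from source (index 0) to
               destination (last index)
    healthy -- hypothetical check: True if traffic looks correct at a point

    Assumes the source is good and the destination is bad.
    Returns (last good point, first bad point).
    """
    if len(points) < 2:
        raise ValueError("need at least a source and a destination")
    good, bad = 0, len(points) - 1
    while bad - good > 1:
        mid = (good + bad) // 2      # roughly halfway; need not be exact
        if healthy(points[mid]):
            good = mid               # fault is to the right: move right
        else:
            bad = mid                # fault is to the left: move left
    return points[good], points[bad]


# Usage with a simulated path in which fw-1 is mishandling the traffic.
path = ["host-A", "sw-1", "r1", "r2", "r3", "fw-1", "sw-2", "server-B"]
broken_at = path.index("fw-1")
checked: List[str] = []


def simulated_check(point: str) -> bool:
    checked.append(point)            # record each measurement made
    return path.index(point) < broken_at


segment = half_split(path, simulated_check)
print(segment, checked)   # ('r3', 'fw-1') ['r2', 'fw-1', 'r3']
```

Three checks isolate the segment between r3 and fw-1 on an eight-point path. From there, moving down means treating the device at the bad end as a module and running the same loop over its internal stages, which the sketch leaves to the reader.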

The half-split method is easier to understand when the theory is paired with an example, like the one in a later section of this chapter.

Fixes

Once you find the problem, you can fix it, but not all fixes are equal.

A permanent fix is one you intend to leave in the network permanently, or at least until the network changes for some other reason. Permanent fixes either

• Bring the network back to the state before the failure.

• Are well-documented, decrease technical debt, and fit within the overall network plan and architecture.

A temporary fix is one you use until a better solution can be designed, tested, and deployed. Temporary fixes often increase technical debt and represent “one-off” solutions that cannot or should not be repeated elsewhere in the network.
