Sunday, July 12, 2009

Weird IPMP Output

In Solaris, there is this very nice feature called IPMP (IP Multipathing).

It can fail over an IP address from one network card to another when the primary fails. From the end user's point of view, this is transparent: services continue and there's no downtime. This is nice!

IPMP is mandatory when Sun Cluster is configured.

Anyway, we were conducting UAT a few days back and I observed some very weird output when the network cables were unplugged from both network cards. We thought IPMP was not configured properly. In fact, nothing was wrong.

Under normal operation, you'll see the following output:


Now, we unplug the cable from the e1000g0 interface. The output is still correct:
(Notice that the IP address 10.50.129.81 from e1000g0 has failed over to bge0:1)



Let's proceed to unplug the cable from the bge0 interface as well. Now, the output is misleading:



Why are bge0 and bge0:1 still showing UP? From the same machine, we can even ping 10.50.129.81 and 10.50.129.90, and both still show as alive.

Very strange indeed. We were puzzled. 

It was only some time later that we realized we had not looked closely at the output:


Although the UP flag is there, a FAILED flag appears later in the same flags list, which indicates that the interface is in fact down.
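To avoid being fooled by the UP flag again, the check can be scripted. Below is a minimal Python sketch that reads the flag list from each interface header line and treats an interface as usable only if it has UP and does not have FAILED. The helper names (interface_flags, is_usable) and the SAMPLE text are made up for illustration; the sample uses placeholder addresses and is not the output from our machines.

```python
import re

# Matches the header line of one ifconfig stanza, for example:
#   bge0: flags=11000843<UP,BROADCAST,...,FAILED> mtu 1500 index 3
# The hex value is ignored; only the flag names inside <...> are used.
HEADER = re.compile(r"^(?P<name>\S+): flags=[0-9a-fA-F]+<(?P<flags>[^>]*)>")


def interface_flags(ifconfig_output):
    """Return {interface name: set of flag names} parsed from ifconfig text."""
    result = {}
    for line in ifconfig_output.splitlines():
        match = HEADER.match(line)
        if match:
            flags = match.group("flags")
            result[match.group("name")] = set(flags.split(",")) if flags else set()
    return result


def is_usable(flags):
    """UP alone is not enough: a FAILED flag means IPMP considers the link dead."""
    return "UP" in flags and "FAILED" not in flags


# Illustrative sample only (hypothetical values, documentation addresses).
SAMPLE = """\
e1000g0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 192.0.2.10 netmask ffffff00 broadcast 192.0.2.255
bge0: flags=11000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,FAILED> mtu 1500 index 3
        inet 192.0.2.11 netmask ffffff00 broadcast 192.0.2.255
"""

if __name__ == "__main__":
    for name, flags in interface_flags(SAMPLE).items():
        state = "usable" if is_usable(flags) else "NOT usable"
        print(f"{name}: {state} ({', '.join(sorted(flags))})")
```

In practice the text would come from running ifconfig -a on the host; the point is simply that the script looks for FAILED instead of stopping at UP.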

We then switched the order of the test: we brought bge0 down first, then e1000g0. This time round, e1000g0 showed the UP + FAILED flags. It is always the second (that is, the last) interface in the IPMP group to be brought down that shows UP + FAILED.

Misleading indeed ...


