UnixPedia : HPUX / LINUX / SOLARIS: June 2017

Monday, June 12, 2017

Intermittent “No route to host” errors

The most likely cause of intermittent “No route to host” errors is the situation described in Red Hat article https://access.redhat.com/solutions/1120533.  What happens is:
 
  1. The target server's MAC address is not in the client’s ARP cache.
  2. The client sends an ARP request.
  3. The ARP request times out and an EHOSTUNREACH error is returned to the client application, which reports “No route to host”.
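
To check whether a client is in this state, you can look at the target's entry in the neighbour (ARP) cache. The following is a minimal sketch, not part of the Red Hat article: neigh_state is a hypothetical helper name, and it assumes the state keyword (REACHABLE, STALE, FAILED, INCOMPLETE, ...) is the last field of each line printed by "ip neigh show".

```shell
# Hypothetical helper: print the neighbour-cache state for one IP address.
# Reads "ip neigh show"-style lines on stdin; $1 is the IP to look up.
# Assumes the state keyword is the last field on the line.
neigh_state() {
    awk -v ip="$1" '
        $1 == ip { print $NF; found = 1 }
        END      { if (!found) print "MISSING" }
    '
}

# Example use on a client (192.0.2.10 is a placeholder address):
#   ip neigh show | neigh_state 192.0.2.10
```

A result of MISSING, INCOMPLETE or FAILED just before an error would be consistent with the sequence above; REACHABLE would point elsewhere.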

The default ARP request timeout is 3 seconds.  This is controlled by the parameters mcast_solicit and retrans_time_ms in /proc/sys/net/ipv4/neigh/<nic>, which have the following default values:

$ cat /proc/sys/net/ipv4/neigh/eth0/{mcast_solicit,retrans_time_ms}
3
1000

This sends 3 ARP requests, each with a timeout of 1000 msec, hence the total ARP request timeout is the product of these (3000 msec or 3 seconds).  Your network team may be able to measure how long the ARP requests are taking, to verify whether this is actually the problem.  If so, the key question becomes where the delay is coming from.  The source of the delay could be in the network infrastructure or possibly at the hypervisor host level, since this system is a VMware guest.
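
The arithmetic can be sketched as a small shell helper. The function name arp_timeout_ms is hypothetical, and eth0 is only the example interface used here; substitute the NIC that faces the server.

```shell
# Hypothetical helper: total ARP request timeout in milliseconds.
# $1 = mcast_solicit (number of ARP requests sent)
# $2 = retrans_time_ms (wait per request, in msec)
arp_timeout_ms() {
    echo $(( $1 * $2 ))
}

# Compute the timeout from the live values, if this interface exists.
nic=eth0
dir=/proc/sys/net/ipv4/neigh/$nic
if [ -r "$dir/mcast_solicit" ] && [ -r "$dir/retrans_time_ms" ]; then
    arp_timeout_ms "$(cat "$dir/mcast_solicit")" "$(cat "$dir/retrans_time_ms")"
fi
```

With the default values shown above, arp_timeout_ms 3 1000 prints 3000.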

A possible workaround is to increase the ARP timeout by increasing the mcast_solicit value on each client that accesses this server, and for each client NIC that connects to the server.  For example, to change the timeout to 10 seconds on a client's eth0, you would increase mcast_solicit to 10:

# sysctl -w net.ipv4.neigh.eth0.mcast_solicit=10

To make this permanent, you would also need to add the above parameter to the client's /etc/sysctl.conf.
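
One way to do this without creating duplicate entries is sketched below. persist_sysctl is a hypothetical helper, not a standard tool; it assumes GNU sed (for "sed -i") and must be run as root when pointed at /etc/sysctl.conf.

```shell
# Hypothetical helper: set "key = value" in a sysctl config file,
# replacing an existing line for the key or appending a new one.
# $1 = key, $2 = value, $3 = file (defaults to /etc/sysctl.conf)
persist_sysctl() {
    key=$1
    value=$2
    file=${3:-/etc/sysctl.conf}
    if grep -q "^[[:space:]]*${key}[[:space:]]*=" "$file" 2>/dev/null; then
        # Key already present: rewrite that line in place.
        sed -i "s|^[[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file"
    else
        # Key absent: append it.
        echo "${key} = ${value}" >> "$file"
    fi
}

# Example use (as root), followed by a reload of the file:
#   persist_sysctl net.ipv4.neigh.eth0.mcast_solicit 10
#   sysctl -p
```

The resulting line in /etc/sysctl.conf is "net.ipv4.neigh.eth0.mcast_solicit = 10".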

Also, to answer your question below:

Moreover, below are the findings from the network team.

From the investigation, we’ve noticed that:
  · Only the first packet gets dropped during the course of the communication between the source and the destination – This is expected because the ARP cache entry has been cleared after the session between the source and the destination was idle for more than 5 minutes.
  · There are no packet drops between the source and the destination once the connection is established – We confirmed this by running a ping test between the source and the destination for more than 45 minutes and saw only 1 packet lost (again, this was the first packet).

Can you please confirm whether the above is correct regarding the first packet drop? If so, please let us know how we can overcome this problem.

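The network team is correct.  When there is no entry in the ARP cache, the outgoing packet that triggers the ARP request is dropped unconditionally.  For TCP connections this is not a problem, because TCP will retransmit the dropped packet.  IP itself does not retransmit, so protocols without their own retry (such as the ICMP echo used by ping) simply show the first packet as lost.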