Recently I noticed an alarm issue with our Steelhead fleet. At random times the Steelhead appliances were losing their trust relationship with the Windows domain they are members of. This can be really bad when the main reason you have a product like Riverbed Steelhead is to cache chatty protocols such as Microsoft CIFS/SMB. Looking further into the logs on the Steelhead appliances (we have one at every site), I was seeing the following log entry:

Jul 14 10:31:36 rb-flee sport[9207]: [domain_auth/trusted_domains.WARN] – {- -} Failed to update trusted domains : Failed to communicate with winbindd 
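
With an appliance at every site, it can help to scan collected logs for this warning rather than checking each box by hand. Below is a minimal sketch in Python, assuming the Steelhead logs are forwarded to a central host as plain text. The function name and the regular expression are my own illustration, derived from the single sample line above, and are not part of RiOS.

```python
import re

# Pattern built from the one sample line above; other RiOS versions may
# format the message differently (assumption).
WINBIND_WARNING = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) sport\[\d+\]: "
    r"\[domain_auth/trusted_domains\.WARN\].*"
    r"Failed to communicate with winbindd"
)


def find_winbind_failures(lines):
    """Return (timestamp, host) pairs for lines containing the winbindd warning."""
    hits = []
    for line in lines:
        match = WINBIND_WARNING.search(line)
        if match:
            hits.append((match.group("timestamp"), match.group("host")))
    return hits
```

Feeding it the lines of a syslog file gives a quick list of which appliances hit the problem and when.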

 

Further entries in the logs reported that CIFS and SMB optimizations would be disabled while the trust with the domain is down. This is very bad: we lose all application-layer optimization for pretty much anything file-transfer related because, like many organisations, we use Active Directory.

So to troubleshoot this issue, the first thing I did was make sure the Steelheads could reach the site-local domain controller for Kerberos authentication (TCP port 88). I wanted to make sure there was no layer 2 or layer 3 connectivity issue. To do this you can issue a telnet command from the RiOS CLI:

SH > en
SH # telnet 10.10.10.1 88
If the connection establishes, there is no connectivity issue between the Steelhead and the Windows DC.
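
The same TCP check can be scripted, for example to test several domain controllers in one go. This is a rough sketch in Python, and the function name is hypothetical. Note that it runs from whatever host you execute it on, not from the Steelhead itself, so it only approximates the telnet test above when that host sits on the same network path.

```python
import socket


def can_reach_kdc(host, port=88, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or no route
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For the domain controller used above, the call would be `can_reach_kdc("10.10.10.1")`.
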
At this point I opened a ticket with Riverbed Support and they informed me of bug ID #167210, a memory leak in the winbindd binary that causes such issues. A quick workaround for the issue was to run a cron job on the Steelheads to restart the winbindd process nightly. See the snippet below:

SH (config) # job 1 command 1 "en"
SH (config) # job 1 command 2 "pm process winbind restart"
SH (config) # job 1 comment "This job will restart winbind service once every day"
SH (config) # job 1 date-time 00:00:00 2014/08/06
SH (config) # job 1 enable
SH (config) # job 1 name "Restart-Winbind"
SH (config) # job 1 recurring "86400"
With the daily restart of the winbind process I haven't seen it hang on any of the Steelheads, so a daily restart seems to be a good interval. I've been informed by Riverbed TAC that this bug should be resolved in the next release.
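
The recurring value in the job definition above is in seconds: 24 x 60 x 60 = 86400, i.e. one day. If you need to push the same job to many appliances, or try a different interval, the command lines can be generated. The helper below is a hypothetical sketch in Python that only mirrors the command syntax shown in the snippet above; I have not validated it against RiOS, and it just builds strings without connecting to anything.

```python
def restart_job_commands(job_id, start_date, start_time="00:00:00", interval_hours=24):
    """Build the job command lines from the snippet above for a given interval."""
    interval_seconds = interval_hours * 60 * 60  # 24 hours -> 86400 seconds
    return [
        f'job {job_id} command 1 "en"',
        f'job {job_id} command 2 "pm process winbind restart"',
        f'job {job_id} comment "This job will restart winbind service once every day"',
        f"job {job_id} date-time {start_time} {start_date}",
        f"job {job_id} enable",
        f'job {job_id} name "Restart-Winbind"',
        f'job {job_id} recurring "{interval_seconds}"',
    ]
```

Calling `restart_job_commands(1, "2014/08/06")` reproduces the seven lines used here.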
Originally published as ‘Riverbed Steelhead Winbindd bug in RiOS 8.6’