48X locking up--excessive traffic?
Posted: Tue Feb 02, 2016 3:21 pm
We're trying to track down the root cause of our core 48X (we have 4 throughout the country, all set up for Multisite, the "core" is just the biggest one--about 180 users, 270 handsets) locking up about once a week lately. Version 7.6.14.6. The first time we experienced it was when one of the other systems had a hardware failure and we redirected inbound traffic from the SIP trunk of the failed system to our core system. I was able to telnet into the system (still hoping to redirect all of that Telnet output in some other way, I've got another thread about that) after restarting it and saw a ton of messages like the ones below:
tSip: +++ IEC [7.6.14.6:sipInvite.c,3832]
tSip: +++ IEC [7.6.14.6:sipInvite.c,3832]
tSip: +++ IEC [7.6.14.6:sipInvite.c,3832]
tSip: +++ IEC [7.6.14.6:ccbCall.c,1927]
tSip: +++ IEC [7.6.14.6:sipInvite.c,1630]
tSip: +++ IEC [7.6.14.6:sipCancel.c,311]
tSip: +++ IEC [7.6.14.6:sipCancel.c,311]
Pages and pages of them. At some point the web interface of the phone system stopped responding, phone call processing stopped, Telnet disconnected, but we could still ping the system. Only resolution was to bounce the system, and ultimately we redirected the failed system's trunk to another system and the problem went away.
Three times since then it's shown the same symptoms (except that we weren't telnetted in, so I can't verify those messages). We had one other instance today where I was telnetted in: we saw very similar symptoms (web interface hung, outbound calls were failing) and the same messages as above, page after page. I kept hitting Enter in the telnet window and could see that it was still responding, and then after about a minute everything started responding again (and the messages stopped).
We've had a lot of growth lately, and while we aren't maxed out to the licensed and stated abilities of the system, we're wondering if either we have too much usage on the 48X, if there's something that our SIP provider (Windstream) might be doing that is leading to this, or if it might be a DoS attack. We've had DoS attacks in the past (prior to the software upgrade that fixed the vulnerability), but those caused ping results to be spotty with high latency; ping has been consistent with no latency for these most recent issues.
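In case it helps anyone compare notes: once the telnet output is captured to a file, a tally of the IEC messages by source file and line number might show whether one call path dominates during a lockup. Below is a minimal sketch in Python. It only assumes the line format shown in the messages above; the names `IEC_LINE` and `tally_iec` are made up for illustration, and the sample is just the seven lines pasted earlier, not a full capture.

```python
import re
from collections import Counter

# Matches lines such as: tSip: +++ IEC [7.6.14.6:sipInvite.c,3832]
# Groups: task name, software version, source file, source line number.
IEC_LINE = re.compile(r"^(\w+): \+\+\+ IEC \[([\d.]+):([\w.]+),(\d+)\]")

def tally_iec(lines):
    """Count IEC messages per (task, source file, line number)."""
    counts = Counter()
    for line in lines:
        m = IEC_LINE.match(line.strip())
        if m:
            task, _version, src, lineno = m.groups()
            counts[(task, src, int(lineno))] += 1
    return counts

# Sample input: the seven messages quoted in this post.
sample = """\
tSip: +++ IEC [7.6.14.6:sipInvite.c,3832]
tSip: +++ IEC [7.6.14.6:sipInvite.c,3832]
tSip: +++ IEC [7.6.14.6:sipInvite.c,3832]
tSip: +++ IEC [7.6.14.6:ccbCall.c,1927]
tSip: +++ IEC [7.6.14.6:sipInvite.c,1630]
tSip: +++ IEC [7.6.14.6:sipCancel.c,311]
tSip: +++ IEC [7.6.14.6:sipCancel.c,311]
""".splitlines()

counts = tally_iec(sample)

# Print the most frequent message sources first.
for (task, src, lineno), n in counts.most_common():
    print(f"{n:5d}  {task}  {src}:{lineno}")
```

For a real capture, the same function could be fed an open file object instead of `sample`. What each IEC source location actually means is something only the vendor can say; the tally just shows which ones repeat.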