Hello,
We've seen something that has happened before, but not for a while. We had an outage this morning, and when I pulled the Windows (Server 2012) error logs, this is what I saw (attached), which I have seen before. Our SQL database is on another server, and when I ask, the SQL team says they don't see any errors at their end, so it's a bit confusing for sure. For reference, we are running CMOD 9.0.03 with a separate SQL 2012 instance in our high availability farm, and all are virtual servers. We also have the desktop client and use Content Navigator with the user base. In the attached sheet I've highlighted the time of the outage as approx. 09:53. Any insights would be much appreciated, especially on how I can get SQL errors while the DBAs swear they don't see a thing.
Cheers,
Bruce McHendry
Looks like you had some sort of networking hiccup. It says that it lost the network connection -- maybe there's a firewall that's dropping idle connections between CMOD and your database servers?
-JD.
Hi,
Thanks, I'll run it by one of my buddies on the network team. It's strange to see, in that I'm pretty sure we shouldn't be having that kind of issue, only because these servers all sit in the same data farm here in Waterloo. I wondered about the idle timeout stuff too, but according to the CMOD logs there was constant activity, so I'm surprised by that too. We had a bunch of instances like this last year too, and then it was like Harry Potter works here: days of dropped-connection SQL errors causing outages, and then it stops and we're good again. I've never been a big believer in self-healing issues, so I will see if maybe the network guys can put a sniffer on for a couple of days and see what happens. We (IBM) were indexing some huge AFP to PDF files and they take a long time to run. I wonder if maybe something hangs, or appears to hang, long enough to fool the server into thinking it's surpassed a timeout threshold?
Cheers,
Bruce
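While waiting on a proper packet capture, a lightweight connectivity probe run from the CMOD server can help line up network drops with the outage times in the logs. The following is only a minimal sketch under stated assumptions: the host name is hypothetical, port 1433 is assumed because it is the default for a default SQL Server instance, and the function names are made up for illustration. It opens a fresh TCP connection each time, so it shows whether the path to the database server is reachable at a given moment; it does not reproduce what happens to a long-lived or idle connection.

```python
import socket
import time
from datetime import datetime


def probe(host, port, timeout=3.0):
    """Try to open a TCP connection; return (ok, detail)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            elapsed_ms = (time.monotonic() - start) * 1000
            return True, f"connected in {elapsed_ms:.0f} ms"
    except OSError as exc:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False, f"{type(exc).__name__}: {exc}"


def watch(host, port, interval=5.0, rounds=None):
    """Probe repeatedly and print a timestamped line on every state change."""
    last_ok = None
    done = 0
    while rounds is None or done < rounds:
        ok, detail = probe(host, port)
        if ok != last_ok:
            stamp = datetime.now().isoformat(timespec="seconds")
            state = "UP" if ok else "DOWN"
            print(f"{stamp} {host}:{port} {state} ({detail})")
            last_ok = ok
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)


# Example call (hypothetical host name, assumed default SQL Server port):
# watch("sqlserver.example.local", 1433)
```

Printing only on state changes keeps the output short enough to compare by eye against the CMOD and Windows event log timestamps. A sniffer on the wire is still the better tool for seeing why a connection dropped.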
In reply to bruce.mchendry's post of November 10, 2015, 01:11:49 PM:
For what it's worth, I saw a memory leak when converting AFP to PDF with Xenos and loading the output into CMOD as PDF; it would cause our library server to die. The fix was addressed at the Xenos level.
Thanks. For sure we've had our challenges with the whole AFP to PDF thing, so it's worth a closer look. Thanks!
Not sure I would have arrived at this on my own, but it turned out to be a server issue in the ESX environment that our VMs are in. The server team changed the NIC settings and drivers. The process used was: replace the E1000 NIC with the VMXNET3 NIC type, which includes disabling NIC offloading on the VMXNET3 NIC as part of the replacement procedure. So it was a hardware issue.
Excellent. I'm glad to see this was fixed, and thank you for letting us know what the final resolution was. :)