Hello,
We've been having issues that cause CMOD to stop running. In this instance it's a remote SQL issue: Libsrvr stops and needs a manual restart. The business area has been pressuring me to monitor the app and get notified when it's down. I've spoken to our vendor, IBM, and they say it can't be done. Well, not officially, anyway. Has anyone else been able to set up a monitoring/alert service for CMOD? It sits on a Windows Server 2012 box, and the box itself is monitored through TSM, but I need to find out whether the app can be monitored as well. Is it possible?
Cheers
What's your enterprise monitoring solution? Tivoli / Nagios / etc.
-JD.
Hi there,
We have Tivoli running on the servers for both heartbeat and backups. The problem is that's all it does, and it's monitored by the server team. They won't touch anything at the app level, so that's where I'm stuck: trying to find a solution and then do a proof of concept, etc.
If ARSSOCKD goes down, can you have it send a page/email to your CMOD team and DBA team? That's what I've done in the past.
Many 4 a.m. pages :'( :'( :'(
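A minimal sketch of the page-on-ARSSOCKD-down idea, assuming the server process shows up in Windows' `tasklist` output as `arssockd.exe` and that alerts can go out through an internal SMTP relay. The process name, mail host, and addresses are hypothetical placeholders, not values from CMOD documentation. Parsing and message-building are kept separate from the side effects so they can be tested without a live server.

```python
"""Minimal ARSSOCKD liveness check for a Windows CMOD server (sketch).

Assumptions (hypothetical, adjust to your site): the server process image
is named "arssockd.exe"; alerts go out through an internal SMTP relay.
"""
import csv
import io
import smtplib
import subprocess
from email.message import EmailMessage

PROCESS_NAME = "arssockd.exe"          # hypothetical image name
SMTP_HOST = "smtp.example.internal"    # hypothetical mail relay
ALERT_TO = ["cmod-team@example.com", "dba-team@example.com"]  # hypothetical


def running_images(tasklist_csv: str) -> set[str]:
    """Parse `tasklist /FO CSV /NH` output into a set of lower-cased image names."""
    reader = csv.reader(io.StringIO(tasklist_csv))
    return {row[0].lower() for row in reader if row}


def is_running(tasklist_csv: str, name: str = PROCESS_NAME) -> bool:
    """True if the named process appears in the tasklist output."""
    return name.lower() in running_images(tasklist_csv)


def build_alert(host: str, name: str = PROCESS_NAME) -> EmailMessage:
    """Build the alert email for the CMOD and DBA teams."""
    msg = EmailMessage()
    msg["Subject"] = f"CMOD ALERT: {name} not running on {host}"
    msg["From"] = "cmod-monitor@example.com"  # hypothetical sender
    msg["To"] = ", ".join(ALERT_TO)
    msg.set_content(f"{name} was not found in the process list on {host}. Please investigate.")
    return msg


def check_and_alert(host: str) -> bool:
    """Run tasklist locally and email an alert if the process is missing.

    Returns True if the process is running, False if an alert was sent.
    """
    out = subprocess.run(
        ["tasklist", "/FO", "CSV", "/NH"],
        capture_output=True, text=True, check=True,
    ).stdout
    if is_running(out):
        return True
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(build_alert(host))
    return False
```

Scheduled every few minutes (for example with Windows Task Scheduler), `check_and_alert("CMODSRV01")` (hypothetical host name) would email the list whenever the process disappears; pointing the address at an SMS gateway is one way to turn that into an actual page.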
Thanks, I'll look into that! Then we'll have the team draw straws to see whose cell gets to be the alert target :-\ Cheers
We had a list.
Most cell phones have ignore buttons.
Guess who can't say no :D
:D Yeah, that's usually me too, but my wife usually helps change my mind :(
Ugh. It seems strange that an enterprise monitoring team would refuse to do app monitoring, since servers aren't of much use without the apps that run on them. I've seen stranger things though.
Your best bet might be to use something like arslog to do the heartbeat -- if there hasn't been a single message in an hour, page out for an investigation. It might be best to combine this with a simple arsdoc command that runs a query or retrieval every 15 or 20 minutes too.
-JD.
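The heartbeat and query suggestions could be sketched like this, assuming you already have some way to get the timestamp of the newest system-log message (for example by parsing arslog output). `ARSDOC_CMD` is a hypothetical placeholder: the real arsdoc query arguments (server, user, folder, search criteria) depend on your setup and are deliberately not filled in here.

```python
"""Sketch of two CMOD health checks.

1. Heartbeat: if the newest CMOD system-log message (for example, pulled
   with arslog) is older than MAX_SILENCE, page someone.
2. Synthetic transaction: run an arsdoc query on a schedule and treat a
   non-zero exit code, a timeout, or a missing executable as a failure.
"""
import subprocess
from datetime import datetime, timedelta

MAX_SILENCE = timedelta(hours=1)
# Hypothetical placeholder: append your own arsdoc query arguments.
ARSDOC_CMD = ["arsdoc", "query"]


def heartbeat_stale(last_message: datetime, now: datetime,
                    max_silence: timedelta = MAX_SILENCE) -> bool:
    """True if no system-log message has been seen within max_silence."""
    return now - last_message > max_silence


def synthetic_query_ok(cmd: list[str] = ARSDOC_CMD, timeout_s: float = 120.0) -> bool:
    """Run the query command; success means it exits 0 within timeout_s seconds."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # Hung query or command not found both count as a failed check.
        return False
    return result.returncode == 0
```

Running `synthetic_query_ok()` from a scheduler every 15 or 20 minutes, and paging on either a stale heartbeat or a failed query, covers both the "server is silent" and the "server is up but not answering" cases.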
Appreciate the ideas! Now I have something to take back to the team for discussion. It's been frustrating, for sure! Our infrastructure guys will monitor any piece of hardware you want, but as soon as it involves an app they're hands-off. Cheers
Quote from: Justin Derrick on November 18, 2014, 08:06:53 PM
Ugh. It seems strange that an enterprise monitoring team would refuse to do app monitoring, since servers aren't of much use without the apps that run on them. I've seen stranger things though.
Your best bet might be to use something like arslog to do the heartbeat -- if there hasn't been a single message in an hour, page out for an investigation. It might be best to combine this with a simple arsdoc command that runs a query or retrieval every 15 or 20 minutes too.
-JD.
I've seen this done in a large distributed CMOD environment, and it worked PERFECTLY.
Thanks, guys! Got a project meeting coming up in an hour, and this will be on the agenda. Cheers
Just a little background on this.
We had a bad TSM sector back in 2006 on our contingency TSM node (when I was still in college ;D), and the deal was that every so often Tivoli would try to retrieve a random document from a random set of documents with a VERY tight SLA; if the retrieval failed, we'd simply pull the document from the prod system and load it. It worked really well for those "extremely time-sensitive, severity 1, code blue, call the president if the file isn't there" kinds of things.
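That random-retrieval spot check can be sketched generically: pick a random document from the tight-SLA set, time the retrieval, and flag it if it fails or runs past the SLA. `retrieve` stands in for whatever actually fetches a document (a wrapper around an arsdoc retrieve, for example); that wiring, the document IDs, and the SLA value are all assumptions. Reloading a failed document from prod would be a separate remediation step and is not shown.

```python
"""Sketch of a random-retrieval spot check against a tight-SLA document set.

`retrieve` is any callable that fetches one document by a (hypothetical)
identifier and raises on failure.
"""
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class SpotCheck:
    doc_id: str
    ok: bool
    seconds: float
    error: Optional[str] = None


def spot_check(doc_ids: Sequence[str], retrieve: Callable[[str], object],
               sla_seconds: float, rng: Optional[random.Random] = None) -> SpotCheck:
    """Pick one document at random, time its retrieval, and judge it against the SLA."""
    chooser = rng if rng is not None else random
    doc_id = chooser.choice(doc_ids)
    start = time.monotonic()
    try:
        retrieve(doc_id)
    except Exception as exc:  # any failure counts as a missed check
        return SpotCheck(doc_id, False, time.monotonic() - start, repr(exc))
    elapsed = time.monotonic() - start
    if elapsed > sla_seconds:
        return SpotCheck(doc_id, False, elapsed, f"exceeded SLA of {sla_seconds}s")
    return SpotCheck(doc_id, True, elapsed)
```

Passing a seeded `random.Random` makes the choice reproducible for testing; in production the default module-level generator is fine.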
Good to know thanks !
Thanks for the suggestions! I'm working with my developer and IBM to see what the best solution is. I also found out there's an in-house product called HP SiteScope that another area has been using for monitoring and alerts, so I'll be looking into that too. Cheers
Check out Reveille too.
Will do, thanks!
Hi,
IBM has an ECMSM (Enterprise Content Management System Monitor). ECMSM provides a function called "Task Execution Manager", which checks whether the ARSSOCKD process is alive and, if it isn't, executes a command to restart it. Obviously, all of this must be configured manually.
regards,
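If ECMSM isn't available, a homegrown watchdog along the same lines (check whether the server is alive, restart it if not) might look like this. It is not ECMSM. It assumes the CMOD library server runs as a Windows service; `SERVICE_NAME` is a hypothetical placeholder, while `sc query` and `sc start` are the standard Windows Service Control commands. Starting a service normally requires administrator rights.

```python
"""Homegrown watchdog approximating the check-and-restart behaviour described
for ECMSM's Task Execution Manager (this is NOT ECMSM itself).

Assumption: the CMOD library server is registered as a Windows service.
SERVICE_NAME is a hypothetical placeholder; look up the real name with
`sc query` or services.msc on your server.
"""
import re
import subprocess
from typing import Optional

SERVICE_NAME = "YourCmodLibSrvrService"  # hypothetical placeholder

# Matches lines such as "STATE              : 4  RUNNING".
_STATE_RE = re.compile(r"STATE\s*:\s*\d+\s+(\w+)")


def service_state(sc_query_output: str) -> Optional[str]:
    """Extract the state word (RUNNING, STOPPED, ...) from `sc query` output."""
    match = _STATE_RE.search(sc_query_output)
    return match.group(1) if match else None


def ensure_running(name: str = SERVICE_NAME) -> str:
    """Query the service; if it is STOPPED, ask the Service Control Manager to start it."""
    out = subprocess.run(["sc", "query", name], capture_output=True, text=True).stdout
    state = service_state(out)
    if state == "STOPPED":
        subprocess.run(["sc", "start", name], capture_output=True, text=True)
        return "restart requested"
    return state or "unknown"
```

Whether an automatic restart is wise is a judgment call: if the remote SQL database is still down, the restart will likely fail again, so it is worth sending an alert alongside the restart attempt.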
Thanks, I will look into that as well. Appreciate the suggestion.
Bruce, monitoring question aside, what is causing the server to stop running so often?
Hi,
The issue we've been seeing is primarily due to a configuration our architects have asked us to use: the back-end SQL DBs are remote, not local. Our SQL DBAs don't support local ones, so we have to use remote to get support. There have been some issues with that, since we are still in stage and technically Dev. There have been a couple of reboots to apply zero-day security patches, an accidental port change, a password change on a domain account that got it locked out, and so on. Very educational all around, but at least it's pre-production, so we can get all this ironed out, along with things like figuring out monitoring. It's been messy in Dev, and every SQL incident manifests itself as the Libsrvr service stopping. Cheers
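Since most of these incidents start on the SQL side, a cheap extra check is whether the remote SQL Server port is reachable at all, which would catch an accidental port change or a network problem before Libsrvr falls over. A sketch follows; the host name is hypothetical, and 1433 is only SQL Server's default port for a default instance, so yours may differ. It will not detect a locked-out domain account; that needs a real login attempt.

```python
"""Sketch: check that the remote SQL Server accepts TCP connections.

Assumptions: SQL_HOST is hypothetical; 1433 is SQL Server's default port,
but your instance may listen elsewhere.
"""
import socket

SQL_HOST = "sqlserver.example.internal"  # hypothetical
SQL_PORT = 1433


def tcp_reachable(host: str = SQL_HOST, port: int = SQL_PORT, timeout_s: float = 5.0) -> bool:
    """True if a TCP connection to host:port can be opened within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Refused, timed out (socket.timeout is an OSError), or DNS failure.
        return False
```

Run alongside the Libsrvr check, a failure here points the page at the DBA team rather than the CMOD team.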