CMOD monitoring and alerts


bruce.mchendry

Hello,
We've been having issues that cause CMOD to stop running. In this instance it's a remote SQL issue, and Libsrvr stops and needs a manual restart. The business area has been pressuring me to monitor the app and get notified if it's down. I've spoken to our vendor, IBM, and they say it can't be done. Well, not officially anyway. Has anyone else been able to set up a monitoring/alert service for CMOD? It sits on a Windows 2012 Server, and the box itself is monitored through TSM, but I need to see whether the app can be monitored as well. Is it possible?
Cheers

Justin Derrick

What's your enterprise monitoring solution?  Tivoli / Nagios / etc.

-JD.
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Webinars:  https://CMOD.Training/
IBM CMOD Professional Services: https://CMOD.cloud

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

bruce.mchendry

Hi there,
We have Tivoli running on the servers for both heartbeat and backups. The problem is that's all it does, and it's monitored by the server team. They won't touch anything at the app level, so I'm stuck trying to find a solution myself and then do a proof of concept, etc.

jeffs42885

If ARSSOCKD goes down, can you have it send a page/email to your CMOD team and DBA team? That's what I've done in the past.

Many 4am pages  :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'( :'(
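A minimal sketch of that kind of "is it down?" check, assuming the CMOD server runs as a Windows service (the original post mentions Windows 2012). The service name `ARSSOCKD` below is a hypothetical placeholder for whatever your library server service is actually called, and `notify` is a hypothetical hook for your paging or e-mail gateway. The script only parses the output of the standard Windows `sc query` command and alerts if the service isn't RUNNING; you'd schedule it every few minutes with Task Scheduler.

```python
import subprocess
import sys

# Hypothetical placeholder: replace with the actual Windows service name
# of your CMOD library server (check services.msc or `sc query`).
SERVICE_NAME = "ARSSOCKD"


def is_running(sc_output: str) -> bool:
    """Return True if `sc query` output reports the service as RUNNING."""
    for line in sc_output.splitlines():
        # sc query prints a line such as: "        STATE              : 4  RUNNING"
        if line.strip().startswith("STATE"):
            return "RUNNING" in line.upper()
    # No STATE line usually means the service name was not found.
    return False


def query_service(name: str) -> str:
    """Run `sc query <name>` (Windows only) and return its text output."""
    result = subprocess.run(["sc", "query", name], capture_output=True, text=True)
    return result.stdout


def notify(message: str) -> None:
    # Hypothetical hook: wire this to your paging/e-mail gateway.
    print(message, file=sys.stderr)


if __name__ == "__main__" and sys.platform == "win32":
    if not is_running(query_service(SERVICE_NAME)):
        notify(f"CMOD service {SERVICE_NAME} is not running")
```

Sending the alert to both the CMOD and DBA teams is then just a matter of what `notify` does.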

bruce.mchendry

Thanks, I'll look into that! Then we'll have the team draw straws to see whose cell gets to be the alert target :-\ Cheers

jeffs42885

We had a list.

Most cell phones have ignore buttons.

Guess who can't say no :D

bruce.mchendry

:D Yeah, that's usually me too, but my wife usually helps change my mind :(

Justin Derrick

Ugh.  It seems strange that an enterprise monitoring team would refuse to do app monitoring, since servers aren't of much use without the apps that run on them.  I've seen stranger things though.

Your best bet might be to use something like arslog to do the heartbeat -- if there hasn't been a single message in an hour, page out for an investigation.  It might be best to combine this with a simple arsdoc command that runs a query or retrieval every 15 or 20 minutes too.
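Here's one way the arslog heartbeat and a periodic arsdoc probe could be wired together, as a hedged sketch rather than a tested setup. It assumes you've customized the arslog exit to touch a heartbeat file every time CMOD writes a system log message (the path in `HEARTBEAT_FILE` is hypothetical), and the instance, user, folder, and arsdoc flags in `PROBE_CMD` are assumptions to check against the arsdoc documentation for your CMOD version. `check_once` returns a list of problems; an empty list means both checks passed.

```python
from __future__ import annotations

import os
import subprocess
import time

# Hypothetical path: assumes the arslog user exit has been customized to
# touch this file each time CMOD writes a system log message.
HEARTBEAT_FILE = r"C:\cmod\monitor\arslog_heartbeat.txt"
MAX_SILENCE_SECONDS = 60 * 60  # page if there's been no log message in an hour

# Placeholder probe: a simple arsdoc query. Instance, user, password, folder,
# and the flags themselves are assumptions; verify them for your version and
# narrow the query so it only returns a handful of rows.
PROBE_CMD = [
    "arsdoc", "query",
    "-h", "ARCHIVE",           # hypothetical instance name
    "-u", "monitor_user",      # hypothetical read-only user
    "-p", "monitor_password",  # hypothetical password
    "-f", "MONITOR_FOLDER",    # hypothetical folder
]


def heartbeat_is_stale(path: str, max_age: float, now: float | None = None) -> bool:
    """True if the heartbeat file is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError:
        return True  # a missing or unreadable file counts as a failed heartbeat
    return age > max_age


def probe_ok(cmd: list[str], timeout: float = 120) -> bool:
    """Run a probe command; True only if it exits 0 within the timeout."""
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        return False  # command missing, not runnable, or hung
    return result.returncode == 0


def check_once(now: float | None = None) -> list[str]:
    """Run both checks and return a list of problems (empty means healthy)."""
    problems = []
    if heartbeat_is_stale(HEARTBEAT_FILE, MAX_SILENCE_SECONDS, now):
        problems.append("no arslog activity in the last hour")
    if not probe_ok(PROBE_CMD):
        problems.append("arsdoc query probe failed")
    return problems
```

You'd run `check_once()` from Task Scheduler every 15 or 20 minutes and page out if the list isn't empty. Keeping the two checks separate means the alert text tells you which signal failed.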

-JD.

bruce.mchendry

Appreciate the ideas! Now I have something to take back to the team for discussion. It's been frustrating for sure! Our infrastructure guys will monitor any piece of hardware you want, but as soon as it involves an app they're hands-off. Cheers

jeffs42885

Quote from: Justin Derrick on November 18, 2014, 08:06:53 PM
Ugh.  It seems strange that an enterprise monitoring team would refuse to do app monitoring, since servers aren't of much use without the apps that run on them.  I've seen stranger things though.

Your best bet might be to use something like arslog to do the heartbeat -- if there hasn't been a single message in an hour, page out for an investigation.  It might be best to combine this with a simple arsdoc command that runs a query or retrieval every 15 or 20 minutes too.

-JD.

I've seen this done in a large distributed CMOD environment, and it worked PERFECTLY.

bruce.mchendry

Thanks, guys! Got a project meeting coming up in an hour, and this will be on the agenda. Cheers

jeffs42885

Just a little background on this.

We had a bad TSM sector back in 2006 on our contingency TSM node (when I was still in college ;D), and the deal was that every so often Tivoli would try to retrieve a random document from a set of documents with a VERY tight SLA; if the retrieval failed, we'd simply pull the document from the prod system and load it. It worked really well for those "extremely time-sensitive, severity 1, code blue, call the president if the file isn't there" kinds of things.
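The random spot-check described above could look something like this minimal sketch. `retrieve` is a callable you'd supply (for example, one that shells out to an arsdoc retrieval and checks the exit code); it, `spot_check`, and the document IDs are hypothetical placeholders, not part of CMOD or Tivoli. Any IDs returned are the ones to re-pull from prod and reload.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence


def spot_check(
    doc_ids: Sequence[str],
    retrieve: Callable[[str], bool],
    sample_size: int = 1,
    rng: random.Random | None = None,
) -> list[str]:
    """Retrieve a random sample of SLA-critical documents; return the IDs that failed."""
    rng = rng or random.Random()
    # Never ask for more documents than exist in the list.
    sample = rng.sample(list(doc_ids), k=min(sample_size, len(doc_ids)))
    failed = []
    for doc_id in sample:
        try:
            ok = retrieve(doc_id)
        except Exception:
            ok = False  # treat any retrieval error as a failure to alert on
        if not ok:
            failed.append(doc_id)
    return failed
```

Sampling a different document each run spreads the checks across the whole set over time, which is the point of picking at random rather than always retrieving the same known-good document.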

bruce.mchendry

Thanks for the suggestions! I'm working with my developer and IBM to figure out the best solution. I also found out that another area in-house has been using a product called HP SiteScope for monitoring and alerts, so I'll be looking into that too. Cheers

jeffs42885