Hi Guys,
Need some advice on the arsmaint functionality.
Recently, we found out that some files being sent to the CMOD server are around 3GB each. They are sent on a bi-weekly basis, and since then I have noticed arscache getting bigger and bigger until it reached 90%, although I can see that arsmaint is running for some app grps.
I wonder if the 3GB file is causing the slowness in arsmaint. Is it advisable to split the 3GB file into 500MB pieces to speed up arsmaint?
Server Config:
CMOD: 9.0.0.6
DB2: 9.7
TSM: 6.5
/arscache - 50GB
Thanks in Advance! :)
-Gab
			
			
			
Hi Gab,
What file type are you storing in CMOD?
In the case of a text report (line data), although your file is 3GB, the index/field that you capture in the report breaks the file down into small pieces when it is loaded into CMOD. For example, if you capture the branch no. as an index/field, the arsload command will split your report file per branch before storing it in CMOD. So the 3GB file size may not affect your ARSMAINT schedule.
What are your ARSMAINT schedule options?
			
 
			
			
				Slowness in arsmaint is normal and expected as your system grows.  There are also a few things that can cause arsmaint to take a long time...  Loading millions / billions of small files results in millions/billions of small files and database entries if your Expiration Type setting in the Application Group is Load, or if you have a particularly enormous Application Group whose Expiration Type is set to Document.
For one system I worked on, just a few configuration mistakes, combined with the size of their system, meant that arsmaint took more than 24 hours to complete -- so instead of scheduling it to run nightly, they just left it running in a loop -- 24x7x365.
It's not optimal, but it doesn't really hurt anything.
-JD.
			
			
			
				Hi Teeraw,
The server is processing ARD, PDF, XLS and AFP. 
The 3GB file is an ARD file; it loaded after 800 secs with only 15 rows. The schedule is every 3 hrs on a daily basis (3AM to 9PM). Thanks for the insights.
Hi Justin,
I understand your points. What you describe is also a mistake I have found in my system: I discovered that some of the app grps are set to "no expiry" in the cache, so I have to stop loading into those app grps and create new ones with the correct expiry.
I guess the best option now is to add more disk space to the cache to compensate for the mistakes in the setup. With the daily influx of data that the server is receiving, arsmaint will really take time to free up space.
Thanks Again.
-Gab
			
			
			
				It depends on which arguments you are using with arsmaint, AND how you configured your application group (mainly the expiration type).
If you have expiration type = Document... then you will experience a HUGE slowdown of arsmaint...
So if you can give us more information on your setup, it will help us understand what else can influence the speed of arsmaint, in addition to what Justin and Teeraw just said.
Regards,
Alessandro
			
			
			
				Hi Alessandro,
I have attached the settings of the app grp.
I am executing this arsmaint:
arsmaint -n 40 -x 60 -cdeimrsv
executed every day at 6AM and 6PM
By the way, we are planning to keep data in the cache for only 1-2 days, since we have a TSM that is using SAN. With this approach, is there any slowness we should expect when running arsmaint and retrieving data from TSM?
Our CMOD and TSM are on different boxes.
Lastly, I need your advice on choosing between CMOD 9.5.0.3 and 9.5.0.7: which have you found to be more stable, with fewer bugs?
Thanks again.
-Gab
			
			
			
				Quote from: Gabriel Antonio on February 10, 2017, 02:10:33 AM
I am executing this arsmaint:
arsmaint -n 40 -x 60 -cdeimrsv
executed every 6AM and 6PM
So basically, you are using arsmaint with every option possible together?
- Expiration of indexes (-d)
- Expiration of cache (-c, -n, -x)
- Migration cache to TSM (-m)
- Migration of indexes tables to TSM (-e)
- Expiration of indexes tables (-i)
- Reorg of indexes tables (-r)
- Check of cache + Statistics (-s, -v)
So as you can see, you are doing 7 different actions with only 1 command...
And then you ask why it is slow?
I cannot answer your question. But I can help you see which one of these 7 actions is taking the most time.
THEN and only THEN, it is possible to have an idea of what could be the cause.
From what I see, you don't need options -e and -i, since you are not using "Migration of Indexes".
Here is a little script that could help you do that:
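#!/bin/ksh
# Run each arsmaint action on its own so the timestamps show how long each one takes.
# Migrate cache data to TSM
echo "$(date) - Start migration Cache to TSM"
arsmaint -m
echo "$(date) - End migration Cache to TSM"
# Expire cache data (same -n / -x thresholds as the original command)
echo "$(date) - Start Expiration Cache"
arsmaint -c -n 40 -x 60
echo "$(date) - End Expiration Cache"
# Check the cache and print statistics
echo "$(date) - Start checking Cache integrity"
arsmaint -sv
echo "$(date) - End checking Cache integrity"
# Expire indexes
echo "$(date) - Start Expiration of Indexes"
arsmaint -d
echo "$(date) - End Expiration of Indexes"
# Reorganize the index tables
echo "$(date) - Start Database Reorg"
arsmaint -r
echo "$(date) - End Database Reorg"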
I'll let you put that script into a nicer form and send the output to some log files.
It does the same work as your current command (minus -e and -i), but you will be able to see each step easily and, more importantly, how much time each step consumes.
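One possible way to put the script above into a "nicer form" is sketched here. It is only a sketch under assumptions: the run_step helper and the LOG path are made-up names, the arsmaint options are simply the ones from the script above, and it assumes your date command supports +%s (in ksh the built-in SECONDS variable could be used instead). It records the elapsed seconds and the exit code of each step.

```shell
#!/bin/sh
# Sketch only: run_step and LOG are hypothetical names, not part of CMOD.
LOG="${LOG:-/tmp/arsmaint_steps.log}"

# run_step LABEL COMMAND [ARGS...]
# Runs the command, then logs how long it took and its exit code.
run_step() {
    label="$1"
    shift
    start=$(date +%s)   # assumes date supports %s (seconds since epoch)
    echo "$(date) - Start $label" | tee -a "$LOG"
    "$@"
    rc=$?
    end=$(date +%s)
    echo "$(date) - End $label (elapsed $((end - start))s, rc=$rc)" | tee -a "$LOG"
    return "$rc"
}

# Only call arsmaint where it is actually installed.
if command -v arsmaint >/dev/null 2>&1; then
    run_step "migration Cache to TSM"   arsmaint -m
    run_step "Expiration Cache"         arsmaint -c -n 40 -x 60
    run_step "checking Cache integrity" arsmaint -sv
    run_step "Expiration of Indexes"    arsmaint -d
    run_step "Database Reorg"           arsmaint -r
fi
```

With one "End ..." line per step in the log, the slowest step should stand out after a single run.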
The other way to do it would be to look in the "System Log", search for the userid "ARSMAINT", and analyse the log entries to understand what each step is doing and what is taking so long.
My guess is that option -m is taking a lot of time, but that's only a guess... only by measuring it can you be sure.
Concerning the 3GB files... what are these files? Is it 1 file with 1 index? Or is it 1 file that ends up being indexed by CMOD with the ACIF indexer?
Or is it a file that comes with a generic index, which splits it into multiple indexes for CMOD?
Can you look in the cache at the size of the objects it produced? Is it still 3GB, or is it less?
Quote
Lastly, I need your advice on choosing between CMOD 9.5.0.3 and 9.5.0.7: which have you found to be more stable, with fewer bugs?
You tell me... here are the bug fixes between 9.5.0.3 and 9.5.0.7:
    2.2.9.5.0.4) Release (9.5.0.4)
      PI45715 - BURST COLUMN IN JES OUTPUT QUEUE NOT SHOWING VALUE OF YES
      PI46497 - THE MESSAGE ARS1159E DOES NOT SHOW THE NAME OF THE MISSING
                OBJECT
      PI46630 - Julian dates not calculated correctly for leap years
      PI46714 - CD-ROM ARSDD FAILS WITH ERROR "UNABLE TO REGISTER THE RESOURCE"
      PI48540 - arsload fails to load a generic index file w LO support
      PI47061 - ODF - ERRONEOUSLY MARKING THE DISTRIBUTION COMPLETE
      PI49820 - Upper case .IND in file causes ARSLOAD to fail
      PI52114 - ARSEXOAM FAILING AFTER UPGRADE TO CONTENT MANAGER ONDEMAND FOR
                Z/OS VERSION 9.5
      PI52282 - Use of Document Size in an Application Group can cause
                duplicate rows to be loaded
      PI56823 - TSM filespace prefixed w/ARCHIVE for default ARCHIVE instance
    2.2.9.5.0.5) Release (9.5.0.5)
      PI53798 - Load fails with ARS1127E message
      PI55412 - Load fails with ARS1176E message after applying the fix for
                PI52559
      PI57099 - Crash may occur when segment date is date(old style) and
                incorrect format is used to specify the segment date range in
                the -S option
      PI57664 - ACIF seg fault when collecting > 65k overlays in a res file
    2.2.9.5.0.6) Release (9.5.0.6)
      PI51782 - ODF Distributions not processed due to ARS1607E error in
                ARSRPSUB
      PI58677 - ODF Manifest not printing properly
      PI59581 - Invalid parameter list passed to the arslog exit
      PI59697 - ODF is changing the OnDemand report LRECL from 673 to 32753,
                causing a PAGEDEF/FORMDEF transform problem
      PI60252 - CMOD Document graphical annotation attributes not supported
      PI60746 - Character set name changed in MCF structured field
      PI62112 - PDF indexing using PDF metadata is changing metadata values
                of the ingested document
      PI62180 - Segmentation fault in arspdoci
      PI62221 - Cannot load into the system documents that are produced under
                UNIX related systems
      PI62797 - ARSLOAD running as a started task (STC) terminates with
                ARS4328E ARSSAPIR failed
      PI62843 - Error 125 with errno 183 "Unable to create symbolic link from
                file"
      PI63284 - ARSMAINT delete reports more deleted rows in message 84 than
                were loaded
    2.2.9.5.0.7) Release (9.5.0.7)
      PI65405 - When using arsload at time zone change (daylight saving time)
                in UTC+0, the time stored in database is invalid. Therefore
                documents cannot be retrieved
      PI65699 - Separate sysout banner dataset allocated with wrong parms
      PI68622 - PDF Indexer indexes each page as a separate document using PPDs
      PI64639 - 9.5 PDF Indexer performance degrades when PDF document is very
                large
      PI67507 - Defunct process buildup in arsload
      PI70021 - PDF Indexer skipping Adobe PDF/UA records
      PI70830 - arsload throws ARS4091I about absence of PDF Indexer although
                the application specifies XENOS Indexer
Do you want fixed code that is new, or code that is 18 months old?
On top of that, if you have a problem with 9.5.0.3, IBM Support will first ask you to upgrade to 9.5.0.7 and confirm that the problem is still there; if it is, they will create a fix based on 9.5.0.7.
So in my own experience: don't try to be smart here, simply go to the latest version possible. If you have a problem with the latest FP, IBM will be way, way more reactive in solving it asap.
If you are using an old version, they first need to build an environment at that version to test, and then at some point they will check whether it is also a problem with the latest fix pack...
meaning more time lost, plus all the discussion about trying the latest FP on your side, etc.
So if you are upgrading from V9 to V9.5, there is no reason to hesitate: just go to the latest V9.5 FP.
I know for a fact that some integrators and some customers don't trust FPs and stay with the vanilla CMOD version. They never apply fix packs, they run it for years, they have problems, and every time I was involved it was a pain...
My advice: try to stay, at least with fix packs, as close as possible to the latest.
For each new CMOD version, try it as soon as possible in your dev and test systems to make sure it works without breaking anything.
For production... I would wait for the first or second fix pack...
I am quite conservative here.
In all cases, before putting any version into production, the FP or the new version MUST be tested first in your test environment.
Again, IBM always responds quickest with fixes if the customer is using the latest version with the latest fix pack.
After that, it is your own choice.
I hope I could be of some help.
			
 
			
			
				Many thanks, Alessandro. The information you shared greatly helped my understanding.
Cheers!
-Gabriel
			
			
			
				Quote from: Justin Derrick on January 17, 2017, 10:31:55 AM
Slowness in arsmaint is normal and expected as your system grows.  There are also a few things that can cause arsmaint to take a long time...  Loading millions / billions of small files results in millions/billions of small files and database entries if your Expiration Type setting in the Application Group is Load, or if you have a particularly enormous Application Group whose Expiration Type is set to Document.
For one system I worked on, just a few configuration mistakes, combined with the size of their system, meant that arsmaint took more than 24 hours to complete -- so instead of scheduling it to run nightly, they just left it running in a loop -- 24x7x365.
It's not optimal, but it doesn't really hurt anything.
-JD.
Justin - what you are saying here is interesting. I thought that running arsmaint, at least for db expiration, is not recommended while loading is in progress. So they remove data and load at the same time? With no harm? I know it's not optimal, but I once got feedback from IBM not to do that. I don't know whether that was based on their experience or was simply the standard recommendation.
			
 
			
			
				Hi Maciej!
It's not recommended, but when your expiration process takes more than 24 hours to complete, what are you going to do?  :)
It would have made more sense to reload certain groups of data and turn their Expiration Type to Segment, but with WORM disk as a requirement, that would cost them a lot of money in wasted storage...
I certainly don't recommend this solution long term.  :)
-JD.
			
			
			
				Hi Guys,
Just to give an update.
We raised a case with IBM, and they asked us to check whether there are zero-byte files in our arscache. Sadly, we found approximately 100 such files scattered within our arscache folders.
The files are dated OCT-26-2016, which coincidentally is the date when our TSM started to receive less data from CMOD. Before Oct-26, our TSM was receiving 20-30GB per arsmaint run; from OCT-26 up to today, TSM is only receiving 2-3GB of data from CMOD.
Has anyone experienced the same kind of CMOD and TSM behavior? Just would like to confirm.
Our next step is to remove these zero-byte files. Has anyone done a removal of such files?
TIA!
-Gab
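For anyone who wants to run the same check, a minimal sketch for listing zero-byte files is shown here. It makes assumptions: list_zero_byte_files is a made-up helper name, and /arscache is simply the cache filesystem named in the first post. It only lists the files with their dates and deletes nothing; any removal is best agreed with IBM Support first.

```shell
#!/bin/sh
# Sketch only: list_zero_byte_files is a hypothetical helper name.
# It only lists files; it never deletes anything.
list_zero_byte_files() {
    # -type f: regular files only; -size 0: files that occupy zero blocks (empty)
    find "$1" -type f -size 0 -exec ls -l {} \;
}

# /arscache is the cache filesystem named in the first post.
CACHE_DIR="${CACHE_DIR:-/arscache}"
if [ -d "$CACHE_DIR" ]; then
    list_zero_byte_files "$CACHE_DIR"
fi
```

The ls -l output includes the modification date, which makes it easy to see whether the empty files cluster around one day, as they did here.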
			
			
			
				Hi Gabriel.
That's a tough one.  Are there any related errors in the system log, or in TSM's Activity Log?  It seems like the problem with maintenance might be related to how much (or little) data is being migrated from the CMOD cache into TSM.  Maybe consider turning on 'Migrate on Load' for an Application Group, and monitoring it carefully to see if there's an issue?
-JD.
			
			
			
				Hi JD,
These are the 2 errors that I see in the system log that are related to arsmaint.
150        Unable to allocate enough memory.  File=arssmcac.c, Line=3484  Srvr->a03gondemand01a 10.200.14.27 non-SSL<-
113         Cache Migration Failed: ApplGroup(GGIDP60) Agid(26409) ObjName(67775FAAA) Date(2016-10-25) InternalDate(17100) 
Other errors are:
13         DB Error: [IBM][CLI Driver][DB2/AIX64] SQL0911N  The current transaction has been rolled back because of a deadlock or timeout.  Reason code "68".  SQLSTATE=40001  -- SQLSTATE=40001, SQLCODE=-911, File=arsseg.c, Line=8246
Do you see anything suspicious? 
TIA!
-Gab
			
			
			
				Well, those are all pretty bad.  You'll have to address each of them individually.
-JD.
			
			
			
				Hi,
This shows that concurrent DB2 work (arsload?) causes a timeout.
13         DB Error: [IBM][CLI Driver][DB2/AIX64] SQL0911N  The current transaction has been rolled back because of a deadlock or timeout.  Reason code "68".  SQLSTATE=40001  -- SQLSTATE=40001, SQLCODE=-911, File=arsseg.c, Line=8246
This could be one of the reasons for the bad performance. In our shop a timeout occurs after 15 seconds. You can imagine the impact if such an error occurs often.
regards
Egon
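In SQL0911N, reason code 68 means a lock timeout and reason code 2 means a deadlock, so the message above points to a timeout rather than a true deadlock. To see how often each case occurs, something like the sketch here could help. It assumes the System Log has been exported to a plain-text file with one message per line; count_sql0911_reasons is a made-up helper name, and the file name in the usage line is hypothetical too.

```shell
#!/bin/sh
# Sketch only: count_sql0911_reasons is a hypothetical helper, and the input is
# assumed to be a plain-text export of the System Log, one message per line.
count_sql0911_reasons() {
    # Keep only SQL0911N messages, pull out the reason code, then count each code.
    grep 'SQL0911N' "$1" |
        sed -n 's/.*Reason code "\([0-9][0-9]*\)".*/\1/p' |
        sort | uniq -c
}

# Usage (hypothetical file name):
#   count_sql0911_reasons systemlog_export.txt
```

Each output line is a count followed by a reason code, so a large count next to 68 would suggest that lock timeouts against concurrent work are frequent.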