arsdoc retrieve too slow: AIX 7.1, CMOD 9.5, DB2 10.5

jfgonzalez · November 19, 2019, 09:41:30 PM

Hello,

We have 3 nearly identical application groups: same compression ratio, similar structure, cache storage only, a similar number of documents, and none of them has segment tables.

Each of them has one associated application. Two of the applications take 5 seconds to retrieve each document, and the third takes 15 seconds (this is the most noticeable one). It doesn't matter whether it's the web client, the thick client, or retrieval with arsdoc.

We're currently extracting the reports from one environment to another, using arsdoc get with retrieval by LoadID.
There are around 4 million documents in each application, so you can imagine how long it takes to retrieve the documents.
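
To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes the documents are retrieved strictly one after another at the per-document times quoted above (5 and 15 seconds); the function name is only illustrative, and any parallelism or batching would change the result.

```python
SECONDS_PER_DAY = 86_400


def sequential_days(doc_count, seconds_per_doc):
    """Days needed to retrieve doc_count documents one at a time."""
    return doc_count * seconds_per_doc / SECONDS_PER_DAY


if __name__ == "__main__":
    # 4 million documents per application, at the two observed speeds.
    for secs in (5, 15):
        days = sequential_days(4_000_000, secs)
        print(f"{secs} s/doc -> {days:.1f} days")
```

Under those assumptions, each application would take months to extract, which is why the per-document retrieval time matters so much here.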

In the 3 application groups the search fields are set to filter.

What would you recommend to enhance the retrieval time?

Thanks in advance.

jsquizz · November 20, 2019, 06:12:47 PM

What kind of data is it? PDF? AFP?

jfgonzalez · November 20, 2019, 07:22:16 PM

All 3 applications contain PDF files.

jsquizz · November 20, 2019, 08:29:53 PM

I figured.

Are all of the PDF input files structured exactly the same? Here's a possible scenario I'm thinking of:

Application ABC with app group XYZ retrieves fast, because it's a simple PDF: no colors, no images, not a lot of resources.
Application DEF with app group LMO retrieves slowly, because its PDFs are complex, with lots of colors and possibly lots of resources.

Here's a good start:

Retrieve a "fast" (aka normal) document via arsdoc, and a "slow" one, and note the times. Then use the system log to find the original load times and see whether those documents were also quick or slow to load.

I would imagine there would be some kind of correlation.
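
A minimal sketch of that timing comparison follows. It is a generic wall-clock harness, not anything CMOD-specific: `time_command` and `compare` are hypothetical helper names, and the two commands in the demo are harmless stand-ins. In practice you would pass your own complete arsdoc command lines (one for a fast document, one for a slow one) as the argument lists.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare(fast_argv, slow_argv, repeats=3):
    """Return (best fast time, best slow time) over a few runs each."""
    fast = min(time_command(fast_argv) for _ in range(repeats))
    slow = min(time_command(slow_argv) for _ in range(repeats))
    return fast, slow


if __name__ == "__main__":
    # Stand-in commands; replace with your own retrieval command lines.
    quick = [sys.executable, "-c", "pass"]
    slower = [sys.executable, "-c", "import time; time.sleep(0.2)"]
    fast, slow = compare(quick, slower)
    print(f"fast: {fast:.2f} s, slow: {slow:.2f} s")
```

Taking the best of a few runs reduces noise from caching and other activity on the server, so the numbers are easier to set against the load times recorded in the system log.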

jfgonzalez · November 20, 2019, 09:55:24 PM

The interesting thing about that is...

The ABC/XYZ application group (the faster one) has even more colors and images in its PDFs.
And the one that is taking longer contains only text lines.

I'll check the system log like you said.

jsquizz · November 21, 2019, 04:36:25 PM

It could depend on the generating system, the PDF version, etc. I've seen that happen before.

Ed_Arnold · November 21, 2019, 07:40:15 PM

Are there funky fonts in the slow retrievals?

If I recall correctly, there's a standard set of fonts that's always available with PDFs.

If you use other than those standard fonts then they have to be archived as resources, recalled, mapped, converted, sliced and diced so that they can be displayed.

Ed

jsquizz · November 21, 2019, 08:35:12 PM

That is where I was going. :)

jfgonzalez · November 22, 2019, 01:04:52 AM

Hello, thanks for the response.

All the PDFs use the standard fonts, none of the funky fonts.

I'll post some images to explain the problem in more detail.

The first image (IMG1): this one loads fast, and the retrieval time is fast as well.

The second image (IMG2): this one loaded slowly (around 25 seconds), but the retrieval time is relatively fast.

The last image (IMG3): this is the one taking long, both loading (25 seconds) and retrieving (25 seconds). As I mentioned, it has no funky fonts, images, or colored text.

Just to clarify, all the documents are already loaded, so the only thing that matters right now is how to retrieve them.

Thanks for the help

Justin Derrick · November 22, 2019, 02:51:49 AM

You haven't given us any information on your environment - which version of CMOD, your OS & version, and database engine and version. All versions should include the fixpack level.

Providing this information in your original post is the best way to help us to help you! :)

-JD.

jfgonzalez · November 22, 2019, 03:01:45 PM

Hello Justin,

Here is the information on the environment

DB2 10.5 fixpack 7
CMOD 9.5 fixpack 10
AIX 7.1 TL 4 SP 4

If you need any more information, let me know.

Thank you in advance.

jsquizz · November 22, 2019, 03:43:58 PM

Are the PDFs that are taking forever older loads?

jfgonzalez · November 22, 2019, 04:11:44 PM

No. They were loaded at a similar time.

Lars Bencze · November 25, 2019, 04:31:41 PM

Just a detail: your "H" column is measuring "rows per second", so a higher number here is better/faster.
IMG1 seems to be the slowest; it only indexes 5-7 rows per second, while the other two handle around 25-30 rows per second.
It is not "seconds per row", it's the other way around.
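
To make the unit distinction concrete, here is a small sketch that converts a rate in rows per second into the equivalent seconds per row. The function name is illustrative, and the sample rates are the approximate figures mentioned above.

```python
def seconds_per_row(rows_per_second):
    """Invert a throughput (rows/s) into a per-row cost (s/row)."""
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    return 1.0 / rows_per_second


if __name__ == "__main__":
    # Lower rows/s means MORE seconds per row, i.e. slower indexing.
    for rate in (5, 7, 25, 30):
        print(f"{rate} rows/s = {seconds_per_row(rate):.3f} s/row")
```

So 5 rows per second corresponds to 0.2 seconds per row, while 25 rows per second corresponds to 0.04 seconds per row: the larger "H" value is the faster load.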

I haven't seen anything else that looks suspicious in your Images. Yet, at least.

Lars Bencze · November 26, 2019, 11:15:15 AM

I have another question.
You mentioned that you are exporting from one environment, and these images (IMG1-3) are from when you load them back into the new environment, correct?
It looks like you are loading them using, hmm, the "Graphical PDF Indexer", correct? Or are you using "Internal indexing" (PPDs)?

The load process spends HUGE amounts of time indexing. Therefore, one way to speed up the load would be to use the -g flag during export (arsdoc get ... -g = Create generic indexer file), and then to use the flags -X G (="Find a generic indexing file") during arsload.
From what I can see, the PDF Indexer (which I assume that you are using) does not manage to compress the PDF documents during load. They actually wind up a bit BIGGER than they came, which is not unusual, at least not for smaller files. So you will not lose anything in size by using Generic Indexing instead of PDF Indexing.

Which brings me to the next possible issue. Seeing that OnDemand is unable to compress the multi-document files to any degree makes me suspect that the PDF files may have "PDF Compression" turned on. This is a known factor that slows down OnDemand indexing a lot, sometimes by a factor of 50 or 100.
Now, since these files are already created and you are just moving them, I guess you can't rebuild them. But for future files, talk to the people creating these PDF files and ask them to turn PDF Compression COMPLETELY OFF. If I recall Bud Paton correctly, that is "Level 0 PDF Compression". Yes, the files will be much larger at delivery, but OnDemand should process them much faster.
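
One rough way to check a sample file for this is sketched below. It assumes that "PDF Compression" here means Flate-compressed streams, which a PDF declares with the `/FlateDecode` filter name in its stream dictionaries. The function names are illustrative, and this is only a byte-level heuristic, not a real PDF parser: it does not look at other filters (for example image codecs) or inspect stream contents.

```python
import re
import sys

# Filter name that marks a Flate (zlib/deflate) compressed stream.
FLATE_FILTER = re.compile(rb"/FlateDecode")


def count_flate_filters(pdf_bytes):
    """Count occurrences of the /FlateDecode filter name in raw PDF bytes."""
    return len(FLATE_FILTER.findall(pdf_bytes))


def looks_compressed(pdf_bytes):
    """Heuristic: True if at least one stream declares Flate compression."""
    return count_flate_filters(pdf_bytes) > 0


if __name__ == "__main__":
    # Usage: python check_pdf.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            data = handle.read()
        print(f"{path}: {count_flate_filters(data)} /FlateDecode filter(s)")
```

Running it against one document from a fast application group and one from the slow group would show whether the two are produced with different compression settings.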

I hope that either or both of these tips may help you!