OnDemand Users Group

Support Forums => CMOD for Multiplatforms => Topic started by: Trambak on April 26, 2012, 05:34:04 AM

Title: improved PDF indexer with v 8.5
Post by: Trambak on April 26, 2012, 05:34:04 AM
I have seen that IBM has made some improvements to the PDF indexer with v 8.5; it has moved to 64-bit. Has anyone used the new one and seen any performance improvements? Does the restriction on the PDF file size still apply? Has anyone used the metadata indexing that was introduced? Have PDF FORMS been added to resource optimization? If so, does it apply to PDFs created with Adobe PDF forms, or do they have to be created with some IBM-specific tool? I hope the latter is not true. Thanks

Trambak
Title: Re: improved PDF indexer with v 8.5
Post by: Alessandro Perucchi on April 26, 2012, 06:50:42 PM
Hello Trambak,

I've played a bit with PDF Indexer, so I can answer some of your questions. Concerning the performance improvement, I've no idea since I've never used the "old" version.
I've played a lot with metadata indexing, and I was a bit disappointed. In fact, you can only use the following indexes:


No custom fields.

In addition, if you try to optimize disk space by separating the resources from the actual content, you lose all the "customizations" in the PDF.
For example, a PDF/A file (with a signature) will be transformed into a simple PDF without the signature.
Any custom index fields you've added are removed.
In fact, everything that is not strictly compliant with PDF 1.3 is removed during the split between resources and content.
This is something to be aware of and careful about. The output looks the same, but the PDF is different...

Concerning PDF FORMS, I have no idea, but I've tried PDFs generated by Adobe and by Ghostscript, and both are treated in the same way, so I doubt that you need any IBM-specific tools.

Sincerely yours,
Alessandro
Title: Re: improved PDF indexer with v 8.5
Post by: LWagner on November 01, 2012, 03:41:33 PM
Moving PDF indexing from CMOD 8.4 to CMOD 8.5 seems to give a 90-95% reduction in output file size for the PDF bills we produce.

In version 8.4, common resources were NOT shared; they were stored with each individual PDF bill within a container PDF.
CMOD 8.5 makes good use of shared resources in PDF indexing.

This 95% figure is for our particular PDF files. If your documents have few graphics and very few additional fonts, you will likely see a much smaller reduction in output file size.
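Why the reduction depends so much on graphics and fonts can be seen with a toy model: without sharing, every bill in a container carries its own copy of the resources; with sharing, the container stores them once. This is a sketch under stated assumptions, not how CMOD computes anything: every bill is assumed to use the same resources (fonts, images) of a fixed size plus its own unique content, and all sizes are hypothetical.

```python
def stored_size(num_bills: int, content_bytes: int, resource_bytes: int, shared: bool) -> int:
    """Total stored bytes for one container of bills (toy model).

    Assumption: every bill uses the same resources (fonts, images) totalling
    resource_bytes, plus unique content of content_bytes.
    """
    if shared:
        # Resources stored once for the whole container (sharing, as described for 8.5).
        return num_bills * content_bytes + resource_bytes
    # Resources repeated inside every bill (no sharing, as described for 8.4).
    return num_bills * (content_bytes + resource_bytes)


def reduction(num_bills: int, content_bytes: int, resource_bytes: int) -> float:
    """Fractional size reduction obtained by sharing resources."""
    before = stored_size(num_bills, content_bytes, resource_bytes, shared=False)
    after = stored_size(num_bills, content_bytes, resource_bytes, shared=True)
    return 1 - after / before


# Hypothetical sizes in bytes: graphics-heavy bills vs. mostly-text bills.
print(f"{reduction(200, 20_000, 400_000):.1%}")  # 94.8%
print(f"{reduction(200, 20_000, 5_000):.1%}")    # 19.9%
```

With resource-heavy bills the model lands near the 95% reported here; with mostly-text bills the same container shrinks by only about a fifth.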
Title: Re: improved PDF indexer with v 8.5
Post by: ali.arsiwala on November 05, 2012, 10:39:42 PM
Hi,
We tried using PDF indexing for one of our customers who uses huge PDFs. It did take less space and made better use of it; however, the load times were huge (as in unrealistic): it took us 36 hours to load one day's data.
We switched back to the generic indexer and loads were a lot quicker; however, full PDFs were stored, i.e. each PDF with its own resources and no shared resources. I'm interested in knowing whether you face a similar issue of long load times with the new PDF indexer.
Thanks
Ali.
Title: Re: improved PDF indexer with v 8.5
Post by: LWagner on November 05, 2012, 10:56:28 PM
In our experience, and as recommended by IBM, the best PDF indexing performance is achieved on a Windows index server, followed by an upload to the instance of choice. This requires the PDFs to be indexed to be moved to the Windows index server first.

We upload about 60,000 PDF bills, averaging 600,000 bytes each, in about 500 PDF container files, spread across 8 Windows index servers, to an instance on z/OS. On average this takes 90-120 minutes, starting at 1:30 AM. With CMOD 8.4, the PDF files grow by a factor of about 18 or 19. About 10% compression is then achieved on the final stored size on z/OS.
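Spreading the container files over several index servers in turn (the "round robin" mentioned later in this thread) can be sketched as follows. This is a hypothetical illustration: the file names and host names are invented, and the actual indexing and upload commands are not shown.

```python
from itertools import cycle


def assign_round_robin(container_files, servers):
    """Map each index server to the container files it should process.

    Files are dealt out in turn, so per-server counts differ by at most one.
    """
    assignments = {server: [] for server in servers}
    for path, server in zip(container_files, cycle(servers)):
        assignments[server].append(path)
    return assignments


if __name__ == "__main__":
    # Hypothetical names: 500 containers spread over 8 Windows index servers.
    files = [f"bills_{i:03d}.pdf" for i in range(500)]
    servers = [f"winidx{n}" for n in range(1, 9)]
    plan = assign_round_robin(files, servers)
    for server, batch in plan.items():
        print(server, len(batch))
```

Dealing files out in turn balances the number of containers per server, not bytes; that works well when containers are of similar size, which keeping a fixed number of bills per container helps ensure.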
Title: Re: improved PDF indexer with v 8.5
Post by: ali.arsiwala on November 06, 2012, 04:54:14 AM
Thanks for that; we will try out the Windows suggestion. All our processes (indexing and loads) run on AIX servers.

Cheers
Ali.
Title: Re: improved PDF indexer with v 8.5
Post by: LWagner on November 06, 2012, 02:58:29 PM
Trambak:

In response to other questions:

"Does the restriction on the pdf filesize still apply?"
   - Are you referring to CMOD on z/OS? There was a 250 MB maximum file size with OAM. That is one reason we chose to split our bills into so many files. We started out trying 2,000 PDFs per container and ended up at 200, to ensure that with CMOD 8.4 our output file size stayed below 250 MB. So our input files were under 10 MB. We are moving to CMOD 8.5 on AIX, but may keep the fragmentation because it makes the round-robin load via Windows index servers easier.
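The container sizing above is simple arithmetic: if indexing grows a file by a factor g and the stored object must stay under a limit L, the input must stay under L / g. A minimal sketch, assuming the worst-case growth factor of 19 reported for CMOD 8.4 and reading "Mb" as megabytes:

```python
MB = 1024 * 1024  # the ratio below is the same whether MB is 10**6 or 2**20 bytes


def max_input_size(output_limit: float, growth_factor: float) -> float:
    """Largest input size whose indexed output stays under output_limit."""
    return output_limit / growth_factor


def fits(input_size: float, output_limit: float, growth_factor: float) -> bool:
    """True if the estimated indexed output is below the limit."""
    return input_size * growth_factor < output_limit


limit = 250 * MB  # OAM maximum object size mentioned above
growth = 19       # worst case of the 18-19x growth seen with CMOD 8.4
print(f"max input ~ {max_input_size(limit, growth) / MB:.1f} MB")  # max input ~ 13.2 MB
print(fits(10 * MB, limit, growth))                                 # True
```

With these numbers the ceiling is about 13.2 MB, so keeping inputs under 10 MB leaves some headroom.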

"Has anyone used the metadata indexing that was introduced?"
   - That did not occur to us. We generate customer bills, and used the arspdump executable to locate string positions to use for indexing. Unfortunately, there were too many templates for bill parts, so we could not count on consistent text locations; instead, we chose to require the bills to carry what we now call a "scanline". This is hidden text (white font on a white background) with all of our indexing values repeated in it.

The indexer is PDF.
Sample indexer parameters:
FIELD1=ul(1.58,10.37),lr(3.45,10.60),0,(TRIGGER=1,BASE=0)
INDEX1='ACCOUNT NUMBER',FIELD1,(TYPE=GROUP)/* ACCOUNT NUMBER */
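The FIELD1 line defines a rectangle by its upper-left (ul) and lower-right (lr) corners, which is what lets a fixed-position scanline be picked up. The sketch below parses that rectangle and checks whether a word's bounding box falls inside it. It assumes coordinates are in inches from the top-left corner of the page with y increasing downward (consistent with ul's y being smaller than lr's in the sample), and that text must lie entirely within the box; the word position is hypothetical, and the indexer's exact matching rules should be checked in the indexing reference. Positions like these can be read from arspdump output.

```python
import re

# Matches the ul(x,y),lr(x,y) rectangle of a PDF indexer FIELDn definition.
BOX_RE = re.compile(r"ul\(([\d.]+),([\d.]+)\),lr\(([\d.]+),([\d.]+)\)")


def parse_box(field_def: str) -> tuple:
    """Return (left, top, right, bottom) from a FIELDn definition string."""
    m = BOX_RE.search(field_def)
    if m is None:
        raise ValueError(f"no ul/lr rectangle in {field_def!r}")
    return tuple(float(v) for v in m.groups())


def inside(box: tuple, word_box: tuple) -> bool:
    """True if word_box lies entirely within box.

    Assumption: inches from the page's top-left corner, y increasing downward.
    """
    left, top, right, bottom = box
    w_left, w_top, w_right, w_bottom = word_box
    return left <= w_left and top <= w_top and w_right <= right and w_bottom <= bottom


field1 = "FIELD1=ul(1.58,10.37),lr(3.45,10.60),0,(TRIGGER=1,BASE=0)"
box = parse_box(field1)
print(box)                                      # (1.58, 10.37, 3.45, 10.6)
print(inside(box, (1.60, 10.40, 3.10, 10.55)))  # True (hypothetical word position)
```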


We do not use PDF FORMS.
Title: Re: improved PDF indexer with v 8.5
Post by: pankaj.puranik on December 10, 2012, 11:03:08 AM
According to the indexing reference guide for Multiplatforms V 8.5, the PDF file to be processed should still be under 4 GB.
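A pre-flight check for that limit can be run before handing files to the indexer. A minimal sketch, assuming "4 Gigs" means 4 GiB (adjust MAX_PDF_BYTES if the guide means decimal gigabytes); the script itself is hypothetical and only reports sizes, it does not call any CMOD tool.

```python
import os
import sys

# Assumption: "4 Gigs" read as 4 GiB; use 4 * 10**9 if decimal gigabytes are meant.
MAX_PDF_BYTES = 4 * 1024**3


def oversized(paths, limit=MAX_PDF_BYTES):
    """Return (path, size) for each file whose size is at or above the limit."""
    result = []
    for path in paths:
        size = os.path.getsize(path)
        if size >= limit:
            result.append((path, size))
    return result


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_pdf_size.py *.pdf
    for path, size in oversized(sys.argv[1:]):
        print(f"{path}: {size} bytes is not under the indexer limit")
```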