OnDemand Users Group

Support Forums => CMOD for Multiplatforms => Topic started by: teera_aoo on February 29, 2024, 08:51:13 AM

Title: PDF from ARSDOC GET can see only one file
Post by: teera_aoo on February 29, 2024, 08:51:13 AM
I have a requirement to export all data from CMOD as the original PDFs, together with their indexes, for use on another system.

So I tried the 'arsdoc get' command, but the PDF output is merged in a very strange format that cannot be used on another system:
when opened in a normal PDF viewer it shows only the first file exported from CMOD, although I exported about 10 PDF files.
When I inspect the size of the merged PDF, it matches the 10 PDFs combined (e.g. each file is 100K, the merged file is 1000K).
The index file shows the offset/length of each PDF. I think they are managed as layers of the PDF.


arsdoc get -hlocalhost -uadmin -ppassword -f "PDF-TIV" -g  -N -c -i "where doc_no like '%'" -o PDF-TIV.pdf
Note:
- When loaded back into CMOD with arsload, it is split into 10 PDF files correctly
- Options -g -N -c need to be used together.

Screenshot: https://u.pcloud.link/publink/show?code=XZCVdJ0ZotyT0txuaOYVyljCldfCzSOyuzd7
Title: Re: PDF from ARSDOC GET can see only one file
Post by: Justin Derrick on February 29, 2024, 01:26:09 PM
It's not a strange format -- it's the CMOD Generic Index format (v2) which has been around for nearly 20 years, and is well documented:
https://cmod.wiki/dox/CMODv10.5/IndexingReference.pdf

If you're not loading the data to another CMOD server, you need to write a utility to do the splitting and convert the metadata, or work with someone who has already done that work.  ;)

-JD.
Title: Re: PDF from ARSDOC GET can see only one file
Post by: teera_aoo on February 29, 2024, 02:53:47 PM
Hi Justin,

Thank you for your reply.
Yes, I am quite comfortable with the generic index file because I often use it for loading PDFs and other types into CMOD.

The generic index files I have used so far have the format below (Offset=0, Length=0):
...
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:File1.pdf
               
...
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:File2
...



But I was referring to the merged PDF file produced by arsdoc get: it does not append file2, file3, ... as page 2, page 3, ...
Instead it appends/builds something like layers, or a binary concatenation.

The index refers to the same file for every document, with only the offset and length changing:

...
GROUP_OFFSET:0
GROUP_LENGTH:102187
GROUP_FILENAME:PDF-TIV2.pdf.0.PDF-TIV2.PDF-TIV2.out
               
...
GROUP_OFFSET:102187
GROUP_LENGTH:681891
GROUP_FILENAME:PDF-TIV2.pdf.0.PDF-TIV2.PDF-TIV2.out
...


Do you know of, or can you share, a document that explains this PDF's specification, or a utility to split it back into the originals?



Title: Re: PDF from ARSDOC GET can see only one file
Post by: rjrussel on February 29, 2024, 05:14:28 PM
Document 1 starts at 0 and is 102187 bytes. Document 2 starts at 102187 and is 681891 bytes. You need to write something to extract individual documents using that logic. That is all there is to it.
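
For illustration only, that offset/length logic could be sketched in Python roughly as below. This is not an IBM-supplied utility. The function name split_by_offsets and the output names doc_0001.pdf, doc_0002.pdf, ... are made up for the example, and it assumes the merged output is a plain byte-for-byte concatenation of the documents, as the offsets in the index suggest.

```python
from pathlib import Path


def split_by_offsets(merged_path, segments, out_dir):
    """Copy each (offset, length) byte range of the merged file into its own file.

    segments is a list of (offset, length) pairs taken from the GROUP_OFFSET /
    GROUP_LENGTH values in the index file. Output names are hypothetical.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(merged_path, "rb") as src:  # binary mode: PDFs are not text
        for n, (offset, length) in enumerate(segments, start=1):
            src.seek(offset)
            data = src.read(length)
            if len(data) != length:
                # The index points past the end of the file: stop rather than
                # write a truncated document.
                raise ValueError(
                    f"segment {n}: wanted {length} bytes at offset {offset}, "
                    f"got {len(data)}"
                )
            target = out_dir / f"doc_{n:04d}.pdf"
            target.write_bytes(data)
            written.append(target)
    return written
```

With the two records quoted earlier in the thread, the call would look like split_by_offsets("PDF-TIV2.pdf.0.PDF-TIV2.PDF-TIV2.out", [(0, 102187), (102187, 681891)], "split"). Each output file should then open as a separate PDF in a normal viewer.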
Title: Re: PDF from ARSDOC GET can see only one file
Post by: Justin Derrick on February 29, 2024, 05:59:25 PM
Page 255 of the document I linked.

-JD.
Title: Re: PDF from ARSDOC GET can see only one file
Post by: teera_aoo on March 01, 2024, 10:29:58 AM
Hi Justin,

Thank you.
It seems arspdoci is the PDF indexer used for CMOD loading (like the PDF indexer used in the Admin Client to set up index positions from a PDF document).
But what I am trying to find out is how to cut the PDF by the offset/length specified in the .ind file from arsdoc get.

Thank you
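
As a rough sketch of the first half of that task, the GROUP_OFFSET / GROUP_LENGTH / GROUP_FILENAME records in the .ind file could be read with something like the Python below. The function name read_groups is hypothetical. It assumes the index text has already been decoded to a string, and that each document's record ends with its GROUP_FILENAME line, as in the excerpts quoted in this thread. Other lines in the file are simply skipped.

```python
def read_groups(index_text):
    """Collect one {'offset', 'length', 'filename'} dict per document record.

    Assumes the line order shown in the thread: GROUP_OFFSET, GROUP_LENGTH,
    then GROUP_FILENAME closing the record.
    """
    groups = []
    current = {}
    for raw in index_text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line or a line without a key:value shape
        if key == "GROUP_OFFSET":
            current["offset"] = int(value)
        elif key == "GROUP_LENGTH":
            current["length"] = int(value)
        elif key == "GROUP_FILENAME":
            current["filename"] = value
            groups.append(current)
            current = {}  # start collecting the next document's record
    return groups
```

Each returned entry gives the byte range of one document inside the file named by its filename value. Reading that many bytes from that offset, in binary mode, should yield one original PDF.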