ARSDOC Get Not retrieving all of the documents

Joe Wolken · February 06, 2025, 07:31:52 PM

I am using a current CMOD v10.5 server on Linux. I use the OnDemand Client to query the list of documents that I want to retrieve, then save it as a Public Saved Named Query, "DE2024". The hit list in the Client shows 220 documents.

I then run the ARSDOC GET API to extract the 220 PDF files to a directory on the server. The API messages report 220 documents retrieved, and the System Log shows 220 message 66 and 67 entries. But I only get 219 PDF files in the directory.

Why would I get fewer files in the directory? How do I debug the missing file?

I tried running the same command with named queries that return larger sets of documents, and again the directory has fewer files than the System Log reports as retrieved.

Here is my command:
arsdoc get -v -h PROD -f "MERCH" -G "MERCH" -d /backup -o "(RUNDATE)(INVOICE)(LOC)(CONTAINER)(MANIFESTID)(NAME)" -q "DE2024"

Justin Derrick · February 07, 2025, 08:15:50 PM

Just guessing, but there's probably more than one record pointing to the same document. I've seen this at banks where more than one customer is listed on a mortgage: two records in the database table point to the same mortgage statement. Try using the options -agcNv to export the documents along with the Generic Index File, then look for two records with the same offset and length.
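Rather than eyeballing the generic index file for matching offsets and lengths, a small script can do the comparison. This is a hypothetical helper, not an IBM tool, and the function names are made up for illustration. It assumes the file follows the usual generic indexer layout, where each document is described by GROUP_FIELD_NAME/GROUP_FIELD_VALUE pairs followed by GROUP_OFFSET, GROUP_LENGTH and GROUP_FILENAME lines. Check that against an actual file before relying on it.

```python
"""Find records in a CMOD generic index file that share the same
filename, offset and length, i.e. several index rows that point at
the same stored bytes.

Assumption: each document's GROUP_FIELD_NAME/GROUP_FIELD_VALUE pairs
come first, followed by GROUP_OFFSET, GROUP_LENGTH and GROUP_FILENAME.
"""
import sys
from collections import defaultdict


def _new_record():
    return {"fields": {}, "offset": None, "length": None, "filename": None}


def parse_generic_index(text):
    """Return one dict per document: fields, offset, length, filename."""
    records = []
    current = _new_record()
    pending_name = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        if key == "GROUP_FIELD_NAME":
            # A field name after offset/length means a new document starts.
            if current["offset"] is not None or current["length"] is not None:
                records.append(current)
                current = _new_record()
            pending_name = value
        elif key == "GROUP_FIELD_VALUE" and pending_name is not None:
            current["fields"][pending_name] = value
            pending_name = None
        elif key == "GROUP_OFFSET":
            current["offset"] = int(value)
        elif key == "GROUP_LENGTH":
            current["length"] = int(value)
        elif key == "GROUP_FILENAME":
            current["filename"] = value
        # Other keys (CODEPAGE, COMMENT, ...) are ignored.
    if current["fields"] or current["offset"] is not None:
        records.append(current)
    return records


def find_shared_locations(records):
    """Group records by (filename, offset, length); keep groups of 2+."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["filename"], rec["offset"], rec["length"])].append(rec)
    return {loc: recs for loc, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        shared = find_shared_locations(parse_generic_index(fh.read()))
    for (name, offset, length), recs in shared.items():
        print(f"{name} offset={offset} length={length}: {len(recs)} records")
        for rec in recs:
            print("   ", rec["fields"])
```

Pass the generic index file as the only argument. Each group it prints is a set of index rows that resolve to the same stored document, which is the situation described above.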

Joe Wolken · February 10, 2025, 05:56:15 PM

Justin,

You bring up a good point: if the filename created by concatenating the index values is not unique, the output files can overwrite each other, making the total number of output files less than the number of retrievals (message 66 or 67).

However, in this case the MANIFESTID value for each document should be unique, which forces a unique filename. I verified this by running the same Named Query in the OnDemand Client, sorting the hit list by MANIFESTID, and checking it for duplicates; all of the values were unique.

I have opened a support ticket with IBM and provided them with their requested information and am waiting for a response. I will post again if this is resolved.

Joe Wolken · February 10, 2025, 09:30:11 PM

Upon closer examination, we found documents in OnDemand that had exactly the same set of index values. It appears that the business application sent the same file more than once. So when running ARSDOC GET, it probably wrote the first PDF file and then overwrote it with the second one, which had exactly the same filename. That explains the mismatch between the file count and the number of index records. In this case, this is an acceptable (even desirable) outcome, because it effectively eliminated the duplicate files for us.
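Since it is the name built from the -o fields that has to be unique, checking the full combination of those fields (rather than a single column) catches exactly this case. Below is a minimal sketch, assuming the index values are available as a CSV with one column per field (for example, an export of the hit list) and that the column headers match the field names used in -o. NAME_FIELDS, SEPARATOR and the function names are hypothetical, and the sketch does not reproduce exactly how arsdoc formats the output file name.

```python
"""Report index-value combinations that would map to the same output
filename when documents are named from several index fields.

Assumptions (hypothetical, not taken from arsdoc itself):
  * rows are dicts of field name -> value, e.g. from csv.DictReader;
  * the name is the -o field values joined in order with SEPARATOR.
"""
import csv
import sys
from collections import defaultdict

# Field order copied from the -o option in the command above.
NAME_FIELDS = ["RUNDATE", "INVOICE", "LOC", "CONTAINER", "MANIFESTID", "NAME"]
SEPARATOR = ""  # hypothetical: models plain concatenation


def output_name(row, fields=NAME_FIELDS, sep=SEPARATOR):
    """Approximate the output filename for one row of index values."""
    return sep.join(str(row.get(f, "")).strip() for f in fields)


def find_collisions(rows, fields=NAME_FIELDS, sep=SEPARATOR):
    """Map each colliding name to the 1-based row numbers that produce it."""
    by_name = defaultdict(list)
    for i, row in enumerate(rows, start=1):
        by_name[output_name(row, fields, sep)].append(i)
    return {name: idx for name, idx in by_name.items() if len(idx) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], newline="", encoding="utf-8") as fh:
        rows = list(csv.DictReader(fh))
    collisions = find_collisions(rows)
    # Every extra row sharing a name is one file that gets overwritten.
    lost = sum(len(idx) - 1 for idx in collisions.values())
    print(f"{len(rows)} rows -> {len(rows) - lost} distinct output names")
    for name, idx in collisions.items():
        print(f"{name!r}: rows {idx}")
```

With SEPARATOR empty, the check also flags distinct value sets whose concatenations happen to match (for example "1"+"23" and "12"+"3"). Whether that can actually occur depends on how arsdoc builds the name, which this sketch does not model, so treat those hits as candidates to inspect.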

We are closing the support case with IBM as it was 'User error'.

Thanks, Justin, for prompting the re-examination of the data.

Justin Derrick · February 14, 2025, 05:30:44 PM

Excellent. I'm glad to see you got the problem resolved. Take care!

-JD.
