Generic Indexer For MS-Word

SunnyManeeth

Hi Team,

I am working on the Generic Indexer and want to load an MS Word document.

The Word document has around 100 pages, and I want to find out the page length and page offset for the document. Is there any option to find these by using ars commands?

Thanks
Sunny :)

rick

If you need to load the entire Word document, the offset will be 0 and the length will be the byte size of the Word file. If it is a segmented document, provide the offset and length accordingly.
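
A minimal sketch of the whole-file case, in Python for brevity. It assumes the usual Generic Indexer parameter file keywords (CODEPAGE, GROUP_FIELD_NAME, GROUP_FIELD_VALUE, GROUP_OFFSET, GROUP_LENGTH, GROUP_FILENAME); check them against the indexing reference for your CMOD version. The function name, the field name and the code page default are only illustrative.

```python
import os


def build_generic_index(doc_path, fields, codepage=819):
    """Return Generic Indexer text describing one whole file as one document.

    doc_path : path to the file to load (hypothetical example input)
    fields   : dict of application group field name -> value (made-up names)
    codepage : code page of the index file itself (819 is an assumption)
    """
    size = os.path.getsize(doc_path)  # byte size of the whole Word file

    lines = [f"CODEPAGE:{codepage}"]
    for name, value in fields.items():
        lines.append(f"GROUP_FIELD_NAME:{name}")
        lines.append(f"GROUP_FIELD_VALUE:{value}")

    # Whole document: start at byte 0 and run for the full file size.
    lines.append("GROUP_OFFSET:0")
    lines.append(f"GROUP_LENGTH:{size}")
    lines.append(f"GROUP_FILENAME:{doc_path}")
    return "\n".join(lines) + "\n"
```

Calling build_generic_index("/tmp/report.doc", {"account": "12345"}) and writing the result to a .ind file would give one group that covers the whole file. For a segmented file, the same layout would be repeated once per segment with that segment's own offset and length.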

SunnyManeeth

Thanks Frederick,

I thought of writing a Java utility that reads the document page by page and gives the length and offset of each page.

Is there any tool that reports the length and offset of the pages in a document? We deal with a large number of pages in a single document.

Is there a recommended approach I can work from? Can you suggest one?

Thanks
Sunny :)

rick

Not sure if there is any utility that would split the document and calculate the sizes. If you know the ins and outs of Word docs, using Java would be the best option.

Justin Derrick

Quote from: Sunny on June 10, 2014, 09:41:22 AM
I thought of writing a java utility so that i can read page by page and gives the length and offset of the page.

You can't just carve out a portion of a Word file, and expect Word to know what to do with it.  Consider converting the file to AFP or PDF first.
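
A small illustration of why a byte range is not a usable document, assuming the file is a .docx (which is a ZIP container; the older .doc binary format is also a single structured container, as far as I know). The helper names and the one-part container are made up for the demo; a real .docx has many more parts.

```python
import io
import zipfile


def make_container(parts):
    """Build a ZIP container in memory from a dict of part name -> text."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in parts.items():
            zf.writestr(name, data)
    return buf.getvalue()


def is_readable_container(data):
    """True if the bytes still look like a complete ZIP container."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Stand-in for a Word file: all pages live inside one part of the container.
whole = make_container({"word/document.xml": "<doc>page 1 ... page 100</doc>"})

# "Carving out" a byte range, as an offset/length pair would do.
first_half = whole[: len(whole) // 2]

print(is_readable_container(whole))       # complete file
print(is_readable_container(first_half))  # carved-out portion
```

The carved-out half no longer has the directory that sits at the end of the container, so nothing can open it. Pages in a Word file are also worked out at render time rather than stored as separate byte ranges, which is why converting to PDF or AFP first is the easier route.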

-JD.
Call:  +1-866-533-7742  or  eMail:  jd@justinderrick.com
IBM CMOD Wiki:  https://CMOD.wiki/
FREE IBM CMOD Webinars:  https://CMOD.Training/
IBM CMOD Professional Services: https://CMOD.cloud

Interests: #AIX #Linux #Multiplatforms #DB2 #TSM #SP #Performance #Security #Audits #Customizing #Availability #HA #DR

SunnyManeeth

Thanks Frederick and Justin,

I'll look into this; I hope I will find the best approach.

Thanks
Sunny :)

LWagner

Sunny:

How are your results?

If you converted the Word doc to PDF, you could then use arspdump to dump the text of the PDF so you can determine the location of any text you want to use for index values.

But you still won't be able to split the PDF, unless it is a CONTAINER type PDF with many PDF document files in it.

SunnyManeeth

Hi LWagner,

I have done this in two ways:
     (1) Converting the doc to PDF, so that we can use report indexing to load the document into CMOD.
     (2) Splitting the document into separate pages and then loading them into CMOD using the Generic Indexer.
Both give the same result, but it is better to convert the document to PDF and then load it into the system.

Thanks
Sunny :)