OnDemand Users Group

Support Forums => Report Indexing => Topic started by: pankaj.puranik on April 13, 2011, 04:05:22 PM

Title: Generic indexer
Post by: pankaj.puranik on April 13, 2011, 04:05:22 PM
Hi

If I have multiple documents in the input file, then I have to specify GROUP_OFFSET and GROUP_LENGTH.
Suppose I have an input file containing multiple Word documents. How can I find the GROUP_OFFSET and GROUP_LENGTH values for each document?

Thanks
Pankaj.
Title: Re: Generic indexer
Post by: Alessandro Perucchi on April 13, 2011, 08:04:02 PM
Hello Pankaj,

If you have one index file and several separate Word documents, your index file might look like this:


CODEPAGE:923

COMMENT:DOCUMENT 1
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:word1.doc

COMMENT:DOCUMENT 2
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:word2.doc

COMMENT:DOCUMENT 3
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:word3.doc

COMMENT:DOCUMENT 4
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:0
GROUP_FILENAME:word4.doc
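When there are many separate files, the stanzas can be generated instead of typed by hand. Here is a minimal Python sketch. It assumes the stanza layout shown above: a COMMENT line, the field name/value pairs, then GROUP_OFFSET:0, GROUP_LENGTH:0 and GROUP_FILENAME. `build_separate_index` is a hypothetical helper, and the field names and values are placeholders.

```python
from typing import Dict, List, Tuple


def build_separate_index(docs: List[Tuple[str, Dict[str, str]]], codepage: int = 923) -> str:
    """Build a generic index with one stanza per separate document file.

    Every stanza uses GROUP_OFFSET:0 and GROUP_LENGTH:0, as in the
    per-file example in this thread.
    """
    lines = [f"CODEPAGE:{codepage}", ""]
    for number, (filename, fields) in enumerate(docs, start=1):
        lines.append(f"COMMENT:DOCUMENT {number}")
        for name, value in fields.items():
            lines.append(f"GROUP_FIELD_NAME:{name}")
            lines.append(f"GROUP_FIELD_VALUE:{value}")
        lines.append("GROUP_OFFSET:0")
        lines.append("GROUP_LENGTH:0")
        lines.append(f"GROUP_FILENAME:{filename}")
        lines.append("")  # blank line between stanzas
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical field names and values, for illustration only.
    docs = [(f"word{i}.doc", {"field1": "value1", "field4": "value4"}) for i in range(1, 5)]
    print(build_separate_index(docs))
```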


If, however, all the Word files are concatenated into a single file, then you must know the offset and length of each document inside that file.
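The simplest way to know the offsets is to record them while building the concatenated file. Here is a minimal Python sketch with some stated assumptions. It assumes the documents are written back to back with no separator bytes. It also assumes GROUP_OFFSET counts from byte 0, which you should confirm against the generic indexer documentation for your OnDemand release. `concatenate` is a hypothetical helper.

```python
from pathlib import Path
from typing import List, Tuple


def concatenate(inputs: List[Path], output: Path) -> List[Tuple[str, int, int]]:
    """Write the inputs back to back into `output`.

    Returns (name, offset, length) for each input. No separator bytes are
    written and offsets count from byte 0, so each offset is the running
    total of the lengths written before it.
    """
    layout = []
    offset = 0
    with output.open("wb") as out:
        for path in inputs:
            data = path.read_bytes()
            out.write(data)
            layout.append((path.name, offset, len(data)))
            offset += len(data)
    return layout


if __name__ == "__main__":
    # Hypothetical file names, matching the thread's examples.
    names = [Path(f"word{i}.doc") for i in range(1, 5)]
    if all(p.exists() for p in names):
        for name, offset, length in concatenate(names, Path("wordsingle.concat")):
            print(f"{name}: GROUP_OFFSET:{offset} GROUP_LENGTH:{length}")
```

Under these assumptions, a 1000-byte first document occupies bytes 0 to 999, so the next document starts at offset 1000. If the program producing your file writes anything between documents, add those bytes to the running total.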


CODEPAGE:923

COMMENT:DOCUMENT 1
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:0
GROUP_LENGTH:1000
GROUP_FILENAME:wordsingle.concat

COMMENT:DOCUMENT 2
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:1001
GROUP_LENGTH:1203
GROUP_FILENAME:wordsingle.concat

COMMENT:DOCUMENT 3
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:2205
GROUP_LENGTH:800
GROUP_FILENAME:wordsingle.concat

COMMENT:DOCUMENT 4
GROUP_FIELD_NAME:field1
GROUP_FIELD_VALUE:value1
...
GROUP_FIELD_NAME:field4
GROUP_FIELD_VALUE:value4
GROUP_OFFSET:3006
GROUP_LENGTH:997
GROUP_FILENAME:wordsingle.concat
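Before loading, you can check that each GROUP_OFFSET/GROUP_LENGTH pair actually lands on the start of a Word document. Here is a minimal Python sketch built on several assumptions. It assumes binary .doc files are OLE2 compound files beginning with D0 CF 11 E0 A1 B1 1A E1. It assumes .docx files are ZIP containers beginning with PK\x03\x04. It reads GROUP_LENGTH:0 as "to the end of the file", as in the per-file example. `parse_groups` and `check_groups` are hypothetical helpers, and the parser only reads the three GROUP_ keys it needs.

```python
from pathlib import Path
from typing import Dict, List

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # binary .doc (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .docx (ZIP container)
KEYS = ("GROUP_OFFSET", "GROUP_LENGTH", "GROUP_FILENAME")


def parse_groups(index_text: str) -> List[Dict[str, str]]:
    """Collect GROUP_OFFSET, GROUP_LENGTH and GROUP_FILENAME for each stanza.

    A stanza is closed when its GROUP_FILENAME line is read, which is the
    last line of every stanza in the thread's examples. Other lines are ignored.
    """
    groups: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in index_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in KEYS:
            current[key] = value.strip()
            if key == "GROUP_FILENAME":
                groups.append(current)
                current = {}
    return groups


def check_groups(index_text: str, base_dir: Path) -> List[str]:
    """Report, per group, whether its bytes start with a Word signature.

    A missing GROUP_OFFSET or GROUP_LENGTH is treated as 0.
    """
    report = []
    for n, group in enumerate(parse_groups(index_text), start=1):
        name = group["GROUP_FILENAME"]
        offset = int(group.get("GROUP_OFFSET", "0"))
        length = int(group.get("GROUP_LENGTH", "0"))
        data = (base_dir / name).read_bytes()
        if offset + length > len(data):
            report.append(f"group {n}: runs past end of {name}")
            continue
        # GROUP_LENGTH:0 is read here as "to the end of the file".
        chunk = data[offset:] if length == 0 else data[offset:offset + length]
        ok = chunk.startswith(OLE2_MAGIC) or chunk.startswith(ZIP_MAGIC)
        report.append(f"group {n}: {'ok' if ok else 'no Word signature'}")
    return report
```

Point it at the index file's text and the directory holding wordsingle.concat, and it returns one line per group. A "no Word signature" line suggests that the offset does not land on a document boundary.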


But if you don't have the offsets and lengths, you must ask the people who provided you with this file. Otherwise, you need to know exactly how a Word file is structured and locate the document boundaries with a suitable tool.
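If nobody can give you the offsets, the file structure itself can provide a first guess. Here is a minimal Python sketch. It assumes every document is a binary .doc file stored as an OLE2 compound file, with signature D0 CF 11 E0 A1 B1 1A E1. It also assumes the documents are concatenated with nothing between them. `guess_layout` is a hypothetical helper. The signature can also occur inside a document, for example in an embedded OLE object, so treat the result as candidates to verify. As written, it does not work for .docx files, because the ZIP header PK\x03\x04 repeats for every part inside a single .docx.

```python
import sys
from pathlib import Path
from typing import List, Tuple

# Binary .doc files are OLE2 compound files, which start with these 8 bytes.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def guess_layout(data: bytes, magic: bytes = OLE2_MAGIC) -> List[Tuple[int, int]]:
    """Guess (offset, length) pairs by locating every occurrence of `magic`.

    Each document is assumed to run from its signature up to the next
    signature, or to the end of the data for the last one. Bytes before the
    first signature are ignored.
    """
    starts = []
    pos = data.find(magic)
    while pos != -1:
        starts.append(pos)
        pos = data.find(magic, pos + 1)
    ends = starts[1:] + [len(data)]
    return [(start, end - start) for start, end in zip(starts, ends)]


if __name__ == "__main__" and len(sys.argv) > 1:
    for offset, length in guess_layout(Path(sys.argv[1]).read_bytes()):
        print(f"GROUP_OFFSET:{offset}")
        print(f"GROUP_LENGTH:{length}")
```

Saved as, say, guess_layout.py, running `python guess_layout.py wordsingle.concat` prints candidate GROUP_OFFSET/GROUP_LENGTH pairs. Any separator bytes between documents end up counted in the preceding document's length.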

Cheers,
Alessandro
Title: Re: Generic indexer
Post by: Justin Derrick on April 15, 2011, 01:23:53 PM
Minor correction, Alessandro...

In your second sample, the first GROUP_LENGTH is 1000, so the next GROUP_OFFSET needs to be incremented by 1, giving 1001. You've made this mistake throughout your example.
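Off-by-one questions like this one are easy to check mechanically. Here is a minimal Python sketch. It assumes GROUP_OFFSET is a zero-based byte position, so a group covers bytes offset to offset + length - 1; confirm this against the generic indexer documentation. `check_ranges` is a hypothetical helper. It reports gaps, overlaps and groups that run past the end of the file.

```python
from typing import List, Tuple


def check_ranges(groups: List[Tuple[int, int]], file_size: int) -> List[str]:
    """Report gaps, overlaps and out-of-bounds (offset, length) pairs.

    Offsets are taken as zero-based, so a group covers bytes
    offset .. offset + length - 1 and a contiguous next group
    starts at offset + length.
    """
    problems = []
    previous_end = 0  # first byte after the groups checked so far
    for offset, length in sorted(groups):
        end = offset + length
        if end > file_size:
            problems.append(
                f"group at offset {offset} ends at byte {end - 1}, "
                f"past the last byte {file_size - 1}"
            )
        if offset < previous_end:
            problems.append(
                f"group at offset {offset} overlaps the previous group "
                f"by {previous_end - offset} byte(s)"
            )
        elif offset > previous_end:
            problems.append(
                f"group at offset {offset} leaves {offset - previous_end} "
                f"unused byte(s) before it"
            )
        previous_end = max(previous_end, end)
    return problems
```

Consider the offsets and lengths from the concatenated example, with a hypothetical file size of 4003 bytes. The function reports one unused byte before each of the groups at 1001, 2205 and 3006. That layout is only right if the file really has a byte between documents.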

-JD.
Title: Re: Generic indexer
Post by: Alessandro Perucchi on April 16, 2011, 09:30:32 AM
Quote from: Justin Derrick on April 15, 2011, 01:23:53 PM
Minor correction, Alessandro...

In your second sample, the first GROUP_LENGTH is 1000, so the next GROUP_OFFSET needs to be incremented by 1, giving 1001. You've made this mistake throughout your example.

-JD.

Hello Justin,

Thanks, I've corrected the example!

Cheers,
Alessandro