gmSCSortPairClass

The SortPair Service Class

The service class SortPair maintains an open ended list of entry-value pairs in sort order by value. Pairs can be retrieved by their value and then the corresponding entry is immediately available. The algorithm underlying the sortpair methods is designed to make the positioning into the list at or near a particular specified value as quick as possible. Once this positioning has been achieved, the operation of writing a new entry to the list is optimal. In particular, the problem is to maintain various lists of values in sorted ascending order. The algorithm needs to be able to add new values to the list with no guarantee that those values were not already there.

To make sorting possible, values are assigned binary representation types and each type has a technique available which compares a current value to another and computes a signed comparison value. A negative value means that the current value is below the other one in their sort order; a zero value means that the two values are equal with regards to their sort order; and a positive value means that the current value is greater than the other one. Using the Java terminolgy we will refer to this technique as "compareTo(newValue,listValue)"

Given that we know how to compare individual values, an addEntryToList algorithm is now needed which will return the index in the sorted list where the new value was stored or a -1 if the value was already in the list. The brute-force way to perform this operation would be to simply scan forward in the list comparing the new value to each value in the list.


int addEntryToList(object list[], object new, int nList)
{
   for(i = 0; i < nList; i++)
   {
     int iRelation = compareTo(object,list[i]));
     if(iRelation == 0) return -1;
     if(iRelation > 0)
     {
        insert object before i in list;
        return i;
     }
  }
  append object to list;
  return nList;
}
The problem is that this brute-force approach has complexity O(n) plus the time required to do the insertion or the appending. Since the list was sorted, it should be possible to locate the value in the list using a binary search with complexity O(log n). But the standard binary search algorithm only gives the location of the entry if it is in the list. No information about where to insert an entry, if it is not in the list is supplied. So using a binary search would improve nothing for entries not yet in the list. The entire forward scan in the above would have to be repeated. The SortPair class, therefore, uses an extension to the binary search referred to as a "search index". A "search index" is a signed integer value that can be associated with any value regardless of whether or not it presently is within the list. Its values are as follows:

Value Meaning
+seq A positive search index means that the entry is presently in the list. The seq value is the sequence number relative to one of the actual position of the entry in the list.
0 A zero value indicates that the entry is not in the list and is below all entries in the sort order.
-seq A negative search index means that the entry is not presently in the list, but it is above some entries now in the list in the sort order. The seq value is the negative sequence number relative to one of that entry in the list immediately below the entry itself.

The algorithm for computing the search index is shown below.


int searchIndex(Value list[],Value new,int nList)
{
   int iBottom = 0;
   int iMiddle = 0;
   int iRelation = 0;
   int iTop = nList - 1;

   while(iBottom <= iTop)
   {
      iMiddle = (iBottom + iTop) / 2;
      iRelation = compareTo(new,list[iMiddle]);
      if(iRelation == 0) return iMiddle + 1;
      if(iRelation < 0) iTop = iMiddle - 1;
      else iBottom = iMiddle + 1;
   }
   if(iRelation > 0) return -(iMiddle + 1);
   else if(iMiddle == 0) return 0;
   return -iMiddle;
}
The iterative portion of the search index algorithm is identical to a normal binary search. The only difference occurs when we hit a match. Then the algorithm returns iMiddle + 1 rather than simply iMiddle. We do not diverge from the normal binary search until we establish that there is no match present for the new object. When iBottom exceeds iTop, the last examined entry, the one whose index is "iMiddle" is either that entry in the list that is directly greater than the new one or directly less. Which can be determined by the value of the relation code. If the relation code is positive then the entry at iMiddle is less that the new entry, so the return value is -(iMiddle+1). Else we now know that the last examined entry was greater than the new object. If that entry was at the front of the list then the return value is zero since the new object is less than all entries in the list. If it is not zero then the return value is simply -iMiddle.

Using the search index the addEntryToList algorithm now becomes


int addEntryToList(object list[], object new, int nList)
{
   int searchIndex = searchIndex(list,new,nList)
   if(SearchIndex > 0) return -1;
   if(searchIndex < 0)
   {
      int i = -searchIndex;
      insert object before i in list;
      return i;
   }
   append object to list;
   return nList;
}
The addEntryToList algorithm is independent of nList and the searchIndex algorithm has the same complexity as a binary search which is (O log nList). Therefore our revised version has complexity O(log nList) plus the time required to do the insertion or the appending.

The problem of course is within the step "insert object before i in list". This insertion requires moving the entire upstream portion of the list so that an insertion can occur. A linked list cannot be used because traversing the links would destroy the efficiency of binary search. To avoid this the list of index-value pairs is stored in a tree-structured set of fixed-sized blocks. When a block is filled it is split into two blocks. Higher level blocks replace the index component of the pair with the offset of the child block that has that value as its first entry. The search index thus becomes a search path which is the block offset of each block searched starting with the initial block in the tree. If the maximum number of levels is SORT_MAX_LEVELS (10 here) and maxPairs is the number of pairs per block (max = 31 here, average = 16) then the addEntryToList operation is never needed more than SORT_MAX_LEVELS times and never has to search a list longer than maxPairs and never has to move more than that number. With its current settings the algorthimn can store 10 to the 12th pairs and has a maximum run time of


    SORT_MAX_LEVELS*(searchTime(maxPairsInList)+insertTime(maxPairsInList)),
    where:
       SORT_MAX_LEVES = 10
       maxPairsInList = 31.
Which this is completely independent of the length of the list.

The allocation of the control structure itself is the responsibility of the user of the sort pair list. This class is used very aggresively by applications to maintain sorted lists that are written once and then accessed repeatedly. To make these accesses as quick as possible the control structure for this class is allocated on the stack via local variable declarations in the using code.

The field SortCaseSensitive

Prototype


 extern int SortCaseSensitive;
The SortCaseSensitive field specifies how string comparisons are to be performed. A zero value makes them case-insensitive, while a nonzero value makes then case-sensitive. By default this field is set to zero, case-insensitive.

The method SortPair_ChangePosting

Prototype


void SortPair_ChangePosting(void* listStore,int basePoint,int index,int value);
The SortPair_ChangePosting method changes the posting value associated with a posting index. Its parameters are as follows:

Parameter Description
listStore Specifies the handle for the LongMemory area that contains the posting pair.
basePoint Specifies the root offset of the posting pair in the LongMemory area as returned by the method SortPair_CreatePostingPair.
index Specifies the index value for the pair. This is the value used to find the pair. This method assumes that a pair with the specified index already exists.
value Specifies the new value to be associated with the pair.


The method SortPair_Close

Prototype


void SortPair_Close(void* This);
The SortPair_Close method closes the indicated control structure for the sort pair list. If no changes have been made to the list, then this method ends up doing nothing, since the allocation of the control structure itself is the responsibility of the user of the sort pair list. If the list has been changed, then its root information is updated in the storage area containing the actual list. Its parameter is:

Parameter Description
This Specifies the handle of the information structure controlling the sort pair list.


The method SortPair_Create

Prototype


int SortPair_Create(void* This,void* listStore,int valueType,int blkSize);
The SortPair_Create method initializes a structure which describes the sort pair list and writes the initial block for that sort pair to the LongMemory area. Note that memory for the control structure is allocated by the user of this method. This method has the following parameters:

Parameter Description
This Specifies the handle of the information structure controlling the sort pair list. The memory for this structure most be allocated as an int vector and managed by the caller via a reference to the SortPair_Stack constant in the header. A typical declaration is "auto int sortPair[SortPair_StackSize];".
listStore Specifies the handle for the LongMemory area that contains the sort pair.
valueType Specifies the type of value being used to order the entries in the list. This must be one of the basic representation type constants REP_* from the header.
blkSize Specifies the block size to be used for the sorted values. If zero, then the value of LongMemory_GetPostingSize() is used.

If all goes well then the method returns the location in the LongMemory area of the control information vector for the sort pair list. If there is a problem, then a zero is returned.

The method SortPair_CreatePostingPair

Prototype


int SortPair_CreatePostingPair(void* listStore);
The SortPair_CreatePostingPair method creates a small integer SortPair used for looking up links and relationships between components in LongMemory based on their defining root offsets. The information structure associated with posting pairs itself is allocated locally on the stack and is never left open; therefore, posting pairs do not have handles and are controlled entirely via their root offsets. This method has the following parameter:

Parameter Description
listStore Specifies the handle for the LongMemory area that contains the posting pair.

If all goes well, then the method returns the root offset in the LongMemory area of the control information vector for the posting pair. If there is a problem, then a zero is returned.

The method SortPair_Find

Prototype


int* SortPair_Find(void* This,void* value,int delta);
The SortPair_Find method finds a value in a sort pair list. The value as specified must be of the type specified for the list when it was created. As this runs it retains the search path used (see the discussion in the introduction). If the value is not in the list then the search path can be used to insert the value into the list very quickly since its location has already been determined. The parameters of this method are as follows:

Parameter Description
This Specifies the handle for the information structure controlling the sort pair list.
value Specifies the value being sought in the list. It must be of the type specified when the list was created.
delta Specifies whether the caller wants to change the pair value. A nonzero value indicates yes, while a zero value indicates no.

If there is a pair in the list with the specified value, then a pointer to that pair is returned. If there is no pair with a matching value then a NULL is returned.

The method SortPair_FindPosting

Prototype


int SortPair_FindPosting(void* listStore,int basePoint,int pairIndex);
The SortPair_FindPosting method looks up the posting value associated with a posting index into a posting pair. The parameters of this method are as follows:

Parameter Description
listStore Specifies the handle for the LongMemory area that contains the posting pair.
basePoint Specifies the root offset of the posting pair in the LongMemory area as returned by the method SortPair_CreatePostingPair.
pairIndex Specifies the index value for which a corresponding posting value is desired.

If the posting pair contains an entry for the index, then its associated value is returned; else a zero is returned.

The method SortPair_GetLocation

Prototype


void SortPair_GetLocation(void* This,int* location1,int* location2);
The SortPair_GetLocation method gets the location properties of the sort pair. These are user specific values that are stored within the sort pair control structure for convenience purposes since many applications of the sort pair logic require some additional control information. The parameters of this method are as follows:

Parameter Description
This Specifies the handle for the information structure controlling the sort pair list.
location1 returns the primary location value stored in the sort pair structure, if not NULL.
location2 returns the secondary location value stored in the sort pair structure, if not NULL.


The method SortPair_Next

Prototype


int* SortPair_Next(void* Store,tPosting* PostSet);
The SortPair_Next method moves to the next pair in a sorted list. The sort pair list is not a sequence and cannot therefore simply be accessed via a subscript. Instead an iterator must be used. First the method SortPair_Rewind is used to initialize the iterator so that it points to the initial pair in the sorted list. Then this method is used to obtain the subsequent pairs. The parameters of this method are as follows:

Parameter Description
Store Specifies the handle for the LongMemory area that contains the sorted pair list. Note that this handle is returned by the SortPair_Rewind method when it initializes the iterator.
PostSet Contains the user allocated structure tPostSet which controls the movement though pair postings.

If there is another pair in the sorted list, then a pointer to that pair is returned. If there is no additional pair, then a NULL is returned.

The method SortPair_Position

Prototype


int* SortPair_Position(void* This,tPosting* PostSet,void* value);
The SortPair_Position method positions to an indicated pair in the list. The sort pair list is not a sequence and cannot therefore simply be accessed via a subscript. Instead an iterator must be used. This method positions the iterator on the first record that is greater than or equal to the specified record value and returns the sortpair associated with that record. Subsequent calls to SortNext will then return subsequent records from the sorted list. This method has the following parameters:

Parameter Description
This Specifies the handle for the information structure controlling the sort pair list.
PostSet Specifies a user allocated structure tPostSet that controls the movement though pair postings.
value Specifies the value being sought in the list. It must be of the type specified when the list was created.

If there is no pair in the sorted list that is greater than or equal to the indicated value, then a NULL is returned; else a pointer to that pair is returned.

The method SortPair_Open

Prototype


void SortPair_Open(void* This,void* listStore,int baseIndex,int blkSize);
The method SortPair_Open initializes a structure which describes the sort pair and reads the initial block for that sort pair from the LongMemory area. This method has the following parameters:

Parameter Description
This Contains the structure controlling the sort pair. The memory for this structure must be allocated and managed by the caller.
listStore Specifies the handle for the LongMemory area that contains the sort pair itself.
baseIndex Specifies the location in the LongMemory area of the initial block of the sort pair.
blkSize Specifies the block size to be used for the sorted values. If zero, then the value of LongMemory_GetPostingSize() is used.


The method SortPair_Rewind

Prototype


void* SortPair_Rewind(void* This,tPosting* PostSet);
The SortPair_Rewind method moves to the first pair in a sorted list. A sort pair list is not a sequence and cannot therefore simply be accessed via a subscript. Instead an iterator must be used. First this method is used to initialize the iterator so that it points to the initial pair in the sorted list. Then the SortPair_Next method is used to obtain the subsequent pairs. This method has the following parameters:

Parameter Description
This Specifies the handle for the information structure controlling the sort pair list.
PostSet Receives the information needed to iterate though the sort pair. It is a user allocated structure tPostSet.

The method returns the handle for the storage area containing the sort pair list.

The method SortPair_SetLocation

Prototype


void SortPair_SetLocation(void* This,int location1,int location2);
The SortPair_SetLocation method sets the location properties of the sort pair. These are user specific values that are stored within the sort pair control structure for convenience purposes since many applications of the sort pair logic require some additional control information. Its parameters are as follows:

Parameter Description
This Specifies the structure containing the sort pair information.
location1 Specifies the primary location value to be stored in the sort pair structure.
location2 Specifies the secondary location value to be stored in the sort pair structure.


The method SortPair_Write

Prototype


void SortPair_Write(void* This,int index,int value);
The SortPair_Write method writes a new sort pair to the list. It inserts a new index-value pair into a sorted list at the location specified by the search path stored within the control structure. This method assumes that the SortPair_Find method has already been used and that method has determined that the specified pair is not already in the list. This method does not repeat that step. It is a utility method only. All searching is performed by the SortPair_Find method. Its parameters are:

Parameter Description
This Specifies the handle for the sort pair list which contains a valid search path.
index Specifies the index to be associated with value. This is typically either a sequence number or an offset into the LongMemory area. This value is stored by this method but it is not used. It can be anything that can be stored in a 32-bit word.
value Specifies the value to be associated with index. For simple representation types this is literally a value, for more complex types it is the address of a value.


The method SortPair_WritePosting

Prototype


int SortPair_WritePosting(void* listStore,int basePoint,int index,int value);
The SortPair_WritePosing method writes a new posting to a posting pair. Its parameters are:

Parameter Description
listStore Specifies the handle that controls the LongMemory area that contains the posting pair.
basePoint Specifies the root offset of the posting pair in the LongMemory area as returned by the method SortPair_CreatePostingPair.
index Specifies the index value for the pair, This is the value used to find the pair.
value Specifies the value associated with the pair that is returned when it is found.

If all goes well, the method returns a one. If the pair already exists and has a nonzero value associated with it, the method returns a zero.
Table of Contents