The SortPair Service Class
The service class
SortPair maintains an open ended list of entry-value pairs in sort
order by value. Pairs can be retrieved by their value and then the corresponding entry is
immediately available. The algorithm underlying the sortpair methods is designed to make the
positioning into the list at or near a particular specified value as quick as possible. Once
this positioning has been achieved, the operation of writing a new entry to the list is
optimal. In particular, the problem is to maintain various lists of values in sorted
ascending order. The algorithm needs to be able to add new values to the list with no
guarantee that those values were not already there.
To make sorting possible, values are assigned binary representation types and each type has a
technique available which compares a current value to another and computes a signed
comparison value. A negative value means that the current value is below the other one in
their sort order; a zero value means that the two values are equal with regards to their sort
order; and a positive value means that the current value is greater than the other one.
Using the Java terminolgy we will refer to this technique as "compareTo(newValue,listValue)"
Given that we know how to compare individual values, an addEntryToList algorithm is now
needed which will return the index in the sorted list where the new value was stored or a -1
if the value was already in the list. The brute-force way to perform this operation would be
to simply scan forward in the list comparing the new value to each value in the list.
int addEntryToList(object list[], object new, int nList)
{
for(i = 0; i < nList; i++)
{
int iRelation = compareTo(object,list[i]));
if(iRelation == 0) return -1;
if(iRelation > 0)
{
insert object before i in list;
return i;
}
}
append object to list;
return nList;
}
The problem is that this brute-force approach has complexity O(n) plus the time required to
do the insertion or the appending. Since the list was sorted, it should be possible to locate
the value in the list using a binary search with complexity O(log n). But the standard binary
search algorithm only gives the location of the entry if it is in the list. No information
about where to insert an entry, if it is not in the list is supplied. So using a binary
search would improve nothing for entries not yet in the list. The entire forward scan in the
above would have to be repeated. The
SortPair class, therefore, uses an extension
to the binary search referred to as a "search index". A "search index" is a signed integer
value that can be associated with any value regardless of whether or not it presently is
within the list. Its values are as follows:
Value | Meaning
|
+seq | A positive search index means that the entry is presently in the list. The seq
value is the sequence number relative to one of the actual position of the entry in
the list.
|
0 | A zero value indicates that the entry is not in the list and is below all entries in
the sort order.
|
-seq | A negative search index means that the entry is not presently in the list, but it is
above some entries now in the list in the sort order. The seq value is the negative
sequence number relative to one of that entry in the list immediately below the entry itself.
|
The algorithm for computing the search index is shown below.
int searchIndex(Value list[],Value new,int nList)
{
int iBottom = 0;
int iMiddle = 0;
int iRelation = 0;
int iTop = nList - 1;
while(iBottom <= iTop)
{
iMiddle = (iBottom + iTop) / 2;
iRelation = compareTo(new,list[iMiddle]);
if(iRelation == 0) return iMiddle + 1;
if(iRelation < 0) iTop = iMiddle - 1;
else iBottom = iMiddle + 1;
}
if(iRelation > 0) return -(iMiddle + 1);
else if(iMiddle == 0) return 0;
return -iMiddle;
}
The iterative portion of the search index algorithm is identical to a normal binary search.
The only difference occurs when we hit a match. Then the algorithm returns iMiddle + 1 rather
than simply iMiddle. We do not diverge from the normal binary search until we establish that
there is no match present for the new object. When iBottom exceeds iTop, the last examined
entry, the one whose index is "iMiddle" is either that entry in the list that is directly
greater than the new one or directly less. Which can be determined by the value of the
relation code. If the relation code is positive then the entry at iMiddle is less that the
new entry, so the return value is -(iMiddle+1). Else we now know that the last examined entry
was greater than the new object. If that entry was at the front of the list then the return
value is zero since the new object is less than all entries in the list. If it is not zero
then the return value is simply -iMiddle.
Using the search index the addEntryToList algorithm now becomes
int addEntryToList(object list[], object new, int nList)
{
int searchIndex = searchIndex(list,new,nList)
if(SearchIndex > 0) return -1;
if(searchIndex < 0)
{
int i = -searchIndex;
insert object before i in list;
return i;
}
append object to list;
return nList;
}
The addEntryToList algorithm is independent of nList and the searchIndex algorithm has the
same complexity as a binary search which is (O log nList). Therefore our revised version has
complexity O(log nList) plus the time required to do the insertion or the appending.
The problem of course is within the step "insert object before i in list". This insertion
requires moving the entire upstream portion of the list so that an insertion can occur. A
linked list cannot be used because traversing the links would destroy the efficiency of
binary search. To avoid this the list of index-value pairs is stored in a tree-structured set
of fixed-sized blocks. When a block is filled it is split into two blocks. Higher level
blocks replace the index component of the pair with the offset of the child block that has
that value as its first entry. The search index thus becomes a search path which is the
block offset of each block searched starting with the initial block in the tree. If the
maximum number of levels is SORT_MAX_LEVELS (10 here) and maxPairs is the number of pairs
per block (max = 31 here, average = 16) then the addEntryToList operation is never needed
more than SORT_MAX_LEVELS times and never has to search a list longer than maxPairs and never
has to move more than that number. With its current settings the algorthimn can store 10 to
the 12th pairs and has a maximum run time of
SORT_MAX_LEVELS*(searchTime(maxPairsInList)+insertTime(maxPairsInList)),
where:
SORT_MAX_LEVES = 10
maxPairsInList = 31.
Which this is completely independent of the length of the list.
The allocation of the control structure itself is the responsibility of the user of the sort
pair list. This class is used very aggresively by applications to maintain sorted lists that
are written once and then accessed repeatedly. To make these accesses as quick as possible
the control structure for this class is allocated on the stack via local variable
declarations in the using code.
The field SortCaseSensitive
Prototype
extern int SortCaseSensitive;
The
SortCaseSensitive field specifies how string comparisons are to be performed. A
zero value makes them case-insensitive, while a nonzero value makes then case-sensitive. By
default this field is set to zero, case-insensitive.
The method SortPair_ChangePosting
Prototype
void SortPair_ChangePosting(void* listStore,int basePoint,int index,int value);
The
SortPair_ChangePosting method changes the posting value associated with a posting
index. Its parameters are as follows:
Parameter | Description
|
listStore | Specifies the handle for the LongMemory area that contains the posting
pair.
|
basePoint | Specifies the root offset of the posting pair in the LongMemory area
as returned by the method SortPair_CreatePostingPair.
|
index | Specifies the index value for the pair. This is the value used to find the pair.
This method assumes that a pair with the specified index already exists.
|
value | Specifies the new value to be associated with the pair.
|
The method SortPair_Close
Prototype
void SortPair_Close(void* This);
The
SortPair_Close method closes the indicated control structure for the sort pair
list. If no changes have been made to the list, then this method ends up doing nothing,
since the allocation of the control structure itself is the responsibility of the user of the
sort pair list. If the list has been changed, then its root information is updated in the
storage area containing the actual list. Its parameter is:
Parameter | Description
|
This | Specifies the handle of the information structure controlling the sort pair list.
|
The method SortPair_Create
Prototype
int SortPair_Create(void* This,void* listStore,int valueType,int blkSize);
The
SortPair_Create method initializes a structure which describes the sort pair list
and writes the initial block for that sort pair to the
LongMemory area. Note that
memory for the control structure is allocated by the user of this method. This method has the
following parameters:
Parameter | Description
|
This | Specifies the handle of the information structure controlling the sort pair list.
The memory for this structure most be allocated as an int vector and managed by
the caller via a reference to the SortPair_Stack constant in the header.
A typical declaration is "auto int sortPair[SortPair_StackSize];".
|
listStore | Specifies the handle for the LongMemory area that contains the sort pair.
|
valueType | Specifies the type of value being used to order the entries in the list. This
must be one of the basic representation type constants REP_* from the header.
|
blkSize | Specifies the block size to be used for the sorted values. If zero, then the
value of LongMemory_GetPostingSize() is used.
|
If all goes well then the method returns the location in the
LongMemory area of the
control information vector for the sort pair list. If there is a problem, then a zero is
returned.
The method SortPair_CreatePostingPair
Prototype
int SortPair_CreatePostingPair(void* listStore);
The
SortPair_CreatePostingPair method creates a small integer
SortPair used
for looking up links and relationships between components in
LongMemory based on
their defining root offsets. The information structure associated with posting pairs itself
is allocated locally on the stack and is never left open; therefore, posting pairs do not
have handles and are controlled entirely via their root offsets. This method has the
following parameter:
Parameter | Description
|
listStore | Specifies the handle for the LongMemory area that contains the posting
pair.
|
If all goes well, then the method returns the root offset in the
LongMemory area of
the control information vector for the posting pair. If there is a problem, then a zero is
returned.
The method SortPair_Find
Prototype
int* SortPair_Find(void* This,void* value,int delta);
The
SortPair_Find method finds a value in a sort pair list. The value as specified
must be of the type specified for the list when it was created. As this runs it retains
the search path used (see the discussion in the introduction). If the value is not in the
list then the search path can be used to insert the value into the list very quickly since
its location has already been determined. The parameters of this method are as follows:
Parameter | Description
|
This | Specifies the handle for the information structure controlling the sort pair
list.
|
value | Specifies the value being sought in the list. It must be of the type specified
when the list was created.
|
delta | Specifies whether the caller wants to change the pair value. A nonzero value
indicates yes, while a zero value indicates no.
|
If there is a pair in the list with the specified value, then a pointer to that pair is
returned. If there is no pair with a matching value then a NULL is returned.
The method SortPair_FindPosting
Prototype
int SortPair_FindPosting(void* listStore,int basePoint,int pairIndex);
The
SortPair_FindPosting method looks up the posting value associated with a posting
index into a posting pair. The parameters of this method are as follows:
Parameter | Description
|
listStore | Specifies the handle for the LongMemory area that contains the posting
pair.
|
basePoint | Specifies the root offset of the posting pair in the LongMemory area as
returned by the method SortPair_CreatePostingPair.
|
pairIndex | Specifies the index value for which a corresponding posting value is desired.
|
If the posting pair contains an entry for the index, then its associated value is returned;
else a zero is returned.
The method SortPair_GetLocation
Prototype
void SortPair_GetLocation(void* This,int* location1,int* location2);
The
SortPair_GetLocation method gets the location properties of the sort pair. These
are user specific values that are stored within the sort pair control structure for
convenience purposes since many applications of the sort pair logic require some additional
control information. The parameters of this method are as follows:
Parameter | Description
|
This | Specifies the handle for the information structure controlling the sort pair
list.
|
location1 | returns the primary location value stored in the sort pair structure, if
not NULL.
|
location2 | returns the secondary location value stored in the sort pair structure, if
not NULL.
|
The method SortPair_Next
Prototype
int* SortPair_Next(void* Store,tPosting* PostSet);
The
SortPair_Next method moves to the next pair in a sorted list. The sort pair list is
not a sequence and cannot therefore simply be accessed via a subscript. Instead an iterator
must be used. First the method
SortPair_Rewind is used to initialize the iterator so
that it points to the initial pair in the sorted list. Then this method is used to obtain the
subsequent pairs. The parameters of this method are as follows:
Parameter | Description
|
Store | Specifies the handle for the LongMemory area that contains the sorted pair
list. Note that this handle is returned by the SortPair_Rewind method when
it initializes the iterator.
|
PostSet | Contains the user allocated structure tPostSet which controls the movement
though pair postings.
|
If there is another pair in the sorted list, then a pointer to that pair is returned. If
there is no additional pair, then a NULL is returned.
The method SortPair_Position
Prototype
int* SortPair_Position(void* This,tPosting* PostSet,void* value);
The
SortPair_Position method positions to an indicated pair in the list. The sort pair
list is not a sequence and cannot therefore simply be accessed via a subscript. Instead an
iterator must be used. This method positions the iterator on the first record that is greater
than or equal to the specified record value and returns the sortpair associated with that
record. Subsequent calls to SortNext will then return subsequent records from the sorted list.
This method has the following parameters:
Parameter | Description
|
This | Specifies the handle for the information structure controlling the sort pair
list.
|
PostSet | Specifies a user allocated structure tPostSet that controls the movement
though pair postings.
|
value | Specifies the value being sought in the list. It must be of the type specified
when the list was created.
|
If there is no pair in the sorted list that is greater than or equal to the indicated value,
then a NULL is returned; else a pointer to that pair is returned.
The method SortPair_Open
Prototype
void SortPair_Open(void* This,void* listStore,int baseIndex,int blkSize);
The method
SortPair_Open initializes a structure which describes the sort pair and
reads the initial block for that sort pair from the
LongMemory area. This method has
the following parameters:
Parameter | Description
|
This | Contains the structure controlling the sort pair. The memory for this structure
must be allocated and managed by the caller.
|
listStore | Specifies the handle for the LongMemory area that contains the sort pair
itself.
|
baseIndex | Specifies the location in the LongMemory area of the initial block of
the sort pair.
|
blkSize | Specifies the block size to be used for the sorted values. If zero, then the
value of LongMemory_GetPostingSize() is used.
|
The method SortPair_Rewind
Prototype
void* SortPair_Rewind(void* This,tPosting* PostSet);
The
SortPair_Rewind method moves to the first pair in a sorted list. A sort pair list
is not a sequence and cannot therefore simply be accessed via a subscript. Instead an
iterator must be used. First this method is used to initialize the iterator so that it
points to the initial pair in the sorted list. Then the
SortPair_Next method is used to
obtain the subsequent pairs. This method has the following parameters:
Parameter | Description
|
This | Specifies the handle for the information structure controlling the sort pair
list.
|
PostSet | Receives the information needed to iterate though the sort pair. It is a user
allocated structure tPostSet.
|
The method returns the handle for the storage area containing the sort pair list.
The method SortPair_SetLocation
Prototype
void SortPair_SetLocation(void* This,int location1,int location2);
The
SortPair_SetLocation method sets the location properties of the sort pair. These
are user specific values that are stored within the sort pair control structure for
convenience purposes since many applications of the sort pair logic require some additional
control information. Its parameters are as follows:
Parameter | Description
|
This | Specifies the structure containing the sort pair information.
|
location1 | Specifies the primary location value to be stored in the sort pair structure.
|
location2 | Specifies the secondary location value to be stored in the sort pair structure.
|
The method SortPair_Write
Prototype
void SortPair_Write(void* This,int index,int value);
The
SortPair_Write method writes a new sort pair to the list. It inserts a new
index-value pair into a sorted list at the location specified by the search path stored
within the control structure. This method assumes that the
SortPair_Find method has
already been used and that method has determined that the specified pair is not
already in the list. This method does not repeat that step. It is a utility method only. All
searching is performed by the
SortPair_Find method. Its parameters are:
Parameter | Description
|
This | Specifies the handle for the sort pair list which contains a valid search path.
|
index | Specifies the index to be associated with value. This is typically either a
sequence number or an offset into the LongMemory area. This value is stored
by this method but it is not used. It can be anything that can be stored in a
32-bit word.
|
value | Specifies the value to be associated with index. For simple representation types
this is literally a value, for more complex types it is the address of a value.
|
The method SortPair_WritePosting
Prototype
int SortPair_WritePosting(void* listStore,int basePoint,int index,int value);
The
SortPair_WritePosing method writes a new posting to a posting pair. Its parameters
are:
Parameter | Description
|
listStore | Specifies the handle that controls the LongMemory area that contains the
posting pair.
|
basePoint | Specifies the root offset of the posting pair in the LongMemory area as
returned by the method SortPair_CreatePostingPair.
|
index | Specifies the index value for the pair, This is the value used to find the pair.
|
value | Specifies the value associated with the pair that is returned when it is found.
|
If all goes well, the method returns a one. If the pair already exists and has a nonzero
value associated with it, the method returns a zero.