The SortPair Service Class
The service class
SortPair maintains an open ended list of entry-value pairs in sort order by value. Pairs can be retrieved by their value and then the corresponding entry is immediately available. The algorithm underlying the sortpair methods is designed to make the positioning into the list at or near a particular specified value as quick as possible. Once this positioning has been achieved, the operation of writing a new entry to the list is optimal. In particular, the problem is to maintain various lists of values in sorted ascending order. The algorithm needs to be able to add new values to the list with no guarantee that those values were not already there.
To make sorting possible, values are assigned binary representation types and each type has a technique available which compares a current value to another and computes a signed comparison value. A negative value means that the current value is below the other one in their sort order; a zero value means that the two values are equal with regards to their sort order; and a positive value means that the current value is greater than the other one. Using the Java terminolgy we will refer to this technique as "compareTo(newValue,listValue)"
Given that we know how to compare individual values, an addEntryToList algorithm is now needed which will return the index in the sorted list where the new value was stored or a -1 if the value was already in the list. The brute-force way to perform this operation would be to simply scan forward in the list comparing the new value to each value in the list.
Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int addEntryToList(object list[], object new, int nList)
{
for(i = 0; i < nList; i++)
{
int iRelation = compareTo(object,list[i]));
if(iRelation == 0) return -1;
if(iRelation > 0)
{
insert object before i in list;
return i;
}
}
append object to list;
return nList;
}
|
The problem is that this brute-force approach has complexity O(n) plus the time required to do the insertion or the appending. Since the list was sorted, it should be possible to locate the value in the list using a binary search with complexity O(log n). But the standard binary search algorithm only gives the location of the entry if it is in the list. No information about where to insert an entry, if it is not in the list is supplied. So using a binary search would improve nothing for entries not yet in the list. The entire forward scan in the above would have to be repeated. The
SortPair class, therefore, uses an extension to the binary search referred to as a "search index". A "search index" is a signed integer value that can be associated with any value regardless of whether or not it presently is within the list. Its values are as follows:
Value | Meaning |
+seq | A positive search index means that the entry is presently in the list. The seq value is the sequence number relative to one of the actual position of the entry in the list. |
0 | A zero value indicates that the entry is not in the list and is below all entries in the sort order. |
-seq | A negative search index means that the entry is not presently in the list, but it is above some entries now in the list in the sort order. The seq value is the negative sequence number relative to one of that entry in the list immediately below the entry itself. |
The algorithm for computing the search index is shown below.
Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int searchIndex(Value list[],Value new,int nList)
{
int iBottom = 0;
int iMiddle = 0;
int iRelation = 0;
int iTop = nList - 1;
while(iBottom <= iTop)
{
iMiddle = (iBottom + iTop) / 2;
iRelation = compareTo(new,list[iMiddle]);
if(iRelation == 0) return iMiddle + 1;
if(iRelation < 0) iTop = iMiddle - 1;
else iBottom = iMiddle + 1;
}
if(iRelation > 0) return -(iMiddle + 1);
else if(iMiddle == 0) return 0;
return -iMiddle;
}
|
The iterative portion of the search index algorithm is identical to a normal binary search. The only difference occurs when we hit a match. Then the algorithm returns iMiddle + 1 rather than simply iMiddle. We do not diverge from the normal binary search until we establish that there is no match present for the new object. When iBottom exceeds iTop, the last examined entry, the one whose index is "iMiddle" is either that entry in the list that is directly greater than the new one or directly less. Which can be determined by the value of the relation code. If the relation code is positive then the entry at iMiddle is less that the new entry, so the return value is -(iMiddle+1). Else we now know that the last examined entry was greater than the new object. If that entry was at the front of the list then the return value is zero since the new object is less than all entries in the list. If it is not zero then the return value is simply -iMiddle.
Using the search index the addEntryToList algorithm now becomes
Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int addEntryToList(object list[], object new, int nList)
{
int searchIndex = searchIndex(list,new,nList)
if(SearchIndex > 0) return -1;
if(searchIndex < 0)
{
int i = -searchIndex;
insert object before i in list;
return i;
}
append object to list;
return nList;
}
|
The addEntryToList algorithm is independent of nList and the searchIndex algorithm has the same complexity as a binary search which is (O log nList). Therefore our revised version has complexity O(log nList) plus the time required to do the insertion or the appending.
The problem of course is within the step "insert object before i in list". This insertion requires moving the entire upstream portion of the list so that an insertion can occur. A linked list cannot be used because traversing the links would destroy the efficiency of binary search. To avoid this the list of index-value pairs is stored in a tree-structured set of fixed-sized blocks. When a block is filled it is split into two blocks. Higher level blocks replace the index component of the pair with the offset of the child block that has that value as its first entry. The search index thus becomes a search path which is the block offset of each block searched starting with the initial block in the tree. If the maximum number of levels is SORT_MAX_LEVELS (10 here) and maxPairs is the number of pairs per block (max = 31 here, average = 16) then the addEntryToList operation is never needed more than SORT_MAX_LEVELS times and never has to search a list longer than maxPairs and never has to move more than that number. With its current settings the algorthimn can store 10 to the 12th pairs and has a maximum run time of
Code Block |
---|
language | none |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
SORT_MAX_LEVELS*(searchTime(maxPairsInList)+insertTime(maxPairsInList)),
where:
SORT_MAX_LEVES = 10
maxPairsInList = 31.
|
Which this is completely independent of the length of the list.
The allocation of the control structure itself is the responsibility of the user of the sort pair list. This class is used very aggresively by applications to maintain sorted lists that are written once and then accessed repeatedly. To make these accesses as quick as possible the control structure for this class is allocated on the stack via local variable declarations in the using code.
The field SortCaseSensitive
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
extern int SortCaseSensitive;
|
The
SortCaseSensitive field specifies how string comparisons are to be performed. A zero value makes them case-insensitive, while a nonzero value makes then case-sensitive. By default this field is set to zero, case-insensitive.
The method SortPair_ChangePosting
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
void SortPair_ChangePosting(void* listStore,int basePoint,int index,int value);
|
The
SortPair_ChangePosting method changes the posting value associated with a posting index. Its parameters are as follows:
Parameter | Description |
listStore | Specifies the handle for the LongMemory area that contains the posting pair. |
basePoint | Specifies the root offset of the posting pair in the LongMemory area as returned by the method SortPair_CreatePostingPair. |
index | Specifies the index value for the pair. This is the value used to find the pair. This method assumes that a pair with the specified index already exists. |
value | Specifies the new value to be associated with the pair. |
The method SortPair_Close
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
void SortPair_Close(void* This);
|
The
SortPair_Close method closes the indicated control structure for the sort pair list. If no changes have been made to the list, then this method ends up doing nothing, since the allocation of the control structure itself is the responsibility of the user of the sort pair list. If the list has been changed, then its root information is updated in the storage area containing the actual list. Its parameter is:
Parameter | Description |
This | Specifies the handle of the information structure controlling the sort pair list. |
The method SortPair_Create
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int SortPair_Create(void* This,void* listStore,int valueType,int blkSize);
|
The
SortPair_Create method initializes a structure which describes the sort pair list and writes the initial block for that sort pair to the
LongMemory area. Note that memory for the control structure is allocated by the user of this method. This method has the following parameters:
Parameter | Description |
This | Specifies the handle of the information structure controlling the sort pair list. The memory for this structure most be allocated as an int vector and managed by the caller via a reference to the SortPair_Stack constant in the header. A typical declaration is "auto int sortPair[SortPair_StackSize];". |
listStore | Specifies the handle for the LongMemory area that contains the sort pair. |
valueType | Specifies the type of value being used to order the entries in the list. This must be one of the basic representation type constants REP_* from the header. |
blkSize | Specifies the block size to be used for the sorted values. If zero, then the value of LongMemory_GetPostingSize() is used. |
If all goes well then the method returns the location in the
LongMemory area of the control information vector for the sort pair list. If there is a problem, then a zero is returned.
The method SortPair_CreatePostingPair
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int SortPair_CreatePostingPair(void* listStore);
|
The
SortPair_CreatePostingPair method creates a small integer
SortPair used for looking up links and relationships between components in
LongMemory based on their defining root offsets. The information structure associated with posting pairs itself is allocated locally on the stack and is never left open; therefore, posting pairs do not have handles and are controlled entirely via their root offsets. This method has the following parameter:
Parameter | Description |
listStore | Specifies the handle for the LongMemory area that contains the posting pair. |
If all goes well, then the method returns the root offset in the
LongMemory area of the control information vector for the posting pair. If there is a problem, then a zero is returned.
The method SortPair_Find
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int* SortPair_Find(void* This,void* value,int delta);
|
The
SortPair_Find method finds a value in a sort pair list. The value as specified must be of the type specified for the list when it was created. As this runs it retains the search path used (see the discussion in the introduction). If the value is not in the list then the search path can be used to insert the value into the list very quickly since its location has already been determined. The parameters of this method are as follows:
Parameter | Description |
This | Specifies the handle for the information structure controlling the sort pair list. |
value | Specifies the value being sought in the list. It must be of the type specified when the list was created. |
delta | Specifies whether the caller wants to change the pair value. A nonzero value indicates yes, while a zero value indicates no. |
If there is a pair in the list with the specified value, then a pointer to that pair is returned. If there is no pair with a matching value then a NULL is returned.
The method SortPair_FindPosting
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int SortPair_FindPosting(void* listStore,int basePoint,int pairIndex);
|
The
SortPair_FindPosting method looks up the posting value associated with a posting index into a posting pair. The parameters of this method are as follows:
Parameter | Description |
listStore | Specifies the handle for the LongMemory area that contains the posting pair. |
basePoint | Specifies the root offset of the posting pair in the LongMemory area as returned by the method SortPair_CreatePostingPair. |
pairIndex | Specifies the index value for which a corresponding posting value is desired. |
If the posting pair contains an entry for the index, then its associated value is returned; else a zero is returned.
The method SortPair_GetLocation
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
void SortPair_GetLocation(void* This,int* location1,int* location2);
|
The
SortPair_GetLocation method gets the location properties of the sort pair. These are user specific values that are stored within the sort pair control structure for convenience purposes since many applications of the sort pair logic require some additional control information. The parameters of this method are as follows:
Parameter | Description |
This | Specifies the handle for the information structure controlling the sort pair list. |
location1 | returns the primary location value stored in the sort pair structure, if not NULL. |
location2 | returns the secondary location value stored in the sort pair structure, if not NULL. |
The method SortPair_Next
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int* SortPair_Next(void* Store,tPosting* PostSet);
|
The
SortPair_Next method moves to the next pair in a sorted list. The sort pair list is not a sequence and cannot therefore simply be accessed via a subscript. Instead an iterator must be used. First the method
SortPair_Rewind is used to initialize the iterator so that it points to the initial pair in the sorted list. Then this method is used to obtain the subsequent pairs. The parameters of this method are as follows:
Parameter | Description |
Store | Specifies the handle for the LongMemory area that contains the sorted pair list. Note that this handle is returned by the SortPair_Rewind method when it initializes the iterator. |
PostSet | Contains the user allocated structure tPostSet which controls the movement though pair postings. |
If there is another pair in the sorted list, then a pointer to that pair is returned. If there is no additional pair, then a NULL is returned.
The method SortPair_Position
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int* SortPair_Position(void* This,tPosting* PostSet,void* value);
|
The
SortPair_Position method positions to an indicated pair in the list. The sort pair list is not a sequence and cannot therefore simply be accessed via a subscript. Instead an iterator must be used. This method positions the iterator on the first record that is greater than or equal to the specified record value and returns the sortpair associated with that record. Subsequent calls to SortNext will then return subsequent records from the sorted list. This method has the following parameters:
Parameter | Description |
This | Specifies the handle for the information structure controlling the sort pair list. |
PostSet | Specifies a user allocated structure tPostSet that controls the movement though pair postings. |
value | Specifies the value being sought in the list. It must be of the type specified when the list was created. |
If there is no pair in the sorted list that is greater than or equal to the indicated value, then a NULL is returned; else a pointer to that pair is returned.
The method SortPair_Open
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
void SortPair_Open(void* This,void* listStore,int baseIndex,int blkSize);
|
The method
SortPair_Open initializes a structure which describes the sort pair and reads the initial block for that sort pair from the
LongMemory area. This method has the following parameters:
Parameter | Description |
This | Contains the structure controlling the sort pair. The memory for this structure must be allocated and managed by the caller. |
listStore | Specifies the handle for the LongMemory area that contains the sort pair itself. |
baseIndex | Specifies the location in the LongMemory area of the initial block of the sort pair. |
blkSize | Specifies the block size to be used for the sorted values. If zero, then the value of LongMemory_GetPostingSize() is used. |
The method SortPair_Rewind
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
void* SortPair_Rewind(void* This,tPosting* PostSet);
|
The
SortPair_Rewind method moves to the first pair in a sorted list. A sort pair list is not a sequence and cannot therefore simply be accessed via a subscript. Instead an iterator must be used. First this method is used to initialize the iterator so that it points to the initial pair in the sorted list. Then the
SortPair_Next method is used to obtain the subsequent pairs. This method has the following parameters:
Parameter | Description |
This | Specifies the handle for the information structure controlling the sort pair list. |
PostSet | Receives the information needed to iterate though the sort pair. It is a user allocated structure tPostSet. |
The method returns the handle for the storage area containing the sort pair list.
The method SortPair_SetLocation
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
void SortPair_SetLocation(void* This,int location1,int location2);
|
The
SortPair_SetLocation method sets the location properties of the sort pair. These are user specific values that are stored within the sort pair control structure for convenience purposes since many applications of the sort pair logic require some additional control information. Its parameters are as follows:
Parameter | Description |
This | Specifies the structure containing the sort pair information. |
location1 | Specifies the primary location value to be stored in the sort pair structure. |
location2 | Specifies the secondary location value to be stored in the sort pair structure. |
The method SortPair_Write
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
void SortPair_Write(void* This,int index,int value);
|
The
SortPair_Write method writes a new sort pair to the list. It inserts a new index-value pair into a sorted list at the location specified by the search path stored within the control structure. This method assumes that the
SortPair_Find method has already been used and that method has determined that the specified pair is not already in the list. This method does not repeat that step. It is a utility method only. All searching is performed by the
SortPair_Find method. Its parameters are:
Parameter | Description |
This | Specifies the handle for the sort pair list which contains a valid search path. |
index | Specifies the index to be associated with value. This is typically either a sequence number or an offset into the LongMemory area. This value is stored by this method but it is not used. It can be anything that can be stored in a 32-bit word. |
value | Specifies the value to be associated with index. For simple representation types this is literally a value, for more complex types it is the address of a value. |
The method SortPair_WritePosting
Prototype Code Block |
---|
language | cpp |
---|
theme | Eclipse |
---|
linenumbers | true |
---|
|
int SortPair_WritePosting(void* listStore,int basePoint,int index,int value);
|
The
SortPair_WritePosing method writes a new posting to a posting pair. Its parameters are:
Parameter | Description |
listStore | Specifies the handle that controls the LongMemory area that contains the posting pair. |
basePoint | Specifies the root offset of the posting pair in the LongMemory area as returned by the method SortPair_CreatePostingPair. |
index | Specifies the index value for the pair, This is the value used to find the pair. |
value | Specifies the value associated with the pair that is returned when it is found. |
If all goes well, the method returns a one. If the pair already exists and has a nonzero value associated with it, the method returns a zero.