CStringList

About

CStringList is dclib's mapping class. It maps a string key to a pointer to an object. For example, in dclib's userlist handling class, it maps the user's nick to their MyInfo object which stores their comment, tag, connection type, sharesize etc.

CStringList is implemented as a hashtable. However, the hash function is unlike any normal hash function. The default hash function just returns the first character of the string. Or the requested character, or zero if the string is shorter than that.

So a default CStringList divides what it stores into lists according to the first character of the key. One of these sub-lists will have to be searched (linearly, they are not sorted) to extract an object.

It gets more complicated if you create a non-default CStringList. If you create a CStringList with maxdepth 3, The first 4 (maxdepth 0 handles 1) characters of the keys will be used. Since one CStringList only handles 1 character, the first CStringList will create an array of CStringLists which will handle the next character, until the specified number of characters has been handled, and the deepest level of CStringLists will create arrays of lists.

So, for CStringList to perform well, the input must divide into fairly evenly sized lists using the first (maxdepth + 1) characters. The default CStringList has maxdepth set to 0, so requires the input to divide nicely using only the first character.

CStringLists (formerly) in use

CUserList in dclib

This used to use a CStringList with depth 2, handling an object for each user in the hub. Currently it uses a std::map, although a std::tr1::unordered_map (if available) may well be faster since the sorted property of the standard map is not being used.

DCClient in valknut

This used to have a CStringList with depth 0 handling an object for each user in the hub. It was replaced with Qt4 QHash or Qt3 QDict. Perhaps increasing the depth would have improved the speed here, at the expense of more memory used.

CFileManager in dclib

In versions before 0.3.18, a CStringList with depth 25 is used temporarily to ensure that a file is not shared twice via symlinks, and to protect against recursive symlinks. Here the CStringList is being used as a set not a map, since only the key is used, the object is either the same as the key or just an empty object, it does not matter. Unfortunately the protection is actually broken since symlinks are not "canonicalized", converted to the simplest possible absolute filename. In svn, this is fixed, and the CStringList was replaced with a std::set. But this was found to actually be slower than the CStringList it replaced. So this was replaced with a std::tr1::unordered_set, which appears to be slightly faster than the CStringList.

CQueryManager / CSearchIndex in dclib

This is currently still using CStringLists, with depth 0, as sets, to store search results before they get sent out. Converting them to use a std::set may be better because (1) the key is actually a number, and (2) it would then be possible to use set_intersection to combine the results for each substring, keeping only results that contain all the substrings so far.

Since 0.3.19, this system was re-written to use a std::tr1::unordered_map<CString, std::vector<ulonglong> > but then the whole keyword indexing system was scrapped and searching by keyword is now handled differently by CFileManager.

CConfig in dclib

CStringLists, all with depth 0, are used for lists of public hubs and bookmark hubs. They are probably not performance critical, although you may have a few thousand public hubs.

Other uses

Each "dereferencing type-punned pointer will break strict-aliasing rules" warning is where a CStringList Get method is being used, gcc does not like how the pointer is passed by reference. The alternative is to change CStringList and everything that uses it, and pass a pointer to the pointer.

To do

I must check how CStringList handles multiple items with the same key, so far all replacements do not allow multiple items per key. I think CStringList allows multiple items and the pointer parameter is used similar to CList::Next(). This is particularly important for the CAsyncDns class which handles all DNS lookups.