Domain Depots

Domain depots store the descriptors of domains loaded from files so that they can be reused when possible. They are used internally by all file formats built into Orange (except for the basket format); you can use them if you write functions for reading other data formats. In general, you can use depots whenever you want to store and reuse domain descriptors. More generally still, you can use depots simply as a convenient way of constructing domains, without intending to store them.

Methods

DomainDepot
The constructor takes no arguments.
prepareDomain(attribute-names[, knownAttributes, knownMetaAttributes, dontStore, dontCheckStored])
Returns a domain that corresponds to the given attribute names. The returned domain can be a new one or one retrieved from the list of domains that were already constructed by this depot.

Attribute names must carry a prefix, similar to the one used in Orange's .txt file format. First comes an optional character denoting that the attribute is a meta attribute ('m') or the class attribute ('c'); only one attribute can be marked as the class attribute. It is followed by the obligatory type character: 'D', 'C' or 'S' for discrete, continuous or string attributes. The next character must be '#', and the rest is the actual attribute name. For instance, if the attribute names are given as ['mS#name', 'C#age', 'D#gender', 'D#race', 'cC#total', 'mS#SSN'], the constructed domain will have three attributes (continuous "age", followed by discrete "gender" and "race") and a continuous class attribute "total"; in addition, there will be two meta attributes, the strings "name" and "SSN".
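The naming convention can be decoded with a few lines of plain Python. The sketch below is only an illustration of the rules just described, assuming the prefix is at most two characters before the first '#'; parse_name and summarize are hypothetical helpers, not part of Orange.

```python
# Hypothetical helpers (not part of Orange) that decode the prefixed
# attribute-name convention: optional 'm' or 'c', a type character, '#', name.
ROLES = {"": "attribute", "m": "meta", "c": "class"}
TYPES = {"D": "discrete", "C": "continuous", "S": "string"}


def parse_name(spec):
    """Split e.g. 'mS#name' into ('meta', 'string', 'name')."""
    prefix, sep, name = spec.partition("#")
    if not sep or not name or len(prefix) not in (1, 2):
        raise ValueError("malformed attribute name: %r" % spec)
    role_char, type_char = prefix[:-1], prefix[-1]
    if role_char not in ROLES or type_char not in TYPES:
        raise ValueError("unknown prefix in %r" % spec)
    return ROLES[role_char], TYPES[type_char], name


def summarize(specs):
    """Group parsed names into ordinary attributes, the class and metas."""
    attributes, class_var, metas = [], None, []
    for spec in specs:
        role, vtype, name = parse_name(spec)
        if role == "class":
            # Only one attribute may be marked as the class attribute.
            if class_var is not None:
                raise ValueError("only one class attribute is allowed")
            class_var = (name, vtype)
        elif role == "meta":
            metas.append((name, vtype))
        else:
            attributes.append((name, vtype))
    return attributes, class_var, metas
```

For the list in the example above, summarize returns the ordinary attributes age, gender and race in that order, the class ("total", "continuous") and the two string metas "name" and "SSN".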

knownAttributes is an optional list of attributes that can be reused if the domain is not found among the stored domains. Similarly, knownMetaAttributes is a dictionary of known meta attributes, with meta IDs as keys and Variables as values.

dontStore and dontCheckStored are flags: dontStore prevents the depot from storing the newly constructed domain, and dontCheckStored prevents it from searching among the stored domains.

The function returns a tuple: the first element is the constructed domain, the second is the list of IDs assigned to meta attributes (in the same order as the meta attributes appear in the list of attribute names), and the third tells whether the domain was constructed anew or retrieved from the stored ones.
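To make the two flags and the returned tuple concrete, here is a minimal, hypothetical sketch of a depot in plain Python. SketchDepot, SketchDomain, prepare_domain and the meta ID counter are invented for illustration and are not Orange's implementation; the matching rule (ordinary attributes and class in order, meta attributes in any order) and the most-recent-first search follow the behaviour described in this section.

```python
import itertools

# Hypothetical sketch of a domain depot. SketchDepot, SketchDomain and the
# meta ID counter are illustrative assumptions, not Orange's implementation.
_next_meta_id = itertools.count(-1, -1)  # assumed ID source: -1, -2, -3, ...


def _split(spec):
    """'mS#name' -> ('m', 'S', 'name'); 'C#age' -> ('', 'C', 'age')."""
    prefix, _, name = spec.partition("#")
    return prefix[:-1], prefix[-1], name


class SketchDomain(object):
    def __init__(self, attributes, class_var, metas):
        self.attributes = attributes  # ordered list of (name, type char)
        self.class_var = class_var    # (name, type char) or None
        self.metas = metas            # dict: meta ID -> (name, type char)


class SketchDepot(object):
    def __init__(self):
        self.stored = []  # domains in the order in which they were stored

    def prepare_domain(self, specs, dont_store=False, dont_check_stored=False):
        parsed = [_split(s) for s in specs]
        attributes = [(n, t) for r, t, n in parsed if r == ""]
        classes = [(n, t) for r, t, n in parsed if r == "c"]
        class_var = classes[0] if classes else None
        metas = [(n, t) for r, t, n in parsed if r == "m"]

        if not dont_check_stored:
            # Look for a matching stored domain, most recently stored first.
            for domain in reversed(self.stored):
                if (domain.attributes == attributes
                        and domain.class_var == class_var
                        and sorted(domain.metas.values()) == sorted(metas)):
                    id_of = dict((v, k) for k, v in domain.metas.items())
                    # Meta IDs are reported in the order the metas were requested.
                    return domain, [id_of[m] for m in metas], False

        # No reusable domain: build a new one with fresh meta IDs.
        meta_ids = [next(_next_meta_id) for _ in metas]
        domain = SketchDomain(attributes, class_var, dict(zip(meta_ids, metas)))
        if not dont_store:
            self.stored.append(domain)
        return domain, meta_ids, True
```

Replaying the four calls of the game at the end of this section against SketchDepot gives the outcome described there: only the fourth call reuses a stored domain, namely the third one.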

checkDomain(attribute-names, domain)
With attribute names in the same format as above, this function checks whether they correspond to the given domain. In fact, prepareDomain first calls checkDomain for each stored domain and returns the first one for which the comparison succeeds (if none is found, a new domain is constructed). In your own programs, you might want to use this function when the user proposes a domain to be reused.

The function returns a tuple. The first element tells whether the domain matches. If it does, the second element is the list of meta attribute IDs, just like the one returned by prepareDomain.

Note that although this is a method of class DomainDepot, it does not use any of the depot's data. It is there only for convenience in the C++ code (where it is declared as a static member).
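The comparison that checkDomain performs can be pictured with the following simplified stand-in. It is a hypothetical sketch, not Orange's code: the domain is modelled as a plain dict (an assumption), ordinary attributes and the class must match in order, meta attributes may come in any order, and the returned meta IDs follow the order of the names.

```python
# Hypothetical, simplified stand-in for checkDomain. The domain is modelled
# as a plain dict (an assumption for illustration only):
#   {"attributes": [(name, type char), ...], "class": (name, type char) or None,
#    "metas": {meta_id: (name, type char), ...}}


def check_domain(specs, domain):
    attributes, class_var, metas = [], None, []
    for spec in specs:
        prefix, _, name = spec.partition("#")
        role, vtype = prefix[:-1], prefix[-1]
        if role == "m":
            metas.append((name, vtype))
        elif role == "c":
            class_var = (name, vtype)
        else:
            attributes.append((name, vtype))
    # Ordinary attributes must match in order; the class must match exactly.
    if attributes != domain["attributes"] or class_var != domain["class"]:
        return False, []
    # Meta attributes must match as a collection; their order is irrelevant.
    if sorted(metas) != sorted(domain["metas"].values()):
        return False, []
    id_of = dict((v, k) for k, v in domain["metas"].items())
    # Meta IDs are listed in the order in which the metas appear in specs.
    return True, [id_of[m] for m in metas]


example = {"attributes": [("age", "C"), ("gender", "D"), ("race", "D")],
           "class": ("total", "C"),
           "metas": {-2: ("name", "S"), -3: ("SSN", "S")}}
print(check_domain(['mS#name', 'C#age', 'D#gender', 'D#race', 'cC#total', 'mS#SSN'], example))  # (True, [-2, -3])
print(check_domain(['mS#SSN', 'mS#name', 'C#age', 'D#gender', 'D#race', 'cC#total'], example))  # (True, [-3, -2])
print(check_domain(['mS#name', 'C#age', 'D#race', 'D#gender', 'cC#total', 'mS#SSN'], example))  # (False, [])
```

Swapping the two meta attributes only swaps the reported IDs, while swapping two ordinary attributes makes the check fail.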

Examples

Depots are generally used for constructing domains, like this.

part of domainDepot.py

    import orange

    de = orange.DomainDepot()
    names = ['mS#name', 'C#age', 'D#gender', 'D#race', 'cC#total', 'mS#SSN']
    domain, metaIDs, isNew = de.prepareDomain(names)
    print "Domain:", domain
    print "IDs of meta attributes: ", metaIDs
    print "Is new: ", isNew and "yes" or "no"

This will print

    Domain: [age, gender, race, total], {-2:name, -3:SSN}
    IDs of meta attributes: [-2, -3]
    Is new: yes

The domain is as expected. In the list of IDs of meta attributes, each element corresponds to a meta attribute in the same order as they are given in the list of names. Here, -2 corresponds to the first ('mS#name') and -3 to the second ('mS#SSN').

If we call the function again, but with the order of the meta attributes changed,

    names = ['mS#SSN', 'mS#name', 'C#age', 'D#gender', 'D#race', 'cC#total']
    domain, metaIDs, isNew = de.prepareDomain(names)

the domain is reused (the order of meta attributes is irrelevant), so isNew is false; metaIDs is now [-3, -2], since the first meta attribute in the list, 'SSN', has the ID -3 and the second, 'name', has -2. If you don't find this useful, wait until you write your own routines for reading data from files.

On the other hand, if you change the order of the ordinary attributes, or the type or name of any attribute, a new domain is constructed altogether and new meta IDs are assigned to the meta attributes.

With the first two optional arguments, knownAttributes and knownMetaAttributes, we can request reuse of attributes even when a new domain is constructed.

part of domainDepot.py

    names = ['mS#SSN', 'D#gender', 'C#race', 'cC#total']
    domain2, metaIDs2, isNew2 = de.prepareDomain(names, domain.attributes, domain.getmetas())
    print "IDs of meta attributes: ", metaIDs2
    print "Is new? ", bool(isNew2)
    for name in names:
        undname = name.split("#")[1]  # strip the prefix: 'mS#SSN' -> 'SSN'
        print "Is '%s' same?" % undname, domain[undname] == domain2[undname]
    print

Here we simply told prepareDomain to use whatever it finds useful among the domain's attributes and meta attributes. The printout reveals that although the domain descriptor is new, the attribute descriptors for 'SSN' and 'gender' are reused, while 'race' is not, since its type changed to continuous, and 'total' is not, since we only passed domain.attributes, which does not include the class attribute (we could have used domain.variables or domain.attributes + [domain.classVar] instead). The ID of the meta attribute 'SSN' is also the same as before.
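The reuse rule from this example can be sketched as follows. This is a hypothetical illustration, not Orange's code: the Variable class, the build_variables function, the assumption that a known descriptor is reused exactly when both name and type match, and the counter for new meta IDs are all invented for the sketch.

```python
import itertools

# Hypothetical sketch of attribute reuse when a new domain is built. The
# Variable class, the "same name and same type" rule and the ID counter are
# illustrative assumptions, not Orange's actual code.
_fresh_meta_ids = itertools.count(-100, -1)


class Variable(object):
    def __init__(self, name, vtype):
        self.name, self.vtype = name, vtype


def build_variables(specs, known_attributes=(), known_metas=None):
    """Return (variables, metas); metas is a list of (meta ID, Variable)."""
    known_metas = known_metas or {}
    variables, metas = [], []
    for spec in specs:
        prefix, _, name = spec.partition("#")
        role, vtype = prefix[:-1], prefix[-1]
        if role == "m":
            # Reuse a known meta attribute, together with its ID, if it fits.
            found = [(i, v) for i, v in known_metas.items()
                     if v.name == name and v.vtype == vtype]
            metas.append(found[0] if found else
                         (next(_fresh_meta_ids), Variable(name, vtype)))
        else:
            # Ordinary and class attributes are reused only from known_attributes.
            found = [v for v in known_attributes
                     if v.name == name and v.vtype == vtype]
            variables.append(found[0] if found else Variable(name, vtype))
    return variables, metas


gender, race = Variable("gender", "D"), Variable("race", "D")
ssn = Variable("SSN", "S")
# Like domain.attributes: the class attribute "total" is not among the known ones.
variables, metas = build_variables(['mS#SSN', 'D#gender', 'C#race', 'cC#total'],
                                   [Variable("age", "C"), gender, race], {-3: ssn})
```

With these inputs, 'gender' and 'SSN' (keeping the ID -3) are reused, while 'race' and 'total' get new descriptors, mirroring the outcome described above.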

Finally, we can disable storing domains and/or looking up the stored ones by setting the two flags. Here's a little game.

part of domainDepot.py

    names = ['D#v%i' % i for i in range(5)]
    # knownAttributes=None, knownMetaAttributes=None, dontStore=1
    domain1, mid, isNew = de.prepareDomain(names, None, None, 1)
    domain2, mid, isNew = de.prepareDomain(names, None, None)
    print "I constructed two identical domains, but without storing the first."
    print "Is the second new? ", bool(isNew)
    print
    # dontStore=0, dontCheckStored=1
    domain3, mid, isNew = de.prepareDomain(names, None, None, 0, 1)
    print "I've stored the second and constructed the third without looking for old domains."
    print "Is the third new? ", bool(isNew)
    print
    domain4, mid, isNew = de.prepareDomain(names, None, None)
    print "Finally, I've constructed the fourth domain, without masking anything."
    print "Is it new? ", bool(isNew)
    print
    print "Which one is it equal to?",
    for n, d in [("first", domain1), ("second", domain2), ("third", domain3)]:
        if d == domain4:
            print n,
    print

A stored domain is retrieved only in the last call. Two domains are stored, the second and the third; they are essentially equal and both are appropriate for the fourth. Because of the order of storing, the third (the most recent one) is reused.