orngChem: A library for finding frequent molecular fragments

orngChem implements the following classes:

FragmentMiner

A class for finding frequent molecular fragments

Attributes

active
list of SMILES codes of the active molecules
inactive
list of SMILES codes of the inactive molecules
minSupport
minimum frequency a fragment must have in the active set to be included in the search
maxSupport
maximum frequency a fragment may have in the inactive set to be included in the search
addWholeRings
if True, rings will be added as a whole rather than atom by atom
canonicalPruning
if True, a cache of the canonical codes of all fragments will be kept to avoid redundant search
findClosed
if True, finds only closed fragments, i.e. fragments that are not substructures of any other fragment with the same support (default: True)
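The role of minSupport and maxSupport can be illustrated with a small, self-contained sketch. It does not use orngChem or Open Babel: plain substring matching on the SMILES text stands in for real substructure matching, the function names support and passes_thresholds are hypothetical, and treating the thresholds as inclusive is an assumption, not something the library documentation states.

```python
def support(fragment, molecules):
    """Fraction of molecules that contain the fragment (0.0 for an empty list)."""
    if not molecules:
        return 0.0
    # Assumption: substring matching stands in for real substructure matching.
    hits = sum(1 for smiles in molecules if fragment in smiles)
    return hits / float(len(molecules))


def passes_thresholds(fragment, active, inactive, min_support, max_support):
    """True if the fragment is frequent in the active set and rare in the inactive set."""
    # Assumption: both thresholds are treated as inclusive in this sketch.
    return (support(fragment, active) >= min_support
            and support(fragment, inactive) <= max_support)


active = ["NC(C)C(=O)O", "NC(CS)C(=O)O", "NC(CO)C(=O)O"]
inactive = ["CCO", "CCN"]

for candidate in ["C(=O)O", "CS", "CC"]:
    verdict = passes_thresholds(candidate, active, inactive, 0.6, 0.2)
    print("%s -> %s" % (candidate, verdict))
```

In this toy setting only the carboxyl pattern "C(=O)O" passes: it occurs in every active string and in no inactive one, whereas "CS" is too rare among the actives and "CC" appears only in the inactives.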

Methods

Search()
Runs the fragment search algorithm and returns a list of found fragments

Example

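from orngChem import FragmentMiner

# The active set holds three amino acids: alanine, cysteine and serine.
miner = FragmentMiner(active=["NC(C)C(=O)O", "NC(CS)C(=O)O", "NC(CO)C(=O)O"],
                      inactive=[], minSupport=0.6)
# Print every fragment found together with its support in the active set.
for fragment in miner.Search():
    print fragment.ToSmiles(), "Support: %.3f" % fragment.Support()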

Fragment

A class representing a molecular fragment

Methods

ToOBMol()
Returns an openbabel.OBMol object representation
ToSmiles()
Returns a SMILES code representation
ToCanonicalSmiles()
Returns a canonical SMILES code representation
Support()
Returns the support of the fragment in the active set
OcurrencesIn(smiles)
Returns the number of times the fragment is contained in the molecule represented by the SMILES code argument
ContainedIn(smiles)
Returns True if the fragment is present in the molecule represented by the SMILES code argument

Fragmenter

An object that is used to fragment an ExampleTable

Attributes

minSupport
minimum frequency a fragment must have in the active set to be included in the search (default: 0.2)
maxSupport
maximum frequency a fragment may have in the inactive set to be included in the search (default: 0.2)
findClosed
if True, finds only closed fragments, i.e. fragments that are not substructures of any other fragment with the same support (default: True)
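The idea behind findClosed can be sketched in a few lines of plain Python. This is only an illustration of the definition above, not the library's algorithm: the helper closed_fragments is hypothetical, the support values are made up for the example, and substring containment stands in for a real substructure test.

```python
def closed_fragments(supports):
    """Return the fragments that no larger fragment with equal support contains.

    `supports` maps a fragment string to its support in the active set.
    """
    closed = []
    for fragment, value in supports.items():
        # Assumption: "fragment in other" (substring test) stands in for
        # a real substructure check between two fragments.
        has_equal_superstructure = any(
            other != fragment and fragment in other and other_value == value
            for other, other_value in supports.items()
        )
        if not has_equal_superstructure:
            closed.append(fragment)
    return sorted(closed)


# Illustrative supports, chosen by hand for this sketch.
supports = {"C": 1.0, "C(=O)": 1.0, "C(=O)O": 1.0, "CS": 0.5}
print(closed_fragments(supports))
```

Here "C" and "C(=O)" are dropped because the larger "C(=O)O" has the same support, so reporting them would add no information. "CS" is kept because no larger fragment shares its support.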

Methods

__call__(data, smilesAttr, activeFunc)
Takes a data-set and runs the FragmentMiner on it. Returns a new data-set together with the fragments that were found. The new data-set contains additional attributes, each indicating the presence of one of the found fragments.

Arguments

data
the dataset
smilesAttr
the attribute in the data that contains the SMILES codes (if none is provided it will try to make a smart guess)
activeFunc
a function that takes an example from the data-set and returns True if the example should be considered as active (if none is provided all examples are considered active)

Example

fragmenter = Fragmenter(minSupport=0.1, maxSupport=0.05)
data, fragments = fragmenter(data, "SMILES")
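To make the roles of activeFunc and the added presence attributes more concrete, here is a library-free sketch. Rows are plain dictionaries rather than Orange examples. The "activity" field, the helper names and the substring matching are all assumptions of the sketch, not part of orngChem.

```python
def is_active(example):
    # Hypothetical rule: the "activity" field is an assumption of this sketch.
    return example["activity"] == "active"


def split_by_activity(rows, smiles_key, active_func=None):
    """Split the SMILES codes of the rows into (active, inactive) lists."""
    active, inactive = [], []
    for row in rows:
        # With no function supplied, every example counts as active.
        if active_func is None or active_func(row):
            active.append(row[smiles_key])
        else:
            inactive.append(row[smiles_key])
    return active, inactive


def add_presence_columns(rows, smiles_key, fragments):
    """Return copies of the rows with one True/False column per fragment."""
    new_rows = []
    for row in rows:
        new_row = dict(row)
        for fragment in fragments:
            # Assumption: substring matching stands in for substructure matching.
            new_row[fragment] = fragment in row[smiles_key]
        new_rows.append(new_row)
    return new_rows


rows = [
    {"SMILES": "NC(C)C(=O)O", "activity": "active"},
    {"SMILES": "NC(CS)C(=O)O", "activity": "active"},
    {"SMILES": "CCO", "activity": "inactive"},
]
active, inactive = split_by_activity(rows, "SMILES", is_active)
new_rows = add_presence_columns(rows, "SMILES", ["C(=O)O", "CS"])
for row in new_rows:
    print("%s C(=O)O=%s CS=%s" % (row["SMILES"], row["C(=O)O"], row["CS"]))
```

The original rows are left untouched and the fragment columns are added to copies, mirroring the description that a new data-set is returned.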

FragmentBasedLearner

A learner wrapper class that first runs the molecular fragmentation on the data.

Attributes

smilesAttr
attribute in the data that contains the SMILES codes (if none is provided it will try to make a smart guess)
learner
learner that will be used to actually learn on the fragmented data (default: orngSVM.SVMLearner)
minSupport
minimum frequency a fragment must have in the active set to be included in the search
maxSupport
maximum frequency a fragment may have in the inactive set to be included in the search
activeFunc
a function that takes an example from the learning data-set and returns True if the example should be considered as active (if none is provided all examples are considered active)
findClosed
if True, finds only closed fragments, i.e. fragments that are not substructures of any other fragment with the same support (default: True)
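The wrapper pattern described for FragmentBasedLearner (fragment first, then learn on the fragment features) might be sketched roughly as follows. Everything here is hypothetical: FragmentWrapperSketch and NearestNeighbourLearner are illustrative stand-ins, the fragments are assumed to be already mined and are passed in directly, and substring matching replaces substructure matching. The real class mines the fragments from the learning data and defaults to an SVM learner.

```python
def featurize(smiles, fragments):
    """Encode a molecule as a list of fragment-presence flags."""
    # Assumption: substring matching stands in for substructure matching.
    return [fragment in smiles for fragment in fragments]


class NearestNeighbourLearner(object):
    """Tiny stand-in for a real learner such as an SVM."""

    def __call__(self, features, labels):
        def classify(vector):
            # Hamming distance between presence vectors.
            distances = [sum(a != b for a, b in zip(vector, known))
                         for known in features]
            return labels[distances.index(min(distances))]
        return classify


class FragmentWrapperSketch(object):
    """Hypothetical wrapper: fragment the data, then delegate to a learner."""

    def __init__(self, learner, fragments):
        self.learner = learner
        self.fragments = fragments  # assumed to be mined beforehand

    def __call__(self, smiles_list, labels):
        features = [featurize(s, self.fragments) for s in smiles_list]
        model = self.learner(features, labels)
        fragments = self.fragments
        # The classifier applies the same fragmentation to unseen molecules.
        return lambda smiles: model(featurize(smiles, fragments))


wrapper = FragmentWrapperSketch(NearestNeighbourLearner(), ["C(=O)O", "CS"])
classifier = wrapper(["NC(C)C(=O)O", "CCO"], ["acid", "other"])
print(classifier("NC(CS)C(=O)O"))
print(classifier("CCN"))
```

The point of the design is that the classifier keeps the fragment list, so that molecules seen at prediction time are encoded with exactly the same attributes as the learning data.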