minimum k union algorithm implementation
Author: | Staal A. Vinterbo |
---|---|
Copyright: | 2009 Staal A. Vinterbo |
Version: | generated for 0.01 |
Availability: | GPL |
URI: | minkunion.py |
Let m be a list of lists representation of a collection of sets, and let k be a positive integer. Then:
from minkunion import minkunion d = minkunion(m, k)
produces a list of elements that is an approximation of the minimum union of k elements from m.
If we let X be list of lists, each of which represents a row in a data matrix, and we let k be a positive integer, then:
from minkunion import kambiguate Y = kambiguate(X, k)
produces a k-ambiguated version of X in Y by cell-suppression. This means that each list in Y matches at least k lists in X if we interpret '?' as representing any possible value.
Run as a standalone program this module will perform k-ambiguation on the input data set.
Note that the rectangular data set required as input is assumed to contain attribute names on the first line. Each subsequent line contains the attribute values for an object. The last attribute, i.e., last column, is taken to contain class labels.
Try (assuming a shell command line):
$ python minkunion.py -h $ python minkunion.py -e | less $ python minkunion.py dataset.txt | less $ python minkunion.py -k 3 dataset.txt | less
To generate a html version of this short explanation:
$ python minkunion.py -e | rst2html > explanation.html
rst2html is a part of the python docutils package http://docutils.sourceforge.net/docs/
The minimum k union of a collection of sets is the set of smallest cardinality obtainable by taking the union of k members of the collection.
This module contains a simple greedy algorithm that iteratively selects the set that contributes the least to the union. This algorithm is used to implement k-ambiguation by cell suppression.
Details in:
Vinterbo, S. A. A Stab at Approximating Minimum Subadditive Join Algorithms and Data Structures: 10th International Workshop, WADS 2007, Halifax, Canada, August 15-17, 2007 Proceedings, Springer Verlag, 2007