Generate a rule-based classifier learned from input data.
:Author: Staal A. Vinterbo
:Copyright: 2009 Staal A. Vinterbo
:Version: generated for 0.07
:Availability: GPL
:URI: rgen.py (Fuzzy extension: rgenfuzzy)
Let m be a data matrix (list of rows) where columns represent attributes and rows represent the attribute values for objects. Then:
X, A = makeX(m), makeA(m)
P = makeP(X)
I = greedy_cover([P(a) for a in A])
computes (currently somewhat inefficiently, due to the size of P(a)) an approximation of a minimum index set I such that distinct rows in m remain distinct when m is projected onto the columns indexed by I.
Unless A is significantly larger than X, the same can be computed more efficiently as:
mda(A, X)
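For intuition, the following self-contained sketch performs the same computation on a small matrix; the helper names (greedy_cover, pairs_discerned) are illustrative stand-ins, not necessarily rgen.py's internals:

from itertools import combinations

def greedy_cover(sets):
    # greedy approximation of a minimum set cover; returns chosen indices
    universe = set().union(*sets)
    chosen, covered = [], set()
    while covered != universe:
        i = max(range(len(sets)), key=lambda j: len(sets[j] - covered))
        chosen.append(i)
        covered |= sets[i]
    return chosen

def pairs_discerned(m, col):
    # row-index pairs whose values differ in column col
    return {(i, j) for i, j in combinations(range(len(m)), 2)
            if m[i][col] != m[j][col]}

m = [[1, 2, 0], [1, 3, 1], [2, 2, 1]]
I = greedy_cover([pairs_discerned(m, c) for c in range(len(m[0]))])
print(I)  # [0, 1]: projecting m onto columns 0 and 1 keeps all rows distinct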
If we consider the last column in m to be the class labels, the code:
X, A = makeX(m), makeA(m)
S = makeS(X)
z = X[0]
I = greedy_cover([S(z)(a) & S(z)(A[-1]) for a in A[:-1]])
computes the index set I of attributes from which we compute the antecedent for the rule instantiated from the first row in m.
Unless A is significantly larger than X, the same can be computed more efficiently as:
I = mdar(A[:-1], X, z, A[-1])
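Continuing the toy sketch above, the analogous computation for a rule built from the first row only needs to cover the rows whose class label differs from that row's:

z = 0                              # build the rule from the first row
diff = {i for i in range(len(m)) if m[i][-1] != m[z][-1]}
sets = [{i for i in diff if m[i][c] != m[z][c]}
        for c in range(len(m[0]) - 1)]
print(greedy_cover(sets))  # [0, 1]: both non-class columns are needed here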
Continuing:
exec(descriptor_code(A,X))
r = eval(rule_descriptor_code(I,z))
computes a function r that, when applied to a data point p, yields a tuple (o, v) where o is the class label and v is the rule strength. Example:
r(m[0])
applies r to the first row in m.
Also, if m is:
m = [[1, 2, 0], [1, 3, 1], [2, 2, 1]]
the code:
print(rule_descriptor_code(I,z))
prints:
lambda p,f = (lambda x:x) : (0, min(f(map(lambda (a, av, pv): descriptor(a,av)(pv), zip([0, 1], [1, 2], [p[i] for i in [0, 1]])))))
The f argument in the lambda above allows processing of the list of values produced by the individual propositions in the rule antecedent. One example use is to implement partial antecedent matching: a rule is then allowed to apply to a point even though the point lies outside the truth sets of one or more of the antecedent's propositions.
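As an illustration, here is a hand-written rule of the same shape as the generated one above, with a hypothetical descriptor that simply tests for equality; passing an f that discards the weakest proposition value makes the rule tolerate one mismatch:

def descriptor(a, av):
    # hypothetical stand-in for the generated descriptor: an equality test
    return lambda pv: 1 if pv == av else 0

# label 0 if p[0] == 1 and p[1] == 2, mirroring the printed rule above
r = lambda p, f=(lambda x: x): (
    0, min(f([descriptor(a, av)(p[i])
              for a, av, i in zip([0, 1], [1, 2], [0, 1])])))

print(r([1, 3, 1]))                             # (0, 0): strict matching fails
print(r([1, 3, 1], lambda vs: sorted(vs)[1:]))  # (0, 1): one mismatch ignored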
The code in this module is a more or less direct implementation of the theory described below. One change is that P(a)(z) is more efficiently computed as S(z)(a). Furthermore, the equivalence predicate takes the symbol '?', denoting a missing value, into account.
Run as a standalone program, this module computes a rule-based classifier and prints a Python program that implements it.
Note that the rectangular data set required as input is assumed to contain attribute names on the first line. Each subsequent line contains the attribute values for one object. The last attribute, i.e., the last column, is taken to contain the class labels.
Try (assuming a shell command line):
$ python rgen.py -h
$ python rgen.py -e | less
$ python rgen.py dataset.txt | less
$ python rgen.py dataset.txt | python - -h
$ python rgen.py dataset.txt | python - dataset.txt
$ python rgen.py dataset.txt | python - -c dataset.txt
To generate a html version of this short explanation:
$ python rgen.py -e | rst2html > explanation.html
rst2html is part of the Python docutils package: http://docutils.sourceforge.net/docs/
Note that the standalone program can also attempt to reduce the number of rules by filtering. The filtering tries to cover all instances in the training data with as few rules as possible, where each rule has the same label in its consequent as the instances it covers.
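A sketch of one such filtering scheme (a guess at the described behaviour, reusing greedy_cover from the sketch above and representing rules as (label, antecedent) pairs as in the theory below; instances are assumed hashable):

def filter_rules(rules, X, l):
    # a rule (c, ant) covers x when its antecedent fires on x
    # and its consequent label c agrees with l(x)
    sets = [{x for x in X if l(x) == c and min(q(x) for q in ant) == 1}
            for c, ant in rules]
    return [rules[i] for i in greedy_cover(sets)]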
If there is a loadable module rgenfuzzy then the option of generating fuzzy rules is enabled.
With this module loaded, fuzzy rules are computed much as described in the paper:
Vinterbo, S.A., Kim, E. & Ohno-Machado, L. Small, fuzzy and interpretable gene expression based classifiers. Bioinformatics, Vol. 21(9), pp. 1964-1970, 2005.
Let U be a set of objects, let X be a subset of U, and let A be a set of functions from U to some set. Further, let l be a function from U to a set L of class labels.
Given only the restriction of l to X, the goal is to represent as large an extension of l as possible using X. We want to represent this extension as classification rules for elements of U using A. In order to generalize as well as possible, we adopt the strategy of creating rule antecedents that are as short as possible. A principled way of going about this is as follows.
Let a rule be a function r on X that, when applied to an element x, returns a tuple (c, v) where c is the class label associated with the consequent of the rule and v is the value of 'applying' the antecedent of the rule.
We will conveniently represent rule r as a tuple r = (c, alpha) of class label and antecedent, respectively. We can then, abusing notation slightly, compute r(x) as:
r(x) = (c, alpha(x)).
This computation can be interpreted as assigning c to x with strength alpha(x). The tuples resulting from an application of rules in a set of rules can be combined to yield an assignment of strengths to the possible class labels. This assignment can in turn be used to make an assignment of label to x.
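One simple combination scheme (a sketch; not necessarily how this module combines rule outputs) keeps, per label, the largest strength over all rules and then picks the strongest label:

def classify(rules, x):
    # rules are functions x -> (label, strength)
    strength = {}
    for r in rules:
        c, v = r(x)
        strength[c] = max(strength.get(c, 0), v)
    return max(strength, key=strength.get)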
By a proposition over U we mean a characteristic function of some subset of U.
A rule antecedent alpha is constructed as a conjunction of such propositions. We can view such a conjunction as a set. If alpha is the conjunction:
p_i and p_j and ... and p_k
we can represent alpha as a set:
alpha = {p_i, p_j, ..., p_k}
and (again abusing notation) compute alpha(x) as:
min {p(x) | p in alpha}.
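In Python, with propositions as functions returning 0 or 1, this reads:

def conj(alpha, x):
    # the value of the conjunction alpha at x: minimum over its propositions
    return min(p(x) for p in alpha)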
Given a function a in A and an element x in U we can form a proposition p(a,x) over U that is the characteristic function of:
T(p(a,x)) = { y in U | a(y) = a(x) },
i.e., p(a,x)(z) = 1 iff z lies in T(p(a,x)).
Given a subset A' of A and a fixed element z in X, we can form a conjunction ant(z) as:
ant(z) = {p(a, z) | a in A'}.
(Recall that we compute ant(z)(x) for some x as min {p(x) | p in ant(z)}).
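A direct transcription (continuing the sketch; the names prop and ant are illustrative):

def prop(a, x):
    # characteristic function of T(p(a,x)) = {y : a(y) == a(x)}
    return lambda y: 1 if a(y) == a(x) else 0

def ant(z, A1):
    # the conjunction {p(a, z) | a in A'}, with A' passed as A1
    return {prop(a, z) for a in A1}

# ant(z, A1) applied to some x is then conj(ant(z, A1), x)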
We will translate the wish for a maximal extension of the classifier, or labelling function, l into a criterion for a rule r = (c, ant(z)):
the antecedent should be as short as possible while not being applicable to any element y in X that has a different class label, i.e., we require that ant(z)(y) = 0 if l(y) != l(z) (provided y and z are discernible by A, that is).
Let f be any function on X (we assume a predicate '=' that lets us determine equivalence, and hence discernibility, for points in the image of X under f). We define the set of pairs of elements from X that f lets us discern between as:
P(f) = {(x,y) in X^2 | not f(x) = f(y) }
Given an element x in X, the set of elements of X that are discernible from x by f is:
P(f)(x) = { y in X | not f(x) = f(y) }
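In the sketch's terms (assuming the elements of X are hashable, e.g. tuples):

def P(f, X):
    # the pairs of elements of X that f discerns between
    return {(x, y) for x in X for y in X if f(x) != f(y)}

def P_at(f, X, z):
    # P(f)(z): the elements of X that f discerns from z
    return {y for y in X if f(y) != f(z)}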
The criterion on ant(z) using A' can then be formulated in terms of X as:
| D(A') | / | D(A) | >= t
where:
D(B) = intersection {union {P(a)(z) | a in B}, P(l)(z)}.
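Using P_at from the sketch above, D and the criterion can be written as:

def D(B, X, z, l):
    # elements discerned from z by some a in B and labelled differently by l
    return set().union(*(P_at(a, X, z) for a in B)) & P_at(l, X, z)

def criterion(A1, A, X, z, l, t):
    # |D(A')| / |D(A)| >= t, written without division
    return len(D(A1, X, z, l)) >= t * len(D(A, X, z, l))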
If t = 1 then the requirement is strict. However, choosing t < 1 can help avoid overfitting, at the cost of reduced correctness of the rule on X. A more detailed description of the function of t above can be found in:
Vinterbo, S. & Ohrn, A. Minimal Approximate Hitting Sets and Rule Templates. International Journal of Approximate Reasoning, Vol. 25(2), pp. 123-143, 2000.
Finding a minimum A' such that ant(z)(y) = 0 whenever l(y) != l(z) is provably NP-hard (a proof, together with approximation properties of the case where l is injective, exists in the literature).
We can transform it into a standard set covering problem over the following collection of sets:
[P(a)(z) & P(l)(z) | a in A].
Let set_cover be a function that computes the above cover; then a rule r can be computed as:
r = (l(z), {p(a_i, z) | i in set_cover([P(a)(z) & P(l)(z) | a in A])}).
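Putting the sketch's pieces together (prop, P_at, conj and greedy_cover from the blocks above; illustrative, not rgen.py's actual API):

def make_rule(z, X, A, l):
    # r = (l(z), {p(a_i, z) | i in set_cover([P(a)(z) & P(l)(z) | a in A])})
    sets = [P_at(a, X, z) & P_at(l, X, z) for a in A]
    return (l(z), {prop(A[i], z) for i in greedy_cover(sets)})

X = [(1, 2), (1, 3), (2, 2)]
A = [lambda x: x[0], lambda x: x[1]]
l = {(1, 2): 0, (1, 3): 1, (2, 2): 1}.get
c, antecedent = make_rule(X[0], X, A, l)
print(c, conj(antecedent, X[0]))  # 0 1: the rule assigns label 0 to X[0]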