Accuracy vs. Simplicity: A Complex Trade-Off

Inductive learning aims at finding general rules that hold true in a database. Targeted learning seeks rules for the prediction of the value of a variable based on the values of others, as in the case of linear or non-parametric regression analysis. Non-targeted learning finds regularities without a specific prediction goal. We model the product of non-targeted learning as rules that state that a certain phenomenon never happens, or that certain conditions necessitate another. For all types of rules, there is a trade-off between the rule's accuracy and its simplicity. Thus rule selection can be viewed as a choice problem, among pairs of degree of accuracy and degree of complexity. However, one cannot in general tell what is the feasible set in the accuracy complexity space. Formally, we show that finding out whether a point belongs to this set is computationally hard. In particular, in the context of linear regression, finding a small set of variables that obtain a certain value of R2 is computationally hard. Computational complexity may explain why a person is not always aware of rules that, if asked, she would find valid. This, in turn, may explain why one can change other people's minds (opinions, beliefs) without providing new information.