Lift (data mining)


In data mining and association rule learning, lift is a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random choice targeting model. A targeting model is doing a good job if the response within the target ($T$) is much better than the baseline ($B$) average for the population as a whole. Lift is simply the ratio of these values: target response divided by average response. Mathematically,

$$\operatorname{lift} = \frac{P(T \mid B)}{P(T)} = \frac{P(T \wedge B)}{P(T)\,P(B)}$$

For example, suppose a population has an average response rate of 5%, but a certain model (or rule) has identified a segment with a response rate of 20%. Then that segment would have a lift of 4.0 (20%/5%).
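The arithmetic in this example can be sketched in a few lines of Python. The function name `lift` and its parameter names are illustrative assumptions, not part of any particular library.

```python
def lift(target_rate: float, baseline_rate: float) -> float:
    """Ratio of the response rate in the targeted segment to the population average."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return target_rate / baseline_rate


# Segment response rate of 20% against a population average of 5%.
segment_lift = lift(0.20, 0.05)
print(round(segment_lift, 2))  # 4.0
```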

Applications

Typically, the modeller seeks to divide the population into quantiles, and rank the quantiles by lift. Organizations can then consider each quantile, and by weighing the predicted response rate (and associated financial benefit) against the cost, they can decide whether to market to that quantile or not.
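One possible way to compute lift per quantile is sketched below. It assumes each case has a model score and a binary response (1 = responded), ranks cases by score, and splits them into equal-sized groups. The function name `lift_by_quantile` and the sample data are hypothetical and serve only as an illustration.

```python
def lift_by_quantile(scores, responses, n_quantiles=10):
    """Return the lift of each quantile, highest-scored quantile first."""
    if len(scores) != len(responses):
        raise ValueError("scores and responses must have the same length")
    n = len(scores)
    if not 1 <= n_quantiles <= n:
        raise ValueError("need at least one case per quantile")
    baseline = sum(responses) / n  # average response rate of the whole population
    if baseline == 0:
        raise ValueError("baseline response rate is zero; lift is undefined")

    # Rank cases from highest to lowest model score, keeping only the responses.
    ranked = [r for _, r in sorted(zip(scores, responses),
                                   key=lambda pair: pair[0], reverse=True)]

    lifts = []
    for q in range(n_quantiles):
        start = q * n // n_quantiles
        end = (q + 1) * n // n_quantiles
        chunk = ranked[start:end]
        lifts.append((sum(chunk) / len(chunk)) / baseline)
    return lifts


# Hypothetical data: eight cases split into four quantiles.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
responses = [1, 1, 0, 1, 0, 0, 0, 0]
lifts = lift_by_quantile(scores, responses, n_quantiles=4)
print([round(x, 2) for x in lifts])  # [2.67, 1.33, 0.0, 0.0]
```

Under these assumptions, an organization would weigh the top quantiles (lift above 1) against the cost of contacting them, and would likely skip the quantiles whose lift falls below 1.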

The lift curve can also be considered a variation on the receiver operating characteristic (ROC) curve, and is also known in econometrics as the Lorenz or power curve.

Example

Assume the data set being mined is:

Antecedent    Consequent
A             0
A             0
A             1
A             0
B             1
B             0
B             1

where the antecedent is the input variable that we can control, and the consequent is the variable we are trying to predict. Real mining problems would typically have more complex antecedents, but usually focus on single-value consequents.

Most mining algorithms would determine the following rules (targeting models):

  • Rule 1: A implies 0
  • Rule 2: B implies 1

because these are simply the most common patterns found in the data. A simple review of the above table should make these rules obvious.

The support for Rule 1 is 3/7 because three of the seven records have the antecedent A and the consequent 0. The support for Rule 2 is 2/7 because two of the seven records have the antecedent B and the consequent 1. The supports can be written as:

$$\operatorname{supp}(A \Rightarrow 0) = P(A \land 0) = P(A)\,P(0 \mid A) = P(0)\,P(A \mid 0)$$
$$\operatorname{supp}(B \Rightarrow 1) = P(B \land 1) = P(B)\,P(1 \mid B) = P(1)\,P(B \mid 1)$$
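As a rough check, the supports can be counted directly from the example data set. The sketch below assumes the records are stored as (antecedent, consequent) pairs. The names `records` and `support` are illustrative.

```python
from fractions import Fraction

# The seven records of the example data set as (antecedent, consequent) pairs.
records = [("A", 0), ("A", 0), ("A", 1), ("A", 0),
           ("B", 1), ("B", 0), ("B", 1)]


def support(records, antecedent, consequent):
    """Fraction of all records that match both the antecedent and the consequent."""
    matches = sum(1 for a, c in records if a == antecedent and c == consequent)
    return Fraction(matches, len(records))


print(support(records, "A", 0))  # 3/7
print(support(records, "B", 1))  # 2/7
```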

The confidence for Rule 1 is 3/4 because three of the four records that meet the antecedent of A meet the consequent of 0. The confidence for Rule 2 is 2/3 because two of the three records that meet the antecedent of B meet the consequent of 1. The confidences can be written as:

$$\operatorname{conf}(A \Rightarrow 0) = P(0 \mid A)$$
$$\operatorname{conf}(B \Rightarrow 1) = P(1 \mid B)$$
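The confidences can be reproduced the same way, by restricting the count to records that meet the antecedent. This is a minimal sketch. The names `records` and `confidence` are assumptions made for illustration.

```python
from fractions import Fraction

# The seven records of the example data set as (antecedent, consequent) pairs.
records = [("A", 0), ("A", 0), ("A", 1), ("A", 0),
           ("B", 1), ("B", 0), ("B", 1)]


def confidence(records, antecedent, consequent):
    """Fraction of records meeting the antecedent that also meet the consequent."""
    consequents = [c for a, c in records if a == antecedent]
    if not consequents:
        raise ValueError("no record meets the antecedent")
    return Fraction(consequents.count(consequent), len(consequents))


print(confidence(records, "A", 0))  # 3/4
print(confidence(records, "B", 1))  # 2/3
```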

Lift can be found by dividing the confidence by the unconditional probability of the consequent, or by dividing the support by the probability of the antecedent times the probability of the consequent, so:

  • The lift for Rule 1 is (3/4)/(4/7) = (3 * 7)/(4 * 4) = 21/16 ≈ 1.31
  • The lift for Rule 2 is (2/3)/(3/7) = (2 * 7)/(3 * 3) = 14/9 ≈ 1.56

$$\operatorname{lift}(A \Rightarrow 0) = \frac{P(0 \mid A)}{P(0)} = \frac{P(A \land 0)}{P(A)\,P(0)}$$
$$\operatorname{lift}(B \Rightarrow 1) = \frac{P(1 \mid B)}{P(1)} = \frac{P(B \land 1)}{P(B)\,P(1)}$$
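Both routes to lift (confidence divided by the probability of the consequent, and support divided by the product of the two marginal probabilities) can be checked against the example data. The sketch below uses exact fractions. The helper names `prob`, `lift_from_confidence` and `lift_from_support` are hypothetical.

```python
from fractions import Fraction

# The seven records of the example data set as (antecedent, consequent) pairs.
records = [("A", 0), ("A", 0), ("A", 1), ("A", 0),
           ("B", 1), ("B", 0), ("B", 1)]


def prob(records, predicate):
    """Fraction of records for which predicate(antecedent, consequent) is true."""
    return Fraction(sum(1 for a, c in records if predicate(a, c)), len(records))


def lift_from_confidence(records, antecedent, consequent):
    """Lift as confidence divided by the probability of the consequent."""
    p_ante = prob(records, lambda a, c: a == antecedent)
    p_both = prob(records, lambda a, c: a == antecedent and c == consequent)
    p_cons = prob(records, lambda a, c: c == consequent)
    return (p_both / p_ante) / p_cons


def lift_from_support(records, antecedent, consequent):
    """Lift as support divided by P(antecedent) * P(consequent)."""
    p_ante = prob(records, lambda a, c: a == antecedent)
    p_both = prob(records, lambda a, c: a == antecedent and c == consequent)
    p_cons = prob(records, lambda a, c: c == consequent)
    return p_both / (p_ante * p_cons)


print(lift_from_confidence(records, "A", 0))  # 21/16
print(lift_from_support(records, "B", 1))     # 14/9
```

Working in `Fraction` rather than floating point keeps the results identical to the hand calculation, so the two formulas can be compared for exact equality.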

If some rule had a lift of 1, it would imply that the occurrence of the antecedent and the occurrence of the consequent are independent of each other. When two events are independent of each other, no rule can be drawn involving those two events.

If the lift is greater than 1, as it is here for Rules 1 and 2, it indicates the degree to which the two occurrences depend on one another, which makes those rules potentially useful for predicting the consequent in future data sets.

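Observe that even though Rule 1 has higher confidence, it has lower lift. Intuitively, Rule 1 would seem more valuable because its higher confidence makes it appear more accurate (better supported). But the accuracy of a rule considered independently of the data set can be misleading: consequent 0 already occurs in 4/7 of all records, so Rule 1's confidence of 3/4 is a smaller improvement over that baseline than Rule 2's confidence of 2/3 is over the 3/7 baseline for consequent 1. The value of lift is that it considers both the confidence of the rule and the overall data set.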

