Intro to Data Mining

Chapter 3, exercises in 3.11
5. Consider the following data set for a binary class problem.
A    B    Class Label
T    F    +
T    T    +
T    T    +
T    F    −
T    T    +
F    F    −
F    F    −
F    F    −
T    T    −
T    F    −
a. Calculate the information gain when splitting on A and B. Which
attribute would the decision tree induction algorithm choose?
b. Calculate the gain in the Gini index when splitting on A and B.
Which attribute would the decision tree induction algorithm
choose?
c. Figure 3.11 shows that entropy and the Gini index are both
monotonically increasing on the range [0, 0.5] and they are both
monotonically decreasing on the range [0.5, 1]. Is it possible that
information gain and the gain in the Gini index favor different
attributes? Explain.
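
The quantities asked for in parts (a) and (b) can be checked numerically. The following is a minimal sketch, not the textbook's own procedure: it assumes the gain of a split is the parent's impurity minus the size-weighted impurity of the children, with entropy measured in bits. The names DATA, entropy, gini, and gain are illustrative, and the class label − is written as an ASCII "-".

```python
from collections import Counter
from math import log2

# Records (A, B, class label) copied from the table in exercise 5.
DATA = [
    ("T", "F", "+"), ("T", "T", "+"), ("T", "T", "+"), ("T", "F", "-"),
    ("T", "T", "+"), ("F", "F", "-"), ("F", "F", "-"), ("F", "F", "-"),
    ("T", "T", "-"), ("T", "F", "-"),
]


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def gain(records, attr_index, impurity):
    """Parent impurity minus the size-weighted impurity of the children."""
    parent = [r[-1] for r in records]
    children = {}
    for r in records:
        # Group class labels by the value of the splitting attribute.
        children.setdefault(r[attr_index], []).append(r[-1])
    weighted = sum(len(ch) / len(parent) * impurity(ch)
                   for ch in children.values())
    return impurity(parent) - weighted


if __name__ == "__main__":
    for name, idx in (("A", 0), ("B", 1)):
        print(name,
              "information gain:", round(gain(DATA, idx, entropy), 4),
              "Gini gain:", round(gain(DATA, idx, gini), 4))
```

Passing the impurity function as an argument keeps the two measures on exactly the same split logic, which makes the comparison in part (c) easier to reason about.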

7. Consider the following set of training examples.
X  Y  Z  No. of Class C1 Examples  No. of Class C2 Examples
0  0  0   5                        40
0  0  1   0                        15
0  1  0  10                         5
0  1  1  45                         0
1  0  0  10                         5
1  0  1  25                         0
1  1  0   5                        20
1  1  1   0                        15
a. Compute a two-level decision tree using the greedy approach
described in this chapter. Use the classification error rate as the
criterion for splitting. What is the overall error rate of the induced
tree?
b. Repeat part (a) using X as the first splitting attribute and then
choose the best remaining attribute for splitting at each of the two
successor nodes. What is the error rate of the induced tree?
c. Compare the results of parts (a) and (b). Comment on the suitability
of the greedy heuristic used for splitting attribute selection.
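
Parts (a) and (b) differ only in how the root attribute is chosen, so both can be explored with one helper. The sketch below rests on several assumptions: each leaf predicts its majority class, the split criterion is the number of misclassified examples, and ties are broken by attribute order X, Y, Z. The names TABLE, ATTRS, errors_after_split, and two_level_tree are illustrative and do not come from the textbook.

```python
# (X, Y, Z) -> (count of C1, count of C2), copied from the table in exercise 7.
TABLE = {
    (0, 0, 0): (5, 40), (0, 0, 1): (0, 15),
    (0, 1, 0): (10, 5), (0, 1, 1): (45, 0),
    (1, 0, 0): (10, 5), (1, 0, 1): (25, 0),
    (1, 1, 0): (5, 20), (1, 1, 1): (0, 15),
}
ATTRS = {"X": 0, "Y": 1, "Z": 2}


def errors_after_split(rows, attr):
    """Misclassified count when each child of `attr` predicts its majority class."""
    i = ATTRS[attr]
    total = 0
    for value in (0, 1):
        c1 = sum(n1 for key, (n1, _) in rows.items() if key[i] == value)
        c2 = sum(n2 for key, (_, n2) in rows.items() if key[i] == value)
        total += min(c1, c2)  # the minority class in a child is misclassified
    return total


def two_level_tree(rows, root=None):
    """Return (root attribute, {root value: child attribute}, error rate).

    With root=None the root is picked greedily; otherwise it is forced.
    """
    if root is None:
        root = min(ATTRS, key=lambda a: errors_after_split(rows, a))
    remaining = [a for a in ATTRS if a != root]
    children, errors = {}, 0
    for value in (0, 1):
        subset = {k: v for k, v in rows.items() if k[ATTRS[root]] == value}
        best = min(remaining, key=lambda a: errors_after_split(subset, a))
        children[value] = best
        errors += errors_after_split(subset, best)
    n = sum(n1 + n2 for n1, n2 in rows.values())
    return root, children, errors / n


if __name__ == "__main__":
    print("greedy root:  ", two_level_tree(TABLE))
    print("root forced X:", two_level_tree(TABLE, root="X"))
```

Calling two_level_tree(TABLE) corresponds to part (a), and two_level_tree(TABLE, root="X") to part (b); comparing the two returned error rates is the starting point for part (c).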

8. The following table summarizes a data set with three attributes A, B,
C and two class labels +, −. Build a two-level decision tree.
A  B  C  Number of Instances
          +    −
T  T  T   5    0
F  T  T   0   20
T  F  T  20    0
F  F  T   0    5
T  T  F   0    0
F  T  F  25    0
T  F  F   0    0
F  F  F   0   25
a. According to the classification error rate, which attribute would be
chosen as the first splitting attribute? For each attribute, show the
contingency table and the gains in classification error rate.
b. Repeat for the two children of the root node.
c. How many instances are misclassified by the resulting decision
tree?
d. Repeat parts (a), (b), and (c) using C as the splitting attribute.
e. Use the results in parts (c) and (d) to draw a conclusion about the
greedy nature of the decision tree induction algorithm.
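
The contingency tables and gains requested in part (a) can be tabulated mechanically. This is a minimal sketch under two assumptions: a node's error rate is its minority-class count divided by the total number of instances, and the gain is the parent's error rate minus the summed error rate of the children. The names TABLE, ATTRS, contingency, and error_rate_gain are illustrative.

```python
# (A, B, C) -> (count of +, count of -), copied from the table in exercise 8.
TABLE = {
    ("T", "T", "T"): (5, 0),  ("F", "T", "T"): (0, 20),
    ("T", "F", "T"): (20, 0), ("F", "F", "T"): (0, 5),
    ("T", "T", "F"): (0, 0),  ("F", "T", "F"): (25, 0),
    ("T", "F", "F"): (0, 0),  ("F", "F", "F"): (0, 25),
}
ATTRS = ("A", "B", "C")


def contingency(rows, attr):
    """Map each value of `attr` to its [+ count, - count]."""
    i = ATTRS.index(attr)
    table = {"T": [0, 0], "F": [0, 0]}
    for key, (pos, neg) in rows.items():
        table[key[i]][0] += pos
        table[key[i]][1] += neg
    return table


def error_rate_gain(rows, attr):
    """Parent error rate minus the error rate after splitting on `attr`."""
    n_pos = sum(p for p, _ in rows.values())
    n_neg = sum(q for _, q in rows.values())
    n = n_pos + n_neg
    parent_error = min(n_pos, n_neg) / n
    child_error = sum(min(p, q) for p, q in contingency(rows, attr).values()) / n
    return parent_error - child_error


if __name__ == "__main__":
    for attr in ATTRS:
        print(attr, contingency(TABLE, attr), error_rate_gain(TABLE, attr))
```

For part (b), the same two functions can be applied to the subset of TABLE that reaches each child of the root, for example the rows whose A value is "F".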
