(10) Consider the context of selecting the best attribute for decision tree construction. Briefly explain the difference between the “information gain” and “gini ratio” metrics for selecting the best attribute. Do not write any formulas; explain in words only.
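The term “gini ratio” in the question is ambiguous: it may refer to the gain ratio or to the Gini index. As a rough illustration only, the Python sketch below computes information gain, gain ratio, and the reduction in Gini impurity for a candidate split. The function names and the toy labels are hypothetical and not part of the original question.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_scores(groups):
    """Score a split; groups holds one list of labels per attribute value."""
    parent = [y for g in groups for y in g]
    n = len(parent)
    weights = [len(g) / n for g in groups]
    info_gain = entropy(parent) - sum(w * entropy(g) for w, g in zip(weights, groups))
    # Split information penalises attributes with many distinct values.
    split_info = -sum(w * math.log2(w) for w in weights if w > 0)
    gain_ratio = info_gain / split_info if split_info > 0 else 0.0
    gini_drop = gini(parent) - sum(w * gini(g) for w, g in zip(weights, groups))
    return info_gain, gain_ratio, gini_drop

# Hypothetical toy data: a two-way split versus an ID-like attribute
# that puts every record in its own branch.
two_way = [["+", "+", "+", "-"], ["-", "-", "-", "+"]]
id_like = [[y] for y in ["+", "+", "+", "-", "-", "-", "-", "+"]]

for name, groups in [("two-way", two_way), ("id-like", id_like)]:
    ig, gr, gd = split_scores(groups)
    print(f"{name}: info gain={ig:.4f}, gain ratio={gr:.4f}, gini drop={gd:.4f}")
```

On this toy data the ID-like attribute gets the highest information gain even though it cannot generalise, while the gain ratio scales that score down by the split information.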
I want to create an ensemble of seven decision trees, using bagging, for a training dataset. (a) (8) Write briefly the steps you will use to generate this ensemble. (b) (4) Why do you think an ensemble generally works better than a single decision tree?
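A minimal Python sketch of the bagging procedure follows: draw a bootstrap sample, train one model on it, repeat seven times, and combine predictions by majority vote. To stay short it uses a depth-1 tree (a decision stump) as a stand-in for a full decision tree. The function names, the seed, and the toy dataset are all hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) training examples uniformly, with replacement."""
    return [rng.choice(rows) for _ in rows]

def train_stump(rows):
    """Depth-1 tree: pick the (feature, threshold) with the fewest training errors.

    This is a simplified stand-in for a full decision-tree learner.
    """
    best = None
    n_features = len(rows[0][0])
    for f in range(n_features):
        for x, _ in rows:
            t = x[f]
            left = [y for xs, y in rows if xs[f] <= t]
            right = [y for xs, y in rows if xs[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label

def bagged_ensemble(rows, n_trees=7, seed=0):
    """Train n_trees models, each on its own bootstrap sample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]

def predict(ensemble, x):
    """Combine the individual predictions by majority vote."""
    votes = Counter(tree(x) for tree in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical one-feature dataset: (features, label) pairs.
toy_rows = [((1.0,), 0), ((2.0,), 0), ((3.0,), 0),
            ((6.0,), 1), ((7.0,), 1), ((8.0,), 1)]
ensemble = bagged_ensemble(toy_rows)
print(predict(ensemble, (0.5,)), predict(ensemble, (9.0,)))
```

An odd number of trees, such as seven, avoids tied votes in a two-class problem.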
Discuss whether or not the following activity is a data mining task: monitoring the heart rate of a patient for abnormalities.
Consider the following three data vectors. D1: (5, 6, 7, 9), D2: (6, 9, 10, 14), and D3: (4, 6, 2, 3). (a) (3) What are the Manhattan distances for the data-point pairs D1-D2, D1-D3, and D2-D3? (b) (5) What are the cosine similarities for the data-point pairs D1-D2, D1-D3, and D2-D3? (c) (2) Which two points are the closest as per the Manhattan distance? (d) (2) Which two points are the closest as per the cosine similarity?
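The two measures can be checked with a short Python sketch. The vectors are the ones given in the question; the helper names are hypothetical.

```python
import math

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

vectors = {
    "D1": (5, 6, 7, 9),
    "D2": (6, 9, 10, 14),
    "D3": (4, 6, 2, 3),
}

for p, q in [("D1", "D2"), ("D1", "D3"), ("D2", "D3")]:
    d = manhattan(vectors[p], vectors[q])
    s = cosine_similarity(vectors[p], vectors[q])
    print(f"{p}-{q}: Manhattan = {d}, cosine = {s:.4f}")
```

Note that a smaller Manhattan distance means closer points, whereas a larger cosine similarity means closer directions, so the two measures need not pick the same pair.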
(5) What is the kernel trick, and how does it help us in designing a non-linear SVM? (5) What is the advantage of a non-linear SVM over a linear SVM?
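As a small illustration of the idea behind the kernel trick, the Python sketch below checks that a degree-2 polynomial kernel on 2-D inputs gives the same value as an ordinary dot product in an explicitly expanded 3-D feature space. The choice of kernel, the feature map `phi`, and the sample points are assumptions made for this example, not something stated in the question.

```python
import math

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed in the original 2-D space."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    """Explicit feature map whose dot product matches poly2_kernel."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Hypothetical sample points.
x, z = (1.0, 2.0), (3.0, 4.0)

implicit = poly2_kernel(x, z)   # never builds the expanded features
explicit = dot(phi(x), phi(z))  # builds them, then takes the dot product
print(implicit, explicit)
```

Because an SVM can be trained and evaluated using only dot products between data points, swapping in the kernel gives a non-linear decision boundary without ever constructing the higher-dimensional features.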
You are approached by the marketing director of a local company, who believes that he has devised a foolproof way to measure customer satisfaction. He explains his scheme as follows: “It’s so simple that I can’t believe that no one has thought of it before. I just keep track of the number of customer complaints about each product. I read in a data mining book that counts are ratio attributes, and so, my measure of product satisfaction must be a ratio attribute. But when I rated the products based on my new customer satisfaction measure and showed them to my boss, he told me that I had overlooked the obvious and that my measure was worthless. I think that he was just mad because our best-selling product had the worst satisfaction since it had the most complaints. Could you help me set him straight?” What can you say about the attribute type of the original product satisfaction attribute?
Discuss whether or not the following activity is a data mining task: extracting the frequencies of a sound wave.
Let us say the cost of a false positive is 6 units and the cost of a false negative is 2 units. In this context, answer the following. (a) (6) How would you adjust the decision tree algorithm so that its performance on unseen cases minimizes the total expected cost instead of maximizing the accuracy? (b) (6) Recall the Support Vector Machine formulation discussed in class, specifically the case in which we minimize the cost of misclassifications using a constant parameter C. Suggest a solution for learning an SVM classifier in which the costs of the two types of errors differ. Do not write any formulas; describe your ideas in words.
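One possible approach to part (a), sketched in Python under stated assumptions: keep the tree as grown, but label each leaf with the class that has the lower expected cost rather than with the majority class. The sketch assumes each leaf supplies an estimate of the positive-class probability; the function names are hypothetical, and this is not necessarily the method taught in the course.

```python
COST_FP = 6.0  # cost of predicting positive when the truth is negative
COST_FN = 2.0  # cost of predicting negative when the truth is positive

def expected_costs(p_positive):
    """Expected cost of each possible prediction at a leaf.

    p_positive is the (assumed) estimated fraction of positives at the leaf.
    """
    cost_if_predict_positive = (1.0 - p_positive) * COST_FP
    cost_if_predict_negative = p_positive * COST_FN
    return cost_if_predict_positive, cost_if_predict_negative

def min_cost_label(p_positive):
    """Return 1 (positive) only when that prediction is strictly cheaper."""
    cost_pos, cost_neg = expected_costs(p_positive)
    return 1 if cost_pos < cost_neg else 0

for p in (0.5, 0.7, 0.8, 0.9):
    print(p, expected_costs(p), min_cost_label(p))
```

With these costs a leaf is labeled positive only when more than three quarters of its examples are positive. For part (b), a comparable idea is to give each class its own misclassification penalty; scikit-learn's `SVC`, for instance, exposes this through its `class_weight` parameter.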
Consider the following training data for a perceptron:

X    Y    Z    Class
0    3    5    1
1    4    8    0
7    1    2    1
-1   5    5    0
2    6    7    0

Use (3 1 3 2) as the initial weight vector. Execute the perceptron training algorithm as discussed in class and report the following:
1. (4) The updated weight vector after the first data point is processed.
2. (4) The updated weight vector after the second data point is processed.
3. (6) The updated weight vector after the fifth data point is processed.
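The question relies on the algorithm "as discussed in class", which is not reproduced here, so the Python sketch below makes several assumptions that may differ from the course's version: the first weight is a bias weight paired with a constant input of 1, the learning rate is 1, the unit outputs 1 when the weighted sum is strictly positive and 0 otherwise, and the update adds (target minus output) times the input to the weights. The function name is hypothetical.

```python
def train_one_pass(weights, rows, learning_rate=1.0):
    """Run one pass of the perceptron rule; return the weights after each row.

    Assumptions (not confirmed by the source): weights[0] is a bias weight,
    and the activation is a step function with threshold 0.
    """
    w = list(weights)
    history = []
    for features, target in rows:
        x = [1.0] + list(features)  # prepend the assumed constant bias input
        activation = sum(wi * xi for wi, xi in zip(w, x))
        predicted = 1 if activation > 0 else 0
        error = target - predicted  # 0 when the prediction is correct
        w = [wi + learning_rate * error * xi for wi, xi in zip(w, x)]
        history.append(tuple(w))
    return history

# Training data from the question: ((X, Y, Z), Class)
training_rows = [
    ((0, 3, 5), 1),
    ((1, 4, 8), 0),
    ((7, 1, 2), 1),
    ((-1, 5, 5), 0),
    ((2, 6, 7), 0),
]

history = train_one_pass((3, 1, 3, 2), training_rows)
for step, w in enumerate(history, start=1):
    print(f"after data point {step}: {w}")
```

If the course uses a different learning rate, threshold, or label encoding (for example -1/+1), the intermediate weight vectors will differ from what this sketch prints.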
You are approached by the marketing director of a local company, who believes that he has devised a foolproof way to measure customer satisfaction. He explains his scheme as follows: “It’s so simple that I can’t believe that no one has thought of it before. I just keep track of the number of customer complaints about each product. I read in a data mining book that counts are ratio attributes, and so, my measure of product satisfaction must be a ratio attribute. But when I rated the products based on my new customer satisfaction measure and showed them to my boss, he told me that I had overlooked the obvious and that my measure was worthless. I think that he was just mad because our best-selling product had the worst satisfaction since it had the most complaints. Could you help me set him straight?” Who is right, the marketing director or his boss? If you answered his boss, what would you do to fix the measure of satisfaction?