Most choices involve _______________, which compares the ben…

Questions

Most choices involve _______________, which compares the benefits and costs of choosing a little more or a little less of a good.

Question 1: Data Exploration (11 points)

1a) (2 points) What is the median "Monthly_Working_Hours" for employees across different workplaces? Note: Answer must be grouped by "Workplace_Flexibility".

1b) (2 points) What is the proportion of employees who stayed with the company (i.e., did not leave) for each type of "Health_Benefits"? Note: As an example, the proportion of employees who stayed with the company for Full Coverage equals the number of employees with full coverage who stayed divided by the number of employees with full coverage.

1c) (2 points) Print the rows with the highest "Salary_Increase_Percentage". Which qualitative variable responses are the same across the rows with the highest "Salary_Increase_Percentage"?

1d) (5 points) Create boxplots and interpret each plot for the following predictors against the response variable (Turnover).
i) Monthly_Working_Hours
ii) Years_With_Company
In general, using boxplots, can we make statements about the statistical significance of the differences between the group means? How can we infer whether the group means are statistically significantly different from each other?
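The grouped summaries asked for in 1a and 1b can be sketched as follows. This is only an illustration on made-up rows: the real data set is not shown here, the group labels "Remote", "Onsite", and "Partial Coverage" are hypothetical, and the coding of Turnover as 1 = left, 0 = stayed is an assumption. The helper names `median_by_group` and `stay_proportion_by_group` are also invented for this sketch.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy rows; column names follow the question text,
# values and most category labels are made up for illustration.
# Assumption: Turnover is coded 1 = left, 0 = stayed.
rows = [
    {"Workplace_Flexibility": "Remote", "Health_Benefits": "Full Coverage",
     "Monthly_Working_Hours": 160, "Turnover": 0},
    {"Workplace_Flexibility": "Remote", "Health_Benefits": "Full Coverage",
     "Monthly_Working_Hours": 170, "Turnover": 1},
    {"Workplace_Flexibility": "Remote", "Health_Benefits": "Partial Coverage",
     "Monthly_Working_Hours": 180, "Turnover": 0},
    {"Workplace_Flexibility": "Onsite", "Health_Benefits": "Full Coverage",
     "Monthly_Working_Hours": 150, "Turnover": 0},
    {"Workplace_Flexibility": "Onsite", "Health_Benefits": "Partial Coverage",
     "Monthly_Working_Hours": 190, "Turnover": 1},
    {"Workplace_Flexibility": "Onsite", "Health_Benefits": "Partial Coverage",
     "Monthly_Working_Hours": 200, "Turnover": 1},
]


def median_by_group(rows, group_col, value_col):
    """Median of value_col within each level of group_col (as in 1a)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    return {level: median(values) for level, values in groups.items()}


def stay_proportion_by_group(rows, group_col, turnover_col="Turnover"):
    """Stayed count divided by group size for each level (as in 1b)."""
    stayed = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        total[row[group_col]] += 1
        if row[turnover_col] == 0:  # assumed coding: 0 means the employee stayed
            stayed[row[group_col]] += 1
    return {level: stayed[level] / total[level] for level in total}


hours_median = median_by_group(rows, "Workplace_Flexibility", "Monthly_Working_Hours")
stay_share = stay_proportion_by_group(rows, "Health_Benefits")
print(hours_median)
print(stay_share)
```

The second helper mirrors the note in 1b directly: the denominator is the number of employees in the benefit group, not the total number of employees.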

Question 4: Prediction (14 points)

Use "testData" for all parts of this question.

4a) (4 points) Using testData, predict the probability of an employee leaving, i.e., being a turnover, and output the average of these probabilities for each of the models below:
i) model1 (question 2a)
ii) model2 (question 2b)
iii) model3 (question 3a)
iv) model4 (question 3a)

4b) (4 points) Using the probabilities from Q4a and a threshold of 0.5 (inclusive of 0.5), obtain the classifications of an employee being a turnover for all four models. Note: every row in the testData prediction must be classified. Print the last ten classification rows for all the model classifications as well as the actual response for Turnover of those rows.

4c) (6 points) In this question, you will compare the prediction accuracy of the four models.
i) (4 points) Using the classifications from Q4b, create a confusion matrix and output the classification evaluation metrics for all four models (i.e., Accuracy, Sensitivity, and Specificity). Note: every row in the testData classification must be used (do not use only the last ten classification rows).
ii) (2 points) Which metric measures the rate of true negatives? Which model shows the highest value for this metric?
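The thresholding and evaluation steps in 4b and 4c can be illustrated with a small sketch. The probabilities and actual labels below are made up, not output from model1 through model4, and the coding of Turnover as 1 = left, 0 = stayed is an assumption. The function names `classify` and `confusion_metrics` are hypothetical helpers, not part of the assignment.

```python
def classify(probabilities, threshold=0.5):
    """Label a row 1 (turnover) when its probability is >= threshold.

    The comparison is inclusive, so a probability of exactly 0.5 is a 1.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_metrics(actual, predicted):
    """Confusion-matrix counts plus accuracy, sensitivity, specificity.

    Assumes both classes occur in `actual`; otherwise a denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


# Made-up predicted probabilities and actual outcomes for six test rows.
probs = [0.9, 0.5, 0.2, 0.7, 0.4, 0.1]
actual = [1, 1, 0, 0, 1, 0]

mean_prob = sum(probs) / len(probs)   # the average asked for in 4a
predicted = classify(probs)           # every row gets a class, as in 4b
metrics = confusion_metrics(actual, predicted)
print(mean_prob, predicted, metrics)
```

Running the same two functions once per model, on that model's predicted probabilities for every test row, gives directly comparable metrics.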