Question 4: Bike Data – Prediction
(4a) 2 pts – Predict bikes for the test set (bike_data_test) using model1. Display the first six predicted values.
(4b) 2 pts – Calculate and display the mean squared prediction error (MSPE) for model1. List one limitation of using this metric to evaluate prediction accuracy.
(4c) 1 pt – Refit model1 on bike_data_full, and call it model2. Display the summary table for the model.
(4c.1) 3 pts – Estimate the 10-fold and leave-one-out cross validation mean squared prediction error (MSPE) for model2. Hint: cv.glm() from the boot package uses MSPE as the default cost function.
(4c.2) 1 pt – How do these two MSPEs compare to the model1 MSPE from 4b? Apply your knowledge of cross validation to explain your results.
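For orientation, the quantities named in Question 4 can be sketched on simulated data. This is only an illustration under stated assumptions: the data frame sim, its columns x and y, the 150/50 split, and the names fit_train and fit_full are hypothetical stand-ins, not the exam's bike_data_full, model1, or model2.

```r
library(boot)  # provides cv.glm()

set.seed(0)

# Simulated stand-in for the bike data (hypothetical columns x and y)
n <- 200
sim <- data.frame(x = runif(n))
sim$y <- rpois(n, lambda = exp(1 + 0.8 * sim$x))

# Hold out the last 50 rows as a test set (assumed split, for illustration)
train <- sim[1:150, ]
test  <- sim[151:200, ]

# Poisson regression fitted on the training rows only
fit_train <- glm(y ~ x, family = poisson, data = train)

# Predict on the response (count) scale, then average the squared errors
pred <- predict(fit_train, newdata = test, type = "response")
head(pred)
mspe_test <- mean((test$y - pred)^2)

# Refit on all rows and estimate the MSPE by cross validation
fit_full <- glm(y ~ x, family = poisson, data = sim)
mspe_10fold <- cv.glm(sim, fit_full, K = 10)$delta[1]
mspe_loocv  <- cv.glm(sim, fit_full)$delta[1]  # K defaults to n (leave-one-out)

print(c(test = mspe_test, tenfold = mspe_10fold, loocv = mspe_loocv))
```

The first element of delta is the raw cross-validation estimate; the second is a bias-adjusted version. The 10-fold estimate depends on the random fold assignment, so it changes with the seed, while the leave-one-out estimate does not.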

5.3 Citizenship can be defined as (1)
A. the status of a person recognized under the law of a country as belonging to it.
B. a legal identification of a person in international law, establishing the person as a subject, a national, of a sovereign state.
C. the act of finding out who someone is or what something is.
D. a distinct territorial body or political entity.

Read Data
Read the data and answer the questions below. Assume a significance threshold of 0.05 for hypothesis tests unless stated otherwise.
# Load relevant libraries (add other libraries here if needed)
library(car)
library(CombMSC)
library(aod)
library(bestglm)
library(boot)
library(corrplot)
library(caret)
library(glmnet)
# Ensure that the sampling type is correct
RNGkind(sample.kind="Rejection")
# Set seed (please do not change for consistency of the results)
set.seed(0)
###### Read and process the data: Bike data ######
## Read data
bike_data_full = read.csv("brooklyn_bridge_bike_counts.csv", header=TRUE)
# Convert month and day columns to categorical variables
bike_data_full$month

TENEO INTERNATIONAL SCHOOL
SUBJECT: iLIFE SMARTS
DATE: 26 JULY 2021
TIME: 85 MINUTES + 5 MINUTES SUBMISSION TIME
MARKS: 50 MARKS
EXAMINER: R. ISMAIL
MODERATOR: N. du TOIT
INSTRUCTIONS
1. The answers you provide to the question paper must be your own, original work. No copying from any source is allowed. No marks will be awarded for work that is copied.
2. Read all the questions carefully.
3. Use the mark allocation as a guide to how much information is required in your answers.
4. Answer all the questions – do not leave any blank.

Question 3: Bike Data – Goodness of Fit
(3a) 3 pts – Evaluate whether the deviance residuals are approximately normally distributed by producing a QQ plot and histogram of the deviance residuals. Based on these plots, what assessment can you make about the goodness of fit of model1?
(3b) 2 pts – Perform a goodness-of-fit statistical test for model1 using the deviance residuals and significance level 0.05. Provide the null and alternative hypotheses, test statistic, p-value, and conclusion in the context of the problem.
(3c) 3 pts – Why might a Poisson regression model not be a good fit? Provide two reasons. How can you try to improve the fit in each situation? Do not apply the recommendations.
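As a rough illustration of the mechanics behind Question 3, the sketch below runs a deviance goodness-of-fit test on simulated Poisson data. The vectors x and y and the object fit are hypothetical; nothing here uses the exam's model1. The sketch takes the null hypothesis to be that the Poisson model fits the data adequately, with the sum of squared deviance residuals compared to a chi-squared distribution on the residual degrees of freedom.

```r
set.seed(0)

# Simulated stand-in data (hypothetical x and y, not the bike data)
n <- 200
x <- runif(n)
y <- rpois(n, lambda = exp(1 + 0.8 * x))
fit <- glm(y ~ x, family = poisson)

# Deviance residuals and the visual normality checks
dev_res <- residuals(fit, type = "deviance")
qqnorm(dev_res)
qqline(dev_res)
hist(dev_res, main = "Deviance residuals", xlab = "Residual")

# Test statistic: sum of squared deviance residuals (the residual deviance)
test_stat <- sum(dev_res^2)
df <- df.residual(fit)  # n minus the number of estimated coefficients

# Upper-tail chi-squared p-value; a small value suggests lack of fit
p_value <- pchisq(test_stat, df, lower.tail = FALSE)

print(c(statistic = test_stat, df = df, p_value = p_value))
```

With a significance level of 0.05, a p-value below 0.05 would lead to rejecting the null hypothesis of adequate fit.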

Question 7: Wine Data – Regularized Regression
(7a) Using wine_data_train, conduct ridge regression with quality as the binary response variable and all other variables in wine_data_train as the predicting variables.
(7a.1) 3 pts – Use 10-fold cross validation on the misclassification error to select the optimal lambda value. What optimal lambda value did you obtain? Hint: Make sure to change the value of type.measure in order to perform cross validation on the misclassification error. If needed, you can take a look at the help file by typing ?cv.glmnet.
(7a.2) 1.5 pts – Fit a glmnet object with nlambda = 100. Call it ridge_model.
(7a.3) 1 pt – Display the estimated coefficients at the optimal lambda value.
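The calls involved in Question 7 can be sketched on simulated data. The predictor matrix x, the response y, and the names cv_fit, lambda_opt, and ridge_fit are hypothetical stand-ins rather than the exam's wine_data_train or ridge_model. The sketch assumes that the "optimal" lambda means lambda.min, the value minimizing the cross-validated error; the glmnet package must be installed.

```r
library(glmnet)

set.seed(0)

# Simulated stand-in data: five numeric predictors and a binary response
n <- 300
x <- matrix(rnorm(n * 5), nrow = n,
            dimnames = list(NULL, paste0("v", 1:5)))
y <- rbinom(n, 1, plogis(0.5 * x[, 1] - 0.7 * x[, 2]))

# alpha = 0 selects the ridge penalty; type.measure = "class" makes the
# 10-fold cross validation score each lambda by misclassification error
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 0,
                    type.measure = "class", nfolds = 10)
lambda_opt <- cv_fit$lambda.min

# Ridge path over 100 lambda values, then coefficients at the chosen lambda
ridge_fit <- glmnet(x, y, family = "binomial", alpha = 0, nlambda = 100)
ridge_coef <- coef(ridge_fit, s = lambda_opt)
print(lambda_opt)
print(ridge_coef)
```

glmnet expects a numeric matrix of predictors, so a data frame with factor columns would first need converting, for example with model.matrix().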

Question 8: Wine Data – Prediction
(8a) 6 pts – Using model3, all_subsets_model, stepwise_model, and ridge_model, give a binary classification to each of the rows in wine_data_test, with 1 indicating a good quality wine. Use 0.5 as your classification threshold.
(8b) 4.5 pts – For each model, display its accuracy, sensitivity, and specificity metrics. Hint: confusionMatrix() from the caret package could be used to calculate these metrics.
(8b.1) Which model has the largest accuracy?
(8b.2) Which model has the largest sensitivity?
(8b.3) Which model has the largest specificity?
(8c) 1 pt – In this context, should sensitivity or specificity matter more? Explain. Hint: Remember that sensitivity is the proportion of all 1s in the test set that are correctly classified as 1s, while specificity is the proportion of all 0s in the test set that are correctly classified as 0s.
(8d) 1 pt – Based on 8b and 8c, which model performed the best?
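The definitions in the 8c hint can be checked by hand on a small made-up example. The vectors actual and prob below are invented for illustration and are not drawn from wine_data_test. The sketch classifies a row as 1 when its predicted probability is strictly greater than 0.5, which is an assumption about how ties at the threshold are handled.

```r
# Made-up labels and predicted probabilities (illustration only)
actual <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
prob   <- c(0.9, 0.7, 0.4, 0.6, 0.2, 0.55, 0.1, 0.3, 0.45, 0.8)

# Classify with a 0.5 threshold: 1 = good quality, 0 = otherwise
predicted <- as.integer(prob > 0.5)

# Counts for each cell of the 2 x 2 confusion table
tp <- sum(predicted == 1 & actual == 1)  # 1s correctly classified as 1
fn <- sum(predicted == 0 & actual == 1)  # 1s missed
tn <- sum(predicted == 0 & actual == 0)  # 0s correctly classified as 0
fp <- sum(predicted == 1 & actual == 0)  # 0s wrongly classified as 1

accuracy    <- (tp + tn) / length(actual)
sensitivity <- tp / (tp + fn)  # share of all 1s classified as 1
specificity <- tn / (tn + fp)  # share of all 0s classified as 0

print(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity))
```

Here three of the four 1s and four of the six 0s are classified correctly, giving accuracy 0.7, sensitivity 0.75, and specificity of about 0.667.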