Question 7: Wine Data – Regularized Regression

(7a) Using wine_data_train, conduct ridge regression with quality as the binary response variable and all other variables in wine_data_train as the predicting variables.

(7a.1) 3 pts – Use 10-fold cross validation on the misclassification error to select the optimal lambda value. What optimal lambda value did you obtain? Hint: Make sure to change the value of type.measure in order to perform cross validation on the misclassification error. If needed, you can take a look at the help file by typing ?cv.glmnet.

(7a.2) 1.5 pts – Fit a glmnet object with nlambda = 100. Call it ridge_model.

(7a.3) 1 pt – Display the estimated coefficients at the optimal lambda value.
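The steps in 7a can be sketched roughly as follows. This is an illustration only: it assumes the glmnet package is installed, and because wine_data_train is not reproduced here it uses a small simulated data frame (sim_train, a hypothetical stand-in with made-up predictors x1 to x3). The object names cv_ridge, ridge_fit, and ridge_coefs are also illustrative.

```r
# Sketch only: sim_train is simulated stand-in data, not the exam's wine data.
library(glmnet)

set.seed(1)
n <- 200
sim_train <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim_train$quality <- rbinom(n, 1, plogis(0.8 * sim_train$x1 - 0.5 * sim_train$x2))

# glmnet expects a numeric predictor matrix; drop the intercept column
# that model.matrix() adds, since glmnet fits its own intercept.
x <- model.matrix(quality ~ ., data = sim_train)[, -1]
y <- sim_train$quality

# alpha = 0 selects the ridge penalty; type.measure = "class" makes the
# 10-fold cross validation use misclassification error.
cv_ridge <- cv.glmnet(x, y, family = "binomial", alpha = 0,
                      type.measure = "class", nfolds = 10)
best_lambda <- cv_ridge$lambda.min

# Fit the ridge path with 100 lambda values, then extract the
# coefficients at the cross-validated lambda.
ridge_fit <- glmnet(x, y, family = "binomial", alpha = 0, nlambda = 100)
ridge_coefs <- coef(ridge_fit, s = best_lambda)

print(best_lambda)
print(ridge_coefs)
```

Because the folds are assigned at random, the selected lambda depends on the seed, so setting one before cv.glmnet keeps the result reproducible.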

Question 8: Wine Data – Prediction

(8a) 6 pts – Using model3, all_subsets_model, stepwise_model, and ridge_model, give a binary classification to each of the rows in wine_data_test, with 1 indicating a good quality wine. Use 0.5 as your classification threshold.

(8b) 4.5 pts – For each model, display its accuracy, sensitivity, and specificity metrics. Hint: confusionMatrix() from the caret package could be used to calculate these metrics.

(8b.1) Which model has the largest accuracy?

(8b.2) Which model has the largest sensitivity?

(8b.3) Which model has the largest specificity?

(8c) 1 pt – In this context, should sensitivity or specificity matter more? Explain. Hint: Remember that sensitivity is the proportion of all 1s in the test set that are correctly classified as 1s, while specificity is the proportion of all 0s in the test set that are correctly classified as 0s.

(8d) 1 pt – Based on 8b and 8c, which model performed the best?
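The definitions in the 8c hint can be checked by hand on a toy example. The sketch below uses base R only and eight made-up observations (the vectors actual and prob are hypothetical, not taken from wine_data_test). It assumes that a predicted probability of exactly 0.5 is classified as 1, which is a convention the question does not specify.

```r
# Hypothetical true labels and predicted probabilities (illustration only).
actual <- c(1, 0, 1, 1, 0, 0, 1, 0)
prob   <- c(0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3)

# Classify as 1 when the predicted probability reaches the 0.5 threshold.
# Treating exactly 0.5 as a 1 is an assumption made for this sketch.
pred <- as.integer(prob >= 0.5)

# Cells of the 2x2 confusion table.
tp <- sum(pred == 1 & actual == 1)  # true positives
tn <- sum(pred == 0 & actual == 0)  # true negatives
fp <- sum(pred == 1 & actual == 0)  # false positives
fn <- sum(pred == 0 & actual == 1)  # false negatives

accuracy    <- (tp + tn) / length(actual)  # share of all rows classified correctly
sensitivity <- tp / (tp + fn)              # share of true 1s classified as 1
specificity <- tn / (tn + fp)              # share of true 0s classified as 0

print(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity))
```

When using caret's confusionMatrix() instead, it is worth passing positive = "1" explicitly, since by default the first factor level is treated as the positive class, which would swap sensitivity and specificity for 0/1 labels.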

Multiple Choice Questions 32-33

The following table shows the R output of a logistic regression model, where the variables Sepal.Length, Sepal.Width, Petal.Length, and Petal.Width are predictors and the variable virginica is the response. Using the following R output from a fitted logistic regression model, answer Questions 32 and 33.

Call:
glm(formula = virginica ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width, family = binomial, data = iris)

Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.01105  -0.00065   0.00000   0.00048   1.78065

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -42.638     25.708  -1.659   0.0972 .
Sepal.Length   -2.465      2.394  -1.030   0.3032
Sepal.Width    -6.681      4.480  -1.491   0.1359
Petal.Length    9.429      4.737   1.990   0.0465 *
Petal.Width    18.286      9.743   1.877   0.0605 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 190.954  on 149  degrees of freedom
Residual deviance:  11.899  on 145  degrees of freedom
AIC: 21.899
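Questions 32 and 33 themselves are not reproduced here, so the following is only a sketch of quantities that are commonly derived from output like this, using the numbers printed above. It shows the overall deviance (likelihood ratio) test of the fitted model against the intercept-only model, the multiplicative change in odds for a one-unit increase in Petal.Length, and a consistency check of the AIC. The variable names are illustrative.

```r
# Values copied from the printed glm output.
null_dev  <- 190.954; null_df  <- 149
resid_dev <- 11.899;  resid_df <- 145

# Overall test: drop in deviance from the null model to the fitted model,
# compared against a chi-squared distribution with (149 - 145) df.
test_stat <- null_dev - resid_dev
test_df   <- null_df - resid_df
p_value   <- pchisq(test_stat, df = test_df, lower.tail = FALSE)

# exp(coefficient) is the factor by which the odds of virginica change for a
# one-unit increase in Petal.Length, holding the other predictors fixed.
odds_factor_petal_length <- exp(9.429)

# For a 0/1 response, AIC = residual deviance + 2 * (number of coefficients).
aic_check <- resid_dev + 2 * 5

print(c(test_stat = test_stat, test_df = test_df, p_value = p_value))
print(odds_factor_petal_length)
print(aic_check)
```

The AIC check reproduces the printed value of 21.899, which is a quick way to confirm that the deviance lines were read correctly.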

Question 5: Wine Data – Full Model

(5a) 2 pts – Using wine_data_train, fit a logistic regression model with quality as the response variable and all other variables as predicting variables. Include an intercept. Call it model3. Display the summary table for the model.

(5b) 2 pts – Conduct a multicollinearity test on model3. Using a VIF threshold of 10, what can you conclude?

(5c) 2 pts – Estimate the dispersion parameter for model3. Does overdispersion seem to be a problem in this model?
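A minimal base R sketch of the mechanics behind 5a to 5c is given below, on simulated data (sim, fit, and the predictors x1 to x3 are hypothetical; the wine data is not reproduced here). Two assumptions are worth flagging: the VIF is computed at the predictor level as 1 / (1 - R^2), which is not necessarily identical to what a packaged function such as car::vif reports for a glm object, and the dispersion is estimated as residual deviance divided by residual degrees of freedom, which is one common convention.

```r
# Simulated stand-in data (illustration only).
set.seed(2)
n <- 300
sim <- data.frame(x1 = rnorm(n))
sim$x2 <- sim$x1 + rnorm(n, sd = 0.5)  # deliberately correlated with x1
sim$x3 <- rnorm(n)
sim$y  <- rbinom(n, 1, plogis(0.5 * sim$x1 - 0.7 * sim$x3))

# Logistic regression with an intercept and all other columns as predictors.
fit <- glm(y ~ ., data = sim, family = binomial)
print(summary(fit))

# Predictor-level VIF: regress each predictor on the others and
# compute 1 / (1 - R^2). Values above the chosen threshold flag collinearity.
predictors <- c("x1", "x2", "x3")
vif_manual <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = sim))$r.squared
  1 / (1 - r2)
})
print(vif_manual)

# Deviance-based dispersion estimate: residual deviance / residual df.
dispersion_hat <- deviance(fit) / df.residual(fit)
print(dispersion_hat)
```

Here x1 and x2 are built to be correlated, so their VIFs come out noticeably larger than that of x3.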

Question 1: Bike Data – Exploratory Analysis

(1a) 2 pts – Using bike_data_train, create a histogram of the variable bikes. Based on this plot, what generalized linear regression model(s) discussed in this course could be used to model this response variable? Explain.

(1b) 2 pts – Using bike_data_train, create a scatterplot of bikes versus each numeric predicting variable (high_temp, low_temp, and precipitation) (3 scatterplots total). Do these variables appear useful in predicting the number of bikes crossing the Brooklyn Bridge on a given day? Include your reasoning.
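The plots requested in 1a and 1b could be produced along the following lines with base R graphics. The data frame sim_bikes is simulated for illustration (bike_data_train is not reproduced here), and the coefficients used to generate it are arbitrary assumptions, so the shapes of these plots say nothing about the real data.

```r
# Simulated stand-in for bike_data_train (illustration only).
set.seed(3)
n <- 150
sim_bikes <- data.frame(high_temp = runif(n, 40, 95),
                        precipitation = rexp(n, rate = 5))
sim_bikes$low_temp <- sim_bikes$high_temp - runif(n, 5, 20)
sim_bikes$bikes <- rpois(n, exp(6 + 0.02 * sim_bikes$high_temp -
                                  0.8 * sim_bikes$precipitation))

# (1a) Histogram of the response.
hist(sim_bikes$bikes, main = "Histogram of bikes (simulated)", xlab = "bikes")

# (1b) One scatterplot of the response against each numeric predictor.
for (v in c("high_temp", "low_temp", "precipitation")) {
  plot(sim_bikes[[v]], sim_bikes$bikes, xlab = v, ylab = "bikes",
       main = paste("bikes vs", v))
}

# Mean and variance of the counts, a quick companion to the histogram
# when judging which count model might be reasonable.
count_summary <- c(mean = mean(sim_bikes$bikes), variance = var(sim_bikes$bikes))
print(count_summary)
```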

Instructions

The R Markdown file / Jupyter Notebook file includes the questions, the empty code chunk sections for your code, and the text blocks for your responses. Answer the questions below by completing the R Markdown file / Jupyter Notebook file. You must answer the questions using one of these files. You may make slight adjustments to get the file to knit/convert but otherwise keep the formatting the same. Once you’ve finished answering the questions, submit your responses in a single knitted/converted HTML file.

There are 21 questions divided among 8 sections. The number of points for each question is provided. Partial credit may be given if your code is correct but your conclusion is incorrect or vice versa.

Next Steps:

1. Save the .Rmd/.ipynb file in your R working directory, the same directory into which you will download the “brooklyn_bridge_bike_counts.csv” and “white_wine_quality.csv” data files. Having all files in the same directory will help in reading the two data files.
2. Read the question and create the R code necessary within the code chunk section immediately below each question. Knitting this file will generate the output and insert it into the section below the code chunk.
3. Type your answer to the questions in the text block provided immediately after the response prompt.
4. Once you’ve finished answering all questions, knit this file and submit the knitted file as HTML on Canvas.

Example Question Format:

(8a) This will be the exam question. Each question is already copied from Canvas and inserted into individual text blocks below; you do not need to copy/paste the questions from the online Canvas exam.

# Example code chunk area. Enter your code below the comment and between the ```{r} and ```

Response to question (8a)

This is the section where you type your written answer to the question. Depending on the question asked, your typed response may be a number, a list of variables, a few sentences, or a combination of these elements.

Data

brooklyn_bridge_bike_counts.csv
white_wine_quality.csv

Submission Templates

You may use either the R Markdown or Jupyter Notebook Starter Template:

6414_Final_Part2_Submission_Summer2021.Rmd
6414_Final_Part2_Submission_Summer2021.ipynb

Ready? Let’s begin. We wish you the best of luck!