# Creating Regression Models Using R

This assignment has you creating regression models using R. You may also be asked to make some modifications to the datasets and to interpret how these modifications affect the models. For example, you may be asked to remove an influential point and demonstrate how the model is affected.

*Please note that I am not providing you with reminders of what goes into this process.* Review your lecture notes, and in particular the R tutorial (document and video) on creating a regression model, for the steps that need to be carried out and the conditions that need to be checked.

The basic regression models you are asked to build here are not intended to be difficult. In fact, feel free to reproduce the steps from the regression process demonstrated in the R tutorial. The key point is that your document must demonstrate an understanding of the steps involved and of the things that must be commented on at each step.

Also, and I hope this is obvious (it’s also on your assignment checklist), whenever you draw a graph in R, be sure to include it in your homework document.

And of course, be sure to include your R code somewhere in your document. You can either paste your code in one block, or in pieces. But be sure to explain what your code does. As always, I don’t expect your explanations to go into extreme detail, but you do have to explain your process for full credit on the problems.

**Hints:**

- Before deciding on your final regression model, you should, of course, check to see if there are outliers or influential observations. If you do observe outliers or influential points, you must then decide what to do with them. Remember that if you remove an observation, you must have a good reason for doing so, and you must also explain this reasoning in your analysis (in this case, your homework document).
- If there is only one outlier and that one observation is clearly making a significant change to the model, then it is probably worth leaving out. If, however, there are multiple observations that appear influential, then any individual observation is probably not all *that* influential, so they can probably be kept in. In general, we try to avoid omitting observations if possible.
- If you *do* omit any observations, you must then create new vectors without those observations before embarking on your regression analysis.
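
As a minimal sketch of the last hint, negative indexing is one way to build new vectors without a given observation. The vectors `x` and `y`, the index `omit_idx`, and the `_trimmed` names below are hypothetical illustrations, not the assignment data.

```r
# Hypothetical toy vectors (NOT the assignment data)
x <- c(10, 12, 14, 16, 18, 40)
y <- c(21, 25, 29, 33, 37, 20)

# Position of the observation to omit (assumed here for illustration)
omit_idx <- 6

# A negative index drops that position; use the same index for both
# vectors so the remaining observations stay paired
x_trimmed <- x[-omit_idx]
y_trimmed <- y[-omit_idx]

length(x_trimmed)  # one fewer observation than x
length(y_trimmed)  # must match length(x_trimmed)
```

Keeping the original vectors and creating new ones, rather than overwriting, makes it easy to compare models built with and without the omitted observation.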

**Problem #1**

Below are two variables representing a series of IQ scores and GPAs from a cohort of 78 students. For example, the first observation is a student who has an IQ of 111 and a GPA of 7.94. The second observation is a student with an IQ of 107 and a GPA of 8.292, etc.

Here are the values for IQ:

111, 107, 100, 107, 114, 115, 111, 97, 100, 112, 104, 89, 104, 102, 91, 114, 114, 103, 106, 105, 113, 109, 108, 113, 130, 128, 128, 118, 113, 120, 132, 111, 124, 127, 128, 136, 106, 118, 119, 123, 124, 126, 116, 127, 119, 97, 86, 102, 110, 120, 103, 115, 93, 72, 111, 103, 123, 79, 119, 110, 110, 107, 74, 105, 112, 105, 110, 107, 103, 77, 98, 90, 96, 112, 112, 114, 93, 106

Here are the values for GPA:

7.94, 8.292, 4.643, 7.47, 8.882, 7.585, 7.65, 2.412, 6, 8.833, 7.47, 5.528, 7.167, 7.571, 4.7, 8.167, 7.822, 7.598, 4, 6.231, 7.643, 1.76, 6.419, 9.648, 10.7, 10.58, 9.429, 8, 9.585, 9.571, 8.998, 8.333, 8.175, 8, 9.333, 9.5, 9.167, 10.14, 9.999, 10.76, 9.763, 9.41, 9.167, 9.348, 8.167, 3.647, 3.408, 3.936, 7.167, 7.647, 0.53, 6.173, 7.295, 7.295, 8.938, 7.882, 8.353, 5.062, 8.175, 8.235, 7.588, 7.647, 5.237, 7.825, 7.333, 9.167, 7.996, 8.714, 7.833, 4.885, 7.998, 3.82, 5.936, 9, 9.5, 6.057, 6.057, 6.938

You will note that this GPA does not use the familiar 0-4 range. However, it is simply a different scale and should not affect your analysis.
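
To see why a change of scale need not matter, here is a small sketch using made-up numbers. It assumes, purely for illustration, that one scale is a constant multiple of the other (the factor 0.4 is an assumption, not something stated about this GPA scale). Under that assumption, the correlation is unchanged and the regression coefficients are simply multiplied by the same factor.

```r
# Hypothetical toy data (NOT the assignment data)
x      <- c(1, 2, 3, 4, 5, 6)
y_orig <- c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9)

# Same responses expressed on a different scale (assumed linear conversion)
y_rescaled <- y_orig * 0.4

# The correlation is the same on either scale
cor(x, y_orig)
cor(x, y_rescaled)

# Intercept and slope are each multiplied by the conversion factor
coef(lm(y_orig ~ x))
coef(lm(y_rescaled ~ x))
```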

You suspect that IQ may be a reasonable predictor of GPA. Create a regression model in which you model how IQ affects GPA.


**Problem #2**

**Part A:**

Here is a different group of data in which we will attempt to see if GRE is a reasonable predictor of GPA.

GRE:

337, 324, 316, 322, 314, 330, 321, 308, 302, 323, 325, 327, 328, 307, 311, 314, 317, 319, 318, 303, 312, 325, 328, 334, 336, 340, 322, 298, 295, 339

GPA:

9.65, 8.87, 8, 8.67, 8.21, 9.34, 8.2, 7.9, 8, 8.6, 8.4, 9, 9.1, 8, 8.2, 8.3, 8.7, 8, 8.8, 8.5, 7.9, 8.4, 9.5, 9.7, 9.8, 9.6, 8.8, 7.5, 7.2, 7.3

Repeat the regression procedure. In this case, there is clearly one observation that seems out of place from the others. Indicate whether it is best labeled an outlier, an influential observation, or both. Generate the regression model including this observation.

**Part B:**

Now remove the observation, regenerate your vectors, and generate an updated regression model.

Answer the following:

- Using the model, and the correlation coefficient r in particular, how large a role did this one observation appear to play in affecting the model?
- If you were hired as an analyst to build a model for a client to use, would you give your client the model with or without the outlier?
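
One way to approach the first question is to compute r and the fitted coefficients with and without the observation and compare them. The sketch below uses hypothetical toy vectors (`x`, `y`) in which the last point is deliberately out of place; it is not the GRE/GPA data, and all variable names are illustrative.

```r
# Hypothetical toy data (NOT the assignment data); the last point is out of place
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1, 11.9, 14.2, 1.0)

# Position of the suspicious observation (assumed for illustration)
odd_idx <- 8

# New vectors without that observation
x_new <- x[-odd_idx]
y_new <- y[-odd_idx]

# Correlation coefficient with and without the observation
r_with    <- cor(x, y)
r_without <- cor(x_new, y_new)
c(with = r_with, without = r_without)

# Fitted intercept and slope with and without the observation
fit_with    <- lm(y ~ x)
fit_without <- lm(y_new ~ x_new)
coef(fit_with)
coef(fit_without)
```

A large gap between the two values of r, or between the two slopes, suggests the observation is having a substantial effect on the model; a small gap suggests it is not.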