{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Multiple Linear Regression\n", "\n", "In this example we will explore multiple linear regression, where we have multiple independent variables ($x$ values) to predict a single dependent variable ($y$ value). Remember, our model looks like this\n", "\n", "$$y_i = \\beta_0 + x_{i1} \\beta_1 + x_{i2} \\beta_2 + \\cdots + x_{id} \\beta_d + \\epsilon_i.$$\n", "\n", "In matrix/vector math, this is\n", "\n", "$$\\begin{pmatrix} y_1 \\\\ y_2 \\\\ \\vdots \\\\ y_n \\end{pmatrix} = \n", "\\begin{pmatrix} \n", "1 & x_{11} & x_{12} & \\cdots & x_{1d}\\\\\n", "1 & x_{21} & x_{22} & \\cdots & x_{2d}\\\\\n", "\\vdots & \\vdots & \\vdots & & \\vdots\\\\\n", "1 & x_{n1} & x_{n2} & \\cdots & x_{nd}\\\\\n", "\\end{pmatrix}\n", "\\begin{pmatrix} \\beta_0 \\\\ \\beta_1 \\\\ \\beta_2 \\\\ \\vdots \\\\ \\beta_d \\end{pmatrix}\n", "+\n", "\\begin{pmatrix} \\epsilon_1 \\\\ \\epsilon_2 \\\\ \\vdots \\\\ \\epsilon_n\\end{pmatrix}.$$" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import seaborn as sns\n", "import numpy as np\n", "\n", "# Just some color options for seaborn plots\n", "sns.set(style=\"darkgrid\")\n", "sns.set_palette(\"Dark2\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In addition to the above libraries that we've seen before, we are going to import `statsmodels` for doing multiple linear regression." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import statsmodels.api as sm\n", "import statsmodels.formula.api as smf" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll take another look at the hippocampus volume from the OASIS dementia data." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
 | Unnamed: 0 | ID | M.F | Hand | Age | Educ | SES | MMSE | CDR | eTIV | nWBV | ASF | Delay | RightHippoVol | LeftHippoVol | TrainData | Dementia |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | OAS1_0002_MR1 | F | R | 55 | 4 | 1.0 | 29 | 0.0 | 1147 | 0.810 | 1.531 | NaN | 4230 | 3807 | 0 | 0 |
1 | 2 | OAS1_0003_MR1 | F | R | 73 | 4 | 3.0 | 27 | 0.5 | 1454 | 0.708 | 1.207 | NaN | 2896 | 2801 | 1 | 1 |
2 | 7 | OAS1_0010_MR1 | M | R | 74 | 5 | 2.0 | 30 | 0.0 | 1636 | 0.689 | 1.073 | NaN | 2832 | 2578 | 0 | 0 |
3 | 8 | OAS1_0011_MR1 | F | R | 52 | 3 | 2.0 | 30 | 0.0 | 1321 | 0.827 | 1.329 | NaN | 3978 | 4080 | 0 | 0 |
4 | 10 | OAS1_0013_MR1 | F | R | 81 | 5 | 2.0 | 30 | 0.0 | 1664 | 0.679 | 1.055 | NaN | 3557 | 3495 | 0 | 0 |