← 返回首页
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Linear Regression Tutorial\n", "by Marc Deisenroth" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The purpose of this notebook is to practice implementing some linear algebra (equations provided) and to explore some properties of linear regression." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import scipy.linalg\n", "import matplotlib.pyplot as plt\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We consider a linear regression problem of the form\n", "$$\n", "y = \\boldsymbol x^T\\boldsymbol\\theta + \\epsilon\\,,\\quad \\epsilon \\sim \\mathcal N(0, \\sigma^2)\n", "$$\n", "where $\\boldsymbol x\\in\\mathbb{R}^D$ are inputs and $y\\in\\mathbb{R}$ are noisy observations. The parameter vector $\\boldsymbol\\theta\\in\\mathbb{R}^D$ parametrizes the function.\n", "\n", "We assume we have a training set $(\\boldsymbol x_n, y_n)$, $n=1,\\ldots, N$. We summarize the sets of training inputs in $\\mathcal X = \\{\\boldsymbol x_1, \\ldots, \\boldsymbol x_N\\}$ and corresponding training targets $\\mathcal Y = \\{y_1, \\ldots, y_N\\}$, respectively.\n", "\n", "In this tutorial, we are interested in finding good parameters $\\boldsymbol\\theta$." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYwAAAEGCAYAAAB2EqL0AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAP6klEQVR4nO3df6zdd13H8efLbvwQVNBeZGyTO+OdYcovPVkkGrPYVQcjKygkW4wOlSwYB3XRSOMSpkOSERPr+BGwymQYZBB+SLXF0Q1wEDPc6TJgW1lbl5GVLuyyCThBSOHtH/cM7+7ObT+9P873nHuej+Sk3x+ffvv6pu195fvjfL+pKiRJOpEf6DqAJGkyWBiSpCYWhiSpiYUhSWpiYUiSmpzSdYD1snnz5pqdne06hiRNlP3793+1qmaGrduwhTE7O0u/3+86hiRNlCRfWm6dp6QkSU0sDElSEwtDktTEwpCkDWbnvoPrsl0LQ5I2mGtvPrQu27UwJElNLAxJUhMLQ5LUZMN+cU+SpsHOfQeHXrOY3bHnMfPbt8xxxdazV/VnZaO+QKnX65Xf9JY0jWZ37OG+ay5c0e9Nsr+qesPWeUpKktTEwpAkNbEwJElNLAxJ2mC2b5lbl+1aGJK0waz2bqjlWBiSpCYWhiSpyVgURpLrkjyY5M5l1p+X5OtJ7hh83jDqjJI07cblm97vBt4GvOc4Yz5dVS8dTRxJ0lJjcYRRVbcAD3edQ5K0vLEojEYvSvK5JB9L8jPDBiS5LEk/SX9+fn7U+SRpQ5uUwrgdeHZVPR94K/BPwwZV1a6q6lVVb2ZmZqQBJWmjm4jCqKpvVNUjg+m9wKlJNnccS5KmykQURpJnJslg+lwWcj/UbSpJmi5jcZdUkvcB5wGbkxwBrgJOBaiqdwKvAH4/yTHgW8DFtVGfyy5JY2osCqOqLjnB+rexcNutJKkjE3FKSpLUPQtDktTEwpAkNbEwJElNLAxJUhMLQ5LUxMKQJDWxMCRJTSwMSVITC0OS1MTCkCQ1sTAkSU0sDElSEwtDktTEwpAkNbEwJElNLAxJUhMLQ5LUxMKQJDWxMCRJTSwMSVITC0OS1MTCkCQ1sTAkSU0sDElSk7EojCTXJXkwyZ3LrE+StyQ5nOTzSX5u1BkladqNRWEA7wYuOM76FwNzg89lwDtGkEmStMhYFEZV3QI8fJwh24D31IJbgaclOW006SRJMCaF0eB04P5F80cGyx4jyWVJ+kn68/PzIwsnSdNgUgojQ5bV4xZU7aqqXlX1ZmZmRhBLkqbHpBTGEeDMRfNnAEc7yiJJU2lSCmM38NuDu6V+Afh6VT3QdShJmiandB0AIMn7gPOAzUmOAFcBpwJU1TuBvcBLgMPAN4Hf6SapJE2vsSiMqrrkBOsL+IMRxZEkDTEpp6QkSR2zMCRJTSwMSVITC0OS1MTCkCQ1sTAkSU0sDElSEwtDktTEwpAkNbEwJElNLAxJUhMLQ5LUxMKQtCo79x3sOoJGxMKQtCrX3nyo6wgaEQtDktTEwpAkNbEwJElNxuKNe5Imw859B4des5jdsecx89u3zHHF1rNHFUsjkoW3n248vV6v+v1+1zGkDW92xx7uu+bCrmNojSTZX1W9Yes8JSVJamJhSJKaWBiSpCYWhqRV2b5lrusIGhELQ9KqeDfU9LAwJElNLAxJUpOxKIwkFyS5J8nhJDuGrH9Vkvkkdww+r+4ipyRNs86/6Z1kE/B2YCtwBLgtye6qunvJ0PdX1eUjDyhJAsbjCONc4HBV3VtV3wFuALZ1nEmStMQ4FMbpwP2L5o8Mli31G0k+n+SDSc4ctqEklyXpJ+nPz8+vR1ZJmlrjUBgZsmzpA67+GZitqucBNwHXD9tQVe2qql5V9WZmZtY4piRNt3EojCPA4iOGM4CjiwdU1UNV9e3B7N8CPz+ibJKkgXEojNuAuSRnJXkCcDGwe/GAJKctmr0IODDCfJIkxuAuqao6luRy4EZgE3BdVd2V5GqgX1W7gdcluQg4BjwMvKqzwJI0pXwfhiTp+3wfhiRp1SwMSVITC0OS1MTCkCQ1sTAkSU0sDElSEwtDktTEwpAkNbEwJElNLAxJUhMLQ5LU5ISFkeSmJM8fRRhJ0vhqOcL4E2Bnkr9f8phxSdIUOWFhVNXtVfUrwL8A/5rkqiRPXv9okqRx0nQNI0mAe4B3AK8FDiX5rfUMJm1kO/cd7DqCdNJarmF8BvgysBM4nYWXF50HnJtk13qGkzaqa28+1HUE6aS1vHHvNcBd9fg3Lb02ia9KlaQpccLCqKo7j7P6wjXMIkkaY6v6HkZV3btWQSRJ463llJSkVdi57+DQaxazO/Y8Zn77ljmu2Hr2qGJJJy2PvzSxMfR6ver3+13HkIaa3bGH+67xjK7GT5L9VdUbts5Hg0iSmlgYkqQmFoYkqYmFIXVg+5a5riNIJ20sCiPJBUnuSXI4yY4h65+Y5P2D9Z9NMjv6lNLa8W4oTaLOCyPJJuDtwIuBc4BLkpyzZNjvAf9VVT/FwiNK3jzalJKkzgsDOBc4XFX3VtV3gBuAbUvGbAOuH0x/ENgyeCCiJGlExqEwTgfuXzR/ZLBs6JiqOgZ8HfixpRtKclmSfpL+/Pz8OsWVpOk0DoUx7Ehh6bcJW8ZQVbuqqldVvZmZmTUJJ0laMA6FcQQ4c9H8GcDR5cYkOQX4EeDhkaSTJAHjURi3AXNJzkryBOBiYPeSMbuBSwfTrwA+MeRx65KkddT5wwer6liSy4EbgU3AdVV1V5KrgX5V7QbeBfxDksMsHFlc3F1iSZpOnRcGQFXtBfYuWfaGRdP/C7xy1LkkSf9vHE5JSZImgIUhSWpiYUiSmlgYkqQmFoYkqYmFIUlqYmFIkppYGJKkJhaGJKmJhSFJamJhSJKaWBiSpCYWhiSpiYUhSWpiYUiSmlgYkqQmFoYkqYmFIUlqYmFIkppYGJKkJhaGJKmJhSFJamJhSJKaWBiSpCYWhiSpiYUhSWrSaWEk+dEk+5IcGvz69GXGfTfJHYPP7lHnlCR1f4SxA7i5quaAmwfzw3yrql4w+Fw0uniSpEd1XRjbgOsH09cDL+swiyTpOLoujB+vqgcABr8+Y5lxT0rST3JrkmVLJcllg3H9+fn59cgrSVPrlPX+A5LcBDxzyKorT2IzP1FVR5P8JPCJJF+oqv9cOqiqdgG7AHq9Xq0osCRpqHUvjKo6f7l1Sb6S5LSqeiDJacCDy2zj6ODXe5N8Cngh8LjCkCStn65PSe0GLh1MXwp8dOmAJE9P8sTB9GbgF4G7R5ZQkgR0XxjXAFuTHAK2DuZJ0kvyd4MxzwH6ST4HfBK4pqosDEkasXU/JXU8VfUQsGXI8j7w6sH0vwPPHXE0SdISXR9hSJImhIUhSWpiYWhi7Nx3sOsI0lSzMDQxrr35UNcRpKlmYUiSmlgYkqQmFoYkqUmn38OQlrNz38Gh1yxmd+x5zPz2LXNcsfXsUcWSplqqNuYz+nq9XvX7/a5jaA3N7tjDfddc2HUMaUNLsr+qesPWeUpKktTEwpAkNbEwJElNLAxNjO1b5rqOIE01C0MTw7uhpG5ZGJKkJhaGJKmJhSFJamJhSJKaWBiSpCYWhiSpiYUhSWpiYUiSmlgYkqQmFoYkqYmFIUlq0mlhJHllkruSfC/J0Bd2DMZdkOSeJIeT7BhlRknSgq6PMO4Efh24ZbkBSTYBbwdeDJwDXJLknNHEkyQ9qtN3elfVAYAkxxt2LnC4qu4djL0B2Abcve4BJUnf1/URRovTgfsXzR8ZLHucJJcl6Sfpz8/PjyScJE2LdT/CSHIT8Mwhq66sqo+2bGLIsho2sKp2AbsAer3e0DGSpJVZ98KoqvNXuYkjwJmL5s8Ajq5ym5KkkzQJp6RuA+aSnJXkCcDFwO6OM0nS1On6ttqXJzkCvAjYk+TGwfJnJdkLUFXHgMuBG4EDwAeq6q6uMkvStOr6LqmPAB8Zsvwo8JJF83uBvSOMJklaYhJOSUmSxoCFIUlqYmFIkppYGMvYue9g1xEkaaxYGMu49uZDXUeQpLFiYUiSmlgYkqQmFoYkqUmnX9wbFzv3HRx6zWJ2x57HzG/fMscVW88eVSxJGiup2pgPde31etXv91f8+2d37OG+ay5cw0SSNP6S7K+qoW9A9ZSUJKmJhSFJamJhSJKaWBjL2L5lrusIkjRWLIxleDeUJD2WhSFJamJhSJKaWBiSpCYb9ot7SeaBL61iE5uBr65RnC5tlP0A92VcbZR92Sj7Aavbl2dX1cywFRu2MFYrSX+5bztOko2yH+C+jKuNsi8bZT9g/fbFU1KSpCYWhiSpiYWxvF1dB1gjG2U/wH0ZVxtlXzbKfsA67YvXMCRJTTzCkCQ1sTAkSU0sjGUkeWOSzye5I8nHkzyr60wrleQvk3xxsD8fSfK0rjOtVJJXJrkryfeSTNwtkEkuSHJPksNJdnSdZzWSXJfkwSR3dp1lNZKcmeSTSQ4M/m1t7zrTSiV5UpL/SPK5wb78+Zpu32sYwyX54ar6xmD6dcA5VfWajmOtSJJfBT5RVceSvBmgql7fcawVSfIc4HvA3wB/XFUrf63iiCXZBBwEtgJHgNuAS6rq7k6DrVCSXwYeAd5TVT/bdZ6VSnIacFpV3Z7kh4D9wMsm8e8lSYCnVNUjSU4FPgNsr6pb12L7HmEs49GyGHgKMLHNWlUfr6pjg9lbgTO6zLMaVXWgqu7pOscKnQscrqp7q+o7wA3Ato4zrVhV3QI83HWO1aqqB6rq9sH0fwMHgNO7TbUyteCRweypg8+a/eyyMI4jyZuS3A/8JvCGrvOskd8FPtZ1iCl1OnD/ovkjTOgPpo0qySzwQuCz3SZZuSSbktwBPAjsq6o125epLowkNyW5c8hnG0BVXVlVZwLvBS7vNu3xnWhfBmOuBI6xsD9jq2VfJlSGLJvYI9eNJslTgQ8Bf7jkDMNEqarvVtULWDiTcG6SNTtdeMpabWgSVdX5jUP/EdgDXLWOcVblRPuS5FLgpcCWGvMLVyfx9zJpjgBnLpo/AzjaURYtMjjf/yHgvVX14a7zrIWq+lqSTwEXAGtyY8JUH2EcT5LF72i9CPhiV1lWK8kFwOuBi6rqm13nmWK3AXNJzkryBOBiYHfHmabe4ELxu4ADVfVXXedZjSQzj94FmeTJwPms4c8u75JaRpIPAT/Nwh05XwJeU1Vf7jbVyiQ5DDwReGiw6NYJvuPr5cBbgRnga8AdVfVr3aZql+QlwF8Dm4DrqupNHUdasSTvA85j4VHaXwGuqqp3dRpqBZL8EvBp4Ass/H8H+NOq2ttdqpVJ8jzgehb+ff0A8IGqunrNtm9hSJJaeEpKktTEwpAkNbEwJElNLAxJUhMLQ5LUxMKQJDWxMCRJTSwMaYQG713YOpj+iyRv6TqT1GqqnyUldeAq4Ookz2DhqagXdZxHauY3vaURS/JvwFOB8wbvX5AmgqekpBFK8lzgNODbloUmjYUhjcjgVaDvZeEte/+TZGIemiiBhSGNRJIfBD4M/FFVHQDeCPxZp6Gkk+Q1DElSE48wJElNLAxJUhMLQ5LUxMKQJDWxMCRJTSwMSVITC0OS1OT/AE4cDzAccAchAAAAAElFTkSuQmCC\n", "text/plain": [ "" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Define training set\n", "X = np.array([-3, -1, 0, 1, 3]).reshape(-1,1) # 5x1 vector, N=5, D=1\n", "y = np.array([-1.2, -0.7, 0.14, 0.67, 1.67]).reshape(-1,1) # 5x1 vector\n", "\n", "# Plot the training set\n", "plt.figure()\n", "plt.plot(X, y, '+', markersize=10)\n", "plt.xlabel(\"$x$\")\n", "plt.ylabel(\"$y$\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Maximum Likelihood\n", "We will start with maximum likelihood estimation of the parameters $\\boldsymbol\\theta$. In maximum likelihood estimation, we find the parameters $\\boldsymbol\\theta^{\\mathrm{ML}}$ that maximize the likelihood\n", "$$\n", "p(\\mathcal Y | \\mathcal X, \\boldsymbol\\theta) = \\prod_{n=1}^N p(y_n | \\boldsymbol x_n, \\boldsymbol\\theta)\\,.\n", "$$\n", "From the lecture we know that the maximum likelihood estimator is given by\n", "$$\n", "\\boldsymbol\\theta^{\\text{ML}} = (\\boldsymbol X^T\\boldsymbol X)^{-1}\\boldsymbol X^T\\boldsymbol y\\in\\mathbb{R}^D\\,,\n", "$$\n", "where \n", "$$\n", "\\boldsymbol X = [\\boldsymbol x_1, \\ldots, \\boldsymbol x_N]^T\\in\\mathbb{R}^{N\\times D}\\,,\\quad \\boldsymbol y = [y_1, \\ldots, y_N]^T \\in\\mathbb{R}^N\\,.\n", "$$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us compute the maximum likelihood estimate for a given training set" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "## EDIT THIS FUNCTION\n", "def max_lik_estimate(X, y):\n", " \n", " # X: N x D matrix of training inputs\n", " # y: N x 1 vector of training targets/observations\n", " # returns: maximum likelihood parameters (D x 1)\n", " \n", " N, D = X.shape\n", " theta_ml = np.linalg.solve(X.T @ X, X.T @ y) ##