Linear Regression and K-Nearest Neighbors 3/28/18
Linear Regression Hypothesis Space
Supervised learning: for every input in the data set, we know the output.
Regression: outputs are continuous (a number, not a category label).
The learned model: a linear function mapping input to output, with a weight for each feature (including the bias).
Linear Regression
We want to find the linear model that fits our data best.
Key idea: model data as a linear function plus noise. Pick the weights to minimize the noise magnitude.

f(\vec{x}) = \begin{bmatrix} w_b & w_0 & \cdots & w_d \end{bmatrix} \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix}
Squared Error
The true function f(\vec{x}) and the learned model \hat{f}(\vec{x}) are both linear functions of the (bias-augmented) input:

f(\vec{x}) = \begin{bmatrix} w_b & w_0 & \cdots & w_d \end{bmatrix} \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix} \qquad \hat{f}(\vec{x}) = \begin{bmatrix} \hat{w}_b & \hat{w}_0 & \cdots & \hat{w}_d \end{bmatrix} \begin{bmatrix} 1 \\ x_0 \\ \vdots \\ x_d \end{bmatrix}

Define the error for a data point to be the squared distance between the correct output and the predicted output:

\left( f(\vec{x}) - \hat{f}(\vec{x}) \right)^2

Error for the model is the sum of point errors:

\sum_{\vec{x} \in \text{data}} \left( y_{\vec{x}} - \hat{f}(\vec{x}) \right)^2
Error as a Function
X: input examples; Y: output examples; f(\vec{x}): correct output for each \vec{x} \in X; \hat{f}(\vec{x}): model prediction using learned weights \vec{w}.

Error depends on the data and the weights:

E(X, Y, \vec{w}) = \sum_{\vec{x} \in X} \left( f(\vec{x}) - \vec{w} \cdot \vec{x} \right)^2

For a given data set, error is a function of the weights alone:

E(\vec{w}) = \sum_{\vec{x} \in X} \left( f(\vec{x}) - \vec{w} \cdot \vec{x} \right)^2
1D Inputs: y = mx + b
Minimizing Squared Error
Goal: pick weights that minimize squared error.
Approach #1: gradient descent. Your reading derived this for 1D inputs. Does this look familiar?
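As a concrete illustration of Approach #1, here is a minimal gradient-descent sketch for the 1D model y = mx + b; the function name and the learning-rate/step-count values are illustrative choices, not from the lecture.

```python
import numpy as np

def fit_gd(x, y, lr=0.01, n_steps=5000):
    """Minimize sum of squared errors for y = m*x + b by gradient descent."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_steps):
        pred = m * x + b
        # Gradients of E = sum((y - pred)^2) with respect to m and b.
        grad_m = -2.0 * np.sum((y - pred) * x)
        grad_b = -2.0 * np.sum(y - pred)
        m -= lr * grad_m / n   # dividing by n keeps the step size stable
        b -= lr * grad_b / n
    return m, b

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0            # noiseless data generated with m = 2, b = 1
m, b = fit_gd(x, y)          # should recover m ≈ 2, b ≈ 1
```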
Minimizing Squared Error
Goal: pick weights that minimize squared error.
Approach #2 (the right way): analytical solution. The gradient is 0 at the error minimum. For linear regression, there is a unique global minimum with a closed formula:

\vec{w} = \left( X X^T \right)^{-1} X \, \vec{y}

where X collects the bias-augmented input examples as columns:

X = \begin{bmatrix} \vec{x}_0 & \vec{x}_1 & \cdots & \vec{x}_n \end{bmatrix} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_{00} & x_{01} & \cdots & x_{0n} \\ x_{10} & x_{11} & \cdots & x_{1n} \\ \vdots & \vdots & & \vdots \\ x_{d0} & x_{d1} & \cdots & x_{dn} \end{bmatrix}
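A minimal sketch of the closed-form solution in NumPy, using the convention above that X has one column per example with a leading row of 1s for the bias (the data values are illustrative):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                      # data generated from y = 2x + 1
X = np.vstack([np.ones_like(x), x])    # shape (d+1, n) = (2, 4); columns are examples

# w = (X X^T)^{-1} X y, solved as a linear system rather than an explicit inverse.
w = np.linalg.solve(X @ X.T, X @ y)    # w[0] is the bias, w[1] the slope
```

Solving the linear system with `np.linalg.solve` is numerically preferable to forming the inverse explicitly.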
Change of Basis
Polynomial regression is just linear regression with a change of basis.

Quadratic basis: \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_d \end{bmatrix} \rightarrow \begin{bmatrix} x_0 \\ (x_0)^2 \\ x_1 \\ (x_1)^2 \\ \vdots \\ x_d \\ (x_d)^2 \end{bmatrix}

Cubic basis: \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_d \end{bmatrix} \rightarrow \begin{bmatrix} x_0 \\ (x_0)^2 \\ (x_0)^3 \\ x_1 \\ (x_1)^2 \\ (x_1)^3 \\ \vdots \\ x_d \\ (x_d)^2 \\ (x_d)^3 \end{bmatrix}

Perform linear regression on the new representation.
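A sketch of the idea for 1D inputs: expand each input into a quadratic basis, then run ordinary least squares on the expanded representation. The `expand` helper is a hypothetical name introduced here for illustration.

```python
import numpy as np

def expand(x, degree=2):
    """Map each scalar input to (1, x, x^2, ..., x^degree), one column per example."""
    return np.vstack([x**p for p in range(degree + 1)])

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 3.0 * x**2 - x + 5.0               # data generated from a quadratic

Phi = expand(x)                         # shape (degree+1, n)
# Same closed-form linear regression, applied to the new basis.
w = np.linalg.solve(Phi @ Phi.T, Phi @ y)
# w recovers the quadratic's coefficients: [5.0, -1.0, 3.0]
```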
Change of Basis Demo
K-Nearest Neighbors Hypothesis Space
Supervised learning: for every input in the data set, we know the output.
Classification: outputs are discrete category labels.
The learned model: we'll talk about this in a bit.
K-nearest neighbors algorithm
Training: store all of the training points and their labels. Can use a data structure like a kd-tree that speeds up localized lookup.
Prediction: find the k training inputs closest to the test input, and output the most common label among them.
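The prediction step can be sketched in a few lines. This is a brute-force version (no kd-tree), with illustrative names and data:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Predict the label of x_query by majority vote among its k nearest neighbors."""
    # Euclidean distance from the query to every training input.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    nearest = np.argsort(dists)[:k]              # indices of the k closest points
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]            # most common label among the k

X_train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y_train = np.array(["a", "a", "b", "b"])
label = knn_predict(X_train, y_train, np.array([4.5, 5.2]))
# label == "b": two of the three nearest neighbors are labeled "b"
```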
KNN implementation decisions (and possible answers)
How should we measure distance? (Euclidean distance between input vectors.)
What if there's a tie for the nearest points? (Include all points that are tied.)
What if there's a tie for the most-common label? (Remove the most-distant point until a plurality is achieved.)
What if there's a tie for both? (We need some arbitrary tie-breaking rule.)
KNN Hypothesis Space What does the learned model look like?
Weighted nearest neighbors
Idea: closer points should matter more.
Solution: weight the vote by inverse distance. Instead of contributing one vote for its label, each neighbor \vec{x} contributes 1 / dist(\vec{x}_t, \vec{x}) votes for its label.
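A minimal sketch of distance-weighted voting, assuming the 1/distance weighting described above (the function name and data are illustrative):

```python
from collections import defaultdict

def weighted_vote(dists, labels):
    """Each neighbor contributes 1/dist votes for its label; return the winner."""
    scores = defaultdict(float)
    for d, lbl in zip(dists, labels):
        scores[lbl] += 1.0 / (d + 1e-12)   # small epsilon guards against d == 0
    return max(scores, key=scores.get)

# Two far-away "a" neighbors vs. one very close "b" neighbor.
label = weighted_vote([4.0, 5.0, 0.5], ["a", "a", "b"])
# label == "b": the close point's 2.0 votes outweigh the others' 0.45 combined
```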
Why do we even need k neighbors?
Idea: if we're weighting by distance, we can give all training points a vote. Points that are far away will just have really small weight.
Why might this be a bad idea? Slow: we have to sum over every point in the training set. If we're using a kd-tree, we can get the k neighbors quickly and sum over a small set.
The same ideas can apply to regression.
K-nearest neighbors setting: supervised learning (we know the correct output for each training point); classification (small number of discrete labels).
vs.
Locally-weighted regression setting: supervised learning (we know the correct output for each training point); regression (outputs are continuous).
Locally-Weighted Average
Instead of taking a majority vote, average the y-values.
We could average over the k nearest neighbors. We could weight the average by distance. Better yet, do both.
Locally Weighted Regression
Key idea: for any point \vec{x}_t we want to predict, compute a linear regression with error weighted by distance. As before, we find the linear function that minimizes total error, but we redefine total error so that closer points count more:

E(\vec{w}) = \sum_{\vec{x} \in \text{data}} \frac{\left( y_{\vec{x}} - \hat{f}(\vec{x}) \right)^2}{dist(\vec{x}_t, \vec{x})}
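The weighted objective above can be minimized in closed form, just like ordinary least squares, by scaling each example's contribution to the normal equations. Here is a minimal 1D sketch, assuming 1/distance weights; all names are illustrative:

```python
import numpy as np

def lwr_predict(x, y, x_t):
    """Fit a line minimizing distance-weighted squared error, evaluate it at x_t."""
    X = np.vstack([np.ones_like(x), x])          # columns are [1, x_i]
    w_pt = 1.0 / (np.abs(x - x_t) + 1e-6)        # per-point weights 1/dist(x_t, x)
    W = np.diag(w_pt)
    # Weighted normal equations: (X W X^T) w = X W y
    w = np.linalg.solve(X @ W @ X.T, X @ W @ y)
    return w[0] + w[1] * x_t                     # evaluate the local line at x_t

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = x**2                                         # nonlinear data
pred = lwr_predict(x, y, 2.5)
```

On this data the true value at 2.5 is 6.25; a single global least-squares line predicts 8.0, while the locally weighted fit lands noticeably closer because the nearby points (2, 4) and (3, 9) dominate the objective.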
Supervised Learning Phases
Fitting (a.k.a. training): process data; create the model that will be used for prediction.
Prediction (a.k.a. testing): evaluate the model on new inputs; compare models.
Describe the work done in each phase, for linear regression and for KNN.