Transcription

Introduction to Applied Linear Algebra
Vectors, Matrices, and Least Squares

Stephen Boyd
Department of Electrical Engineering
Stanford University

Lieven Vandenberghe
Department of Electrical and Computer Engineering
University of California, Los Angeles

University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University's mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781316518960
DOI: 10.1017/9781108583664

© Cambridge University Press 2018

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2018
Printed in the United Kingdom by Clays, St Ives plc, 2018

A catalogue record for this publication is available from the British Library.

ISBN 978-1-316-51896-0 Hardback

Additional resources for this publication at www.cambridge.org/IntroAppLinAlg

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

For
Anna, Nicholas, and Nora
Daniël and Margriet

Contents

Preface

I   Vectors

1   Vectors
    1.1   Vectors
    1.2   Vector addition
    1.3   Scalar-vector multiplication
    1.4   Inner product
    1.5   Complexity of vector computations
    Exercises

2   Linear functions
    2.1   Linear functions
    2.2   Taylor approximation
    2.3   Regression model
    Exercises

3   Norm and distance
    3.1   Norm
    3.2   Distance
    3.3   Standard deviation
    3.4   Angle
    3.5   Complexity
    Exercises

4   Clustering
    4.1   Clustering
    4.2   A clustering objective
    4.3   The k-means algorithm
    4.4   Examples
    4.5   Applications
    Exercises

5   Linear independence
    5.1   Linear dependence
    5.2   Basis
    5.3   Orthonormal vectors
    5.4   Gram–Schmidt algorithm
    Exercises

II   Matrices

6   Matrices
    6.1   Matrices
    6.2   Zero and identity matrices
    6.3   Transpose, addition, and norm
    6.4   Matrix-vector multiplication
    6.5   Complexity
    Exercises

7   Matrix examples
    7.1   Geometric transformations
    7.2   Selectors
    7.3   Incidence matrix
    7.4   Convolution
    Exercises

8   Linear equations
    8.1   Linear and affine functions
    8.2   Linear function models
    8.3   Systems of linear equations
    Exercises

9   Linear dynamical systems
    9.1   Linear dynamical systems
    9.2   Population dynamics
    9.3   Epidemic dynamics
    9.4   Motion of a mass
    9.5   Supply chain dynamics
    Exercises

10  Matrix multiplication
    10.1  Matrix-matrix multiplication
    10.2  Composition of linear functions
    10.3  Matrix power
    10.4  QR factorization
    Exercises

11  Matrix inverses
    11.1  Left and right inverses
    11.2  Inverse
    11.3  Solving linear equations
    11.4  Examples
    11.5  Pseudo-inverse
    Exercises

III  Least squares

12  Least squares
    12.1  Least squares problem
    12.2  Solution
    12.3  Solving least squares problems
    12.4  Examples
    Exercises

13  Least squares data fitting
    13.1  Least squares data fitting
    13.2  Validation
    13.3  Feature engineering
    Exercises

14  Least squares classification
    14.1  Classification
    14.2  Least squares classifier
    14.3  Multi-class classifiers
    Exercises

15  Multi-objective least squares
    15.1  Multi-objective least squares
    15.2  Control
    15.3  Estimation and inversion
    15.4  Regularized data fitting
    15.5  Complexity
    Exercises

16  Constrained least squares
    16.1  Constrained least squares problem
    16.2  Solution
    16.3  Solving constrained least squares problems
    Exercises

17  Constrained least squares applications
    17.1  Portfolio optimization
    17.2  Linear quadratic control
    17.3  Linear quadratic state estimation
    Exercises

18  Nonlinear least squares
    18.1  Nonlinear equations and least squares
    18.2  Gauss–Newton algorithm
    18.3  Levenberg–Marquardt algorithm
    18.4  Nonlinear model fitting
    18.5  Nonlinear least squares classification
    Exercises

19  Constrained nonlinear least squares
    19.1  Constrained nonlinear least squares
    19.2  Penalty algorithm
    19.3  Augmented Lagrangian algorithm
    19.4  Nonlinear control
    Exercises

Appendices

A   Notation
B   Complexity
C   Derivatives and optimization
    C.1   Derivatives
    C.2   Optimization
    C.3   Lagrange multipliers
D   Further study

Index

Preface

This book is meant to provide an introduction to vectors, matrices, and least squares methods, basic topics in applied linear algebra. Our goal is to give the beginning student, with little or no prior exposure to linear algebra, a good grounding in the basic ideas, as well as an appreciation for how they are used in many applications, including data fitting, machine learning and artificial intelligence, tomography, navigation, image processing, finance, and automatic control systems.

The background required of the reader is familiarity with basic mathematical notation. We use calculus in just a few places, but it does not play a critical role and is not a strict prerequisite. Even though the book covers many topics that are traditionally taught as part of probability and statistics, such as fitting mathematical models to data, no knowledge of or background in probability and statistics is needed.

The book covers less mathematics than a typical text on applied linear algebra. We use only one theoretical concept from linear algebra, linear independence, and only one computational tool, the QR factorization; our approach to most applications relies on only one method, least squares (or some extension). In this sense we aim for intellectual economy: With just a few basic mathematical ideas, concepts, and methods, we cover many applications. The mathematics we do present, however, is complete, in that we carefully justify every mathematical statement. In contrast to most introductory linear algebra texts, however, we describe many applications, including some that are typically considered advanced topics, like document classification, control, state estimation, and portfolio optimization.

The book does not require any knowledge of computer programming, and can be used as a conventional textbook, by reading the chapters and working the exercises that do not involve numerical computation. This approach, however, misses out on one of the most compelling reasons to learn the material: You can use the ideas and methods described in this book to do practical things like build a prediction model from data, enhance images, or optimize an investment portfolio. The growing power of computers, together with the development of high-level computer languages and packages that support vector and matrix computation, has made it easy to use the methods described in this book for real applications. For this reason we hope that every student of this book will complement their study with computer programming exercises and projects, including some that involve real data. This book includes some generic exercises that require computation; additional ones, and the associated data files and language-specific resources, are available online.
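The book itself prescribes no programming language; as a taste of the kind of computer exercise just mentioned, here is a minimal sketch, assuming Python with NumPy (our choice for illustration, not the book's), of building a simple prediction model from made-up data by least squares.

    import numpy as np

    # Illustrative sketch (not from the book): fit a straight-line model
    # y ~ theta[0] + theta[1]*x to data by least squares.
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # hypothetical feature values
    y = np.array([1.1, 1.9, 3.2, 3.8, 5.1])   # hypothetical outcomes

    # Stack a column of ones with x to form a tall, overdetermined system.
    A = np.column_stack([np.ones_like(x), x])
    theta, *_ = np.linalg.lstsq(A, y, rcond=None)

    print(theta)   # intercept and slope minimizing ||A @ theta - y||^2

For these five points the sketch prints an intercept of about 1.04 and a slope of about 0.99, the line that best fits the data in the least squares sense.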

If you read the whole book, work some of the exercises, and carry out computer exercises to implement or use the ideas and methods, you will learn a lot. While there will still be much for you to learn, you will have seen many of the basic ideas behind modern data science and other application areas. We hope you will be empowered to use the methods for your own applications.

The book is divided into three parts. Part I introduces the reader to vectors, and various vector operations and functions like addition, inner product, distance, and angle. We also describe how vectors are used in applications to represent word counts in a document, time series, attributes of a patient, sales of a product, an audio track, an image, or a portfolio of investments. Part II does the same for matrices, culminating with matrix inverses and methods for solving linear equations. Part III, on least squares, is the payoff, at least in terms of the applications. We show how the simple and natural idea of approximately solving a set of overdetermined equations, and a few extensions of this basic idea, can be used to solve many practical problems. (A small computational sketch of this idea appears at the end of this preface.)

The whole book can be covered in a 15-week (semester) course; a 10-week (quarter) course can cover most of the material, by skipping a few applications and perhaps the last two chapters on nonlinear least squares. The book can also be used for self-study, complemented with material available online. By design, the pace of the book accelerates a bit, with many details and simple examples in parts I and II, and more advanced examples and applications in part III. A course for students with little or no background in linear algebra can focus on parts I and II, and cover just a few of the more advanced applications in part III. A more advanced course on applied linear algebra can quickly cover parts I and II as review, and then focus on the applications in part III, as well as additional topics.

We are grateful to many of our colleagues, teaching assistants, and students for helpful suggestions and discussions during the development of this book and the associated courses. We especially thank our colleagues Trevor Hastie, Rob Tibshirani, and Sanjay Lall, as well as Nick Boyd, for discussions about data fitting and classification, and Jenny Hong, Ahmed Bou-Rabee, Keegan Go, David Zeng, and Jaehyun Park, Stanford undergraduates who helped create and teach the course EE103. We thank David Tse, Alex Lemon, Neal Parikh, and Julie Lancashire for carefully reading drafts of this book and making many good suggestions.

Stephen Boyd
Stanford, California

Lieven Vandenberghe
Los Angeles, California
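As the computational sketch promised above, here is a minimal illustration, again assuming Python with NumPy rather than any tooling the book endorses, of approximately solving an overdetermined system via the QR factorization, the book's one computational tool. The data here are synthetic, generated for the example.

    import numpy as np

    # Illustrative sketch: least squares via QR, the route developed in
    # part III. For a tall A with independent columns, A = QR gives the
    # least squares approximate solution xhat = R^{-1} (Q^T b).
    rng = np.random.default_rng(0)          # synthetic data, made up here
    A = rng.standard_normal((20, 3))        # 20 equations, 3 unknowns
    b = rng.standard_normal(20)

    Q, R = np.linalg.qr(A)                  # reduced QR factorization
    xhat = np.linalg.solve(R, Q.T @ b)      # solve R @ xhat = Q.T @ b

    # Sanity check against NumPy's built-in least squares solver.
    assert np.allclose(xhat, np.linalg.lstsq(A, b, rcond=None)[0])

Solving via QR avoids explicitly forming the Gram matrix A^T A of the normal equations, which is the standard numerical motivation for this route.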

Part I

Vectors

Chapter 1

Vectors

In this chapter we introduce vectors and some common operations on them. We describe so