[1] G. Strang, Introduction to Linear Algebra, 5th ed., Wellesley-Cambridge Press (2016).
[2] S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press (2004).
[3] G. H. Golub and C. F. Van Loan, Matrix Computations, 4th ed., Johns Hopkins University Press (2013).
[4] I. Goodfellow, Y. Bengio and A. Courville, Deep Learning, MIT Press (2016).
[5] V. N. Vapnik, Statistical Learning Theory, Wiley (1998).
[6] S. Sra, S. Nowozin and S. J. Wright, Optimization for Machine Learning, MIT Press (2012).
[7] M. A. Nielsen, Neural Networks and Deep Learning, Determination Press (2015).
[8] R. T. Rockafellar and R. J.-B. Wets, Variational Analysis, Springer (1998).
[9] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed., MIT Press (2018).
[10] N. S. Yanofsky and M. A. Mannucci, Quantum Computing for Computer Scientists, Cambridge University Press (2008).
[11] G. James, D. Witten, T. Hastie and R. Tibshirani, An Introduction to Statistical Learning: with Applications in R, Springer (2013).
[12] E. Alpaydin, Introduction to Machine Learning, 4th ed., MIT Press (2020).
[13] J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis, Cambridge University Press (2004).