# Mathematics for Data Analysis & Decision Making

Class Time/Place: MWF 1:10 PM - 2:00 PM at Chemistry 176

Instructor: Jesus A. De Loera
Office: 3228 Math. Sci. Building
Email: deloera@math.ucdavis.edu
Office Hours: Wed 2:10pm-3pm, Fri 3:10pm-4pm (or by appointment).

TA: Ji Chen
Office: 3131 Math. Sci. Building
Email: ljichen@math.ucdavis.edu
Office Hours: Mondays 3:10pm-5pm

Course Description: Mathematical models are at the heart of all data science applications such as information searching (Google), machine learning (e.g., face recognition algorithms), airline-crew scheduling, social network analysis, and more.

This course discusses the mathematics used in the analysis of data and the models used to make optimal decisions. Methods include advanced linear algebra, graph theory, optimization, probability, and geometry. These are some of the mathematical tools necessary for data classification, machine learning, clustering, and pattern recognition, and for planning, scheduling, and ranking.

The course should be useful to students interested in data science and in decision models who wish to learn the basic mathematical theory used in algorithms and software.

WARNING: This course is intensely hands-on. The grade is based entirely on computer projects, which try to simulate a real-life working experience.

References:

Unfortunately, there is no single undergraduate textbook that contains all the relevant mathematics (yet!). Below are some of my sources, but they are NOT required; do not buy them! I will try to provide students with my notes.

1) Optimization Models, by G. Calafiore and L. El Ghaoui, Cambridge University Press, 2015.

2) Matrix Methods in Data Mining and Pattern Recognition (Fundamentals of Algorithms), by Lars Elden, SIAM.
Note that this textbook has an official website (the author's web site), where you can find a lot of useful information (e.g., errata).

3) A Gentle Introduction to Optimization, by B. Guenin, J. Koenemann, and L. Tuncel, Cambridge University Press, 2015.

4) Who's #1? The science of rating and ranking, by A. Langville and C.D. Meyer.

## Syllabus (order may change)

#### Some Data Analysis and Decision Projects

• Project 1. (weeks 1-2) Linear Algebra models for Ranking and Learning from Data.
Eigenvalues and singular value decompositions, basic graph theory for network analysis and ranking. Modeling who is top-ranked. Finding keywords. The PageRank algorithm and Markov chains: how does Google work? HOMEWORK: Recognition of hand-written digits or ranking of electoral votes. Analysis of text documents through networks.
• Project 2. (weeks 3-4) Convex Optimization models for Supervised learning and decisions
First steps in optimization models: linear and quadratic models. Data fitting/regression vs. sparse regression, support vector machines, LASSO, convex optimization basics. HOMEWORK: Diagnosis of cancer through support vector machines. More on text mining: identifying the keywords of an author.
• Project 3. (weeks 5-6-7) Discrete Models
Integer programming, discrete optimization techniques: scheduling, optimal packing of bins and bags. Stable assignment problems. Routing problems (shortest path), scheduling and transportation problems (job/transplant allocation). HOMEWORK: Sudoku solver, network analysis (shortest paths), knapsack.
• Project 4. (weeks 8-9-10) Discrete and Non-linear Models
Non-linear programs. Convex relaxations, subgradients, Karush-Kuhn-Tucker optimality conditions, semi-definite programs. HOMEWORK: Stock index, choosing a stock portfolio through optimization, pricing, supply chain management.
• Final Project (due final day). Putting it all together: mathematical models for optimal decisions require both nonlinear and discrete components. The final project will require you to go from data collection to decision making. TBA.
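
To give a flavor of the ranking computations mentioned in Project 1, the PageRank idea can be sketched as a power iteration on a Markov chain. This is only a minimal illustration, written in Python rather than the MATLAB used in class; the function name `pagerank`, the damping factor 0.85, the stopping tolerance, and the 4-page link structure are all assumptions made for this example and are not taken from the course material.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate PageRank scores by power iteration.

    links[i] is the list of pages that page i links to.
    """
    n = len(links)
    rank = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # Every page first receives the uniform "teleportation" share.
        new_rank = [(1.0 - damping) / n] * n
        for page, targets in enumerate(links):
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for target in range(n):
                    new_rank[target] += damping * rank[page] / n
        change = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical 4-page web: every other page links to page 2.
links = [[1, 2], [2], [0], [2]]
scores = pagerank(links)
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
print(ranking)
```

The scores form a probability distribution over the pages (the stationary distribution of the damped random walk), so sorting them gives a ranking from most to least important page.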

Prerequisites and Expectations
• MAT 167 or equivalent (i.e., solid understanding of elementary linear algebra, beyond MAT 22A or MAT 67). Mathematical maturity equivalent to at least one upper division course with proofs. WARNING: Trying to take this class without a good handle on linear algebra is not a good idea!
• Solid familiarity with programming is required. MATLAB will be used in the class. The software SCIP will also be used in class.

• Although not required, having taken MAT 168 before MAT 160 would make this class much easier for you.

• I will provide tutorials for the software that we will use regularly. For example, if you do not know how to use MATLAB, you need to self-study using the MATLAB Primer and other material listed below.
• Create an account at the Math Department. Visit http://www.math.ucdavis.edu/comp/class-accts and follow the instructions.

It is important to create your account before you come to the Lab for the first time. You can then work either at the Undergraduate Computer Lab (2118 Math. Sci. Bldg.), from any other lab on campus, or even from your home PC by remotely connecting to one of the departmental servers, such as [point,cosine,sine,tangent].math.ucdavis.edu. The lab is open 9am-5pm on weekdays.

• Attendance will not be taken; however, whether or not you are able to attend class, you are responsible for all the material presented in class.
• This is a 4-unit course! You are expected to work 3 hours at home for each hour of lecture. In other words, expect to have about 10 hours of homework each week.

The grades will be calculated using the average and standard deviation of the class. A total of 100 points is possible, divided as follows:
• 4 regular projects, 15 points each (the lowest score is dropped),
• 1 midterm, 20 points,
• 1 final project, 35 points.
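
As a quick check of how the points add up, here is a small sketch of the raw score computation. It assumes that each of the four regular projects is worth 15 points (so the three kept projects contribute up to 45) and it ignores the class-wide adjustment by average and standard deviation; the function name `raw_score` and the sample scores are hypothetical.

```python
def raw_score(project_scores, midterm, final_project):
    """Raw course score out of 100, before any class-wide adjustment.

    Assumes each of the 4 regular projects is graded out of 15 points.
    """
    if len(project_scores) != 4:
        raise ValueError("expected exactly 4 project scores")
    kept = sorted(project_scores)[1:]  # drop the lowest project score
    return sum(kept) + midterm + final_project


# Hypothetical student: the project scored 9 is dropped.
example = raw_score([12, 15, 9, 14], midterm=16, final_project=30)
print(example)

# Maximum possible: 3 * 15 + 20 + 35
maximum = raw_score([15, 15, 15, 15], midterm=20, final_project=35)
print(maximum)
```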
Some important rules will be followed:
• Homework assignments and other material will be posted at the bottom of the course web site. Homework is due at the beginning of class on the day the assignment is due. LATE HOMEWORK WILL NOT BE ACCEPTED.
• Your work is not graded solely on the final answer; I expect you to write neatly, justify your reasoning, and show all details.
• I will assign some HW problems that require you to use MATLAB, SCIP or R.
• The projects will include writing code to investigate the application topics presented in class and theory to understand methods.

SOFTWARE and other RESOURCES:
This class uses MATLAB and SCIP. To access the necessary software:

• Create an account at the Math Department. Visit http://www.math.ucdavis.edu/comp/class-accts and follow the instructions.

It is important to create your account before you come to the Lab for the first time. You can then work either at the Undergraduate Computer Lab (2118 Math. Sci. Bldg.), from any other lab on campus, or even from your home PC by remotely connecting to one of the departmental servers, such as [fuzzy,cosine,sine,tangent].math.ucdavis.edu. The lab is open 9am-5pm on weekdays.

• Use your own account at your own department if your department has a MATLAB license. This is the case for most of the engineering departments.
• Buy a Student Version of MATLAB at the UCD Bookstore (costs about \$100).
• Install Octave on your own PC; it is free software that emulates MATLAB. Caution: most likely you can do all the lab exercises with it, but I have not tested all the exercises yet. Visit the official Octave web site at http://www.octave.org for download and installation information.

An introduction to ZIMPL (the language used to program SCIP) is available in the ZIMPL Manual. The best way to learn it is to follow the numerous examples provided in the text.

For MATLAB, please take a look at the following highly useful MATLAB primers and tutorials.

# HOMEWORKS & HANDOUTS

• Homework 1, due April 18th 11:55pm

• Homework 2, due May 4th 11:55pm

• Homework 3, due May 23rd 11:55pm

• Homework 4, due Friday June 8th 11:55pm

• Final project, due June 11th 6pm