# Mathematics for Data Analysis & Decision Making

Class Time/Place: MWF 2:10 - 3:00PM Hoagland 168

Instructor: Jesus A. De Loera
Office: 3228 Math. Sci. Building
Email: deloera@math.ucdavis.edu
Office Hours: Wed 4:10-5pm, Thu 3:10-4pm (or by appointment).

TA: Lily Silverstein
Office: 2232 Math. Sci. Building
Email: lsilver@ucdavis.edu
Office Hours: Thursdays 1:10-2pm.

TA: Roger Tian
Office: 3129 Math. Sci. Building
Email: rgtian@ucdavis.edu
Office Hours: Tuesdays 11:00-12pm.

Course Description: Data mining and mathematical decision models are at the heart of successful applications such as information search (Google), airline-crew scheduling, social network analysis, and bioinformatics. This course discusses the mathematical methods used in the analysis of data and in modeling to make optimal decisions. Methods include advanced linear algebra, optimization, probability, and geometry. These are some of the mathematical tools necessary for data classification, machine learning, clustering, and pattern recognition, and for planning, scheduling, and ranking. The course should be useful to students interested in data science and decision models who wish to learn the basic mathematical theory used in algorithms and software.

References:

Optimization Models, by G. Calafiore and L. El Ghaoui, Cambridge, 2015

Matrix Methods in Data Mining and Pattern Recognition (Fundamentals of Algorithms), by Lars Elden, Published by SIAM
Note that this textbook has an official website, maintained by the author, where you can find a lot of useful information (e.g., errata).


## Syllabus

#### Five Data Analysis and Decision Projects

• Project 1. (weeks 1-2) Supervised Learning from Data.
Data fitting/regression, notions of sparsity, support vector machines. Homework: Diagnosis of disease through optimization models and training data.
• Project 2. (weeks 3-4) Unsupervised Learning from Data.
Singular value decompositions, basics of convex optimization, recognition of a hand-written digit.
• Project 3. (weeks 5-6) Using Math to Cluster and Rank Information.
Clustering models, modeling who is top-ranked, finding key words. The PageRank algorithm and Markov chains: How does Google work?
• Project 4. (weeks 7-8) Discrete Models.
Integer programming, discrete optimization techniques: scheduling, optimal packing of bins and bags.
• Final Project (due on final day). Putting it all together.
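
To give a flavor of the Project 3 topic, here is a minimal sketch of PageRank computed by power iteration on a Markov chain. It is not course material: it is written in Python rather than MATLAB purely for illustration, and the function name `pagerank`, the toy link graph `toy_web`, and the damping value 0.85 are all assumptions chosen for this example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Power iteration on the damped random-surfer Markov chain.

    links maps each page to the list of pages it links to.
    (Hypothetical helper for illustration only.)
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Every page first receives the random-jump ("teleportation") share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its damped rank evenly among its out-links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # A page with no out-links spreads its rank over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # stop once the distribution has stabilized
            break
    return rank


# Toy web of four pages (made-up data).
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

The scores form a probability distribution over pages: a page ranks highly when many pages, or highly ranked pages, link to it.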

Prerequisites and Expectations
• MAT 22A or equivalent (i.e., practical understanding of elementary linear algebra). Mathematical maturity equivalent to at least one upper division course with proofs.
• Solid familiarity with programming is required. MATLAB will be used in the class. The software SCIP will also be used in class.

I will provide some tutorials for the software that we will use regularly. For example, if you do not know how to use MATLAB, you need to self-study using the MATLAB Primer and other material listed below.

• Attendance will not be taken. However, whether or not you are able to attend class, you are responsible for all the material presented in class.
• This is a 4 unit course! You are expected to work 3 hours at home for each hour of lecture. In other words, expect to have about 10 hours of homework each week.

Grades will be calculated using the class average and standard deviation. 100 points are possible, divided as follows:
• 4 Regular Projects, 20 points each (with the lowest score dropped),
• 1 Final Project, 35 points (Saturday, June 04 at 10:30am), and
• Extra 5 points awarded for participation in class and office hours.
Some important rules will be followed:
• Homework assignments and other material will be posted at the bottom of the course web site. Homework is due at the beginning of class on the day the assignment is due. LATE HOMEWORK WILL NOT BE ACCEPTED.
• Your work is not graded solely on the final answer. I expect you to write neatly, justify your reasoning, and show all details.
• I will assign some HW problems that require you to use MATLAB, SCIP or R.
• The projects will include writing code to investigate the application topics presented in class and theory to understand the methods.

SOFTWARE and other RESOURCES:
This class uses MATLAB and SCIP. To access the necessary software:

• Create an account at the Math Department. Visit http://www.math.ucdavis.edu/comp/class-accts and follow the instructions.

It is important to create your account before you come to the Lab for the first time. You can then work at the Undergraduate Computer Lab (2118 Math. Sci. Bldg.), from any other lab on campus, or even from your home PC by remotely connecting to one of the departmental servers, such as [fuzzy,cosine,sine,tangent].math.ucdavis.edu. The lab is open 9am-5pm on weekdays.

• Use your own account at your own department if your department has a MATLAB license. This is the case for most of the engineering departments.
• Buy a Student Version of MATLAB at the UCD Bookstore (costs about \$100).
• Install the Octave system on your own PC. Octave is free software that emulates MATLAB. Caution: Most likely you can do all the lab exercises with it, but I have not tested all of them yet. Visit the official web site of Octave at http://www.octave.org for download and installation information.

An introduction to ZIMPL (the language used to program SCIP) is available in the ZIMPL Manual. The best way to learn it is to follow the numerous examples provided in the text.

For MATLAB, please take a look at the following highly useful MATLAB primers and tutorials.

# HOMEWORKS & HANDOUTS

• Homework 1, due April 17th 11:55pm:

NOTE: Part 3 of the cancer problem in the posted version was removed.

• Homework 2, due May 2nd 11:55pm:

Click here for the data necessary to do the main project 2.

• Homework 3, due May 16th 11:55pm:

• Homework 4, due May 27th 11:55pm:

• Final project, due June 4th 10:30am: