Mathematics for Data Analysis & Decision Making
Class Time/Place: MWF 1:10 PM - 2:00 PM at Chemistry 176
Instructor: Jesus A. De Loera
Office: 3228 Math. Sci. Building
Office Hours: Wed 2:10pm-3pm, Fri 3:10pm-4pm
(or by appointment).
Office: 3131 Math. Sci. Building
Office Hours: Mondays 3:10pm-5pm
Mathematical models are at the heart of all
data science applications such as information searching (Google),
machine learning (e.g., face recognition algorithms),
airline-crew scheduling, social network analysis, and more.
This course discusses the mathematics used in the analysis of data and
the models used to make optimal decisions. Methods include advanced
linear algebra, graph theory, optimization, probability, and geometry.
These are some of the mathematical tools necessary for data
classification, machine learning, clustering, and pattern recognition,
and for planning, scheduling, and ranking.
The course should be useful to students interested in data
science and in decision models who wish to learn the basic
mathematical theory used in algorithms and software.
WARNING: This course is intensely hands-on. The grade is based entirely on
computer projects, and the course tries to simulate real-life working experience.
Unfortunately, there is no single undergraduate textbook that contains all
the relevant mathematics (yet!!). Some of my sources are listed below, but
they are NOT required, so do not buy them! I will try to provide students with my notes.
1) Optimization Models, by G. Calafiore and L. El Ghaoui, Cambridge University Press, 2015
2) Matrix Methods in Data Mining and Pattern Recognition (Fundamentals of Algorithms), by Lars Elden, Published by SIAM
Note that this textbook has an official website, maintained by the author, where you can find a lot of useful information (e.g., errata).
3) A gentle introduction to optimization, by B. Guenin, J. Koenemann, L. Tuncel
Cambridge University Press, 2015.
4) Who's #1? The science of rating and ranking, by A. Langville and C.D. Meyer.
Here is the syllabus (order may change):
Some Data Analysis and Decision Projects
- Project 1. (weeks 1-2) Linear Algebra models for Ranking and Learning from Data.
Eigenvalues and Singular Value Decompositions,
basic graph theory for Network analysis and ranking.
Modeling who is top-ranked. Finding key word
PageRank algorithm and Markov chains: How does Google work?
HOMEWORK: Recognition of a hand-written digit or ranking of electoral votes. Analysis of text documents through networks.
- Project 2. (weeks 3-4) Convex Optimization models for Supervised learning and decisions
First steps in
optimization models: linear & quadratic models.
Data fitting/regression vs. sparse regression, Support Vector Machines, LASSO, convex optimization basics.
HOMEWORK: Diagnosis of cancer through Support Vector Machines. More on text mining: identifying keywords of an author.
- Project 3. (weeks 5-6-7) Discrete Models
Integer programming and discrete optimization techniques: scheduling, optimal packing of bins and bags, stable assignment problems, routing problems (shortest path), and scheduling and transportation problems (job/transplant allocation).
HOMEWORK: Sudoku solver, network analysis (shortest paths), knapsack.
- Project 4. (weeks 8-9-10) Discrete and Non-linear Models
Non-linear programs. Convex relaxations, subgradients, Karush-Kuhn-Tucker optimality conditions, semi-definite programs.
HOMEWORK: Stock index, choosing a stock portfolio through optimization, pricing, supply chain management.
- Final Project (due final day). Putting it all together:
Mathematical models for optimal decisions require both nonlinear and discrete components. The final project will require you to go from data collection to
decision making. TBA.
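To give a flavor of the Project 1 topic listed above (PageRank and Markov chains), here is a minimal sketch of the standard power-iteration idea: a random surfer follows links with some probability and otherwise jumps to a random page, and the ranking is the long-run distribution of that Markov chain. It is written in plain Python rather than MATLAB so that it runs without a license. The function name pagerank, the damping value 0.85, and the four-page toy web are all assumptions made for illustration, not course material.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Approximate PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
           Assumption: every link target also appears as a key.
    damping: probability that the surfer follows a link (0.85 is a
             commonly quoted value, used here only as an illustration).
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution

    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes its rank evenly to the pages it links to.
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])

        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical four-page web: D links out but nobody links to D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

In this toy web, page C collects links from every other page and so comes out on top, while D, which no page links to, keeps only the random-jump share. The same computation can be phrased as finding the dominant eigenvector of a stochastic matrix, which is the linear algebra view taken in Project 1.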
Prerequisite and Expectations
- MAT 167 or equivalent (i.e., solid understanding of elementary linear algebra, beyond MAT 22A or MAT 67). Mathematical maturity equivalent to at least one upper division course with proofs. WARNING: Trying to take this class without a good handle on linear algebra is not a good idea!
- Solid familiarity with programming is required. MATLAB will be
used in the class. The software SCIP will also be used in class.
- Although not required, having taken MAT 168 before MAT 160 would make this
class much easier for you.
- I will provide some tutorials for the software that we will use regularly. E.g.,
if you do not know how to use MATLAB, then you need to self-study using the MATLAB
Primer and other material listed below.
- Create an account at the Math Department. Visit http://www.math.ucdavis.edu/comp/class-accts and follow the instructions.
It is important to create your account before you come to the Lab for the first time. You can then work either at the Undergraduate Computer Lab (2118 Math. Sci. Bldg.) or from any other lab on campus, or even from your home PC by remotely connecting to one of the departmental servers, such as [point,cosine,sine,tangent].math.ucdavis.edu. The lab is open 9am-5pm on weekdays.
- Attendance will not be taken; however, whether or not you are
able to attend class, you are responsible for
all the material presented in class.
This is a 4 unit course! You are expected to work
3 hours at home for each hour of lecture. In other words,
expect to have 10 hours of homework each week.
The grades will be calculated using the
average and standard deviation of the class. 100 points are possible,
which will be divided as follows:
- 4 Regular Projects, 15 points each (the lowest score is dropped),
- 1 midterm, 20 points, and
- 1 Final Project, 35 points.
Some important rules will be followed:
- The due homework and other material will be posted at the bottom of the course
web site. Homework is due at the beginning of class on
the day the assignment is due. LATE HOMEWORK WILL NOT BE ACCEPTED.
- Your work is not graded solely on the final answer;
I expect you to write neatly, justify your reasoning, and
show all details.
- I will assign some HW problems that require you to use MATLAB, SCIP or R.
- The projects will include writing code to investigate the application
topics presented in class and theory to understand methods.
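The point breakdown given above can be checked with a few lines of arithmetic. The sketch below assumes that each regular project is worth 15 points, which is the reading under which the three kept projects, the midterm, and the final project add up to 100. It computes raw points only; the adjustment by class average and standard deviation is not modeled, and the function name and the sample scores are hypothetical.

```python
def raw_course_points(project_scores, midterm, final_project):
    """Raw course points out of 100, before any curving.

    project_scores: four regular-project scores, each assumed to be out of 15.
    midterm: score out of 20.
    final_project: score out of 35.
    """
    if len(project_scores) != 4:
        raise ValueError("expected four regular project scores")
    kept = sorted(project_scores)[1:]  # drop the single lowest project score
    return sum(kept) + midterm + final_project


perfect = raw_course_points([15, 15, 15, 15], 20, 35)
sample = raw_course_points([12, 9, 14, 15], 16, 30)  # hypothetical student
print(perfect, sample)
```

With perfect scores the total is 3 x 15 + 20 + 35 = 100. For the hypothetical student, the 9 is dropped and the total is 12 + 14 + 15 + 16 + 30 = 87.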
SOFTWARE and other RESOURCES:
This class uses MATLAB and SCIP. To access the necessary software:
- Create an account at the Math Department. Visit
http://www.math.ucdavis.edu/comp/class-accts and follow the instructions.
It is important to create your account before you
come to the Lab for the first time. You can then work either at the
Undergraduate Computer Lab (2118 Math. Sci. Bldg.) or from any other lab in the
campus or even from your home PC by remotely connecting to one of the
departmental servers, such as [fuzzy,cosine,sine,tangent].math.ucdavis.edu. The
lab is open 9am-5pm on weekdays.
- Use your own account at your own department if your department
has the MATLAB license. This is the case for most of the engineering
- Buy a Student Version of MATLAB at UCD Bookstore (costs about
- Install the Octave system on your own PC; it is free
software that emulates MATLAB. Caution: Most likely you can do all
the lab exercises with it, but I have not tested all the exercises yet.
Visit the official web site of Octave at
http://www.octave.org for downloading and installing information.
An introduction to ZIMPL (the language used to program SCIP) is available
in the ZIMPL Manual. The best way to learn it is to
follow the numerous examples provided in the text.
For MATLAB, please take a look at the following highly useful MATLAB
primers and tutorials.
HOMEWORKS & HANDOUTS
Homework 1, due April 18th 11:55pm
Homework 2, due May 4th 11:55pm
Homework 3, due May 23rd 11:55pm
Homework 4, due Friday June 8th 11:55pm
Final project, due June 11th 6pm