Mathematics for Data Analysis & Decision Making
Class Time/Place: MWF 2:10 - 3:00PM Hoagland 168
Instructor: Jesus A. De Loera
Office: 3228 Math. Sci. Building
Email: deloera@math.ucdavis.edu
Office Hours: Wed 4:10-5pm, Thu 3:10-4pm (or by appointment).
TA: Lily Silverstein
Office: 2232 Math. Sci. Building
Email: lsilver@ucdavis.edu
Office Hours: Thursdays 1:10-2pm.
TA: Roger Tian
Office: 3129 Math. Sci. Building
Email: rgtian@ucdavis.edu
Office Hours: Tuesdays 11:00-12pm.
Course Description:
Mathematical models for data mining and decision making are at the heart of
successful applications such as information search (Google),
airline-crew scheduling, social network analysis, and
bioinformatics.
This course discusses the mathematical methods used to analyze data and to
build models for making optimal decisions. Methods include advanced
linear algebra, optimization, probability, and geometry. These are
some of the mathematical tools necessary for data classification,
machine learning, clustering, and pattern recognition, and for planning,
scheduling, and ranking. The course should be useful to students
interested in data science and decision models who wish to learn the
basic mathematical theory used in algorithms and software.
References:
Optimization Models, by G. Calafiore and L. El Ghaoui, Cambridge, 2015
Matrix Methods in Data Mining and Pattern Recognition (Fundamentals of Algorithms), by Lars Elden, Published by SIAM
Note that this textbook has an official website maintained by the author, where you can find a lot of useful information (e.g., errata).
Here is the Syllabus.
Five Data Analysis and Decision Projects
- Project 1. (weeks 1-2) Supervised Learning from Data
Data Fitting/Regression, notions of Sparsity, Support Vector Machines.
Homework: Diagnosis of disease through optimization models and training data.
- Project 2. (weeks 3-4) Unsupervised Learning from Data.
Singular Value Decompositions, basics of convex optimization, recognition of a hand-written digit.
- Project 3. (weeks 5-6) Using Math to Cluster and Rank information
Clustering models, modeling who is top-ranked. Finding keywords, the PageRank algorithm and Markov chains: How does Google work?
- Project 4. (weeks 7-8) Discrete Models
Integer programming, discrete optimization techniques: scheduling, optimal packing of bins and bags.
- Final Project (due on the final day). Putting it all together.
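As a rough illustration of the Project 3 topic (PageRank viewed as a Markov chain), here is a minimal sketch in plain Python. It is not taken from the course materials: the function name `pagerank`, the damping value 0.85, and the four-page toy web are all assumptions chosen for illustration, and every link target is assumed to appear as a key of the input dictionary.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration on the random-surfer Markov chain.

    links maps each page to the list of pages it links to
    (hypothetical input format; every target must also be a key).
    """
    pages = sorted(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n
                    for p in pages}
        # Each page passes its rank in equal shares along its out-links.
        for p in pages:
            for q in links[p]:
                new_rank[q] += damping * rank[p] / len(links[p])
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Toy web (illustrative only): D links to C, but nothing links to D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 4))
```

The ranks form a probability distribution over pages, so they sum to 1. In this toy web, page C collects the most rank because three other pages link to it.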
Prerequisite and Expectations
Grading:
Grades will be calculated using the
average and standard deviation of the class. 100 points are possible,
divided as follows:
- 4 Regular Projects, 20 points each (with the lowest score dropped),
- 1 Final Project, 35 points (Saturday, June 04 at 10:30am), and
- 5 extra points awarded for participation in class and office hours.
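The point breakdown above can be checked with a short sketch. It assumes each regular project is worth 20 points, so that three counted projects (60) plus the final project (35) plus participation (5) gives 100. The function name `raw_total` and the sample scores are hypothetical, and the sketch computes only the raw point total, not the final grade based on the class average and standard deviation.

```python
def raw_total(project_scores, final_score, participation):
    """Raw course points out of 100 under the assumed breakdown."""
    if len(project_scores) != 4:
        raise ValueError("expected four regular project scores")
    kept = sorted(project_scores)[1:]  # drop the single lowest score
    return sum(kept) + final_score + participation


# Made-up scores: the 12 is dropped, leaving 16 + 18 + 20 = 54 project points.
total = raw_total([18, 12, 20, 16], final_score=30, participation=5)
print(total)
```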
Some important rules will be followed:
- Homework assignments and other material will be posted at the bottom of the course
web site. Homework is due at the beginning of class on
the day the assignment is due. LATE HOMEWORK WILL NOT BE ACCEPTED.
- Your work is not graded solely on the final answer.
I expect you to write neatly, justify your reasoning, and
show all the details.
- I will assign some HW problems that require you to use MATLAB, SCIP or R.
- The projects will include writing code to investigate the application
topics presented in class and theory to understand methods.
SOFTWARE and other RESOURCES:
This class uses MATLAB and SCIP. To access the necessary software, you have the following options:
- Create an account at the Math Department. Visit
http://www.math.ucdavis.edu/comp/class-accts
and follow the instructions.
It is important to create your account before you
come to the Lab for the first time. You can then work at the
Undergraduate Computer Lab (2118 Math. Sci. Bldg.), from any other lab
on campus, or even from your home PC by remotely connecting to one of the
departmental servers, such as [fuzzy,cosine,sine,tangent].math.ucdavis.edu. The
lab is open 9am-5pm on weekdays.
- Use your own account at your own department if your department
has a MATLAB license. This is the case for most of the engineering
departments.
- Buy a Student Version of MATLAB at the UCD Bookstore (costs about
$100).
- Install the Octave system on your own PC. Octave is free
software that emulates MATLAB. Caution: Most likely you can do all
the lab exercises with it, but I have not tested all the exercises yet.
Visit the official web site of Octave at
http://www.octave.org for download and installation information.
An introduction to ZIMPL (the language used to program SCIP) is available
in the ZIMPL Manual. The best way to learn it is to
follow the numerous examples provided in the text.
For MATLAB, please take a look at the following highly useful MATLAB
primers and tutorials.
HOMEWORKS & HANDOUTS
Homework 1, due April 17th 11:55pm. NOTE: Part 3 of the cancer problem in the posted version was removed.
Homework 2, due May 2nd 11:55pm: Click here for the data necessary to do the main project 2.
Homework 3, due May 16th 11:55pm:
Homework 4, due May 27th 11:55pm:
Final project, due June 4th 10:30am: