Seminars :: math.ucdavis.edu

Programming and Correctness Support for Large-Scale Data Processing

Mathematics of Data & Decisions

Speaker:	Caleb Stanford, UC Davis (CS)
Related Webpage:	https://web.cs.ucdavis.edu/~cdstanford/
Location:	1025 PDSB
Start time:	Tue, Oct 28 2025, 3:10PM

Description

Modern data science and computing workloads require massively parallel computations over large distributed datasets, sometimes in real time, using popular systems and frameworks such as Apache Spark and MapReduce. Bugs in such workloads can be difficult to detect because of the scale of the data involved, distributed executions that are hard to reproduce, and semantic discrepancies between the program as written and the program as executed in parallel on individual machines. I will give an overview of some of my PhD work and ongoing directions in programming language and correctness support for ensuring the safe and correct execution of parallel and distributed workloads. Particular questions include: (1) What is an appropriate semantics for programs over large-scale distributed data streams, one that captures parallelism and distribution requirements? (2) How can we ensure that large-scale data computations are correct by providing formal guarantees such as type safety? (3) How can we obtain formal bounds on the performance and data ingestion requirements of data processing operators? I will discuss a selection of these topics, as well as some potential areas for future work.
