Advanced statistical computing
Teachers
Carl Nettelblad, Behrang Mahjani, Salman Toor, Silvelyn Zwanzig
(The guest lecturer for statistical analysis of large data sets will be announced later.)
Many recent statistical methods are based on computationally intensive computer simulations. In addition, new areas of statistical computing are emerging in several disciplines, driven by large amounts of data. A rapidly growing field is the statistical analysis of data from the life sciences. Such analyses can be performed in fixed-functionality tools or pipelines, but the statistical computing environment R has become very popular due to its greater flexibility.
Used correctly, R is a powerful resource where application-specific libraries can be combined with current heterogeneous and distributed computing infrastructures. This course puts students' existing practical experience on more solid theoretical ground, allowing them to benefit from the knowledge of a world-class expert in the field in the final block of the course.
Goal of the course
This is a new course covering some of the most recent applications of scientific computing to statistics, for an audience with some familiarity with R or other computing tools such as Python or Matlab. After finishing the course, students can implement computationally intensive algorithms for statistical analysis in R and optimize such software for use on modern computer resources. They also learn about the statistical challenges of dealing with large data sets, and how to use R for data-intensive computing. The course focuses on applications from the life sciences.
Target audience
PhD students in Bioinformatics (and related fields), Mathematical Statistics, and Scientific Computing.
Prerequisite
Basic knowledge of mathematical statistics, linear algebra, and numerical methods. Some experience of programming in R, Python, or Matlab.
Credits
5 or 7.5. This course consists of three blocks, at 2.5 credits each. The first block is not strictly necessary for students with significant knowledge of programming in R.
Number of scheduled occasions
16 (11 lectures + 5 labs)
Course schedule
Block 1: Oct 5-18, Week 41
Block 2: Nov 2-6, Week 45
Block 3: Nov 16-20, Week 47
Exam Information
To pass a block, the student must complete the project for that block. The content of each project is related to the labs in that block.
Syllabus
- Block 1: Advanced programming in R (2.5 credits)
- Block 2: High performance programming in R (2.5 credits)
- Block 3: Statistical and numerical methods for analysis of large data sets, with a focus on bioinformatics applications (2.5 credits)