CS435 / CS635 - Parallel Scientific Computing: Models, Algorithms and Implementation
University of Western Ontario
Computer Science Department
Date: September 7, 2009
Current hardware improvements have focused on increasing
the number of computations that can be performed
in parallel rather than on increasing clock speed alone.
This change has brought multi-core workstations
to the desktop, expanding interest in parallel algorithms
and software capable of exploiting these computing resources.
The aim of the course is to introduce you to the art of
analyzing and designing parallel algorithms. The following
concepts will guide our quest for high performance:
finding parallelism, scalability,
granularity, locality, synchronization,
scheduling and load balance.
Out of the course, you are anticipated to
have an in depth understanding of
some important parallel machine models,
parallel complexity theory,
fundamental parallel algorithms
(for sorting, linear algebra, FFT)
and parallel programming environments
and tools such as MPI, OpenMP, Cilk, Kaapi, UPC, Linpack.
- Prerequisites.
- CS 210a/b, 211a/b, 305a/b.
A good familiarity with linear algebra
and complexity analysis of algorithms
is recommended.
- Outlines.
- This presents the contents of the course, its assignments, quizzes and projects.
outline.html
- Lectures (tentative schedule).
-
- Student Evaluation.
- The course is very oriented toward assignments and projects.
Assignments and projects constitute
40% and 40% of the course mark, respectively.
There is no midterm examination and no final examination.
However, there will be at least four quizzes.
Quizzes constitute 20% of the course mark.
- Assignments and projects for Winter 2008.
-
- Software and Courses.
- These are links to some books and software related to the course.
- Conferences.
-
- Papers on Parallel Scientific Computing.
- Estimating Execution Time of Distributed Applications by Maciej Drozdowski
- pARMS: A Package for Solving General Sparse Linear Systems on Parallel Computers by Y. Saad and M. Sosonkina
- Fast Scheduling and Partitioning Algorithm in the Multi-processor System with Redundant Communication Resources by Eryk Laskowski
- Evaluation of Parallel Programs by Measurement of Its Granularity by Jan Kwiatkowski
- Parallel Triangular Decompositions of an Oil Refining Simulation by Xiaodong Zhang
- Parallel Algorithms for Arrangements by Richard Anderson, Paul Beame and Erik Brisson
- Efficient parallel computation of arrangements of hyperplanes in d dimensions by Torben Hagerup, Hermann Jung and Emo Welzl
- Recursive Array Layouts and Fast Parallel Matrix Multiplication by
Siddhartha Chatterjee, Alvin R. Lebeck, Praveen K. Patnala and Mithuna Thottethodi
- A Quantitative Comparison of Parallel Computation Models by Ben H.H. Juurlink and Harry A.G. Wijshoff
- Towards Efficiemlcy and Portability: Programming with the BSP Model by Mark Gouclreau, Kevin Lang Satish Rao, Torsten Suel and Thanasis Tsantilas
- A BSP Parallel Model for the Göttfert Algorithm over F2 by Fatima Abu Salem
- Fast Rectangular Matrix Multiplications and Improving Parallel Matrix Computations by Xiaohan Huang and Victor Y. Pan
- NC2 Computation of Gcd-free Basis and Application to Parallel Algebraic Numbers Computation by T. Gautier and J.L. Roth
- Processor-Efficient Parallel Matrix Inversion over .Abstract Fields: Two Extensions by W. Eberly
- Complexity of the Wu-Ritt decomposition by Á. Szántó
- Distributed Partial Evaluation by Michael Sperber, Herbert Klaeren and Peter Thiemann
- Processor Efficient Parallel Solution of Linear Systems over an Abstract Field by E. Kaltofen and V. Pan
- On the Parallel Cbmplexity of Matrix Factorization Algorithms byMauro Leoncini, Giovanni Manzi and Luciano Margara
- Multidimensional, Multiprocessor, Out-of-Core FFTs with Distributed Memory and Parallel Disks by Lauren M. Baptist and Thomas H. Cormen
- Multithreaded Algorithms for the Fast Fourier Transform by Parimala Thulasiraman, Kevin B. Theobald, Ashfaq A. Khokhar and Guang R. Gao
- How much can we speedup Gaussian Elimination with Pivoting? by M. Leoncini
- Communication Efficient Matrix Multiplication on Hypercubes by Himanshu Gupta and P. Sadayappan
- Distributed Data Structures and Algorithms for Gröbner Basis Computation by SOUMEN CHAKRABARTI and KATHERINE YELICK
- On the Correctness of a Distributed Memory Gröbner Basis Algorithm by SOUMEN CHAKRABARTI and KATHERINE YELICK
- Extended Parallelism in the Gröbner Basis Algorithm by Stephen A. Schwab
- A Fine-Grained Parallel Completion Procedure by Reinhard Bündgen, Manfred Göbel and Wolfgang Küchlin
- Parallelization of The Sparse Modular GCD Algorithm for Multivariate Polynomials on Shared Memory Multiprocessors by Mohamed Omar Rayes, Paul S. Wang and Kenneth Weber
- Parallel Computation of Modular Multivariate Polynomial Resultants on a Shared Memory Machine by H. Hong and H. W. Loidl
- A Parallel Completion Procedure for Term Rewriting Systems by Katherine Yelick and Stephen Garland
- Programming Models for Irregular Applications by Katherine Yelick
- Data Structures for Irregular Applications by Katherine Yelick
- Strategy-Accurate Parallel Buchberger Algorithms by G. Attardi and C. Traverso
- Work Efficient Parallel Solution of Toeplitz Systems and Polynomial GCD by John H. Reif
- Processor-Efficient Parallel Matrix Inversion over .Abstract Fields: Two Extensions by W. Eberly
- A New Approach to Parallel Computation of Polynomial GCD and to Related Parallel Computations over Fields and Integer Rings by Victor Y. Pan
- Processor-Efficient Parallel Computation of Polynomial Greatest Common Divisors by Erich Kaltofen