Some keywords (to be completed)
These are links to the Wikipedia pages of some keywords
for this course.
Some other courses on HPC and parallelism
These are links to courses on HPC and parallelism that have inspired
this course. In particular, I am grateful to
Saman P. Amarasinghe (MIT), Matteo Frigo (Axis Semiconductor),
Charles E. Leiserson (MIT), Markus Pueschel (CMU) and
Jeremy Johnson (Drexel Univ.)
for sharing with me
the sources of their course notes and other documents.
- MIT OpenCourseWare
- MIT 6.895 Fall 2003 by Charles E. Leiserson
- MIT 6.172 Fall 2009 by Saman P. Amarasinghe and Charles E. Leiserson
- MIT 18.337 Applied Parallel Computing Spring 2009 by Alan Edelman
- MIT 6.852 Distributed Algorithms by Nancy A Lynch
- CMU 18-645 How to Write Fast Code (Spring 2008) by Markus Pueschel
- Drexel University CS 540 High Performance Computing by Jeremy Johnson
- Stony Brook University CSE 690 - General Purpose Computing on Graphics Hardware by Klaus Mueller
- U.C. Berkeley CS267/EngC233 Home Page by James Demmel
- CS 267, Applications of Parallel Computers by Jim Demmel (Univ. Berkeley)
- A LINPACK, LAPACK course by Jim Demmel (Univ. Berkeley)
- COMP 422 Parallel Computing, Spring 2008 at Rice University
- CS 402/CS 535: Parallel and Distributed Computing at UWO
- U. Illinois ECE 498 Programming Massively Parallel Processors by Wen-mei W. Hwu
- U. Illinois Heterogeneous Parallel Programming by Wen-mei W. Hwu
- Parallel Algorithms (WISM 459, 2005/2006) by Rob Bisseling
- Topics in Parallel Computing (CS 838, Univ. of Wisconsin-Madison, Spring 1999) by Pavel Tvrdik
- Parallel Algorithms (CS 662, San Diego Univ., Spring 1996) by Roger Whitney
- Computer Systems: A Programmer's Perspective (CS:APP) by Randal E. Bryant and David R. O'Hallaron
- Introduction to Parallel Programming (Uni. Waterloo)
- CS Honours course 7933: Distributed and High-Performance Computing at the Univ. of Adelaide (Australia)
- Principles of Distributed Computing (FS 2010) (ETH, Zurich)
- CS262: Introduction to Distributed Computing at Harvard, Spring, 2008
- An Introduction to Parallel and Distributed Computing at Middle East Technical University
- Introduction to Parallel and Distributed Computing (RISC, Austria)
- Calcul parallele et distribue (Parallel and Distributed Computing) at the Ecole Polytechnique
Concurrency platforms
These are links to programming languages for multi-threaded parallelism.
Some HPC libraries in scientific computing
These are links to some HPC libraries in scientific computing.
This software makes use of auto-tuning techniques.
Performance analyzers and debuggers
These are links to some performance analyzers and debuggers.
Some links to hardware architecture pages
These are links to hardware architecture pages related to this course.
Some papers on HPC and parallelism: survey articles and books
Some papers on multi-threaded parallelism
- The Implementation of the Cilk-5 Multithreaded Language by Matteo Frigo, Charles E. Leiserson and Keith H. Randall.
- Thread Scheduling for Multiprogrammed Multiprocessors by Nimar S. Arora, Robert D. Blumofe and C. Greg Plaxton
- KAAPI: A thread scheduling runtime system for data flow computations on cluster of multi-processors by Thierry Gautier, Xavier Besseron and Laurent Pigeon.
- The Cilkview Scalability Analyzer by Yuxiong He, Charles E. Leiserson and William M. Leiserson.
- Identifying Performance Bottlenecks in Work-Stealing Computations by Nathan R. Tallent and John M. Mellor-Crummey.
- The Design of OpenMP Tasks by Eduard Ayguade, Nawal Copty, Alejandro Duran, Jay Hoeflinger, Yuan Lin, Federico Massaioli, Xavier Teruel, Priya Unnikrishnan and Guansong Zhang.
- Scheduling Multithreaded Computations by Work Stealing by Robert D. Blumofe and Charles E. Leiserson.
Some papers on many-core computing (GPGPU)
- A Memory Model for Scientific Algorithms on Graphics Processors by Naga K. Govindaraju, Scott Larsen, Jim Gray and Dinesh Manocha.
- The FFT on a GPU by Kenneth Moreland and Edward Angel.
- Fitting FFT onto the G80 Architecture by Vasily Volkov and Brian Kazian.
- Cache and Bandwidth Aware Matrix Multiplication on the GPU by Jesse D. Hall, Nathan A. Carr and John C. Hart.
- High Performance Discrete Fourier Transforms on Graphics Processors by Naga K. Govindaraju, Brandon Lloyd, Yuri Dotsenko, Burton Smith, and John Manferdelli.
- Understanding the Efficiency of GPU Algorithms for Matrix-Matrix Multiplication by K. Fatahalian, J. Sugerman, and P. Hanrahan.
- Reducing Branch Divergence in GPU Programs by Tianyi David Han and Tarek S. Abdelrahman
- Designing efficient sorting algorithms for manycore GPUs by Nadathur Satish, Mark Harris and Michael Garland.
- Linear algebra operators for GPU implementation of numerical algorithms by Jens Krüger and Rüdiger Westermann.
- Simple Memory Machine Models for GPUs by Koji Nakano.
- Algorithmic Strategies for Optimizing the Parallel Reduction Primitive in CUDA by Pedro J. Martín, Luis F. Ayuso, Roberto Torres and Antonio Gavilanes.
Some papers on cache memories and the ideal cache model
- Cache-Oblivious Algorithms by Matteo Frigo, Charles E. Leiserson, Harald Prokop and Sridhar Ramachandran
- Cache-Oblivious Algorithms and Data Structures by Erik D. Demaine.
- Reducers and Other Cilk++ Hyperobjects by Matteo Frigo, Pablo Halpern, Charles E. Leiserson and Stephen Lewin-Berlin.
- The memory behavior of cache oblivious stencil computations by Matteo Frigo and Volker Strumpen.
- The Cache Complexity of Multithreaded Cache Oblivious Algorithms by Matteo Frigo and Volker Strumpen.
- Cache-oblivious comparison-based algorithms on multisets by Arash Farzan, Paolo Ferragina, Gianni Franceschini and J. Ian Munro.
- A Consistency Architecture for Hierarchical Shared Caches by Edya Ladan-Mozes and Charles E. Leiserson.
- Analysing Cache Effects in Distribution Sorting by Naila Rahman and Rajeev Raman.
- Communication-optimal parallel algorithm for Strassen's matrix multiplication by Grey Ballard, James Demmel, Olga Holtz, Benjamin Lipshitz and Oded Schwartz.
Some papers on other models of computations for HPC
Some papers on applications of HPC
- Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures by Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David Patterson, John Shalf and Katherine Yelick.
- SPIRAL: Code Generation for DSP Transforms by Markus Püschel, José M. F. Moura, Jeremy R. Johnson, David Padua, Manuela M. Veloso, Bryan W. Singer, Jianxin Xiong, Franz Franchetti, Aca Gacic, Yevgen Voronenko, Kang Chen, Robert W. Johnson and Nicholas Rizzolo.
- The Memory Behavior of Cache Oblivious Stencil Computations by Matteo Frigo and Volker Strumpen.
- Implementing FFTs in Practice by Steven G. Johnson and Matteo Frigo.
- A Work-Efficient Parallel Breadth-First Search Algorithm (or How to Cope with the Nondeterminism of Reducers) by Charles E. Leiserson and Tao B. Schardl.
- Output-sensitive decoding for redundant residue systems by Majid Khonji, Clément Pernet, Jean-Louis Roch, Thomas Roche and Thomas Stalinski.
- Some Linear-Time Algorithms for Systolic Arrays by Richard P. Brent, H. T. Kung and Franklin T. Luk.
- Automated Empirical Optimization of Software and the ATLAS project.
- The Design and Implementation of FFTW3 by Matteo Frigo and Steven G. Johnson.
Some talks on HPC, parallelism and related topics
Notes and web sites on HPC, parallelism and related topics
Some conferences on HPC and parallelism