Parallel Computing and Optimization Techniques
Modern processors stopped getting faster simply by increasing clock speed around two decades ago, so squeezing more performance out of hardware now requires running many tasks simultaneously across multiple cores, specialized accelerators like GPUs, and carefully managed memory hierarchies. Researchers in this area study how to design, program, and evaluate these parallel systems — asking not just how fast a chip can go, but how efficiently it can do so given real-world constraints on power and heat. Active questions include how to automatically distribute work across increasingly heterogeneous hardware without requiring programmers to manage every detail by hand, and how to build accurate simulation and benchmarking tools that predict performance before expensive silicon is ever fabricated. As AI workloads and data-intensive applications continue to strain existing architectures, the tension between raw throughput, energy efficiency, and programmability remains one of the central unsolved problems in systems research.
- Works: 201,890
- Total citations: 2,321,020
- Keywords: Parallel Computing, Performance Optimization, GPU Computing, Multicore Architectures, Memory Systems, Benchmarking
Top papers in Parallel Computing and Optimization Techniques
Ordered by total citation count.
- Fast Parallel Algorithms for Short-Range Molecular Dynamics (44,551 citations)
- fastp: an ultra-fast all-in-one FASTQ preprocessor (29,561 citations, open access)
- MapReduce (18,534 citations, open access)
- LINCS: A linear constraint solver for molecular simulations (17,099 citations)
- PyTorch: An Imperative Style, High-Performance Deep Learning Library (16,187 citations, open access)
- Suspending OpenMP Tasks on Asynchronous Events: Extending the Taskwait Construct (12,934 citations, open access)
- Numerical recipes in Pascal: the art of scientific computing (11,915 citations)
- The NumPy Array: A Structure for Efficient Numerical Computation (11,046 citations, open access)
- TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems (9,777 citations, open access)
- Computer Architecture: A Quantitative Approach (9,568 citations)
- TensorFlow: A system for large-scale machine learning (8,824 citations, open access)
- Time, clocks, and the ordering of events in a distributed system (8,436 citations, open access)
Active researchers
Top authors in this area, ranked by h-index.