Research: HPC Parallel Patterns

This note grounds the capOS proposal for generic parallel-processing pattern coverage. It is not a request to port full HPC suites immediately; the point is to classify which algorithm shapes capOS benchmarks should eventually cover, so that future SMP, threading, runtime, storage, network, and multi-node claims do not rest only on embarrassingly parallel worker loops.

Source Set

Consequences For capOS

The current capOS CPU-scaling benchmarks are necessary but narrow. They exercise static worker partitioning, final result verification, and a small amount of spawn/join or process-wait coordination. That covers one important HPC pattern: independent tasks with a final reduction (sketched after the list below). It does not cover:

  • structured grids and stencil/halo exchange;
  • dense tiled matrix work;
  • sparse matrix and irregular memory access;
  • FFT/transposes and global all-to-all style communication;
  • graph frontier expansion and high-fanout irregular queues;
  • task graphs with dependency scheduling and cancellation;
  • collectives as first-class operations;
  • multi-node communication and authority boundaries.
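
For reference, the covered baseline looks roughly like the sketch below: static partitioning, spawn/join, and one final reduction. This is a minimal sketch that assumes nothing about capOS internals; POSIX threads stand in for whatever thread/process primitive capOS exposes, and every identifier in it is illustrative.

```c
/* Minimal sketch of the pattern the current benchmarks already cover:
 * static partitioning across workers, spawn/join, one final reduction.
 * POSIX threads stand in for the capOS thread/process primitive; no
 * name here is taken from the capOS codebase. */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NWORKERS 4
#define N (1u << 20)

static uint64_t data[N];

struct slice { size_t begin, end; uint64_t partial; };

static void *worker(void *arg)
{
    struct slice *s = arg;
    uint64_t sum = 0;
    for (size_t i = s->begin; i < s->end; i++)
        sum += data[i];            /* independent work, no shared writes */
    s->partial = sum;              /* each worker owns its own result slot */
    return NULL;
}

int main(void)
{
    for (size_t i = 0; i < N; i++) data[i] = i & 0xff;

    pthread_t tids[NWORKERS];
    struct slice slices[NWORKERS];
    size_t chunk = N / NWORKERS;

    for (int w = 0; w < NWORKERS; w++) {
        slices[w].begin = w * chunk;
        slices[w].end   = (w == NWORKERS - 1) ? N : (w + 1) * chunk;
        pthread_create(&tids[w], NULL, worker, &slices[w]);
    }

    uint64_t total = 0;
    for (int w = 0; w < NWORKERS; w++) {   /* join is the only sync point */
        pthread_join(tids[w], NULL);
        total += slices[w].partial;        /* final reduction */
    }
    printf("sum = %llu\n", (unsigned long long)total);
    return 0;
}
```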

The benchmark plan should therefore treat “parallel processing” as a matrix of patterns rather than a single scaling demo. A useful capOS coverage target is the following matrix:

| Pattern family | Source precedent | capOS evidence it should force |
| --- | --- | --- |
| Static map/reduce | OpenMP loop/reduction, NAS EP | low-overhead thread/process creation, result aggregation, no hot-path syscalls |
| Dynamic task graph | OpenMP tasks, Berkeley composition point | work queues, cancellation, dependency fan-in/fan-out, scheduler fairness under uneven tasks |
| Stencil and halo exchange | NAS MG/BT/SP/LU | shared buffers, neighbor exchange, barriers, cache locality, future network transport |
| Dense tiled linear algebra | HPL/LINPACK | compute locality, tile scheduling, reductions, optional SIMD/library runtime support |
| Sparse iterative solver | HPCG, NAS CG | irregular memory access, sparse matrix-vector work, global dot-product reductions |
| FFT/transpose | NAS FT | all-to-all movement, temporary buffers, memory pressure, future multi-node transpose |
| Sort/partition | NAS IS | all-to-all buckets, prefix/scan, allocator and queue pressure |
| Graph frontier | Graph500 | irregular frontier queues, atomic-like visited updates, high fanout, load imbalance |
| Collective communication | MPI collectives | barrier, broadcast, scatter/gather, reduce/allreduce, all-to-all semantics |
| Pipeline/stream | Berkeley composition point, future service graphs | bounded queues, backpressure, stage-local authority, telemetry |
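
To make one missing row concrete, the following is a minimal single-node sketch of the stencil/halo-exchange family: a 1-D Jacobi update in which each worker must read one halo cell written by each neighbor in the previous iteration. pthread_barrier_t stands in for whatever barrier primitive capOS ends up providing; none of this is capOS API.

```c
/* Sketch of a 1-D Jacobi stencil: each worker owns an interior slice and
 * reads one "halo" cell from each neighbor every iteration. The pthread
 * barrier is a stand-in for a capOS synchronization primitive. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4
#define N 1024
#define ITERS 100

static double grid[2][N + 2];          /* ghost cells at index 0 and N+1 */
static pthread_barrier_t barrier;

struct slice { size_t begin, end; };   /* half-open [begin, end) in 1..N */

static void *worker(void *arg)
{
    struct slice *s = arg;
    for (int it = 0; it < ITERS; it++) {
        int src = it & 1, dst = src ^ 1;
        /* Reading grid[src][begin-1] and grid[src][end] touches cells the
         * neighboring workers wrote last iteration: the halo exchange. */
        for (size_t i = s->begin; i < s->end; i++)
            grid[dst][i] = 0.5 * (grid[src][i - 1] + grid[src][i + 1]);
        /* No worker may start iteration it+1 before all finish it. */
        pthread_barrier_wait(&barrier);
    }
    return NULL;
}

int main(void)
{
    grid[0][0] = grid[1][0] = 1.0;     /* fixed left boundary condition */
    pthread_barrier_init(&barrier, NULL, NWORKERS);

    pthread_t tids[NWORKERS];
    struct slice slices[NWORKERS];
    size_t chunk = N / NWORKERS;
    for (int w = 0; w < NWORKERS; w++) {
        slices[w].begin = 1 + w * chunk;
        slices[w].end   = (w == NWORKERS - 1) ? N + 1 : 1 + (w + 1) * chunk;
        pthread_create(&tids[w], NULL, worker, &slices[w]);
    }
    for (int w = 0; w < NWORKERS; w++)
        pthread_join(tids[w], NULL);

    printf("grid[%d][1] = %f\n", ITERS & 1, grid[ITERS & 1][1]);
    pthread_barrier_destroy(&barrier);
    return 0;
}
```

The double-buffered grid is what lets a single barrier per iteration suffice: a worker never overwrites cells a neighbor might still be reading in the current iteration.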

The near-term capOS subset should stay CPU-only and single-node until the selected in-process threading milestone is closed. The first expansion should add pattern kernels that reuse existing userspace/runtime mechanisms, then let future networking and storage milestones add multi-node and data-intensive variants.
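
As a sketch of what such a first-expansion kernel could look like, the following exercises the dynamic task graph row in its simplest form: a shared work queue with deliberately uneven task costs and a shutdown flag where cancellation would hook in. The mutex/condvar queue and all identifiers are assumptions for illustration, not existing capOS mechanisms.

```c
/* Sketch of a dynamic task queue with uneven task costs, exercising
 * scheduler fairness rather than static partitioning. The mutex/condvar
 * queue stands in for whatever userspace queue mechanism capOS has. */
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define NWORKERS 4
#define NTASKS   64

static pthread_mutex_t lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready = PTHREAD_COND_INITIALIZER;
static int  queue[NTASKS];
static int  head = 0, tail = 0;
static bool done = false;   /* a cancellation path would set this early
                             * and reset head = tail to drop queued work */

static void push(int cost)
{
    pthread_mutex_lock(&lock);
    queue[tail++] = cost;
    pthread_cond_signal(&ready);
    pthread_mutex_unlock(&lock);
}

/* Returns false once the queue is drained and no more tasks will come. */
static bool pop(int *cost)
{
    pthread_mutex_lock(&lock);
    while (head == tail && !done)
        pthread_cond_wait(&ready, &lock);
    bool ok = head < tail;
    if (ok) *cost = queue[head++];
    pthread_mutex_unlock(&lock);
    return ok;
}

static void *worker(void *arg)
{
    (void)arg;
    long worked = 0;
    int cost;
    while (pop(&cost)) {
        volatile long x = 0;                 /* uneven, CPU-bound "task" */
        for (long i = 0; i < cost * 100000L; i++) x += i;
        worked++;
    }
    return (void *)(intptr_t)worked;
}

int main(void)
{
    pthread_t tids[NWORKERS];
    for (int w = 0; w < NWORKERS; w++)
        pthread_create(&tids[w], NULL, worker, NULL);

    for (int t = 0; t < NTASKS; t++)
        push(1 + (t * 7) % 13);              /* deliberately uneven costs */

    pthread_mutex_lock(&lock);
    done = true;                             /* no more tasks will arrive */
    pthread_cond_broadcast(&ready);
    pthread_mutex_unlock(&lock);

    for (int w = 0; w < NWORKERS; w++) {     /* per-worker counts expose
                                              * fairness under imbalance */
        void *ret;
        pthread_join(tids[w], &ret);
        printf("worker %d ran %ld tasks\n", w, (long)(intptr_t)ret);
    }
    return 0;
}
```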