Wednesday, January 12, 2022

Concurrency, Parallelism, and Barrier Synchronization - Multiprocess and Multithreaded Programming

On a preemptive, timed sliced UNIX or Linux operating system (Solaris, AIX, Linux, BSD, OS X), program code from one process executes on the processor for a time-slice or quantum, after which, program code from another process executes for a time quantum. Linux divides the CPU time into epochs and each process has a specified time quantum within an epoch. The execution quantum is so tiny that the interleaved execution of independent, schedulable entities, often performing unrelated tasks, gives the appearance that multiple software applications are running in parallel. The currently executing process relinquishes the processor either voluntarily or involuntarily so that another process can execute its program code. This event is known as context switching. Context switching facilitates interleaved execution. Time sliced, interleaved execution of program code within an address space is known as concurrency.

The Linux kernel is fully preemptive. Fully preemptive means that the kernel can force a context switch for a higher priority process. When a process context switch occurs, the state of a process is saved to its process control block, and another process resumes execution on the processor. A UNIX process is heavyweight because it has its own address space, file descriptors, register state, and program counter. On Linux, the task_struct stores this information. When a process context switch occurs, this information must be saved, and this is a computationally expensive operation.

Concurrency applies to both threads and processes. A thread is also a schedulable entity, defined as an independent sequence of execution within a UNIX process. Both threads and processes are scheduled for execution on a processor core, and thread context switching is lighter in weight than process context switching.  UNIX processes often have multiple threads of execution that share the process's memory space. When multiple threads of execution are running inside of a process, they are typically performing related tasks. The Linux user-space APIs for process and thread management abstracts many details. However, the concurrency level can be adjusted to influence the time quantum so that system throughput is affected by shorter and longer durations of schedulable entity execution time.

While threads are typically lighter weight than processes, there have been different implementations across UNIX and Linux operating systems over the years. The three models that typically define the implementations across preemptive, time-sliced, multi-user UNIX and Linux operating systems are defined as follows - 1:1, 1:N, and M:N where 1:1 refers to the mapping of one user-space thread to one kernel thread, 1:N refers to the mapping of multiple user-space threads to a single kernel thread. M:N refers to the mapping of N user-space threads to M kernel threads.

Parallelism on a time-sliced, preemptive operating system means the simultaneous execution of multiple schedulable entities over a time quantum. Both processes and threads can execute in parallel across multiple cores or processors. Concurrency and parallelism are at play on a multi-user system with preemptive time-slicing and multiple processor cores. Affinity scheduling refers to scheduling processes and threads across multiple cores so that their concurrent and parallel execution is close to optimal.

Software applications are often designed to solve computationally complex problems. If the algorithm to solve a computationally complex problem can be parallelized, then multiple threads or processes can all run at the same time across multiple cores. Each process or thread executes by itself and does not contend for resources with other threads or processes working on the other parts of the problem to be solved. When each thread or process reaches the point where it can no longer contribute any more work to the solution of the problem, it waits at the barrier if a barrier has been implemented in software. When all threads or processes reach the barrier, their work output is synchronized and often aggregated by the primary process. Complex test frameworks often implement the barrier synchronization problem when certain types of tests can be run in parallel. Most individual software applications running on preemptive, time-sliced, multi-user Linux and UNIX operating systems are not designed with heavy, parallel thread or parallel, multiprocess execution in mind.

Minimizing lock granularity increases concurrency, throughput, and execution efficiency when designing multithreaded and multiprocess software programs. Multithreaded and multiprocess programs that do not correctly utilize synchronization primitives often require countless hours of debugging. The use of semaphores, mutex locks, and other synchronization primitives should be minimized to the maximum extent possible in computer programs that share resources between multiple threads or processes. Proper program design allows schedulable entities to run parallel or concurrently with high throughput and minimum resource contention. This is optimal for solving computationally complex problems on preemptive, time-sliced, multi-user operating systems without requiring hard, real-time scheduling.