July 30, 2016

Concurrency, Parallelism, and Barrier Synchronization - Multiprocess and Multithreaded Programming

Concurrency, parallelism, threads, and processes are often misunderstood concepts.

On a preemptive, time-sliced UNIX or Linux operating system (Solaris, AIX, Linux, BSD, OS X), program code from one process executes on the processor for a time slice, or quantum, after which program code from another process executes for its own quantum. The first process relinquishes the processor either voluntarily or involuntarily so that another process can execute its program code. This is known as context switching, and it facilitates interleaved execution. When a process context switch occurs, the state of the running process is saved to its process control block and another process resumes execution on the processor. A UNIX process is heavyweight because it has its own virtual memory space, file descriptors, register state, scheduling information, memory management information, and so on. All of this state must be saved and restored at a process context switch, which makes it a computationally expensive operation.
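
As a minimal sketch of process creation, the classic fork(2) and waitpid(2) pattern below creates a second heavyweight process. The parent and child are separate schedulable entities with separate virtual memory spaces, so the kernel performs a full process context switch to move between them.

/* Minimal sketch: create a child process with fork(2) and reap it.
 * Parent and child each have their own copy of the address space. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t pid = fork();                 /* duplicate the calling process */

    if (pid < 0) {
        perror("fork");
        exit(EXIT_FAILURE);
    } else if (pid == 0) {
        /* Child: a separate heavyweight process. */
        printf("child  pid=%d\n", (int)getpid());
        _exit(EXIT_SUCCESS);
    }

    /* Parent: wait for the child so it does not become a zombie. */
    waitpid(pid, NULL, 0);
    printf("parent pid=%d reaped child %d\n", (int)getpid(), (int)pid);
    return 0;
}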

Concurrency refers to the interleaved execution of schedulable entities across one or more processor cores. The execution quantum is so small that the interleaved execution of independent schedulable entities, often performing unrelated tasks, gives the appearance that multiple software applications are running in parallel.

Concurrency applies to both threads and processes. A thread is also a schedulable entity, defined as an independent sequence of execution within a UNIX process. UNIX processes often have multiple threads of execution that share the memory space of the process. When multiple threads are running inside a process, they are typically performing related tasks.
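
To make the shared memory space concrete, here is a minimal pthreads sketch in which several worker threads fill in slots of the same global array; the array name, thread count, and worker function are illustrative choices, not a fixed API. Compile the threaded examples in this post with cc -pthread.

/* Minimal sketch: threads created with pthread_create(3) share the
 * process address space, so each worker writes directly into its own
 * slot of a global array with no copying between address spaces. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static int results[NTHREADS];   /* visible to every thread in the process */

static void *worker(void *arg)
{
    int id = (int)(long)arg;
    results[id] = id * id;      /* disjoint slots, so no lock is needed */
    return NULL;
}

int main(void)
{
    pthread_t tids[NTHREADS];

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);

    for (int i = 0; i < NTHREADS; i++)
        printf("results[%d] = %d\n", i, results[i]);
    return 0;
}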

While threads are typically lighter weight than processes, both have been implemented in different ways across UNIX and Linux operating systems over the years. Three models describe the implementations found on preemptive, time-sliced, multi-user UNIX and Linux systems: 1:1, in which each user space thread is mapped to its own kernel thread; N:1, in which multiple user space threads are mapped onto a single kernel thread; and M:N, in which M user space threads are multiplexed across N kernel threads.

In summary, both threads and processes are scheduled for execution on a processor core. Thread context switching is lighter in weight than process context switching. Both threads and processes are schedulable entities and concurrency is defined as the interleaved execution over time of schedulable entities across one or more processor cores.

The Linux user space APIs for process and thread management abstract many of these details, but you can still express a desired level of concurrency and influence the scheduling policy and time quantum, so that system throughput reflects shorter or longer durations of schedulable entity execution time.
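
As a sketch of these knobs, assuming a Linux system: pthread_setconcurrency(3) expresses a desired concurrency level (NPTL's 1:1 implementation records it as a hint only), and sched_rr_get_interval(2) reports the round-robin time quantum the scheduler would use for the calling process.

/* Sketch: query scheduling-related knobs on Linux. */
#define _GNU_SOURCE             /* expose declarations on older glibc */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int main(void)
{
    struct timespec quantum;

    pthread_setconcurrency(4);  /* hint: four concurrent kernel threads */
    printf("concurrency level: %d\n", pthread_getconcurrency());

    /* Quantum SCHED_RR would use for this process (0 = calling process). */
    if (sched_rr_get_interval(0, &quantum) == 0)
        printf("SCHED_RR quantum: %ld.%09ld s\n",
               (long)quantum.tv_sec, quantum.tv_nsec);
    return 0;
}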

Parallelism, by contrast, refers on a time-sliced, preemptive operating system to the simultaneous execution of multiple schedulable entities during the same time quantum. Both processes and threads can execute in parallel across multiple cores or multiple processors. On a multi-user system with preemptive time slicing and multiple processor cores, both concurrency and parallelism are often at play. Affinity scheduling places processes and threads onto particular cores, and keeps them there, so that cache locality is preserved and their concurrent and parallel execution is close to optimal.
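
A minimal sketch of explicit affinity on Linux: sched_setaffinity(2) pins the calling process to a single core (pthread_setaffinity_np(3) is the per-thread analogue). Pinning to core 0 here is an arbitrary illustrative choice.

/* Sketch: pin the calling process to CPU core 0 on Linux. */
#define _GNU_SOURCE             /* for CPU_SET and sched_setaffinity */
#include <sched.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(0, &set);                       /* allow core 0 only */

    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("pid %d pinned to core 0\n", (int)getpid());
    return 0;
}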

Software applications are often designed to solve computationally complex problems. If the algorithm to solve a computationally complex problem can be parallelized, then multiple threads or processes can all run at the same time across multiple cores. Each process or thread executes independently, ideally without contending for resources with the other threads or processes working on other parts of the problem. When each thread or process reaches the point where it can no longer contribute any more work to the solution of the problem, it waits at the barrier, if a barrier has been implemented in software. When all threads or processes reach the barrier, the output of their work is synchronized, and often aggregated by the primary process. Complex test frameworks often implement barrier synchronization so that certain types of tests can run in parallel and their results can be collected once every test has finished.
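
The sketch below shows this pattern with POSIX barriers, using an illustrative partial-sum workload: each worker computes its share, waits at the barrier, and the single thread that receives PTHREAD_BARRIER_SERIAL_THREAD from pthread_barrier_wait(3) aggregates the results.

/* Sketch: barrier synchronization with POSIX barriers. Each worker
 * computes a partial sum of 0..999, waits at the barrier, and one
 * thread aggregates the partial results. */
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4

static pthread_barrier_t barrier;
static long partial[NWORKERS];

static void *worker(void *arg)
{
    int id = (int)(long)arg;

    /* Phase 1: each worker computes its share of the problem. */
    long sum = 0;
    for (long i = id; i < 1000; i += NWORKERS)
        sum += i;
    partial[id] = sum;

    /* Phase 2: wait until every worker has finished its share. */
    if (pthread_barrier_wait(&barrier) == PTHREAD_BARRIER_SERIAL_THREAD) {
        long total = 0;         /* exactly one thread aggregates */
        for (int i = 0; i < NWORKERS; i++)
            total += partial[i];
        printf("total = %ld\n", total);   /* 499500 */
    }
    return NULL;
}

int main(void)
{
    pthread_t tids[NWORKERS];

    pthread_barrier_init(&barrier, NULL, NWORKERS);
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, worker, (void *)i);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tids[i], NULL);
    pthread_barrier_destroy(&barrier);
    return 0;
}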

Most individual software applications running on preemptive, time-sliced, multi-user Linux and UNIX operating systems are not designed with heavily parallel multithreaded or multiprocess execution in mind.

Last, when designing multithreaded and multiprocess software programs, keeping locks fine-grained, so that each critical section protects as little data for as short a time as possible, increases concurrency, throughput, and execution efficiency. Multithreaded and multiprocess programs that do not properly use synchronization primitives often require countless hours of debugging. At the same time, semaphores, mutex locks, and other synchronization primitives should be used no more than necessary in programs that share resources between multiple threads or processes. Proper program design allows schedulable entities to run in parallel or concurrently with high throughput and minimal resource contention, and this is optimal for solving computationally complex problems on preemptive, time-sliced, multi-user operating systems without requiring hard real-time scheduling.
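
To illustrate lock granularity, the hypothetical sketch below gives each of two counters its own mutex rather than sharing one global lock. Since each thread touches a different counter, the fine-grained version admits full parallelism while keeping every critical section short; with a single coarse global lock, the two threads would serialize needlessly.

/* Sketch: fine-grained locking with one mutex per counter. */
#include <pthread.h>
#include <stdio.h>

struct counter {
    pthread_mutex_t lock;       /* fine-grained: one lock per counter */
    long value;
};

static struct counter counters[2] = {
    { PTHREAD_MUTEX_INITIALIZER, 0 },
    { PTHREAD_MUTEX_INITIALIZER, 0 },
};

static void *worker(void *arg)
{
    struct counter *c = arg;

    for (int i = 0; i < 1000000; i++) {
        pthread_mutex_lock(&c->lock);   /* hold the lock only briefly */
        c->value++;
        pthread_mutex_unlock(&c->lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t0, t1;

    /* Each thread updates a different counter, so with per-counter
     * locks there is no contention at all. */
    pthread_create(&t0, NULL, worker, &counters[0]);
    pthread_create(&t1, NULL, worker, &counters[1]);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);

    printf("%ld %ld\n", counters[0].value, counters[1].value);
    return 0;
}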