Saturday, July 30, 2016

Concurrency, Parallelism, and Barrier Synchronization - Multiprocess and Multithreaded Programming

Concurrency, parallelism, threads, and processes are often misunderstood concepts.

On a preemptive, time-sliced UNIX or Linux operating system (Solaris, AIX, Linux, BSD, OS X), sequences of program code from different software applications are executed over time on a single processor.  A UNIX process is a schedulable entity.  Program code from one process executes on the processor for a time quantum, after which program code from another process executes for a time quantum.  The first process relinquishes the processor either voluntarily or involuntarily so that another process can execute its program code.  This is known as context switching.  When a process context switch occurs, the state of the process is saved to its process control block and another process resumes execution on the processor.  Finally, a UNIX process is heavyweight because it has its own virtual memory space, file descriptors, register state, scheduling information, memory management information, etc.  When a process context switch occurs, this information has to be saved and restored, and this is a computationally expensive operation.

Concurrency refers to the interleaved execution of schedulable entities on a single processor.  Context switching facilitates interleaved execution.  The execution time quantum is so small that the interleaved execution of independent, schedulable entities, often performing unrelated tasks, gives the appearance that multiple software applications are running in parallel.

Concurrency applies to both threads and processes.  A thread is also a schedulable entity and is defined as an independent sequence of execution within a UNIX process.  UNIX processes often have multiple threads of execution that share the memory space of the process.  When multiple threads of execution are running inside of a process, they are typically performing related tasks.

While threads are typically lighter weight than processes, there have been different implementations of both across UNIX and Linux operating systems over the years.  Three models typically define these implementations across preemptive, time-sliced, multiuser UNIX and Linux operating systems: 1:1, 1:N, and M:N.  1:1 refers to the mapping of one user space thread to one kernel thread, 1:N refers to the mapping of multiple user space threads to a single kernel thread, and M:N refers to the mapping of M user space threads to N kernel threads.

In summary, both threads and processes are scheduled for execution on a single processor.  Thread context switching is lighter in weight than process context switching.  Both threads and processes are schedulable entities and concurrency is defined as the interleaved execution over time of schedulable entities on a single processor.

The Linux user space APIs for process and thread management are abstracted from a lot of the details, but you can set the level of concurrency and directly influence the time quantum so that system throughput is affected by shorter or longer durations of schedulable entity execution time.

Conversely, parallelism refers to the simultaneous execution of multiple schedulable entities during the same time quantum.  Both processes and threads can execute in parallel across multiple cores or multiple processors.  On a multiuser system with preemptive time slicing and multiple processor cores, both concurrency and parallelism are often at play.  Affinity scheduling refers to the binding of processes and threads to particular cores so that their concurrent, and often parallel, execution is close to optimal.

Software applications are often designed to solve computationally complex problems.  If the algorithm to solve a computationally complex problem can be parallelized, then multiple threads or processes can all run at the same time across multiple cores.  Each process or thread executes by itself and does not contend for resources with other threads or processes that are working on the other parts of the problem to be solved.  When each thread or process reaches the point where it can no longer contribute any more work to the solution of the problem, it waits at the barrier.  When all threads or processes reach the barrier, the output of their work is synchronized, and often aggregated by the master process.  Complex test frameworks often implement barrier synchronization when certain types of tests can be run in parallel.

Most individual software applications running on preemptive, time-sliced, multiuser Linux and UNIX operating systems are not designed with heavy parallel-thread or parallel multi-process execution in mind.  Expensive parallel algorithms often require multiple dedicated processor cores with hard real-time scheduling constraints.  A popular example of such a parallel problem is flight scheduling.

Last, when designing multithreaded and multiprocess software programs, minimizing lock granularity greatly increases concurrency, throughput, and execution efficiency.  Multithreaded and multiprocess programs that rely on coarse-grained synchronization strategies do not run efficiently and often require countless hours of debugging.  The use of semaphores, mutex locks, and other synchronization primitives should be minimized to the maximum extent possible in computer programs that share resources between multiple threads or processes.  Proper program design allows schedulable entities to run in parallel or concurrently with high throughput and minimal resource contention, and this is optimal for solving computationally complex problems on preemptive, time-sliced, multiuser operating systems without requiring hard real-time scheduling.

After a fairly considerable amount of research in the above areas, I applied these design techniques to several successful multithreaded and multiprocess software programs.

Wednesday, April 15, 2015

Vim for assembly, programming, and system admin

Computer pioneer Bill Joy created the Vi text editor.  Vi has made its way onto nearly every UNIX and Linux computer and is used by kernel developers, system administrators, programmers, and users.  The learning curve is steep; however, the ability to run circles around 95% of UNIX programmers, administrators, and the like can easily be achieved.  One hour per day for five to six years digging through kernel source code with ctags will allow you to become proficient with the editor.  If you are already a C programmer and can work from the terminal quickly, then picking up Vi should be easy for you.  My notes below describe how to set up Vim, an improved version of Vi that includes features such as color syntax highlighting.

Thanks to this guy for creating an awesome Vi cheat sheet for programmers. He has also created a Vi emulator Plugin for Microsoft Word.

Vim is especially useful for reading assembly and bootloader code when a VGA connection is not available.
! Spin Lock - Solaris 2.6 C4.2
.seg "text"
.global set_byte ! make the name visible outside the .o file
.global clear_byte !
.global spin_lock !

set_byte:
retl
ldstub [%o0],%o0 ! delay slot for retl

clear_byte:
set 0x0,%o1
swap [%o0],%o1
retl
nop ! delay slot for retl

spin_lock:
busy_loop:
ldstub [%o0],%o1 ! atomically read and set the lock byte
tst %o1
bne busy_loop ! lock byte was set, spin
nop ! delay slot for branch
retl
nop ! delay slot for retl

For the non-programmer, having Vi handy on a terminal means easily modifying any readable file on a UNIX system from the terminal - including log files and tcpdump log file snippets.  Quickly setting up snort config files, copying public and private keys between files on servers, configuring build systems, and modifying /etc/hosts and resolv.conf can easily be done with Vim. 

Running make tags from the top level Linux kernel source tree will build the ctags file over the Linux kernel source.  Alternatively, man ctags will show you how to recursively run ctags over your source code.
NERDTree and Taglist are two useful plugins.  Once NERDTree and Taglist are placed in ~/.vim/plugin/, the following lines in your .vimrc will allow you to use <ctrl-n> and <ctrl-m> to toggle the file explorer and visual tag list.
nmap <silent> <c-n> :NERDTreeToggle<CR>
nnoremap <silent> <c-m> :TlistToggle<CR>

Also, if you need a status line:
set statusline=\ %{HasPaste()}%F%m%r%h\ %w\ \ CWD:\ %r%{CurDir()}%h\ \ \ Line:\ %l/%L:%c
function! CurDir()
let curdir = substitute(getcwd(), '/Users/myhomedir/', "~/", "g")
return curdir
endfunction

function! HasPaste()
if &paste
return 'PASTE MODE  '
endif
return ''
endfunction

Vim should be good to go at this point. cd back into your source code directory and begin work.  Finally, man vim will tell you more about how to use the editor.

Enter g?g? in command mode on the current line of text, and Vim will perform a rot13 encryption of the text.

And here's that rot13 encryption algorithm in SPARC assembler:
.section ".text"
.align 4
.global main
.type main,#function
.proc 020
main:
save %sp, -112, %sp ! save the stack!
readbyte:
mov 0, %o0 ! stdin
sub %fp, 1, %o1 ! 1 byte below frame pointer
mov 3, %g1
call read ! read(0, fp-1, 1)
mov 1, %o2 ! 1 byte (delay slot)
ldub [%fp-1], %l1 ! pull the byte into %l1
cmp %o0, 0
be done ! byte was EOF, jump to done
and %l1, 32, %l2 ! save the case bit (delay slot)
xor %l2, 0xff, %l3 ! invert %l2, store as a temp
and %l1, %l3, %l1 ! force upper case
cmp %l1, 0x41
bl skip ! note lack of trailing nop.
cmp %l1, 0x5A ! the instructions trailing
bg skip ! these branches affect nothing
mov 26, %o1 ! if the branch isn't taken.
sub %l1, 0x41, %l1 ! subtract 'A'
add %l1, 13, %l1
call .rem ! (modulus) call is unconditional
mov %l1, %o0 ! so final arg can be set afterwards
add %o0, 0x41, %l1 ! add 'A' back
skip:
or %l1, %l2, %l1 ! restore the case bit
stb %l1, [%fp-1] ! return the byte to memory
mov 1, %o0 ! setup syscall args: stdout
sub %fp, 1, %o1
mov 4, %g1
call write ! write(1, fp-1, 1)
mov 1, %o2 ! 1 byte (delay slot)
ba readbyte ! return to beginning
mov 0, %o0 ! stdin (see beginning)
done:
ret ! return
restore ! fix stack before return completes

In conjunction with Vi, od and/or hexdump (if installed) can be used for examining binaries on different flavors of UNIX.

Thursday, January 29, 2015

Customizing a Linux distribution for an ARM® Cortex®-A9 based SBC

We will be pulling Yocto 1.7.1 (Dizzy branch) from Freescale source and building a BSP for the i.MX 6 RIoTboard. The final image will consist of the following components:

  • U-Boot version 2014.10 from the Freescale git repositories.
  • Linux kernel version 3.17.4 from the Freescale git repositories.
  • ext3 root filesystem with selected packages

Camas Lilies at Sunrise

The image will be built from the custom distribution (bsecdist) and custom image (bsec-image) defined in the last post.  bsec-image is derived from core-image-minimal.  The configuration changes below will add support for package tests to bsec-image.  In addition, the profiling tools and static development libraries and header files will be added to the image.  Several standard userspace packages will also be added to bsec-image, namely bison and flex.  Last, several configuration directives will be added to the local configuration file so that source code archives, package versions, and accompanying license files are stored and cached in a local directory for future builds and compliance purposes.
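A hedged sketch of what those local.conf additions might look like, using standard Yocto feature and variable names; the exact directives in the original build may differ:

```
# Hypothetical local.conf fragment (values are examples, not from this build)
DISTRO = "bsecdist"

# package tests, profiling tools, static dev libraries and headers
EXTRA_IMAGE_FEATURES += "ptest-pkgs tools-profile staticdev-pkgs dev-pkgs"

# cache source archives locally for future builds
DL_DIR = "${TOPDIR}/downloads"
BB_GENERATE_MIRROR_TARBALLS = "1"

# archive sources, track package versions, and keep license files for compliance
INHERIT += "archiver buildhistory"
ARCHIVER_MODE[src] = "original"
COPY_LIC_MANIFEST = "1"
```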

Monday, December 29, 2014

Creating a custom Linux distribution for an ARM® Cortex®-A9 based SBC

The Yocto project provides an ideal platform for building a custom Linux distribution.  Its design models a machine: the machine takes a number of inputs and produces an output.  The inputs to the machine are the specifications for the Linux distribution.  The output of the machine is the Linux distribution itself.

The Yocto project is the most widely supported system for building custom Linux distributions, backed by both communities and companies.  The project consists of a tool called bitbake and a build system based on OpenEmbedded.  Together, these two components, along with a defined set of metadata, comprise what is called the Poky reference platform.

Friday, December 12, 2014

ARM Powered® smartphones with NFC technology

Turing machines, first described by Alan Turing in (Turing 1937), are simple abstract computational devices intended to help investigate the extent and limitations of what can be computed.  - Stanford Encyclopedia of Philosophy
The head and the tape in a Turing machine
There are a large number of Near field communication (NFC)-enabled phones (devices) on the consumer market. LG, Huawei, Motorola, Samsung, HTC, Nokia, ZTE, Sony, RIM, Amazon, and Apple manufacture and sell mobile phones with NFC technology.

Friday, November 21, 2014

C++ - Generative Programming

Nikon D80 - Alcatraz from Twin Peaks
San Francisco Bay Area from Twin Peaks - Nikon D80

C++ IOStreams are a powerful mechanism for transforming input into output.  Most programmers are at least familiar with C++ IOStreams in the context of reading and writing bytes to a terminal or file.

When a file or terminal is opened for reading or writing by a process, the operating system returns a numerical identifier to the process.  This numerical identifier is known as a file descriptor.
In turn, the file or terminal can be written to by the process via this file descriptor.  The read and write system calls, which are implemented as wrappers in libc, are passed this numerical file descriptor.

Many layers of abstraction reside on top of the read and write system calls.  These layers of abstraction are implemented in both C and C++. Examples of C based layers of abstraction are fprintf and printf. Internally, these functions call the write system call.   An example of a C++ based layer of abstraction is the IOStreams hierarchy.  Out of the box, most C++ compiler toolchains provide an implementation of IOStreams.  IOStreams are an abstraction on top of the read and write system calls. When data is written to a terminal via an IOStream, the IOStream implementation calls the write system call.  Lastly, these layers of abstraction handle things such as buffering and file synchronization.

In UNIX, everything is a file. Consequently, network devices, virtual terminals, files, block devices, etc. can all be written to via a numerical file descriptor - this in turn is why UNIX is referred to as having a uniform descriptor space. With this being said, the basic IOStreams and printf abstractions I mentioned above are not designed to be used with network sockets, pipes, and the sort.  The lower layer read and write system calls can be used, but there are a number of functions that must be called before writing raw bytes to an open file descriptor that points to a network socket.

The additional functionality that is needed for communicating with network sockets, shared memory, and the like, can be implemented in classes that are derived from the C++ iostream class.  It is for this reason that the IOStreams classes are extended via inheritance.

Over the years, several popular C++ libraries have implemented classes that are derived from the base classes in the iostreams hierarchy.  The C++ Boost library is a popular example.  However, this has not always been the case.  Going back to 1999, the Boost library did not exist and there were only one or two examples on the entire Internet of how to properly extend the C++ IOStreams classes.

In 1999, I pulled the source code for the GNU compiler toolchain and derived a class hierarchy to support sockets, pipes, and shared memory.  The methods in the classes that I derived from the base classes in the iostreams library were designed to be reentrant and easy to use.  I used generative programming techniques and template metaprogramming to create objects that could be instantiated using familiar C++ iostreams syntax and semantics.  The library that I created is called mls, it is licensed under version 2 of the GPL, and it is available on GitHub.

Since 1999, Boost has come a long way.  It provides support for cryptographic IOStreams, sockets, and all kinds of other fancy stuff. It uses generative programming techniques. It is very clean and I highly recommend it.

If you would prefer to roll your own, then I would suggest downloading the GNU compiler toolchain source code.  From there, you can run ctags over the source tree and begin to dig into the internals of the iostreams hierarchy.  I would also recommend the book Generative Programming - Methods, Tools, and Applications.  Last but not least, you'll need a Linux host with a reasonable distribution running on it, such as Fedora.

namespace mls {
  template<class BufType, int direction, class BaseType=mlbuf> class mlstreamimpl;
  template<class Parent, class BaseType=mlbuf> class mloutputimpl;
  template<class Parent, class BaseType=mlbuf> class mlinputimpl;

  template<class BufType, int direction, class BaseType=BufType>
  struct StreamConfig;

  template<class BufType, int direction, class BaseType>
  struct StreamConfig {
    typedef typename SWITCH<(direction),
      CASE<0, mlinputimpl<mlstreamimpl<BufType, direction, BaseType>, BufType>,
      CASE<1, mloutputimpl<mlstreamimpl<BufType, direction, BaseType>, BufType>,
      CASE<10, mlinputimpl<mloutputimpl<mlstreamimpl<BufType, direction, BaseType>,
           BufType>, BufType>,
      CASE<DEFAULT, mlinputimpl<mlstreamimpl<BufType, 10, BaseType>,
           BufType> > > > > >::RET Base;
  };
}

Monday, October 24, 2011

Android Command Line Dev with VI

Notes on developing Android apps from *NIX command line.

If you are a Vi user, then building your Android application from the command line can save you time.  Here are my notes on setting up Vim w/ tags and code completion for Android development.  I've also included the relevant Ant commands for building Android apps from the command line.  The example includes the commands for building and installing an Android app that links to a dependent java library which resides outside of the project source tree (in this case, the lvl lib), along with a C shared library that resides in the local jni/ directory.

Useful Vim Plugins for Android Development
Setting up Vim JDE (vjde) requires a few configuration changes in order to work well with Android projects.  First, you will need to download vjde.tgz version 2.6.18 from

Place vjde.tgz in $HOME/.vim and tar -zxvf vjde.tgz from within $HOME/.vim.  Change the permissions on $HOME/.vim/plugin/vjde/readtags as follows:
$ chmod +x $HOME/.vim/plugin/vjde/readtags
Fire up an empty editor:  $ vim and enter the following in command mode:
:helptags $HOME/.vim/doc
:h vjde will then pull up the help page.

That should take care of setting up vjde.  Now cd to your Android project dir.  Open a blank editor and input the following in command mode:
:Vjdeas .myproject.prj
:let g:vjde_lib_path='/<path_to_android_sdk_top_level_dir>/platforms/ \
Next, open up a source file in your project and type :Vjdeload .myproject.prj in command mode (or script it and/or add it to .vimrc).  You can then use <ctrl-x><ctrl-u> for code completion.  For example: import android.<ctrl-x><ctrl-u> and you will get a nice little dialog box for browsing the matching frameworks.

Next, run ctags over your java and native sources as follows:
$ ctags -R src gen jni
Once NERD tree and Taglist are placed in ~/.vim/plugin/, the following lines in your .vimrc will allow you to use <ctrl-n> and <ctrl-m> to toggle the file explorer and visual tag list.
nmap <silent> <c-n> :NERDTreeToggle<CR>
nnoremap <silent> <c-m> :TlistToggle<CR>
Also, if you need a status line:
set statusline=\ %{HasPaste()}%F%m%r%h\ %w\ \ CWD:\ %r%{CurDir()}%h\ \ \ Line:\ %l/%L:%c
function! CurDir()
let curdir = substitute(getcwd(), '/Users/myhomedir/', "~/", "g")
return curdir
endfunction

function! HasPaste()
if &paste
return 'PASTE MODE  '
endif
return ''
endfunction
Vim should be good to go at this point. cd back to $HOME/src/myproject.  This particular example accounts for a dependent Java library (the lvl) that resides outside of the project source tree, a shared library (which consists of a few C files natively compiled), and plain java source files in the appropriate src/com/ package subdir.

From within your top level project dir (assuming that you came from Eclipse, otherwise, you can use android create ...),
$ android update project --name myproject --target <desired_sdk_target> \
  --path $HOME/src/myproject
$ android update project --target <desired_sdk_target> --path $HOME/src/myproject \
  --library ../lvl_lib_dir
Make sure that the android.library.reference.1 variable now contains the relative pathname of the lvl lib directory.

Assuming that jni/Android.mk and jni/Application.mk are appropriately set up for your shared library, run ndk-build from the top level project directory.
ant debug should now build the debug version of the application package file.

Start up an emulator and then install your app with
adb install -r bin/myproject-debug.apk or use ant install.
Next, open the Dev tools application in the emulator and configure the following: set wait for debugger and select your application for debugging.

Next, run ddms & and check the debug port. It should be 8700.
Subsequently, start your activity with
adb shell 'am start -n com.mycohname.myproject/.BaseActivityName'
And finally, connect via jdb from the shell with
$ jdb -sourcepath $HOME/src/myproject -attach localhost:8700 
and start your debugging.