Saturday, July 30, 2016

Concurrency, Parallelism, and Barrier Synchronization - Multiprocess and Multithreaded Programming

Concurrency, parallelism, threads, and processes are often misunderstood concepts.

On a preemptive, time-sliced UNIX or Linux operating system (Solaris, AIX, Linux, BSD, OS X), sequences of program code from different software applications are executed over time on a single processor. A UNIX process is a schedulable entity. On a UNIX system, program code from one process executes on the processor for a time quantum, after which program code from another process executes for a time quantum. The first process relinquishes the processor either voluntarily or involuntarily so that another process can execute its program code. This is known as context switching. When a process context switch occurs, the state of the process is saved to its process control block and another process resumes execution on the processor. Finally, a UNIX process is heavyweight because it has its own virtual memory space, file descriptors, register state, scheduling information, memory management information, etc. When a process context switch occurs, this information has to be saved, and this is a computationally expensive operation.

Concurrency refers to the interleaved execution of schedulable entities on a single processor.  Context switching facilitates interleaved execution.  The execution time quantum is so small that the interleaved execution of independent, schedulable entities, often performing unrelated tasks, gives the appearance that multiple software applications are running in parallel.

Concurrency applies to both threads and processes. A thread is also a schedulable entity and is defined as an independent sequence of execution within a UNIX process. UNIX processes often have multiple threads of execution that share the memory space of the process. When multiple threads of execution are running inside of a process, they are typically performing related tasks.
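The difference between a thread sharing its process's memory and a separate process owning its own copy can be sketched as follows. This is a minimal illustration, assuming a POSIX system; the `counter` variable and the function name are hypothetical and exist only for this example.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <thread>

// Hypothetical global used only for this illustration.
static int counter = 0;

// Returns the value of `counter` seen by the caller after a child process
// and then a thread have each incremented it once.
int counter_after_child_and_thread()
{
    counter = 0;

    pid_t pid = fork();
    if (pid == 0) {
        ++counter;      // changes the child's private copy of the variable
        _exit(0);
    }
    if (pid > 0) {
        int status = 0;
        waitpid(pid, &status, 0);   // wait for the child to finish
    }
    // The caller's copy of `counter` is still 0 at this point.

    std::thread worker([] { ++counter; });  // runs in the caller's address space
    worker.join();

    return counter;     // only the thread's increment is visible here
}
```

Calling `counter_after_child_and_thread()` yields 1: the child process incremented its own copy of the variable, while the thread incremented the copy that the caller sees.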

While threads are typically lighter weight than processes, there have been different implementations of both across UNIX and Linux operating systems over the years. Three models typically describe the threading implementations across preemptive, time-sliced, multiuser UNIX and Linux operating systems: 1:1, N:1, and M:N. 1:1 maps each user space thread to one kernel thread, N:1 maps multiple user space threads to a single kernel thread, and M:N maps M user space threads onto N kernel threads.

In summary, both threads and processes are scheduled for execution on a single processor.  Thread context switching is lighter in weight than process context switching.  Both threads and processes are schedulable entities and concurrency is defined as the interleaved execution over time of schedulable entities on a single processor.

The Linux user space APIs for process and thread management abstract away a lot of the details, but you can still set the level of concurrency and influence the time quantum, so that system throughput is affected by shorter or longer periods of execution for each schedulable entity.

Conversely, parallelism refers to the simultaneous execution of multiple schedulable entities during the same time quantum. Both processes and threads can execute in parallel across multiple cores or multiple processors. On a multiuser system with preemptive time slicing and multiple processor cores, both concurrency and parallelism are often at play. Affinity scheduling refers to the scheduling of both processes and threads across multiple cores so that their concurrent and often parallel execution is close to optimal.

Software applications are often designed to solve computationally complex problems.  If the algorithm to solve a computationally complex problem can be parallelized, then multiple threads or processes can all run at the same time across multiple cores.  Each process or thread executes by itself and does not contend for resources with other threads or processes that are working on the other parts of the problem to be solved. When each thread or process reaches the point where it can no longer contribute any more work to the solution of the problem, it waits at the barrier.  When all threads or processes reach the barrier, the output of their work is synchronized, and often aggregated by the master process.  Complex test frameworks often implement the barrier synchronization problem when certain types of tests can be run in parallel.
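The barrier pattern described above can be sketched in C++ with threads. This is a minimal, single-use barrier built on a mutex and a condition variable, not a production implementation; the `Barrier` class and `parallel_sum` function are hypothetical names chosen for this example, and summing a range of integers stands in for a real parallel workload.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <numeric>
#include <thread>
#include <vector>

// Minimal single-use barrier: wait() blocks until `count` threads have arrived.
class Barrier {
public:
    explicit Barrier(std::size_t count) : remaining_(count) {}

    void wait()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        if (--remaining_ == 0) {
            cv_.notify_all();   // the last arrival releases everyone
        } else {
            cv_.wait(lock, [this] { return remaining_ == 0; });
        }
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::size_t remaining_;
};

// Sums 1..n by giving each worker thread an interleaved share of the range.
long long parallel_sum(long long n, std::size_t workers)
{
    std::vector<long long> partial(workers, 0);
    Barrier barrier(workers + 1);   // the workers plus the master thread
    std::vector<std::thread> threads;
    const long long stride = static_cast<long long>(workers);

    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            // Each worker writes only its own slot, so no lock is needed here.
            for (long long i = static_cast<long long>(w) + 1; i <= n; i += stride)
                partial[w] += i;
            barrier.wait();         // nothing left to contribute; wait at the barrier
        });
    }

    barrier.wait();                 // the master waits until every worker has arrived
    long long total = std::accumulate(partial.begin(), partial.end(), 0LL);

    for (auto& t : threads) t.join();
    return total;
}
```

For example, `parallel_sum(100, 4)` splits the work across four threads and the master aggregates the four partial sums once all of them have reached the barrier.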

Most individual software applications running on preemptive, time-sliced, multiuser Linux and UNIX operating systems are not designed with heavy, parallel thread or parallel, multi-process execution in mind. Expensive, parallel algorithms often require multiple, dedicated processor cores with hard real-time scheduling constraints. The following paper describes the solution to a popular, parallel algorithm: flight scheduling.

Last, when designing multithreaded and multiprocess software programs, minimizing lock granularity greatly increases concurrency, throughput, and execution efficiency. Multithreaded and multiprocess programs that do not utilize fine-grained synchronization strategies do not run efficiently and often require countless hours of debugging. The use of semaphores, mutex locks, and other synchronization primitives should be minimized to the maximum extent possible in computer programs that share resources between multiple threads or processes. Proper program design allows schedulable entities to run in parallel or concurrently with high throughput and minimum resource contention, and this is optimal for solving computationally complex problems on preemptive, time-sliced, multiuser operating systems without requiring hard real-time scheduling.
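One way to keep lock usage to a minimum is to let each thread accumulate into a private variable and take the lock only once, when it publishes its result. The following is a minimal sketch of that idea; `count_even` is a hypothetical function and counting even numbers is just a stand-in workload.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Counts the even numbers in [0, n) using `workers` threads.
long count_even(long n, std::size_t workers)
{
    long total = 0;
    std::mutex total_mutex;         // protects `total` only
    std::vector<std::thread> threads;
    const long stride = static_cast<long>(workers);

    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&, w] {
            long local = 0;         // private to this thread, no lock required
            for (long i = static_cast<long>(w); i < n; i += stride)
                if (i % 2 == 0) ++local;

            // The lock is held for a single addition per thread,
            // rather than once per loop iteration.
            std::lock_guard<std::mutex> guard(total_mutex);
            total += local;
        });
    }

    for (auto& t : threads) t.join();
    return total;
}
```

Locking inside the loop would produce the same answer, but every iteration would contend for `total_mutex` and the threads would spend most of their time waiting on each other.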

After a fairly considerable amount of research in the above areas, I utilized the above design techniques for several successful multithreaded and multiprocess software programs.

Thursday, January 29, 2015

Customizing a Linux distribution for an ARM® Cortex®-A9 based SBC

We will be pulling Yocto 1.7.1 (Dizzy branch) from the Freescale source and building a BSP for the i.MX 6 RIoTboard. The final image will consist of the following components:

  • U-Boot version 2014.10 from the Freescale git repositories.
  • Linux kernel version 3.17.4 from the Freescale git repositories.
  • ext3 root filesystem with selected packages

Camas Lilies at Sunrise

The image will be built from the custom distribution (bsecdist) and custom image (bsec-image) defined in the last post. bsec-image is derived from core-image-minimal. The configuration changes below will add support for package tests to bsec-image. In addition, the profiling tools, static development libraries, and header files will be added to the image. Several standard userspace packages will also be added to bsec-image; namely, bison, flex, and gunning. Last, several configuration directives will be added to the local configuration file so that source code archives, package versions, and accompanying license files are stored and cached in a local directory for future builds and compliance purposes.

Monday, December 29, 2014

Creating a custom Linux distribution for an ARM® Cortex®-A9 based SBC

The Yocto project provides an ideal platform for building a custom Linux distribution. Its design was intended to model a machine. The Yocto project, or machine, should take a number of inputs and produce an output. The inputs to the machine are the specifications for the Linux distribution. The output of the machine is the Linux distribution.


The Yocto project is the most widely supported system for building custom Linux distributions.
The Yocto project is very well supported by both communities and companies. The project consists of a tool called bitbake and a build system that is based on OpenEmbedded. Together, these two components, along with a defined set of metadata, comprise what is called the Poky reference platform.

Friday, December 12, 2014

ARM Powered® smartphones with NFC technology

Turing machines, first described by Alan Turing in (Turing 1937), are simple abstract computational devices intended to help investigate the extent and limitations of what can be computed.  - Stanford Encyclopedia of Philosophy
The head and the tape in a turing machine
There are a large number of Near field communication (NFC)-enabled phones (devices) on the consumer market. LG, Huawei, Motorola, Samsung, HTC, Nokia, ZTE, Sony, RIM, Amazon, and Apple manufacture and sell mobile phones with NFC technology.

Friday, November 21, 2014

C++ - Generative Programming

Nikon D80 - Alcatraz from Twin Peaks
San Francisco Bay Area from Twin Peaks - Nikon D80


C++ IOStreams are a powerful mechanism for transforming input into output.  Most programmers are at least familiar with C++ IOStreams in the context of reading and writing bytes to a terminal or file.

When a file or terminal is opened for reading or writing by a process, the operating system returns a numerical identifier to the process.  This numerical identifier is known as a file descriptor.
In turn, the file or terminal can be written to by the process via this file descriptor.  The read and write system calls, which are implemented as wrappers in libc, are passed this numerical file descriptor.

Many layers of abstraction reside on top of the read and write system calls.  These layers of abstraction are implemented in both C and C++. Examples of C based layers of abstraction are fprintf and printf. Internally, these functions call the write system call.   An example of a C++ based layer of abstraction is the IOStreams hierarchy.  Out of the box, most C++ compiler toolchains provide an implementation of IOStreams.  IOStreams are an abstraction on top of the read and write system calls. When data is written to a terminal via an IOStream, the IOStream implementation calls the write system call.  Lastly, these layers of abstraction handle things such as buffering and file synchronization.

In UNIX, everything is a file. Consequently, network devices, virtual terminals, files, block devices, etc. can all be written to via a numerical file descriptor. This is why UNIX is referred to as having a uniform descriptor space. With this being said, the basic IOStreams and printf abstractions I mentioned above are not designed to be used with network sockets, pipes, and the like. The lower layer read and write system calls can be used, but there are a number of functions that must be called before writing raw bytes to an open file descriptor that points to a network socket.

The additional functionality that is needed for communicating with network sockets, shared memory, and the like, can be implemented in classes that are derived from the C++ iostream class.  It is for this reason that the IOStreams classes are extended via inheritance.
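As a rough illustration of extending IOStreams through inheritance, the sketch below derives a stream buffer that forwards bytes to a file descriptor with the write system call, and an output stream that owns it. It assumes a POSIX system. The class names `fdoutbuf` and `fdostream` are hypothetical and are not taken from mls or Boost; a socket or pipe descriptor would be set up separately and passed in.

```cpp
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <unistd.h>    // write()

// Hypothetical unbuffered stream buffer that forwards bytes to a descriptor.
class fdoutbuf : public std::streambuf {
public:
    explicit fdoutbuf(int fd) : fd_(fd) {}

protected:
    // Called for each single character, since no put area is configured.
    int_type overflow(int_type c) override
    {
        if (c != traits_type::eof()) {
            char ch = traits_type::to_char_type(c);
            if (::write(fd_, &ch, 1) != 1)
                return traits_type::eof();      // report the failure to the stream
        }
        return traits_type::not_eof(c);
    }

    // Called for multi-character writes; hands the whole run to write().
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        ssize_t written = ::write(fd_, s, static_cast<std::size_t>(n));
        return written < 0 ? 0 : written;
    }

private:
    int fd_;
};

// Hypothetical output stream that owns an fdoutbuf.
class fdostream : public std::ostream {
public:
    explicit fdostream(int fd) : std::ostream(nullptr), buf_(fd)
    {
        rdbuf(&buf_);   // attach the buffer once it has been constructed
    }

private:
    fdoutbuf buf_;
};
```

With these two classes, `fdostream out(fd); out << "n=" << 42 << '\n';` writes formatted text to any open descriptor using the usual IOStreams syntax. A real implementation would add buffering, partial-write handling, and an input side.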

Over the years, several popular C++ libraries have implemented classes that are derived from the base classes in the iostreams hierarchy. The C++ Boost library is a popular example. However, this has not always been the case. Going back to 1999, the Boost library did not exist and there were one or two examples on the entire Internet as to how to properly extend the C++ IOStreams classes.

In 1999, I pulled the source code for the GNU compiler toolchain that is available on gcc.gnu.org and derived a class hierarchy to support sockets, pipes, and shared memory. The methods in the classes that I derived from the base classes in the iostreams library were designed to be reentrant and easy to use.  I used generative programming techniques and template metaprogramming to create objects that could be instantiated using familiar C++ iostreams syntax and semantics. The library that I created is called mls and it is licensed under version 2 of the GPL.  MLS is available on github.

Since 1999, Boost has come a long way.  It provides support for cryptographic IOStreams, sockets, and all kinds of other fancy stuff. It uses generative programming techniques. It is very clean and I highly recommend it.

If you would prefer to roll your own, then I would suggest downloading the GNU compiler toolchain source code from gcc.gnu.org. From there, you can run ctags over the source tree and begin to dig into the internals of the iostreams hierarchy. I would also recommend the following book: Generative Programming - Methods, Tools, and Applications. Last but not least, you'll need a Linux host with a reasonable distribution running on it, such as Fedora.

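// Excerpt: mlbuf and the SWITCH/CASE/DEFAULT metafunctions are not shown
// here and must be declared before this code.
namespace mls 
{
  // Forward declarations of the stream implementation layers.
  template<class BufType, int direction, class BaseType=mlbuf> class mlstreamimpl;
  template<class Parent, class BaseType=mlbuf> class mloutputimpl;
  template<class Parent, class BaseType=mlbuf> class mlinputimpl;
  template<class BufType, int direction, class BaseType=BufType>
  struct StreamConfig;

  // Selects the base class of a stream at compile time from `direction`:
  // 0 wraps the stream in the input layer, 1 in the output layer,
  // 10 in both layers, and any other value falls through to DEFAULT.
  template<class BufType, int direction, class BaseType>
  struct StreamConfig
  {
    typedef typename SWITCH<(direction),
    CASE<0,mlinputimpl<mlstreamimpl<BufType, direction, BaseType>, BufType>,
    CASE<1,mloutputimpl<mlstreamimpl<BufType, direction, BaseType>, BufType>,
    CASE<10,mlinputimpl<mloutputimpl<mlstreamimpl<BufType, direction, BaseType>, 
         BufType>, BufType >,
    CASE<DEFAULT,mlinputimpl<mlstreamimpl<BufType, 10, BaseType>, 
         BufType > > > > > >::RET Base;
  };
}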
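The excerpt above relies on SWITCH and CASE metafunctions that are not shown. The following is a minimal, self-contained sketch of how such a compile-time switch can be written, in the style popularized by the Generative Programming book. It is not the mls implementation: the `DEFAULT` value, `NilCase`, `StreamKind`, and the three tag types are assumptions made for this illustration.

```cpp
#include <type_traits>

const int DEFAULT = -1;     // hypothetical tag value for the fallback case
struct NilCase {};          // marks the end of a CASE list

// One case: a tag, the type it selects, and the remainder of the list.
template<int Tag, class Type, class Next = NilCase>
struct CASE {
    static const int tag = Tag;
    typedef Type type;
    typedef Next next;
};

// Walks the CASE list and yields the type of the first case whose tag
// matches, or the type of the DEFAULT case if it is reached first.
template<int Tag, class Case>
struct SWITCH {
    typedef typename std::conditional<
        (Case::tag == Tag || Case::tag == DEFAULT),
        typename Case::type,
        typename SWITCH<Tag, typename Case::next>::RET
    >::type RET;
};

// End of the list with no match and no DEFAULT case.
template<int Tag>
struct SWITCH<Tag, NilCase> {
    typedef NilCase RET;
};

// Hypothetical tag types standing in for real stream implementations.
struct InputOnly {};
struct OutputOnly {};
struct InputOutput {};

// Chooses a stream kind from an integer, resolved entirely at compile time.
template<int direction>
struct StreamKind {
    typedef typename SWITCH<direction,
        CASE<0, InputOnly,
        CASE<1, OutputOnly,
        CASE<DEFAULT, InputOutput> > > >::RET type;
};

static_assert(std::is_same<StreamKind<0>::type, InputOnly>::value,
              "0 selects the input-only type");
static_assert(std::is_same<StreamKind<1>::type, OutputOnly>::value,
              "1 selects the output-only type");
static_assert(std::is_same<StreamKind<7>::type, InputOutput>::value,
              "unmatched values fall through to DEFAULT");
```

Because the selection happens during template instantiation, no branch remains at run time; the compiler simply sees the chosen type.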

Monday, October 24, 2011

Android Command Line Dev with VI

Notes on developing Android apps from *NIX command line.

If you are a Vi user, then building your Android application from the command line can save you time.  Here are my notes on setting up Vim w/ tags and code completion for Android development.  I've also included the relevant Ant commands for building Android apps from the command line.  The example includes the commands for building and installing an Android app that links to a dependent java library which resides outside of the project source tree (in this case, the lvl lib), along with a C shared library that resides in the local jni/ directory.

Useful Vim Plugins for Android Development
Setting up Vim JDE (vjde) requires a few configuration changes in order to work well with Android projects.  First, you will need to download vjde.tgz version 2.6.18 from http://www.vim.org/scripts/download_script.php?src_id=16253

Place vjde.tgz in $HOME/.vim and tar -zxvf vjde.tgz from within $HOME/.vim.  Change the permissions on $HOME/.vim/plugin/vjde/readtags as follows:
$ chmod +x $HOME/.vim/plugin/vjde/readtags
Fire up an empty editor:  $ vim and enter the following in command mode:
:helptags $HOME/.vim/doc
:h vjde 
will then pull up the help page.

That should take care of setting up vjde.  Now cd to your Android project dir.  Open a blank editor and input the following in command mode:
:Vjdeas .myproject.prj
:let g:vjde_lib_path='/<path_to_android_sdk_top_level_dir>/platforms/ \
<desired_sdk_target>/android.jar:bin/classes:build.classes'
:Vjdesave
:q!
Next, open up a source file in your project and type :Vjdeload .myproject.prj in command mode (or script it and/or add it to your .vimrc).  You can then use <ctrl-x><ctrl-u> for code completion. For example, type import android.<ctrl-x><ctrl-u> and you will get a nice little dialog box for browsing the matching packages.

Next, run ctags over your java and native sources as follows:
$ ctags -R src gen jni
Once NERD tree and Taglist are placed in ~/.vim/plugin/, the following lines in your .vimrc will allow you to use <ctrl-n> and <ctrl-m> to toggle the file explorer and visual tag list.
nmap <silent> <c-n> :NERDTreeToggle<CR>
nnoremap <silent> <c-m> :TlistToggle<CR>
Also, if you need a status line:
set statusline=\ %{HasPaste()}%F%m%r%h\ %w\ \ CWD:\ %r%{CurDir()}%h\ \ \ Line:\ %l/%L:%c
function! CurDir()
  let curdir = substitute(getcwd(), '/Users/myhomedir/', "~/", "g")
  return curdir
endfunction

function! HasPaste()
  if &paste
    return 'PASTE MODE  '
  else
    return ''
  endif
endfunction
Vim should be good to go at this point. cd back to $HOME/src/myproject.  This particular example accounts for a dependent Java library (the lvl) that resides outside of the project source tree, a shared library (which consists of a few C files natively compiled), and plain java source files in the appropriate src/com/ package subdir.

From within your top level project dir (assuming that you came from Eclipse, otherwise, you can use android create ...),
$ android update project --name myproject --target <desired_sdk_target> \
  --path $HOME/src/myproject
$ android update project --target <desired_sdk_target> --path $HOME/src/myproject \
  --library ../lvl_lib_dir
Make sure to check project.properties to ensure that the android.library.reference.1 variable now contains the relative pathname of the lvl lib directory.

Assuming that jni/Android.mk and jni/Application.mk are appropriately setup for your shared library, run ndk-build from the top level project directory.
ant debug should now handle the build and debug version of the application package file.

Start up an emulator and then install your app with
adb install -r bin/myproject-debug.apk or use ant install.
Next, open the Dev tools application in the emulator and configure the following: set wait for debugger and select your application for debugging.

Next, run ddms & and check the debug port. It should be 8700.
Subsequently, start your activity with
adb shell 'am start -n com.mycohname.myproject/.BaseActivityName'
And finally, connect via jdb from the shell with
$ jdb -sourcepath $HOME/src/myproject -attach localhost:8700 
and start your debugging.

Tuesday, August 16, 2011

OpenSSH Security - Client Configuration


OpenSSH provides a suite of tools for encrypting traffic between endpoints, port forwarding, IP tunneling, and authentication. The instructions below outline a client side OpenSSH configuration where the client is running on OS X. The built-in firewall, ipfw, is enabled on the client to restrict outbound and inbound traffic. Part II (currently on hold) of this guide will cover the configuration of OpenSSH on the server along with the available options and alternatives for authentication, authorization, and traffic encryption. The configuration will force AES 256 in Counter Mode and will restrict the available Message Authentication Code (MAC) algorithms that may be used between endpoints. Most of the options in the ssh configuration file on the server will be disabled, public key authentication will be used, password authentication will be disabled, and the ssh daemon will bind to a high-numbered port. Multiple SSH sessions will use the same connection via the ControlMaster and ControlPath client configuration directives. Also, a certificate authority (CA) key will be generated and used to sign user public keys. The CA signed user public keys constitute a user certificate which the server will in turn use for client authentication. PF will be used on the server for stateful packet filtering, connection blocking, and connection throttling. The below configuration will also detail

First and foremost, the client has ipfw enabled and the firewall ruleset is configured in /etc/ipfw.conf. ipfw has been configured to block all inbound traffic and block all outbound traffic except for the ports and IP addresses that are necessary for connecting to the OpenSSH server. The server is running FreeBSD 8.2.

FreeBSD 8.2 - sshd on a.b.c.d:21465 pf |  <--------Internet---------->  | ipfw  OS X Lion - ssh client
To start with, you will need to install coreutils and apg on the client. coreutils and apg can be obtained from MacPorts and can be installed as follows:

client: $ sudo port install coreutils 
client: $ sudo port install apg 

Before generating your public/private keypair, you will need to generate a strong passphrase for your private key. It is important to store this passphrase in a secure location, not on your computer.

client: $ openssl rand -base64 1000 | shasum-5.12 -a 512 | apg -M SNCL -a 1 -m 20 -x 20

Depending on your version of OpenSSH (should be using latest stable for your OS), ECDSA may be used in addition to DSA and RSA. Certificates may also be used for user and host authentication. See the ssh-keygen man page for details.
You can generate your keypair using the following command. When prompted for the passphrase, use the output from the above command.

client: $ ssh-keygen -b 4096 -t rsa -C "$(id -un)@$(hostname)-$(gdate --rfc-3339=date)"

Here is an example of how to use ssh-keygen to generate a public/private keypair using the Elliptic Curve Digital Signature Algorithm. Both the client and server must be running a version of OpenSSH >= 5.7.

client: $ ssh-keygen -b 521 -t ecdsa -C "$(id -un)@$(hostname)-$(gdate --rfc-3339=date)"

Now, we need to push the public key to the server and place it in the authorized_keys file of the user that we are going to log in as over ssh.
The ssh-copy-id command can be used to automate this process. On the OS X client, the ssh-copy-id command does not come preinstalled with SSH. The ssh-copy-id command can be obtained from http://www.freebsd.org/cgi/cvsweb.cgi/~checkout~/ports/security/ssh-copy-id/files/ssh-copy-id?rev=1.1;content-type=text%2Fplain. After downloading the script, change its permissions and place it in your path.
At this point, you should already have a server that is running OpenSSH on port 22 with the default configuration. Thus, you can transfer your public key with the following command:

client: $ ssh-copy-id -i ~/.ssh/id_xxxyy.pub bryan@a.b.c.d

It is time to set up connection sharing. Create the following file if it does not currently exist.

client: $ ls -l ~/.ssh/config
-rw------- 1 bryan scclp 104 Aug 13 10:55 config

The file should contain these lines.

ServerAliveInterval 60
Host a.b.c.d
    ControlMaster auto
    ControlPath ~/.ssh/sockets/%r@%h:%p

The goal is to only allow connections to the server in AES 256 Counter mode, with umac-64 or hmac-ripemd160 MACs, and compression, on a non-standard SSH port from a designated IP range using public key authentication. Connections will also be throttled, and SSHGuard along with a few custom PF rules on the server will be used to block and log attackers. The commands that the client will use to connect to the server will look like this:

client: $ alias sshconnect="ssh -l bryan a.b.c.d -p 21465 -C -c aes256-ctr -m umac-64@openssh.com,hmac-ripemd160"
client: $ alias sshtunnel="ssh -v -ND 8090 bryan@a.b.c.d -p 21465 -C -c aes256-ctr -m umac-64@openssh.com,hmac-ripemd160"
client: $ alias sshmonitor="yes | pv | ssh -l bryan a.b.c.d -p 21465 -C -c aes256-ctr -m umac-64@openssh.com,hmac-ripemd160 \"cat > /dev/null\""
client: $ alias sshportforward="ssh -f bryan@a.b.c.d -p 21465 -C -c aes256-ctr -m umac-64@openssh.com,hmac-ripemd160 -L 15478:localhost:15479 -N"
client: $ alias sshportforward2="ssh -f bryan@a.b.c.d -p 21465 -C -c aes256-ctr -m umac-64@openssh.com,hmac-ripemd160 -L 17293:localhost:17294 -N"

Alternatively, Ciphers, MACs, and compression can be specified in the user config file as follows:

ServerAliveInterval 60 
Host host.name.com 
    ControlMaster auto 
    ControlPath ~/.ssh/sockets/%r@%h:%p 
    Port 21465 
    User bryan 
    Ciphers aes256-ctr 
    Compression yes 
    MACs umac-64@openssh.com,hmac-ripemd160 
    StrictHostKeyChecking yes


User and host certificates provide a more convenient method of authentication for multiple clients (users) and servers (hosts). Certificate revocation can also provide an easier method of quickly invalidating user access. A certificate authority (CA) key pair is first generated as follows. The CA key pair is then placed in the /etc/ssh directory on the host.

ca $ ssh-keygen -t ecdsa -b 521 -f user_ca
server $ sudo mv user_ca* /etc/ssh/

On the client, generate a public/private key pair and then copy the public key to the server so that it can be signed with the CA key. Make sure to set the validity period of the certificate. Alternatively, a host key may be signed with a CA key that is stored in a PKCS#11 token. OpenSSH supports CA keys stored in PKCS#11 tokens. Check your version of SSH and see the ssh-keygen man page for more information.

client $ ssh-keygen -t ecdsa -b 521 -f ~/.ssh/id_ecdsa 
client $ scp .ssh/id_ecdsa.pub bryan@server-ca:~/user_public_keys 
server-ca $ ssh-keygen -s /etc/ssh/user_ca \
      -O source-address=clientip \
      -O permit-pty \
      -O no-port-forwarding \
      -O no-user-rc \
      -O no-x11-forwarding \
      -V -1d:+52w1d -z 6739301351 -I "bryan" \
      -n bryan,clienthostname id_ecdsa.pub
id "bryan" serial 6739301351 for bryan,clienthostname valid from 2011-08-18T15:05:24 to 2012-08-17T15:05:24 

Copy the signed user cert back to the client.

client $ scp bryan@server:~/user_public_keys/id_ecdsa-cert.pub ~/.ssh/ 

Set up the TrustedUserCAKeys and AuthorizedPrincipalsFile files. Subsequently, set the appropriate options in /etc/ssh/sshd_config on the server.

server-ca $ sudo sh -c 'cat /etc/ssh/user_ca.pub > /etc/ssh/trusted_user_ca_keys'

Modify /etc/ssh/authorized_principals to include the following lines.

bryan
from="clientip" bryan

Modify /etc/ssh/sshd_config on the server to include the following lines:

TrustedUserCAKeys /etc/ssh/trusted_user_ca_keys 
AuthorizedPrincipalsFile /etc/ssh/authorized_principals 

Now, restart sshd on the server and add an appropriate host configuration for certificate authentication to ~/.ssh/config on the client.

Last of all, if you want to setup a host certificate, you will need to use the -h option with ssh-keygen when signing a host key.

It is important to always keep OpenSSH updated with the latest, stable version that has been released for your operating system.

Resources

OpenSSH, http://openssh.org

fwknop, http://cipherdyne.org/fwknop/

OpenBSD, http://openbsd.org

FreeBSD, http://freebsd.org