Friday, February 23, 2024

Choosing the Right Linux distribution

There are numerous Linux distributions, and for many people it is hard to know which one to choose. A few of the choices are Ubuntu, CentOS, Debian, Kali, Gentoo, Alpine, Fedora, Slackware, SUSE, Arch, and so on. In today's world, everyone seems to value data privacy. The tasks you perform on your desktop are your personal business, or in the case of corporations, their business. This is why it is important to choose a Linux distribution that is both minimal and customizable. Linux is the kernel, and it is licensed under GPLv2. The information below reflects a more traditional view of Linux. While not always practical, it is the way Linux was designed. Software subscription services, closed-source firmware, third-party device drivers, proprietary hardware abstraction layers, and non-GPL licensing are among the issues that modern corporations deal with on a daily basis. Many corporations choose to ignore GNU/Linux licensing entirely and instead layer in a multitude of software, both closed and open source, with restrictive licensing that commonly violates the GPL and the licenses of the components linked against. Such systems often contain a mix of open-source, distribution-specific packages and custom-compiled software with restrictive or absent licensing. These systems are then packaged and resold as custom appliances.

Software should be compiled locally and not pulled from arbitrary, distribution-specific package sources across the Internet. This practice is the source of a myriad of problems, from filesystem issues to security vulnerabilities and complex device driver conflicts. By comparison, FreeBSD handles package management well. On Linux systems, a minimal set of kernel modules should be loaded, and it is important to know exactly what each one does. Many Linux distributions boot by default with over 30 kernel modules dynamically loaded, while only a few are typically needed. Outbound network connections to third-party companies should not exist on boot. All outbound network connections should be monitored and easily configurable at all times. GPU drivers should be thoroughly examined, audited, community reviewed, and properly licensed; only those specific to the video device on the host should be loaded. As one moves the mouse across the screen and types on the keyboard, is that activity private and visible only to the user? Many people ask this question, and the measures described here mitigate much of the risk. Selecting the right monitor and video card is important. When X and the window manager are running, can one easily pull up a list of all network connections? Finally, all BIOS and processor-specific components, both in kernel and user space, should be clearly documented and understood.

With these basic considerations in hand, the first choice would be to create a custom Linux distribution and compile the kernel and user space with only the necessary components. While this is not always practical, Arch Linux is the next logical choice that meets the above requirements. For those interested in creating custom appliances with Linux, it is important to follow proper licensing. And for those purchasing custom appliances, or cloud services that use such appliances, performing the proper due diligence is critical.

Thursday, April 6, 2023

Multidimensional arrays of function pointers in C

Embedded hardware typically includes an application processor and one or more adjacent processor(s) attached to the printed circuit board. The firmware that resides on the adjacent processor(s) responds to instructions or commands.  Different processors on the same board are often produced by different companies.  For the system to function properly, it is imperative that the processors communicate without any issues, and that the firmware can handle all types of possible errors.

Formal requirements for firmware-related projects may include the validation and verification of the firmware on a co-processor via the application programming interface (API). Co-processors typically run 8, 16, or 32-bit embedded operating systems. If the co-processor manufacturer provides a development board for testing the firmware on a specific co-processor, then the development board may have its own application processor. Familiarity with all of the applicable bus communication protocols, including synchronous and asynchronous communication, is important. High-volume testing of firmware can be accomplished using function-like macros and arrays of function pointers. Processor-specific firmware is written in C and assembly - 8, 16, 32, or 64-bit. Executing inline assembly from C is straightforward and often required. Furthermore, handling time constraints such as real-time execution on adjacent processors is easier to deal with in C, and executing syscalls, low-level C functions, and userspace library functions is often more efficient. Timing analysis is often a key consideration when testing firmware, and executing compiled C code on a time-sliced OS, such as Linux, is already constrained.

To read tests based on a custom grammar, a scanner and parser in C can be used. Lex is ideal for building a computationally efficient lexical analyzer that outputs a sequence of tokens. For this case, the tokens comprise the function signatures and any associated function metadata such as expected execution time. Creating a context-free grammar and generating the associated syntax tree from the lexical input is straightforward.   Dynamic arrays of function pointers can then be allocated at run-time, and code within external object files or libraries can be executed in parallel using multiple processes or threads. The symbol table information from those files can be stored in multi-dimensional arrays. While C is a statically typed language, the above design can be used for executing generic, variadic functions at run-time from tokenized input, with constant time lookup, minimal overhead, and specific run-time expectations (stack return value, execution time, count, etc.).

At a high level, lists of pointers to type-independent, variadic functions and their associated parameters can be stored within multi-dimensional arrays.  The following C code uses arrays of function pointers to execute functions via their addresses.  The code uses list management functions from the Linux kernel which I ported to userspace.

Wednesday, January 12, 2022

Concurrency, Parallelism, and Barrier Synchronization - Multiprocess and Multithreaded Programming

On preemptive, time-sliced UNIX or Linux operating systems such as Solaris, AIX, Linux, BSD, and OS X, program code from one process executes on the processor for a time slice or quantum. After this time has elapsed, program code from another process executes for a time quantum. Linux divides CPU time into epochs, and each process has a specified time quantum within an epoch. The execution quantum is so small that the interleaved execution of independent, schedulable entities, often performing unrelated tasks, gives the appearance of multiple software applications running in parallel.

When the currently executing process relinquishes the processor, either voluntarily or involuntarily, another process can execute its program code. This event is known as a context switch, which facilitates interleaved execution. Time-sliced, interleaved execution of program code within an address space is known as concurrency.

The Linux kernel is fully preemptive, which means that it can force a context switch for a higher priority process. When a context switch occurs, the state of a process is saved to its process control block, and another process resumes execution on the processor.

A UNIX process is considered heavyweight because it has its own address space, file descriptors, register state, and program counter. In Linux, this information is stored in the task_struct. When a process context switch occurs, this information must be saved and restored, which is a computationally expensive operation.

Concurrency applies to both threads and processes. A thread is an independent sequence of execution within a UNIX process, and it is also considered a schedulable entity. Both threads and processes are scheduled for execution on a processor core, but thread context switching is lighter in weight than process context switching.

In UNIX, processes often have multiple threads of execution that share the process's memory space. When multiple threads of execution are running inside a process, they typically perform related tasks. The Linux user-space APIs for process and thread management abstract many details. However, scheduling parameters such as the concurrency level can be tuned to influence the time quantum, so that system throughput is shaped by shorter or longer stretches of schedulable-entity execution time.

While threads are typically lighter weight than processes, there have been different implementations across UNIX and Linux operating systems over the years. Three models typically define these implementations on preemptive, time-sliced, multi-user UNIX and Linux operating systems: 1:1, 1:N, and M:N. 1:1 refers to the mapping of one user-space thread to one kernel thread, 1:N refers to the mapping of multiple user-space threads to a single kernel thread, and M:N refers to the mapping of N user-space threads to M kernel threads.

In the 1:1 model, one user-space thread is mapped to one kernel thread. This allows for true parallelism, as each thread can run on a separate processor core. However, creating and managing a large number of kernel threads can be expensive.

In the 1:N model, multiple user-space threads are mapped to a single kernel thread. This is more lightweight, as there are fewer kernel threads to create and manage. However, it does not allow for true parallelism, as only one thread can execute on a processor core at a time.

In the M:N model, N user-space threads are mapped to M kernel threads. This provides a balance between the 1:1 and 1:N models, as it allows for both true parallelism and lightweight thread creation and management. However, it can be complex to implement and can lead to issues with load balancing and resource allocation.

Parallelism on a time-sliced, preemptive operating system means the simultaneous execution of multiple schedulable entities over a time quantum. Both processes and threads can execute in parallel across multiple cores or processors. Concurrency and parallelism are at play on a multi-user system with preemptive time-slicing and multiple processor cores. Affinity scheduling refers to scheduling processes and threads across multiple cores so that their concurrent and parallel execution is close to optimal.

More precisely, affinity scheduling assigns processes or threads to specific processors or cores to optimize their execution and minimize unnecessary context switching. This can improve overall system performance by reducing cache misses and increasing cache hits, among other benefits. In contrast, non-affinity scheduling allows processes and threads to be executed on any available processor or core, which can result in more frequent context switching and lower performance.

Software applications are often designed to solve computationally complex problems. If the algorithm for a computationally complex problem can be parallelized, then multiple threads or processes can run at the same time across multiple cores. Each process or thread executes independently and does not contend for resources with the other threads or processes working on other parts of the problem. When a thread or process reaches the point where it can contribute no more work to the solution, it waits at the barrier, if a barrier has been implemented in software. When all threads or processes reach the barrier, their work output is synchronized and often aggregated by the primary process. Complex test frameworks often implement barrier synchronization when certain types of tests can be run in parallel. Most individual software applications running on preemptive, time-sliced, multi-user Linux and UNIX operating systems are not designed with heavy parallel thread or parallel multiprocess execution in mind.

Minimizing the granularity and hold time of locks increases concurrency, throughput, and execution efficiency when designing multithreaded and multiprocess software. Multithreaded and multiprocess programs that do not correctly utilize synchronization primitives often require countless hours of debugging. Semaphores, mutex locks, and other synchronization primitives should be used as sparingly as possible in programs that share resources between multiple threads or processes. Proper program design allows schedulable entities to run in parallel or concurrently with high throughput and minimal resource contention. This is optimal for solving computationally complex problems on preemptive, time-sliced, multi-user operating systems without requiring hard real-time scheduling.

Wednesday, February 24, 2021

A hardware design for variable output frequency using an n-bit counter

The DE1-SoC from Terasic is an excellent board for hardware design and prototyping. The following VHDL process is from a hardware design created for the Terasic DE1-SoC FPGA. The ten switches and four push buttons on the board feed an n-bit counter with an adjustable multiplier that increases the output frequency of one or more output pins at a 50% duty cycle.

As the switches are moved or the buttons are pressed, the seven-segment display is updated to reflect the numeric output frequency, and the output pin(s) are driven at the desired frequency. The onboard clock runs at 50MHz, and the signal on the output pins is set on the rising edge of the clock input signal (positive edge-triggered). A 50MHz clock has 50 million rising edges per second; toggling an output pin on each rising edge therefore yields a maximum output frequency of 25MHz, or 25 million full cycles per second. An LED attached to one of the output pins would blink 25 million times per second, not recognizable to the human eye. The persistence of vision, which is the time the human eye retains an image after it disappears from view, is approximately 1/16th of a second. Therefore, an LED blinking 25 million times per second would appear as a continuous light to the human eye.

scaler <= compute_prescaler((to_integer(unsigned( SW )))*scaler_mlt);

gpiopulse_process : process(CLOCK_50, KEY(0))
begin
    if (KEY(0) = '0') then -- async reset
        count <= 0;
    elsif rising_edge(CLOCK_50) then
        if (count = scaler - 1) then
            state <= not state;
            count <= 0;
        elsif (count = clk50divider) then -- auto reset
            count <= 0;
        else
            count <= count + 1;
        end if;
    end if;
end process gpiopulse_process;
The scaler signal is calculated by the compute_prescaler function: the switch value SW is converted to an integer with to_integer, multiplied by the multiplier scaler_mlt, and the product is passed to compute_prescaler. This scaler signal controls the frequency of the pulse signal generated on the output pin.

The gpiopulse_process process is sensitive to both CLOCK_50 and the push button KEY(0). When KEY(0) is pressed, the counter is reset asynchronously; otherwise, the body executes on the rising edge of CLOCK_50.

The count signal is incremented on each rising edge of the CLOCK_50 signal until it reaches the value of scaler - 1. When this happens, the state signal is inverted and count is reset to 0. If count reaches the value of clk50divider, it is also reset to 0.

Overall, this code generates a pulse signal with a frequency controlled by the value of a switch and a multiplier, which is generated on a specific output pin of the FPGA board. The pulse signal is toggled between two states at a frequency determined by the scaler signal.

It is important to note that concurrent statements within an architecture are executed concurrently, meaning that they are evaluated concurrently and in no particular order. However, the sequential statements within a process are executed sequentially, meaning that they are evaluated in order, one at a time. Processes themselves are executed concurrently with other processes, and each process has its own execution context.

Tuesday, August 25, 2020

Creating stronger keys for OpenSSH and GPG

Create Ed25519 SSH keypair (supported in OpenSSH 6.5+). Parameters are as follows:

-o save in new format
-a 128 for 128 kdf (key derivation function) rounds
-t ed25519 for type of key
ssh-keygen -o -a 128 -t ed25519 -f .ssh/ed25519-$(date '+%m-%d-%Y') -C ed25519-$(date '+%m-%d-%Y')
Create Ed448-Goldilocks GPG master key and sub keys.
gpg --quick-generate-key ed448-master-key-$(date '+%m-%d-%Y') ed448 sign 0
fpr=$(gpg --list-keys --with-colons "ed448-master-key-08-03-2021" | awk -F: '/^fpr/ {print $10; exit}')
gpg --quick-add-key "$fpr" cv448 encr 2y
gpg --quick-add-key "$fpr" ed448 auth 2y
gpg --quick-add-key "$fpr" ed448 sign 2y

Sunday, September 2, 2018

96Boards - JTAG and serial UART configuration for ARM powered, single-board computers

The 96Boards CE specification calls for an optional JTAG connection. The specification also indicates that the optional JTAG connection shall use a 10-pin, through-hole, .05" (1.27mm) pitch JTAG connector. The part is readily available on most electronics sites. Breaking out the pins with long wires and shrink-wrapping them is ideal for making sure that each connection is labeled and separate when connecting to a JTAG debugger. While a JTAG connection is not required for flashing or loading the bootloaders onto the board, it is useful for advanced chip-level debugging. The serial UART connection is sufficient for loading release or debug versions of bl0, bl1, bl2, bl31, bl32, the kernel, and userspace. Last but not least, ARM-powered boards with 12V power input often require external fans to keep the board cool. As seen in the below photos, two 5V fans were powered from an external power supply. Any work on microcontroller boards should be performed on a grounded surface. Proper grounding procedures should always be followed, as most microcontroller boards contain ESD-sensitive components.

In the below photos, a 96Boards SBC is mounted on an IP65, ABS plastic junction box for durability. The pins are extended and mounted with screws underneath the junction box. The electrical conduit holes on the side of the junction box are ideal for holding small project fans. The remaining electrical conduit holes provide a clean way to route the remaining wires from the board - micro USB, USB-C, and 12V power.

Thursday, June 7, 2018

HiKey 960 Linux Bridged Firewall

The Kirin 960 SoC and on-board USB 3.0 make the HiKey 960 SBC an ideal platform for running a Linux bridged firewall. The number of single-board computers with an SoC as powerful as the HiSilicon Kirin 960 is limited.

When compared with the Raspberry Pi series of single board computers (SBC), the HiKey 960 SBC is significantly more powerful. The Kirin 960 also stands above the ARM powered SoCs which reside in most commercial routers.

USB 3.0 makes the HiKey 960 board an attractive option for bridging or routing, filtering network traffic, or connecting to an external gateway via IPSec. Both network traffic filtering and IPSec tunneling can be computationally expensive operations. However, the multicore Kirin 960 is well suited for these types of tasks.

To run an IPSec client tunnel and a Linux bridged firewall connected over 1G Ethernet links, certain kernel configuration modifications are needed. Furthermore, the Android Linux kernel for the HiKey 960 board does not boot on a standard Linux root filesystem because it is designed to boot an Android-customized rootfs.

The latest googlesource Linux kernel (hikey-linaro-4.9) for Android (designed to boot Android on the HiKey 960 board) has been customized to remove the Android specific components so that the kernel boots on a standard Linux root filesystem, with the proper drivers enabled for network connectivity via attached 1000Mb/s USB 3.0 to ethernet adapters. The standard UART interface on the board should be used for serial connectivity and shell access. WiFi and Bluetooth have been removed from the kernel configuration. The kernel should be booted off of a microSDHC UHS-I card. The 96boards instructions should be followed for configuring the HiKey 960 board, setting the jumpers on the board, building and flashing the l-loader, firmware package, partition tables, UEFI loader, ARM Trusted Firmware, and optional Op-TEE. Links for the normal Linux kernel configuration, multi-interface bridge configuration, and single interface IPSec configuration are below. Additional kernel config modifications may be needed for certain types of applications.

kernel build instructions

mkdir /usr/local/toolchains
cd /usr/local/toolchains/
tar -xJf gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu.tar.xz
export ARCH=arm64
export CROSS_COMPILE=/usr/local/toolchains/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-
export PATH=/usr/local/toolchains/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/gcc-aarch64-linux-gnu/bin:$PATH
cd /usr/local/src
git clone
cd hikey-linaro
git checkout -b android-hikey-linaro-4.9 
make hikey960_defconfig
make -j8

multi-interface bridge configuration 

Bridged configuration with no IP addresses on the dual NIC interfaces (a crossover cable is useful for testing). The bridge interface obtains a DHCP address (/11) from the WLAN router. An aliased interface is added to br0 and assigned a private IP on a different subnet (/8). Spanning tree is enabled on the bridge interface. A basic ebtables and iptables ruleset is below.

brctl addbr <br>
brctl addif <br> <eth1> <eth2>
ifconfig <br> up
ifconfig <eth1> up
ifconfig <eth2> up
brctl stp <br> yes
dhclient <br>
ifconfig <br>:0 <a.b.c.d/sn> up

iptables --table nat --append POSTROUTING --out-interface <br> -j MASQUERADE
iptables -P INPUT DROP
iptables --append FORWARD --in-interface <br>:0 -j ACCEPT
ebtables -P FORWARD DROP
ebtables -P INPUT DROP
ebtables -P OUTPUT DROP
ebtables -t filter -A FORWARD -p IPv4 -j ACCEPT
ebtables -t filter -A INPUT -p IPv4 -j ACCEPT
ebtables -t filter -A OUTPUT -p IPv4 -j ACCEPT
ebtables -t filter -A INPUT -p ARP -j ACCEPT
ebtables -t filter -A OUTPUT -p ARP -j ACCEPT
ebtables -t filter -A FORWARD -p ARP -j REJECT
ebtables -t filter -A FORWARD -p IPv6 -j DROP
ebtables -t filter -A FORWARD -d Multicast -j DROP
ebtables -t filter -A FORWARD -p X25 -j DROP
ebtables -t filter -A FORWARD -p FR_ARP -j DROP
ebtables -t filter -A FORWARD -p BPQ -j DROP
ebtables -t filter -A FORWARD -p DEC -j DROP
ebtables -t filter -A FORWARD -p DNA_DL -j DROP
ebtables -t filter -A FORWARD -p DNA_RC -j DROP
ebtables -t filter -A FORWARD -p LAT -j DROP
ebtables -t filter -A FORWARD -p DIAG -j DROP
ebtables -t filter -A FORWARD -p CUST -j DROP
ebtables -t filter -A FORWARD -p SCA -j DROP
ebtables -t filter -A FORWARD -p TEB -j DROP
ebtables -t filter -A FORWARD -p RAW_FR -j DROP
ebtables -t filter -A FORWARD -p AARP -j DROP
ebtables -t filter -A FORWARD -p ATALK -j DROP
ebtables -t filter -A FORWARD -p 802_1Q -j DROP
ebtables -t filter -A FORWARD -p IPX -j DROP
ebtables -t filter -A FORWARD -p NetBEUI -j DROP
ebtables -t filter -A FORWARD -p PPP -j DROP
ebtables -t filter -A FORWARD -p ATMMPOA -j DROP
ebtables -t filter -A FORWARD -p PPP_DISC -j DROP
ebtables -t filter -A FORWARD -p PPP_SES -j DROP
ebtables -t filter -A FORWARD -p ATMFATE -j DROP
ebtables -t filter -A FORWARD -p LOOP -j DROP
ebtables -t filter -A FORWARD --log-level info --log-ip --log-prefix FFWLOG
ebtables -t filter -A OUTPUT --log-level info --log-ip --log-arp --log-prefix OFWLOG -j DROP
ebtables -t filter -A INPUT --log-level info --log-ip --log-prefix IFWLOG

single-interface ipsec gateway configuration

iptables -t nat -A POSTROUTING -s <clientip>/32 -o <eth> -j SNAT --to-source <virtualip>
iptables -t nat -A POSTROUTING -s <clientip>/32 -o <eth> -m policy --dir out --pol ipsec -j ACCEPT