Tuesday 19 May 2026

Frankfurt Am Main

There are some places that seem really familiar, kind of like home. Frankfurt is one of them for me. Perhaps things have changed a bit since the 70s, but with such a deep, rich history, the city's architecture remains. Still, for some reason, North Holland feels even more like home. There is a place in North Holland that I've carried with me for a while now.

North Holland has been in my vocabulary, and brought up in some context, at every place I've lived in my life since I learned to speak in the 1970s, when I was only a few years old. This includes places outside of Texas... It's not the Anne Frank house, which is a great place to visit if you haven't been. I do wish they would open the original front door. The place I'm referring to is near that cafe by the canal. It holds a very personal meaning to me along with Frankfurt, especially by that tree in Frankfurt Am Main....

I may post some pics in the future but for now, perhaps I will write about it, especially since I'm so far away from it here in Texas. Texas has been my home for most of my life.

720-nm-Filter on full spectrum mirrorless
© 2026 Bryan R. Hinton
provenance · integrity record
Hashes refer to the canonical processed image files, not Blogger-generated display derivatives.
fingerprint: A580 94D4 1256 B541 3893 3D94 D524 3E42 0906 8499
OpenTimestamps proof verified; block attestation confirms existence as of 2026-05-20 CDT

Saturday 6 April 2024

Multidimensional arrays of function pointers in C

Embedded hardware typically includes an application processor and one or more adjacent processor(s) attached to the printed circuit board. The firmware that resides on the adjacent processor(s) responds to instructions or commands.  Different processors on the same board are often produced by different companies.  For the system to function properly, it is imperative that the processors communicate without any issues, and that the firmware can handle all types of possible errors.

Formal requirements for firmware-related projects may include the validation and verification of the firmware on a co-processor via the application programming interface (API). Co-processors typically run 8, 16, or 32-bit embedded operating systems. If the co-processor manufacturer provides a development board for testing the firmware on a specific co-processor, then the development board may have its own application processor. Familiarity with all of the applicable bus communication protocols, including synchronous and asynchronous communication, is important. High-volume testing of firmware can be accomplished using function-like macros and arrays of function pointers. Processor-specific firmware is written in C and assembly for 8, 16, 32, or 64-bit targets. Executing inline assembly from C is straightforward and often required. Furthermore, time constraints such as real-time execution on adjacent processors are easier to handle in C, and executing syscalls, low-level C functions, and userspace library functions from C is often more efficient. Timing analysis is often a key consideration when testing firmware, and the timing of compiled C code running on a time-sliced OS, such as Linux, is already constrained by the scheduler.

To read tests based on a custom grammar, a scanner and parser in C can be used. Lex is ideal for building a computationally efficient lexical analyzer that outputs a sequence of tokens. For this case, the tokens comprise the function signatures and any associated function metadata such as expected execution time. Creating a context-free grammar and generating the associated syntax tree from the lexical input is straightforward.   Dynamic arrays of function pointers can then be allocated at run-time, and code within external object files or libraries can be executed in parallel using multiple processes or threads. The symbol table information from those files can be stored in multi-dimensional arrays. While C is a statically typed language, the above design can be used for executing generic, variadic functions at run-time from tokenized input, with constant time lookup, minimal overhead, and specific run-time expectations (stack return value, execution time, count, etc.).
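As a rough illustration of the run-time expectations described above, the sketch below pairs a variadic function with an expected return value and an execution-time bound. The names test_case, test_fn, sum_ints, and run_test are hypothetical and do not come from the linked repository; in the design described here the descriptor would be filled in from parsed tokens rather than written by hand.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdarg.h>
#include <time.h>

/* Hypothetical variadic test function type: argc int arguments follow. */
typedef int (*test_fn)(int argc, ...);

/* Hypothetical test descriptor, as might be built from tokenized input. */
struct test_case {
    const char *name;      /* token naming the function */
    test_fn     fn;        /* address of the function to call */
    int         expected;  /* expected return value */
    double      max_secs;  /* upper bound on execution time, in seconds */
};

/* Example function under test: sums its integer arguments. */
static int sum_ints(int argc, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, argc);
    for (int i = 0; i < argc; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Call the function with three arguments; return 1 on pass, 0 on fail. */
static int run_test(const struct test_case *tc, int a, int b, int c)
{
    struct timespec t0, t1;
    double elapsed;
    int ret;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    ret = tc->fn(3, a, b, c);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    elapsed = (double)(t1.tv_sec - t0.tv_sec)
            + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;

    return ret == tc->expected && elapsed <= tc->max_secs;
}
```

A descriptor such as { "sum_ints", sum_ints, 6, 1.0 } passes when run_test is called with 1, 2, and 3, and fails if the expected value is changed to 7.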

At a high level, lists of pointers to type-independent, variadic functions and their associated parameters can be stored within multi-dimensional arrays.  The following C code uses arrays of function pointers to execute functions via their addresses.  The code uses list management functions from the Linux kernel which I ported to userspace.

https://github.com/brhinton/bcn
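Separately from the linked repository, a minimal sketch of the core idea is a two-dimensional array of function pointers indexed by group and operation, which gives constant-time lookup. The function names, the grouping, and call_op are assumptions made for illustration, and the table here is static, whereas the design described above allocates its arrays at run-time.

```c
#include <stddef.h>

typedef int (*op_fn)(int, int);

static int op_add(int a, int b) { return a + b; }
static int op_sub(int a, int b) { return a - b; }
static int op_and(int a, int b) { return a & b; }
static int op_xor(int a, int b) { return a ^ b; }

enum { GROUP_ARITH, GROUP_BITWISE, NUM_GROUPS };
enum { NUM_OPS = 2 };

/* Rows are groups; columns are the operations within a group. */
static const op_fn dispatch[NUM_GROUPS][NUM_OPS] = {
    [GROUP_ARITH]   = { op_add, op_sub },
    [GROUP_BITWISE] = { op_and, op_xor },
};

/* Look up a function by (group, op) and call it through its address.
 * *ok is set to 1 if the indices were valid and 0 otherwise. */
static int call_op(size_t group, size_t op, int a, int b, int *ok)
{
    if (group >= NUM_GROUPS || op >= NUM_OPS) {
        *ok = 0;
        return 0;
    }
    *ok = 1;
    return dispatch[group][op](a, b);
}
```

Calling call_op(GROUP_BITWISE, 1, 6, 3, &ok) selects op_xor and returns 5. Bounds checking before the indirect call matters here because the indices would normally come from parsed input.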

Wednesday 12 January 2022

Concurrency, Parallelism, and Barrier Synchronization - Multiprocess and Multithreaded Programming

On preemptive, time-sliced UNIX and UNIX-like operating systems such as Solaris, AIX, Linux, BSD, and OS X, program code from one process executes on the processor for a time slice, or quantum. After this time has elapsed, program code from another process executes for a time quantum. Older Linux schedulers divided CPU time into epochs, and each process had a specified time quantum within an epoch. The execution quantum is so small that the interleaved execution of independent, schedulable entities, often performing unrelated tasks, gives the appearance of multiple software applications running in parallel.

When the currently executing process relinquishes the processor, either voluntarily or involuntarily, another process can execute its program code. This event is known as a context switch, which facilitates interleaved execution. Time-sliced, interleaved execution of program code within an address space is known as concurrency.

The Linux kernel can be built as fully preemptive (CONFIG_PREEMPT), which means that it can force a context switch for a higher priority process. When a context switch occurs, the state of a process is saved to its process control block, and another process resumes execution on the processor.

A UNIX process is considered heavyweight because it has its own address space, file descriptors, register state, and program counter. In Linux, this information is stored in, or referenced from, the task_struct. When a process context switch occurs, this information must be saved and the state of the next process restored, which is a computationally expensive operation.

Concurrency applies to both threads and processes. A thread is an independent sequence of execution within a UNIX process, and it is also considered a schedulable entity. Both threads and processes are scheduled for execution on a processor core, but thread context switching is lighter in weight than process context switching.

In UNIX, processes often have multiple threads of execution that share the process's memory space. When multiple threads of execution are running inside a process, they typically perform related tasks. The Linux user-space APIs for process and thread management abstract many details. However, the concurrency level can be adjusted to influence the time quantum so that the system throughput is affected by shorter and longer durations of schedulable entity execution time.
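The difference between sharing a memory space and owning a separate one can be sketched with a small experiment. In the hypothetical helpers below, a thread writes to a global variable and the write is visible to its creator, while a forked child writes to its own copy and the parent's value is unchanged. The function and variable names are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int shared_value;

static void *thread_body(void *arg)
{
    (void)arg;
    shared_value = 42;          /* same address space as the creator */
    return NULL;
}

/* Returns the value the creating thread sees after a thread writes it. */
static int value_after_thread(void)
{
    pthread_t tid;

    shared_value = 0;
    if (pthread_create(&tid, NULL, thread_body, NULL) != 0)
        return -1;
    pthread_join(tid, NULL);
    return shared_value;        /* the thread's write is visible */
}

/* Returns the value the parent sees after a child process writes it. */
static int value_after_fork(void)
{
    pid_t pid;

    shared_value = 0;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        shared_value = 42;      /* modifies the child's private copy */
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    return shared_value;        /* the parent's copy is untouched */
}
```

On Linux this would typically be compiled with the -pthread option. The first helper returns 42 and the second returns 0, which is the practical meaning of a process having its own address space.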

While threads are typically lighter weight than processes, there have been different implementations across UNIX and Linux operating systems over the years. Three models typically describe the implementations across preemptive, time-sliced, multi-user UNIX and Linux operating systems: 1:1, N:1, and M:N. 1:1 refers to the mapping of one user-space thread to one kernel thread, N:1 refers to the mapping of multiple user-space threads to a single kernel thread, and M:N refers to the mapping of M user-space threads to N kernel threads.

In the 1:1 model, one user-space thread is mapped to one kernel thread. This allows for true parallelism, as each thread can run on a separate processor core. However, creating and managing a large number of kernel threads can be expensive. The NPTL threading implementation on Linux uses this model.

In the N:1 model, multiple user-space threads are mapped to a single kernel thread. This is more lightweight, as there are fewer kernel threads to create and manage. However, it does not allow for true parallelism, because the kernel schedules only the single kernel thread, so only one of the user-space threads can execute at any given time regardless of the number of cores.

In the M:N model, M user-space threads are mapped to N kernel threads. This provides a balance between the 1:1 and N:1 models, as it allows for both true parallelism and lightweight thread creation and management. However, it can be complex to implement and can lead to issues with load balancing and resource allocation.

Parallelism on a time-sliced, preemptive operating system means the simultaneous execution of multiple schedulable entities over a time quantum. Both processes and threads can execute in parallel across multiple cores or processors. Concurrency and parallelism are both at play on a multi-user system with preemptive time-slicing and multiple processor cores.

Affinity scheduling refers to the practice of assigning processes or threads to specific processors or cores so that their concurrent and parallel execution is close to optimal and unnecessary migration between cores is minimized. This can improve overall system performance by reducing cache misses and increasing cache hits, among other benefits. In contrast, non-affinity scheduling allows processes and threads to be executed on any available processor or core, which can result in more frequent migrations, colder caches, and lower performance.

Software applications are often designed to solve computationally complex problems. If the algorithm to solve a computationally complex problem can be parallelized, then multiple threads or processes can all run at the same time across multiple cores. Ideally, each process or thread works on its own part of the problem and does not contend for resources with the threads or processes working on the other parts. When a thread or process reaches the point where it can no longer contribute any more work to the solution, it waits at the barrier, if a barrier has been implemented in software. When all threads or processes have reached the barrier, their work output is synchronized and often aggregated by the primary process. Complex test frameworks often implement barrier synchronization when certain types of tests can be run in parallel. Most individual software applications running on preemptive, time-sliced, multi-user Linux and UNIX operating systems are not designed with heavily parallel multithreaded or multiprocess execution in mind.
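The barrier pattern described above can be sketched with POSIX threads. In this hypothetical example, four workers each sum their own slice of the integers 1 through 400, wait at a barrier, and the primary thread aggregates the partial results once every participant has arrived. The names worker, worker_main, and parallel_sum are assumptions for illustration, and error handling is reduced to abort() to keep the sketch short.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS 4

struct worker {
    pthread_barrier_t *barrier;
    int  id;
    long partial;               /* this worker's share of the result */
};

/* Sum one slice of 100 integers, then wait for everyone else. */
static void *worker_main(void *arg)
{
    struct worker *w = arg;
    long lo = (long)w->id * 100 + 1;

    for (long n = lo; n < lo + 100; n++)
        w->partial += n;

    pthread_barrier_wait(w->barrier);
    return NULL;
}

/* Start the workers, meet them at the barrier, then aggregate. */
static long parallel_sum(void)
{
    pthread_barrier_t barrier;
    pthread_t tid[NUM_WORKERS];
    struct worker w[NUM_WORKERS];
    long total = 0;

    /* NUM_WORKERS threads plus the primary thread meet at the barrier. */
    if (pthread_barrier_init(&barrier, NULL, NUM_WORKERS + 1) != 0)
        abort();

    for (int i = 0; i < NUM_WORKERS; i++) {
        w[i] = (struct worker){ &barrier, i, 0 };
        if (pthread_create(&tid[i], NULL, worker_main, &w[i]) != 0)
            abort();
    }

    /* Returns only after every worker has finished its slice. */
    pthread_barrier_wait(&barrier);
    for (int i = 0; i < NUM_WORKERS; i++)
        total += w[i].partial;

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tid[i], NULL);
    pthread_barrier_destroy(&barrier);
    return total;
}
```

No lock is needed around the partial sums because each worker writes only its own slot, and the primary thread reads the slots only after the barrier has released. On Linux this would typically be compiled with the -pthread option.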

Using fine-grained locks and keeping critical sections short increases concurrency, throughput, and execution efficiency when designing multithreaded and multiprocess software programs. Multithreaded and multiprocess programs that do not correctly utilize synchronization primitives often require countless hours of debugging. The use of semaphores, mutex locks, and other synchronization primitives should be kept to the minimum that correctness requires in computer programs that share resources between multiple threads or processes. Proper program design allows schedulable entities to run in parallel or concurrently with high throughput and minimum resource contention. This is optimal for solving computationally complex problems on preemptive, time-sliced, multi-user operating systems without requiring hard, real-time scheduling.
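One way to picture lock granularity is a table in which each bucket carries its own mutex, so threads working on different buckets never wait on each other. The sketch below is an assumption-laden illustration, not a benchmark: the table size, iteration count, and the names bucket, bump_bucket, and run_striped are all hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define NUM_BUCKETS 4
#define INCREMENTS  10000

/* One lock per bucket instead of a single lock for the whole table. */
struct bucket {
    pthread_mutex_t lock;
    long count;
};

static struct bucket table[NUM_BUCKETS];

static void *bump_bucket(void *arg)
{
    struct bucket *b = arg;

    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&b->lock);   /* critical section is one increment */
        b->count++;
        pthread_mutex_unlock(&b->lock);
    }
    return NULL;
}

/* Run two threads per bucket and return the sum of all counts. */
static long run_striped(void)
{
    pthread_t tid[NUM_BUCKETS * 2];
    long total = 0;

    for (int i = 0; i < NUM_BUCKETS; i++) {
        pthread_mutex_init(&table[i].lock, NULL);
        table[i].count = 0;
    }
    for (int i = 0; i < NUM_BUCKETS * 2; i++)
        if (pthread_create(&tid[i], NULL, bump_bucket,
                           &table[i % NUM_BUCKETS]) != 0)
            abort();
    for (int i = 0; i < NUM_BUCKETS * 2; i++)
        pthread_join(tid[i], NULL);

    for (int i = 0; i < NUM_BUCKETS; i++) {
        total += table[i].count;
        pthread_mutex_destroy(&table[i].lock);
    }
    return total;
}
```

Only the two threads that share a bucket ever contend for the same mutex. With a single table-wide lock, all eight threads would serialize on every increment.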

Wednesday 24 February 2021

A hardware design for variable output frequency using an n-bit counter

The DE1-SoC from Terasic is an excellent board for hardware design and prototyping. The following VHDL process is from a hardware design created for the Terasic DE1-SoC FPGA. The ten switches and four buttons on the FPGA are used as an n-bit counter with an adjustable multiplier to increase the output frequency of one or more output pins at a 50% duty cycle.

As the switches are moved or the buttons are pressed, the seven-segment display is updated to reflect the numeric output frequency, and the output pin(s) are driven at the desired frequency. The onboard clock runs at 50MHz, and the signal on the output pins is set on the rising edge of the clock input signal (positive edge-triggered). At 50MHz there are 50 million rising edges per second, so an output pin can be toggled at most 50 million times per second, which yields a square wave of at most 25MHz. An LED attached to one of the output pins would then blink 25 million times per second, far too fast for the human eye to recognize. The persistence of vision, which is the time the human eye retains an image after it disappears from view, is approximately 1/16th of a second. Therefore, an LED blinking at 25 million times per second would appear as a continuous light to the human eye.

scaler <= compute_prescaler((to_integer(unsigned( SW )))*scaler_mlt);
gpiopulse_process : process(CLOCK_50, KEY(0))
begin
  if (KEY(0) = '0') then -- async reset
    count <= 0;
  elsif rising_edge(CLOCK_50) then
    if (count = scaler - 1) then
      state <= not state;
      count <= 0;
    elsif (count = clk50divider) then -- auto reset
      count <= 0;
    else
      count <= count + 1;
    end if;
  end if;
end process gpiopulse_process;

The scaler signal is calculated using the compute_prescaler function. The switch vector (SW) is first converted to an integer with to_integer(unsigned(SW)), the result is multiplied by a multiplier (scaler_mlt), and the product is passed to compute_prescaler. This scaler signal is used to control the frequency of the pulse signal generated on the output pin.

The gpiopulse_process process is sensitive to the CLOCK_50 signal and the push-button KEY(0). It includes an asynchronous reset that clears count when KEY(0) is pressed (the reset condition is KEY(0) = '0'); otherwise, count is updated on each rising edge of CLOCK_50.

The count signal is incremented on each rising edge of the CLOCK_50 signal until it reaches the value of scaler - 1. When this happens, the state signal is inverted and count is reset to 0. If count reaches the value of clk50divider, it is also reset to 0.
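The counter behavior can be modeled in C to check the arithmetic. Because state toggles once every scaler clock cycles, a full output period takes 2 x scaler cycles, so the output frequency is 50MHz / (2 x scaler). The sketch below mirrors the branches of the process; the value used for clk50divider is an assumption (50,000,000), since its definition is not shown in the post, and the function names are hypothetical.

```c
#define CLK_HZ 50000000L

/* Assumed value for clk50divider; its definition is not shown in the post. */
#define CLK50_DIVIDER CLK_HZ

/* Count how many times the output toggles over n_cycles rising edges,
 * following the same branch order as the VHDL process. */
static long count_toggles(long scaler, long n_cycles)
{
    long count = 0;
    long toggles = 0;

    for (long i = 0; i < n_cycles; i++) {
        if (count == scaler - 1) {
            toggles++;              /* state <= not state */
            count = 0;
        } else if (count == CLK50_DIVIDER) {
            count = 0;              /* auto reset */
        } else {
            count++;
        }
    }
    return toggles;
}

/* Output frequency in Hz for a given scaler: two toggles per period. */
static double output_hz(long scaler)
{
    return (double)CLK_HZ / (2.0 * (double)scaler);
}
```

With a scaler of 5, fifty clock edges produce ten toggles. A scaler of 1 toggles on every edge and gives the 25MHz maximum, and a scaler of 25,000,000 gives 1Hz.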

Overall, this code generates a pulse signal with a frequency controlled by the value of a switch and a multiplier, which is generated on a specific output pin of the FPGA board. The pulse signal is toggled between two states at a frequency determined by the scaler signal.

It is important to note that concurrent statements within an architecture execute concurrently: each one is re-evaluated whenever one of its input signals changes, in no particular order. However, the sequential statements within a process are executed in order, one at a time. Processes themselves are executed concurrently with other processes, and each process has its own execution context.

Tuesday 25 August 2020

Creating stronger keys for OpenSSH and GPG

Create an Ed25519 SSH key pair (supported in OpenSSH 6.5+). The parameters are as follows:

-o save the private key in the new OpenSSH format (Ed25519 keys always use this format, so the flag is optional here)
-a 128 for 128 KDF (key derivation function) rounds
-t ed25519 for the type of key
ssh-keygen -o -a 128 -t ed25519 -f .ssh/ed25519-$(date '+%m-%d-%Y') -C ed25519-$(date '+%m-%d-%Y')

Create an Ed448-Goldilocks GPG master key and subkeys (Ed448 and X448 require GnuPG 2.3 or later). The master key fingerprint is captured in $fpr so that the subkey commands can reference it.
# uid="ed448-master-key-$(date '+%m-%d-%Y')"
# gpg --quick-generate-key "$uid" ed448 sign 0
# fpr=$(gpg --list-keys --with-colons "$uid" | awk -F: '/^fpr/ { print $10; exit }')
# gpg --quick-add-key "$fpr" cv448 encr 2y
# gpg --quick-add-key "$fpr" ed448 auth 2y
# gpg --quick-add-key "$fpr" ed448 sign 2y

Sunday 2 September 2018

96Boards - JTAG and serial UART configuration for ARM powered, single-board computers

The 96Boards CE specification calls for an optional JTAG connection. The specification also indicates that the optional JTAG connection shall use a 10-pin, through-hole, 0.05" (1.27mm) pitch JTAG connector. The part is readily available on most electronics sites. Breaking out the pins with long wires and heat-shrinking them is ideal for making sure that each connection is labeled and separate when connecting to a JTAG debugger. While a JTAG connection is not required for flashing or loading the bootloaders onto the board, the JTAG connection is useful for advanced chip-level debugging. The serial UART connection is sufficient for loading release or debug versions of bl0, bl1, bl2, bl31, bl32, the kernel, and userspace.

Last but not least, ARM-powered boards with 12V power input often require external fans to keep the board cool. As seen in the photos below, two 5V fans were powered from an external power supply. Any work on microcontroller boards should be performed on a grounded surface. Proper grounding procedures should always be followed, as most microcontroller boards contain ESD-sensitive components.

In the photos below, a 96Boards SBC is mounted on an IP65, ABS plastic junction box for durability. The pins are extended and mounted with screws underneath the junction box. The electrical conduit holes on the side of the junction box are ideal for holding small project fans. The remaining electrical conduit holes provide a clean path for routing the remaining wires from the board: micro USB, USB-C, and 12V power.


Thursday 7 June 2018

HiKey 960 Linux Bridged Firewall

The Kirin 960 SoC and on-board USB 3.0 make the HiKey 960 SBC an ideal platform for running a Linux bridged firewall. The number of single-board computers with an SoC as powerful as the HiSilicon Kirin 960 is limited.

When compared with the Raspberry Pi series of single board computers (SBC), the HiKey 960 SBC is significantly more powerful. The Kirin 960 also stands above the ARM powered SoCs which reside in most commercial routers.

USB 3.0 makes the HiKey 960 board an attractive option for bridging or routing, filtering network traffic, or connecting to an external gateway via IPSec. Both network traffic filtering and IPSec tunneling can be computationally expensive operations. However, the multicore Kirin 960 is well suited for these types of tasks.

In order to be able to run an IPSec client tunnel and a Linux Bridged firewall connected over 1G ethernet links, certain kernel configuration modifications are needed. Furthermore, the Android Linux kernel for the HiKey 960 board does not boot on a standard Linux root filesystem because it is designed to boot an Android customized rootfs.

The latest googlesource Linux kernel (hikey-linaro-4.9) for Android (designed to boot Android on the HiKey 960 board) has been customized to remove the Android specific components so that the kernel boots on a standard Linux root filesystem, with the proper drivers enabled for network connectivity via attached 1000Mb/s USB 3.0 to ethernet adapters. The standard UART interface on the board should be used for serial connectivity and shell access. WiFi and Bluetooth have been removed from the kernel configuration. The kernel should be booted off of a microSDHC UHS-I card. The 96boards instructions should be followed for configuring the HiKey 960 board, setting the jumpers on the board, building and flashing the l-loader, firmware package, partition tables, UEFI loader, ARM Trusted Firmware, and optional Op-TEE. Links for the normal Linux kernel configuration, multi-interface bridge configuration, and single interface IPSec configuration are below. Additional kernel config modifications may be needed for certain types of applications.

kernel build instructions


mkdir /usr/local/toolchains
cd /usr/local/toolchains/
wget https://releases.linaro.org/components/toolchain/binaries/latest/aarch64-linux-gnu/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu.tar.xz
tar -xJf gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu.tar.xz
export ARCH=arm64
export CROSS_COMPILE=/usr/local/toolchains/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin/aarch64-linux-gnu-
export PATH=/usr/local/toolchains/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu/bin:$PATH
cd /usr/local/src
git clone https://android.googlesource.com/kernel/hikey-linaro
cd hikey-linaro
git checkout -b android-hikey-linaro-4.9 origin/android-hikey-linaro-4.9
make hikey960_defconfig
make -j8

multi-interface bridge configuration 

Bridged configuration with no IP addresses on the dual NIC interfaces (a crossover cable is useful for testing). The bridge interface obtains a DHCP address (/11) from the WLAN router. An aliased interface is added to br0 and assigned a private IP address on a different subnet (/8). Spanning tree is enabled on the bridge interface. A basic ebtables and iptables ruleset is below.

brctl addbr <br>
brctl addif <br> <eth1> <eth2>
ifconfig <br> up
ifconfig <eth1> up
ifconfig <eth2> up
brctl stp <br> yes
dhclient <br>
ifconfig <br>:0 <a.b.c.d/sn> up

iptables --table nat --append POSTROUTING --out-interface <br> -j MASQUERADE
iptables -P INPUT DROP
iptables --append FORWARD --in-interface <br>:0 -j ACCEPT
ebtables -P FORWARD DROP
ebtables -P INPUT DROP
ebtables -P OUTPUT DROP
ebtables -t filter -A FORWARD -p IPv4 -j ACCEPT
ebtables -t filter -A INPUT -p IPv4 -j ACCEPT
ebtables -t filter -A OUTPUT -p IPv4 -j ACCEPT
ebtables -t filter -A INPUT -p ARP -j ACCEPT
ebtables -t filter -A OUTPUT -p ARP -j ACCEPT
ebtables -t filter -A FORWARD -p ARP -j DROP
ebtables -t filter -A FORWARD -p IPv6 -j DROP
ebtables -t filter -A FORWARD -d Multicast -j DROP
ebtables -t filter -A FORWARD -p X25 -j DROP
ebtables -t filter -A FORWARD -p FR_ARP -j DROP
ebtables -t filter -A FORWARD -p BPQ -j DROP
ebtables -t filter -A FORWARD -p DEC -j DROP
ebtables -t filter -A FORWARD -p DNA_DL -j DROP
ebtables -t filter -A FORWARD -p DNA_RC -j DROP
ebtables -t filter -A FORWARD -p LAT -j DROP
ebtables -t filter -A FORWARD -p DIAG -j DROP
ebtables -t filter -A FORWARD -p CUST -j DROP
ebtables -t filter -A FORWARD -p SCA -j DROP
ebtables -t filter -A FORWARD -p TEB -j DROP
ebtables -t filter -A FORWARD -p RAW_FR -j DROP
ebtables -t filter -A FORWARD -p AARP -j DROP
ebtables -t filter -A FORWARD -p ATALK -j DROP
ebtables -t filter -A FORWARD -p 802_1Q -j DROP
ebtables -t filter -A FORWARD -p IPX -j DROP
ebtables -t filter -A FORWARD -p NetBEUI -j DROP
ebtables -t filter -A FORWARD -p PPP -j DROP
ebtables -t filter -A FORWARD -p ATMMPOA -j DROP
ebtables -t filter -A FORWARD -p PPP_DISC -j DROP
ebtables -t filter -A FORWARD -p PPP_SES -j DROP
ebtables -t filter -A FORWARD -p ATMFATE -j DROP
ebtables -t filter -A FORWARD -p LOOP -j DROP
ebtables -t filter -A FORWARD --log-level info --log-ip --log-prefix FFWLOG
ebtables -t filter -A OUTPUT --log-level info --log-ip --log-arp --log-prefix OFWLOG -j DROP
ebtables -t filter -A INPUT --log-level info --log-ip --log-prefix IFWLOG

single-interface ipsec gateway configuration


iptables -t nat -A POSTROUTING -s <clientip>/32 -o <eth> -j SNAT --to-source <virtualip>
iptables -t nat -A POSTROUTING -s <clientip>/32 -o <eth> -m policy --dir out --pol ipsec -j ACCEPT

Thursday 1 February 2018

A Hardware Design for XOR gates using sequential logic in VHDL



ModelSim Full Window view with wave form output of xor simulation. ModelSim-Intel FPGA Starter Edition © Intel


XOR logic gates are a fundamental component in cryptography, and many of the typical stream and block ciphers use XOR gates. Two of these ciphers are ChaCha (stream cipher) and AES (block cipher). RSA, by contrast, is an asymmetric public-key algorithm built on modular exponentiation rather than a block cipher.

While many compiled and interpreted languages support bitwise operations such as XOR, the software implementation of both block and stream ciphers is computationally inefficient compared to FPGA and ASIC implementations.
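For reference, the software form of the operation is short. The sketch below XORs a buffer with a repeating key and relies on the property that applying the same key twice restores the original data, which is the property stream ciphers build on. It is an illustration of the bitwise operation only and is not a secure cipher; the function name xor_bytes is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* XOR each byte of data with a repeating key, in place.
 * Illustration only: a short repeating key provides no real security. */
static void xor_bytes(uint8_t *data, size_t len,
                      const uint8_t *key, size_t key_len)
{
    if (key_len == 0)
        return;

    for (size_t i = 0; i < len; i++)
        data[i] ^= key[i % key_len];
}
```

Calling xor_bytes twice with the same key returns the buffer to its starting contents, since (a xor k) xor k equals a for every bit.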

Hybrid FPGA boards integrate FPGAs with multicore ARM and Intel application processors over high-speed buses. The ARM and Intel processors are general-purpose processors. On a hybrid board, the ARM or Intel processor is termed the hard processor system or HPS. Writing to the FPGA from the HPS is typically performed via C from an embedded Linux build (yocto or buildroot) running on the ARM or Intel core. A simple bitstream can also be loaded into the FPGA fabric without using any ARM design blocks or functionality in the ARM core for a hybrid ARM configuration.

The following is a simple hardware design written in VHDL and simulated in ModelSim. The image contains the waveform output of a simulation in ModelSim. The HPS is not used. On boot, the bitstream is loaded into the FPGA fabric. VHDL components are utilized, and a testbench is defined for testing the design. The entity and architecture VHDL design units are below.
-- three input xnor gate entity declaration - external interface to design entity
entity xnorgate is
port (
a,b,c : in std_logic;
q : out std_logic);
end xnorgate;

architecture xng of xnorgate is
begin
q <= a xnor b xnor c; -- evaluated as (a xnor b) xnor c, which equals a xor b xor c
end xng;

-- chain of xor / xnor gates using components and sequential logic
entity xorchain is
port (
A,B,C,D,E,F : in std_logic;
Av,Bv : in std_logic_vector(31 downto 0);
CLOCK_50 : in std_logic;
Q : out std_logic;
Qv : out std_logic_vector(31 downto 0));
end xorchain;

architecture rtl of xorchain is
component xorgate is
port (
a,b : in std_logic;
q : out std_logic);
end component;

component xnorgate is
port (
a,b,c : in std_logic;
q : out std_logic);
end component;

component xorsgate is
port (
av : in std_logic_vector(31 downto 0);
bv : in std_logic_vector(31 downto 0);
qv : out std_logic_vector(31 downto 0));
end component;

signal a_in, b_in, c_in, d_in, e_in, f_in : std_logic;
signal av_in, bv_in : std_logic_vector(31 downto 0);

signal conn1, conn2, conn3 : std_logic;

begin
xorgt1 : xorgate port map(a => a_in, b => b_in, q => conn1);
xorgt2 : xorgate port map(a => c_in, b => d_in, q => conn2);
xorgt3 : xorgate port map(a => e_in, b => f_in, q => conn3);
xnorgt1 : xnorgate port map(conn1, conn2, conn3, Q);
xorsgt1 : xorsgate port map(av => av_in, bv => bv_in, qv => Qv);

process(CLOCK_50)
begin
if rising_edge(CLOCK_50) then --assign inputs on rising clock edge
a_in <= A;
b_in <= B;
c_in <= C;
d_in <= D;
e_in <= E;
f_in <= F;
av_in(31 downto 0) <= Av(31 downto 0);
bv_in(31 downto 0) <= Bv(31 downto 0);
end if;
end process;
end rtl;

entity xorchain_tb is
end xorchain_tb;

architecture xorchain_tb_arch of xorchain_tb is
signal A_in,B_in,C_in,D_in,E_in,F_in : std_logic := '0';
signal Av_in : std_logic_vector(31 downto 0);
signal Bv_in : std_logic_vector(31 downto 0);
signal CLOCK_50_in : std_logic;
signal BRK : boolean := FALSE;
signal Q_out : std_logic;
signal Qv_out : std_logic_vector(31 downto 0);

component xorchain
port (
A,B,C,D,E,F : in std_logic;
Av : in std_logic_vector(31 downto 0);
Bv : in std_logic_vector(31 downto 0);
CLOCK_50 : in std_logic;
Q : out std_logic;
Qv : out std_logic_vector(31 downto 0));
end component;

begin
xorchain_instance: xorchain port map (A => A_in,B => B_in, C => C_in,
D => D_in, E => E_in, F => F_in, Av => Av_in,
Bv => Bv_in, CLOCK_50 => CLOCK_50_in, Q => Q_out,
Qv => Qv_out);
clockprocess: process
begin
while not BRK loop
CLOCK_50_in <= '0';
wait for 20 ns;
CLOCK_50_in <= '1';
wait for 20 ns;
end loop;
wait;
end process clockprocess;

testprocess : process
begin
A_in <= '1';
B_in <= '0';
C_in <= '1';
D_in <= '0';
E_in <= '1';
F_in <= '1';
wait for 40 ns;
A_in <= '1';
B_in <= '0';
C_in <= '1';
D_in <= '0';
E_in <= '1';
F_in <= '0';
wait for 20 ns;
A_in <= '0';
B_in <= '0';
C_in <= '1';
D_in <= '0';
E_in <= '1';
F_in <= '0';
wait for 40 ns;
BRK <= TRUE;
wait;
end process testprocess;
end xorchain_tb_arch;

entity xorgate is
port (
a,b : in std_logic;
q : out std_logic);
end xorgate;

architecture xg of xorgate is
begin
q <= a xor b;
end xg;

entity xorsgate is
port (
av : in std_logic_vector(31 downto 0);
bv : in std_logic_vector(31 downto 0);
qv : out std_logic_vector(31 downto 0));
end xorsgate;

architecture xsg of xorsgate is
begin
qv <= av xor bv;
end xsg;

Friday 16 September 2016

Implementing Software-defined radio and Infrared Time-lapse Imaging with Tensorflow on a custom Linux distribution for the Raspberry Pi 3

GNURadio Companion Qt Gui Frequency Sync - multiple FIR filter taps
sample running on Raspberry Pi 3 custom Linux distribution

The Raspberry Pi 3 is powered by the ARM Cortex-A53 processor. This 1.2GHz 64-bit quad-core processor fully supports the ARMv8-A architecture. For this project, a custom Linux distribution was created for the Raspberry Pi 3.  

The custom Linux distribution includes support for GNURadio, several FPGA and ARM Powered SDR devices, D-STAR (hotspot, repeater, and dongle support), hsuart, libusb, hardware real-time clock support, Sony 14 megapixel NoIR image sensor, HDMI and 3.5mm audio, USB Microphone input, X-windows with Xfce, Lighttpd and PHP, Bluetooth, WiFi, SSH, TCPDump, Docker, Docker registry, MySQL, Perl, Python, QT, GTK, IPTables, x11vnc, SELinux, and full native-toolchain development support.

The Sony 14 megapixel image sensor with the infrared filter removed can be connected to the Raspberry Pi 3's MIPI camera serial interface. Image capture and recognition can then be performed over contiguous periods of time, and time-lapsed video can be created from the images. With support for Tensorflow and OpenCV, object recognition within images can be performed.

D-STAR hotspot with time-lapsed infrared imaging.


For the initial run, an infrared time-lapse video was created from an image capture run of one 3280x2460 infrared JPEG image captured every 15 seconds for three hours. Forty 5mm, 940nm LEDs, drawing 500mA at 12V DC, provided infrared illumination at the 940nm wavelength.
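The capture schedule translates directly into a frame count and a playback length. The helpers below are a small worked example; the 24 frames-per-second playback rate used in the note that follows is an assumption, since the post does not state the frame rate of the final video, and the function names are hypothetical.

```c
/* Frames captured when one frame is taken every interval_s seconds
 * over a run lasting duration_s seconds. */
static long frame_count(long duration_s, long interval_s)
{
    return interval_s > 0 ? duration_s / interval_s : 0;
}

/* Playback length in seconds for a given number of frames and frame rate. */
static double playback_seconds(long frames, double fps)
{
    return fps > 0.0 ? (double)frames / fps : 0.0;
}
```

Three hours at one frame every 15 seconds is 10,800 / 15 = 720 frames. At an assumed 24 frames per second, that would play back in 30 seconds.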

Tensorflow ran in the background (on v4l2 kmod) and provided continuous object recognition and scoring within each image via a sample model. Finally, OpenCV was also installed in the root file system.

The time-lapse infrared video of the living room was captured using the above setup. Below this image are images of Tensorflow running in a terminal in the background on the Raspberry Pi 3 and recognizing/scoring objects in the living room.

Tensorflow running on the Raspberry Pi 3 and continuously capturing frames from the image sensor and scoring objects



 

GNURadio Companion running on xfce on the Raspberry Pi 3

Tuesday 16 August 2016

Profiling Multiprocess C programs with ARM DS-5 Streamline

The ARM DS-5 Streamline Performance Analyzer is a powerful tool for debugging, profiling, and analyzing multithreaded and multiprocess C programs.  Instructions can easily be traced between load and store operations.  Per process and per thread function call paths can be broken down by system utilization percentage.  Branch mispredictions and multi-level CPU caches can be analyzed. Furthermore, disk I/O usage, stack and heap usage, and a number of other useful metrics can quickly be referenced within the debugger. These are just a few of its capabilities.

In order to capture meaningful information from the DS-5 Streamline Performance Analyzer tool, a Linux, multiprocess, C program was modified to insert 1000 packets into a packet processing simulation buffer.  A code excerpt from the program is below.  The child processes were modified to sleep and then wake 1000 times in order to simulate process activity.  The program was analyzed using the DS-5 Streamline Performance Analyzer tool.  There are two screenshots below the code excerpt where the program is loaded into the DS-5 Streamline Performance Analyzer.

void *insertpackets(void *arg) {

    struct pktbuf *pkbuf;
    struct packet *pkt;
    int idx;

    if(arg != NULL) {

        pkbuf = (struct pktbuf *)arg;

        /* seed random number generator */
        ...

        /* insert 1000 packets into the packet buffer */
        for(idx = 0; idx < 1000; ++idx) {

            pkt = (struct packet *)malloc(sizeof(struct packet));

            if(pkt != NULL) {

                /* set the packet processing simulation multiplier (result modulo 3) */
                pkt->mlt=...()%3;

                /* insert packet in the packet buffer */
                if(pkt_queue(pkbuf,pkt) != 0) {

...
...
...
...
...
...

int fcnb(time_t secs, long nsecs) {

    struct timespec rqtp;
    struct timespec rmtp;
    int ret;
    int idx;

    rqtp.tv_sec = secs;
    rqtp.tv_nsec = nsecs;

    /* sleep and wake 1000 times to simulate process activity */
    for(idx = 0; idx < 1000; idx++) {

        ret = nanosleep(&rqtp, &rmtp);

...
...
...

 
ARM DS-5 Streamline - Profiling the process creation application

ARM DS-5 Streamline - Code View with C code in the top window
and ARM assembly instructions in the bottom window

https://github.com/brhinton/de0-nano-soc/blob/main/run.c
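The excerpt above elides what happens after nanosleep returns. One common way to handle its return value, shown here as a sketch rather than as the code in the linked file, is to resume the sleep with the remaining time when a signal interrupts it. The function name sleep_fully is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Sleep for the full interval, resuming if a signal interrupts nanosleep().
 * Returns 0 on success or -1 on any error other than EINTR. */
static int sleep_fully(time_t secs, long nsecs)
{
    struct timespec req = { .tv_sec = secs, .tv_nsec = nsecs };
    struct timespec rem;

    while (nanosleep(&req, &rem) != 0) {
        if (errno != EINTR)
            return -1;      /* e.g. EINVAL for an out-of-range tv_nsec */
        req = rem;          /* continue with the time that was left */
    }
    return 0;
}
```

A call such as sleep_fully(0, 1000000) sleeps for one millisecond. Passing a nanosecond value outside the range 0 to 999,999,999 makes nanosleep fail with EINVAL, so the helper returns -1.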

Thursday 30 June 2016

VHDL Processes for Pulsing Multiple GPIO Pins at Different Frequencies on Altera FPGA

 
DE1-SoC GPIO Pins connected to 780nm Infrared Laser Diodes, 660nm Red Laser Diodes, and Oscilloscope

The following VHDL processes pulse the GPIO pins at different frequencies on the Altera DE1-SoC using multiple Phase-Locked Loops (PLLs). Several diodes were connected to the GPIO banks and pulsed at a 50% duty cycle, drawing 16mA at 3.3V. Each GPIO bank on the DE1-SoC has 36 pins. Pin 1 is pulsed at 20Hz from GPIO bank 0, and pins 0 and 1 are pulsed at 30Hz from GPIO bank 1. A direct-mode PLL with locked output was configured using the Altera Quartus Prime MegaWizard. The PLL reference clock frequency is set to 50MHz, the output clock frequency is set to 50MHz, and the duty cycle is set to 50%. The pin mappings for GPIO banks 0 and 1 are documented in the DE1-SoC datasheet.

Pulsed Laser Diodes via GPIO pins on DE1-SoC FPGA

-- ---------------------
-- CLOCK A AND B PROCESSES --
-- INPUT: direct mode pll with locked output
-- and reference clock frequency set to 50MHz,
-- output clock frequency set to 50MHz with 50% duty
-- cycle and output frequency scaled by freq divider constant
-- -----------------------------------------------------------
clk_a_process : process (lkd_pll_clk_a)
begin
    if rising_edge(lkd_pll_clk_a) then
        if (cycle_ctr_a < FREQ_A_DIVIDER) then
            cycle_ctr_a <= cycle_ctr_a + 1;
        else
            cycle_ctr_a <= 0;
        end if;
    end if;
end process clk_a_process;

clk_b_process : process (lkd_pll_clk_b)
begin
    if rising_edge(lkd_pll_clk_b) then
        if (cycle_ctr_b < FREQ_B_DIVIDER) then
            cycle_ctr_b <= cycle_ctr_b + 1;
        else
            cycle_ctr_b <= 0;
        end if;
    end if;
end process clk_b_process;

-- ---------------------
-- GPIO A AND B PROCESSES --
-- INPUT: direct mode pll with locked output
-- -------------------------------------------------------
gpio_a_process : process (lkd_pll_clk_a)
begin
    if rising_edge(lkd_pll_clk_a) then
        if (cycle_ctr_a = 0) then
            gpio_sig_0 <= NOT gpio_sig_0;
        end if;
    end if;
end process gpio_a_process;

gpio_b_process : process (lkd_pll_clk_b)
begin
    if rising_edge(lkd_pll_clk_b) then
        if (cycle_ctr_b = 0) then
            gpio_sig_1 <= NOT gpio_sig_1;
        end if;
    end if;
end process gpio_b_process;

GPIO_0 <= gpio_sig_0;
GPIO_1 <= gpio_sig_1;
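
The values of FREQ_A_DIVIDER and FREQ_B_DIVIDER are not listed in this post. Reading the processes above, each counter runs from 0 up to the divider constant and then wraps, and the GPIO signal toggles once per wrap, so one full output period spans 2 * (divider + 1) clock cycles. The C sketch below works through that arithmetic for the 50MHz PLL output; the function names are hypothetical and the result is a derivation from the listing, not the constants used in the original design.

```c
#include <assert.h>
#include <stdint.h>

/* Divider constant such that toggling once every (divider + 1) clock cycles
 * yields approximately f_out_hz. Integer division truncates, so the result
 * is exact only when 2 * f_out_hz divides f_clk_hz evenly. */
uint32_t freq_divider(uint32_t f_clk_hz, uint32_t f_out_hz) {
    assert(f_out_hz > 0 && f_clk_hz >= 2u * f_out_hz);
    return f_clk_hz / (2u * f_out_hz) - 1u;
}

/* Output frequency actually produced by a given divider constant. */
double output_hz(uint32_t f_clk_hz, uint32_t divider) {
    return (double)f_clk_hz / (2.0 * ((double)divider + 1.0));
}
```

For a 50MHz clock, freq_divider(50000000, 20) gives 1249999 and freq_divider(50000000, 30) gives 833332. The 30Hz case comes out very slightly above 30Hz because 60 does not divide 50,000,000 evenly.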

Thursday 2 June 2016

FPGA Audio Processing with the Cyclone V Dual-Core ARM Cortex-A9

The DE1-SoC FPGA Development board from Terasic is powered by an integrated Altera Cyclone V FPGA and ARM MPCore Cortex-A9 processor. The FPGA and ARM core are connected by a high-speed interconnect fabric. Linux can be booted on the ARM core and the FPGA and ARM core can communicate.

The DE1-SoC board below has been programmed via Quartus Prime running on Fedora 23, 64-bit Linux. The FPGA bitstream was compiled from the Terasic Audio codec design reference. After the bitstream was loaded onto the FPGA over the USB Blaster II interface, the NIOS II command shell was used to load the NIOS II software image onto the chip. A menu-driven debug interface is running from a terminal on the host via the NIOS II shell with the target connected over the USB Blaster II interface.

A low-level hardware abstraction layer was programmed in C to configure the on-board audio codec chip. The NIOS II software image is stored in on-chip memory and a PLL-driven clock signal is fed into the audio chip. The Verilog code for the hardware design was generated from Qsys. The design supports configurable sample rates, mic in, and line in/out.

Additional components are connected to the DE1-SoC board in this photo. The Linear DC934A (LTC2607) DAC is connected to the DE1-SoC and an oscilloscope is connected to the ground and vref pins on the DAC.

The DC934A features an LTC2607 16-bit dual DAC with an I2C interface and an LTC2422 2-channel, 20-bit micropower, no-latency delta-sigma ADC.

3.5mm audio cables are connected to the mic in and line out ports. The DE1-SoC is connected to an external display over VGA so that a local console can be managed via a connected keyboard and mouse when Linux is booted from uSD.

With GPIO pins accessible via the GPIO 0 and 1 breakouts, external LEDs can be pulsed directly from the Hard Processor System (HPS), FPGA, or the FPGA via the HPS.

Sunday 8 November 2015

Configuring the Altera Cyclone V FPGA SoC Boot loader on a DE0-Nano-SoC board

Understanding the boot loader on a computer system is arguably one of the most important aspects of securing it. Most computer systems have multiple boot loaders that run in sequence immediately after a power reset is applied to the processor. This applies to embedded, desktop, and server systems.

The Altera Cyclone V SoC has an FPGA and a Hard Processor System (HPS) woven into a single processor package. The HPS is a dual-core ARM Cortex-A9. Building everything from scratch is the best way to figure out how the system works.

The boot sequence on a Cyclone V HPS works like this:

The on-chip ROM (for which source code is not provided) loads the preloader (first-stage boot loader). The preloader then loads U-Boot. U-Boot then loads the kernel and root file system.

According to the Cyclone V boot guide, there are two well thought out options for the preloader. The two options are licensed differently depending on how the source code is built: one under a BSD license and the other, built with U-Boot, under GPL v2.

Building a preloader image for the DE0-Nano-SoC board was straightforward. Altera provides the bsp-editor utility for customizing the preloader configuration and generating the BSP HPS preloader source code, after which make is used to build the sources using the Mentor ARM cross toolchain.
The preloader settings directory can be found on the DE0-Nano-SoC CD in the DE0_NANO_SOC_GHRD subdirectory.



The preloader load address can be set via the bsp-editor so that the on-chip ROM loads the preloader either from absolute address zero on the SD card or from a partition with ID equal to A2 on the SD card. These are the options for booting from the SD card.
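
As an illustration of the second option, the sketch below scans the four primary entries of a standard MBR partition table for a partition whose type byte is 0xA2. It is a host-side check one might run against an SD card image, not the on-chip ROM's logic (for which no source is available). The function name find_preloader_lba is hypothetical, and the sketch assumes a conventional 512-byte MBR layout with the partition table at offset 446 and the 0x55 0xAA signature at offset 510.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MBR_SIZE         512
#define MBR_TABLE_OFFSET 446   /* start of the four 16-byte partition entries */
#define MBR_ENTRY_SIZE   16
#define PRELOADER_TYPE   0xA2  /* partition ID mentioned above */

/* Returns the starting LBA of the first primary partition with type 0xA2,
 * or -1 if the MBR signature is missing or no such partition exists. */
int64_t find_preloader_lba(const uint8_t mbr[MBR_SIZE]) {
    size_t i;

    assert(mbr != NULL);

    /* a valid MBR ends with the boot signature 0x55 0xAA */
    if (mbr[510] != 0x55 || mbr[511] != 0xAA) {
        return -1;
    }

    for (i = 0; i < 4; i++) {
        const uint8_t *entry = mbr + MBR_TABLE_OFFSET + i * MBR_ENTRY_SIZE;

        /* byte 4 of an entry is the partition type */
        if (entry[4] == PRELOADER_TYPE) {
            /* bytes 8..11 hold the starting LBA, little-endian */
            uint32_t lba = (uint32_t)entry[8]
                         | ((uint32_t)entry[9] << 8)
                         | ((uint32_t)entry[10] << 16)
                         | ((uint32_t)entry[11] << 24);
            return (int64_t)lba;
        }
    }
    return -1;
}
```

To use it, read the first 512 bytes of the card or image into a buffer and pass the buffer to find_preloader_lba. A return value of -1 means no A2 partition was found in the primary table.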


After the sources are generated and the preloader image is built using the Makefile, U-Boot must be compiled. An Altera port of U-Boot is available on GitHub for the Cyclone V FPGA SoC. U-Boot is built using the Linaro ARM cross toolchain.

There's quite a bit that can be done with the Cyclone V FPGA SoC boot configuration. FPGA images can be loaded from U-Boot. The jumpers on the board can be configured to boot from the on-board serial flash (QSPI), bare metal applications can be loaded from the preloader, the FPGA can be configured from serial flash, and the list goes on. The HPS SoC Boot Guide for the Cyclone V SoC is a valuable reference and contains all of the boot configuration information.

Wednesday 28 October 2015

The "Three Fives" Discrete 555 Timer Kit

The NE555 timer IC is a classic and widely used component in electronic circuits, so building a transistor-scale replica of it is a great way to understand how it works at a fundamental level. It's also a good way to develop your soldering skills and learn how to use an oscilloscope to measure signals in a circuit.

I picked up a "Three Fives" Discrete Timer Kit this weekend. As it turns out, the kit was well worth the money. The "Three Fives" Discrete Timer Kit is a transistor-scale replica of the NE555 timer IC. The printed circuit board (PCB) is high quality, and soldering the transistors and resistors was a lot of fun. Thanks to Eric Schlaepfer and Evil Mad Scientist Labs for this high-quality circuit kit.

The size of the board makes it easy to measure what's going on inside the circuit. Just connect the probes from an oscilloscope to any of the solder or test points on the board.

A photo of the board that I built is below. I also wired a sample test circuit for blinking a pink LED and then connected a scope to the board so that I could look at the square wave.

 






Thursday 16 July 2015

Creating a custom Linux BSP for an ARM Cortex-A9 SBC with Yocto 1.8 - Part III

In part III of this guide, the installation of the final image to the SD card will be covered.  The SD card will then be booted on the target.  Finally, audio recording and playback will be tested.

Part III of this guide consists of the following sections.

  1. Write the GNU/Linux BSP image to an SD card.
  2. Set the physical switches on the RioTboard (Internet of Things) to boot from the uSD or SD card.
  3. Connect the target to the necessary peripherals for boot.
  4. Test audio recording, audio playback, and Internet connectivity.

1.  Write the GNU/Linux BSP image to an SD card

At this point, the build should be complete, without errors.  The output should be as follows.


real 254m28.335s
user 737m9.307s
sys 133m39.529s

Insert an SD card into an SD card reader, connect it to the host, and execute the following commands on the host.


host]$ cd $HOME/src/fsl-community-bsp/build/tmp/deploy/images/imx6dl-riotboard
host]$ sudo umount /dev/sd<X>
host]$ sudo dd if=bsec-image-imx6dl-riotboard.sdcard of=/dev/sd<X> bs=1M
host]$ sudo sync


2. Set the physical switches on the RioTboard to boot from the uSD or SD card.


For booting from the SD card on the bottom of the target, set the physical switches as follows.
SD (J6, bottom) 1 0 1 0 0 1 0 1

For booting from the uSD card on the top of the target, set the physical switches as follows.
uSD (J7, top) 1 0 1 0 0 1 1 0

3. Connect the target to the necessary peripherals for boot.

There are two options.

Option 1

Connect one end of an ethernet cable to the target. Connect the other end of the ethernet cable to a hub or DHCP server.  

Connect the board to the host computer via the J18 serial UART pins on the target.  This will require a serial to USB breakout cable.  Connect TX, RX, and GND to RX, TX, and GND on the cable. The cable must have an FTDI or similar level shifter chip. Connect the USB end of the cable to the host computer.

Connect the speakers to the light green 3.5 mm audio out jack and the microphone to the pink 3.5 mm MIC In jack.

Connect a 5V / 4A DC power source to the target.

Run minicom on the host computer. Configure minicom for 115200 8N1 with no hardware flow control and no software flow control. If a USB to serial cable with an FTDI chip is used, the cable should show up in /dev as ttyUSB0. In that case, set the serial device in minicom to /dev/ttyUSB0.

With this option, U-Boot can be entered after power-on by pressing Enter on the host keyboard while minicom is open and connected. In that case, the target stops in U-Boot and a login prompt will not appear.

If Enter is not pressed after power-on, the target will boot and a login prompt will appear.

Option 2

Connect one end of an ethernet cable to the target. Connect the other end of the ethernet cable to a hub or DHCP server.  

Connect a USB keyboard, USB mouse, and monitor (via an HDMI cable) to the target.

Connect the speakers to the light green 3.5 mm audio out jack and the microphone to the pink 3.5 mm MIC In jack.

Connect a 5V / 4A DC power source to the target.

A login prompt will now appear.

4. Test audio recording, audio playback, and Internet connectivity


Type root to log in to the target. The root password is not set.

Execute the following commands on the target.

root@imx6dl-riotboard: alsamixer

Press F6.
Press arrow down so that 0 imx6-riotboard-sgtl5000 is highlighted.
Press Enter.
Increase Headphone level to 79<>79.
Increase PCM level to 75<>75.
Press Tab.
Increase Mic level to 59.
Increase Capture to 80<>80.
Press Esc.

root@imx6dl-riotboard: cd /usr/share/alsa/sounds
root@imx6dl-riotboard: aplay *.wav

A sound should be played through the speakers.

root@imx6dl-riotboard: cd /tmp
root@imx6dl-riotboard: arecord -d 10 micintest.wav

Talk into the microphone for ten seconds.

root@imx6dl-riotboard: aplay micintest.wav

A recording should play through the speakers.

root@imx6dl-riotboard: ping riotboard.org

An ICMP reply should be received.