2/24/21

A hardware design for variable output frequency using an n-bit counter

The Terasic DE1-SoC is well suited to hardware design and prototyping. The following VHDL code is from a hardware design that I wrote for the DE1-SoC.  The ten switches and four buttons on the board are used as an n-bit counter, which acts as an adjustable multiplier to increase the frequency of the signal on the output pins.
 
As the switches are moved and the buttons are pressed, the seven-segment display is updated to show the numeric frequency, and the output pin(s) are driven at the desired frequency. The design is positive-edge triggered, with the clock running at 50 MHz.
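As a rough cross-check of the arithmetic in the design that follows, the prescaler can be modelled in a few lines of Python. This is only a sketch under stated assumptions: it mirrors the `compute_prescaler` function and the `clk50divider` constant from the VHDL, while the names `CLOCK_HZ` and `output_frequency_hz` are introduced here for illustration and are not part of the design. It assumes the multiplier is 1 and that the output toggles once every `scaler` clock cycles, so one full output period spans two toggles.

```python
CLOCK_HZ = 50_000_000        # CLOCK_50 on the DE1-SoC
CLK50_DIVIDER = 25_000_000   # same value as clk50divider in the VHDL


def compute_prescaler(frequency):
    """Mirror of the VHDL function: clock cycles between output toggles."""
    if frequency == 0:
        return CLK50_DIVIDER          # zero request falls back to the slowest rate
    return CLK50_DIVIDER // frequency  # integer division, as in VHDL


def output_frequency_hz(prescaler):
    """Illustrative helper: one output period is two toggles of `prescaler` cycles."""
    return CLOCK_HZ / (2 * prescaler)


if __name__ == "__main__":
    for requested in (0, 1, 3, 1000, 1023):
        p = compute_prescaler(requested)
        print(requested, p, round(output_frequency_hz(p), 3))
```

Under these assumptions a request of 1000 gives a prescaler of 25,000 and an output of 1000 Hz, and a request of 0 falls back to 1 Hz. Requests that do not divide 25,000,000 evenly come out slightly off because of the integer division.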
 
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

- --Copyright (C) 2019-2021. Bryan R Hinton
- --All rights reserved.
- --
- --Redistribution and use in source and binary forms, with or without
- --modification, are permitted provided that the following conditions
- --are met:
- --1. Redistributions of source code must retain the above copyright
- --   notice, this list of conditions and the following disclaimer.
- --2. Redistributions in binary form must reproduce the above copyright
- --   notice, this list of conditions and the following disclaimer in the
- --   documentation  and/or other materials provided with the distribution.
- --3. Neither the names of the copyright holders nor the names of any
- --   contributors may be used to endorse or promote products derived from this
- --   software without specific prior written permission.
- --
- --THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- --AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- --IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- --ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
- --LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- --CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- --SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- --INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- --CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- --ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- --POSSIBILITY OF SUCH DAMAGE.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity gpiocontrol is
port (
   CLOCK_50 : in std_logic;
   SW       : in unsigned(9 downto 0);
   KEY      : in unsigned(3 DOWNTO 0);
   B1_GPIO  : out unsigned(35 downto 0);
   HEX0     : out unsigned(6 downto 0);
   HEX1     : out unsigned(6 downto 0);
   HEX2     : out unsigned(6 downto 0);
   HEX3     : out unsigned(6 downto 0);
   HEX4     : out unsigned(6 downto 0);
   HEX5     : out unsigned(6 downto 0));
end gpiocontrol;

architecture gpiopulse of gpiocontrol is
   constant clk50divider : natural := 25000000;
   signal delay_u        : std_logic_vector(5 downto 0);
   signal scaler_mlt     : natural := 1;
   signal state          : std_logic := '0';
   signal pulse_u        : std_logic := '1';
   signal scaler         : natural range 0 to clk50divider := 0;
   signal count          : natural range 0 to clk50divider := 0;

function compute_prescaler (
   frequency : in natural range 0 to clk50divider)
   return natural is
   variable prescaler : natural := clk50divider;
begin
   if frequency = 0 then
      prescaler := clk50divider;
   else
      prescaler := clk50divider / frequency;
   end if;
return prescaler;
end function compute_prescaler;

begin
scaler <= compute_prescaler((to_integer(unsigned( SW )))*scaler_mlt);
gpiopulse_process : process(CLOCK_50, KEY(0))
begin
   if (KEY(0) = '0') then
      count <= 0;
   elsif rising_edge(CLOCK_50) then
      if (count = scaler - 1) then
         state <= not state;
         count <= 0;
      elsif (count = clk50divider) then -- automatic reset
         count <= 0;
      else
         count <= count + 1;
      end if;
   end if;
end process gpiopulse_process;

display_frequency_process : process(scaler) 
   variable frequency : natural;
   variable dividend  : natural;
   variable divisor   : natural;
   variable quotient  : natural;
   variable remainder : natural;
begin
   frequency := clk50divider / scaler;
   dividend := frequency;
   divisor := 10;
   for i in 0 to 5 loop
      quotient := dividend / divisor;
      remainder := dividend rem divisor;
      dividend := quotient;
      if(i = 0) then
         HEX0 <= get_7_segment_vector(remainder);
      elsif(i = 1) then
         HEX1 <= get_7_segment_vector(remainder);
      elsif(i = 2) then
         HEX2 <= get_7_segment_vector(remainder);
      elsif(i = 3) then
         HEX3 <= get_7_segment_vector(remainder);
      elsif(i = 4) then
         HEX4 <= get_7_segment_vector(remainder);
      elsif(i = 5) then
         HEX5 <= get_7_segment_vector(remainder);
      end if;
   end loop;
end process display_frequency_process;

B1_GPIO(0) <= state;

end gpiopulse;
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEhwq5gb3EVpTtl/3gjYtliwzjQIcFAmCbSyUACgkQjYtliwzj
QIeGTQ//dwek0dgF4qvJU7kBjgWL6N//e4eD2ofZCevCPeUq8npXUvMoareB5fMP
2Fonu4+qFKsE6wtmzK31fjK2XMkccEFqeJlTOIzrB2HhE/BHI1Ftu2aJ+q01ah3F
ge6T+XiKGJTBYSJBASWS7iu54yNIRBnDeDGFhWOYwz6/89YRlTwX64X6Mo27JPNc
H0KFCSKDdin9CCrHc81Aa7BIMXclZqVcpXnAggE5Qdq579TxhwXraJA/83MiJ5B8
FnGi76NSkAni+cPexrrV+r/FrrtWKOoTVvM13ypcYw1HG/Ymey3y7o6je6LG0Kne
vWF17BSphO3M4gG4ykylyiX6eDiB2hNPIEJG4kNONx7+1MXOky5j7Pr81zOY2+Pp
kQLt9L9v9HU8OOVFTvQPD9XfRFg1Z3NQMWRs6Y4c5ff2WOZK067E8QLga5zoAjz9
LerihfWkOkmoiTDd708EaHWHTXvHyYos0cTjVOTHMNFAvuzvDBxDH27TcoNfFLtD
7pNyDhyiqiKWnId63YmZidiSKJ6NyuQ648oBJ090AGdF5VedFtHFUwCC2QufKArC
CPM6NXD6VxT0asH44s3WHE+C9bkkleVRACVPHcTf0vSmG6bAtKOK2SuMUNZFTTVm
DfFaZfJH4jAUWH2nyIDwAZ0a84vrxrv68njfXi2sEAL9dtg3ZWY=
=vt2w
-----END PGP SIGNATURE-----
-----BEGIN PGP PUBLIC KEY BLOCK-----

mDMEXCh9NhYJKwYBBAHaRw8BAQdAkoXaaeE1bNqeJVl+VEhpPQsNWCO2yaJUGebC
l+7PsD+0O0JyeWFuIFIuIEhpbnRvbiAoM3kgRUQyNTUxOSBwdWJrZXkpIDxicnlh
bkBicnlhbmhpbnRvbi5jb20+iJYEExYIAD4WIQTjkbsHy9m0cqYmcmYgVps7FzIR
1gUCXCh9NgIbAwUJBaOagAULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAKCRAgVps7
FzIR1kRDAQCZzerLgpspYPKREWNsoqR+QWuitMZIZTMtw56r2D8n9gEApt6hGgfS
H7zmC93WkhVOx8xudDFu2ZOeceDDwiZEUgK4OARcKH02EgorBgEEAZdVAQUBAQdA
Q1cm/FFyLmHImB26l5dagr35MCu1gJoemKA/nS9qbx0DAQgHiH4EGBYIACYWIQTj
kbsHy9m0cqYmcmYgVps7FzIR1gUCXCh9NgIbDAUJBaOagAAKCRAgVps7FzIR1vGg
AP90A5dHB745g7qe0Q64wF1P0q1/2R8KYioebrPwB3n0lwD+NnpOW1V4XBMySa4K
qI269KFj6SSV+7YpfsjP8hktaQM=
=Y2nG
-----END PGP PUBLIC KEY BLOCK-----
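The display process in the listing above splits the frequency into decimal digits by repeated division by ten, one digit per HEX display. The same loop can be sketched in Python to check which digit lands on which display. The function name `decimal_digits` is made up for this illustration, and the mapping from a digit to a segment pattern (`get_7_segment_vector` in the VHDL) is not shown in the post, so it is left out here.

```python
def decimal_digits(value, width=6):
    """Return `width` decimal digits of `value`, least significant first.

    Index 0 corresponds to HEX0, index 5 to HEX5, as in the VHDL loop.
    Digits beyond `width` are dropped, just as the six-iteration loop drops them.
    """
    digits = []
    dividend = value
    for _ in range(width):
        dividend, remainder = divmod(dividend, 10)  # quotient and remainder in one step
        digits.append(remainder)
    return digits


if __name__ == "__main__":
    print(decimal_digits(1023))
```

For a frequency of 1023 the sketch yields 3, 2, 0, 1 for HEX0 through HEX3 and zeros for the two remaining displays.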

4/5/18

A Hardware Design for XOR gates using sequential logic in VHDL

XOR gates are a fundamental component in cryptography. Many common stream and block ciphers, such as AES and Twofish, are built around XOR operations, and XOR also appears in the hashing and padding steps that accompany asymmetric schemes such as RSA and Ed25519.  While many compiled and interpreted languages support bitwise operations such as XOR, software implementations of block and stream ciphers are generally less efficient than FPGA and ASIC implementations.

Hybrid boards integrate FPGAs with multicore ARM processors over high-speed buses.  On hybrid boards, the ARM processor is termed the hard processor system or HPS.  Writing to the FPGA from the ARM processor is typically performed via C from an embedded Linux build (Yocto or Buildroot) running on the ARM core.  A simple bitstream can also be loaded into the FPGA fabric without using any ARM design blocks or functionality in the ARM core.

The following is a simple hardware design that I wrote in VHDL and simulated in ModelSim.  The HPS is not used. The bitstream is loaded into the FPGA fabric on boot. VHDL components are utilized and a testbench is defined for testing the design.
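Before reading the VHDL, it may help to see the combinational function of the gate network as a small Python model. This is a sketch, not the design itself: the function names are chosen here for illustration, the clocked input registers are left out, and the three-input gate is modelled the way I understand VHDL to evaluate `a xnor b xnor c`, that is, left to right as `(a xnor b) xnor c`. Under that reading the two inversions cancel, so the gate outputs 1 when an odd number of its inputs are 1.

```python
MASK32 = 0xFFFFFFFF  # keep vector results within 32 bits


def xor_gate(a, b):
    """Two-input XOR on single bits (0 or 1)."""
    return a ^ b


def xnor_gate(a, b, c):
    """Model of `a xnor b xnor c`, evaluated left to right."""
    ab = 1 ^ (a ^ b)        # a xnor b
    return 1 ^ (ab ^ c)     # (a xnor b) xnor c


def xor_chain(a, b, c, d, e, f):
    """Three XOR gates feeding the three-input gate, as in the port maps."""
    return xnor_gate(xor_gate(a, b), xor_gate(c, d), xor_gate(e, f))


def xor_vector(av, bv):
    """Bitwise XOR of two 32-bit vectors."""
    return (av ^ bv) & MASK32


if __name__ == "__main__":
    print(xor_chain(1, 0, 1, 0, 1, 0))           # first testbench vector
    print(hex(xor_vector(0xFFFF0000, 0x0F0F0F0F)))
```

A model like this is handy for predicting the `Q` waveform in the simulator for each set of testbench inputs, one clock edge after the inputs are registered.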

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

- --three input xnor gate
- --Copyright (C) 2018-2021. Bryan R Hinton
- --All rights reserved.
- --
- --Redistribution and use in source and binary forms, with or without
- --modification, are permitted provided that the following conditions
- --are met:
- --1. Redistributions of source code must retain the above copyright
- --   notice, this list of conditions and the following disclaimer.
- --2. Redistributions in binary form must reproduce the above copyright
- --   notice, this list of conditions and the following disclaimer in the
- --   documentation  and/or other materials provided with the distribution.
- --3. Neither the names of the copyright holders nor the names of any
- --   contributors may be used to endorse or promote products derived from this
- --   software without specific prior written permission.
- --
- --THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- --AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- --IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- --ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
- --LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- --CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- --SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- --INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- --CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- --ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- --POSSIBILITY OF SUCH DAMAGE.
library ieee;
use ieee.std_logic_1164.all;

- --three input xnor gate entity declaration - external interface to design entity
entity xnorgate is
port (
   a : in std_logic;
   b : in std_logic;
   c : in std_logic;
   q : out std_logic);
end xnorgate;

architecture xng of xnorgate is
begin
   q <= a xnor b xnor c;
end xng;

- --chain of xor / xnor gates using components and sequential logic
entity xorchain is
port (
   A        : in std_logic;
   B        : in std_logic;
   C        : in std_logic;
   D        : in std_logic;
   E        : in std_logic;
   F        : in std_logic;
   Av       : in std_logic_vector(31 downto 0);
   Bv       : in std_logic_vector(31 downto 0);
   CLOCK_50 : in std_logic;
   Q        : out std_logic;
   Qv       : out std_logic_vector(31 downto 0));
end xorchain;

architecture rtl of xorchain is
component xorgate is
port (
   a  : in std_logic;
   b  : in std_logic;
   q  : out std_logic);
end component;

component xnorgate is
port (
   a  : in std_logic;
   b  : in std_logic;
   c  : in std_logic;
   q  : out std_logic);
end component;

component xorsgate is
port (
   av : in std_logic_vector(31 downto 0);
   bv : in std_logic_vector(31 downto 0);
   qv : out std_logic_vector(31 downto 0));
end component;

signal a_in, b_in, c_in, d_in, e_in, f_in : std_logic;
signal av_in, bv_in : std_logic_vector(31 downto 0); 

signal conn1, conn2, conn3 : std_logic;

begin
   xorgt1  : xorgate port map(a => a_in, b => b_in, q => conn1); 
   xorgt2  : xorgate port map(a => c_in, b => d_in, q => conn2);
   xorgt3  : xorgate port map(a => e_in, b => f_in, q => conn3); 
   xnorgt1 : xnorgate port map(conn1, conn2, conn3, Q);
   xorsgt1 : xorsgate port map(av => av_in, bv => bv_in, qv => Qv);

   process(CLOCK_50)
   begin
      if rising_edge(CLOCK_50) then --assign inputs on rising clock edge
         a_in <= A;
         b_in <= B;
         c_in <= C;
         d_in <= D;
         e_in <= E;
         f_in <= F;
         av_in(31 downto 0) <= Av(31 downto 0);
         bv_in(31 downto 0) <= Bv(31 downto 0);
      end if;
   end process;
end rtl;

entity xorchain_tb is
end xorchain_tb;

architecture xorchain_tb_arch of xorchain_tb is
   signal A_in         : std_logic := '0';
   signal B_in         : std_logic := '0';
   signal C_in         : std_logic := '0';
   signal D_in         : std_logic := '0';
   signal E_in         : std_logic := '0';
   signal F_in         : std_logic := '0';
   signal Av_in        : std_logic_vector(31 downto 0);
   signal Bv_in        : std_logic_vector(31 downto 0);
   signal CLOCK_50_in  : std_logic;
   signal BRK          : boolean := FALSE;
   signal Q_out        : std_logic;
   signal Qv_out       : std_logic_vector(31 downto 0);

component xorchain
port (
   A          : in std_logic;
   B          : in std_logic;
   C          : in std_logic;
   D          : in std_logic;
   E          : in std_logic;
   F          : in std_logic;
   Av         : in std_logic_vector(31 downto 0);
   Bv         : in std_logic_vector(31 downto 0);
   CLOCK_50   : in std_logic;
   Q          : out std_logic;
   Qv         : out std_logic_vector(31 downto 0));
end component;

begin
   xorchain_instance: xorchain port map (A => A_in,B => B_in, C => C_in, 
                                         D => D_in, E => E_in, F => F_in, Av => Av_in,
                                         Bv => Bv_in, CLOCK_50 => CLOCK_50_in, Q => Q_out,
                                         Qv => Qv_out);
clockprocess: process
   begin
      while not BRK loop
         CLOCK_50_in <= '0';
            wait for 20 ns;
            CLOCK_50_in <= '1';
            wait for 20 ns;
      end loop;
      wait;
   end process clockprocess;
  
testprocess : process
   begin
      A_in <= '1';
      B_in <= '0';
      C_in <= '1';
      D_in <= '0';
      E_in <= '1';
      F_in <= '0';
      wait for 40 ns;
      A_in <= '1';
      B_in <= '0';
      C_in <= '0';
      D_in <= '0';
      E_in <= '0';
      F_in <= '1';
      wait for 40 ns;
      A_in <= '1';
      B_in <= '1';
      C_in <= '0';
      D_in <= '1';
      E_in <= '0';
      F_in <= '1';
      wait for 40 ns;
      A_in <= '1';
      B_in <= '0';
      C_in <= '1';
      D_in <= '0';
      E_in <= '1';
      F_in <= '0';
      wait for 20 ns;
      A_in <= '1';
      B_in <= '1';
      C_in <= '1';
      D_in <= '0';
      E_in <= '1';
      F_in <= '0';
      wait for 20 ns;
      A_in <= '0';
      B_in <= '0';
      C_in <= '1';
      D_in <= '0';
      E_in <= '1';
      F_in <= '0';
      wait for 40 ns;
      BRK <= TRUE;
      wait;
   end process testprocess;
end xorchain_tb_arch;

entity xorgate is
port (
   a : in std_logic;
   b : in std_logic;
   q : out std_logic);
end xorgate;

architecture xg of xorgate is
begin
   q <= a xor b;
end xg;

entity xorsgate is
port (
   av : in std_logic_vector(31 downto 0);
   bv : in std_logic_vector(31 downto 0);
   qv : out std_logic_vector(31 downto 0));
end xorsgate;

architecture xsg of xorsgate is
begin
   qv <= av xor bv;
end xsg;
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEhwq5gb3EVpTtl/3gjYtliwzjQIcFAmCV9JYACgkQjYtliwzj
QIetTg/+MIvz2hnkVJ3kCIma67KMJVHvWg+ErGoOZAZ2fqrWuRmiBcJrsxcn0rJp
tZCZtBEmTHIRp1Htqx5vvo/vCp6uokfvI2aTTQIBMxy4od/jIbhDoJN8nAQwrE+h
TVEJoIUhUPc0xqMY7jGiIifVXwXMyZYNkKh2CTZTvvGyqNzQ00CP5vel5U63pKEd
tSAzHHup3Q1fWSwFfvKV5x8BnY/GhYoxZYM+PgOmH3ydApYGkNN8hBBTL+WkzF0u
3/M8ClSSPyA5VFOlD76uvTQSy/QJmpcKgcokKAwO3ELnuosvA0F9BkUhUzI8Sr4+
6FRYbMVsrqyvUY/e4K/HKWUQPLsE0J8kkiRGwuWpa494Fs0oF5lk6tP/NkIMIYR4
OTaILLqR0+AMkyUIbiwUpAf6UJZ+xv+PIwb750yvcBq/PLFQxTgRgXUU356Z8qFF
+msUH/kUEJj2+7V3aCfFMwjIYrjdfVL8Nalmy85pqUusWaaVQxsBHSV3k/UH0ojZ
e09v6PgJ2PGWJ5eyfQDmXM8KSOnVX/S6VSA5wa4oXJM06aDcD0hMPR4ndxHCQlsE
78lzHf6WPL2/inEWX+0a5qBETNqKDM08N8ZGi/baFJiYtEA5ht+D8TcK8ojTaIea
8g6nO55EKOsyHN+YdE3vhZYxpQXZWKMFUOrkrUUxPggXOpgGGn8=
=tbF7
-----END PGP SIGNATURE-----

3/17/18

Building the ARM hard float and aarch64 toolchains for the octa-core ARM Cortex®-A73

The following commands build the toolchains that will be used to build the bootloaders and operating systems for the octa-core ARM Cortex®-A73.
$ mkdir ~/toolchain ~/build ~/build/arm-linux-gnueabihf ~/build/aarch64-linux-gnu
$ curl -SL http://releases.linaro.org/components/toolchain/binaries/7.2-2017.11/aarch64-linux-gnu/gcc-linaro-7.2.1-2017.11-linux-manifest.txt -o ~/build/aarch64-linux-gnu/gcc-linaro-7.2.1-2017.11-linux-manifest.txt
$ curl -SL http://releases.linaro.org/components/toolchain/binaries/7.2-2017.11/arm-linux-gnueabihf/gcc-linaro-7.2.1-2017.11-linux-manifest.txt -o ~/build/arm-linux-gnueabihf/gcc-linaro-7.2.1-2017.11-linux-manifest.txt
$ cd ~/build/
$ COMMITID=$(grep -oP '(?<=abe_revision=).*' aarch64-linux-gnu/gcc-linaro-7.2.1-2017.11-linux-manifest.txt)
$ git clone https://git.linaro.org/toolchain/abe.git
$ cd abe
$ git checkout $COMMITID
$ sed -i 's/python-devel/python2-devel/g' configure.ac
$ autoreconf
$ cd ../aarch64-linux-gnu
$ sudo cp /usr/share/doc/git/contrib/workdir/git-new-workdir /usr/local/bin
$ sudo chmod +x /usr/local/bin/git-new-workdir
$ ../abe/configure
$ ../abe/abe.sh --manifest gcc-linaro-7.2.1-2017.11-linux-manifest.txt --build all --tarbin --tarsrc --release "2017.11" --list-artifacts
arch64-linux-gnu
$ tar -C ~/toolchain/ -xJvf  snapshots/gcc-linaro-7.2.1-2017.11-x86_64_aarch64-linux-gnu.tar.xz
$ cd ../arm-linux-gnueabihf
$ ../abe/configure
$ ../abe/abe.sh --manifest gcc-linaro-7.2.1-2017.11-linux-manifest.txt --build all --tarbin --tarsrc --release "2017.11" --list-artifacts artifacts.txt
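If the build is driven from a script rather than typed by hand, the `grep` step that extracts the ABE commit from the manifest can be written in Python. The sketch below is only an illustration: the function name `read_manifest_value` and the sample manifest text are made up, and unlike the `grep -oP` lookbehind it only matches a key at the start of a line.

```python
import re

# Made-up excerpt for illustration; a real manifest has many more entries.
SAMPLE_MANIFEST = """\
# hypothetical excerpt
abe_revision=0123456789abcdef
other_key=placeholder
"""


def read_manifest_value(manifest_text, key):
    """Return the text after `key=` on the first matching line, or None."""
    pattern = re.compile(r"^" + re.escape(key) + r"=(.*)$", re.MULTILINE)
    match = pattern.search(manifest_text)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    print(read_manifest_value(SAMPLE_MANIFEST, "abe_revision"))
```

Returning `None` for a missing key lets the calling script stop before `git checkout` is run with an empty revision.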
Running and Profiling ARM NN on the Hikey 960

7/30/16

Concurrency, Parallelism, and Barrier Synchronization - Multiprocess and Multithreaded Programming

Concurrency, parallelism, threads, and processes are often misunderstood concepts.

On a preemptive, time-sliced UNIX or Linux operating system (Solaris, AIX, Linux, BSD, OS X), sequences of program code from different software applications are executed over time on a single-core or multicore processor. The execution of program code on a processor core constitutes what is called a process. If that program code is built for a UNIX operating system and its accompanying executable and linkable format, then the process is called a UNIX process.   A UNIX process is a schedulable entity.   On a UNIX system, program code from one process executes on the processor for a time quantum, after which program code from another process executes for a time quantum.  The first process relinquishes the processor either voluntarily or involuntarily so that another process can execute its program code. This is known as context switching.  When a process context switch occurs, the state of the outgoing process is saved to its process control block and another process resumes execution on the processor.  A UNIX process is heavyweight because it has its own virtual memory space, file descriptors, register state, scheduling information, memory management information, etc.  When a process context switch occurs, this information has to be saved, and this is a computationally expensive operation.

Concurrency refers to the interleaved execution of schedulable entities over time.  Context switching facilitates interleaved execution.  On a modern Linux system, the execution time quantum is so small that the interleaved execution of independent, schedulable entities, often performing unrelated tasks, gives the appearance that multiple software applications are running in parallel.  The on/off scheduling of a process in this manner occurs in non-realtime operating systems such as Windows, OS X, Linux, and FreeBSD.

Concurrency applies to both threads and processes.  A thread is also a schedulable entity and is defined as an independent sequence of execution within a UNIX process. UNIX processes often have multiple threads of execution that share the memory space of the process.  When multiple threads of execution are running inside of a process, they are typically performing related tasks.
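The point that threads share the memory space of their process can be seen in a short Python sketch. The function name `run_workers` is chosen here for illustration. Each thread writes to its own slot of a list that lives in the process, and the main thread reads the results after joining, with no copying or message passing involved.

```python
import threading


def run_workers(num_threads=4):
    """Start `num_threads` threads that each fill one slot of a shared list."""
    results = [None] * num_threads   # lives in the process, visible to every thread

    def worker(index):
        # Each thread writes only to its own slot, so no lock is needed here.
        results[index] = index * index

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # wait until every thread has finished
    return results


if __name__ == "__main__":
    print(run_workers())
```

Separate processes would each get their own copy of `results`, which is one reason a process is the heavier of the two schedulable entities.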

While threads are typically lighter weight than processes, there have been different implementations of both across UNIX and Linux operating systems over the years.  The three models that typically define the implementations across preemptive, time-sliced, multiuser UNIX and Linux operating systems are 1:1, N:1, and M:N, where 1:1 refers to the mapping of one user space thread to one kernel thread, N:1 refers to the mapping of multiple user space threads to a single kernel thread, and M:N refers to the mapping of M user space threads to N kernel threads.

In summary, both threads and processes are scheduled for execution.  Thread context switching is lighter in weight than process context switching.  Both threads and processes are schedulable entities and concurrency is defined as the interleaved execution over time of schedulable entities.

The Linux user space APIs for process and thread management are abstracted from lower level details. However, the APIs typically provide calls for setting the level of concurrency in order to influence the time quantum so that system throughput is affected by shorter and longer durations of schedulable entity execution time.

Conversely, parallelism refers to the simultaneous execution of multiple schedulable entities within the same time quantum.  Both processes and threads can execute in parallel across multiple cores or multiple processors. On a multiuser system with preemptive time slicing and multiple processor cores, both concurrency and parallelism are often at play.  Affinity scheduling refers to the scheduling of both processes and threads across multiple cores so that their concurrent and often parallel execution is close to optimal.

Software applications are often designed to solve computationally complex problems.  If the algorithm to solve a computationally complex problem can be parallelized, then multiple threads or processes can all run at the same time across multiple cores.  Each process or thread executes by itself and does not contend for resources with other threads or processes that are working on the other parts of the problem to be solved. When each thread or process reaches the point where it can no longer contribute any more work to the solution of the problem, it waits at the barrier.  When all threads or processes reach the barrier, the output of their work is synchronized, and often aggregated by the master process.  Complex test frameworks often implement barrier synchronization when certain types of tests can be run in parallel.
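The barrier pattern described above can be sketched with Python's `threading.Barrier`. The function name `parallel_sum` and the choice of summing chunks of numbers are assumptions made for the example. Each worker computes its part independently, waits at the barrier, and the aggregation step runs once when the last worker arrives.

```python
import threading


def parallel_sum(chunks):
    """Sum each chunk in its own thread, then aggregate once at the barrier."""
    partial = [0] * len(chunks)
    total = []

    def aggregate():
        # Runs exactly once, in one of the threads, when all workers have arrived.
        total.append(sum(partial))

    barrier = threading.Barrier(len(chunks), action=aggregate)

    def worker(index):
        partial[index] = sum(chunks[index])  # independent work, no shared writes
        barrier.wait()                        # block until every worker is done

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(len(chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]


if __name__ == "__main__":
    print(parallel_sum([[1, 2, 3], [4, 5], [6]]))
```

In standard CPython builds the global interpreter lock means these threads interleave rather than run truly in parallel for CPU-bound work. The synchronization structure is the same when the workers are processes on separate cores.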

Most individual software applications running on preemptive, time-sliced, multiuser Linux and UNIX operating systems are not designed with heavy, parallel thread or parallel, multi-process execution in mind.  Expensive, parallel algorithms often require multiple, dedicated processor cores with hard real-time scheduling constraints.  The following paper describes the solution to a popular parallel problem: flight scheduling.

Last, when designing multithreaded and multiprocess software programs, minimizing lock granularity greatly increases concurrency, throughput, and execution efficiency.  Multithreaded and multiprocess programs that do not utilize coarse-grained synchronization strategies do not run efficiently and often require countless hours of debugging.  The use of semaphores, mutex locks, and other synchronization primitives should be minimized to the maximum extent possible in computer programs that share resources between multiple threads or processes.  Proper program design allows for schedulable entities to run in parallel or concurrently with high throughput and minimum resource contention, and this is optimal for solving computationally complex problems on preemptive, time-sliced, multiuser operating systems without requiring hard real-time scheduling.
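One way to keep the use of locks to a minimum, sketched here in Python with made-up names, is to let each thread accumulate its results in private data and take a shared lock only once, to merge. The function `count_words` and the word-counting task are illustrative assumptions, not taken from any particular program.

```python
import threading
from collections import Counter


def count_words(chunks):
    """Count words across chunks, one thread per chunk, one lock use per thread."""
    total = Counter()
    merge_lock = threading.Lock()

    def worker(words):
        local = Counter(words)   # private to this thread, so no lock is needed
        with merge_lock:          # single short critical section per thread
            total.update(local)

    threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


if __name__ == "__main__":
    print(count_words([["a", "b", "a"], ["b", "c"], ["a"]]))
```

Taking the lock for every individual word would give the same counts, but the threads would spend most of their time contending for it.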

After a considerable amount of research in the above areas, I applied these design techniques in several successful multithreaded and multiprocess software programs.

6/30/16

VHDL Processes for Pulsing Multiple GPIO Pins at Different Frequencies on Altera FPGA

The following VHDL processes pulse the GPIO pins at different frequencies on the Altera DE1-SoC using multiple phase-locked loops (PLLs). Several diodes are connected to the GPIO banks and pulsed at a 50% duty cycle with 16 mA at 3.3 V.  Each GPIO bank on the DE1-SoC has 36 pins. Pin 1 is pulsed at 20 Hz from GPIO bank 0, and pins 0 and 1 are pulsed at 30 Hz from GPIO bank 1.  A direct mode PLL with locked output was configured using the Altera Quartus Prime MegaWizard.  The PLL reference clock frequency is set to 50 MHz, the output clock frequency is set to 50 MHz, and the duty cycle is set to 50%.  The pin mappings for GPIO banks 0 and 1 are documented in the DE1-SoC datasheet.
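The values of the divider constants are not shown in the listing that follows, so the Python sketch below only illustrates how such constants could be derived. It assumes a counter that runs from 0 up to the divider value and then wraps, with the pin toggling once per wrap, which is how I read the processes. The function names `divider_for` and `actual_frequency_hz` are made up for the example.

```python
PLL_CLOCK_HZ = 50_000_000  # PLL output clock stated in the post


def divider_for(target_hz, clock_hz=PLL_CLOCK_HZ):
    """Terminal count for a counter that wraps once per output toggle."""
    cycles_per_toggle = round(clock_hz / (2 * target_hz))  # two toggles per period
    return cycles_per_toggle - 1                            # counter starts at 0


def actual_frequency_hz(divider, clock_hz=PLL_CLOCK_HZ):
    """Output frequency produced by a given terminal count."""
    return clock_hz / (2 * (divider + 1))


if __name__ == "__main__":
    for target in (20, 30):
        d = divider_for(target)
        print(target, d, actual_frequency_hz(d))
```

Under these assumptions 20 Hz divides the 50 MHz clock exactly, while 30 Hz does not, so the 30 Hz output would be off by a tiny fraction of a hertz.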
 
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

- - --Copyright (C) 2016. Bryan R Hinton
- - --All rights reserved.
- - --
- - --Redistribution and use in source and binary forms, with or without
- - --modification, are permitted provided that the following conditions
- - --are met:
- - --1. Redistributions of source code must retain the above copyright
- - --   notice, this list of conditions and the following disclaimer.
- - --2. Redistributions in binary form must reproduce the above copyright
- - --   notice, this list of conditions and the following disclaimer in the
- - --   documentation  and/or other materials provided with the distribution.
- - --3. Neither the names of the copyright holders nor the names of any
- - --   contributors may be used to endorse or promote products derived from this
- - --   software without specific prior written permission.
- - --
- - --THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
- - --AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
- - --IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
- - --ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
- - --LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
- - --CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
- - --SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
- - --INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
- - --CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
- - --ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
- - --POSSIBILITY OF SUCH DAMAGE.

- - -- INPUT: direct mode pll with locked output 
- - -- and reference clock frequency set to 50mhz, 
- - -- output clock frequency set to 50mhz with 50% duty 
- - -- cycle and output frequency scaled by freq divider constant
clk_a_process : process (lkd_pll_clk_a)
begin
   if rising_edge(lkd_pll_clk_a) then
      if (cycle_ctr_a < FREQ_A_DIVIDER) then
         cycle_ctr_a <= cycle_ctr_a + 1;
      else
         cycle_ctr_a <= 0;
      end if;
    end if;
end process clk_a_process;
 
- - -- INPUT: direct mode pll with locked output 
- - -- and reference clock frequency set to 50mhz, 
- - -- output clock frequency set to 50mhz with 50% duty 
- - -- cycle and output frequency scaled by freq divider constant
clk_b_process : process (lkd_pll_clk_b)
begin
   if rising_edge(lkd_pll_clk_b) then
      if (cycle_ctr_b < FREQ_B_DIVIDER) then
         cycle_ctr_b <= cycle_ctr_b + 1;
      else
         cycle_ctr_b <= 0;
      end if;
   end if;
end process clk_b_process;
 
- - -- INPUT: direct mode pll with locked output
gpio_a_process : process (lkd_pll_clk_a)
begin
   if rising_edge(lkd_pll_clk_a) then
      if (cycle_ctr_a = 0) then
         gpio_sig_0(1) <= NOT gpio_sig_0(1);
      end if;
   end if;
end process gpio_a_process;

- - -- INPUT: direct mode pll with locked output
- - ---------------------------------------------------------
gpio_b_process : process (lkd_pll_clk_b)
begin
   if rising_edge(lkd_pll_clk_b) then
      if (cycle_ctr_b = 0) then
         gpio_sig_1 <= NOT gpio_sig_1(1 downto 0);
      end if;
   end if;
end process gpio_b_process;
 
GPIO_0 <= gpio_sig_0;
GPIO_1 <= gpio_sig_1;

end gpioarch;
-----BEGIN PGP SIGNATURE-----

iQIzBAEBCgAdFiEEhwq5gb3EVpTtl/3gjYtliwzjQIcFAmCV8x4ACgkQjYtliwzj
QIfWhQ/+LapzoBwbtVr8iaFNoCiVoghfNaaBkCliEJvTOv3f3NkLjp+IL1H09l0b
TI2aI0hOrRuIx3smKOiTrUM+rH7JxyxvwKtu0+fJJc7KjgGJ1Beu9NsWXU28ugyn
uK22NEAie/VvnRlYapVh81An4wRjIqj7RLM9usqmVVFrMlo7v77VooWurmHki3Co
Tulw2Ceqe2/HJrdsJ++XwJtMVeQzysr/mnFGI6ab9AsVxoUzeOq4X6110wWXKPFV
zTUg6Pco9fW1THTIaEb41yjEHIYDsV8XVa3RWKYmu3a9NKIZXB2mI6m9J1ZP/tPD
I0QUWgom1JuNpZ8O0QoJSvAcaMoaIbKX2jdlIDuOJ1BRzyMmyjTCFZ99qwqG8FqY
mn6HUnJjVR9IqF9UX4DK/h9hoXh0GCWs0N9hqRyNPCFWVaia+/v62ET++mwWCQel
cAQlJSlrhsoQ3nWh7i1XdWTZ+yv01IPmjqfa3refrNYtU6jVrhjHzbxOL8FpWgbG
ECOPFd49T8mvUFURrp/rF7rS7jS6JdUvEVaPhY0CYnadK4CohuvVSDAjjHk/mn10
og8RVCmfNhzhrgq2y2lJzwAd4bN9ulApSUUwKQmkX0mDxCl8t4T7KViWo7FP/bxq
EVsGVDDvd2Dhfw5+l9cH1x+Nsaixt0LgxpSIaUapb2zPTFvyU54=
=BUq1
-----END PGP SIGNATURE-----

6/2/16

FPGA Audio Processing with the Cyclone V Dual-Core ARM Cortex-A9

The DE1-SoC FPGA Development board from Terasic is powered by an integrated Altera Cyclone V FPGA and ARM MPCore Cortex-A9 processor.  The FPGA and ARM core are connected by a high-speed interconnect fabric so you can boot Linux on the ARM core and then talk to the FPGA.

The DE1-SoC board below has been programmed via Quartus Prime running on Fedora 23, 64-bit Linux.  The FPGA bitstream was compiled from the Terasic Audio codec design reference.  After the bitstream was loaded onto the FPGA over the USB Blaster II interface, the NIOS II command shell was used to load the NIOS II software image onto the chip.  A menu-driven debug interface is running from a terminal on the host via the NIOS II shell with the target connected over the USB Blaster II interface.

A low-level hardware abstraction layer was programmed in C to configure the on-board audio codec chip.  The NIOS II program is stored in on-chip memory and a PLL-driven clock signal is fed into the audio chip. The Verilog code for the hardware design was generated from Qsys.  The design supports configurable sample rates, mic in, and line in/out.

Additional components are connected to the DE1-SoC board in this photo.  The Linear DC934A (LTC2607) DAC is connected to the DE1-SoC and an oscilloscope is connected to the ground and vref pins on the DAC.

The DC934A features an LTC2607 16-Bit Dual DAC with I2C interface and an LTC2422 2-Channel 20-Bit µPower No Latency Delta Sigma ADC.

3.5mm audio cables are connected to the mic in and line out ports, respectively.  The DE1-SoC is connected to an external display over VGA so that a local console can be managed via a connected keyboard and mouse when Linux is booted from uSD.

With GPIO pins accessible via the GPIO 0 and 1 breakouts, external LEDs can be pulsed directly from the Hard Processor System (HPS), FPGA, or the FPGA via the HPS.


4/1/16

Elliptic-curve cryptography and curve selection

Curve selection at the library or implementation level can be time consuming.  While the type of application may have an impact on curve selection, the security of the curve should undoubtedly take priority. The following site is a valuable resource for selecting a safe curve.

Daniel J. Bernstein and Tanja Lange. SafeCurves: choosing safe curves for elliptic-curve cryptography. https://safecurves.cr.yp.to, accessed Sat May  8 02:29:40 UTC 2021.