Tuesday, August 16, 2016

Profiling Multithreaded / Multiprocess Applications on the DE0-Nano-SoC with ARM® DS-5 Streamline

The ARM® DS-5 Streamline Performance Analyzer in ARM® DS-5 Development Studio is well suited to profiling and analyzing the performance of multithreaded and multiprocess applications. Without modifying the kernel on the Terasic DE0-Nano-SoC board, the gator daemon can be compiled with the Linaro 4.8 GCC ARM hard-float toolchain and then uploaded to the DE0-Nano-SoC board, which runs the stock Terasic Yocto build from the microSD card.

The ARM® DS-5 Streamline Performance Analyzer is a powerful tool for examining CPU clock cycles, instruction execution broken down into load and store operations, memory usage, register usage, disk I/O (reads and writes), per-process and per-thread function call paths broken down by system utilization percentage, per-process and per-thread stack and heap usage, and many other useful metrics.

To capture meaningful information with DS-5 Streamline, the process_creation project has been modified to insert 1000 packets into the packet processing simulation buffer, and the child processes have been modified to sleep and wake up 1000 times to simulate process activity.


void *insertpackets(void *arg) {
   
   struct pktbuf *pkbuf;
   struct packet *pkt;
   int idx;

   if(arg != NULL) {
   
      pkbuf = (struct pktbuf *)arg;

      /* seed random number generator */
      ...

      /* insert 1000 packets into the packet buffer */
      for(idx = 0; idx < 1000; idx++) {

         pkt = (struct packet *)malloc(sizeof(struct packet));

         if(pkt != NULL) {

            /* set a random packet processing simulation multiplier (0 to 2) */
            pkt->mlt=...()%3;

            /* insert packet in the packet buffer */
            if(pkt_queue(pkbuf,pkt) != 0) {
            
               ...
            ... 
         ...
      ...
   ...
...

int fcnb(time_t secs, long nsecs) {
 
   struct timespec rqtp;
   struct timespec rmtp;
   int ret;
   int idx;

   rqtp.tv_sec = secs;
   rqtp.tv_nsec = nsecs; 

   for(idx = 0; idx < 1000; idx++) {

      ret = nanosleep(&rqtp, &rmtp);

      ...
   ...
...
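The fcnb() loop above is elided, so it is not clear how it handles an interrupted sleep. As a hedged sketch only, a loop of this kind can use the remaining-time argument of nanosleep() to resume after a signal (EINTR) instead of restarting the full interval. The function name sleep_n_times is hypothetical and not part of the original project.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep 'count' times for secs + nsecs each.
   If a signal interrupts nanosleep(), resume with the remaining time
   instead of restarting the whole interval.
   Returns 0, or -1 with errno set (e.g. EINVAL if nsecs >= 1000000000). */
int sleep_n_times(time_t secs, long nsecs, int count)
{
   for (int i = 0; i < count; i++) {
      struct timespec req = { .tv_sec = secs, .tv_nsec = nsecs };
      struct timespec rem;

      while (nanosleep(&req, &rem) == -1) {
         if (errno != EINTR)
            return -1;
         req = rem;   /* interrupted: sleep only for what is left */
      }
   }
   return 0;
}
```

A child process could call sleep_n_times(0, 1000000L, 1000) to sleep for 1 ms a thousand times, which gives a profiler a steady pattern of sleep/wake activity.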


ARM® DS-5 Streamline - Profiling the process creation application.

ARM® DS-5 Streamline - Code View



Monday, August 15, 2016

Debugging Multithreaded / Multiprocess Applications on the DE0-Nano-SoC with ARM® DS-5


ARM® DS-5 is an ideal platform for debugging multithreaded, multiprocess applications on ARM Powered® development boards that run the GNU/Linux operating system. The DE0-Nano-SoC is a good reference platform for developing multithreaded, multiprocess applications in Linux user space. Yocto provides an easy-to-use platform for building a bootable image, and ARM® DS-5 integrates easily with the board for efficient debugging. Altera packages a version of ARM® DS-5 for the DE0-Nano-SoC.




The following requirements were in place for this project.

  • Use a coarse-grained locking strategy. Only lock data.
  • Minimize critical sections.
  • Fork five processes, all of which are attached to the controlling terminal.
  • Create three threads in one of the five processes.
  • Two of the threads will simulate packet processing.
  • One of the threads will generate packets in a buffer.
  • Properly utilize synchronization primitives and mutex locks.
  • Maximize concurrency.
  • Minimize latency.
  • Ensure order of context switching is always random upon execution - i.e. don't control the scheduler.
  • Utilize ARM DS-5 for building and debugging the application on the attached DE0-Nano-SoC FPGA board.
  • Use autotools for building a shared library and link against the library with a driver program in DS-5.
  • Compile the shared library and driver program using the Linaro GCC ARM-Linux-GNUEABI Hard Float toolchain version 4.8 that is included in the Altera DS-5 download.
  • Compile the shared library and test program using the Linaro GCC ARM-Linux-GNUEABI Hard Float toolchain version 5.3 (latest stable from Linaro as of 08/15/16).
  • Debug the multiprocess, multithreaded application using both toolchains from DS-5.
  • Ensure that all possible errors from calls to pthread functions and other libc functions are properly handled.
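The project code itself lives in the repository linked below. To illustrate how the threading requirements fit together (one producer thread, two packet-processing threads, a single coarse lock that protects only the queue data, short critical sections, and checked pthread return values), here is a minimal sketch. The struct layouts and the names pkt_enqueue, pkt_dequeue and run_pipeline are hypothetical and are not the project's pkt_queue API; the "processing" is reduced to counting packets.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical types: the project's real struct packet / struct pktbuf differ. */
struct packet { int mlt; struct packet *next; };
struct pktbuf {
   pthread_mutex_t lock;       /* one coarse lock that protects only the data below */
   pthread_cond_t nonempty;    /* signalled when a packet arrives or input ends */
   struct packet *head, *tail;
   int done;                   /* set once the producer has queued everything */
};
struct worker { struct pktbuf *buf; long count; };

/* Append a packet; the critical section is only the pointer update. */
static int pkt_enqueue(struct pktbuf *b, struct packet *p)
{
   int rc;
   p->next = NULL;
   if ((rc = pthread_mutex_lock(&b->lock)) != 0)
      return rc;
   *(b->tail ? &b->tail->next : &b->head) = p;
   b->tail = p;
   pthread_cond_signal(&b->nonempty);
   return pthread_mutex_unlock(&b->lock);
}

/* Remove a packet, blocking while empty; NULL means the buffer is drained. */
static struct packet *pkt_dequeue(struct pktbuf *b)
{
   struct packet *p;
   if (pthread_mutex_lock(&b->lock) != 0)
      return NULL;
   while (b->head == NULL && !b->done)
      pthread_cond_wait(&b->nonempty, &b->lock);
   if ((p = b->head) != NULL && (b->head = p->next) == NULL)
      b->tail = NULL;
   pthread_mutex_unlock(&b->lock);
   return p;
}

static void *producer(void *arg)
{
   struct worker *w = arg;
   for (long i = 0; i < w->count; i++) {
      struct packet *p = malloc(sizeof *p);
      if (p == NULL) break;
      p->mlt = (int)(i % 3);                   /* simulated work multiplier 0..2 */
      if (pkt_enqueue(w->buf, p) != 0) { free(p); break; }
   }
   pthread_mutex_lock(&w->buf->lock);          /* end of input: wake every consumer */
   w->buf->done = 1;
   pthread_cond_broadcast(&w->buf->nonempty);
   pthread_mutex_unlock(&w->buf->lock);
   return NULL;
}

static void *consumer(void *arg)
{
   struct worker *w = arg;
   struct packet *p;
   while ((p = pkt_dequeue(w->buf)) != NULL) {
      w->count++;                              /* "process" the packet outside the lock */
      free(p);
   }
   return NULL;
}

/* One producer and two consumers; returns packets processed, or -1 on error. */
long run_pipeline(long npkts)
{
   struct pktbuf buf = { .head = NULL, .tail = NULL, .done = 0 };
   struct worker prod = { &buf, npkts }, cons[2] = { { &buf, 0 }, { &buf, 0 } };
   pthread_t ptid, ctid[2];
   int ncons = 0, ok;
   long total = 0;

   if (pthread_mutex_init(&buf.lock, NULL) != 0)
      return -1;
   if (pthread_cond_init(&buf.nonempty, NULL) != 0) {
      pthread_mutex_destroy(&buf.lock);
      return -1;
   }
   while (ncons < 2 && pthread_create(&ctid[ncons], NULL, consumer, &cons[ncons]) == 0)
      ncons++;
   ok = ncons == 2 && pthread_create(&ptid, NULL, producer, &prod) == 0;
   if (ok)
      pthread_join(ptid, NULL);
   else
      producer(&(struct worker){ &buf, 0 });   /* queue nothing, just wake consumers */
   for (int i = 0; i < ncons; i++) {
      pthread_join(ctid[i], NULL);
      total += cons[i].count;
   }
   pthread_cond_destroy(&buf.nonempty);
   pthread_mutex_destroy(&buf.lock);
   return ok ? total : -1;
}
```

The condition variable lets the two consumers block without spinning, and the order in which they pick up packets is left entirely to the scheduler, in line with the "don't control the scheduler" requirement.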


The code, which meets the above requirements, is available at github.com/bryanhinton/de0-nano-soc.git.
Note the DS-5 settings in the following images. Compiling the code from Eclipse requires some familiarity with DS-5 and Eclipse.

DS-5 disassembly / memory analysis - debugging multithreaded, multiprocess applications on ARM Powered® boards

DS-5 Debug Configurations - Files

DS-5 Debug Configurations - Connection

DS-5 Autotools Configure Settings

Synchronized Swimming. For a description and overview of parallel computing,
see Introduction to Parallel Computing at computing.llnl.gov.

DS-5 Toolchain Editor



Saturday, July 30, 2016

Concurrency, Parallelism, and Barrier Synchronization - Multiprocess and Multithreaded Programming

Concurrency, parallelism, threads, and processes are often misunderstood concepts.

On a preemptive, time-sliced UNIX or Linux operating system (Solaris, AIX, Linux, BSD, OS X), sequences of program code from different software applications are executed over time on a single processor. A UNIX process is a schedulable entity. On a UNIX system, program code from one process executes on the processor for a time quantum, after which program code from another process executes for a time quantum. The first process relinquishes the processor either voluntarily (for example, by blocking on I/O) or involuntarily (when it is preempted) so that another process can execute its program code. This is known as context switching. When a process context switch occurs, the state of the process is saved to its process control block and another process resumes execution on the processor. Finally, a UNIX process is heavyweight because it has its own virtual memory space, file descriptors, register state, scheduling information, memory management information, etc. A process context switch must save the outgoing process's state and switch the processor to the incoming process's address space, which is a computationally expensive operation.
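To make the "own virtual memory space" point concrete, here is a minimal POSIX sketch (the names counter and fork_isolation_demo are illustrative, not from the original projects): a child created with fork() changes a global variable, and the parent's copy is unaffected because each process has a private address space.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int counter = 0;   /* each process gets its own copy after fork() */

/* Fork a child that sets its copy of counter to 42 and reports it through
   its exit status. Returns the child's value, or -1 on error. */
int fork_isolation_demo(void)
{
   int status;
   pid_t pid = fork();

   if (pid < 0)
      return -1;
   if (pid == 0) {        /* child: modifies only its private address space */
      counter = 42;
      _exit(counter);
   }
   if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
      return -1;
   return WEXITSTATUS(status);
}
```

After the call returns 42, the parent's counter is still 0.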

Concurrency refers to the interleaved execution of schedulable entities on a single processor.  Context switching facilitates interleaved execution.  The execution time quantum is so small that the interleaved execution of independent, schedulable entities, often performing unrelated tasks, gives the appearance that multiple software applications are running in parallel.

Concurrency applies to both threads and processes. A thread is also a schedulable entity and is defined as an independent sequence of execution within a UNIX process. UNIX processes often have multiple threads of execution that share the memory space of the process. When multiple threads of execution are running inside of a process, they are typically performing related tasks.
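By contrast with separate processes, threads see the same memory. A minimal sketch with hypothetical names (shared, thread_sharing_demo): a second thread writes a global variable, and after pthread_join() the main thread observes the new value. pthread_join() also provides the memory synchronization, so no extra lock is needed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

static int shared = 0;   /* a single copy, visible to every thread in the process */

static void *writer(void *arg)
{
   shared = *(const int *)arg;
   return NULL;
}

/* Start a thread that stores value in shared and wait for it.
   Returns the value the main thread then sees, or -1 on error. */
int thread_sharing_demo(int value)
{
   pthread_t t;

   if (pthread_create(&t, NULL, writer, &value) != 0)
      return -1;
   if (pthread_join(t, NULL) != 0)
      return -1;
   return shared;
}
```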

While threads are typically lighter weight than processes, their implementations have varied across UNIX and Linux operating systems over the years. Three models are commonly used to describe these implementations on preemptive, time-sliced, multiuser UNIX and Linux operating systems: 1:1, N:1 (sometimes written 1:N), and M:N. In the 1:1 model, each user space thread maps to one kernel thread; in the N:1 model, multiple user space threads map to a single kernel thread; and in the M:N model, M user space threads are multiplexed onto N kernel threads. Linux's NPTL threads implementation uses the 1:1 model.
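One way to see which model a POSIX threads implementation uses is its contention scope: PTHREAD_SCOPE_SYSTEM means each thread competes for the CPU as a kernel-scheduled entity (the 1:1 case), while PTHREAD_SCOPE_PROCESS corresponds to user-level scheduling as in N:1 and M:N libraries. A hedged sketch, with hypothetical function names; on Linux with glibc, the default is expected to be PTHREAD_SCOPE_SYSTEM and process scope is rejected with ENOTSUP.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Return the default contention scope for new threads, or -1 on error. */
int default_contention_scope(void)
{
   pthread_attr_t attr;
   int scope;

   if (pthread_attr_init(&attr) != 0)
      return -1;
   if (pthread_attr_getscope(&attr, &scope) != 0)
      scope = -1;
   pthread_attr_destroy(&attr);
   return scope;
}

/* Try to request process contention scope (user-level scheduling).
   Returns 0 if accepted, otherwise the error number. */
int request_process_scope(void)
{
   pthread_attr_t attr;
   int rc;

   if (pthread_attr_init(&attr) != 0)
      return -1;
   rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_PROCESS);
   pthread_attr_destroy(&attr);
   return rc;
}
```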

In summary, both threads and processes are scheduled for execution on a single processor.  Thread context switching is lighter in weight than process context switching.  Both threads and processes are schedulable entities and concurrency is defined as the interleaved execution over time of schedulable entities on a single processor.

The Linux user space APIs for process and thread management abstract away a lot of these details, but you can still hint at the desired level of concurrency and influence the scheduling policy and priority of each schedulable entity. Those settings affect how long each entity runs before it is preempted, and therefore system throughput.
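As a small, hedged illustration of these knobs (function names are hypothetical): sched_getscheduler() and sched_rr_get_interval() report the current policy and the round-robin time quantum the kernel associates with the process, and pthread_setconcurrency() passes a concurrency hint. Under a 1:1 implementation such as NPTL, the concurrency level is only recorded and has no effect on scheduling.

```c
#define _GNU_SOURCE              /* exposes pthread_setconcurrency() in glibc */
#include <pthread.h>
#include <sched.h>
#include <time.h>

/* Report the calling process's scheduling policy and the round-robin
   time quantum the kernel reports for it. Returns 0, or -1 with errno set. */
int query_sched(int *policy, struct timespec *quantum)
{
   if ((*policy = sched_getscheduler(0)) == -1)
      return -1;
   return sched_rr_get_interval(0, quantum);
}

/* Pass a concurrency hint to the threads implementation. */
int set_concurrency_hint(int level)
{
   return pthread_setconcurrency(level);
}
```

Changing the policy itself (for example to SCHED_RR with sched_setscheduler()) normally requires elevated privileges, so it is left out of this sketch.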

Conversely, parallelism refers to the simultaneous execution of multiple schedulable entities at the same instant. Both processes and threads can execute in parallel across multiple cores or multiple processors. On a multiuser system with preemptive time slicing and multiple processor cores, both concurrency and parallelism are often at play. Affinity scheduling refers to placing processes and threads on particular cores (for example, keeping a thread on the core whose cache already holds its data) so that their concurrent, and often parallel, execution is close to optimal.
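On Linux, a program can also set affinity explicitly. A minimal sketch (the function name is hypothetical) that pins the calling thread to the lowest-numbered CPU it is currently allowed to use, via the Linux-specific sched_getaffinity()/sched_setaffinity() calls:

```c
#define _GNU_SOURCE            /* for cpu_set_t and sched_setaffinity() */
#include <sched.h>

/* Pin the calling thread to the lowest-numbered CPU in its current
   affinity mask. Returns that CPU number, or -1 on error. */
int pin_to_first_allowed_cpu(void)
{
   cpu_set_t allowed, one;

   if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
      return -1;
   for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
      if (!CPU_ISSET(cpu, &allowed))
         continue;
      CPU_ZERO(&one);
      CPU_SET(cpu, &one);
      return sched_setaffinity(0, sizeof(one), &one) == 0 ? cpu : -1;
   }
   return -1;
}
```

Pinning is usually a last resort; the kernel's scheduler already tries to keep threads near their cached data.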

Software applications are often designed to solve computationally complex problems. If the algorithm that solves such a problem can be parallelized, then multiple threads or processes can all run at the same time across multiple cores. Each process or thread executes by itself and does not contend for resources with the other threads or processes that are working on other parts of the problem. When a thread or process reaches the point where it can no longer contribute any more work to the solution, it waits at the barrier. When all threads or processes reach the barrier, the output of their work is synchronized and often aggregated by the master process. Complex test frameworks often use barrier synchronization when certain types of tests can be run in parallel.
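The barrier pattern described above can be sketched with POSIX barriers. In this hedged example (names are hypothetical), each worker sums its own slice of an array and writes only its own result slot, so no lock is needed; all workers then wait at pthread_barrier_wait(), and the one thread that receives PTHREAD_BARRIER_SERIAL_THREAD aggregates the partial sums, playing the role of the "master".

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_WORKERS 8

struct sum_ctx {
   const long *data;
   size_t n;
   int nworkers;
   long partial[MAX_WORKERS];    /* one slot per worker, so no lock is needed */
   long total;                   /* written by the serial thread after the barrier */
   pthread_barrier_t barrier;
};

struct sum_arg { struct sum_ctx *ctx; int id; };

static void *sum_worker(void *p)
{
   struct sum_arg *a = p;
   struct sum_ctx *c = a->ctx;
   size_t chunk = (c->n + (size_t)c->nworkers - 1) / (size_t)c->nworkers;
   size_t lo = (size_t)a->id * chunk;
   size_t hi = lo + chunk < c->n ? lo + chunk : c->n;
   long s = 0;

   /* independent phase: each worker touches only its own slice and slot */
   for (size_t i = lo; i < hi; i++)
      s += c->data[i];
   c->partial[a->id] = s;

   /* wait for every worker; exactly one waiter gets the serial return value */
   if (pthread_barrier_wait(&c->barrier) == PTHREAD_BARRIER_SERIAL_THREAD) {
      long t = 0;
      for (int k = 0; k < c->nworkers; k++)
         t += c->partial[k];
      c->total = t;
   }
   return NULL;
}

/* Sum data[0..n) with nworkers threads. Returns 0 or an error number. */
int parallel_sum(const long *data, size_t n, int nworkers, long *result)
{
   struct sum_ctx ctx = { .data = data, .n = n, .nworkers = nworkers };
   struct sum_arg args[MAX_WORKERS];
   pthread_t tid[MAX_WORKERS];
   int rc;

   if (nworkers < 1 || nworkers > MAX_WORKERS)
      return EINVAL;
   if ((rc = pthread_barrier_init(&ctx.barrier, NULL, (unsigned)nworkers)) != 0)
      return rc;
   for (int i = 0; i < nworkers; i++) {
      args[i] = (struct sum_arg){ &ctx, i };
      /* workers already started would wait at the barrier forever if a later
         pthread_create() failed, so this sketch simply aborts in that case */
      if (pthread_create(&tid[i], NULL, sum_worker, &args[i]) != 0)
         abort();
   }
   for (int i = 0; i < nworkers; i++)
      pthread_join(tid[i], NULL);
   pthread_barrier_destroy(&ctx.barrier);
   *result = ctx.total;
   return 0;
}
```

For example, summing 1..1000 with four workers yields 500500.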

Most individual software applications running on preemptive, time-sliced, multiuser Linux and UNIX operating systems are not designed with heavily parallel multithreaded or multiprocess execution in mind. Expensive parallel algorithms often require multiple dedicated processor cores with hard real-time scheduling constraints. The following paper describes a parallel solution to a popular problem: flight scheduling.

Last, when designing multithreaded and multiprocess software programs, minimizing lock granularity greatly increases concurrency, throughput, and execution efficiency. Multithreaded and multiprocess programs that do not use coarse-grained synchronization strategies do not run efficiently and often require countless hours of debugging. The use of semaphores, mutex locks, and other synchronization primitives should be kept to a minimum in programs that share resources between multiple threads or processes. Proper program design allows schedulable entities to run in parallel or concurrently with high throughput and minimal resource contention, which is optimal for solving computationally complex problems on preemptive, time-sliced, multiuser operating systems without requiring hard real-time scheduling.

After a considerable amount of research in these areas, I have applied these design techniques in several successful multithreaded and multiprocess software programs.

Monday, June 13, 2016

VHDL Processes for Pulsing Multiple Lasers at Different Frequencies on Altera FPGA


DE1-SoC GPIO Pins connected to 780nm Infrared Laser Diodes, 660nm Red Laser Diodes, and Oscilloscope

The following VHDL processes pulse the GPIO pins at different frequencies on the Altera DE1-SoC using multiple phase-locked loops (PLLs). Multiple infrared laser diodes are connected to the GPIO banks and pulsed at a 50% duty cycle with a 16 mA drive current at 3.3 V. Each GPIO bank on the DE1-SoC has 36 pins. Pin 1 of GPIO bank 0 is pulsed at 20 Hz, and pins 0 and 1 of GPIO bank 1 are pulsed at 30 Hz. A direct mode PLL with locked output was configured using the Altera Quartus Prime MegaWizard. The PLL reference clock frequency is set to 50 MHz, the output clock frequency is set to 50 MHz, and the duty cycle is set to 50%; each clock process then scales the 50 MHz clock down with a frequency divider constant. The pin mappings for GPIO banks 0 and 1 are documented in the DE1-SoC datasheet.
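The listing does not show the values of FREQ_A_DIVIDER and FREQ_B_DIVIDER. Reading the processes as written, each counter runs from 0 to the divider (divider + 1 clock cycles per wrap) and the GPIO pin toggles once per wrap, so one full output period is 2 × (divider + 1) cycles, i.e. f_out = f_clk / (2 × (divider + 1)). The following C helpers (hypothetical names) work through that arithmetic for a 50 MHz clock; they are a sketch of the calculation, not the constants actually used on the board.

```c
/* Divider constant for a counter that runs 0..divider and toggles the
   output once per wrap: f_out = f_clk / (2 * (divider + 1)).
   Rounds to the nearest achievable divider; returns 0 for invalid input. */
unsigned long freq_divider(unsigned long f_clk_hz, unsigned long f_out_hz)
{
   unsigned long q;

   if (f_out_hz == 0)
      return 0;
   q = (f_clk_hz + f_out_hz) / (2 * f_out_hz);   /* round(f_clk / (2 * f_out)) */
   return q > 0 ? q - 1 : 0;
}

/* Output frequency actually produced by a given divider. */
double divided_freq(unsigned long f_clk_hz, unsigned long divider)
{
   return (double)f_clk_hz / (2.0 * ((double)divider + 1.0));
}
```

With a 50 MHz clock, 20 Hz needs a divider of 1249999 (exactly 20 Hz), while 30 Hz rounds to 833332, which gives about 30.00001 Hz because 50 MHz is not an integer multiple of 60 Hz.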

Pulsed Laser Diodes via GPIO pins on DE1-SoC FPGA


--Copyright (C) 2016. Bryan R. Hinton
--All rights reserved.
--
--Redistribution and use in source and binary forms, with or without
--modification, are permitted provided that the following conditions
--are met:
--1. Redistributions of source code must retain the above copyright
--   notice, this list of conditions and the following disclaimer.
--2. Redistributions in binary form must reproduce the above copyright
--   notice, this list of conditions and the following disclaimer in the
--   documentation and/or other materials provided with the distribution.
--3. Neither the names of the copyright holders nor the names of any
--   contributors may be used to endorse or promote products derived from this
--   software without specific prior written permission.
--
--THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
--AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
--IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
--ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
--LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
--CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
--SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
--INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
--CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
--ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
--POSSIBILITY OF SUCH DAMAGE.

---------------------
-- CLOCK A PROCESS --
---------------------
-- INPUT: direct mode pll with locked output
-- and reference clock frequency set to 50mhz,
-- output clock frequency set to 50mhz with 50% duty
-- cycle and output frequency scaled by freq divider constant
-------------------------------------------------------------
clk_a_process : process (lkd_pll_clk_a)
begin
  if rising_edge(lkd_pll_clk_a) then
    if (cycle_ctr_a < FREQ_A_DIVIDER) then
      cycle_ctr_a <= cycle_ctr_a + 1;
    else
      cycle_ctr_a <= 0;
    end if;
  end if;
end process clk_a_process;

---------------------
-- CLOCK B PROCESS --
---------------------
-- INPUT: direct mode pll with locked output
-- and reference clock frequency set to 50mhz,
-- output clock frequency set to 50mhz with 50% duty
-- cycle and output frequency scaled by freq divider constant
-------------------------------------------------------------
clk_b_process : process (lkd_pll_clk_b)
begin
  if rising_edge(lkd_pll_clk_b) then
    if (cycle_ctr_b < FREQ_B_DIVIDER) then
      cycle_ctr_b <= cycle_ctr_b + 1;
    else
      cycle_ctr_b <= 0;
    end if;
  end if;
end process clk_b_process;

--------------------
-- GPIO A PROCESS --
--------------------
-- INPUT: direct mode pll with locked output
---------------------------------------------------------
gpio_a_process : process (lkd_pll_clk_a)
begin
  if rising_edge(lkd_pll_clk_a) then
    if (cycle_ctr_a = 0) then
      -- toggle gpio pin1 from gpio_0
      gpio_sig_0(1) <= NOT gpio_sig_0(1);
    end if;
  end if;
end process gpio_a_process;

--------------------
-- GPIO B PROCESS --
--------------------
-- INPUT: direct mode pll with locked output
---------------------------------------------------------
gpio_b_process : process (lkd_pll_clk_b)
begin
  if rising_edge(lkd_pll_clk_b) then
    if (cycle_ctr_b = 0) then
      -- toggle gpio pins 0 and 1 from gpio_1
      -- (assign only the two-bit slice so both sides have the same width)
      gpio_sig_1(1 downto 0) <= NOT gpio_sig_1(1 downto 0);
    end if;
  end if;
end process gpio_b_process;

GPIO_0 <= gpio_sig_0;
GPIO_1 <= gpio_sig_1;

end gpioarch;


DE1-SoC GPIO Bank 0 Pin 1

DE1-SoC GPIO Bank 1 Pins 0 and 1

Thursday, June 2, 2016

FPGA Audio Processing with the Cyclone V Dual-Core ARM Cortex-A9

The DE1-SoC FPGA development board from Terasic is powered by an Altera Cyclone V SoC, which integrates an FPGA and a dual-core ARM Cortex-A9 MPCore processor. The FPGA and ARM cores are connected by a high-speed interconnect fabric, so you can boot Linux on the ARM cores and then communicate with the FPGA.
The configuration below was built from the Terasic design reference sources.

The DE1-SoC board below has been programmed via Quartus Prime running on Fedora 23 (64-bit Linux). The FPGA bitstream was compiled from the Terasic audio codec design reference. After the bitstream was loaded onto the FPGA over the USB-Blaster II interface, the Nios II command shell was used to load the Nios II software image onto the chip. A menu-driven debug interface runs from a terminal on the host via the Nios II shell, with the target connected over the USB-Blaster II interface.

A low-level hardware abstraction layer was written in C to configure the on-board audio codec chip. The Nios II software is stored in on-chip memory, and a PLL-driven clock signal is fed into the audio chip. The Verilog code for the hardware design was generated from Qsys. The design supports configurable sample rates, mic in, and line in/out.
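The HAL code is not shown in this post. As a hedged illustration only: assuming the on-board codec is a Wolfson WM8731 (to my knowledge, the audio codec fitted to the DE1-SoC), its 2-wire control interface takes 16-bit words with a 7-bit register address in bits 15 to 9 and 9 data bits in bits 8 to 0, sent most significant byte first. The helper names below are hypothetical, and the actual I2C transfer is left out because it depends on the HAL.

```c
#include <stdint.h>

#define WM8731_REG_RESET 0x0f   /* assumption: writing 0 here resets the codec */

/* Pack a register address (7 bits) and data (9 bits) into a control word. */
uint16_t wm8731_word(uint8_t reg, uint16_t data)
{
   return (uint16_t)(((reg & 0x7fu) << 9) | (data & 0x1ffu));
}

/* Split a control word into the two bytes sent after the I2C device
   address, most significant byte first. */
void wm8731_bytes(uint16_t word, uint8_t out[2])
{
   out[0] = (uint8_t)(word >> 8);
   out[1] = (uint8_t)(word & 0xffu);
}
```

A reset write, for example, would be the word wm8731_word(WM8731_REG_RESET, 0), which is 0x1e00.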

Additional components are connected to the DE1-SoC board in this photo. A Linear Technology DC934A (LTC2607) DAC board is connected to the DE1-SoC, and an oscilloscope is connected to the ground and VREF pins on the DAC.

The DC934A features an LTC2607 16-bit dual DAC with an I2C interface and an LTC2422 2-channel 20-bit µPower no-latency delta-sigma ADC.

3.5 mm audio cables are connected to the mic in and line out ports. The DE1-SoC is connected to an external display over VGA so that a local console can be used with a connected keyboard and mouse when Linux is booted from the microSD card.

With GPIO pins accessible via the GPIO 0 and 1 breakouts, external LEDs can be pulsed directly from the Hard Processor System (HPS), FPGA, or the FPGA via the HPS.

Tuesday, May 31, 2016

Pulse Any GPIO Pin with the ARM System Timer from Bare Metal on the Raspberry Pi 3

650nm 5mW Red Laser Pulsed at 20Hz from Bare Metal on a Raspberry Pi 3
Sony NoIR 8mp Camera Module for Raspberry Pi Model 3 B


The ARM system timer can be used as a clock to pulse the GPIO pins on the Raspberry Pi 3 at a predetermined frequency.

In the code below, a 5 V, 650 nm, 5 mW red laser diode is pulsed at 20 Hz on GPIO pin 22 by bare metal C code on the Raspberry Pi 3.

$ arm-none-eabi-gcc -O2 -mfpu=neon-vfpv4 -mfloat-abi=hard -march=armv8-a -mtune=cortex-a53 -nostartfiles pulse.c -o kernel.elf


/* Pulse a GPIO pin using the ARM system timer via Bare Metal on the Raspberry Pi 3
 * Copyright (C) Bryan R. Hinton 05-31-2016
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License
 * as published by the Free Software Foundation; either version 2
 * of the License, or (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
 */

/* For the rpi3, base addresses can be calculated by subtracting 0x3f000000 from
 * the bus addresses (0x7exxxxxx) in the BCM2835 ARM Peripherals datasheet,
 * e.g. 0x7e200000 - 0x3f000000 = 0x3f200000 */
#define GPIO_BASE        0x3f200000  /* GPIO base address */
#define ARM_SYSTIMER_CLO 0x3f003004  /* ARM System Timer Counter Lower 32 bits (1 MHz) */
#define GPSET0_OFFSET    0x07        /* GPIO Pin Output Set 0, word offset (byte 0x1c) */
#define GPCLR0_OFFSET    0x0a        /* GPIO Pin Output Clear 0, word offset (byte 0x28) */
#define MAXPINCNT        3
#define HALF_PERIOD_US   25000       /* 25 ms high + 25 ms low = 50 ms period = 20 Hz */

int main(void)
{
   /* static const array of usable gpio pins on raspberry pi 3 */
   static const unsigned int gpiopin[MAXPINCNT] = {22, 17, 4};

   /* pin index, pin 22 */
   int pinidx = 0;
   unsigned int pin = gpiopin[pinidx];

   /* GPFSEL register index (pin / 10) and bit offset of the pin's 3-bit FSEL field */
   unsigned int fsel = pin / 10;
   unsigned int shift = (pin % 10) * 3;

   /* system timer value at the start of the current half period */
   unsigned int start;

   /* setup gpio base address */
   volatile unsigned int *gpio_base = (volatile unsigned int *)GPIO_BASE;

   /* setup system timer base address */
   volatile unsigned int *systimer_clo = (volatile unsigned int *)ARM_SYSTIMER_CLO;

   /* set gpio(n) for input by clearing the three bits starting at bit (GPIO_PIN%10)*3
    * in GPFSEL(GPIO_PIN/10); those three bits are FSEL(N), the configuration bits for GPIO(N)
    * 1. Find the gpio function select (gpfsel) register for the GPIO pin.
    *  Pins 0-9 use gpfsel 0
    *  Pins 10-19 use gpfsel 1
    *  Pins 20-29 use gpfsel 2
    *  Pins 30-39 use gpfsel 3
    *  Pins 40-49 use gpfsel 4
    *  Pins 50-53 use gpfsel 5
    * 2. Set or clear the function select bits in that register for the designated gpio pin
    *  gpio pin 5 is set for input in gpfsel 0 by clearing bits 15-17
    *  gpio pin 5 is set for output in gpfsel 0 by setting bits 15-17 to 001
    *  gpio pin 22 is set for input in gpfsel 2 by clearing bits 6-8
    *  gpio pin 22 is set for output in gpfsel 2 by setting bits 6-8 to 001 */

   /*  set gpiopin[idx] for input by clearing bits k,k+1,k+2 in gpfsel n.
    *  gpfsel(2) gpiopin 22 bit string mask for gp input 11111111111111111111111000111111 (rval)
    *  gpfsel(0) gpiopin 3 bit string mask for gp input 11111111111111111111000111111111 (rval) */
   *(gpio_base + fsel) &= ~(7u << shift);

   /*  set gpiopin[idx] for output by setting bits k,k+1,k+2 to 001 in gpfsel n.
    *  gpfsel(2) gpiopin 22 bit string mask for gp output 00000000000000000000000001000000 (rval)
    *  gpfsel(0) gpiopin 3 bit string mask for gp output 00000000000000000000001000000000 (rval) */
   *(gpio_base + fsel) |= (1u << shift);

   /* pulse the selected gpio pin */
   /* one iteration of this loop is one full period, i.e. 20 Hz for this example */
   start = *systimer_clo;
   while (1) {

      /* set gpio pin; GPSET0 is write-only, writing 1 sets the pin and 0 has no effect */
      *(gpio_base + GPSET0_OFFSET) = (1u << pin);

      /* busy-wait for half a period; unsigned subtraction handles counter wraparound */
      while ((unsigned int)(*systimer_clo - start) < HALF_PERIOD_US)
         ;
      start += HALF_PERIOD_US;   /* advance by exactly half a period so no drift accumulates */

      /* clear gpio pin; GPCLR0 is write-only, writing 1 clears the pin */
      *(gpio_base + GPCLR0_OFFSET) = (1u << pin);

      /* busy-wait for the second half of the period */
      while ((unsigned int)(*systimer_clo - start) < HALF_PERIOD_US)
         ;
      start += HALF_PERIOD_US;
   }
}

Monday, May 30, 2016

Bare Metal C with the Raspberry Pi 3 and Sony 8mp NoIR camera

780nm IR Laser Diode pulsed at ~20.2Hz
Sony NoIR 8mp Camera Module for Raspberry Pi 3
Pulsing a laser diode at frequencies above roughly 38 to 40 MHz from bare metal requires an FPGA, a microcontroller board running an RTOS, or a dedicated circuit. Some of the laser diodes that were tested required a simple voltage divider circuit; others stayed well under 5 V and 20 mA.

Several different methods of pulsing the GPIO pins were tested:

  • memory-mapped I/O, where the physical ARM peripheral base addresses were mapped into Linux user space via virtual addressing;
  • C and Python user space programs that link against dynamically linked user space libraries, which in turn read and write the ARM peripheral address space to toggle the GPIO pins;
  • a device driver that disables interrupts and writes to the underlying ARM peripheral address space after being called through a syscall from user space; and
  • bare metal C code that writes directly to the ARM peripheral address space.

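For the first method in the list, a minimal user space sketch might map the GPIO block through /dev/mem with mmap(). It assumes the Raspberry Pi 3 GPIO physical base 0x3f200000 (the GPIO_BASE value used in the bare metal post) and root privileges; map_peripheral, gpio_make_output and gpio_write are hypothetical helper names, not the code used in the original tests.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

#define RPI3_GPIO_PHYS 0x3f200000L   /* GPIO_BASE from the bare metal example */
#define GPIO_MAP_LEN   4096          /* one page covers GPFSEL0..GPCLR1 */
#define GPSET0_OFFSET  7             /* word offsets, as in the bare metal example */
#define GPCLR0_OFFSET  10

/* Map one page of physical address space from devpath (normally /dev/mem,
   which needs root). Returns NULL on failure. */
volatile unsigned int *map_peripheral(const char *devpath, off_t phys)
{
   int fd = open(devpath, O_RDWR | O_SYNC);
   void *p;

   if (fd < 0)
      return NULL;
   p = mmap(NULL, GPIO_MAP_LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);
   close(fd);                        /* the mapping outlives the descriptor */
   return p == MAP_FAILED ? NULL : (volatile unsigned int *)p;
}

/* Configure pin (0..53) as an output: FSEL field = 001 in GPFSEL(pin / 10). */
void gpio_make_output(volatile unsigned int *gpio, unsigned int pin)
{
   unsigned int shift = (pin % 10) * 3;

   gpio[pin / 10] = (gpio[pin / 10] & ~(7u << shift)) | (1u << shift);
}

/* Drive pin (0..31) high or low through the write-only set/clear registers. */
void gpio_write(volatile unsigned int *gpio, unsigned int pin, int level)
{
   gpio[level ? GPSET0_OFFSET : GPCLR0_OFFSET] = 1u << pin;
}
```

On the board, usage would be along the lines of mapping with map_peripheral("/dev/mem", RPI3_GPIO_PHYS), calling gpio_make_output(gpio, 22), and then alternating gpio_write(gpio, 22, 1) and gpio_write(gpio, 22, 0). Unlike the bare metal version, timing here is subject to Linux scheduling, which is one reason the methods behave differently.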
780nm IR Laser Diode pulsed at ~20.2Hz
 Sony NoIR 8mp Camera Module

These images were shot using the Sony NoIR 8mp Camera module for the Raspberry Pi 3.

850nm IR Laser Continuous Wave
Sony NoIR 8mp Camera Module