Tuesday, August 16, 2016

Profiling Multithreaded / Multiprocess Applications on the DE0-Nano-SoC with ARM® DS-5 Streamline

The ARM® DS-5 Streamline Performance Analyzer within ARM® DS-5 Development Studio is well suited to profiling and analyzing the performance of multithreaded / multiprocess applications. The gator daemon can be compiled with the Linaro 4.8 GCC ARM hard-float toolchain and then uploaded to a DE0-Nano-SoC board running the stock Terasic Yocto build from the microSD card, without modifying the kernel on the board.

Streamline reports CPU clock cycles, instruction execution broken down into load and store operations, memory usage, register usage, and disk I/O (reads and writes). It also shows per-process and per-thread function call paths broken down by system utilization percentage, per-process and per-thread stack and heap usage, and many other useful metrics.

To capture meaningful data in DS-5 Streamline, the process_creation project was modified to insert 1000 packets into the packet processing simulation buffer, and the child processes were modified to sleep and wake up 1000 times in order to simulate process activity.


void *insertpackets(void *arg) {
   
   struct pktbuf *pkbuf;
   struct packet *pkt;
   int idx;

   if(arg != NULL) {
   
      pkbuf = (struct pktbuf *)arg;

      /* seed random number generator */
      ...

      /* insert 1000 packets into the packet buffer */
      for(idx = 0; idx < 1000; idx++) {

         pkt = (struct packet *)malloc(sizeof(struct packet));

         if(pkt != NULL) {

            /* set the packet processing simulation multiplier to a value in the range 0..2 (modulo 3) */
            pkt->mlt=...()%3;

            /* insert packet in the packet buffer */
            if(pkt_queue(pkbuf,pkt) != 0) {
            
               ...
            ... 
         ...
      ...
   ...
...

int fcnb(time_t secs, long nsecs) {
 
   struct timespec rqtp;
   struct timespec rmtp;
   int ret;
   int idx;

   rqtp.tv_sec = secs;
   rqtp.tv_nsec = nsecs; 

   for(idx = 0; idx < 1000; idx++) {

      ret = nanosleep(&rqtp, &rmtp);

      ...
   ...
...
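
The elided part of the loop above is where the return value of nanosleep would be checked. As a minimal sketch of one way to do that, assuming the goal is simply to finish each sleep even when a signal interrupts it, the hypothetical helper below (sleep_repeatedly is not a name from the project) retries with the remaining time on EINTR and gives up on any other error.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Hypothetical helper, not taken from the original project: sleep for the
 * requested interval `count` times, resuming after signal interruptions.
 * Returns the number of completed sleeps, or -1 on an unrecoverable error. */
int sleep_repeatedly(time_t secs, long nsecs, int count) {
    int done = 0;

    assert(count >= 0);
    for (int idx = 0; idx < count; idx++) {
        struct timespec rqtp = { .tv_sec = secs, .tv_nsec = nsecs };
        struct timespec rmtp;

        /* nanosleep returns -1 with errno == EINTR when a signal arrives;
         * rmtp then holds the time not yet slept, so retry with it. */
        while (nanosleep(&rqtp, &rmtp) == -1) {
            if (errno != EINTR)
                return -1;  /* e.g. EINVAL when tv_nsec is outside 0..999999999 */
            rqtp = rmtp;
        }
        done++;
    }
    return done;
}
```

Calling sleep_repeatedly(0, 1000000L, 1000) would approximate 1000 one-millisecond naps; the interval actually used by the project is not shown in the post.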


ARM® DS-5 Streamline - Profiling the process creation application.

ARM® DS-5 Streamline - Code View



Monday, August 15, 2016

Debugging Multithreaded / Multiprocess Applications on the DE0-Nano-SoC with ARM® DS-5


ARM® DS-5 is well suited to debugging multithreaded, multiprocess applications on ARM Powered® development boards that run the GNU/Linux operating system. The DE0-Nano-SoC is a convenient reference platform for developing such applications in Linux user space. Yocto provides an easy-to-use way to build a bootable image, and ARM® DS-5 integrates with the board for efficient debugging. Altera packages a version of ARM DS-5 for the DE0-Nano-SoC.




The following requirements were in place for this project.

  • Use a coarse-grained locking strategy. Only lock data.
  • Minimize critical sections.
  • Fork five processes, all of which are attached to the controlling terminal.
  • Create three threads in one of the five processes.
  • Two of the threads will simulate packet processing.
  • One of the threads will generate packets in a buffer.
  • Properly utilize synchronization primitives and mutex locks.
  • Maximize concurrency.
  • Minimize latency.
  • Ensure order of context switching is always random upon execution - i.e. don't control the scheduler.
  • Utilize ARM DS-5 for building and debugging the application on the attached DE0-Nano-SoC FPGA.
  • Use autotools for building a shared library and link against the library with a driver program in DS-5.
  • Compile the shared library and driver program using the Linaro GCC ARM-Linux-GNUEABI Hard Float toolchain version 4.8 that is included in the Altera DS-5 download.
  • Compile the shared library and test program using the Linaro GCC ARM-Linux-GNUEABI Hard Float toolchain version 5.3 (latest stable from Linaro as of 08/15/16).
  • Debug the multiprocess, multithreaded application using both toolchains from DS-5.
  • Ensure that all possible errors from calls to pthread functions and other libc functions are properly handled.
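
The requirements above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the repository: every name (pktbuf fields, producer, consumer, run_packet_threads, spawn_workers) is hypothetical, the buffer is a simple linked list, and the multiplier is a deterministic stand-in for the random value. For brevity the return values of the lock, unlock, signal, and join calls are not checked here, which the last requirement would otherwise demand.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* All names here are hypothetical; none are taken from the repository. */
struct packet { int mlt; struct packet *next; };

struct pktbuf {
    pthread_mutex_t lock;     /* guards head, done and processed (data only) */
    pthread_cond_t nonempty;  /* signalled on insert and when done is set */
    struct packet *head;
    int to_make, done, processed;
};

static void *producer(void *arg) {
    struct pktbuf *b = arg;
    for (int i = 0; i < b->to_make; i++) {
        struct packet *p = malloc(sizeof *p);  /* allocate outside the lock */
        if (p == NULL) break;
        p->mlt = i % 3;                        /* stand-in for a random multiplier */
        pthread_mutex_lock(&b->lock);
        p->next = b->head;
        b->head = p;
        pthread_cond_signal(&b->nonempty);
        pthread_mutex_unlock(&b->lock);
    }
    pthread_mutex_lock(&b->lock);
    b->done = 1;                               /* wake consumers so they can exit */
    pthread_cond_broadcast(&b->nonempty);
    pthread_mutex_unlock(&b->lock);
    return NULL;
}

static void *consumer(void *arg) {
    struct pktbuf *b = arg;
    for (;;) {
        pthread_mutex_lock(&b->lock);
        while (b->head == NULL && !b->done)
            pthread_cond_wait(&b->nonempty, &b->lock);
        struct packet *p = b->head;
        if (p != NULL) { b->head = p->next; b->processed++; }
        pthread_mutex_unlock(&b->lock);
        if (p == NULL) return NULL;            /* buffer drained and producer done */
        free(p);                               /* "process" outside the lock */
    }
}

/* Runs one producer and two consumers; returns packets processed or -1. */
int run_packet_threads(int npackets) {
    assert(npackets >= 0);
    struct pktbuf b = { .to_make = npackets };
    void *(*fn[3])(void *) = { producer, consumer, consumer };
    pthread_t tid[3];
    int started = 0;

    if (pthread_mutex_init(&b.lock, NULL) != 0) return -1;
    if (pthread_cond_init(&b.nonempty, NULL) != 0) {
        pthread_mutex_destroy(&b.lock);
        return -1;
    }
    while (started < 3 && pthread_create(&tid[started], NULL, fn[started], &b) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    while (b.head != NULL) {                   /* leftovers only if no consumer started */
        struct packet *p = b.head;
        b.head = p->next;
        free(p);
    }
    pthread_cond_destroy(&b.nonempty);
    pthread_mutex_destroy(&b.lock);
    return started == 3 ? b.processed : -1;
}

/* Forks nprocs children; child 0 runs the threads. Returns the number of
 * children that exited with status 0, or -1 if a fork failed. */
int spawn_workers(int nprocs, int npackets) {
    int forked = 0, ok = 0, status;
    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid == -1) break;
        if (pid == 0)                          /* child */
            _exit(i == 0 && run_packet_threads(npackets) != npackets ? 1 : 0);
        forked++;
    }
    for (int i = 0; i < forked; i++)
        if (wait(&status) != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    return forked == nprocs ? ok : -1;
}
```

The single mutex protects only the buffer's data, and allocation and freeing happen outside the critical section, which is one way to read the "only lock data" and "minimize critical sections" items. Which consumer handles which packet is left entirely to the scheduler. Build with -pthread.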


The code, which meets the above requirements, is available at github.com/bryanhinton/de0-nano-soc.git
Note the DS-5 Settings in the following images.  In order to compile the code from Eclipse, a level of familiarity with DS-5 and Eclipse is required.

DS-5 disassembly / memory analysis - debugging multithreaded, multiprocess applications on ARM Powered® boards

DS-5 Debug Configurations - Files

DS-5 Debug Configurations - Connection

DS-5 Autotools Configure Settings

Synchronized Swimming. For a description and overview of parallel computing, see Introduction to Parallel Computing at computing.llnl.gov

DS-5 Toolchain Editor