RTC Magazine


Strategies Develop to Extend the Life of VME (10)
Manage and Guard the Internet of Things (16)
CUDA Unleashes the Power of NVIDIA GPUs (30)

The Magazine of Record for the Embedded Computer Industry


Vol 16 / No 2 / FEB 2015

Microcontrollers Morph to Take on Bigger Jobs

CORTEX-M7

An RTC Group Publication



CONTENTS

The Magazine of Record for the Embedded Computing Industry

EDITOR'S REPORT: EXTENDING THE LIFE OF VME
10  Extending VME Life to 2020 and Beyond
    by Tom Williams, Editor-in-Chief

TECHNOLOGY CORE: THE MICROCONTROLLER: TAKING ON BIGGER JOBS
12  Easing Development for the Next Generation of Connected Embedded Intelligence
    by Joseph Yiu, ARM

TECHNOLOGY CONNECTED: MANAGING THE INTERNET OF THINGS
16  Building Blocks for the Internet of Things
    by Amir Friedman, ConnectOne
20  Beyond the Secure RTOS - Protecting Wirelessly Connected Endpoints from Cyber-attacks
    by Alan Grau, Icon Labs

TECHNOLOGY IN SYSTEMS: STRATEGIES FOR WEARABLE DEVICES
26  Putting the Mobile into Wearable Devices
    by Joy Wrigley, Lattice Semiconductor

TECHNOLOGY DEVELOPMENT: CUDA
30  CUDA: An Introduction
    by Dustin Franklin, GE Intelligent Platforms

TECHNOLOGY WATCH: SMALL BOARD PROTOTYPING
34  Creating an Application-Specific Operating System to Power an Arduino Robot
    by Igor Serikov and Jacob Harel, Zeidman Technologies

DEPARTMENTS
06  Editorial: The IoT is Embedded . . . Plus a Bit More
07  Industry Insider: Latest Developments in the Embedded Marketplace
40  Products & Technology: Newest Embedded Technology Used by Industry Leaders

RTC Magazine FEBRUARY 2015 | 3


RTC MAGAZINE

PUBLISHER President John Reardon, johnr@rtcgroup.com Vice President Aaron Foellmi, aaronf@rtcgroup.com

congatec's NEW Intel®-based Thin Mini-ITX Motherboards: conga-IC87 (Intel® Core™) and conga-IA3 (Intel® Atom™ SoC). www.congatec.us | 6262 Ferris Square | San Diego | CA 92121 USA | 1 858 457 2600 | sales-us@congatec.com

EDITORIAL Editor-In-Chief Tom Williams, tomw@rtcgroup.com Senior Editor Clarence Peckham, clarencep@rtcgroup.com


Contributing Editors Colin McCracken and Paul Rosenfeld

ART/PRODUCTION Art Director Jim Bell, jimb@rtcgroup.com Graphic Designer Hugo Ricardo, hugor@rtcgroup.com


ADVERTISING/WEB ADVERTISING Western Regional Sales Manager Mike Duran, michaeld@rtcgroup.com (949) 226-2024


Western Regional Sales Manager Mark Dunaway, markd@rtcgroup.com (949) 226-2023


Eastern U.S. and EMEA Sales Manager Ruby Brower, rubyb@rtcgroup.com (949) 226-2004

BILLING Vice President of Finance Cindy Muir, cmuir@rtcgroup.com (949) 226-2021

TO CONTACT RTC MAGAZINE: Home Office

2U EXTREME POWER + COOLING
XIOS 2U has:
• Ten slots (PCIe Gen2 x8) in a 2U chassis
• 45W per slot with high-volume cooling
• 1-2 Xeon processors
• 1-4 removable disks
Contact us at: info@edt.com | www.edt.com

Editorial Office Tom Williams, Editor-in-Chief 1669 Nelson Road, No. 2, Scotts Valley, CA 95066 Phone: (831) 335-1509 tomw@rtcgroup.com Published by The RTC Group Copyright 2015, The RTC Group. Printed in the United States. All rights reserved. All related graphics are trademarks of The RTC Group. All other brand and product names are the property of their holders.


4 | RTC Magazine FEBRUARY 2015

The RTC Group, 905 Calle Amanecer, Suite 150, San Clemente, CA 92673 Phone: (949) 226-2000 Fax: (949) 226-2050 Web: www.rtcgroup.com


Advertisement: Supermicro TwinPro™ / TwinPro²™

Four or two hot-pluggable systems (nodes) in a 2U form factor. Each node supports up to:
• Dual Intel® Xeon® processor E5-2600 v3 product family, up to 36 cores per system and 145W TDP
• 1TB DDR4-2133MHz memory in 24 DIMM slots
• 1 PCI-E 3.0 x16, 1 PCI-E 3.0 x16 "0" slot, and 1 PCI-E 3.0 x8 slot (TwinPro™)
• 4 NVMe ports
• 8 SAS 3.0 (12Gbps) ports with LSI® 3108/3008 controller, with optional SuperCap (CacheVault)
• 8 SATA 3.0 (6Gbps) ports with Intel® C612 controller
• 12 (TwinPro) or 6 (TwinPro²) hot-swap 2.5" HDD drives per node
• FDR (56Gbps) InfiniBand, dual 10GBase-T or dual Gigabit Ethernet LAN options
• Redundant Titanium (96%+) / Platinum (95%+) Level digital power supplies
• Integrated IPMI 2.0 plus KVM with dedicated LAN
• GPU/Xeon Phi™ option
• SATA-DOM and mSATA support

Models: SYS-2028TP-DNC/(T)(F)R (up to 4x NVMe + 8x SAS 3.0 drives per node); SYS-6028TP-DNC/(T)(F)R (up to 4x NVMe + 2x SAS 3.0 drives per node, or up to 6x SAS 3.0 per node); SYS-2028TP-H Series (4 DP nodes in 2U, 24x 2.5" SAS/SATA drives); SYS-6028TP-H Series (4 DP nodes in 2U, 12x 3.5" SAS3/SATA3 drives).

Target applications: Oil & Gas Exploration, Simulation, GPGPU, Engineering, Data Center, Medical, Cloud Computing, Search Engine.

www.supermicro.com/TwinPro

© Super Micro Computer, Inc. Specifications subject to change without notice. Intel, the Intel logo, Xeon, and Xeon Inside are trademarks or registered trademarks of Intel Corporation in the U.S. and/or other countries. All other brands and names are the property of their respective owners.


EDITORIAL

The IoT is Embedded . . . Plus a Bit More by Tom Williams, Editor-In-Chief

It seems that most major technology trends or developments go through a number of fairly predictable stages before they reach stability and general use and acceptance. Those of us who remember the emergence of switched fabric interconnects can probably recall the time when there was mostly a lot of discussion, hype, innumerable PowerPoint presentations and a wide variety of actual specifications and implementations. I seem to recall that at one point there were over 100 proposed switched fabrics. This is the stage we'll call TTUAW: Throwing Things Up Against the Wall. After a number of years, and a number of fortunes made and lost, we have the reasonable choice of fabric interconnects we know today, such as PCI Express, RapidIO, InfiniBand and several others. They are well established, defined as standard specifications and appropriate for different application areas. The era of hype has subsided, the specifications are stable even as more advanced versions are being developed, and designers are mostly comfortable with the options they offer.

Which brings us to the current situation around the Internet of Things. From this perspective, we seem to be in the late stages of the era of hype and moving into TTUAW. We are being told, with no particular indication of the actual origin of the numbers, that by 2030 or so there will be between 50 billion and one trillion Internet-connected devices, all feeding Big Data up to the Cloud to make our lives better. On the questions of security and privacy, things are a bit less clear. But the new world is coming. Let the masses rejoice!

6 | RTC Magazine FEBRUARY 2015

It is this period of throwing things up against the wall to see what will stick that will help determine what the ultimate world of IoT will look like. Right now we have all sorts of ideas and implementations and, of course, fortunes will be made and lost once more. The ideas range from the practical to the ridiculous, from industrial control and agricultural monitoring to connected GPS dog collars. Now that the IoT has most definitely spread from the industrial arena into the realm of consumer products, we can expect to see quite a variety of proposed products and services. This is all good. It is a normal process of technological development.

But what is really the nature of this technology? Well, at its base it is the embedded technology we are all familiar with. Of course, processors and other silicon components have gotten more powerful and more integrated; they consume less power and their systems are more connected. But these are developments that have been going on for some time and will naturally continue. There have been major advances in sensor technology, making sensors more powerful, diverse, compact and lower in cost. In addition, the IoT depends on the existence of the Internet infrastructure, which continues to grow and proliferate. These, of course, represent advances in embedded systems, but do they necessarily lead to what we are now calling the Internet of Things? I think not inevitably. What really makes the Internet of Things, and all the ideas that are getting thrown up against the wall, is the existing (and, yes, developing) technology combined with ideas that make business sense. There are possibly better ways to

determine this than simply getting a hot idea, cobbling together an embedded device with CPU, sensors, software and Wi-Fi, and trying to convince the world to take a look. However, it is one method, and there seems to be quite a bit of it going on. This is especially true now that embedded systems have definitely broken out of the industrial/military realm and entered the consumer market with a vengeance. Now every toaster, dishwasher, automobile and dog collar that has any intelligence is a "thing" that can be connected to the IoT.

The "bit more" in the title above involves the creation of attractive services that go with these "things" and actually give them value to a customer. That requires the connectivity and the Cloud-based analysis and software to add value and make them useful. But above all, it involves a plan, a strategy to determine the design of the device, the "thing" that will be developed by the embedded computer industry. To that end we already have the knowledge, the technology and the resources to build just about any kind of device that a business plan calls for and hook it up to the Web. Even the software to collect, format and transmit the data is already available. Beyond that, on the end where the actual user application lives, is where real insight and creativity are required.

So we can issue a call to entrepreneurs: "We can build almost anything you want, whether carefully analyzed and planned or as an experiment." As designers who want to make a living at this, though, we'd all prefer to make things with a future. And I know that will happen. It may, however, yet take a few things not sticking to that wall.


INDUSTRY INSIDER

Mouser Launches New Motor Control Application Site Mouser Electronics has announced the introduction of their new Motor Control Applications site. Mouser’s new applications site provides developers with the resources they need to learn about the latest advances in motor control, and the newest components available from Mouser Electronics for building motor control systems. The new Motor Control Applications site, available on Mouser.com, contains valuable resources for developers interested in expanding their knowledge about motor control systems. The Applications section segments motor control into five main subsections: Permanent Magnet Synchronous motors, Brushless DC motors, Stepper motors, AC Induction motors, and Low Voltage DC motors. These subsections describe the use and operation of each motor. Functional block diagrams are provided with explanations of each block, along with a parts list of products available for same-day shipping. The Articles section discusses topics such as Introduction to Rotary Resolvers & Encoders and Passive Components for Advanced Motor Control. All articles offer an area to post comments and questions to facilitate further discussions on the topic. The Featured Products section focuses on key products available from Mouser.com that speed and enhance the construction of motor control systems. Products include the Vishay Widebody VOW3120 2.5A IGBT and MOSFET Driver, Molex Sealed Industrial USB Solutions, and the Fairchild FAN9673 CCM PFC Controller. Additional products for motor control systems are listed including products for EMI suppression, circuit protection, passives, sensors, and motor control development kits. Finally, the Resources section lists videos, application notes, and white papers that discuss device selection and system considerations when designing motor control systems. Systems discussed include selecting motor drivers, implementing control feedback loops, Power Factor Correction (PFC) techniques, and designing for thermal management.

CreatiVasc Medical and Sealevel Systems Partner on Technology to Improve Dialysis Care Two companies are working together to create a medical product with the potential to become the standard of care for dialysis grafts. CreatiVasc Medical started full-time operations in 2007 to create intuitive medical devices that bring novel solutions to the growing population that suffer from End Stage Renal Disease (ESRD). The company’s latest innovation, the Hemoaccess Valve System, allows blood flow in a dialysis graft to be selectively turned on and off. The technology is designed to improve the lives of patients and reduce the need for certain repetitive surgeries, potentially saving billions of dollars in Medicare costs. CreatiVasc has partnered with Sealevel Systems to design and manufacture the electronics required to operate and monitor the system. Sealevel Systems designs and manufactures industrial computing solutions in addition to a variety of communications and I/O products used in medical, process control, military, and other mission critical applications. The Hemoaccess Valve System was originally developed by Greenville vascular surgeon David Cull, CreatiVasc founder. The system is one of three nationwide selected for the U.S. Food and Drug Administration’s (FDA) Innovation Pathway, designed to accelerate the process of approval without compromising patient safety.

Greenvity and Mitsumi Collaborate on IoT Solution for Low-Voltage Lighting

Greenvity Communications and Mitsumi Electric have expanded their collaborative relationship to include development and manufacturing of a complete Internet of Things (IoT) turnkey solution with cloud and mobile apps for low voltage LED lights in outdoor applications. LED lighting OEMs and ODMs can transform conventional low-voltage outdoor lights into energy-efficient smart lights with controllability for turning on/off, dimming and color temperature mixing to enhance the aesthetic of the environment in all weather conditions. This smart LED solution is feature-rich, with the capability to control lights from anywhere in the world using mobile apps (Android and iOS) along with zone grouping and scheduling. The intelligence and communication of the IoT turnkey solution is powered by Greenvity's GV7013 system-on-chip (SoC) with powerline communications (PLC) and an embedded 8051 microcontroller for lighting control. The GV-LED-Mini-DP kit is now available from Greenvity, and customers can use Greenvity's cloud and mobile apps without a license fee. The IoT turnkey solution for low voltage LED lights consists of Greenvity's IoT Hub board (GV-Controller-S), a Smart LED Controller module for low voltage LED (GV-LED-Mini), the IoT cloud and mobile apps (Android and Apple's iOS). The GV-LED-Mini is a small form-factor module that can be integrated into various types of low voltage LED lights, from decorative lights and parking lot lights to landscape and deck lights. The solution can support from 12 volts up to 30 volts for low voltage LED lighting. The IoT Hub is equipped with an ARM9 microprocessor and Wi-Fi, with application software that enables cloud and mobile app connection.

RTC Magazine FEBRUARY 2015 | 7



Cooperation between German and Chinese Test Specialists

Asia remains the main focus of Goepel electronic's global expansion of innovative JTAG/Boundary Scan test solutions. With the high-tech enterprise Watertek Information Technology Co. Ltd, Goepel electronic welcomes another Asian member to its GATE partner network. The aim of this cooperation is application-specific development and practical implementation of the latest JTAG/Boundary Scan solutions. Watertek Information Technology Co. Ltd. was founded in 1997 and is headquartered in Beijing, with offices in different locations across China. The company specializes in

PikeOS Powers Communication Platform from X-ES

Sysgo has announced support of its precertified hypervisor PikeOS for the XPedite6101 Single Board Computer (SBC) from Extreme Engineering Solutions (X-ES). The XPedite6101 is based on a Freescale QorIQ architecture with a T1042 processor, designed for high-performance networks. PikeOS adds key features such as real-time operation, safety and security to a cost-effective and energy-efficient platform for critical networks and telecommunication applications in the Internet of Things (IoT). The XPedite6101 provides a compact and cost-effective rugged computing solution with excellent processing performance-per-watt. It supports multiple processor configurations and up to 8 GB of DDR3 ECC SDRAM, and also supports a number of high-performance I/O options. By supporting DPAA (Data Path Acceleration Architecture), PikeOS enables the developer to reach network performance goals while distributing the network-processing workload to different cores of the QorIQ T1042 architecture, which has four e5500 cores running at up to 1.4 GHz. PikeOS is a hypervisor intended for embedded systems with safety and security requirements. With real-time operation, virtualization and partitioning, it provides all the features needed to build today's multi-functional and highly integrated devices. The PikeOS architecture creates a foundation for critical systems, allowing official approval by the authorities regarding safety and security standards. PikeOS is the only European software platform for smart devices in the Internet of Things.

8 | RTC Magazine FEBRUARY 2015

embedded system and testing services for various sectors such as transport, energy and military & defense. The aim of the global alliance program, GATE, is the application-specific transfer of Goepel electronic's JTAG/Boundary Scan product portfolio into a variety of specific custom test instruments through close cooperation. In this way, Embedded System Access (ESA) technologies can be offered to users even more easily. The GATE program is available in three different levels, ranging from Associated Member through Selected Member to the highest level, Center of Expertise (CoE), in which a partner such as Watertek Information Technology cooperates with Goepel electronic as a strategic development partner.

Dedicated Computing Joins the Intel Cluster Ready Program for Bioinformatics Research Dedicated Computing, a global technology company, announced its participation in the Intel Cluster Ready program to deliver integrated high-performance computing (HPC) cluster solutions to the Life Sciences market. Powered by Intel Xeon processors, Dedicated Computing is offering a family of purpose-built analytics and storage clusters for use in bioinformatics research. Dedicated Computing’s HPC cluster, built upon Intel Cluster Ready architecture, will provide a fully integrated and optimized cluster. From entry level, compact clusters, featuring four multi-core homogeneous nodes, to those with hundreds of heterogeneous computing nodes, and combined with 20 Terabytes to multiple Petabytes of object store, these scalable solutions will be able to support the massive amounts of data that needs to be analyzed, interpreted and stored. These integrated cluster solutions contain securely maintained open source software stacks and cluster management that are designed to accelerate time-to-market, minimize costly validation and verification and enable 24x7x365 system monitoring for increased uptime.

Green Hills and NVIDIA Bring Advanced Graphics and Computer Vision Platform to Automotive

Green Hills has announced that it is integrating high-performance automotive-grade NVIDIA Tegra mobile processors and NVIDIA's vast library of 3D graphics and computer vision software, together with its own automotive safety and security products and services, into the Green Hills Platforms for Automotive. This combination addresses the growing demand for superior single-system performance combined with proven safety and security. These are requirements when building next-generation automotive electronic systems for Advanced Driver Assist Systems (ADAS), 3D digital instrument clusters and center console consolidation. By leveraging the Integrity RTOS as the dominant in-vehicle real-time kernel for instrument cluster and ADAS systems, this next-generation automotive platform will enable automotive OEMs and Tier 1s to reduce design/integration complexity and time-to-market while delivering unmatched advancements in the next generations of in-vehicle electronics. The NVIDIA Tegra processor is a mobile superchip that incorporates a high-performance multicore ARM Cortex CPU complex, the industry's most powerful GPU technology, and dedicated audio, video and image processors. The size of a thumbnail, this highly energy-efficient processor enables vibrant 3D graphics, smooth video playback and advanced audio processing, while placing fewer demands on vehicle electrical systems. In addition, Tegra is able to deliver an unprecedented amount of computing power to drive computer vision and deep learning systems.


Got Tough Software Radio Design Challenges?

Unleash The New Virtex-7 Onyx Boards! Pentek's Onyx® Virtex-7 FPGA boards deliver unprecedented levels of performance in wideband communications, SIGINT, radar and beamforming. These high-speed, multichannel modules include:

• A/D sampling rates from 10 MHz to 3.6 GHz
• D/A sampling rates up to 1.25 GHz
• Multi-bandwidth DUCs & DDCs
• Gen3 PCIe with peak speeds to 8 GB/sec
• 4 GB SDRAM for capture & delay
• Intelligent chaining DMA engines
• Multichannel, multiboard synchronization
• ReadyFlow® Board Support Libraries
• GateFlow® FPGA Design Kit & Installed IP
• GateXpress® FPGA-PCIe configuration manager
• OpenVPX, AMC, XMC, PCIe, cPCI, rugged, conduction cooled
• Pre-configured development system for PCIe
• Complete documentation & lifetime support

With more than twice the resources of previous Virtex generations plus advanced power reduction techniques, the Virtex-7 family delivers the industry’s most advanced FPGA technology. Call 201-818-5900 or go to www.pentek.com/go/rtconyx for your FREE online Putting FPGAs to Work in Software Radio Handbook and Onyx product catalog.

Pentek, Inc., One Park Way, Upper Saddle River, NJ • Phone: 201-818-5900 • e-mail: info@pentek.com • www.pentek.com • Worldwide Distribution & Support. Copyright © Pentek, Inc. Pentek, Onyx, ReadyFlow, GateFlow and GateXpress are trademarks of Pentek, Inc. Other trademarks are properties of their respective owners.


EDITORS REPORT EXTENDING THE LIFE OF VME

Extending VME Life to 2020 and Beyond

Something of a shock has hit the world of VMEbus board manufacturers with the End of Life (EOL) notice on the TSI148 bridge chip, with some doom-saying the "End of VME." This news seems to have left suppliers with few options to continue serving customers who are still building and servicing systems with existing board designs. Yet this is a resilient industry and there are already paths forward. Here is one example.

by Tom Williams, Editor-in-Chief

Like most other VME suppliers, Concurrent Technologies has been using the TSI148 VMEbus bridge device on its current designs. In August 2014, the supplier of the TSI148 device, IDT, notified the industry that it would be going End of Life towards the end of 2015. That was a bit of a bombshell; despite close relationships with IDT, manufacturers had no prior notice, and at first it appeared as though they would be left with few options. The reason for the sudden End of Life was that IBM, which makes the device for IDT, is in the process of shutting down the fab and was unable to make a business case to transition the device to an alternate fab. Concurrent Technologies, for one, has reviewed its options and has embarked on a plan to continue VME board supply.

As a result of this end of life announcement, Concurrent Technologies management quickly reviewed the status of its VME product line and also surveyed customers to find out their ongoing requirements. One of the clear messages was that many of the programs that use Concurrent Technologies' Intel-based VMEbus processor boards are unable to transition to an alternative architecture due to the complexity of their application, the amount of hardware and software to be ported, and the economic and logistic viability of swapping out chassis already deployed in the field. In addition, those customers who are looking to move to an alternative architecture such as VPX need a considerable period of overlap. During this time, which might extend to a few years, they need to continue to purchase existing VMEbus boards to ease their transition.

Essentially, the EOL notice from IDT left manufacturers with a limited number of options: They could issue their own EOLs and walk away from the matter. They could replicate the function of the TSI148 in an FPGA, with the associated expense and delay of redesigning the on-board circuitry. They could aggressively buy up as many remaining bridge chips as possible, with no certainty as to how many they might actually need or use. Or they could fall back on the lower-performance but still available Universe-II bridge chip. The path taken will no doubt vary with the circumstances faced by each manufacturer, but the following describes Concurrent

10 | RTC Magazine FEBRUARY 2015

Figure 1 The VP B1x/msd board is based on the Universe-II bridge chip

Technologies' approach, which includes the unique step of building a virtually bus-less VME board. Fundamentally, it was apparent to Concurrent Technologies that issuing End of Life notices for all its VME products and walking away from this still popular architecture was certainly not an option for the company or its customers. Its research delivered some interesting results and established three levels of requirements:
1. Some customers must retain the ability to purchase existing VME boards to satisfy critical program needs.
2. A small but significant group of customers use VME form factor boards in their systems but don't use the VMEbus for communication.
3. Less than 20% of Concurrent Technologies' customers use the higher speed VMEbus transfer protocols like 2eVME and 2eSST. These protocols were extensions to the original VMEbus specification and are supported by the TSI148 bridge.
Customers needed firm reassurance regarding the availability of VME products going forward, and the solutions already implemented by Concurrent Technologies are:


1. "We issued speedy End of Life notices on all our TSI148 based VMEbus boards to clearly communicate the issue and give our customers time to choose a suitable option."

2. "We are offering extended manufacturing contracts for current generation VME boards. In this case we procure sufficient TSI148 devices to meet the program requirements and store them in our warehouse. These devices are then issued to our in-house manufacturing lines to build boards for the specific customer program on the timescales defined in the contract. Having our own on-site storage and manufacturing facilities means we retain control of the devices and have the flexibility to build relatively small batch sizes. In addition we will be procuring a stock of TSI148 devices to allow us to continue to support repair requirements of boards in the field for several years."

3. "For those customers not using the VMEbus interface on our processor boards, we have designed a macro-component that fits on our existing boards replacing the TSI148 device. This connects a few key signals from the VMEbus connector and routes them through to our board logic. This enables these boards to respond in the expected way to a number of VMEbus signals like SYSRESET, the VMEbus System Reset. This solution was relatively easy to implement: the printed circuit boards remain unchanged; the TSI148 was replaced and a few associated VMEbus interface components removed; the board logic firmware had a minor update and we released updated board support packages for common OSs such as Windows and Linux. The biggest challenge we faced was updating our internal test suite to use Ethernet as the communication interface instead of VMEbus to allow us to continue to test multiple boards in a single VME chassis. We have very quickly released versions of our popular VP 91x/x1x and VP 717/08x boards with this 'bus-less' solution and can repeat this for other products based on customer demand. This option is almost risk free for customers able to avoid VMEbus transfers and removes any current component obsolescence."

4. "To provide assurance that we will continue to support the VMEbus market, Concurrent Technologies has introduced two new VME boards based on the latest Intel Core i7/i5 and Intel Atom processors (Figures 1 & 2). These boards use the Universe-II VMEbus bridge device, which IDT claims will be supported 'indefinitely' as it isn't tied to a specific manufacturing fab. Our expectation is that these two boards will be subject to our standard life cycle: a minimum of 5 years general availability; up to an additional 5 years of extended manufacturing for existing customers; followed by another 5 years of service and repair support. There are some differences between the TSI148 and Universe-II bridges; however we had previously designed boards with the Universe-II and so have all the hardware and firmware expertise to design new boards with this earlier generation device. To ensure the lowest impact at the application level, Concurrent Technologies provides an Enhanced VME Application Programming Interface (API). This API provides a consistent approach to the VME interface on all Concurrent Technologies' VME/VXS boards and already supported both the TSI148 and Universe-II devices. Previously our VME API had

Figure 2 The new VP E2x/msd board provides VMEbus compatibility using the Universe-II bridge chip.

helped customers migrate from Universe-II bridge based boards to TSI148 based boards, and now it minimizes the changes needed for the reverse migration from TSI148 back to Universe-II. In addition, Concurrent Technologies has produced a Technical Information Note detailing the minor technical differences as a reference for our customers in this transition."

A sudden EOL on such a vital single-source part can be traumatic. Concurrent Technologies' reaction shows that there are ways to overcome such events, but it takes a quick response. These pages will look forward to hearing the reactions of other players. While Concurrent Technologies' example may not be the one taken by everyone, it does reflect quick action to acknowledge this specific end of life issue and come up with options to enable customer choice. The action includes the launch of two brand new VMEbus boards and two new bus-less variants in plenty of time before the end of life deadline to minimize any changes to customer application software through the Enhanced VME API and board support packages. For the small percentage of customers using the higher speed VME transfer protocols such as 2eSST, the company is also offering an extended manufacturing option and has additionally completed an evaluation of an alternate FPGA-based bridge with this functionality. An FPGA-based solution has the advantage of providing independence from specific silicon obsolescence, but requires more fundamental hardware and firmware design changes that would have added risk and delayed provision of a choice of solutions in the same timeframe. Doubtless there will be additional discussions and ideas based on customers' needs to assess what VMEbus solutions they require, with the expectation of maintaining a strong portfolio of VME products along with an easy migration path to alternate architectures.

Concurrent Technologies, Woburn, MA. (781) 933-5900. www.gocct.com
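The Enhanced VME API described above is the piece that keeps application code unchanged regardless of which bridge silicon sits on the board. Purely as an illustration of that abstraction-layer idea (the names below are hypothetical and are not Concurrent Technologies' actual API), such a layer might be organized along these lines:

```c
/* Illustrative sketch only: a bridge-abstraction layer of the kind the
   article describes. All names are hypothetical and do not describe
   Concurrent Technologies' actual Enhanced VME API. */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int (*read)(uint32_t vme_addr, void *dst, size_t len);
    int (*write)(uint32_t vme_addr, const void *src, size_t len);
} vme_bridge_ops;

/* Stub back-ends standing in for real TSI148 / Universe-II drivers. */
static int stub_read(uint32_t a, void *d, size_t n)        { (void)a; memset(d, 0, n); return 0; }
static int stub_write(uint32_t a, const void *s, size_t n) { (void)a; (void)s; (void)n; return 0; }

static const vme_bridge_ops tsi148_ops    = { stub_read, stub_write };
static const vme_bridge_ops universe2_ops = { stub_read, stub_write };

static const vme_bridge_ops *bridge;

/* The application binds to whichever bridge the board carries, then calls
   only the generic entry points below; nothing else changes on migration. */
void vme_init(int board_has_tsi148)
{
    bridge = board_has_tsi148 ? &tsi148_ops : &universe2_ops;
}

int vme_read(uint32_t vme_addr, void *dst, size_t len)
{
    return bridge->read(vme_addr, dst, len);
}
```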

RTC Magazine FEBRUARY 2015 | 11


TECHNOLOGY CORE THE MICROCONTROLLER: TAKING ON BIGGER JOBS

Easing Development for the Next Generation of Connected Embedded Intelligence

ARM's new high-performance Cortex-M7 processor brings advanced opportunities to engineers looking to bring more connectivity and performance to their embedded applications. By understanding the advantages relative to the Cortex-M4 and the adaptations required, developers can realize substantial benefits and reduce time-to-market.

by Joseph Yiu, ARM

CORTEX-M7

12 | RTC Magazine FEBRUARY 2015


The ARM Cortex-M processor family is a range of scalable, compatible, energy-efficient and easy-to-use processors designed to help developers meet the needs of tomorrow's smart and connected embedded applications. The Cortex-M4, unveiled in 2010, built on the Cortex-M3 foundation with a set of instruction set extensions explicitly tailored for digital signal processing, along with an optional single-precision floating-point unit, delivering 1.25 DMIPS/MHz. Since its launch, over 10 semiconductor vendors have introduced Cortex-M4 based general-purpose MCU products along with a wide range of sensor hub products based on the Cortex-M4.

In the past few years, the capability and processing needs of connected embedded systems have become more demanding, with even the simplest of systems expected to have a graphical user interface or HMI, audio recognition or other natural ways of interaction, and multiple connectivity options. Processors need to become more capable and offer more local processing capability. Microcontrollers in growing automotive and industrial automation applications need to support higher processing loads, requiring a CPU performance uplift. Industrial plants require an increasing amount of precision and operate on large amounts of data in a short space of time. These future system demands include delivering more features at a lower cost, increasing connectivity, better code reuse and improved energy efficiency. It is with this future in mind that ARM, along with its partners, designed the ARM Cortex-M7 processor, the most recent and highest performance member of the Cortex-M family.

A Closer Look at the Cortex-M7

Doubling the performance of the Cortex-M4 and delivering 5 CoreMark/MHz, the Cortex-M7 is designed to address more demanding applications and remove the barriers that previously faced Cortex-M CPU-based solutions. The Cortex-M7 is designed for a wide range of embedded applications including microcontrollers, automotive controllers, industrial control systems and wireless communication controllers (e.g. Wi-Fi). For those who are familiar with the wide range of Cortex-M family CPUs available for embedded applications, the Cortex-M7 is based on the ARMv7-M architecture and brings architectural compatibility all the way from the Cortex-M0 (Figure 1). The Cortex-M7 sports a six-stage superscalar pipeline and provides integer, floating point and DSP performance along with tightly coupled memories, caches and options to enable larger memory systems while maintaining deterministic behavior. The pipeline, more advanced than that of the Cortex-M4, enables greater performance, allowing the Cortex-M7 to execute up to two instructions per clock cycle. A large focus of the development of the Cortex-M7 was on improving instructions-per-clock (IPC) efficiency relative to earlier Cortex-M family processors. The Cortex-M7 is the first Cortex-M profile processor to offer the option of both instruction and data caches of up to 64KB each. The cache

Figure 1 ARM Cortex-M7 Processor features

enables efficient operation with a larger memory system (which is typically slower than the processor). In addition, tightly coupled memory interfaces are integrated, with support for a custom Error Correction Code (ECC) implementation on each of them, so that fast memory access is available for time-critical interrupt handling and real-time application tasks. This integration allows engineers to execute a large proportion of code from the internal cache, reducing the number of reads and writes to external memory and leading to power savings. The Cortex-M7 also offers application engineers the option of ECC support for each of the cache memories, further enhancing the reliability of the system: if a memory location is corrupted by a single-bit error, the data can be corrected and restored. In addition to ECC, the memory system can also be enhanced through the optional Memory Protection Unit (MPU), with 8 or 16 regions, for better system reliability. The memory system has also been advanced to support the increased CPU capabilities, with a 64-bit AXI bus interface offering greater bandwidth than the 32-bit AHB and allowing multiple outstanding transfers for maximum bus system performance. For easy integration with legacy peripherals used in previous Cortex-M designs, there is an optional low-latency AHB peripheral bus interface. To allow flexible interrupt management and low interrupt latency, the integrated Nested Vectored Interrupt Controller (NVIC), with 1 to 240 interrupts and 3- to 8-bit programmable priority level registers, is closely coupled with the processor. There is also support for ETM, designed for use with CoreSight, ARM's extensible, system-wide debug and trace architecture.
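The NVIC configuration described above is exposed to application code through the standard CMSIS-CORE functions. A minimal sketch, in which the device header name and the interrupt number are assumptions for illustration only:

```c
#include "ARMCM7.h"                 /* CMSIS device header; the actual file name is vendor-specific */

#define UART0_IRQn ((IRQn_Type)7)   /* hypothetical interrupt number, for illustration only */

static void configure_uart_interrupt(void)
{
    /* The priority value is truncated to however many of the 3 to 8
       priority bits the silicon vendor actually implemented. */
    NVIC_SetPriority(UART0_IRQn, 5u);
    NVIC_ClearPendingIRQ(UART0_IRQn);
    NVIC_EnableIRQ(UART0_IRQn);
}
```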

RTC Magazine FEBRUARY 2015 | 13


TECHNOLOGY CORE THE MICROCONTROLLER: TAKING ON BIGGER JOBS • Update CMSIS-DSP library to the Cortex-M7 specific version. The Cortex-M7 specific version is optimized for the pipeline behavior of the Cortex-M7 processor and therefore can offer higher performance. • New APIs are included in the CMSIS-CORE headers for cache configuration. If the Cortex-M7 device being used executes program from a slow memory (e.g. flash memory) via the AXI interface, the caches should be enabled for better performance. Figure 2 2X Performance improvement over the Cortex-M4

Cortex-M7 further expands the family’s floating-point facilities to include a double-precision option; the simultaneous issue of integer and floating point instructions is also now supported if the FPU is present. Given the range of applications that the Cortex-M7 based MCUs may enable in the future, it is fully supported with powerful debug features, with optional full instruction and data trace. These features make the processor an attractive solution for applications requiring a performance upgrade on devices already using the Cortex-M4 processor.

Migrating Designs to Cortex-M7

Given most embedded engineers and developers are familiar with Cortex-M4, let’s look at some of the software development benefits Cortex-M7 brings. From a developer’s perspective, the Cortex-M7 supports all the instructions available on the Cortex-M4 processor, and uses the same exception model for interrupt handling. In most cases, program code written for Cortex-M4 processor should run on the Cortex-M7 processors without any problem. However, there are a few cases where changes may be needed, and software developers must understand these to reduce the time required when migrating applications from Cortex-M4 to the Cortex-M7 processors. In order to get the best performance out of the Cortex-M7 processor, a number of C compilers and their runtime libraries have been optimized and updated (Figure 2). In addition, a number of changes in the debug system for the Cortex-M7 processor compared to Cortex-M4 mean that software developers must update their tool chains to newer versions in order to debug applications on Cortex-M7 based microcontroller products. In a few cases the firmware on the debug adaptor might also need an update. As a result, updating to the latest development tool chain is strongly recommended. Typically the following changes should be done when migrating software from the Cortex-M4 to the Cortex-M7 processor: • Update the CMSIS- CORE header to use Cortex-M7 header files. The CMSIS-CORE header files for the Cortex-M7 processor is available from CMSIS version 4.2. The most updated CMSIS package is available from www.arm.com/cmsis

14 | RTC Magazine FEBRUARY 2015

In addition, all code should be recompiled in other to allow the compiler to optimize the instruction sequencing better for the Cortex-M7 processor pipeline. In some cases, additional cache maintenance operation might be needed during runtime. For example, if a cacheable memory location is shared between the processor and a separate bus master such as a DMA controller: a. If the memory location updated by the Cortex-M7 processor has to be accessed by another bus master, a cache clean is needed to ensure the other bus master can see the new data. b. If the memory location has been updated by a different bus master, the Cortex-M7 processor has to do a cache invalidate so that next time it reads the memory location, it will fetch the information from the main memory system. The Cortex-M7 processor supports several floating point support options, which allow for no FPU, single precision FPU and for single and double precision FPU. If the application can benefit from the double precision floating point unit support, the application should be updated and recompiled to make use of the double precision FPU. Even if the application uses only single precision floating point operations, recompiling the code for the Cortex-M7 processor can also be beneficial because the FPU in the Cortex-M7 is based on FPv5, whereas the FPU in the Cortex-M4 processor is FPv4. The FPv5 has additional floating point processing instructions, which might help speed up the floating point data processing in the target application.

Program Code Changes

There are a number of potential areas in the program code that might need changes. Due to the higher performance of the processor, some program code might need adjusting due to the faster execution. This is most common for applications that use hard coded timing delay loops. System memory maps often change when migrating from one microcontroller device to another. Also, in the Cortex-M7 processor the initial vector table does not necessary start at address 0x00000000. If application code assumes initial vector table as address 0, users might need to update the code so that it determines the initial vector table location from the Vector Table Offset Register. Due to the multiple bus interfaces and more capable write buffers


in the Cortex-M7 processor, users might find it necessary to insert additional memory barrier instructions in the program code. The guideline for memory barrier usage is documented in ARM application note AN321 – ARM Cortex-M Programming Guide to Memory Barrier Instructions. In the Cortex-M4 processor, due to the simple nature of the pipeline, omitting the memory barriers does not usually cause any issue. In the Cortex-M7 processor the memory barrier requirements are stricter.
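A minimal sketch of both points, using the standard CMSIS-CORE register definitions and barrier intrinsics (the device header name is an assumption):

```c
#include "ARMCM7.h"   /* CMSIS device header; the actual file name is vendor-specific */

/* Locate the vector table through the Vector Table Offset Register instead
   of assuming it sits at address 0x00000000. */
static inline uint32_t *vector_table_base(void)
{
    return (uint32_t *)SCB->VTOR;
}

/* Example of explicit barriers after writing a memory-mapped control
   register, reflecting the stricter Cortex-M7 ordering requirements. */
static inline void write_control_register(volatile uint32_t *reg, uint32_t value)
{
    *reg = value;
    __DSB();   /* ensure the store has completed before continuing */
    __ISB();   /* ensure subsequent instructions see its side effects */
}
```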

Getting Started

Not only does the Cortex-M7 inherit the characteristics of the Cortex-M processor series, such as energy efficiency, high performance, ease of use and smaller code, but it is also designed with exceptional memory and connectivity options for design flexibility, making it especially suited for the automotive, IoT and industrial connectivity markets. Announcements of Cortex-M7 based MCUs have followed soon after the launch of the processor itself, including the following:
• STM32 F7 series from STMicroelectronics, announced at ARM TechCon in October 2014
• SAM E70 and SAM S70 series from Atmel, targeted at connectivity and general purpose industrial applications, announced at CES 2015

• Automotive-qualified Atmel SAM V70 and V71 series, which take advantage of the Cortex-M7 DSP extensions and target infotainment connectivity and audio applications, also announced at CES 2015
• Freescale has also publicly announced its plans to utilize the high performance of the Cortex-M7 for power conversion, motor control and industrial automation
Given the many architectural similarities between the ARM Cortex-M4 and Cortex-M7 processors, and given that the majority of application code is directly ready for migration, software developers can get started now to ensure their applications are suited for the next generation of embedded connected intelligence. Migration requires some adaptation and changes to be made by the user. Developers can follow the migration process in further detail in the whitepaper titled "Migrating Applications from an ARM Cortex-M4 Processor to a Cortex-M7 Processor - A Software Developer's Guide" on the ARM Connected Community and access in-depth technical discussions there.

ARM, San Jose, CA. (408) 576-1500. www.arm.com

RTC Magazine FEBRUARY 2015 | 15


TECHNOLOGY CONNECTED MANAGING THE INTERNET OF THINGS

Building Blocks for the Internet of Things

The concept of the Internet of Things (IoT) has become widespread in the past couple of years, making it a lucrative playground for engineers, companies and investors. What's special about the IoT?

by Amir Friedman, ConnectOne

Computers have been connected to the Internet for some time now, phones got connected a few years ago, and now it's time for all the rest. IoT is about connecting all the rest—everything from your dog to your washing machine. There are currently about 2 billion PCs, portable PCs, tablets and smartphones out there. The IoT market is estimated to be 50 billion devices in 5-10 years, depending on which market report you pick up this week. But the sheer numbers are not the only thing special about IoT. IoT is about making our lives better, and this is a message not just to the business sector, but to everyday people like you and me. IoT will help us learn about ourselves and our surroundings, help us save on scarce energy resources, make us healthier and live longer, and many more things. It will truly transform the way we live. No one company or group of companies can develop all the applications around IoT. We need to get as many people as possible involved in innovating and developing new ideas. So, how do we get people and companies from all around the globe to join in and develop the new and exciting applications that will make everyone's lives better? We give them the basic building blocks required to develop an IoT application, and then sit back and let them do their thing. To develop an IoT application, you need several key building blocks. Figure 1 shows the different components involved in an IoT application.
1. The connected device – The actual physical device we want to control and manage. It needs to be connected somehow, either wired or wireless.
2. The local user – This is the user who wants to interact directly with the device to either control it, or receive information regarding its operation.
3. The router – This is the part that connects the device to the Internet. The connection can be via ADSL, cable, cellular, etc.

16 | RTC Magazine FEBRUARY 2015

Figure 1 Components of an IoT Application.

In some cases, there is no router where we want to place our device, or a standard router is not sufficient for the application, so you may need to provide a router of your own.
4. The Cloud solution – A Cloud solution can be simple storage of data flowing from your connected device, or it can include complex analytic functions that are performed on the data coming from the device and reported to the local or remote user.
5. The remote user – The user who is not in the proximity of the device, but wants to control or receive information regarding the device from afar.
First, you can see that an IoT application involves hardware, software and connectivity components. Security is always a concern with IoT applications, with the level of security required depending on the application itself. Because IoT involves


several components, it’s difficult to know where to begin, and since the IoT market is still in its infancy, standardization between the components has not been achieved yet. But, since we do want to develop an IoT application, let’s dive in, and usually, the best place to begin is at the connected device.

The Connected Device

Whatever device we want to connect, it will require two main functions: application and connectivity. The application will be developed on an embedded CPU, and a connectivity chip or module will be used to connect to a local user or send data to the Cloud solution. The embedded CPU and its memory resources and software will be chosen based on the device requirements, just as for any embedded system. But an IoT application will require connectivity capability, so connectivity software stacks will be required. There are a couple of generic, open CPU platforms that enable simple development and programming for IoT: Raspberry Pi and Arduino. Each platform has add-ons that enable connection to different sensors and connectivity solutions. The connectivity type depends again on what the application wants to do. If all we want is a short-distance connection to a local user's smartphone in the vicinity of the device, we can use a Bluetooth chip or module. Bluetooth has a low energy (LE) mode that is very power conscious for devices working on batteries. Devices using BLE with a limited amount of data being transmitted can operate for several years without changing their batteries. To send data to the Cloud, Bluetooth devices can use the local user's smartphone as their hub to the Internet, or a special hub can be provided that routes the Bluetooth data through Ethernet/Wi-Fi/cellular to the Internet. Wi-Fi, a more power-hungry solution but still relatively low power, will be a better choice for devices that are connected to external power or can be charged periodically. Wi-Fi, in contrast to Bluetooth, can connect to

Figure 3 Connected Power Socket

the Internet and the Cloud directly via an existing Wi-Fi router without a special hub required. If Ethernet (LAN) is available where the device is located and the device is stationary, a wired connection may be a good choice – it is usually the lowest cost and simplest connectivity method for the device.

Real World Application Example

One example would be a connected lamp that has Wi-Fi embedded into it so that it can be turned on/off and dimmed via a smartphone app or remotely via a Cloud solution. Here are the elements for the solution shown in Figure 2.
Local user: A local user, usually equipped with a smartphone or tablet, wants to interact with the device or receive information relating to the device. So, an app needs to be developed for the smartphone or tablet that will receive data from the device, interact with it and, if needed, send the data to the Cloud.
Router/hub: If the connected device is using Ethernet or Wi-Fi, connection to a standard router that already exists in the area of the device is easily and economically achieved. If no such router exists, you may need to supply a router, for example to route Wi-Fi data from the device or from several devices to the Internet.
Cloud solution: The solution in the Cloud can have several levels of complexity:
1. Connectivity – the solution can enable a remote user to connect to the device remotely (through the Cloud solution).

Figure 2 Components for Connected Power Socket Application

2. Management – management capability can be provided by the Cloud solution whereby the operation and control of the device can be performed through the Cloud solution.

RTC Magazine FEBRUARY 2015 | 17



Figure 4 Connect One’s Wi-Fi Module

3. Analytics – analysis of the data coming from the device or other sources can be used to send information to a remote/ local user regarding the device operation, or this analysis can be translated to commands sent to the device to influence its operation. For example, the connected lamp can be turned on from the Cloud when analysis shows that it is dark in the lamp’s location and a remote lock shows that someone has entered the house. Remote user: Similar to a local user, a remote user equipped with a smartphone/tablet/pc wants to interact with the connected device or receive information relating to it. Where a local user may interact directly with the device, a remote user will need to do so via the Cloud solution. A relevant app or application needs to be developed that will connect to the Cloud solution, and thereby have access to the device or information relating to it.

Example System

Figure 2 shows the setup for such an example application that is built on a simple connected product with a specific industry solution. This will be based on a connected power socket for our laundry room, so we can simply manage the room light, and any other appliance connected to the power socket, remotely over the Internet. There are several companies providing end-to-end solutions for IoT applications. One of those companies providing such building blocks is Connect One. We will use Connect One’s solution to show how to build such a connected power socket. Figure 3 shows the block diagram for the connected power socket. The unique feature about this Wi-Fi module (Figure 4) used in this connected outlet is that you don’t have to do any programming at all at the device level and you don’t have to add an application CPU to your design. The module connects to Connect One’s Cloud solution automatically and becomes accessible

18 | RTC Magazine FEBRUARY 2015

for the purpose of control and management. This module basically lets you control its I/O ports via the Cloud interface. The module has an embedded web server that enables you to configure the I/O ports and name each port according to its function in a simple and intuitive manner. See Figure 5 for how each power socket port is named. So, if you place the connected power socket in your laundry room and connect your washing machine, dryer and room light to the power socket, you can monitor and control the electricity in your laundry room. You can also change the names if you decide to plug some other devices into the power sockets when you move it to another location.

Along with this simple configuration ability, a smartphone app is also provided. It shows the current state of the power socket, and when each status button is pressed, it changes the state from ON to OFF or vice versa (Figure 6). You can use this app during the testing phase of your product, and then make changes to the app to fit your particular look and feel. Since the data from the Wi-Fi module regarding the state of the power sockets goes to Connect One's Cloud solution, and the app accesses the connected power socket through the Cloud solution, the app can be used by both a local user in the proximity of the connected power socket and a remote user in some other location.

IoT presents a huge opportunity not just for the business sector, but for regular people to improve their lives and their surroundings. We know what it takes to develop IoT applications and much of the technology already exists. We need to do more in the realm of standardization so that different vendor solutions can interact with one another in a seamless and efficient way. Until standards solidify, most product developers will try to find a single vendor with a complete solution, so they can be sure all the components in the IoT application seamlessly "talk" to each other.

Figure 5 Embedded Web Server Configuration


Figure 6 Smartphone App for Managing Connected Power Socket

Knowing all the components required, we can break down each component and provide simple-to-use building blocks for people and companies that want to develop IoT applications. The better job we do in providing simple-to-use building blocks, the faster we will reach our goal of getting everything connected. ConnectOne San Jose, CA (408) 572-5675 www.connectone.com

WE ASSURE YOU HIT A BULLSEYE EVERY TIME. Electronically and NOW in print. The sourcebook is a large industry compendium drawing off our inventory on the Intelligent Systems Source (ISS) website. The book will feature over 90 OEM manufacturers, manufacturers' representatives and approximately 20 software vendors, listing over 1200 industry products from SBCs, systems, I/O boards, displays, power supplies and ARM products; it will be the most thorough directory available in the industry.

Request your copy today.

intelligentsystemssource.com

RTC Magazine FEBRUARY 2015 | 19


TECHNOLOGY CONNECTED MANAGING THE INTERNET OF THINGS

Beyond the Secure RTOS – Protecting Wirelessly Connected Endpoints from Cyber-attacks

About seventy percent of cyber-attacks target the application layer. Wireless connectivity creates an attack vector that hackers can exploit. While a secure RTOS is critical for security in embedded devices, it is just the foundation, not the complete solution.

by Alan Grau, Icon Labs



The Internet of Things continues to proliferate, with billions of new devices being connected to the Internet. The IoT is reaching into smaller and smaller devices including smart lighting systems, wearable devices, thermostats, health monitoring devices and sensors of all types. The majority of these smaller units connect either directly or indirectly to the Internet using a variety of wireless protocols including Wi-Fi, 6LoWPAN, ZigBee, Bluetooth, WiMax, cellular, Z-Wave, ANT+, etc. Many are tiny and run very low-cost hardware to meet the business demands of these systems (low cost, small size and low power usage). As a result, they are very resource-constrained and are not able to run a traditional operating system such as Linux, but instead run a specialized embedded operating system or RTOS (real-time operating system). Unsurprisingly, given the rapid deployment of new devices, vulnerabilities have been reported in many different types of devices. Internet-connected light bulbs using 6LoWPAN mesh networks have been hacked, smart meters have been compromised via an optical debug port, wireless smart home devices have been compromised, Wi-Fi controlled SCADA devices have been hacked, pacemakers have been proven to be vulnerable, and a tram control system was hijacked in Poland by a teenager using a modified TV remote control. This last attack resulted in the derailment of four vehicles and injured twelve people.

Security of the RTOS

Advanced security capabilities are a major selling point for many RTOS vendors. Modern RTOSes provide capabilities such as a MILS (Multiple Independent Levels of Security) architecture, built-in resource provisioning, and support for security certifications such as IEC 62304, IEC 61508, IEC 50128, DO-178B/C, EAL and ARINC 653. In addition, many provide security services such as authentication, access controls, data encryption and security protocols. Without question, embedded design engineers now have a much richer set of security tools and a stronger security foundation available in the RTOS than was available a decade ago. As impressive as all of this is, it is still just a foundation. A device running an insecure OS and communicating over an encrypted data channel is clearly insecure. The converse is not necessarily true. Securing the OS and adding security protocols is only the first step to building a secure device. Even with these pieces in place, there are still important security challenges to be considered. With security implemented only at the RTOS level, a successful attack against a protocol or application may be prevented from giving the attacker full control of the system, but the attacker may still be able to inflict considerable damage. Data or communication that is valid and passes through the RTOS security may still present security threats to the application layer. Many of the attacks listed above focused on the application layer and resulted in a security failure despite the use of a secure OS. The OS security was never breached and yet the device was compromised. And with as many as 70% of all cyber-attacks targeting the application layer, it is clear that security must extend to the application layer.

Figure 1 Rules-based filtering controls the packets processed by the embedded applications, providing the foundation for application security and intrusion detection capability.

Application layer attacks

In 2013, security researcher Craig Heffner discovered a backdoor within the firmware found in a number of wireless D-Link routers. The HTTP server in these routers includes a backdoor that bypasses the standard authentication process. The web server examines the browser user agent, and if it matches “xmlset_roodkcableoj28840ybtide”, authentication checks are skipped. The string, read backwards (“edited by 04882 joel backdoor”), shows this is an intentionally planted backdoor. The backdoor provides access to the device’s configuration capabilities. The web server used in this same D-Link router also contains a number of known vulnerabilities (http://www.cvedetails.com/vulnerability-list/vendor_id-521/product_id-899/Acme-Labs-Thttpd.html), some of which can be used in certain circumstances to allow for remote code execution. In Australia, Vitek Boden waged a three-month war against the SCADA (Supervisory Control and Data Acquisition) system of Maroochy Water Services beginning in January 2000, which resulted in millions of gallons of sewage spilling into waterways, hotel grounds and canals around the Sunshine Coast suburb. It is an interesting case study because not only did the perpetrator cause pumps to not run when they should have been running, he also was able to prevent alarms from being reported, further complicating the problem. This example also shows the danger of insider attacks, as Boden was a former contractor of Maroochy Water Services. Boden was eventually arrested and sentenced to two years in prison after he was found with computer and radio equipment used for communicating with the SCADA control systems. Other widely reported attacks against application layer services include attacks on web-enabled Wi-Fi and wired IP cameras and nanny cams, which have notoriously weak security.



A quick Google search will reveal multiple reports of attacks against web-based security cameras, nanny cams and IP cameras, and there is even a website dedicated to streaming video from IP cameras with weak security (www.insecam.com). These vulnerabilities allow unauthorized users to view the video streaming from the camera, allowing them to spy on whatever the camera is set to watch. Even worse, in some cases they can instruct the Camera On light not to activate, so the victim does not know that they are being spied upon.

Application layer security

What all of these attacks have in common is that they did not target vulnerabilities in the underlying operating system, but rather relied on vulnerabilities at the application layer. Another thing that they all have in common is that they exploit standard interfaces of the system. In each of these cases, the application layer allowed legal commands to be executed by unauthorized parties. Wirelessly connected devices are particularly vulnerable to attack, as anyone within range of the protocol who has a laptop, the appropriate wireless interface device and applicable software can launch an attack. Protecting the application layer of embedded devices from cyber-attacks requires a set of capabilities to ensure that the application only processes commands from authorized users, that all processed commands are valid (i.e., contain legal data) and that all commands are appropriate (for example, that they do not change the ratios of ingredients or the processing temperature in a chemical processing plant in unsafe ways). Additional capabilities that provide a higher level of security for the device are the ability to detect and report suspicious commands or activity, a command historian to allow auditing when a problem does occur, and protection of the device’s data.

Application Security for an Industrial Control System

Industrial control systems are in many ways typical of modern embedded and IoT devices. They are frequently built using a secure RTOS, provide a customer application that performs a critical function, and can be controlled via messages received over an Ethernet or Wi-Fi network. For our purposes, consider the example of an industrial control system utilized in the production processes of a chemical manufacturing plant. These systems frequently utilize Ethernet-based control protocols such as EthernetIP or Modbus TCP for configuration, control and reporting. While these systems traditionally used closed Ethernet networks, many are starting to use Ethernet networks connected to the corporate network and Wi-Fi networks. The control protocols are used to specify the operation of a wide array of parameters involved in the chemical processing. These can include the temperature at which the processing is performed, the ratio of the ingredients, the timing of the various processing stages, flow rates, etc.


[Figure 2 block diagram: an embedded device hosting application tasks A through E on an RTOS (VxWorks, INTEGRITY, Embedded Linux, RTXC, etc.) over an IP/Ethernet/Wi-Fi stack. Floodgate Defender filters HTTP, Modbus and other TCP/UDP traffic; Floodgate Aware, Floodgate Authorize, Floodgate Agent and Floodgate Anti-tamper provide event, audit, anti-tamper and protection functions backed by an audit, event, policy and signature database, using the TPM/TEE, file system, hardware secure ID and hardware crypto where present. The diagram legend distinguishes device, Floodgate and user components.]
Figure 2 A security framework for embedded applications should protect the device from invalid and unauthorized commands, protect data and detect and report unusual traffic.

In addition to the control protocol (Modbus TCP, EthernetIP, etc.), the device may also include a web interface for viewing configuration and processing information, and an FTP interface for downloading new firmware files. While most cyber-attacks against the application will attempt to exploit weaknesses in the application interfaces, they may also attack the application implementation, or the interactions between interfaces/applications supported by the device (Table 1). Protection against application interface and application implementation attacks is provided by application protocol filtering. If the device includes an embedded firewall, it may be possible to extend the firewall to perform protocol filtering. Otherwise, application guarding APIs can be implemented to perform protocol filtering for the device (Figure 1). Application-specific protocol filtering should provide protocol validation to ensure that all messages conform to the protocol specification and verify that all data is valid and in range. It should also support mutual authentication. Authentication is also critical for wireless networks to prevent attacks by unauthorized devices. For low-bandwidth networks such as 6LoWPAN, support for application-specific authentication methods such as the J-PAKE protocol is critical. It should also implement policy enforcement that will support user-defined policies to restrict data range values to device- or installation-specific ranges. For example, the protocol may allow a range of values of 0..100, but the operation of the device may only allow values in the range of 40..60. The protocol filter should support this more constrained set of values. Protocol filtering should also include access control. Industrial protocols such as Modbus TCP do not provide any mechanism for access control; any legal Modbus command received by the device is processed. Access control policies can be implemented in an application filter to control what devices are allowed to send commands to a device.
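The sketch below illustrates, in plain C, the layering of checks just described for a Modbus-style write: access control, protocol validation and installation-specific policy enforcement (using the 40..60 range from the example above). The register number, limits and whitelist entries are illustrative placeholders, not any vendor's actual API.

/* Minimal sketch of application-layer filtering for a Modbus-style write.
 * Names and limits are illustrative, not a specific product API. */
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>
#include <string.h>

#define SETPOINT_REGISTER  0x0010
#define PROTOCOL_MIN       0      /* range allowed by the protocol itself  */
#define PROTOCOL_MAX       100
#define SITE_MIN           40     /* tighter, installation-specific policy */
#define SITE_MAX           60

static const char *allowed_hosts[] = { "192.168.10.5", "192.168.10.6" };

static bool host_is_whitelisted(const char *src_ip)
{
    for (size_t i = 0; i < sizeof(allowed_hosts)/sizeof(allowed_hosts[0]); i++)
        if (strcmp(src_ip, allowed_hosts[i]) == 0)
            return true;
    return false;
}

/* Returns true only if the command is from a trusted host, is legal per the
 * protocol, and falls inside the site-specific operating range. */
bool filter_write_request(const char *src_ip, uint16_t reg, int value)
{
    if (!host_is_whitelisted(src_ip))
        return false;                       /* access control        */
    if (reg != SETPOINT_REGISTER)
        return false;                       /* only known registers  */
    if (value < PROTOCOL_MIN || value > PROTOCOL_MAX)
        return false;                       /* protocol validation   */
    if (value < SITE_MIN || value > SITE_MAX)
        return false;                       /* policy enforcement    */
    return true;
}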



Attack type: Application interface attack
How it works: Exploit weaknesses in the interface itself. Many legacy protocols, including Modbus TCP and EthernetIP, have no built-in security mechanisms. The protocols accept and process any commands they receive regardless of who the sender is. If a hacker can gain access to the network and send commands to the device, they can change the settings on the device, modifying how the control device is performing.

Attack type: Application implementation attack
How it works: These attacks exploit weaknesses in the implementation of the applications by sending illegal data to the application in an attempt to compromise the application. Examples include out-of-range data and buffer overflow attacks. Well-designed applications will not be vulnerable to these attacks, but in many cases such attacks can cause unpredictable behavior.

Attack type: Cross application attack
How it works: These attacks attempt to exploit relationships between multiple interfaces and applications on the system. For example, if a device provides an FTP interface, the hacker can use this to try and download system settings (configuration files), operational data (log files) and device firmware (which they can then try to reverse engineer to find weaknesses in the device for future attacks). The hacker may also use the interface to change configuration information, device firmware or other information stored on the device.

Table 1 Different types of attacks can be launched against a system and it is important to be aware of all of them.

For example, a whitelist of IP addresses can be configured and Modbus commands blocked if they are not from a machine on the whitelist. Additional rules can be provided for finer-grained control. Another significant challenge for industrial control devices is encoding rules to answer the question, “Does this command make sense?” While protocol enforcement, policy enforcement and access control enforcement ensure that the commands received are legal and are received from a trusted device/machine, they still don’t solve the problem of an accidental or malicious change from an authorized insider. Semantic filtering attempts to prevent things like rapid cycling of commands or changing values in ways that are operationally incorrect, such as setting an inflow rate that exceeds an outflow rate for an extended period of time. Command audit logs record all commands executed by the application for later analysis if a problem does occur. This capability is an important part of auditability for industry standards. An API should be available that lets the device engineers log and report each access, authentication attempt, or any other intrusion event. For example, if the web interface includes a username/password-based authentication, each login attempt should be logged using this API. The intrusion detection API will then report this event to a management system. The management system then analyzes the received data and is able to detect attempts to probe a single device or multiple devices in a way that is not possible on the device itself. The device may not have the intelligence or information to distinguish between repeated access attempts by a system administrator who forgot a password and systematic probing by a hacker.
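A device-side event log and reporting hook along these lines might look like the following sketch. The function names, event types and threshold are hypothetical illustrations of the kind of API described above, not a particular product's interface; the device merely records and forwards, leaving the admin-versus-attacker judgment to the management system.

/* Sketch of a device-side audit log with simple intrusion-event reporting. */
#include <stdint.h>
#include <stdio.h>

typedef enum { EVT_LOGIN_OK, EVT_LOGIN_FAIL, EVT_CMD_BLOCKED } event_type_t;

typedef struct {
    uint32_t     timestamp;   /* e.g., a tick counter            */
    event_type_t type;
    uint32_t     source_ip;   /* origin of the request           */
} event_t;

#define LOG_DEPTH 32
static event_t  event_log[LOG_DEPTH];   /* small circular audit buffer */
static unsigned log_head;
static unsigned failed_logins;

void report_event(uint32_t now, event_type_t type, uint32_t source_ip)
{
    event_log[log_head] = (event_t){ now, type, source_ip };
    log_head = (log_head + 1) % LOG_DEPTH;

    /* The device only counts suspicious events; deciding whether this is a
     * forgotten password or a probe is left to the management system. */
    if (type == EVT_LOGIN_FAIL && ++failed_logins >= 5) {
        printf("ALERT: repeated authentication failures from %08x\n",
               (unsigned)source_ip);
        failed_logins = 0;   /* in a real design, forward to the manager */
    }
}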

Similarly, there should be a mechanism to send alerts when any unusual behavior is detected. Data anti-tamper detection allows the system to detect when unauthorized changes have been made. This is achieved by using a secure hash of static configuration data. Data anti-tamper checking can be used to detect cross-application attacks such as a change to configuration data via the FTP or Web interface by an unauthorized user.
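A minimal sketch of that idea follows: hash the static configuration when it is provisioned, then have a periodic audit task re-check it. FNV-1a is used here purely to keep the example short; a real design would use a cryptographic hash (e.g., SHA-256) and protect the stored reference value.

/* Sketch of data anti-tamper checking for static configuration data. */
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

static uint32_t fnv1a(const uint8_t *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

static uint32_t trusted_hash;   /* stored in protected storage at provisioning */

void config_seal(const uint8_t *cfg, size_t len)
{
    trusted_hash = fnv1a(cfg, len);
}

/* Called from a periodic audit task; returns false if the configuration was
 * changed outside the normal, authorized path (for example, via FTP). */
bool config_is_intact(const uint8_t *cfg, size_t len)
{
    return fnv1a(cfg, len) == trusted_hash;
}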

An Application Layer Security Framework

An application layer security framework, such as the Icon Labs Floodgate Defender product family, provides a framework for application security in embedded devices. This framework includes Floodgate Defender, an application protocol filtering engine and embedded IDS/IPS. Floodgate Anti-tamper provides secure file hashing for anti-tamper protection as well as an audit task for run-time validation of secure hashes. A user API for event reporting and command audit logging is supported by Floodgate Aware (Figure 2), and Floodgate Agent provides an interface to a cloud-based management system supporting events and audit logs, management and enforcement of security policies, along with system-level firewall filtering and intrusion detection capabilities. Today’s IoT and embedded devices are complex connected computers that perform critical functions. These devices frequently communicate over wireless networks, creating an attack vector that can be easily accessed. Including security in these devices is a critical design task. Using a secure OS and system-level security features provides the foundation, but security features must be included at the application layer to ensure a secure overall system design. Icon Labs, Des Moines, IA. (515) 226-3443. www.iconlabs.com


Need Software for PCI Express®?

PCI Express® Software Dolphin PCI Express software is a complete software stack that supports Windows, Linux, and now VxWorks. This stack includes software for peer-to-peer connections, sockets, reflective memory connections, and TCP/IP support.

www.dolphinics.com



TECHNOLOGY IN SYSTEMS STRATEGIES FOR WEARABLE DEVICES

Putting the Mobile into Wearable Devices Wearable devices for consumer represent substantial market opportunities for design engineers who understand the implications of IoT, social media and ecosystem development. by Joy Wrigley, Lattice Semiconductor



The markets for consumer-oriented wearable devices are growing dramatically due in large part to new features and capabilities made possible by the confluence of ultra-low-power semiconductor components, short-range, low-power communications protocols and a new generation of sensor technology. These new devices often rely on wireless connectivity technologies such as Bluetooth or Wi-Fi for access to the Internet of Things (IoT), which allow them to communicate with users’ smartphones or other personal electronics as well as cloud-based services.

Device Classes

The smartwatch is perhaps the most hyped category of wearables, and with good reason given the marketing prowess of such industry heavyweights as Apple, Motorola, Pebble and Samsung – the key players in this evolving, but relatively nascent, category. Essentially, wearable smartphones and smartwatches offer users a plethora of features and functionality, including cellular phone, email, instant messaging, media player, advanced touch displays, activity monitoring, running apps and more. To support all these capabilities, smartwatches are inherently complicated, have intensive processing requirements and incorporate numerous sensors – much like smartphones. However, the smaller size of smartwatches dictates the need for a much smaller battery, which makes maximizing battery life a key requirement. Sports watches make up a separate, though related, category. Professional athletes were the early adopters of these products, which are based on ultra-low-power RF transceivers, inexpensive MCUs and short-distance communication protocols such as Bluetooth Smart. Very light weight and ultra-low energy consumption are vitally important features because marathoners, triathletes and professional cyclists want to be monitored for many hours without changing batteries. As a result, sports watches from Garmin, Polar and Suunto are powered by coin-cell batteries. Although first-generation sports watches sent physiological data to a PC for analysis, using the IoT was a natural extension. Fitness-oriented activity monitors are a fast-growing segment of wearables. Lacking many of the “bells and whistles” of smartwatches and sports watches, these simple, dedicated devices monitor such activities as the number of steps taken during a day and send the data via smartphone to a data center to provide users with updates and kudos when goals are reached.

Design Drivers for Today’s Competitive Products

Despite the many product categories and applications they span, there are several common denominators that wearables must address before they can hope to gain

Figure 1 Compact low-power designs for wearable devices can often save energy by offloading routine functions to state machines or small blocks of logic implemented in ASICs or programmable devices, allowing the device’s MCU to spend most of its time in an energy-saving sleep mode.

market traction. Unsurprisingly, ensuring optimal ease of use and durability under everyday living conditions is the chief goal for any wearable device. Battery life is a major concern, and while a precise definition of what’s considered an acceptable battery life remains elusive, the old adage that “more is better” is appropriate. Effective power management that maximizes a battery’s charge by minimizing the power required to operate the device is key to providing a good user experience. A key way of achieving this is by ensuring that power-hungry CPUs remain asleep when not needed and do not respond to false wake-up triggers. Another requirement is transparent connectivity between wearables and any host platform (e.g., smartphone, tablet, PC) and IoT-capable cloud-based applications. Almost without exception, this involves a low-power wireless interface, such as one of the low-power variants of the Bluetooth or ZigBee standards. Some devices that require higher data rates or longer ranges use the low-power variant of Wi-Fi while some products with extreme cost or power constraints employ proprietary wireless protocols. Wearables must also be interoperable with a large cohort of mobile platforms and other IoT-enabled devices. In addition to smartphones, tablets and other mobile devices, wearables may be required to work in parallel with other wearables that also share the same host platform. Today, smartwatches and fitness monitors sharing access to a smartphone are the most common examples of this type of M2M collaboration, but it’s likely that wearables will be expected to work seamlessly within body-area networks comprising many more devices in the near future.




Figure 2 Even the most highly-integrated SoCs may require additional internal connectivity to add peripherals or external interfaces.

It is also important that wearables be supported by a robust application ecosystem. This includes local apps, cloud-based apps and services, online presence and social media. The infamous maxim for embedded systems which states, “the processor architecture with the richest ecosystem wins,” is echoed in the wearable systems market. In order to be truly useful, a wearable must be able to port its data, either into a local application or out into an open ecosystem that facilitates information exchange and interactions with social networks. For example, activity trackers are now expected to be able to share the statistics they produce with other users, either via a private group or through public social media such as Facebook. In fact, social connectivity is often a primary consideration for consumers when they go to purchase an activity monitor or other wearable.

Technology Domains Different types of wearables are quite naturally built with different components and technologies. For example, because smartwatches depend on software apps to implement functionality, they require a high performance 32-bit microprocessor. On the other hand, the sports watches used by athletes are most frequently powered by 8-bit MCUs. In certain IoT applications, 8-bit devices can deliver better performance than 32-bit processors. IoT applications using thin clients, for example, are good fits for 8-bit MCUs despite their limited flash memory and onboard RAM. Wherever there is direct port-to-port I/O, 8-bit devices almost always have lower latency than 32-bit devices, and 8-bit MCUs typically consume less power. Sensors are always involved in activity monitors –


Fitbit, for example, integrates a pedometer – and this usually means some sort of digital signal processing is required. Accuracy, duty cycle and sample rate will vary depending on the precision required by the application.

Low Power is a Common Denominator

One constraint shared by all these applications is power requirements, which must always be low – although the definition of “low” also depends on the device and its applications. For our purposes, it can be assumed that batteries are the primary energy source. Sports watches typically perform their technology magic powered by a coin-cell battery such as the CR2032, which has an energy capacity of about 225 mAh. That’s not nearly enough for smartwatches that run an OS and often have a color display. A capacity of over 2,000 mAh is required to meet the power requirements of the current generation of smartphones. In MCU-powered wearables, integrated peripherals play critical roles in making the most out of the limited amounts of space and energy available. While the RF transceiver is perhaps the most useful, careful consideration must be given to I/O and power management options. For example, small, low-power state machines can perform routine data collection and monitoring tasks while the relatively energy-hungry CPU remains inactive except when needed. In many applications, this can allow the CPU to remain in an ultra-low-power sleep mode for 98% or more of the time. Many MCUs already include their own pre-integrated “smart I/O” peripherals, but some applications may benefit from the addition of custom functions. Often, these functions are integrated with an ASIC or FPGA/PLD which may also contain connectivity, power management and higher-level functions (Figure 1). We will explore this further in the following section which addresses internal and external connectivity.
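The structure of such a duty-cycled firmware is simple enough to sketch generically. In the sketch below the sensor read and sleep-entry calls are stubs standing in for whatever the target MCU vendor provides, and the battery arithmetic in the comment uses assumed currents purely for illustration.

/* Sketch of a duty-cycled wearable main loop; vendor calls are stubbed. */
#include <stdbool.h>
#include <stdint.h>

static volatile bool sample_ready;            /* set by a timer or sensor interrupt */
static uint32_t      step_total;

/* Stubs standing in for vendor-specific sensor and power-management calls. */
static uint16_t read_step_sensor(void) { return 1; }
static void     enter_deep_sleep(void) { /* wait-for-interrupt would go here */ }

int main(void)
{
    for (;;) {
        enter_deep_sleep();                   /* CPU sleeps until the next event    */
        if (sample_ready) {
            sample_ready = false;
            step_total += read_step_sensor();
        }
        /* Rough budget: 5 mA active for 2% of the time plus 10 uA asleep gives a
         * mean draw near 0.11 mA, so a 225 mAh CR2032 lasts roughly 2,000 hours. */
    }
}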

Internal and External Connectivity Further power and space savings can be achieved by careful integration of connectivity and power management elements. Connectivity may be divided into two categories--internal connections between subsystems and external connections to host devices or the IoT. In many instances, the new Bluetooth Smart standard is likely to provide connectivity between the sensor and the aggregating device, such as a smartphone, PC, set-top box or dedicated personal health system. An important exception is the sports watch category, which adopted the proprietary ANT+ protocol before Bluetooth Smart became a standard. ANT+, which was developed and is maintained by Garmin’s Dynastream division, has created its own ecosystem, which can communicate with PCs and equip-


ment typically found in gyms. Garmin Fitness and Polar devices utilize ANT+ in high-end training devices. A few sports watches connect directly to the smartphone, although the ANT protocol is part of the software suite of many smartphones. It only has to be activated. Returning to Bluetooth Smart, the most important single feature of recently adopted v4.1 is “dual mode” topology. This permits a device such as a smartphone to act as a Bluetooth Smart Ready hub and a Bluetooth Smart peripheral at the same time. The most obvious use scenario is the ability to pass data from a sensor or smartwatch to a mobile phone and then on to a PC if appropriate. Another attribute, which gives developers even greater freedom, is the ability to set up a scatternet. In its pre-v4.1 mode of operation, Bluetooth enabled communication by creating piconets. But its three-bit address space limits the maximum size of a piconet to eight devices – one hub and seven peripherals, which could negatively affect usability as the IoT expands. Now that the device can assume either identity, it is possible for a hub to communicate with many more than eight devices. Another important change for developers gives them more flexibility in maintaining communication sessions. With v4.0, the interval between connection “advertisements” from a Bluetooth Smart device to a Bluetooth Smart Ready device was fixed. Unfortunately, this meant that when an activity device such as a fitness monitor was physically separated from the hub, the connection could be quickly abandoned and had to be restored manually. Beginning with v4.1, the developer now sets the interval between connection advertisements. Bluetooth connectivity may be incorporated into a design using a stand-alone Bluetooth radio, which requires internal connectivity such as an LVDS, I2C or a parallel port to exchange data and commands with its host MCU. In designs based on one of the many single-chip radio/ controller system-on-chip (SoC) products available today, some level of internal connectivity is often required to interface the controller with additional sensors or peripherals it is not equipped to support. This can include mobile I/O expansion and bridging or even higher-level interface functions, such as sensor function pre-processing and co-processing, or a display controller for a small LCD (Figure 2). These functions may be implemented in one of the new generation of low-power programmable logic devices that have recently become available, such as Lattice Semiconductor’s iCE40 Ultra FPGAs. Power Management Energy management and production is probably where the wearable industry will see the most innovation over the next several years. Given the ultra-low power con-

sumption that is a top priority for wearables, energy harvesting is a definite possibility, most likely by converting body heat or kinetic energy generated by movement during exercise into electrical energy. It should be noted that when energy harvesting is mentioned, it is almost always with the understanding that a rechargeable battery is part of the energy system. The use of solar energy is also being investigated. Most of the power management of the MCU and its peripherals is accomplished within the MCU itself through the use of various levels of “sleep” or reduced activity states. At the system level, most wearables today have a very simple LED or LCD display that does not require more sophisticated power management than what an MCU can handle. As screens become more complex, however, power requirements will take on even greater importance. Looking Forward As the total wearable market evolves, it will probably follow the usual path toward consolidation. Because of their ubiquity, smartwatches are likely to become the preferred aggregator of the physiological information collected by sensors. Sports watch manufacturers are already differentiating themselves by expanding beyond activity tracking to include monitoring of traditionally “medical” vital signs such as blood pressure (BP), oxygen saturation (SpO2), heart rate (HR) and heart rate variability (HRV). The transition from stand-alone applications to broader interoperability cannot be far behind. This does not, however, mean that proprietary protocols will give way entirely to standards because smartphones can accommodate any number of protocols and proprietary systems can support robust ecosystems within a circumscribed application space. Despite likely advances in battery technology, maximizing battery life will certainly remain a priority for wearables, particularly as new features and functionality are added. Ensuring that power-hungry CPUs run only when needed will remain one of the best approaches to accomplishing this. Lattice Semiconductor Hillsboro, OR (503) 268-8000 www.latticesemi.com



TECHNOLOGY DEVELOPMENT CUDA

CUDA: An Introduction CUDA – NVIDIA’s GPU architecture and programming API for harnessing the power of their massively parallel graphics processors – has been a de facto go-to for many performance-oriented developers since it was first released in 2006. by Dustin Franklin, GE Intelligent Platforms

Today, GPUs are available in a great variety of Compute Unified Device Architecture (CUDA) core counts and memory configurations – which determine power consumption and compute performance – scaling all the way from NVIDIA’s 165W, 2,048-core Maxwell GM204 down to the 10W, 192-core CUDA-capable Tegra K1 ARM SoC – with many GPUs available in between (Figure 1). Many of the high-end Tesla cards are designed into HPC datacenters, powering the largest supercomputers; Quadro cards are found in professional workstations; and the GeForce cards are used heavily by gamers and consumers alike in desktops and laptops. All are capable of running the same CUDA code.
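Because the same binary can land on any of these parts, it is common to query the runtime for what is actually installed and size the workload accordingly. The short sketch below uses the standard CUDA runtime device query; the printed fields are only a sample of what struct cudaDeviceProp exposes.

// Quick sketch: enumerate the GPUs present so the same code can adapt its
// launches across Tegra K1, GeForce, Quadro or Tesla parts.
#include <cuda_runtime.h>
#include <stdio.h>

int main(void)
{
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, %d SMs, compute capability %d.%d\n",
               i, prop.name, prop.multiProcessorCount, prop.major, prop.minor);
    }
    return 0;
}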

Easily Scalable

Embedded systems – traditionally sensitive to size, weight, and power – have also capitalized on the performance and productivity gains of CUDA with ruggedized NVIDIA modules available for deployment in harsh environments. Applications that utilize the shared CUDA architecture can easily be scaled across platforms by selecting the right GPU for the deployment at hand. It’s all provided by CUDA, which automatically scales an algorithm depending on the variable number of GPU cores available. CUDA’s threading model provides an efficient way to process vast datasets in parallel, independent of the underlying GPU hardware. The result is increased runtime performance and power efficiency over traditional serial CPUs, all the while remaining in a pure software environment (as opposed to firmware or ASIC) and reaping the productivity benefits of accelerated application development. Applications in the fields of imaging, signal processing, and scientific computing are prime targets for GPU acceleration and have been the focus of many CUDA developers. Users are able to execute a multidimensional grid of threads over a dataset. For example, when processing an image, one lightweight CUDA thread is typically launched for each pixel in the image. Each thread runs a user-provided function, called a kernel, that is responsible for processing an element of the dataset (in this case, one pixel from the image). Generally this entails reading the data element from GPU global memory (typically GDDR5 or DDR3), performing any initial calculations, communicating

Figure 1 The NVIDIA Tegra K1 mobile processor is rapidly becoming the basis of rugged embedded computing solutions.

with neighboring data elements via on-chip shared memory, and storing any results back to global memory.
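A concrete, if contrived, instance of that one-thread-per-pixel pattern is sketched below: a hypothetical kernel that scales the brightness of an 8-bit image, with the launch configuration shown in comments. It is an illustration of the pattern, not code from the article's vma example.

// Minimal sketch of the one-thread-per-pixel pattern: a hypothetical kernel
// that scales the brightness of an 8-bit grayscale image in place.
__global__ void scale_kernel(unsigned char* img, int width, int height, float gain)
{
    const int x = blockIdx.x * blockDim.x + threadIdx.x;
    const int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height)
        return;                                  // guard the image edges

    const float v = img[y * width + x] * gain;   // read from global memory
    img[y * width + x] = v > 255.0f ? 255 : (unsigned char)v;  // write back

}

// Launch with one thread per pixel (16x16 threads per block is typical):
//   dim3 block(16, 16);
//   dim3 grid((width + 15) / 16, (height + 15) / 16);
//   scale_kernel<<<grid, block>>>(d_img, width, height, 1.2f);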

Fast switching

What really makes CUDA fast is that the GPU cores are able to switch between the lightweight threads very quickly. This means that when cores stall on memory accesses, the cores can quickly switch to different threads and continue to make progress where they can, instead of wasting cycles waiting for memory or other hazards. GPUs have large on-chip register caches, allowing them to keep many thousands of threads in flight simultaneously. This provides a large pool of work from which to schedule without needing to go out to global memory. Serial processors traditionally incur large overhead when switching between relatively heavyweight and monolithic CPU threads. Let’s take a look at the “hello world” of CUDA kernels – vector multiply/add. A * B + C is performed element-wise over each sample, with the result stored in A. First let us define the kernel function, which runs on the GPU and performs the vector operations (Source Listing 1).


CUDA kernels are written in a language very close to C/C++, with a few keywords and extensions added by NVIDIA. For example, the __global__ identifier enables the function to be invoked as a kernel that runs on the GPU. The blockDim, blockIdx, and threadIdx variables are built-in variables that contain the value of the current thread’s ID within the work grid. Traditional facilities like printf are available to aid in development and debugging. The source listing is part of a .CU source file; let’s call it vma.cu. It’s similar to a .C or .CPP source file, except CUDA source files can contain mixed CPU/GPU code. Our vma_kernel has three arguments – the pointers to A, B, and C – which reference global memory buffers on the GPU that will be allocated in the following code section, which runs on the CPU and invokes our CUDA kernel (Source Listing 2). The main() function from the listing is the typical C/C++ application entry point. First, it allocates GPU global memory using a function called cudaMalloc() that’s provided by the CUDA driver. Providing APIs for GPU management, the CUDA driver runs on the CPU and appends operations to the GPU’s work queue. After allocating GPU memory with cudaMalloc(), a grid of thread blocks is launched using the <<< … >>> construct, which launches a CUDA kernel. The arguments that we call the kernel with from the CPU are transferred to the GPU and show up as the kernel’s parameters (in this case, the pointers to A, B, and C). Source Listing 2 is appended to the same .CU file as Source Listing 1. It contains mixed GPU/CPU code and will be compiled by NVIDIA’s NVCC compiler, part of the CUDA Toolkit.

The Tool Chain

The CUDA Toolkit, freely available for Windows and Linux (including ARM-based Linux4Tegra), provides the tools, headers, libraries, and documentation necessary for development with CUDA. Included are the compiler (NVCC), the debugger (cuda-gdb), the profiler (cuda-prof), and Parallel Nsight IDE plugins for Eclipse or Visual Studio, in addition to many examples that highlight the features of CUDA. To compile our hello world example, run the command in Source Listing 3 after installing the CUDA Toolkit. It will compile our vma.cu source file, containing the code from Listings 1 and 2, into an executable called vma.out (or vma.exe on Windows). NVCC automatically adds the CUDA Toolkit’s /include directory as an include path (-I), in which resides the header referenced by the #include <cuda_runtime.h> statement in Listing 1. These headers define the CUDA extensions and constructs used by the GPU kernels. The headers also provide the CPU runtime driver APIs that provide calls like cudaMalloc(), cudaMemcpy(), and the ability to launch kernels. Shared libraries provide the backing implementation for the runtime driver APIs in addition to many useful application libraries like cuFFT, cuBLAS, and NPP (NVIDIA Performance Primitives). One great aspect of working with CUDA is that there are many existing libraries already available that are

implemented with CUDA under the covers. In fact, you can easily make GPU-accelerated applications without writing any CUDA kernels at all, instead calling into libraries like cuFFT, which in turn launch the CUDA kernels for you. Let’s take a look at an example of using cuFFT to execute a 1024-point FFT over 2^20 samples. cuFFT has an API reminiscent of FFTW and provides multi-dimensional FFTs and IFFTs, real and complex modes, and single and double precision. This sample performs a real-to-complex forward 1D FFT; the setup appears in Source Listing 4 and the execution and cleanup in Source Listing 6. To compile the FFT example, run the command line in Source Listing 5. There are many useful libraries freely available for CUDA in addition to cuFFT, like the cuBLAS linear algebra library that provides CUDA matrix multiply, linear solvers, etc. NPP provides hundreds of image processing functions. NVIDIA’s cuDNN library implements deep neural network primitives for GPU-accelerated machine learning. All of these are included with the CUDA Toolkit. You can do a lot before ever having to write your own parallelized CUDA kernels, although a little application-specific optimization never hurts.
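For comparison with the cuFFT example, a cuBLAS call looks like the following sketch: a single-precision matrix multiply on buffers assumed to already reside in GPU global memory. This is a minimal illustration of the library's standard cublasSgemm() call (link with -lcublas), not an excerpt from the article's project.

// Sketch of using cuBLAS instead of hand-written kernels: C = A * B on
// N x N matrices already resident in GPU memory.
#include <cublas_v2.h>

void gemm_example(const float* dA, const float* dB, float* dC, int N)
{
    cublasHandle_t handle;
    cublasCreate(&handle);

    const float alpha = 1.0f, beta = 0.0f;
    // cuBLAS assumes column-major storage, matching the BLAS convention.
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                N, N, N,
                &alpha, dA, N, dB, N,
                &beta,  dC, N);

    cublasDestroy(handle);
}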

Streaming Memory

Our examples so far have had a shortcoming for real-world use. Although we allocated GPU global memory before launching the CUDA kernel, we never initialized the memory with data (from sensors or disk, for example). Likewise, we never transferred the results of our CUDA computations back off the GPU. There are multiple ways to efficiently stream memory to and from GPUs, depending on the kind of device the data is coming from or going to. Performing the memory I/O asynchronously is important for creating a continuously streaming CUDA processing pipeline, especially if the desired application has realtime characteristics. The user should allocate pinned memory from system RAM using the cudaHostAlloc() API. One can also use cudaHostRegister() to pin existing memory that was previously allocated with malloc(), for example. Users can queue DMA transfers between the host CPU and GPU using the cudaMemcpy() API. Pinned memory is important because only pinned buffers can be transferred asynchronously, via cudaMemcpyAsync() – i.e., the CPU returns immediately after posting the copy to the GPU’s work queue. If you pass unpinned memory, the transfer will block the CPU until it has completed (Source Listing 7).
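The asynchronous variant is shown in the fragment below, which reuses the hostPtr, cudaPtr, grid and block names from Source Listing 7 and assumes my_kernel is defined elsewhere; it is a sketch of the standard CUDA stream API rather than code from the article.

// Sketch of a truly asynchronous transfer: pinned host memory plus
// cudaMemcpyAsync() on a stream lets the copies, the kernel and the CPU overlap.
cudaStream_t stream;
cudaStreamCreate(&stream);

cudaMemcpyAsync(cudaPtr, hostPtr, size, cudaMemcpyHostToDevice, stream);
my_kernel<<<grid, block, 0, stream>>>( cudaPtr );        // queued behind the copy
cudaMemcpyAsync(hostPtr, cudaPtr, size, cudaMemcpyDeviceToHost, stream);

// ... the CPU is free to do other work here ...
cudaStreamSynchronize(stream);   // wait only when the results are needed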

ZeroCopy on Tegra K1

TK1’s integrated GPU and CPU physically share the same memory, which means we should use a feature in CUDA called ZeroCopy to eliminate unnecessary copies between CPU and GPU on TK1. After all, the CPU and GPU access all the same memory. If we pass the cudaHostAllocMapped flag to cudaHostAlloc(), the memory allocation is eligible to be mapped into GPU global memory space, which can be done


with the cudaHostGetDevicePointer() function. Both the CPU and the GPU then have their own pointers to the same shared memory (Source Listing 8).

GPUDirect – RDMA over PCI Express

It’s often the case that data is coming in from an external source - for example, a video capture card or network interface that speaks Ethernet or InfiniBand. Many of these devices are PCI Express peripherals that can provide I/O tailored to the application. Remote Direct Memory Access (RDMA) is a latency- and bandwidth-saving technique used to transport memory across devices with low overhead. Using a CUDA feature called GPUDirect, third party devices can stream data directly to or from the GPU. Before GPUDirect it used to take multiple copies, as third party devices first had to copy their data to system RAM, where the methods outlined above were then performed to get the data into GPU memory. Now, the memory is able to be streamed directly to the GPU, allowing for low-overhead intercommunication, reduced CPU usage, and low-latency asynchronous CUDA applications. GPUDirect allows GPUs to exist in a mixed heterogeneous environment alongside FPGAs, and other third party devices like network interfaces, solid state drives and storage RAID and so on. Many GPU-accelerated systems are deployed with customized FPGA interfaces on the front end, which acquire data in application-specific mediums and RDMA it to the GPU for the heavy floating-point math. The results are then typically RDMA’d to other compute nodes over a network fabric like 10GbE or InfiniBand. Following this blueprint where GPUDirect is used to interconnect processor nodes results in scalable system architectures.

Rendering to the Display with OpenGL or Direct3D

Many CUDA-accelerated applications, because of their rich multimedia nature, would like to render video or visualizations

Source Listing 1:

#include <stdio.h>
#include <cuda_runtime.h>

//#define DEBUG_ME

__global__ void vma_kernel( float* A, float* B, float* C )
{
    const int idx = blockDim.x * blockIdx.x + threadIdx.x;

    // load A, B, and C from global memory and perform the computation
    const float result = A[idx] * B[idx] + C[idx];

    // store the output in vector A
    A[idx] = result;

    // printf works (but is slow when printing from every thread)
#ifdef DEBUG_ME
    printf("block %i thread %i result %f\n", blockIdx.x, threadIdx.x, result);
#endif
}

Source Listing 2:

int main( int argc, char** argv )
{
    const int numElements = 1048576;
    const int bufferSize  = numElements * sizeof(float);

    float* A = NULL;
    float* B = NULL;
    float* C = NULL;

    // allocate global memory on the GPU
    cudaMalloc(&A, bufferSize);
    cudaMalloc(&B, bufferSize);
    cudaMalloc(&C, bufferSize);

    if( !A || !B || !C )
        return 1;

    // parameterize the vector across a grid of thread blocks
    const dim3 block(512);                   // number of threads per block
    const dim3 grid(numElements/block.x);    // number of blocks per grid

    // asynchronously launch the kernel on the GPU
    vma_kernel<<<grid, block>>>( A, B, C );

    // control will be returned to the CPU immediately after queuing the kernel;
    // at this point, it's normal to queue more kernels to the pipeline.

    // wait for the GPU to finish all work
    cudaDeviceSynchronize();

    // release GPU memory before exiting
    cudaFree(A);
    cudaFree(B);
    cudaFree(C);
    return 0;
}

Source Listing 3: nvcc vma.cu -o vma.out -gencode=arch=compute_50,code=compute_50

Source Listing 4:

#include <cufft.h>

int main( int argc, char** argv )
{
    const int numSamples = 1048576;
    const int numTaps    = 1024;

    // create a cuFFT plan for a real-to-complex 1024-point FFT
    cufftHandle plan;
    if( cufftPlan1d(&plan, numTaps, CUFFT_R2C, numSamples / numTaps) != CUFFT_SUCCESS )
        return 1;

    // allocate CUDA global memory
    const int realSize    = numSamples * sizeof(float);
    const int complexSize = numSamples * sizeof(float2);

    float*  realIn     = NULL;
    float2* complexOut = NULL;

    cudaMalloc(&realIn, realSize);
    cudaMalloc(&complexOut, complexSize);

    if( !realIn || !complexOut )
        return 1;

Figure 2 Each Tegra K1 on this multiprocessor board developed by GE’s Intelligent Platforms business is capable of delivering well over 300 GigaFLOPS while consuming only 10 watts of power.



of some kind. If an NVIDIA GPU is connected to the display, CUDA applications can utilize a shortcut – OpenGL/Direct3D interoperability – to decrease the overhead and latency of transferring memory from CUDA global memory to OpenGL/ Direct3D buffers or textures. If the GPU doing the rendering is different than the one running CUDA, the interoperability layer will automatically copy the buffer GPU↔GPU. If the same GPU is both driving the display and simultaneously running CUDA, the memory remains on-GPU and no external transfer is necessary. Without the interoperability layer, CUDA memory would need to be copied back to CPU system RAM before being re-uploaded to the OpenGL/Direct3D buffer or texture. The benefits are very similar to how GPUDirect RDMA avoids unnecessary copies to or from system RAM.
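The interoperability path itself is only a handful of calls, sketched below using the CUDA graphics-interop API. The OpenGL pixel buffer object (pbo), the render_kernel and the grid/block values are assumed to exist already; this is an illustration of the general mechanism, not code from a specific application.

// Sketch of CUDA/OpenGL interoperability: register an existing OpenGL pixel
// buffer object (pbo), map it, and let a CUDA kernel write into it directly.
#include <cuda_gl_interop.h>

cudaGraphicsResource_t resource;
cudaGraphicsGLRegisterBuffer(&resource, pbo, cudaGraphicsRegisterFlagsWriteDiscard);

// Each frame: map the buffer, obtain a device pointer, run the kernel, unmap.
cudaGraphicsMapResources(1, &resource, 0);

uchar4* devPixels = NULL;
size_t  numBytes  = 0;
cudaGraphicsResourceGetMappedPointer((void**)&devPixels, &numBytes, resource);

render_kernel<<<grid, block>>>(devPixels, width, height);

cudaGraphicsUnmapResources(1, &resource, 0);   // OpenGL can now draw the PBO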

CUDA at Work and Home

Real-world processing pipelines can be built by chaining together multiple CUDA kernels (whether implemented yourself or invoked via a library like cuFFT or cuBLAS) along with the memory streaming operations for your particular dataflow. Due to convenient developer tools, parallelized libraries, and leading performance out-of-the-box, applications can be developed more quickly with CUDA than with other technologies that attempt similar compute density but require significant resources to program. CUDA’s flexible software programmability results in rapid application development and reduced project timelines. Each new GPU architecture launched by NVIDIA provides a steady march of performance improvements, many taken advantage of automatically without requiring any updates to existing CUDA code. Embedded devices and systems utilizing CUDA benefit not only from the ample compute horsepower at their disposal, but also from the shortened development cycles and constant infusion of new features. CUDA allows everyone to access the vast possibilities offered by the processing power of GPUs. CUDA has a very low barrier to entry for anyone interested. The CUDA Toolkit, NVIDIA drivers, and an ecosystem of CUDA libraries are all provided for free by NVIDIA. New and updated versions are available for download every couple of months from their website. What’s more, there are millions of lines of open-source CUDA code freely available online, covering everything from gene sequencing and protein folding to machine learning and computer vision. Anybody who has an NVIDIA GPU from the last eight years in their desktop or laptop can run CUDA-accelerated applications. By using CUDA, anyone can make high-performance systems and applications that leverage the efficiency of GPUs. What will you build today with CUDA? General Electric Intelligent Platforms Charlottesville, VA (780) 401-7700 www.ge-ip.com

Source Listing 5: nvcc fft.cu -o fft.out -lcufft

Source Listing 6:

    // execute the FFT
    cufftExecR2C(plan, realIn, complexOut);

    // wait for the GPU to finish all work
    cudaThreadSynchronize();

    // release GPU memory before exiting
    cudaFree(realIn);
    cudaFree(complexOut);

    return 0;
}

Source Listing 7:

const int    samples = 16384;
const size_t size    = samples * sizeof(float);

// allocate memory on the CPU and GPU
float* hostPtr = NULL;
float* cudaPtr = NULL;

cudaHostAlloc(&hostPtr, size, 0);   // pinned memory from system RAM
cudaMalloc(&cudaPtr, size);         // cuda global memory

if( !hostPtr || !cudaPtr )
    return;

// perform a basic initialization of CPU memory
for( int i=0; i < samples; i++ )
    hostPtr[i] = float(i);

// copy from CPU to GPU
cudaMemcpy(cudaPtr, hostPtr, size, cudaMemcpyHostToDevice);

// launch an example kernel
const dim3 block(512);
const dim3 grid(samples/block.x);
my_kernel<<<grid, block>>>( cudaPtr );

// copy the results back to the CPU
cudaMemcpy(hostPtr, cudaPtr, size, cudaMemcpyDeviceToHost);

// wait for the above asynchronous operations to complete
cudaThreadSynchronize();

// free memory
cudaFreeHost(hostPtr);
cudaFree(cudaPtr);

Source Listing 8:

const int    samples = 16384;
const size_t size    = samples * sizeof(float);

float* hostPtr = NULL;
float* cudaPtr = NULL;

// allocate CPU memory that's eligible to be mapped to the GPU
cudaHostAlloc(&hostPtr, size, cudaHostAllocMapped);
if( !hostPtr )
    return;

// map the memory into CUDA address space using ZeroCopy
cudaHostGetDevicePointer(&cudaPtr, hostPtr, 0);
if( !cudaPtr )
    return;

// any changes we make to hostPtr on the CPU will be reflected
// on the GPU, because the pointers are mapped to the same memory.
for( int i=0; i < samples; i++ )
    hostPtr[i] = float(i);

// cudaPtr can be used and referenced in CUDA kernels as before.
// note that no cudaMemcpy() is necessary because it's the same memory.
const dim3 block(512);
const dim3 grid(samples/block.x);
my_kernel<<<grid, block>>>( cudaPtr );



INDUSTRY WATCH SMALL BOARD PROTOTYPING

Creating an Application-Specific Operating System to Power an Arduino Robot We describe how we used a software tool called SynthOS with an off-the-shelf robot kit to develop control algorithms and create an RTOS to schedule and coordinate the various robot tasks. This demonstrates that writing code for SynthOS is straightforward, and that SynthOS can easily adapt an RTOS to a very constrained platform. by Igor Serikov and Jacob Harel, Zeidman Technologies



At Zeidman Technologies, we wanted to create a project using our SynthOS software tool, which automatically generates an optimized real-time operating system at the push of a button. We call such an RTOS an “application specific operating system” or “ASOS.” We decided to create a multitasking robot based on an Arduino processor. The requirement was to build a robot that can move around an obstacle course while avoiding walls and objects and not getting trapped in narrow spaces. Furthermore, it should be able to adjust its speed for the left and right tracks independently, and if anything fails, such as one of the tracks getting stuck, it should give an indication (a beep) and shut down its power.

Hardware Platform

We started with an off-the-shelf DFRobotShop Rover V2 kit from RobotShop Distribution. The robot is shown in Figures 1 and 2. It contains the following parts:

• Main control circuit board using the Arduino UNO architecture
• Atmel ATmega328p microprocessor
• Battery and on-board battery charger
• Twin-motor gear box for controlling two tracks, each consisting of wheels with rubber treads
• Two encoders for determining motor position, one for each track motor
• Infrared compound eye consisting of 4 pairs of IR photo transistors and 4 IR LEDs to measure reflected IR
• Ultrasound sensor board for generating sound and measuring reflected sound
• Pan and tilt servo motors
• Buzzer
• I/O expansion shield

The board uses a slightly modified Arduino UNO reference design. It incorporates an Atmel ATmega328p microprocessor. The ATmega328p is a system-on-chip with 2K SRAM and 32K flash memory running at 16MHz. The CPU uses a Harvard architecture where code and data reside in separate address spaces. Programming the ATmega328p was done in C because this language is common for embedded systems and is supported by SynthOS. We used the following tools:

• avr-gcc, binutils-avr, and avr-libc: a GNU tool chain for Atmel chips
• avrdude: a firmware upload utility that talks to the Arduino bootloader

Overview of the Project

In order to avoid hitting any objects, the robot uses an ultrasonic sensor. This sensor can detect objects at a short distance. Using the ultrasound sensor, the robot continuously scans its surroundings, turning the sensor left and right for a total of nearly 180 degrees. This wide scanning sector enables the robot to approach an obstacle at a very shallow angle. Since ultrasound behaves like light, the angle of reflection is equal to the angle of incidence. We found that the robot could not detect vertically

Figure 1 Rover V2 robot front view

inclined surfaces, and some curvy surfaces can cause erroneous distance readings. To find the minimum distance from an object at which the robot will have to take action, we noted that the robot will next detect the object when the sensor head sweeps back in the opposite direction after finishing the scanning sector and returning to the original position. The robot should move slowly enough to avoid hitting the object before seeing it the next time. We have calculated the relation between the robot speed and the full turn time. The real minimal distance could be a bit smaller than this because the robot would not really hit an object detected near the very edge of the sector unless it is a wall approached at a shallow angle. When the robot detects an object that is close enough, it turns left. Moving the tracks in opposite directions allows it to make a turn with a nearly zero radius. Because the robot always turns in the same direction, it does not get stuck in narrow spaces. After the robot makes a turn, it has to stop and make one complete scan of its surroundings before moving any further. The robot constantly monitors the speed of both tracks using the encoders. The encoder wheel has 16 sectors. Connected to an analog input, and sampled via a 10-bit A-to-D converter, the encoder emits a sine wave whenever the wheel rolls. To gauge the speed, the firmware measures the time between two adjacent half-waves. To eliminate outliers, the algorithm arranges every three consecutive samples and picks the middle one. Motor overheating is not a concern for this project since the motor speed is not high, and if the motor gets stuck, it will be turned off by the task associated with that track. For that reason, we do not use the temperature sensing capability. The torque and speed of each of the motors are adjusted independently, while turning requires higher torque than moving straight.
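The median-of-three outlier rejection just described is only a few lines of C. The sketch below is an illustrative reconstruction of that idea, with hypothetical variable names rather than the project's actual code.

/* Sketch of the outlier rejection described above: keep the last three
 * encoder half-wave periods and use their median as the measurement. */
#include <stdint.h>

static uint16_t period[3];   /* most recent half-wave durations, in timer ticks */

static uint16_t median3(uint16_t a, uint16_t b, uint16_t c)
{
    if (a > b) { uint16_t t = a; a = b; b = t; }   /* now a <= b        */
    if (b > c) { b = c; }                          /* b = min(b, c)     */
    return a > b ? a : b;                          /* max(a, min(b, c)) */
}

uint16_t encoder_period_filtered(uint16_t newest)
{
    period[0] = period[1];
    period[1] = period[2];
    period[2] = newest;
    return median3(period[0], period[1], period[2]);
}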



Figure 2 Rover V2 robot side view

The servo motors’ control circuits use pulse width modulation, where the number of pulses determines the position. It might take several pulses to reach the desired position if the current position and the required one are far apart, since every pulse turns the shaft through a small angle. It is recommended to send a pulse every 20 milliseconds. To send pulses of precise width we use delays with interrupts disabled. This is acceptable since the delays are below two milliseconds.

Timers and Interrupts

The firmware uses a hardware timer for all time-related tasks via timer interrupts. The divider is set to 1024, and the CPU clock rate is 16 MHz, so the time counter register increments every 1024*1/16000000 = 64 microseconds. The timer generates an interrupt when the counter reaches 156, interrupting roughly every 10 milliseconds. The ultrasonic sensor reports the moments of sending an acoustic burst and getting an echo by raising and dropping its output. The width of the pulse is equal to the round-trip time for the sound to travel to and from the object. Since the time counter register increments every 64 microseconds, the measuring error is around 0.011 meter (about 0.43 inch), given the speed of sound, which is about 342 m/s at sea level. Sometimes, the ultrasonic sensor will get no response and will set the pulse width to the maximum. Besides open-space cases and curvy, inclined surfaces, this was found to be caused by objects that do not reflect sound, for example, pillows.
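On the ATmega328p, a tick like this is typically produced with Timer0 in CTC mode; the sketch below shows one plausible configuration matching the numbers above (prescaler 1024, compare value of 156 counts). It uses the stock avr-libc ISR() and sei() forms, whereas the project, as described later, wrapped these to suit SynthOS; it is an illustration, not the project's actual source.

/* Minimal sketch of a ~10 ms system tick on the ATmega328p (Timer0, CTC mode). */
#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdint.h>

volatile uint32_t clock;                 /* tick counter consumed by the delay code */

void timer_init(void)
{
    TCCR0A = (1 << WGM01);               /* CTC mode                                */
    TCCR0B = (1 << CS02) | (1 << CS00);  /* clk/1024 -> 64 us per count             */
    OCR0A  = 155;                        /* 156 counts, roughly 9.98 ms per interrupt */
    TIMSK0 = (1 << OCIE0A);              /* enable compare-match interrupt          */
    sei();
}

ISR(TIMER0_COMPA_vect)
{
    clock++;                             /* waiting tasks are woken by the scheduler */
}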

Figure 3 SynthOS code generation


Using SynthOS

SynthOS allowed us to write our code in C. When one task needed to call another task, or wait for another task to complete, we inserted a special line of code that is recognized by SynthOS, called a “primitive.” We also created a simple project file to specify the parameters of each task such as the task’s priority and its frequency. SynthOS was then run on all of the task code. SynthOS created the appropriate semaphores and flags for each task and inserted the appropriate code at the appropriate points in the task code. SynthOS also created task management code to manage the tasks and their associated flags and semaphores. A generic diagram of the resulting code is given in Figure 3 with a more detailed diagram given in Figure 4. Note that each task is mostly written by us, but SynthOS inserts the code required by the ASOS into each task, and generates the ASOS that controls execution of each task. SynthOS automated the process of creating the operating system so that we could focus on writing the tasks for the robot and its sensors. Because the output of SynthOS is in C, we had complete visibility into everything going on in the operating system. All of the tools that we normally use to compile and debug the tasks were also used to compile and debug the operating system. The operating system code could also be easily modified by hand, if necessary, giving us complete control over the code. Developing the code with SynthOS for the specific hardware platform was fairly straightforward. First we needed to define three required routines to deal with interrupts. In our SynthOS project file (project.sop), we put the following variable settings that define the user functions for the specific


Figure 4 SynthOS generated system components

In this implementation we had to augment a couple of standard headers because SynthOS does not yet support some of the features used by the Atmel libraries. For example, because SynthOS does not yet support variadic macros and functions, we converted macros and functions that take a variable number of arguments, like printf(), into versions that take a fixed number of arguments. We created a couple of our own headers that include the standard library headers and add some customization for SynthOS:

• Set __ATTR_CONST__ to the empty string before including a header that uses it.
• Redefined ISR in the Atmel support library used by the project to be a non-variadic macro.
• Un-defined the macros sei (set interrupts) and cli (clear interrupts) and created two in-line functions instead.

After adapting the environment to work with SynthOS, we started on the system architecture. Based on the system requirements we defined the following main tasks, illustrated in Figure 5:

• robot: The scanning and high-level control loop task, which manages scanning for obstacles using ultrasound detection and distance measurements. This is the main control task in the system and synchronizes with the other tasks via SynthOS primitives;

• left_motor: The left motor control loop task, which manages the speed and direction of the left track and turns the motor off if it gets stuck;
• right_motor: The right motor control loop task, which manages the speed and direction of the right track and turns the motor off if it gets stuck;
• drive_pan: A call task that manages the scanning movements of the ultrasound sensor, left and right;
• print: A call task that manages all error and message communication via the UART port;
• ultrasonic_measure: A call task that controls the ultrasonic sensor's transmit and receive and calculates the distance to an object from the ultrasound echo.

SynthOS made writing code for task communication simple. Using the SynthOS_wait(cond) primitive, a task can wait for any condition that can be expressed with global variables and constants. On the trigger side, nothing needs to be done at all; SynthOS automatically inserts the code that monitors the affected variables and activates waiting tasks when necessary. A good example of this mechanism is programming delays. The only required variables are a tick counter, which we named clock, and an xyz_timer for every waiting task that holds the counter value at the start of the delay. When we needed a delay in a task we used the following code:

Figure 5 Robot software architecture




Figure 6 Synchronizing tasks and interrupts using SynthOS_wait()

xyz_timer = clock;
SynthOS_wait (clock - xyz_timer >= ticks_to_wait);

The timer interrupt routine just increments clock and SynthOS does the rest. Note that the timer interrupt routine has no specific knowledge of waiting tasks, and SynthOS has no specific knowledge of our timekeeping. When we needed a more precise delay, we first used the SynthOS_wait() primitive to approximate the delay and then a loop containing a SynthOS_sleep() primitive for the rest of the interval. Here is an example:

start = pclock ();
xyz_timer = clock;
SynthOS_wait (clock - xyz_timer >= ticks_to_wait);
while (pdiff (start, pclock ()) < ticks_to_wait * clock_divider)
    SynthOS_sleep ();

The function pclock() reports the time in CPU time register increments and the function pdiff(start, end) calculates the elapsed interval. A general diagram of interrupt handling with SynthOS is given in Figure 6.

We use the UART communication port only for debugging, since the robot is completely autonomous (it is actually a USART – universal synchronous asynchronous receiver transmitter). We followed the traditional approach of using circular

input and output buffers. Again, we used SynthOS's ability to track variables to synchronize tasks with the UART interrupt handlers: the interrupt handlers update the buffer pointers, and the tasks simply wait for the proper conditions (Source Listing 2). Communication and debugging use a custom command similar to printf(). We could not use the standard printf() function because it calls a put-character routine, which is blocking, and SynthOS restricts the use of blocking primitives to top-level functions.

                                     RAM      FLASH
FreeRTOS                             1.25K    12.8K
SynthOS ASOS (with debug code)       1.00K    13.5K
SynthOS ASOS (without debug code)    0.5K     8.6K

Table 1 Comparison of FreeRTOS and SynthOS-generated ASOS


Therefore we re-implemented printf() as a SynthOS call task that embeds the SynthOS blocking primitives. It is worth mentioning that in SynthOS there is no need to declare synchronization variables as volatile unless they are modified by interrupt handlers, because all control transfers are visible to the compiler.

The five simple SynthOS primitives enabled us to create a simple and straightforward architecture that included all the RTOS functionality needed to launch and synchronize the high-level tasks. The SynthOS tool also generated very efficient code: the system uses only 1K of RAM and 13.5K of flash including debug code, and removing the debug code reduces this to 0.5K of RAM and 8.6K of flash. We ported the code to FreeRTOS, where it required 1.25K of RAM and 12.8K of flash, significantly more than our SynthOS-based solution, as shown in Table 1.

This project was an interesting example of implementing an embedded control system with the SynthOS ASOS generation tool: it let us create a sophisticated yet simple multi-tasking RTOS and easily integrate it into our system. It shows the strength of the generated ASOS not only in providing the basic multi-tasking architecture but also in enabling synchronization at all levels, from interrupts to high-level system management. Using SynthOS allowed us to quickly produce clean, succinct code with a very small footprint.

The code of the project can be downloaded at www.zeidman.biz/downloads.htm. A video of the final robot can be seen at http://youtu.be/HzCGSk202gY. SynthOS is available to use online for free at www.SynthOSonline.com.

Zeidman Technologies, Cupertino, CA. (408) 741-5809. www.zeidman.biz

1. Available at: www.robotshop.com/en/dfrobotshop-rover-v2-autonomous.html
2. Available at: www.nongnu.org/avr-libc
3. Available at: www.nongnu.org/avrdude

Source Listing 1:

[interrupt_global]
enable = ON
getMask = get_mask
setMask = set_mask
enableAll = enable_ints

And add the relevant routines to the application to support the interrupt functionality:

void enable_ints (void)
{
    sei ();
}

int get_mask (void)
{
    int mask = SREG;
    cli ();
    return mask;
}

void set_mask (int mask)
{
    SREG = (uint8_t) mask;
}

Source Listing 2:

#define uart_put_byte(b)                                                  \
    do {                                                                  \
        unsigned char _b = (b);                                           \
        while (!((uart_send_put + 1) % UART_SEND_BUFFER_SIZE !=           \
                 uart_send_get))                                          \
            SynthOS_wait ((uart_send_put + 1) % UART_SEND_BUFFER_SIZE !=  \
                          uart_send_get);                                 \
        uart_send_buf [uart_send_put] = _b;                               \
        uart_send_put = (uart_send_put + 1) % UART_SEND_BUFFER_SIZE;      \
        uart_transmit ();                                                 \
    } while (0)

#define uart_get_byte(l)                                                  \
    do {                                                                  \
        while (!(uart_receive_get != uart_receive_put))                   \
            SynthOS_wait (uart_receive_get != uart_receive_put);          \
        l = uart_receive_buf [uart_receive_get];                          \
        uart_receive_get = (uart_receive_get + 1) %                       \
                           UART_RECEIVE_BUFFER_SIZE;                      \
    } while (0)
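Source Listing 2 shows only the task side of the circular buffers. For completeness, here is a minimal sketch of what the matching receive-interrupt handler could look like on an AVR USART; the vector and register names (USART_RX_vect, UDR0) assume an ATmega328-class part and are not taken from the project's code, which redefines ISR as its own non-variadic macro.

#include <avr/io.h>
#include <avr/interrupt.h>
#include <stdint.h>

#define UART_RECEIVE_BUFFER_SIZE 32

/* Shared with the task-side uart_get_byte() macro. The put pointer and the
 * buffer are written from the interrupt handler, so they are volatile. */
volatile uint8_t uart_receive_buf [UART_RECEIVE_BUFFER_SIZE];
volatile uint8_t uart_receive_put;
uint8_t uart_receive_get;

/* Hypothetical receive-complete handler: store the byte and advance the put
 * pointer. A task blocked in SynthOS_wait (uart_receive_get !=
 * uart_receive_put) is then released by the generated ASOS. */
ISR (USART_RX_vect)
{
    uint8_t byte = UDR0;   /* always read UDR0 to clear the interrupt flag */
    uint8_t next = (uint8_t) ((uart_receive_put + 1) % UART_RECEIVE_BUFFER_SIZE);
    if (next != uart_receive_get) {      /* drop the byte if the buffer is full */
        uart_receive_buf [uart_receive_put] = byte;
        uart_receive_put = next;
    }
}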



PRODUCTS & TECHNOLOGY

COM Express and Thin Mini-ITX on Fifth Generation Core Processor

congatec is expanding its product range with the fifth generation Intel Core processor platform, up to the Core i7-5650U, on COM Express Computer-On-Modules and Thin Mini-ITX motherboards. The single-chip processors feature a power consumption of just 15W TDP. Built on Intel's new 14nm process technology, the fifth generation Intel Core processor is designed to provide excellent graphics and performance, supporting the next generation of congatec's COM Express and Thin Mini-ITX boards for Internet of Things (IoT) solutions while maintaining compatibility with previous generations. The new Intel HD Graphics 5500 and 6000 found in the fifth generation processors deliver stunning and responsive visuals, including Ultra HD 4K display and additional codec support. Enhanced security and manageability features help drive down total cost and risk, protecting data and preventing malware threats.

Both the COM Express module and the Thin Mini-ITX motherboard allow for the connection of up to three independent display interfaces via HDMI 1.4, LVDS and embedded DisplayPort (eDP). When using DisplayPort 1.2, the individual displays can be daisy-chained to take advantage of simple wiring. Native USB 3.0 support provides fast data transmission with low power consumption. The two SODIMM sockets can be equipped with up to 16 GB of DDR3L memory.

The COM Express Compact Type 6 module, the conga-TC97, offers flexibility and customization for the application. A total of eight USB ports are provided, two of which support USB 3.0 SuperSpeed. Four PCI Express 2.0 lanes, four SATA ports with up to 6 Gb/s, RAID support and a Gigabit Ethernet interface enable fast and flexible system extensions.

The conga-IC97 supports customers who need high-quality single board computers (SBCs) with long-term availability. The flat design of Thin Mini-ITX – measuring 25mm in height with I/O shield – enables flat housings, such as those required for panel PCs. Four USB 3.0 SuperSpeed ports are directly available on the I/O shield. Two 5 Gbit/s PCI Express 2.0 lanes can be used as mPCIe Half Size and PCIe Full Size, shared with PCIe x1 and mSATA. Fast and flexible system extensions are possible thanks to four SATA interfaces with up to 6 Gb/s plus one mini PCIe. Two Intel I210 Gigabit Ethernet controllers each provide Gigabit Ethernet LAN access via the two RJ45 sockets. A universal power input of 12 to 24 volts completes the feature set.

congatec, San Diego, CA. (858) 457-2600. www.congatec.com


High Performance Fourth Gen Core i7-Based 3U VPX SBC

A fourth generation Intel Core i7 single board computer (SBC) is now available in a 3U OpenVPX small form factor package. The rugged commercial-off-the-shelf (COTS) VPX3-1258 from Curtiss-Wright Defense Solutions features Intel's latest fourth generation Core i7 (“Haswell”) chipset and is designed for use in compute-intensive applications deployed in space, weight and power (SWaP)-constrained platforms. The board delivers 2.4GHz quad-core (8-thread) performance supported by up to 16 GB of dual-channel high-speed ECC-protected DDR3 memory.

With its support for a “seamless” upgrade path between older and future Intel processors, the board simplifies customers' long-term technology plans. Because it is pin-compatible with previous generations of Curtiss-Wright Intel-based SBCs, the VPX3-1258 is suitable for use in technology upgrade programs. What's more, with built-in support for “drop-in” processor replacement, the VPX3-1258 eases future upgrades to Intel's fifth generation Core i7 processor, which is expected to be released in 2015.

This fully ruggedized board speeds and simplifies the integration of high performance processing into demanding defense and aerospace applications such as mission computing, general processing, virtualization and small multi-SBC ISR. The VPX3-1258 is easily integrated with other members of Curtiss-Wright's extensive 3U OpenVPX product family, including Intel, Power Architecture and ARM-based SBCs, powerful graphics modules, and DSP and FPGA engines, to implement powerful ISR/SIGINT applications.

Curtiss-Wright Defense Solutions, Ashburn, VA. (661) 705-1142. www.cwcdefense.com



Online Tool for Designing Custom Embedded Systems

Gumstix has announced Geppetto 2.0, the most advanced version of Gumstix' online build-to-order tool. Geppetto 2.0 introduces Tux-Approved recommended mappings for buses, ensuring optimal compatibility between customer-created hardware and standard Linux images. In addition, 2.0 offers an expanded module selection, improved dimensioning, a faster UI, and video tutorials. As part of this announcement, the Geppetto-designed AeroCore™ 2 MAV Control Board (compatible with Overo™ COMs) and the Geppetto-designed Pepper™ DVI-D single-board computer (SBC) are now available. With manufacturing setup priced at just $1999 USD, and with affordable per-unit costs, Geppetto offers electronics designers an inexpensive way to reduce time-to-market and design costs for their custom embedded systems.

Designed to power intelligent, next-generation micro-aerial vehicles (MAVs), the Geppetto-designed AeroCore 2 (compatible with Gumstix' Overo COMs) offers enhanced flexibility compared to its predecessor. The AeroCore 2 gives MAV developers greater selection in finding a computing solution tailored to their needs, adding CAM, Spektrum RC and GPS interfaces to the ARM Cortex-M4 powered board. Priced at $149 USD, the AeroCore 2 has also offloaded GPS functionality onto a separate module using an industry-standard connector, enhancing functional modularity and choice while reducing cost.

The Geppetto-designed Pepper DVI-D SBC is a powerful, complete and compact solution for embedded developers interested in Cortex-A8 ARM processors. Featuring the Texas Instruments Sitara AM3354 processor, the Pepper DVI-D (priced at $119 USD) offers high-definition video output, 512 MB of RAM, WiFi and Bluetooth, a microSD card slot, audio connectivity, a console port and two USB On-The-Go ports.

Gumstix, Redwood City, CA. www.gumstix.com

Smart, Small and Integrated Wi-Fi Platform Solution for Complete IoT Connectivity

A new suite of smart Wi-Fi module solutions for the Internet of Things includes software, modules and development kits. The newest EC19W01 802.11b/g/n Wi-Fi modules from Econais offer lower power drain (12.33uA) and feature a fully integrated 32-bit MCU, Wi-Fi, cloud connectivity, flash memory and antenna, certified for FCC, EC, IC, and RoHS/REACH. For ease of use and configuration, the EC19W01 modules now support Apple Bonjour, the MQTT messaging protocol, WPS configuration, and much more. Econais' unique ProbMe API is also included; it enables end users, integrators and manufacturers to rapidly install and configure numerous Wi-Fi devices simultaneously, supporting devices across Android, iOS, and Windows platforms.

Advanced features include WPA Enterprise (TLS, TTLS, PEAP), certificate installation for Serial to Wi-Fi, roaming support for Serial to Wi-Fi, and seamless use of plain and secure sockets with TLS 1.2 support. The new software greatly improves TCP and UDP network performance and increases SPI and UART stability and performance. The assortment of critical features now includes Over The Air (OTA) system updates, a complete Wi-Fi Direct (P2P) API for Serial to Wi-Fi, and Wi-Fi Monitor mode (Sniffer Mode). The EC19W01 modules and development kits are currently available for purchase through the Econais global representative and distribution network.

Econais, San Jose, CA. (408) 827-8331. www.econais.com




ADVERTISER INDEX

Company .......... Page .......... Website
congatec, Inc. .......... 4 .......... www.congatec.us
Dolphin .......... 25 .......... www.dolphinics.com
EDT .......... 4 .......... www.edt.com
Intelligent Systems Source .......... 19, 23 .......... www.intelligentsystemssource.com
InterDrone .......... 43 .......... www.InterDrone.com
One Stop Systems .......... 15, 44 .......... www.onestopsystems.com
Pentek .......... 9 .......... www.pentek.com
Super Micro Computers, Inc. .......... 5 .......... www.supermicro.com
Trenton Systems .......... 2 .......... www.TrentonSystems.com

RTC (ISSN# 1092-1524) magazine is published monthly at 905 Calle Amanecer, Ste. 150, San Clemente, CA 92673. Periodical postage paid at San Clemente and at additional mailing offices. POSTMASTER: Send address changes to The RTC Group, 905 Calle Amanecer, Ste. 150, San Clemente, CA 92673.

The Event for Embedded, M2M and IoT Technology
2015 Real-Time & Embedded Computing Conferences

Boston, MA – April 07
Ottawa, ON – April 21
Toronto, ON – April 23
San Diego, CA – August 21
Orange County, CA – August 27
Minneapolis, MN – October 06
Chicago, IL – October 08
Seattle, WA – November 05

Register today at www.rtecc.com

For Information: The RTC Group, Inc., 905 Calle Amanecer, Suite 250, San Clemente, CA 92673. Call: (949) 226-2000


Attend

Two Awesome Conferences: For Builders

For Flyers and Buyers

More than 30 classes, tutorials and panels for hardware and embedded engineers, designers and software developers building commercial drones and the software that controls them.

More than 30 tutorials and classes on drone operations, flying tips and tricks, range, navigation, payloads, stability, avoiding crashes, power, environmental considerations, which drone is for you, and more!

Meet with 80+ exhibitors!

September 9-10-11, 2015 Rio, Las Vegas www.InterDrone.com

Demos! Panels! Keynotes! The Zipline!

A BZ Media Event


