
This edition first published 2017
© 2017 John Wiley & Sons, Ltd
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Roger Woods, John McAllister, Gaye Lightbody and Ying Yi to be identified as the authors of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
John Wiley & Sons, Ltd., The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial Office
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
Library of Congress Cataloging-in-Publication Data
Names: Woods, Roger, 1963- author. | McAllister, John, 1979- author. |
Lightbody, Gaye, author. | Yi, Ying (Electrical engineer), author.
Title: FPGA-based implementation of signal processing systems / Roger Woods,
John McAllister, Gaye Lightbody, Ying Yi.
Description: Second edition. | Hoboken, NJ : John Wiley & Sons Inc., 2017. |
Revised edition of: FPGA-based implementation of signal processing systems /
Roger Woods … [et al.]. 2008. | Includes bibliographical references and index.
Identifiers: LCCN 2016051193 | ISBN 9781119077954 (cloth) | ISBN 9781119077978 (epdf) |
ISBN 9781119077961 (epub)
Subjects: LCSH: Signal processing--Digital techniques. | Digital integrated
circuits. | Field programmable gate arrays.
Classification: LCC TK5102.5 .F647 2017 | DDC 621.382/2--dc23 LC record available at
https://lccn.loc.gov/2016051193
Cover Design: Wiley
Cover Image: © filo/Gettyimages;
(Graph) Courtesy of the authors
The book is dedicated by the main author to his wife, Pauline, for all her support and care, particularly over the past two years.
The support of staff from the Royal Victoria Hospital and Musgrave Park Hospital is greatly appreciated.
Digital signal processing (DSP) is the cornerstone of many products and services in the digital age. It is used in applications such as high-definition TV, mobile telephony, digital audio, multimedia, digital cameras, radar, sonar detectors, biomedical imaging, global positioning, digital radio and speech recognition, to name but a few! The evolution of DSP solutions has been driven by application requirements which, in turn, have only been possible to realize because of developments in silicon chip technology. Currently, a mix of programmable and dedicated system-on-chip (SoC) solutions is required for these applications, and this has thus been a highly active area of research and development over the past four decades.
The result has been the emergence of numerous technologies for DSP implementation, ranging from simple microcontrollers right through to dedicated SoC solutions which form the basis of high-volume products such as smartphones. With the architectural developments that have occurred in field programmable gate arrays (FPGAs) over the years, it is clear that they should be considered a viable DSP technology. Indeed, developments made by FPGA vendors support this view of their technology. Strong commercial pressures are also driving the adoption of FPGA technology across a range of applications.
The increasing costs of developing silicon technology implementations have put considerable pressure on the ability to create dedicated SoC systems. In the mobile phone market, volumes are such that dedicated SoC systems are required to meet stringent energy requirements, so application-specific solutions have emerged which vary in their degree of programmability, energy requirements and cost. The need to balance these requirements suggests that many of these technologies will coexist in the immediate future, and indeed many hybrid technologies are starting to emerge. This, of course, creates considerable interest in programmable technology, as it greatly reduces the risks involved in developing new technologies.
Commonly used DSP technologies encompass software programmable solutions such as microcontrollers and DSP microprocessors. With the inclusion of dedicated DSP processing engines, FPGA technology has now emerged as a strong DSP technology. The key advantage of FPGAs is that they enable users to create system architectures in which the resources are best matched to the system processing needs. Whilst their memory resources are limited, they offer a very high-bandwidth, on-chip memory capability. Although the prefabricated nature of FPGAs avoids many of the deep problems met when developing SoC implementations, the creation of an efficient implementation from a DSP system description remains a highly convoluted problem, and this is a core theme of this book.
The book looks to address FPGA-based DSP systems, considering implementation at three levels: circuit-level optimization, the creation of circuit architectures, and system-level modeling.
The book covers all three areas of FPGA implementation, but its key differentiating factor is that it concentrates on the second and third, namely the creation of circuit architectures and system-level modeling; this is because circuit-level optimization techniques have been covered in greater detail elsewhere. The work is backed up by the authors’ experiences in implementing real DSP systems and covers numerous examples, including an adaptive beamformer based on a QR-based recursive least squares (RLS) filter, finite impulse response (FIR) and infinite impulse response (IIR) filters, full search motion estimation and a fast Fourier transform (FFT) system for electronic support measures. The book also considers the development of intellectual property (IP) cores, as this has become a critical aspect in the creation of DSP systems. One chapter is given over to describing the creation of such IP cores and another to the creation of an adaptive filtering core.
The book is aimed at working engineers who are interested in using FPGA technology efficiently in signal and data processing applications. The earlier chapters will be of interest to graduates and students completing their studies, taking readers through a number of simple examples that show the trade-offs involved in mapping DSP systems into FPGA hardware. The middle part of the book contains a number of illustrative, complex DSP system examples that have been implemented using FPGAs and whose performance clearly illustrates the benefit of their use. They provide insights, which the authors believe are missing from the current literature, into how best to use complex FPGA technology to produce solutions optimized for speed, area and power. The book summarizes over 30 years of experience in implementing complex DSP systems, undertaken in many cases with commercial partners.
The second edition has been updated and improved in a number of ways. It has been updated to reflect technology evolutions in FPGA technology, to acknowledge developments in programming and synthesis tools, to reflect on algorithms for Big Data applications, and to include improvements to some background chapters. The text has also been updated using relevant examples where appropriate.
Technology update: As FPGAs are linked to silicon technology advances, their architecture continually changes, and this is reflected in Chapter 5. A major change is the inclusion of the ARM® processor core, which has shifted FPGAs towards being a heterogeneous computing platform. Moreover, the increased use of graphical processing units (GPUs) in DSP systems is reflected in Chapter 4.
Programming tools update: Since the first edition was published, there have been a number of innovations in tool developments, particularly in the creation of commercial C-based high-level synthesis (HLS) and open computing language (OpenCL) tools. The material in Chapter 7 has been updated to reflect these changes, and Chapter 10 has been changed to reflect the changes in model-based synthesis tools.
“Big Data” processing: DSP involves the processing of data content such as audio, speech, music and video information, but there is now great interest in collating huge data sets from on-line facilities and processing them quickly. As FPGAs have started to gain some traction in this area, a new chapter, Chapter 12, has been added to reflect this development.
The FPGA is a heterogeneous platform comprising complex resources such as hard and soft processors, dedicated blocks optimized for processing DSP functions and processing elements connected by both programmable and fast, dedicated interconnections. The book focuses on the challenges of implementing DSP systems on such platforms with a concentration on the high-level mapping of DSP algorithms into suitable circuit architectures.
The material is organized into three main sections.
The first section, consisting of Chapters 2–5, provides background material. Chapter 2 starts with a DSP primer, covering both FIR and IIR filtering and transforms including the FFT and discrete cosine transform (DCT), and concludes with adaptive filtering algorithms, covering both the least mean squares (LMS) and RLS algorithms. Chapter 3 is dedicated to computer arithmetic and covers number systems, arithmetic functions and alternative number representations such as logarithmic number representations (LNS) and coordinate rotation digital computer (CORDIC). Chapter 4 covers the technologies available to implement DSP algorithms, including microprocessors, DSP microprocessors, GPUs and SoC architectures, including systolic arrays. Chapter 5 gives a detailed description of commercial FPGAs, concentrating on the two main vendors, Xilinx and Altera, specifically their UltraScale™/Zynq® and Stratix® 10 FPGA families respectively, but also covering technology offerings from Lattice and Microsemi.
The second section, consisting of Chapters 6–10, covers efficient implementation from circuit architecture onto specific FPGA families; the creation of circuit architecture from signal flow graph (SFG) representations; and system-level specification and implementation methodologies from high-level representations. Chapter 6 covers only briefly the efficient implementation of FPGA designs from circuit architecture descriptions, as many of these approaches have already been published; the text covers distributed arithmetic and reduced coefficient multiplier approaches and shows how these have been applied to fixed-coefficient filters and DSP transforms. Chapter 7 covers HLS for FPGA design, including new sections reflecting Xilinx’s Vivado HLS tool flow and Altera’s OpenCL approach. The process of mapping SFG representations of DSP algorithms onto circuit architectures (the starting point in Chapter 6) is then described in Chapter 8. It shows how dataflow graph (DFG) descriptions can be transformed for varying levels of parallelism and pipelining to create circuit architectures which best match the application requirements, backed up with simple FIR and IIR filtering examples.
One way to perform system design is to use predefined designs, termed IP cores, which will typically have been optimized using the techniques outlined in Chapter 8. The creation of such IP cores is outlined in Chapter 9; this addresses the key issue of design productivity by encouraging “design for reuse.” Chapter 10 considers model-based design for heterogeneous FPGAs and focuses on dataflow modeling as a suitable design approach for FPGA-based DSP systems. The chapter outlines how pipelined IP cores can be included via the white box concept, using two examples, namely a normalized lattice filter (NLF) and a fixed beamformer.
The final section of the book, consisting of Chapters 11–13, covers the application of the techniques. Chapter 11 looks at the creation of a soft, highly parameterizable core for RLS filtering, showing how a generic architecture can be created to allow a range of designs to be synthesized with varying performance. Chapter 12 illustrates how FPGAs can be applied to Big Data applications where the challenge is to accelerate some complex processing algorithms. Increasingly, FPGAs are seen as a low-power solution, and FPGA power consumption is discussed in Chapter 13. The chapter starts with a discussion of power consumption, highlights the importance of dynamic and static power consumption, and then describes some techniques to reduce power consumption.
The authors have been fortunate to receive valuable help, support and suggestions from numerous colleagues, students and friends, including: Michaela Blott, Ivo Bolsens, Gordon Brebner, Bill Carter, Joe Cavallaro, Peter Cheung, John Gray, Wayne Luk, Bob Madahar, Alan Marshall, Paul McCambridge, Satnam Singh, Steve Trimberger and Richard Walke.
The authors’ research has been funded from a number of sources, including the Engineering and Physical Sciences Research Council, Xilinx, Ministry of Defence, QinetiQ, BAE Systems, Selex and Department of Employment and Learning for Northern Ireland.
Several chapters are based on joint work that was carried out with the following colleagues and students: Moslem Amiri, Burak Bardak, Kevin Colgan, Tim Courtney, Scott Fischaber, Jonathan Francey, Tim Harriss, Jean-Paul Heron, Colm Kelly, Bob Madahar, Eoin Malins, Stephen McKeown, Karen Rafferty, Darren Reilly, Lok-Kee Ting, David Trainor, Richard Turner, Fahad M Siddiqui and Richard Walke.
The authors thank Ella Mitchell and Nithya Sechin of John Wiley & Sons and Alex Jackson and Clive Lawson for their personal interest, help and motivation in preparing and assisting in the production of this work.
One-dimensional
Two-dimensional
Auditory brainstem response
Accumulator
Analogue-to-digital converter
Advanced encryption standard
Adaptive logic module
Arithmetic logic unit
Adaptive lookup table
Advanced Micro Devices
Artificial neural network
Analytics-on-chip
Application program interface
Application processing unit
Advanced RISC machine
Application-specific integrated circuit
Application-specific instruction processor
Adaptive voltage scaling
Boundary cell
Binary coded decimal
Block CLA with intra-group, carry ripple
Block random access memory
Coherent accelerator processor interface
Current block
Control and communications wrapper
Clock enable
Complex instruction set computer
Carry lookahead adder
Configurable logic block
Convolutional neural network
Complementary metal oxide semiconductor
Coordinate rotation digital computer
Carry propagation adder
Central processing unit
Conditional sum adder
Cyclo-static dataflow
Continuous wavelet transform
Distributed arithmetic
Discrete cosine transform
Double data rate
Data Encryption Standard
Dataflow accelerator
Dataflow graph
Discrete Fourier transform
Dependence graph
Distributed random access memory
Data memory
Dataflow process network
Digital receiver
Digital signal processing
Discrete sine transform
Decision tree classification
Dynamic voltage scaling
Discrete wavelet transform
Electrically erasable programmable read-only memory
Embedded Block RAM
Error correction code
Electroencephalogram
Electrically programmable read-only memory
Enhanced Squared Givens rotation algorithm
Electronic warfare
Fixed beamformer
FPGA-based custom computing machine
Functional engine
Forward error correction
Free-form expression
Fast Fourier transform
First-in, first-out
Finite impulse response
Field programmable gate array
Field programmable logic
Floating-point unit
Finite state machine
Full search motion estimation
Giga floating-point operations per second
Giga multiply-accumulates
Giga multiply-accumulate per second
Giga operations per second
General-purpose graphical processing unit
Graphical processing unit
General regression neural network
Gigasamples per second
Hardware abstraction layer
Hardware description language
High-K metal gate
High-level synthesis
Inter-Integrated circuit
Input/output
Internal cell
Instruction decode
Integrated design environment
Inverse discrete Fourier transform
Institute of Electrical and Electronics Engineers
Instruction fetch
Instruction fetch and decode
Inverse fast Fourier transform
Infinite impulse response
Instruction memory
Internet of things
Intellectual property
Instruction register
International Technology Roadmap for Semiconductors
Joint Photographic Experts Group
Constant-coefficient multiplication
Kernel memory
Kahn process network
Logic array block
Logic delay measurement circuit
Low-density parity-check
Low-level virtual machine
Least mean squares
Logarithmic number representations
Low-power double data rate
Least squares
Least significant bit
Linear time-invariant
Lookup table
Memory access
Multiply-accumulate
Minimum absolute difference
Multidimensional arrayed dataflow
Multiplicand
Motion estimation
Military standard
Multiple instruction, multiple data
Multiple instruction, single data
Memory LAB
Memory management unit
Model of computation
Media processing engine
Motion Picture Experts Group
Multi-processing SoC
Multiplier
Multi-rate dataflow graph
Most significant bit
Most significant digit
Multidimensional synchronous dataflow
Medium-scale integration
Megasamples per second
Not a Number
Normalized lattice filter
Non-recurring engineering
On-chip memory
Orthogonal frequency division multiplexing
Orthogonal frequency division multiple access
On-line analytical processing
Open computing language
Open multi-processing
Open RVC-CAL Compiler
Programmable Array Logic
Parameter bank
Program counter
Printed circuit board
Peripheral component interconnect
Pattern detect
Processing element
Programmable logic
Programmable logic block
Programmable logic device
Phase locked loop
Programmable power technology
Processing system
Quadrature amplitude modulation
QR recursive least squares
Random access memory
Radio access network
Block CLA with inter-block ripple
Reduced coefficient multiplier
Register file
Reduced instruction set computer
Recursive least squares
Residue number representations
Read-only memory
Radiation tolerant
Register transfer level
Reconfigurable video coding
Signed binary number representation
Snoop control unit
Signed digits
Synchronous dataflow
Software development kit
Signed digit number representation
Simple dual-port
Serializer/deserializer
Single event upset
Signal flow graph
Squared Givens rotation
Single instruction, multiple data
Single instruction, single data
Shared-memory multi-processors
Signal-to-noise ratio
System-on-chip
Social media intelligence
System on programmable chip
Serial peripheral interface
Structured query language
Single-rate dataflow graph
Static random access memory
Shift register lookup table
Shifted signed digits
Support vector machine
Search window
Transmission Control Protocol
Tera floating-point operations per second
Time of arrival
Throughput rate
Transistor-transistor logic
Universal asynchronous receiver/transmitter
Ultra-low density
Unified modeling language
VHSIC hardware description language
Very high-speed integrated circuit
Very long instruction word
Very large scale integration
White box component
Wave digital filter