
ON THE IMPLEMENTATION OF A SOFTWARE DRM RECEIVER RUNNING ON A SYSTEM WITH LIMITED COMPUTATIONAL RESOURCES

Anders Mørk-Pedersen & Brian Stengaard
February 1, 2010
Department of Electrical Engineering
Technical University of Denmark


ether, n. [Gr. αιθηρ, upper air — L. æther, the upper pure, bright air] 1. The element believed in ancient and medieval civilizations to fill all space above the sphere of the moon and to compose the stars and planets. 2. (in Physics) An all-pervading, infinitely elastic, massless medium formerly postulated as the medium of propagation of electromagnetic waves. American Heritage® Dictionary of the English Language, 4th Edition, 2003.



ABSTRACT

In recent years a trend towards converting broadcast media to an all-digital paradigm has begun. The conversion is made to accommodate the demand for higher quality while still occupying a comparable bandwidth, by utilising digital compression schemes. Digital terrestrial radio transmission offers broadcasts in high fidelity on low-frequency bands that have not previously been used for hi-fi radio. This opens an opportunity for transmissions with longer range than existing terrestrial hi-fi radio offers. Digital Radio Mondiale, drm, meets the requirement for long-range hi-fi radio by specifying a standard for transmissions on narrow channels previously reserved for amplitude modulated stations. We present a complete Software Defined Radio, sdr, receiver capable of decoding drm broadcasts. The receiver was implemented in C and targeted at an embedded platform. The receiver was completed to a degree that meets most requirements of the drm receiver profile. Audio data can be decoded on a pc. Issues still remain with the target platform.



CONTENTS

Project Introduction
    Abstract
    Contents
    List of Figures
    List of Tables
    List of Symbols
    List of Abbreviations
    Preface
        About the Reader
        Acknowledgements

1 Introduction
    1.1 Background
    1.2 Problem Statement
    1.3 Project Scope
    1.4 Introduction to the DRM System
    1.5 Overview of the Report

Theoretical Foundations of the DRM System

2 Orthogonal Frequency Division Multiplexing
    2.1 Introduction
    2.2 Background
    2.3 Orthogonal Carrier Waves
    2.4 Guard Time Configuration
    2.5 Baseband Extraction
    2.6 Symbol Synchronisation
    2.7 Sub-carrier Extraction
    2.8 Frame Synchronisation
    2.9 Equalising
    2.10 Recognising DRM Spectrum
    2.11 Summary

3 Channel Decoding
    3.1 Introduction
    3.2 Overview of the Encoding and Decoding Process
    3.3 Multilevel Coding
    3.4 Modulation
    3.5 Interleaving
    3.6 Puncturing
    3.7 Convolutional Coding and Decoding
    3.8 Energy Dispersal
    3.9 CRC used in DRM
    3.10 Unequal Error Protection
    3.11 Summary

4 Content Channels in DRM
    4.1 Introduction
    4.2 Fast Access Channel
    4.3 Service Description Channel
    4.4 Main Service Channel
    4.5 Perspective
    4.6 Summary

Design of the DRM Decoder

5 Architecture of the DRM Decoding System
    5.1 Introduction
    5.2 Target Platform
    5.3 Creating a drm Decoder Chain
    5.4 Linking the Chain Together
    5.5 Libraries
    5.6 Organisation
    5.7 Perspective

6 Frequency Acquisition and Signal Down-mixing
    6.1 Introduction
    6.2 Interface
    6.3 Process Flow
    6.4 Acquiring the DC Offset
    6.5 Down-mixing the Signal to Baseband Equivalent
    6.6 Summary

7 OFDM Demultiplexing
    7.1 Introduction
    7.2 Interface
    7.3 Process Flow
    7.4 Estimating Robustness Mode
    7.5 Symbol Boundary Synchronisation
    7.6 Extracting Sub-carriers
    7.7 Summary

8 Carrier Equalisation and Frame Enumeration
    8.1 Introduction
    8.2 Interface
    8.3 Frame Synchronisation
    8.4 Channel Estimation
    8.5 Equalising
    8.6 Summary

9 Channel Demodulation and Decoding
    9.1 Introduction
    9.2 Interface
    9.3 Process Flow
    9.4 Creating Logical Channels
    9.5 Decomposing the Frame
    9.6 Cell Demodulation
    9.7 Deinterleaving
    9.8 Depuncturing
    9.9 Viterbi Decoding
    9.10 Summary

10 Payload Decoding
    10.1 Introduction
    10.2 Interface
    10.3 Audio Decoding
    10.4 Data Decoding
    10.5 SDC Presentation
    10.6 Summary

Decoder Performance and Characteristics

11 Test Strategies
    11.1 Introduction
    11.2 Signals used for Analysis
    11.3 Test execution
    11.4 Analysing the Signal

12 Signal Performance and Characteristics
    12.1 Introduction
    12.2 Downconversion
    12.3 OFDM demultiplexing and Equalising
    12.4 Case Studies - Batch Signals
    12.5 Summary

13 Computational Performance Characteristics
    13.1 Introduction
    13.2 Experiences with the Target Platform
    13.3 Processing Requirements
    13.4 Computational Performance of Processes
    13.5 Processing Performance Comparisons
    13.6 Memory Constraints
    13.7 Summary

Summary

14 Discussion
    14.1 Introduction
    14.2 Current Status
    14.3 Future Work
    14.4 Perspectives

15 Conclusions

Bibliography

Appendices

A Product Manual
    A.1 Overview of the Contents
    A.2 Building the DRM Decoder
    A.3 Using the DRM Decoder
    A.4 Analysing the DRM Signal

B Signal Characteristics

C Division of Labour


LIST OF FIGURES

1.1 itu Regions
1.2 Block Diagram

2.1 Single-carrier vs. multi-carrier
2.2 The concept of orthogonal sub-carriers
2.3 Orthogonal sub-carriers in the time domain
2.4 Extending symbols with cyclic prefixes
2.5 Timing in drm's ofdm scheme
2.6 Cyclic prefix correlation
2.7 ofdm block diagram
2.8 The drm spectrum
2.9 Radix-2 fft signal flow graph
2.10 Reference cells in drm mode B
2.11 Surface plot of channel envelope

3.1 Stream content
3.2 The coding and interleaving process
3.3 Different pam-systems
3.4 Phase Shift Keying constellation
3.5 qam constellations used in drm
3.6 State transition diagram for the drm encoder
3.7 Structure of the drm encoder
3.8 Full trellis
3.9 Encoding trellis
3.10 Decoding path

4.1 Decomposition of drm content to osi layers
4.2 Example of streams in msc
4.3 aac audio frame

5.1 Elaborated block diagram
5.2 Inter-process communication
5.3 Interrupts for mode changes
5.4 Source code tree

6.1 State machine for down-mixer
6.2 drm-signal complex envelope
6.3 The Hamming low-pass filter used in the downmix process

7.1 fsm for the ofdm block
7.2 Correlations of different modes
7.3 Cyclic Prefix

8.1 Scatter patterns for mode B and D

9.1 Object interaction
9.2 Example of symbol search

11.1 Laboratory test-bench for streaming (real time) tests

12.1 Signal: Voice of Russia, if and I/Q spectrums
12.2 Signal: Voice of Russia, timing synchronisation
12.3 Signal: Voice of Russia, unequalised constellation
12.4 Signal: Voice of Russia, channel transfer function
12.5 Signal: Voice of Russia, 2D channel transfer function
12.6 Signal: Voice of Russia, single frame qam constellation
12.7 Signal: Deutsche Welle, 2D channel transfer function
12.8 Signal: Deutsche Welle, single frame qam constellation
12.9 Signal: Radio Netherlands, 2D channel transfer function
12.10 Signal: Radio Netherlands, single frame qam constellation

13.1 cpu usage as a function of time
13.2 Memory consumption as a function of time

A.1 Main menu
A.2 Decoder gui for testing convenience
A.3 Decoder gui
A.4 The gui for plotting and analysing signals

B.1 Signal: Bouquet Flevo, 2D channel transfer function
B.2 Signal: Bouquet Flevo, single frame qam constellation
B.3 Signal: R. Luxembourg, 2D channel transfer function
B.4 Signal: R. Luxembourg, single frame qam constellation
B.5 Signal: RTL, 2D channel transfer function
B.6 Signal: RTL, single frame qam constellation
B.7 Signal: Deutsche Welle, 2D channel transfer function
B.8 Signal: Deutsche Welle, single frame qam constellation


LIST OF TABLES

1.1 Traditional itu frequency bands for radio

2.1 Predefined channel profiles in drm+
2.2 drm parameters for symbol extraction
2.3 Operations involved in different fft algorithms
2.4 Example with number of operations for fft-algorithms

3.1 Distance and normalisation factor for selected constellations
3.2 Example of a puncturing pattern

4.1 Parameters for example Figure

5.1 Station Parameters

8.1 Scatter recurrence

11.1 The signals that are analysed

13.1 Benchmarks comparisons
13.2 The top-most cpu absorbing function in decode
13.3 Decoding time per second comparison

C.1 Responsibilities in relation to this report


LIST OF SYMBOLS

ofdm-Related

j       The imaginary unit, j = √−1 ∈ C
fi      Intermediate frequency
fs      Sampling frequency
Fk      Logic abstraction of frame number k
Fk      Logic abstraction of super frame number k
sk      Logic abstraction of symbol number k
sn      Sample no. n of some symbol in the time domain
Ns      Number of carriers in a symbol
dn      A vector spanning all the Ns sub-carriers of symbol n
di      The i-th cell of a symbol, di = d⟦i⟧, i ∈ [0; Ns − 1]
Ψi      The i-th carrier wave of a symbol, usually modulated with di
Tg      Symbol guard duration
Tu      Useful symbol duration (integration period for orthogonality)
Ts      Total symbol duration, Ts = Tg + Tu
ρr      Density of scattered reference cells
η       Efficiency factor
H(n)    Transfer function for the channel
τ       Relative delay between two receiving paths
xn      The n-th sample of a real signal, sampled at if

The carrier index for the n-th pilot is denoted κn(s) for scattered pilots, κn(c) for continual pilots and κn(t) for time references.

The reference value (complex constant) for the n-th pilot is denoted ℵn(s) for scattered pilots, ℵn(c) for continual pilots and ℵn(t) for time references.


Encoding and Modulation Related

Q       Constellation
aQ      Normalisation factor for constellation Q
M       Number of points in a constellation
E       Energy
Λ       Set of levels
λ       Value of a soft bit
C       A code
K       Constraint length of a code
P       Polynomials for a code
R       Logical channel: fac, sdc or msc


LIST OF ABBREVIATIONS

lw      Long Wave
mw      Medium Wave
sw      Short Wave
am      Amplitude Modulation
itu     International Telecommunication Union
fm      Frequency Modulation
rds     Radio Data System
vhf     Very High Frequency
epg     Electronic Programme Guide
etsi    European Telecommunications Standards Institute
drm     Digital Radio Mondiale
dab     Digital Audio Broadcast
sdr     Software Defined Radio
adc     Analog-to-Digital Converter
rf      Radio Frequency
mcu     Micro Controller Unit
arm     Advanced RISC Machine
if      Intermediate Frequency
ofdm    Orthogonal Frequency Division Multiplexing
msc     Main Service Channel
sdc     Service Description Channel
fac     Fast Access Channel
fec     Forward Error Correction
qam     Quadrature Amplitude Modulation
dvb-t   Digital Video Broadcast – Terrestrial
adsl    Asymmetric Digital Subscriber Line
dvb-c   Digital Video Broadcast – Cable
cofdm   Coded Orthogonal Frequency Division Multiplexing
isi     Inter Symbol Interference
ici     Inter Carrier Interference
idft    Inverse Discrete Fourier Transform
fft     Fast Fourier Transform
hpp     Higher Protected Part
lpp     Lower Protected Part
pam     Pulse Amplitude Modulation
snr     Signal to Noise Ratio
psk     Phase Shift Keying
qpsk    Quadrature Phase Shift Keying
sm      Standard Mapping
vspp    Very Strongly Protected Part
hmsym   Hierarchical Modulation – Symmetrical
hmmix   Hierarchical Modulation – Mixed
cc      Convolutional Code
fsm     Finite State Machine
va      Viterbi Algorithm
ml      Maximum Likelihood
crc     Cyclic Redundancy Check
uep     Unequal Error Protection
liab    Linux In A Box
mips    Million Instructions Per Second
fpu     Floating Point Unit
faad    Free AAC Audio Decoder
eep     Equal Error Protection


PREFACE Our choice of thesis project was influenced by a desire to apply signal processing in practice. We are firmly grounded software enthusiasts with a background in embedded systems and Unix development. This naturally makes the implementation of a software defined radio stack on a Linux platform an interesting case. This report is about the implementation of a drm receiver on Linux.

about the reader We assume the reader has some knowledge about signal processing and coding theory. We also use some standard notation from computer science that the reader is expected to know (Big-Oh notation and the like).

acknowledgements We would like to thank our adviser, Johan Jacob Mohr, for pushing us ever forward and making sure we did not focus too much on irrelevant details. Mikael Dich is owed a thank-you for letting us use his Viterbi decoder, supplying us with information when needed and even lending us a nanoliab to conduct our tests on. Thanks are due to Ole Poulsen and Dennis Pedersen for letting us use their work as a starting platform. We would also like to thank Rasmus Mejsner for proofreading.

Anders Mørk-Pedersen Brian Stengaard Kgs. Lyngby, DK January 2010



Chapter 1

INTRODUCTION

1.1 background

This section gives the background and our motivation for the choice of topic.

Why Radio? Throughout the twentieth century, the radio has played an important role in informing the public. Radio broadcasts have been used for anything from securing democratic rights through free information, to manipulating populations using propaganda and, of course, plain entertainment. Until the advent of television, the radio was the only mass medium for live announcements and emergency alerts. During the second world war many of the occupied nations were relying on radio stations for uncensored news. Radio services are still in heavy use around the world, informing and entertaining millions every day. In many less developed countries, such as India and large parts of Africa, radio is still the main source of information, given that few people can afford a tv. In Britain 89% of the population listens to radio programming, for an average of around 20 hours per week [2]. Among the United States population almost 80% of adults listen to radio programming on a daily basis [24]. These numbers suggest that radio is indeed an important service to a large number of people, even in the more developed parts of the world. Traditionally, terrestrial radio has been broadcast on the Long Wave, lw, Medium


[Figure 1.1: The itu regions (regions 1, 2 and 3)]

Table 1.1: itu recommended frequency bands and channel spacings by region, traditionally used for analog radio broadcasts. According to NTIA [27] and Spragg [35].

Band | Region 1 fc        | Δfc     | Region 2 fc      | Δfc     | Region 3 fc      | Δfc     | Mod.
lw   | [148.5; 283.5] kHz | 9 kHz   |                  |         |                  |         | am
mw   | [535; 1611] kHz    | 9 kHz   | [535; 1710] kHz  | 10 kHz  | [535; 1611] kHz  | 9 kHz   | am
sw   | regional           | 10 kHz  | regional         | 10 kHz  | regional         | 10 kHz  | am
vhf  | [87.5; 108] MHz    | 100 kHz | [87.5; 108] MHz  | 100 kHz | [87.5; 108] MHz  | 100 kHz | fm

Wave, mw, & Short Wave, sw, bands¹ using Amplitude Modulation, am. The bandwidths have been limited to 9 kHz or 10 kHz by the International Telecommunication Union, itu (Table 1.1). The narrow bandwidth combined with the poor band-efficiency² of am modulation results in poor audio quality for musical programmes. Audio feeds with higher fidelity can be achieved with Frequency Modulation, fm, radio, where a 100 kHz bandwidth is used for each station. This has allowed fm broadcasts to include stereo sound and Radio Data System, rds, services, while am transmission has traditionally been in mono. Using the lw, mw or sw bands for broadcast transmissions (Table 1.1) allows a single broadcast to cover larger areas than equivalent transmissions on the bands reserved for fm transmissions³. The good propagation characteristics of lw, mw and sw, caused by reflections from the ionosphere (sky-wave) and the earth (ground-wave), are probably why am has survived to this day, whereas the vhf band only allows line-of-sight (direct-wave) propagation.

1. lw, mw and sw are defined as the bands in the frequency ranges lw ⊂ [30; 300[ kHz (low frequency), mw ⊂ [300; 3000[ kHz (medium frequency), and sw ⊂ [3; 30[ MHz (high frequency).
2. An am station has two sidebands that are redundant replicas.
3. fm is transmitted on Very High Frequency, vhf ⊂ [30; 300[ MHz.


Why Digital Radio? In recent years there has been a development towards creating radio services with digital content. This shift to a new technological paradigm is motivated by the desire to maximise the utilisation of the available bandwidth. Better bandwidth utilisation is achieved by incorporating modern audio codecs (coder/decoder), which offer significant advances in terms of data compression and audio quality by exploiting certain properties of human sound perception. Digital radio allows stations to attach versatile content to their audio feed. This content could be meta-data such as an Electronic Programme Guide, epg, or it could be multimedia content to be presented to the listener (e.g. weather reports and news). The audio feed can even be broadcast in several languages on the same frequency if multilingual radio is deemed more important than sound quality. Flexibility is a keyword for digital radio broadcasts. For a station it is mainly the flexibility to trade quality for robustness to best suit its needs. From the listener's perspective it is the flexibility to select from a range of services, possibly offered by a single station. The applications of the combined digital multimedia solutions discussed here are only limited by imagination. Possible usages include: traffic information, weather forecasts, and price quotes for anything from industrial fish quotas to the current price of a kilowatt hour. With these thoughts in mind, only all-digital radio systems shall be considered for the remainder of this thesis.

Why Digital Radio Mondiale? In September 2001 the European Telecommunications Standards Institute, etsi, released the first version of the system specification for Digital Radio Mondiale, drm [11]. The drm system is designed specifically to operate on frequencies below 30 MHz⁴, that is, the long, medium and short wave bands. The system is designed to give near-fm audio quality in the narrow channels of these scarce bands. An advantage to broadcasters is that they can retrofit their older am equipment to broadcast drm signals with relative ease. The audio encoding in drm is based on a subset of the mpeg-4 standard. The drm specification is an open and free standard promoted and created by a consortium of broadcasters and network providers along with transmitter and receiver manufacturers. drm combines the benefits of a digital radio broadcasting system with the propagation range of lw, mw and sw transmissions. There are certain drawbacks associated with drm as well. The drm signal is more

4. The newer standard has a mode for vhf broadcasts; we denote this drm+.


complicated than am, so the era of making foxhole radios⁵ in science class is over. Also, the channel bandwidth of the lw, mw & sw bands is very narrow, limiting the audio bit-rate. Finally, it is difficult to make an efficient amplifier for the transmitters, because a drm signal has instantaneous peak amplitudes that exceed the average by a large amount (i.e. it has a high peak-to-average ratio), so the amplifier needs to be fast.

The Competitors Naturally there are some contenders to drm as a digital radio broadcast system: Digital Audio Broadcast, dab, hd radio and fmextra are all fully digital systems, but each has drawbacks, discussed below. dab [8] is a free and open digital radio standard, also developed by etsi. It is aimed at delivering content on frequencies above 47 MHz. It utilises the older mpeg-2 standard as its audio codec (invented in the 1980s). We consider this radio system outdated. hd radio is a proprietary standard for radio broadcast on both the fm and am bands. Since it is a closed standard, royalties must be paid to further the specification. It also seems to be mostly adopted in the United States. fmextra is a hybrid system, sending both analog and digital content. It is compatible with hd radio, but not very widespread yet. Implementing drm is preferred to these competitors in that it is already adopted in Europe, so testing should be easier. Also, the narrow bandwidths used for drm ease the computation, making it easier to process in software.

Why do it all in Software? A Software Defined Radio, sdr, is a radio where components that have traditionally been implemented with analog circuits are instead implemented in software on a microprocessor. More specifically, it is a platform equipped with a microprocessor and an Analog-to-Digital Converter, adc (this could be a sound card in a regular pc), along with a suitable Radio Frequency, rf, front-end. The signal is sampled and the processing is done in software. The disadvantage is that a microprocessor will in general cause higher power consumption and worse performance compared to dedicated circuits. The benefit of the sdr solution is the flexibility gained over a regular hardware-based receiver. The software counterpart can be used for other purposes; radio data

5. World War II veterans constructed simple am radios from a coil and a razor-blade [4].


can be recombined and transformed at the receiving end at the discretion of the user. Furthermore, any error discovered after release of the product can trivially be fixed and made available to customers. More to the point, as an academic exercise, it is very easy to analyse the signal at intermediate points when the signal is already in an accessible format.

Why Embedded? Given the trend of ever cheaper and faster low-end Micro Controller Units, mcus, the market for embedded sdrs has large potential. Especially the Advanced RISC Machine, arm, family of processors has been popular in recent years, being used in Apple's iPhone, TomTom's navigation systems, Microsoft's Zune and the Sony PlayStation Portable (to name but a few). Using an embedded platform, however, does introduce a number of constraints on the computational requirements of the application, along with constraints on memory consumption. Furthermore, special care must be taken to ensure portability between systems. These embedded systems often lack support for floating point operations and memory management (i.e. paging). The constraints on memory consumption and processing requirements should be relieved through careful design of the decoding system. The portability issues can be reduced if the decoder is targeted at a standardised platform. For this, Linux is chosen, since it is known to scale well in embedded applications [28, 39]; the authors are also quite comfortable with Linux, and not keen to develop a new operating system from scratch before starting to develop the software radio.

1.2 problem statement

Having addressed the issues above, the scope and purpose of the project can now be stated. This project focuses on the design and implementation of a complete software defined radio receiver for demodulating and decoding the contents of a drm broadcast, as defined in the system specification for drm [13]. We will build it with an embedded platform in mind, but not optimise too much until a working receiver is in place: “Premature optimisation is the root of all evil” (Donald E. Knuth).



1.3

project scope

The goal for this project is to produce a receiver capable of decoding drm transmissions. The focus is mainly on the software; if time permits, a front-end and an antenna may be developed. The implementation may rely on software libraries where they fit the purpose and are freely available, e.g. audio codecs. As for the specification, a new standard has been released, sometimes denoted drm+, with a new configuration for vhf broadcasts. Throughout this project the older standard (version 2.3.1, 2008) is used, unless otherwise specified.

The decoding must be done in real-time, so that an rf front-end can be used later on.

The decoder shall be able to decode all robustness modes of standard version 2.3.1, in all occupancy configurations. The decoder must be compatible with real input signals at an Intermediate Frequency, if, of 12 kHz, i.e. compatible with the recorded signals found at Fischer [16]. All the logical channels that comprise the drm content must be decodable. At least some audio feeds should be fully decodable. Other content may be presentable if time permits and it turns out to be relevant.

1.4

introduction to the drm system

In this section the composition and architecture of the drm system is explained briefly. Digital Radio Mondiale allows radio transmission in any broadcast band below 30 MHz. It does so using very little bandwidth; the maximal bandwidth is 20 kHz (double channel), and the minimal bandwidth is 4.5 kHz (half-channel). Most often, though, 9–10 kHz (single channel) is used. In these very narrow bandwidths, bit-rates ranging from 4.8 kbit/s to 72 kbit/s can be produced, depending on a number of parameters such as robustness mode, modulation type, bandwidth, protection mode and code rate [13, see Annex H]. To achieve this remarkable flexibility in bit rate versus error robustness, drm utilises Orthogonal Frequency Division Multiplexing, ofdm (see Chapter 2), along with advanced error-correcting coding techniques (see Chapter 3) to ensure reliable transfer of the source information. The primary source in radio (i.e. the audio feed) is compressed using an mpeg-4 codec.6

6. The mpeg-4 audio codecs used for drm: celp, hvxc and aac


These main points altogether form the basis of a radio system with the ability to achieve a much higher fidelity than any analog system with comparable bandwidth and range is able to.

Block Diagram of a drm Decoder
Now the blocks in the block diagram (Figure 1.2) will be referenced and their functionality explained. The process of decoding the drm signal is to reverse each of the encoding steps, as described in etsi [13]. Figure 1.2 (p. 7) shows a block diagram of the decoding process in terms of functionality.

[Figure 1.2: the diagram chains the blocks rf front-end, freq. synchr., resample, time synchr., ofdm demux, frame synchr., equalise, cell deintl., qam demap, deinterleave, recombine, fec decode, reinflate, dedisperse and present. The msc (p ∈ {2, 3, 6}), sdc (p = 2) and fac (p = 1) channels pass through the demapping and decoding blocks on p parallel levels; the fac supplies the occupancy and qam parameters, and the sdc supplies the coding and inflation parameters. The quantised part is covered by Chapters 2, 6, 7, 8 and the digital part by Chapters 3, 9.]

Figure 1.2: A block diagram, sketching the blocks by function. Red blocks signify the rf front-end, green blocks process a quantised analog signal, and yellow blocks do all-digital processing. The blue block does content decoding

The drm system can simulcast up to 4 streams of source data. The content of this source data can be audio feeds or other multimedia data [12, 13] (e.g. a set of pictures). Of course, including several source streams lowers the bit-rate available to each stream.

Understanding the Transmission
This is a superficial top-down description of the encoding and modulation process of a drm transmission; it is given in order to supplement the block diagram, Figure 1.2. The


block diagram follows a bottom-up approach, so the last block in the block diagram corresponds to the first part of the encoding process. The main feed in radio broadcast is naturally the audio, so let us start there.

The Content
The audio is encoded using the mpeg-4 codec that best suits the profile of the audio: there is one codec for musical content (aac) and two different ones for speech (hvxc and celp) (the corresponding block in the diagram is present). The encoded audio now has a bit rate that is considerably lower than the channel capacity. The audio bit-stream is cut into chunks representing a duration of 400 ms; each audio feed represents a so-called stream. If there is other content (i.e. data), it is included in one or more further streams, for example using drm data packets. All the streams make up the main payload of the broadcast, called the Main Service Channel, msc. The msc content for a 400 ms period is known as a frame. The msc is accompanied by two complementary channels that provide meta-data describing how the streams should eventually be decoded. These are the Service Description Channel, sdc, which holds the meta-data for the streams, along with descriptions of their services, and the Fast Access Channel, fac, which is a very robust channel carrying a shallow station description7 and the modulation parameters needed to extract the other streams. Hence the channels depend on each other in the way that the fac is needed to demodulate the sdc and msc, and the sdc is needed to extract and present the main content from the msc (see Chapter 4).

The Modulation and Transmission
At this point the payload is ready for transmission; one frame is processed at a time, and here is how it is done. Each of the three channels is processed individually and conveyed on different sub-carriers in the ofdm link; they therefore have their own coding/modulation parameters. A general description of the process is given here.
The stream is randomised, in a predictable manner, to disperse the energy (corresponding to dedisperse in the block diagram) before the encoding is done. This is, in part, because the Forward Error Correction, fec, uses a serial bit-processing scheme that is more efficient when the bits alternate often (i.e. there are no long sequences of zeros or ones). Hereupon the channels are split into levels (recombine in the block diagram – p is the number of levels for a specific channel), and the fec is applied to each level individually (the block fec decode), to make the stream redundant. According to the selected code rate, a fraction of the output is discarded (the same number of zero-bits is stuffed back in again by the block reinflate), so the resulting bit-stream may still be recovered if a number of consecutive bits are incorrect. The symbols are interleaved, or permuted, to avoid the effects of time-fading channels (cancelled by the block deinterleave),

7. No label, just a coarse genre and language description


and the bits are now mapped to a Quadrature Amplitude Modulation, qam, alphabet (the block diagram counterpart is the qam demap block). The result of the above makes up a transmission frame; the three channels are then mapped to ofdm symbols, after which reference cells are added, and the symbols are broadcast (demultiplexing of the ofdm symbols is done by the ofdm demux block).
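The energy dispersal step described above can be sketched in C. The scrambling is a plain XOR with a pseudo-random bit sequence, so dedispersion is the very same operation. The generator polynomial x⁹ + x⁵ + 1 and the all-ones seed used below are our reading of the specification [13] and should be verified against it before use; the function names are ours.

```c
#include <assert.h>
#include <stdint.h>

/* 9-bit LFSR generating the dispersal PRBS.  The polynomial x^9 + x^5 + 1
 * and the all-ones reset state are assumptions to be checked against the
 * drm specification [13]. */
static uint16_t prbs_state;

static void prbs_reset(void)
{
    prbs_state = 0x1FF;                     /* nine ones */
}

static uint8_t prbs_next_bit(void)
{
    /* feedback taps: bit 8 (x^9) and bit 4 (x^5) */
    uint8_t fb = (uint8_t)(((prbs_state >> 8) ^ (prbs_state >> 4)) & 1u);
    prbs_state = (uint16_t)(((prbs_state << 1) | fb) & 0x1FF);
    return fb;
}

/* XOR a run of hard bits with the PRBS in place.  Applying the function
 * twice restores the original bits, so the same code serves as the
 * dedisperse block of Figure 1.2. */
void disperse(uint8_t *bits, int n)
{
    prbs_reset();
    for (int i = 0; i < n; i++)
        bits[i] ^= prbs_next_bit();
}
```

Because the generator is reset at the start of every run, transmitter and receiver need no shared state beyond the agreed polynomial and seed.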

The Remaining Blocks in the Block Diagram
The blocks that were not mentioned when the transmission was addressed above are the equalise block, which estimates and corrects the channel, and the synchronisation blocks, time synchr. and frame synchr., which are responsible for finding the beginnings of symbols and frames respectively. When a real signal is used as input (i.e. we emulate a superheterodyne radio), the intermediate frequency is found by the block named freq. synchr., and the signal is re-sampled to in-phase/quadrature by the resample block.

1.5

overview of the report

This thesis is written as a bottom-up walk-through of a decoder for the drm system. This is done to help the reader understand the flow of processes involved, from sampling, through demodulation, to decoding.8 However, brief descriptions of concepts may be needed, and are thus given, before they “belong”. In the first part of this report the theoretical foundations of the drm system are given, i.e. it is shown how the signal is received and extracted, through how it is decoded and what it contains, Chapters 2-4. After this, a design and architecture for such a system is proposed, Chapters 5-10. Next it is demonstrated that the implemented solution works, and different performance bounds are shown, Chapters 11-13. Lastly a summary of the results and a perspective on the developed receiver is given, Chapters 14 & 15.

8. Exactly the opposite of the drm system specification's top-down approach.



PART I THEORETICAL FOUNDATIONS OF THE DRM SYSTEM


Chapter 2

ORTHOGONAL FREQUENCY DIVISION MULTIPLEXING

2.1

introduction

ofdm is becoming increasingly popular for digital communication systems. It is used in everything from wireless internet1 to radio2 and television3, as well as in cable-based technologies, like internet subscriptions4 and cable tv5. The widespread use is well justified, because ofdm facilitates a highly flexible framework for transferring information. Basically, ofdm-based links allow trading efficiency for ruggedness against multipath propagation: an ofdm link can be designed to mitigate the effects of long delay spreads6 between signal paths at the sacrifice of channel efficiency. Being able to adjust the tolerance to multipath propagation is very desirable. Multipath arises from reflections of the signal. drm broadcasts are often carried on a low frequency, and the carrier wave may thus be reflected by both the ionosphere and/or the ground, a combination likely to cause multipath. Multipath propagation is also experienced by moving vehicles7 (e.g. cars, trains). When the link is able to cope well with high levels of multipath propagation, it allows broadcasting the same signal on a single frequency with several transmitters, which is compelling for broadcast networks. The other major advantage of ofdm, when combined with other technologies, is

1. ieee-802.11{a,g,n}, wimax, umts: e-utran
2. dab
3. Digital Video Broadcast – Terrestrial, dvb-t
4. Advanced Digital Subscriber Line, adsl
5. Digital Video Broadcast – Cable, dvb-c
6. Tolerance for multipath delay spread is related to symbol lengths; ofdm uses long symbol periods
7. Moving vehicles experience time-varying multipath



the ability to cope with narrow-band interference and frequency/time selective fading. This is again a trade-off between tolerance and efficiency (the subject of Chapter 3). The flexibility is what makes ofdm so important: links can be optimised to best suit their specific purpose and requirements. ofdm is the bread and butter of drm, and the inherent flexibility manifests itself in a number of predefined ofdm configurations, called robustness modes, that define the balance between ruggedness and capacity. The robustness mode can be changed by the station at any time.8 drm offers many ways of trading fidelity for ruggedness; more are discussed when we get to the coding (Chapter 3).

2.2

background

The ofdm link is used for conveying information in parallel, and as such it does little more than that. The concept of frequency division multiplexing schemes is to divide the available channel bandwidth into a large number of sub-bands, or sub-carriers, and modulate the data onto these. A technique very similar to modern ofdm already found its way into military radio modems, named “Kineplex” [3], back in the 1950s. Kineplex was composed of analog tuned circuits – and the complexity of the hardware (filters, oscillators and mixers) allowed for 24 channels (each of 300 baud) in a 6-foot rack. It seems Chang [5], back in 1966, was one of the first to formalise9 ofdm as it is known today. ofdm is an improvement on the earlier multi-carrier schemes in that the carriers have overlapping side-bands, making good use of the available bandwidth. For a thorough history of the evolution of ofdm, see Bahai et al. [3], Chapter 1.

Frequency Diversity
Channels that are subject to frequency selective fading, or narrow-band interference, may attenuate and distort certain sub-carriers as to make their throughput unacceptably low. Therefore ofdm links are used in combination with error coding techniques [34] to spread the information across the sub-carriers, thus taking advantage of the frequency diversity. Hence we shall use the term demultiplexing for extracting the ofdm information, and reserve the term demodulation for the process of re-combining the information in Chapter 3. When ofdm is used in combination

8. Most stations will probably find an optimal configuration for their transmitter and never change
9. Bell Labs even took up a patent (US patent 3488445)


[Figure 2.1: panel (a) shows a single-carrier transmission at 15 baud and n bps, with symbols s0 … s5 in succession; panel (b) shows a multi-carrier transmission at 2.5 baud but also n bps, with parallel carriers d0 … d5 over a period of ∆t = 400 ms.]

Figure 2.1: The difference between single- (a) and multi-carrier (b) links. Notice how a multi-carrier symbol has a much longer period than a single-carrier symbol (i.e. lower symbol rate), with comparable gross bit rates (n)

with channel coding techniques (which it usually is), it is sometimes denoted Coded Orthogonal Frequency Division Multiplexing, cofdm. There are often hundreds (drm: 88–460) of sub-carriers, and the vast majority are used to transfer the payload.10 Thus, when the data is error coded beforehand, a single sub-carrier may be less important, as the data it represents can be recovered. This is indeed the case when ofdm is used in combination with coding techniques such as forward error correction: several of the sub-carriers can then be lost, making the link tolerant of narrow-band interference and fading – as opposed to, say, a single-carrier link.

Multipath Delay Spread
The use of multiple carriers has another important effect, related to multipath delay spread tolerance. In single-carrier links, high bit-rates are achieved by increasing the symbol rate (the symbol period is lowered); in multi-carrier links, on the contrary, the data is aired in parallel at low symbol rates (Figure 2.1). The low symbol rate is an advantage if the symbol is received via different paths, with a relative delay of, say, τ. As long as the symbol period, Ts, is longer than the relative delay of the paths (i.e. Ts > τ), a fraction of the same symbol is received from both paths. However, if τ > Ts, then different symbols are received from the different paths, and Inter Symbol Interference, isi, occurs over the entire symbol period. When Ts > τ ∧ τ ≠ 0, there is still isi in a period of τ for each received symbol. The problem may be solved by extending the symbols with a cyclic prefix. This technique is addressed later on.

10. Others are used for reference and control cells


[Figure 2.2: six sub-carriers d1Ψ1 … d6Ψ6 drawn in frequency for the symbols sn, sn+1, sn+2 of frames Fm and Fm+1. Figure 2.3: the same six sub-carriers drawn in the time domain over the period Tu.]

Figure 2.2: Illustration of the sub-carriers constituting symbol sn, that is a part of frame Fm; the period between two consecutive symbols (e.g. sn and sn+1) is Ts = Tg + Tu

Figure 2.3: Orthogonal sub-carriers in the time domain; the symbol is the sum of all these. Note that the sub-carriers are modulated with an amplitude, but their phase is the same. The sub-carriers correspond to those of Figure 2.2

2.3

orthogonal carrier waves

So ofdm is merely a technique to convey information in parallel. It is, however, very efficient, because the carrier-waves, Ψ, are spaced with overlapping sidebands. The trick to having overlapping sidebands without interference from adjacent carrier-waves is to have all the carrier-waves form an orthogonal set. Hence when the demodulated sub-carriers are integrated, only their own contribution will count. Orthogonal sub-carriers with overlapping sidebands are depicted in Figure 2.2. The property of orthogonality is obtained by spacing the carrier-waves evenly, a separation of ∆ω = 2π/Tu apart, where Tu is the symbol period over which the receiver shall integrate the demodulated carriers. The carrier-waves that constitute an ofdm symbol then have the form (complex notation)

$$ \Psi_k(t) \equiv e^{jk\frac{2\pi}{T_u}t} \tag{2.1} $$

The consequence of this particular frequency separation is that every carrier has an integer number of periods within Tu. A few orthogonal carriers of this form are sketched in Figure 2.3, where the carrier-waves on the figure are modulated with different amplitudes.


When modulating digital information (which is discrete by nature) we seek to modulate a constant onto each sub-carrier for the duration of the symbol, i.e. a phase and an amplitude that can represent the discrete information, for instance using qam points. Mathematically we denote the unmodulated cell by d ∈ C. The cells are modulated onto carrier-waves to make actual sub-carriers, and the sub-carriers are summed to form a symbol

$$ s_n \equiv \sum_{k=0}^{N_s-1} d_k \Psi_k(n) \tag{2.2} $$

Ns is the total number of sub-carriers for a symbol, n is the sample number (i.e. the discrete time variable) and k is the carrier-wave index. That the carrier waves are indeed linearly independent over the period Tu can be seen in the following way. Imagine down-converting the i-th sub-carrier from s by mixing the symbol with the complex conjugate of the carrier-wave Ψi, and then integrating the demodulated signal over a period of Tu:

$$ \int_0^{T_u} \overline{\Psi_i(t)}\, s(t)\, dt \tag{2.3} $$

Expanding this using Equation 2.2 and Equation 2.1:

$$ \int_0^{T_u} e^{-ji\frac{2\pi}{T_u}t} \sum_{k=0}^{N_s-1} d_k \Psi_k(t)\, dt = \sum_{k=0}^{N_s-1} d_k \int_0^{T_u} e^{j2\pi t \frac{k-i}{T_u}}\, dt \tag{2.4} $$

And we will particularly note that the integral

$$ \int_0^{T_u} e^{j2\pi \frac{k-i}{T_u} t}\, dt = \begin{cases} 0 & k \neq i \\ T_u & k = i \end{cases} \qquad k, i \in \mathbb{Z} \tag{2.5} $$

which makes Equation 2.4 evaluate to d_i Tu. More intuitively it can be said that all other sub-carriers d_k, k ≠ i, have frequencies of 2π(k − i)/Tu when carrier i is down-mixed. This corresponds to an integer number of (k − i) cycles11 over an interval of Tu, except for the i-th sub-carrier, which is down-mixed and has no carrier wave left. This also accounts for how it is possible to retrieve the individual sub-carriers without any filtering of the down-mixed signal afterwards: the other sub-carriers in the symbol simply integrate to 0. Thus the carriers are orthogonal.

2.4

guard time configuration

Of course it seems most efficient to send ofdm symbols in immediate succession, so that no channel capacity is wasted – but then no multipath propagation can be tolerated.

11. beat tones



[Figure 2.4: panel (a) shows sub-carrier reception without a cyclic prefix, with Ψ1 delayed by τ relative to the integration interval of sn; panel (b) shows the same reception with a cyclic prefix of length Tg preceding the useful period Tu.]

Figure 2.4: (a) As an illustration of how a delayed sub-carrier can cause both ici and isi, here is a symbol consisting of only two sub-carriers: Ψ1, which is delayed by τ, and Ψ2, which is punctual. The carrier-wave of Ψ1 causes Inter Carrier Interference, ici, in sn due to the fact that ∫₀^Tu Ψ1 dt ≠ 0. The isi arises from the part of the delayed sub-carrier that is not zero in sn+1. (b) Here a cyclic prefix has been added in order to mitigate the effects of ici and isi.

In case of multipath the orthogonality is lost, because the latent carriers are no longer available throughout the integration interval. The effect is ici on the punctual carriers. The latent carrier now also arrives simultaneously with the consecutive symbol, resulting in isi. The effect of isi on the consecutive symbol could be neutralised by introducing a delay between the symbols, but that would not solve the ici. To mitigate the effects of both ici and isi, a cyclic prefix, called the guard interval, is added to every symbol, see Figure 2.4. By adjusting the length of this guard interval, Tg, a trade-off between channel efficiency and robustness to multipath is made. As long as Tu > Tg > τ, where τ is the relative delay between two receiving paths, there is neither ici nor isi. The cyclic prefix is (intuitively) constructed by copying the last part of the signal that makes up a symbol and prefixing the symbol with it. Mathematically we extend the symbol by Tg, delay the phase of all the sub-carriers by Tg (Equation 2.6), and send the resulting symbol for a duration of Tg + Tu instead of Tu.

$$ s_n = \sum_{k=0}^{N_s-1} e^{-j2\pi k \frac{T_g}{T_u}} d_k \Psi_k(n) \tag{2.6} $$
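In code, guard-interval insertion is just a copy. The sketch below (the names are ours) produces the Ng + Nu samples that are actually aired per symbol.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>

/* Guard-interval insertion: prefix the Nu useful samples with a copy of
 * their last Ng samples, so that any path delayed by less than the guard
 * time still contributes a full carrier period to the integration
 * interval. */
void add_cyclic_prefix(const float complex *useful,  /* Nu samples in       */
                       float complex *symbol,        /* Ng + Nu samples out */
                       int Nu, int Ng)
{
    memcpy(symbol, useful + (Nu - Ng), (size_t)Ng * sizeof *symbol);
    memcpy(symbol + Ng, useful, (size_t)Nu * sizeof *symbol);
}
```

The receiver reverses this implicitly: symbol timing (Section 2.6) locates the boundary, and the demultiplexer simply integrates over the last Nu samples.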

Table 2.1 shows the trade-offs that have been made in the drm standard to accommodate the challenges of the lw, mw and sw channels. These frequency bands are all prone to multipath propagation, since the wave may be reflected by the earth's surface as well as


Mode   Capacity loss     Efficiency   Tolerable propagation conditions
       Tg/Tu    ρr       η
A      1/9      1/20     83%          Gaussian ch., minor fading
B      1/4      1/6      62%          Time and freq. selective ch., increased delay spread
C      4/11     1/4      47%          As mode B, plus higher Doppler spread tolerance
D      11/14    1/3      14%          As mode B, plus heavy delay and Doppler spread
E†     1/9      1/16     83%          Time and freq. selective ch. (line-of-sight)

† Mode E is a recent addition to the standard, aiming at vhf bands, and we have ignored it.

Table 2.1: The channel profiles that the robustness modes of drm are engineered to cope with, as well as the efficiency, η, of the link. ρr is the density of scattered reference cells.

the ionosphere. The efficiency, η, in the table has been found as

$$ \eta = \left(1 - \frac{T_g}{T_u}\right)\left(1 - \rho_r\right)\alpha \tag{2.7} $$

where ρr is the density of scattered pilots, which are always present, and Tg/Tu indicates how big a fraction of the capacity is spent on guard intervals. The factor α ≈ 1 is due to some extra time reference cells present in the first symbol of a frame. This essentially gives the net capacity of the link. Considering the payload efficiency instead, the numbers would be even smaller, since the payload is redundant and contains a number of control channels that do not carry the content of interest.
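Equation 2.7 is easy to check against Table 2.1; the helper below (our naming) reproduces the tabulated efficiencies when α is taken as 1.

```c
#include <assert.h>
#include <math.h>

/* Net link efficiency per Equation 2.7: the capacity left after the guard
 * interval (ratio Tg/Tu) and the scattered pilots (density rho_r).
 * alpha ~ 1 accounts for the extra time references in the first symbol. */
double drm_efficiency(double tg_over_tu, double rho_r, double alpha)
{
    return (1.0 - tg_over_tu) * (1.0 - rho_r) * alpha;
}
```

Mode B, for instance: (1 − 1/4)(1 − 1/6) ≈ 0.625, i.e. the 62 % of Table 2.1.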

2.5

baseband extraction

Sampling and discretising real signals gives rise to a frequency ambiguity, as can be realised from the trigonometric identity

$$ \cos(\omega t) = \cos(-\omega t + n \cdot 2\pi), \quad n \in \mathbb{Z} \tag{2.8} $$

with the particular consequence that cos(ωt) = cos(−ωt). This ambiguity is resolved by sampling the quadrature phase of the signal as well (Equation 2.9). Thus a complex baseband signal can have a bandwidth of fs (i.e. twice the Nyquist interval).

$$ \cos(\omega t + n \cdot 2\pi) + j\sin(\omega t + n \cdot 2\pi), \quad n \in \mathbb{Z} \tag{2.9} $$

The frequencies are then folded (or wrapped) around in the spectrum at fs/2, and seem as if they are in the range f ∈ [−fs/2; 0[; hence they look like (and are often referred to as) negative frequencies.



[Figure 2.5: a 400 ms frame F1 containing the symbols s1 s2 … sk, with k ∈ {15, 15, 20, 24} for modes A, B, C & D; three frames F1 F2 F3 form a super-frame, and each symbol consists of a guard part Tg followed by a useful part Tu.]

Figure 2.5: The fundamental timing principles of the drm ofdm scheme; note that this is different for mode E in the new standard [14]. The symbol, s, is contained in the frame, F, which in turn constitutes a super-frame F.

[Figure 2.6: the received samples rn are multiplied by a conjugated copy delayed by Nu samples; the product is summed over Ng samples, the maximum of the magnitude gives the symbol start, n0, and the phase at the maximum gives the frequency offset, φ0.]

Figure 2.6: The concept of correlating the last part of a symbol to the beginning.

Handling Real Signals
An rf front-end that only outputs a real (i.e. envelope) signal can still be used, even though we assume a quadrature baseband signal. The rf front-end must then mix the signal to if, whereupon the if signal is sampled, thus emulating a superheterodyne receiver. The quadrature mixing can then be done in software afterwards. ofdm is very sensitive to the phase errors that arise from improper down-mixing or sampling (i.e. frequency offsets).

2.6

symbol synchronisation

ofdm links constitute a succession of ofdm symbols. A number of symbols make up a frame, in such a way that a frame marks the place where the periodic reference cells of the symbols start over again. The timing of drm is sketched in Figure 2.5. In order to extract the symbols, their boundaries have to be detected. The detection of symbol boundaries can be done by correlating the cyclic prefix to the end of the symbol using a sliding window, because the two signals are ideally the same (though one of them might be overlaid by multipath). The principle is shown in Figure 2.6.


[Figure 2.7: the real signal sn ∈ R is mixed with cos(ωif·tn) and j·sin(ωif·tn) to give the in-phase, R{xn}, and quadrature, I{xn}, parts of xn ∈ C; the cells X0, X1, …, XNu−1 ∈ C are then obtained by correlating xn against the conjugated carrier-waves over the Nu samples.]

Figure 2.7: A block diagram showing the basic principles of demultiplexing ofdm symbols

The timing offset does not have to be 100% accurate, but if a timing offset, τ, should occur, the carrier-waves will all experience a phase change, ∆φi,

$$ \Delta\varphi_i = \omega_i \tau \tag{2.10} $$

for the i-th extracted sub-carrier. This can also be realised by considering the number of cycles a carrier-wave is subject to: the more cycles per symbol, the narrower the period, and hence the larger the phase change.

Detecting Robustness Mode
The robustness mode can be found using the same correlation function as the symbol timing. The process involves comparing the outputs of the correlation function when it is configured with the different delays and integration intervals corresponding to the timing parameters of the individual modes.

2.7

sub-carrier extraction

The ofdm extraction process may be understood as a vectorised down-mixer, down-mixing every sub-carrier and integrating it to obtain the cell. Fortunately, we recognise Equation 2.2 as the Inverse Discrete Fourier Transform, idft (though missing a factor of 1/Tu, which has been discarded here for simplicity). The consequence is that the sub-carriers can be extracted by applying the discrete Fourier transform, for which there exists an efficient algorithm, namely the Fast Fourier Transform, fft.


Mode   Samples ⌈Tu · fs⌉   fft radix stages ⟨k, …, 1⟩   Sub-carriers per spectrum occupancy
       (fs = 48 kHz)                                    0     1     2     3     4     5
A      1152                ⟨4, 4, 4, 2, 3, 3⟩           104   116   204   228   412   460
B      1024                ⟨4, 4, 4, 4, 4⟩              92    104   182   206   366   410
C      704                 ⟨4, 4, 4, 11⟩                †     †     †     138   †     280
D      448                 ⟨4, 4, 4, 7⟩                 †     †     †     88    †     148

† Modes C and D are not defined for 9 kHz channels or half-channel modes (too small bit-rates).

Table 2.2: fft radix configurations and sub-carriers for the robustness modes of drm. This shows why fs = 48 kHz is an appropriate sample-rate for drm, because the fft stages can be done efficiently – the radix stages given here are from the kiss fft library's strategy. Refer to Figure 2.8 to see what the spectrum occupancies correspond to.

[Figure 2.8: the possible signal bandwidths relative to one am channel, ∆fc, where ∆fc ≡ 10 kHz for occupancies {1, 3, 5} and ∆fc ≡ 9 kHz for occupancies {0, 2, 4}; occupancies {0, 1}, {2, 3} and {4, 5} occupy successively wider portions of the channel.]

Figure 2.8: drm can be configured to fit the broadcast license of the station; the signal can even be simulcast with an am signal.

Hence the carrier extraction is done using the fft,

$$ d \propto \mathrm{FFT}\{s\}\,, \qquad N = T_u \cdot f_s \tag{2.11} $$

where Tu · fs is the number of samples per symbol. The number of sub-carriers in a drm ofdm symbol is related to two parameters: mainly the symbol period, Tu, which is defined by the robustness mode (Table 2.2), and the bandwidth of the signal (the station can occupy as wide a spectrum as fits its license, see Figure 2.8).

A Note About FFT Algorithms The most demanding algorithm in the ofdm processing is the extraction of sub-carriers; therefore a brief walk-through of some common fft properties follows here. The asymptotic worst-case run time of fft algorithms in general falls in the complexity class O(n log n), whereas a naïve dft calculation has a complexity of O(n²). fft algorithms are divide-and-conquer algorithms, i.e. they work by dividing the problem of taking an N-point dft into k smaller dft problems (stages), of taking


[Figure 2.9: signal flow of the inputs x(0) … x(7) through three stages of radix-2 butterflies with twiddle factors Ψ8^0, Ψ8^1, Ψ8^2, Ψ8^3, giving the bit-reversed outputs X(0), X(4), X(2), X(6), X(1), X(5), X(3), X(7).]

Figure 2.9: Signal flow graph, showing an 8-point fft calculation using three stages of radix-2 butterflies (decimation in frequency). At red lines the constant is subtracted instead of added

smaller Mj, j = 1, …, k, point dfts, where Mj < N. This is exemplified by the common radix-2 fft that is sketched in Figure 2.9. fft algorithms exploit that the so-called twiddle factors, denoted Ψk in Equation 2.2, show properties of symmetry (Equation 2.12) and periodicity (Equation 2.13). Thus, instead of rotating12 each sample with all the twiddles (as in the dft), a pipeline is made where some rotation is done at the end of each stage. The calculation, however more effective, yields the same result as the dft.

$$ \Psi_N^k \equiv e^{jk\frac{2\pi}{N}} = -\Psi_N^{k+N/2} \tag{2.12} $$

$$ \Psi_N^k = \Psi_N^{k+N} \tag{2.13} $$

The fact that the number of samples can be factorised by 4 (at fs = 48 kHz) in Table 2.2, for all drm modes, is hardly coincidental. The explanation is that fft algorithms that use the radix-4 butterfly are particularly efficient: many of the phase rotations become trivial (i.e. the rotation is 2π/4 = 90°, so it comes down to swapping and negating the imaginary and real parts). A radix-4 butterfly algorithm can be expressed as a linear transformation:

$$ \begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \end{bmatrix} \tag{2.14} $$

When these are combined into stages, the input x = [x(0) x(1) x(2) x(3)]ᵀ is multiplied by twiddle factors before each stage, as when combining radix-2 stages. This involves 12 complex additions and three complex multiplications (Ψ⁰_N = 1). By performing the additions in two steps, it is even possible to reduce the number of

12. A twiddle factor multiplied by a complex number rotates the complex number



Operation o : C² ↦ C    fft radix-4               fft radix-2
Multiplications         ⌈(3/8) N log₂(N − 2)⌉     ⌈(1/2) N log₂(N)⌉
Additions               ⌈N log₂(N)⌉               ⌈N log₂(N)⌉

Table 2.3: Number of arithmetic operations involved in typical radix-2 and radix-4 fft algorithms.

complex additions to 8. This can be shown by expressing the matrix as a product of two matrices, and evaluating them from right to left:

$$ \begin{bmatrix} X(0) \\ X(1) \\ X(2) \\ X(3) \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & -j \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & j \end{bmatrix} \underbrace{ \begin{bmatrix} 1 & 0 & 1 & 0 \\ 1 & 0 & -1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 1 & 0 & -1 \end{bmatrix} \begin{bmatrix} x(0) \\ x(1) \\ x(2) \\ x(3) \end{bmatrix} }_{\text{first evaluate this to a column vector}} \tag{2.15} $$

The number of arithmetic operations involved in a radix-2-only versus a radix-4-only algorithm is outlined in Table 2.3. Whether the fft is implemented in hardware (e.g. an fpga) or, as in this case, in software, the multiplications will often take longer13 than the additions, and the addition time often becomes negligible. fft transformations can be done much more efficiently, both in terms of power and time, by using a dedicated hardware component or a dsp that performs multiply-and-accumulate instructions very efficiently. However, this project is targeted at mcus. Taking for example drm mode B, which can be done using either radix-2-only or radix-4-only stages, the number of operations required is shown in Table 2.4 for the different algorithms. As we use fixed point to represent numbers, we note that another advantage of radix-4 algorithms is that each stage can do its scaling by shifting the result one bit to the right, since a usual (bi-directional) per-stage scaling factor is 1/√N = 1/√4 = 1/2.

2.8  frame synchronisation

Since each ofdm symbol must be interpreted differently, both in terms of reference and data cells, a way of enumerating symbols is needed.

13. In a few risc processors, multiply instructions are single-cycle. However, arm9 takes 4 cycles to multiply two 32-bit integers.



dft algorithm    Multiplications    Additions    Speedup mult.†    Speedup add.†
fft radix-4      3,839              10,240       273               102
fft radix-2      5,120              10,240       204               102
Naïve dft        1,048,576          1,047,552    –                 –

† against the naïve dft

[Figure 2.10: the symbol–carrier grid of a drm mode B transmission, time t versus frequency f, showing symbols s0–s15 with the reference-cell positions (including Pilot 1) marked.]

Table 2.4: A comparison of the number of complex multiplications and additions, respectively, involved in extracting a symbol from drm mode B using different dft algorithms. It is seen that the speedup is two orders of magnitude.

Figure 2.10: The reference cells involved in a drm mode B transmission: scattered pilots, continual pilots, and time references. Note that the continual pilots double as time references (i.e. they are consistent).

In drm, the first symbol of a frame (i.e. s0 ) carries a series of so called time reference cells that can be recognised by the receiver, see Figure 2.10. The correlation of a symbol with the known time reference cells (or some of them [16]) can then be done by the receiver, to identify the first symbol.

2.9  equalising

ofdm links may indeed be preferable to single-carrier links in terms of equalisation. Where single-carrier links crucially rely on complicated equalisers [26, page 49], multi-carrier links consist of sub-channels that are so narrow that they can be equalised by multiplication by a complex constant.

ofdm links can even be designed to not use equalising at all. This is the case with dab radio, which uses a technique known as differential detection. In differential detection the sub-carriers are not coded with absolute values, but rather with differences, so the demodulator (e.g. a dqpsk-demapper) must cope with relative positions instead of absolute positions in the modulation constellations.

Other ofdm links, like drm, use a technique known as coherent detection. In coherent detection, a fraction of the many sub-carriers are assigned to known references. These known reference tones are scattered throughout the frequency spectrum at even spacing. They can then be used to measure the channel response at particular frequencies. When the channel response for every scattered pilot is known, interpolation can be used to estimate the entire channel envelope, in a process known as channel estimation.

Channel Estimation

Channel estimation is used to estimate a transfer function, H(n), for the channel. The estimate is based on the a priori known scattered pilots. The transfer function for the transmission medium is the relationship between the output (i.e. the received sub-carrier), d, and the input (i.e. the transmitted) pilot tone, ℵ_n^(s). H(n) then becomes

    H(n) = d_{κ_n^(s)} / ℵ_n^(s)    for n ∈ S
    H(n) = undefined                for n ∉ S        (2.16)

where S spans the set of all scattered references for the current symbol, and κ_n^(s) is the position (i.e. carrier-index) of the nth observed scattered pilot. These positions, for drm robustness mode B, can be seen in Figure 2.10.

H(n) now constitutes the channel's effect on certain sub-carriers. To get an estimate for the carriers at positions n ∉ S, we interpolate between the known values. However, some of the observed scattered pilots may be distorted by noise and interference. If trivial interpolation is used, the noise fluctuations will influence the channel estimate. Therefore the statistically optimal solution is to use an adaptive filter such as a Wiener filter [26]. Other interpolation methods include un-filtered trivial interpolation, such as zero-padding [30], linear interpolation [29], Forsythe polynomials [29], or simply taking the nearest adjacent estimate.
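As an illustration of the trivial (un-filtered) approach, a linear inter-carrier interpolator might look like the sketch below. A real receiver would interpolate the complex-valued H(n), and preferably Wiener-filter it; real values are used here for brevity:

```c
#include <assert.h>
#include <math.h>

/* Linear interpolation of a channel estimate between scattered-pilot
 * positions. pos[0..np-1] are the (sorted) pilot carrier indices and
 * h[0..np-1] the estimates at those indices; gaps are filled linearly
 * and the edges are held flat. Illustrative helper, not project code. */
static void interp_channel(const int *pos, const double *h, int np,
                           double *out, int n)
{
    for (int k = 0; k + 1 < np; k++) {
        int n0 = pos[k], n1 = pos[k + 1];
        for (int i = n0; i <= n1; i++) {
            double t = (double)(i - n0) / (n1 - n0);
            out[i] = (1.0 - t) * h[k] + t * h[k + 1];
        }
    }
    for (int i = 0; i < pos[0]; i++)
        out[i] = h[0];                    /* hold left edge */
    for (int i = pos[np - 1] + 1; i < n; i++)
        out[i] = h[np - 1];               /* hold right edge */
}
```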


[Figure 2.11: a surface plot and a heat map of the phase change φ over symbol s (0–14) and sub-carrier d (−103 to 103).]

Figure 2.11: A channel envelope that distorts the phase by a frequency-dependent amount φ. This figure is compiled from the Voice of Russia signal, which we will return to later.

Estimation in 1D vs. 2D

The effect of multiple paths may result in either destructive or constructive interference at any particular sub-carrier. This phenomenon is known as frequency selective fading. An example of a frequency selective channel can be seen in Figure 2.11. In a moving vehicle such as a car, the reception paths will change over time, and hence the fading will change over time. This is time selective fading. If the time selective fading is not too severe, one may choose to interpolate the transfer function in 2 dimensions (i.e. save the estimates from different symbols and interpolate between these – we will call this approach inter-symbol-interpolation or 2D interpolation). Otherwise the interpolation can be based solely on the current symbol. In that case we only use inter-carrier-interpolation and call it 1D interpolation.


The nature of the scattered pilot pattern in drm, where the reference cells are displaced along both axes (Figure 2.10), invites interpolation in both the frequency (inter-carrier) and time (inter-symbol) directions. This way, more sub-carriers are exercised, because the scattered pilot positions alternate between symbols.

2.10  recognising drm spectrum

In drm, three continual reference tones, the continual pilots, are assigned to the sub-carriers present at frequencies 750 Hz, 2250 Hz and 3000 Hz; the first of these, Pilot 1, can be seen in Figure 2.10. These are placed in accordance with the guard time duration, so that they will have an integer number of periods in a duration of Tg + Tu, with the practical implication that their phase remains constant for every symbol (except for odd symbols in mode D). As an example, in mode B, 750 Hz is sub-carrier 16, and the guard time Tg is (1/4)·Tu. The 16th sub-carrier then has 16 periods in Tu, and it will have 16·(5/4) = 20 periods throughout the whole symbol duration, Ts, and thus the phase remains consistent for all symbols. Again we are confirmed that fs = 48 kHz is an appropriate frequency, because 48000 Hz / 750 Hz = 64 is an integer number of samples per period.

All pilot tones (both continual and scattered) are boosted in amplitude, but only the continual pilots remain at fixed sub-carrier positions – the scattered pilots alternate positions, so their average energy per sub-carrier will never be as high as that of the continual pilots. The continual reference tones make it trivial to identify a drm spectrum. The process involves transforming a signal into the frequency domain and comparing the amplitudes at these distinct frequencies (which are invariant for all robustness modes).
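The bin bookkeeping for this check is simple; a small sketch, assuming fs = 48 kHz and a 1024-point fft as in mode B (the function name is ours):

```c
#include <assert.h>

/* With fs = 48 kHz and a 1024-point fft over the useful part, the
 * three continual pilots (750, 2250 and 3000 Hz) fall on exact bins,
 * so recognising a drm spectrum reduces to comparing the magnitudes
 * of three fixed bins against the surrounding average. */
static int pilot_bin(int freq_hz, int fs_hz, int nfft)
{
    /* exact division for these pilot frequencies */
    return freq_hz * nfft / fs_hz;
}
```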

2.11  summary

In this chapter we have examined the advantages and the disadvantages of ofdm links for radio broadcasting. To summarise the main points, ofdm can provide good properties in terms of multipath reception, which is particularly useful for low-frequency broadcasts, due to the special wave characteristics. It has also been noted that ofdm by itself is not a very useful link, because a number of the sub-carriers that make up the link may be distorted by interference or attenuated by fading – e.g. from destructive interference between multiple paths. To put the c in cofdm and make it error-tolerant, error coding is used, as described in Chapter 3.



Chapter 3

CHANNEL DECODING

3.1  introduction

In this chapter we define and discuss the aspects of coding and decoding channel content, with a special eye on how it is done in drm. But before these specifics, a few definitions may be useful. Vidkjær [36] defines modulation as “the process of transforming a baseband message to a form suitable for transmission through the channel in consideration”. In the context of drm this is the process of mapping a coded source bit-stream into a set of cells suitable for ofdm transmission. Conversely, demodulation is the process of recovering an encoded bit stream from a set of ofdm cells. The term encoding is used to describe the process of translating a stream of source bits¹ into a stream of encoded bits. These encoded bits contain redundancy information such that appropriate decoding can recover the original bit pattern, even under circumstances where errors occur.

3.2  overview of the encoding and decoding process

In this section, the processes required for demodulating and decoding a drm signal are outlined. The contents of the different logical channels are also briefly touched upon. The outline is limited to the extent that is necessary to comprehend the later detailed description.

1. What these bits represent is not important at this level of abstraction.



[Block diagram: the fac carries the modulation configuration, time/date, service/station names, genre and language; the sdc carries payload protection, coding configuration and codec parameters; the msc carries the audio feeds through source coding (codec). The three channels pass through the dispersion generator to become fac′, sdc′ and msc′.]

Figure 3.1: The content of the streams that are contained in the drm multiplex.

A good starting point is to look at the data to be sent and the configurations that the station can set up. A block diagram of the source data is shown in Figure 3.1. At the heart of this source data we find the audio feeds. These are naturally essential for a radio station. The feeds are coded by an appropriate codec. What must be noted at this point is which data is multiplexed into each of the logical channels (fac, sdc and msc), and what the dependency relationships between them are. For example, the modulation configuration in the fac channel dictates which constellations shall be used for the decoding of the sdc and msc channels. Hence the fac is always encoded and modulated in the same way (qam-4, one level) so it can be received with relative ease. We can say the fac is the outer layer of an onion that has to be peeled away to reveal what is below, if we are using an analogy where the contents are in the core of the onion. The fac, then, reveals how the sdc and msc streams are modulated. The sdc in turn contains a description of how the individual streams in the msc are multiplexed (Figure 3.1 suggests that there are two audio streams), i.e. how much of each stream's data is carried in the msc's Higher Protected Part, hpp, and how much of the data goes in the Lower Protected Part, lpp. How well these parts are protected is determined by the coding configuration, which is also present in the sdc logical channel. The last step in the process is an exclusive-or gate, used for so-called energy dispersal; the dispersal generator block on the diagram generates a predictable pseudo-random binary stream. This is to ensure an even dispersion of energy during transmission.

Now that the source bit stream has been presented, we can focus on the encoding and modulation that it is subject to. Figure 3.2 depicts the processes involved.


[Block diagram: a logical channel C_i (i enumerates fac, sdc, msc) is split into l levels according to the modulation/coding configuration; each level p carries a chunk of |C_{p,i}| bits through the fec (yielding 4|C_{p,i}| bits), through puncturing with the pattern for level p (leaving |C_{p,i}|/r_p bits), and finally through the interleaver.]

Figure 3.2: The coding and interleaving process seen from the broadcaster's point of view; the input to this diagram is the output of Figure 3.1.

The input to the encoder is one of the three logical channels, C_i, which is split into l different levels. The number of levels is predetermined; in drm it is given by the modulation configuration. For each consecutive level, a chunk of size |C_{p,i}| is encoded by a convolutional code. The output chunk has a length that is 4 times larger – all levels and logical channels use the same fec with the same mother code. The encoded output is then punctured: some bits are dropped in order to reach a higher bit-rate (note the trade-off between redundancy and bit-rate). This process is known as puncturing, and the selection of bits to be dropped is handled by a puncturing pattern that is determined by the protection level chosen by the broadcaster. Once punctured, the bits are permuted by the interleaver. This is to ensure resilience against narrow-band interference manifesting itself as noise bursts affecting several consecutive carriers across the channel spectrum. Finally the bits are mapped to a modulation constellation and passed on to the ofdm generator. The combined process is called multilevel coding and was first proposed (in a slightly different form) by Imai and Hirakawa [21]. The incarnation used in drm is a component code based on punctured convolutional codes (sometimes called perforated convolutional codes).

The process of decoding is somewhat similar to the above description in reversed order. Though the same broad outline of steps can be used, some of the stages become more advanced to fully utilise the structure.


3.3  multilevel coding

The concept of splitting cell values into levels is to differentiate the code rate, to better protect small value changes than large ones. The cell's amplitude and phase are first divided into coarse levels that are coded with a small amount of redundancy (i.e. we are more confident of the quadrant than of the actual phase). The next levels map to finer divisions and are coded with more redundancy, and so forth. The term level is used as a logical construct that forms a part of the data represented by each cell.

There are a number of different ways to decode a multilevel code. Martin and Taylor [25] propose an iterative approach where the recovered bits of one level are used to deduce the reliability of the consecutive levels; thus the decoder can exclude certain cell values – gaining higher confidence in the remaining values. The statistically best results are obtained from utilising an iterative multilevel decoding approach [38]. However, we note here that it is indeed possible to simply decode each level individually, resulting in a less reliable decoding. Only the trivial approach is discussed further on, in the name of simplicity.

The number of levels, l, in the multilevel coding scheme employed by drm is defined by the modulation constellation². Every bit position, in the demapping process, is associated with a certain level, p ≤ l. Each level, in turn, is coded separately with a different code rate³, generally with the first levels being better protected, through lower code rates, than the last ones. For the purpose of generalisation, the fac can be considered a special case of multilevel coding where there is only one level.

3.4  modulation

qam maps a bit stream to a set of complex values, arranged in a so-called qam constellation. The points that form these constellations can be arranged in different ways. In the following we will only discuss equispaced, symmetrical constellations, as used in drm. Furthermore we assume that all bit values are equally likely to occur⁴ and independent.

Pulse Amplitude Modulation, pam, can be considered a special, phase-independent case of qam. pam-2 has two constellation points, so a pam-2 modulator can modulate two states (i.e. a single bit) to the constellation points −1 or 1⁵. This could scale to pam-4 or pam-8 by simply adding more discrete values (see Figure 3.3).

2. A modulation constellation is chosen for each of the individual logical channels: fac, sdc and msc.
3. Note that code rate is the reciprocal of redundancy, so a low code rate gives high redundancy.
4. The dispersion generator ensures this.

[Figure 3.3: (a) a pam-2 constellation with points −1a and 1a on the real axis; (b) a pam-4 constellation with points −3a, −1a, 1a and 3a.]

Figure 3.3: Different pam systems. (a) A pam-2 system. One bit can be represented per value. (b) A pam-4 system can represent two bits per value.

Note that the constellation points in Figure 3.3 are normalised with a scaling factor a (referred to as the normalisation factor). The scaling produces an average power of one, so the full dynamic range of the transmitter is utilised and different constellations are transmitted with the same average power. Clearly, the less spacing between constellation points, the more susceptible to noise the demodulation of each point will be. Hence a constellation with 4 states will yield a better Signal to Noise Ratio, snr, than say a constellation with 16 states. The thin grey circle in Figure 3.3 marks the amplitude of the average energy in each of the constellations. The normalisation factor is defined so the average energy's magnitude is 1.

Formally, the alphabet of a given pam constellation, Q, can be defined as

    α_Q = { (2m − 1 − M)a ∣ m ∈ {1, 2, ..., M} }        (3.1)

where 2a is the distance between adjacent points and M is the number of points in the constellation Q [32]. Note that this can trivially be extended to a given qam constellation by adding a second dimension.

Think of an M-pam constellation, and a signal where all the M constellation points are uniformly distributed. This signal will have the energy E [32]. The normalisation factor, a, can now be found as

    a = (√(E_Q / M))⁻¹ = (√((1/M) ∑_{m=1}^{M} x_m²))⁻¹        (3.2)

where x_m is the amplitude of constellation point m in the un-scaled alphabet.

5. −1 and 1 are normalised to the output power of the transmitter (or input power of the receiver).



This calls for an example; take for instance the pam-4 constellation given in Figure 3.3b, which has points at the un-scaled amplitudes

    α_pam-4 = {−3, −1, 1, 3}

This constellation has the energy

    E_pam-4 = 2(3² + 1) = 20

Hence the normalisation factor is

    a_pam-4 = 1/√5

In general, an M-ary constellation can be used to transfer

    |b| = log₂ M        (3.3)

bits per cell. Naturally, the higher this is, the higher bit-rates are achievable without an increase in bandwidth – though upper-bounded by Shannon. By extending pam and considering the phase of the signal as well, the bit-rate can be doubled, or the same bit-rate can be achieved more robustly. Formally, the alphabet changes from a vector to a matrix of discrete points in the two-dimensional modulation plane, but little else changes; that is, Equation 3.1 becomes two-dimensional (ℝ²). One way to combine phase and amplitude in a constellation is Phase Shift Keying, psk, or a combined pam-psk solution.

Another important concept when discussing modulation constellations is the minimum Euclidean distance, d_min^Q, for constellation Q. Clearly, the smaller d_min^Q, the more error-prone the constellation becomes, i.e. smaller noise bursts are able to offset cells to wrong points.

The value of d_min^Q depends on the geometry of constellation Q and can be calculated as

    d_min^Q = min_{x₁,x₂ ∈ α_Q, x₁ ≠ x₂} ∥x₁ − x₂∥₂        (3.4)

d_min^Q is of course dependent on the scaling factor and is an expression of how well-spaced the constellation is.

While psk and pam-psk variations offer more information than pam, they can be cumbersome to demodulate. Using rectangular qam for modulation allows the demodulation to be divided into two rounds of pam demodulation. All constellations in drm are rectangular qam constellations, which allows us to use this trick.
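A sketch of the two-round trick for qam-16; the bit mapping here is illustrative, not the actual drm mapping:

```c
#include <assert.h>
#include <math.h>

/* Hard decision for one pam-4 axis of a qam-16 cell. Returns the
 * point index 0..3 for -3a, -1a, 1a, 3a, using the qam-16 scaling
 * a = 1/sqrt(10) from Table 3.1. */
static int pam4_decide(double v)
{
    double a = 1.0 / sqrt(10.0);
    if (v < -2.0 * a) return 0;
    if (v <  0.0)     return 1;
    if (v <  2.0 * a) return 2;
    return 3;
}

/* Because the constellation is rectangular, a qam-16 cell demaps as
 * two independent pam-4 decisions, one per axis. The 4-bit packing
 * below is arbitrary (our choice), not the drm bit mapping. */
static int qam16_demap(double re, double im)
{
    return (pam4_decide(re) << 2) | pam4_decide(im);
}
```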


[Figure 3.4: (a) a psk-4 constellation; (b) a pam-psk constellation with 8 points at two amplitudes.]

Figure 3.4: Phase shift keying constellations. The dashed grey lines represent the phase of the signal and are meant merely as a guide. (a) A psk-4 constellation. (b) A pam-psk constellation with 8 points.

Constellation    Distance, d_min^Q    Scaling, a
pam-2            2a                   1
pam-4            2a                   1/√5
psk-4            a√2                  1
pam-psk-8        2a                   1/√5.5
qam-4            2a                   1/√2
qam-16           2a                   1/√10
qam-64           2a                   1/√42

Table 3.1: Minimum Euclidean distance and normalisation factors for the demonstrated constellations.

Modulation in drm

drm control and data cells are modulated with one of three qam constellations. The simple and robust control channel, fac, is always modulated with qam-4⁶. qam-4 is also known as Quadrature Phase Shift Keying, qpsk, since it can be thought of as a psk-4 that is rotated π/4. The other control channel, sdc, may be modulated either with qam-4 or qam-16 – which one is signalled in the fac. See Figure 3.5a and Figure 3.5b for the constellations of these modulations. For the source-carrying main service channel (msc), where higher bit-rates are

6. This is considered a special case of the Standard Mapping, sm.



required, drm supplements the already discussed qam-16 with a qam-64 constellation. Thus each modulation symbol represents up to 6 (coded) bits.

[Figure 3.5: four constellation diagrams: (a) qam-4 (points ±1a), (b) qam-16 (points ±1a, ±3a), and (c), (d) qam-64 (points ±1a, ±3a, ±5a, ±7a).]

Figure 3.5: qam constellations used in drm. Average energy is always 1 because of normalisation.

Different msc content may be split up into several streams. If one stream is deemed particularly important, and does not require a large bit-rate, hierarchical modulation is an option. Hierarchical modulation is a technique that ensures higher robustness of a single stream, by utilising geometrical properties of the qam constellation. An example of a stream that may fit the criteria for using hierarchical modulation could be news updates, since human speech can be compressed quite a lot [18]. Hierarchical modulation maps the bits that must be extraordinarily protected to those bit positions, in the qam constellation, that define the quadrant in the complex plane (i.e. they effectively become qam-4 or pam-2 protected). In drm, the bits that comprise the hierarchically modulated stream are referred to as the Very Strongly Protected Part, vspp.


For this purpose the drm specification defines three different bit-mappings for the qam-64 constellation. The regular mapping for non-hierarchical streams is qam-64 standard mapping, sm. Then there is one for hierarchical streams where both the real and imaginary parts of the signal carry the hierarchical stream; this is called qam-64 Hierarchical Modulation – Symmetrical, hmsym – note that this emulates a qam-4 modulation. Finally, there is qam-64 Hierarchical Modulation – Mixed, hmmix, which carries the vspp entirely in the real part – this emulates a pam-2 modulation. Thus the bits used for the hierarchical part of the signal are protected better than the remaining bits of the msc, since it is unlikely, though not impossible, for noise to shift the signal into another quadrant altogether.

Naturally, the constellations treated here only apply to data and control cells. The pilot cells discussed in Chapter 2 do not fit into these constellations.

Demodulation to Soft Bits

For interpretation of the qam values, it is helpful to have a metric that expresses the confidence of a particular value (soft decision decoding). See Section 3.7 for how the metric is used by the decoder to find the most likely value. Here it suffices to say that for each received cell a number of confidence values are extracted, based on the distance between the optimal constellation point and the received constellation point. We will refer to these reliability estimates as “soft bits”, λ.

Each of the three available constellations uses a certain number of levels, defined by the coding parameter and the constellation type. The qam-4 constellation contains only one level and is considered a special case of the rest of the qam constellations. qam-16 offers two levels, each representing two bits per cell, and qam-64 contains three levels, each comprising two bits for every cell (the real and imaginary parts respectively⁷).
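A minimal sketch of such a metric for the quadrant (sign) level of one dimension of a rectangular constellation; the clipping range q and the function itself are our illustration, not the project's actual demapper:

```c
#include <assert.h>
#include <math.h>

/* Soft bit for the quadrant level of one axis: the received component
 * is already a distance from the decision boundary (0), so its sign is
 * the hard decision and its magnitude the confidence. Clipping to
 * [-q, q] bounds the metric, matching the range of lambda. */
static double soft_bit(double v, double q)
{
    if (v >  q) return  q;
    if (v < -q) return -q;
    return v;
}
```

Higher levels would measure the distance to their own (inner) decision boundaries instead of to 0.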

3.5  interleaving

During transmission over a wireless channel, fading of the signal may occur. Fading attenuates the signal, and may arise due to multipath effects. A number of consecutive cells might be lost as a result of frequency selective fading. This will result in bursts of errors during decoding, where a large number of soft bits have a low confidence, or are entirely wrong. Of course the receiver does not know if the values are inaccurate, and it follows directly from this that the end result may be wrong as a consequence of these inaccuracies.

7. Never a rule without an exception: hmsym is equivalent to 6 levels with 1 bit each. This is to reduce the bit rate of the hierarchically modulated signal, thereby gaining higher throughput for the remaining multiplex streams.


In order to rectify the problem of error-bursts in the frequency domain, the bit-stream of each level is permuted before broadcast. This ensures that even in the face of deep frequency selective fading there are some correct values for the decoder to work with (this will be expanded upon later). There is, however, still the problem of time selective fading. To solve this problem the drm system offers interleaving of data cells after modulation. We will return to this later.

Soft Bit Deinterleaving

The drm specification defines an algorithm to interleave the bits in a predictable manner. For each level, p, in each of the three channels, the soft bits are permuted according to the table Π. This table is built dynamically using the input (soft) bit size and a predefined randomisation factor that depends on the level and the modulation scheme. The algorithm then generates a pseudo-random list of indexes using multiplication and modular arithmetic. For further details of the interleaving algorithm, we refer the reader to the specification [13, section 7.3.3]. For each channel, the bit-stream of each level is interleaved according to

    y_{p,i} = v_{p,Π_i}        (3.5)

where y_{p,i} is the output stream on level p, i is the bit-index ranging over all bits in the level (2N for most configurations), v is the input stream and Π_i is the value in the permutation table at position i. This stream is then transmitted. Reversing this effect is straightforward and follows directly from Equation 3.5, since

    y_{p,i} = v_{p,Π_i}  ⟹  v_{p,Π_i} = y_{p,i}        (3.6)

That is: de-interleaving the received sequence of soft bits is done with the same permutation table, now only reversing the use of the table.
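Given a permutation table Π, Equations 3.5 and 3.6 translate directly to code; the construction of the table itself is omitted here (see [13, section 7.3.3]):

```c
#include <assert.h>

/* Interleave: y[i] = v[P[i]] (Eq. 3.5). */
static void interleave(const double *v, double *y, const int *P, int n)
{
    for (int i = 0; i < n; i++)
        y[i] = v[P[i]];
}

/* Deinterleave with the same table, use reversed: v[P[i]] = y[i]
 * (Eq. 3.6). */
static void deinterleave(const double *y, double *v, const int *P, int n)
{
    for (int i = 0; i < n; i++)
        v[P[i]] = y[i];
}
```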

Cell Deinterleaving

Cell interleaving is only an option for the msc, as both the fac and sdc must be available immediately in order to configure the decoder⁸. There are two settings for cell interleaving (called interleaver depths): long or short. The short interleaver offers enough time- and frequency-diversity to protect against channels with moderate time-selective behaviour (often lw or mw), according to the drm standard. For short interleaving, a single multiplex frame is interleaved according to the same algorithm as described above, though naturally with a different

8. Any delay in the decoding path of fac and sdc would result in an even larger overall delay before the main service could be decoded. This is undesired.



randomisation factor and input length. Short interleaving delays the processing of the signal by about 800 ms = 2 ⋅ 400 ms, since the entire multiplex frame must be received before interleaving and deinterleaving, respectively, can begin.

If the channel is severely time- and frequency-selective (sw channels), the standard offers another interleaving method. This method interleaves the cells across a consecutive number (D = 5) of multiplex frames. The output frame is then a combination of data from each of the 5 input frames. The first frames are used to fill the pipeline in both the transmitter and receiver. Once the pipeline is full, the content of multiplex frame n is governed by

    ẑ_{n,i} = z_{n−Γ(i),Π(i)}        (3.7)

where ẑ_n is the output multiplex frame and

    Γ(i) = i mod D,    0 ≤ i < N        (3.8)

where N is the number of available ofdm cells in the multiplex frame. This introduces a delay of 5 multiplex frames before the pipeline is full. Consequently, output can be generated at the receiver side after 6 multiplex frames (2.4 s).

3.6  puncturing

Many of the discussed techniques rely on the ability to code content with different code rates. drm offers 13 different code rates for individual levels. For each 64-point modulation (sm, hmsym, hmmix) there are four different codings (for the hierarchical part there are also four codings). This is highly flexible, and the configuration can be made at the discretion of the station. Overall code rates range from as low as 0.45 up to 0.78. The station may optimise the code rates based on the content type of the transmission, the available bandwidth and the desired range of the broadcast.

Achieving a high number of different code rates could be done by implementing a large number of fec coders. drm instead uses a punctured code to derive these code rates from a single mother code. The mother code in drm yields a 1/4 code rate⁹. This effectively means that every input bit results in four output bits¹⁰. We call this a codeword. The punctured stream of codewords is derived from the stream of original codewords simply by expunging some bits. Which bits to remove is defined by a puncturing pattern described, for each code rate, in the drm specification [13, Table 60]. For instance, an 8/11 code rate may be obtained from the stream of 1/4 codewords simply by decimating all but 11 of the 32 bits in eight codewords (i.e. deleting 21 bits).

9. 1/6 is used in version 3 of the standard (drm+), to accommodate the new mode E.
10. The number of output bits influenced by an input bit is 7 times as much, as the constraint length is 7. See Section 3.7.


Which exact bits are deleted is defined by the puncturing pattern given for the code rate. E.g. the drm puncturing pattern for 8/11 is given in Table 3.2.

Bit    w0   w1   w2   w3   w4   w5   w6   w7
0      1    1    1    1    1    1    1    1
1      1    0    0    1    0    0    1    0
2      0    0    0    0    0    0    0    0
3      0    0    0    0    0    0    0    0

Table 3.2: Example of a puncturing pattern, in this case to achieve an 8/11 (0.72) code rate. w_x means input word number x; 1 means the bit is kept, 0 means it is deleted. The first bit of every input codeword is kept, and the second bit of some of the input words is kept. The rest are removed.

The output order of the bits, after the puncturing, is always word-wise. That is: all that remains of w_i, followed by all that remains of w_{i+1}, where 0 ≤ i < k and k is the numerator of the code rate achieved by the puncturing.
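Applying the pattern of Table 3.2 can be sketched as follows (our helper; bits are kept as ints for clarity):

```c
#include <assert.h>

/* The 8/11 pattern of Table 3.2, stored per input word: element
 * [w][b] is 1 if bit b of codeword w (within each group of eight
 * codewords) is kept. */
static const int PAT_8_11[8][4] = {
    {1,1,0,0}, {1,0,0,0}, {1,0,0,0}, {1,1,0,0},
    {1,0,0,0}, {1,0,0,0}, {1,1,0,0}, {1,0,0,0},
};

/* Puncture a stream of 4-bit codewords word-wise: emit the surviving
 * bits of w0, then of w1, and so on. Returns the number of bits kept
 * (11 per 8 input words). */
static int puncture_8_11(const int *in, int n_words, int *out)
{
    int k = 0;
    for (int w = 0; w < n_words; w++)
        for (int b = 0; b < 4; b++)
            if (PAT_8_11[w % 8][b])
                out[k++] = in[4 * w + b];
    return k;
}
```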

Depuncturing In order to reverse the effects of puncturing, we must for a moment consider how the decoder works, without being overly specific. The soft bits, λ, derived during demodulation, lie in the range of λ ∈ [−q, q] ⊂ R. The decoding process can be thought of as finding the most likely path through a sequence of soft bits. Each of the soft bits contribute in their respective direction towards either 0 or 1, depending on the the specifics of the decoder and the value of the given soft bit. For reasons not yet apparent (discussed on p. 45), inserting 0 (to restore an erasure) somewhere in the soft bit stream contributes equally toward both 0 and 1. This means that it will not alter the probability of either outcome. The tail bits in drm is punctured differently. To explain the concept of tail bits we must, again, briefly mention the encoder (and decoder). The sequence of bits that are encoded can start and end with a random string of bits, the encoder starts in a known state (all 0) and ends in this same state. This is to drive the decoder to and from a known state. For the initial state the entire encoder shift register is simply initialised to 0. For the final state (K − 1) zero-bits11 is fed to the encoder. The reason for puncturing the tail bits is to provide conformance between the number of ofdm cells available, and the number of bits resulting from encoding the input stream with the chosen code rates. The fac does not puncture its tail bits as it is perfectly aligned: all modes and spectrum occupancies offer 65 fac cells and these give two bits per cell (130 bits). 11


11 K is a factor called constraint length, which will be described in Section 3.7


There is only one code rate for the fac (3/5). The fac is 72 bits long and the constraint length in the drm encoder is 7, i.e. 2 ⋅ 65 ⋅ 3/5 − 6 = 72. This perfect alignment is not possible for the msc and sdc while still upholding the requirements of flexibility – there are simply too many combinations. At most half of the tail bits are punctured, since they are needed in the decoder. Exactly how many depends on the code rate for each level; remember that levels are encoded and decoded separately.

3.7  convolutional coding and decoding

In this section some theory for convolutional coding is given, since the theoretical results are needed later. After this we will delve into decoding of a convolutionally encoded stream using the Viterbi algorithm.

On the Encoding of Bits Let us begin by defining a number of terms. A code rate, r = k/n, for an encoder is an expression for the reciprocal redundancy of information in a code word. In this case r ≤ 1, since it would otherwise be data compression: a related, but different topic. This means that there is more redundant data in a codeword encoded by an r = 1/4 encoder than in an equivalent codeword encoded by an r = 1/3 encoder. drm defines a 1/4 convolutional encoder and derives all other code rates from this. In general a Convolutional Code, cc, can be thought of as a convolution between the input stream and the encoder's impulse response (note the resemblance with fir and iir filters). It can also be considered a Finite State Machine, fsm; this is how we will approach the problem. In the following we will not consider feedback loops in the cc, i.e. we will only consider cc's with finite impulse response. For our purposes we define a cc, C, as12

C = (K, n, P)    (3.9)

where K is the constraint length, n defines the code rate (1/n) and P is the set of n polynomials that make up the code. The polynomials in P are simply a definition of which memory registers should be combined, using modulo 2 addition, with the input bit or the result from the previous combination. Here the polynomials are expressed as octal numbers, where a 1 in bit position b_i means that the i-th memory register should be XOR'ed onto the result from the previous bit combination; 0 means to skip this register. The number of memory registers, m, needed to implement C is

m = max{polydeg(x) : x ∈ P}    (3.10)

12 This does not capture all aspects of cc's in general, but it will do for our analysis. In particular it does not capture a k/n cc or a cc with feedback.

where polydeg(x) gives the highest bit index for x (the polynomial degree). It follows directly from this that the state space of C is

∣S∣ = 2^m    (3.11)

where S is the set of states. The constraint length, K, defines how many bits, in total, the encoder uses to generate an output word; it is defined as

K = m + 1    (3.12)

K can then be thought of as the length of time (in discrete steps) that an input can affect any output bit. Note that there are varying definitions of constraint length in the literature, where constraint length is sometimes defined as the memory size m. We use the current notation as it is the way we consider the problem. The free distance, denoted d_free, of a code, C, is an expression of the reliability of the code

d_free = min over x⃗1, x⃗2 : x⃗1 ≠ x⃗2 of w_H(C(γ0, x⃗1) − C(γ0, x⃗2))    (3.13)

where x⃗j = {x0, x1, . . .}, x_n ≠ 0 for some n, j ∈ {1, 2} and γ0 is the all-zero initial state. That is, the minimum Hamming weight between two arbitrary inputs that differ in at least one position. It can be calculated with the heapmod method [7]. This expression, however, is a lower bound on the error correction capability of C. More precisely, this bound grows linearly with the minimum Hamming weight of a sequence of code words starting and ending with the all-zero state. See Host et al. [20], Jordan et al. [23] for the gory details. If C has free distance d_free the coding may recover t bits [40]

t = ⌊d_free / 2⌋    (3.14)

Using interleaving improves the error correction capability by spacing the errors further apart.


Figure 3.6: State transition diagram for the drm encoder. State number can be seen as (binary) content of shift registers. Note the symmetrical nature of the transitions. The overwhelming number of states is why all following examples are on simpler encoders. The all zero state can always be reached in m steps.

The drm Encoder The encoder used in drm is a non-systematic13 feedback-free convolutional encoder with code rate 1/4 and a constraint length K = 7, i.e. m = 6. The polynomials for this encoder are P = {0133, 0171, 0145, 0133}, given here in octal form. The implementation of this can be seen in Figure 3.7. Since m = 6, the decoder has 64 states (Equation 3.11). Each level of the data stream is input into the encoder. This, in turn, generates a sequence of code words that is passed on to the puncturing routine. Applying the heapmod method gives the free distance of the drm encoder as 13 [7].
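The shift-register encoder of Figure 3.7 can be sketched directly from the octal polynomials. This is a hedged sketch: the generators are the ones stated above, but the bit-ordering convention (current input bit in the most significant tap position) and all names are our own assumptions:

```c
#include <stdint.h>

/* The drm mother code: rate 1/4, K = 7, generators 0133, 0171, 0145,
 * 0133 (octal).  `state` holds the six previous input bits,
 * a_{i-1} (bit 5) down to a_{i-6} (bit 0). */
static const unsigned GEN[4] = { 0133, 0171, 0145, 0133 };

static int parity(unsigned v)             /* modulo-2 sum of the bits */
{
    int p = 0;
    while (v) { p ^= v & 1; v >>= 1; }
    return p;
}

/* Encode one input bit: writes the four code bits b_{0,i}..b_{3,i}
 * into out[] and returns the new shift-register state. */
unsigned encode_bit(unsigned state, int bit, int out[4])
{
    unsigned window = ((unsigned)bit << 6) | state;     /* a_i .. a_{i-6} */
    for (int j = 0; j < 4; j++)
        out[j] = parity(window & GEN[j]);               /* tap and XOR */
    return (((unsigned)bit << 5) | (state >> 1)) & 077; /* shift in a_i */
}
```

Since every generator has its top bit set (the a_i tap), encoding a 1 from the all-zero state yields the code word 1111, as the state transition diagram would predict.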

Decoding Convolutional Codes – The Viterbi Algorithm The encoding process can be done in O(∣L_p∣n), where L_p is a level in the set of levels, Λ; that is, linear in the input length. The memory constraint on the encoder is

13 The input data does not occur directly in the output.



Figure 3.7: The structure of the encoder in the drm system. a_i . . . a_{i−6} are input bits. b_{0,i} . . . b_{3,i} are the i-th codeword. The shift registers (D) can be thought of as delay elements.

Figure 3.8: An example of a full trellis for a ∣S∣ = 4 cc. Dashed lines indicate that the input was 0, full lines indicate 1. The full trellis contains all possible paths from and back to the all-zero state – i.e. every path shown here is a valid input combination of length t = 8.

linear with the constraint length – only the shift registers are really needed. For decoding this is not the case – it is somewhat more involved. In 1967 Viterbi [37] proposed an efficient algorithm for decoding convolutional codes, which is asymptotically optimal. This algorithm is now commonly called the Viterbi Algorithm, va. The va estimates the Maximum Likelihood, ml, state sequence. Based on the received sequence of soft bits, it finds the shortest path of states through a specific type of graph – a trellis. From this state path the encoder's original input sequence can be recovered. A trellis is a state-space representation of a discrete-time fsm as it develops over time. It can be considered an acyclic, weighted graph. An example of a full trellis for an m = 2 cc can be seen in Figure 3.8; note that the last two inputs are zero, to terminate the trellis in the all-zero state. The trellis diagram for an m = 6 cc has, as mentioned above, 64 states; this cannot be captured in a diagram without clutter.


Figure 3.9: Trellis for an encoder. Dashed lines mean the input was 0, full lines mean it was 1. Below each state transition line is the output.

Therefore we demonstrate the concept of the va on the m = 2 cc, C_e, defined below.

C_e = (K = 3, n = 2, P = {7, 5})    (3.15)

If we let C_e encode the ascii value for "y", which is 0111 1001, we obtain the encoder trellis seen in Figure 3.9.

The Viterbi Algorithm We will now try to explain, briefly, how the va works. Assume we have received j codewords of length n – in total that amounts to nj soft bits. The purpose of the Viterbi algorithm is to find a maximum-likelihood path through a trellis. The trellis consists of nodes. Each node represents a state γ at time k. There are always j∣S∣ nodes in total. A path in the trellis is then the time development of state transitions. In our scenario we try to find a minimum-distance path through the trellis – the one that differs as little as possible from the received sequence.

Initialisation The starting point of the algorithm is the first state – the all-zero state. Before decoding can begin the entire code word sequence must be received, demodulated and deinterleaved. With each state γ ∈ S at time k, the path distance estimate for the path up to k, p̂_γ^k, is kept. The previous state, γ̃_{k−1}, is also remembered so the path can be reconstructed. The distance estimates are initialised to p̂_γ^k = ∞ ∀γ, k, except that the path distance estimates for all γ at time 0 are set to p̂_γ^0 = 0.

Metric Calculation and Forward Search At each time step k, 0 ≤ k < j, we let γ range over all states in S. For the codeword, r_k, received at time k a metric is calculated. This metric is then the weight of the path through state γ at time k. The metric is, in our case, based on the reliability information of each bit as received from the demodulator – the soft bits.


Since we know that 0 is equivalent to −q and 1 to q, where q is the quantised soft-decision maximum value, the distance between a state γ and a received signal r at time k, under the assumption of input b to the encoder at time k, can be found as

m(r_k, γ_k, b) = Σ_{i=1}^{n} ∣r_k[i] + q(1 − 2C(γ_k, b)[i])∣    (3.16)

where C(γ, b) is the code word produced by the encoder in state γ with input b. By a[i] we denote the i-th element of some construct a. To put Equation 3.16 into words: the metric is the sum of the differences between each received soft bit and the ideal soft bit for each state at time k. Still assuming that at time k the input to the encoder was b for state γ, the next state γ′_{k+1} can be determined. γ_k then creates a path to γ′_{k+1} with the weight

p̂_{γ′}^{k+1} = p̂_γ^k + m(r_k, γ_k, b)    (3.17)

These steps are carried out for both values of b (0 and 1). In summary: at time k each state, γ, knows the accumulated distance of the path, p̂_γ^k, leading up to itself. The state also knows the distance with respect to the received codeword, r_k, and the state value γ_k under any input, b. Furthermore a path has been added to all possible next state values with the combined weight of the input path and the weight of going through γ at time k.

Forward Search Let 1 ≤ k < j and let Υ be the set of paths coming into a state γ ∈ S at time k. The va then compares the path distances p̂_{γ̃}^{k−1} from all incoming states γ̃ ∈ Υ and chooses the one with the lowest path distance; ties are broken arbitrarily. All other paths are discarded. This means that for time k at most 2∣S∣ possible paths are generated and for time k + 1 only ∣S∣ paths survive. The metric is then calculated for γ (as described above). This is done for all states at time k.

Backward Search When k = j − 1 and no more codewords are available, the algorithm searches among all the states at time j − 1 to find the one with the lowest path metric (including its own distance). From this, the path is traced back and the inputs recovered. This is possible since there is a 1-to-1 relationship between state transitions and input values. Figure 3.10 illustrates the above process on the previously defined C_e. First all possible paths are searched, and the smallest-distance path is kept at each time k for all states. The input in this case was ⃗r = {−3, −2, 1, 2, −1, 2, 2, 0}, which should represent the encoder outputs 00 11 01 10; note that random noise has been added to ⃗r. The decoded value should be 0111, which can be seen to be the first part of the "y" that was encoded earlier.
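The forward search with the metric of Equation 3.16 and the traceback can be condensed into a small decoder for the example code C_e. This is a hedged sketch under our own conventions (state numbering, ties kept with the first-seen predecessor, a fixed path-history bound), not the receiver's actual implementation:

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define NS 4            /* 2^m states for the m = 2 example code C_e */
#define N  2            /* code words have n = 2 bits */
#define Q  3            /* soft-decision maximum: 0 <=> -Q, 1 <=> +Q */
#define MAXJ 64         /* path history bound, enough for this sketch */

static const unsigned G[N] = { 07, 05 };  /* generators of C_e, octal */

static int par(unsigned v) { int p = 0; while (v) { p ^= v & 1; v >>= 1; } return p; }

/* Decode j received soft code words r[0..j-1][N] into j bits. */
void viterbi(int (*r)[N], int j, int *bits)
{
    int dist[NS], next[NS];
    unsigned char prev[MAXJ][NS];
    for (int s = 0; s < NS; s++)              /* start in the all-zero state */
        dist[s] = s ? INT_MAX / 2 : 0;
    for (int k = 0; k < j; k++) {
        for (int s = 0; s < NS; s++) next[s] = INT_MAX / 2;
        for (int s = 0; s < NS; s++) {
            for (int b = 0; b < 2; b++) {
                unsigned w = ((unsigned)b << 2) | s;  /* a_i a_{i-1} a_{i-2} */
                int metric = 0;                       /* Equation 3.16 */
                for (int i = 0; i < N; i++)
                    metric += abs(r[k][i] + Q * (1 - 2 * par(w & G[i])));
                int ns = (b << 1) | (s >> 1);         /* shift register */
                if (dist[s] + metric < next[ns]) {    /* keep best path */
                    next[ns] = dist[s] + metric;
                    prev[k][ns] = (unsigned char)s;
                }
            }
        }
        memcpy(dist, next, sizeof dist);
    }
    int s = 0;                        /* backward search: best end state */
    for (int t = 1; t < NS; t++)
        if (dist[t] < dist[s]) s = t;
    for (int k = j - 1; k >= 0; k--) {    /* trace the surviving path back */
        bits[k] = s >> 1;                 /* input bit = newest state bit */
        s = prev[k][s];
    }
}
```

Feeding it the noiseless soft bits for the input 1 0 (code words 11 and 10, i.e. {+Q, +Q} and {+Q, −Q}) recovers the input.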


Figure 3.10: Example of Viterbi decoding path

From Figure 3.10 it can be seen that after only four time steps there is already a clear minimum weight path. In fact, the longer the input, the more pronounced the difference between the "correct" minimum weight path and all other paths becomes.

Computational Performance of the Viterbi Algorithm The va as described above clearly grows linearly with the length of the received data set: for each time step, k < j, all states are examined. It also grows linearly with the number of states. The number of states is a direct consequence of the structure of the encoder – the state space grows exponentially with the number of shift registers in the encoder, ∣S∣ = 2^m. Further, for each time step and each state, we have to compute the metric, which is n operations on the input soft bits. The encoder output must also be generated, but can be generated upfront, requiring no additional work during metric calculation. Backtracking is done after the final minimum length path has been identified and takes O(nj). In total the computational complexity is O(nj2^m). This is why the constraint length is often kept low (K < 10). The constraints that this algorithm implies on memory are not negligible: a large state history is needed so that it is possible for each state, γ_k, at time k, to find the previous state, γ̃_{k−1}. After decoding, each level has been transformed from a sequence of soft bits to a sequence of bits. This bit-data is now assembled into a single stream of data for the multiplex part. A separate stream is reserved for the hierarchical part.


3.8  energy dispersal

The drm specification defines a pseudo random number generator that all data must pass through. This is to ensure that the energy in the qam constellation is averaged to 1 (in combination with the normalising factor a). This is needed because of the previously stated assumption that all constellation points are equally likely to occur. This is not the case unless the data is somehow randomised, since a long series of 0's, or conversely 1's, would result in a sequence of all-0 or all-1 codewords. Such a series would only need to be longer than the constraint length, K, and this is not unusual, as K is kept small. Energy dispersal is placed at this point in the encoding/decoding chain because all source streams are multiplexed on the multiplex frame of the msc; consequently this is the only place in the chain where all bits are brought together into a single data buffer. The energy dispersal random number generator produces a stream of bits from a known initial vector through shifts and feedback, generating a predictable (since the initial vector is known) sequence of pseudo random bits that is added to the multiplex frame with modulo 2 addition. This allows recovery of the original bit stream by adding the same pseudo random bit vector to the energy-dispersed bit stream.
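A generator of this kind can be sketched as a linear feedback shift register. We use the polynomial x^9 + x^5 + 1 with an all-ones initial vector, which we believe matches the drm specification but treat as an assumption of this sketch; the involution property – dispersing twice restores the data – holds for any such sequence:

```c
#include <stddef.h>

/* 9-bit LFSR for energy dispersal, polynomial x^9 + x^5 + 1 (an
 * assumption in this sketch), initialised to all ones. */
typedef struct { unsigned reg; } prbs_t;

static void prbs_init(prbs_t *p) { p->reg = 0x1FF; }    /* nine 1-bits */

static int prbs_next(prbs_t *p)
{
    int bit = ((p->reg >> 8) ^ (p->reg >> 4)) & 1;      /* taps 9 and 5 */
    p->reg = ((p->reg << 1) | (unsigned)bit) & 0x1FF;
    return bit;
}

/* Modulo-2 add the pseudo-random sequence onto a bit buffer (one bit
 * per byte).  Running it twice on the same data is the identity, so
 * the same routine both disperses and de-disperses. */
void disperse(unsigned char *bits, size_t n)
{
    prbs_t p;
    prbs_init(&p);
    for (size_t i = 0; i < n; i++)
        bits[i] ^= (unsigned char)prbs_next(&p);
}
```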

3.9  crc used in drm

A number of Cyclic Redundancy Check, crc, words are used in the drm system. In structure they are not unlike the convolutional encoder of Figure 3.7 (p. 44) – except for a feedback loop that ensures that all the data is checked. crc codes generate check-sum words, used to verify that data is correct once received. They do this by applying polynomial division modulo two. crc words are often used in communications protocols, and they are simple to implement. The crc words in drm have an order of either 16 or 8. Uses throughout the drm standard include, but are not limited to
▶ The fac has a crc (8-bit). This can be used as a first indicator that the channel acquisition has gone well.
▶ The sdc has a crc word guarding its entire content.
▶ The higher protected payload parts are protected with crc words.
▶ The audio codec makes heavy use of it.


Once the crc word is computed on the encoder side, all bits are negated (post-invert). This ensures that appending a zero bit to the input data enables the crc check on the receiving side to detect the deletion of an initial 1 bit in the data block – something the crc check would not otherwise be able to do. [31]
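The shift-register-with-feedback structure can be sketched bit-wise. The polynomial 0x1D (x^8 + x^4 + x^3 + x^2 + 1) and the all-zero preset are illustrative assumptions here – the drm standard fixes the exact parameters per channel – but the final post-inversion is the step described above:

```c
#include <stdint.h>
#include <stddef.h>

/* Bit-wise crc-8 sketch: polynomial division modulo two in a shift
 * register with feedback, followed by post-inversion.  Polynomial and
 * preset are illustrative assumptions, not taken from the standard. */
uint8_t crc8(const uint8_t *data, size_t len)
{
    uint8_t crc = 0;                     /* preset (assumed all-zero) */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)      /* divide, one bit at a time */
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x1D)
                               : (uint8_t)(crc << 1);
    }
    return (uint8_t)~crc;                /* post-invert all bits */
}
```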

3.10  unequal error protection

drm offers a way for broadcasters to differentiate protection levels between parts (or sub-parts, in the case of data streams) of streams. This feature is called Unequal Error Protection, uep. uep is only available for the msc. If uep is used, a separate, lower code rate is chosen for some portion of the multiplex. This of course lowers the overall bit rate, since more redundancy will be added to the higher protected part. The encoder then takes care to encode this part of the data separately by keeping it at the front of all levels. This enables the decoder to take the first received data and treat it separately. The blocks affected by uep, during both encoding and decoding, are the puncturing and the interleaving. For puncturing, the boundaries between pattern switches are important, since this is what defines the code rate. In the interleaving, special care must be taken, since if higher and lower protected bits are mixed the receiver may not be able to decode them correctly. Uses for this technique include sending two streams with different protection levels (e.g. an audio stream that is highly protected, and some slide-show images with less protection). Another use case is to have one stream with different protection levels, using an audio codec that can sort the data by how error-sensitive it is (e.g. aac), and then protect it accordingly.

3.11  summary

In this chapter we have discussed the techniques that make ofdm links error-tolerant, so that the radio feed can be recovered even if errors are introduced in the ofdm link. We touched upon the reasons for using techniques such as multilevel coding, interleaving, error-coding and energy dispersal, and discussed how to reverse the effects of the individual techniques in order to recover the source signal that originated from the radio station. However, we are not done, because the source signal is still encoded (e.g. audio is coded by an audio codec).



Chapter 4

CONTENT CHANNELS IN DRM

4.1  introduction

Of the three channels that are used in drm transmissions, two of them are so-called control channels, and the third carries the actual content. The first control channel, fac, is related to basic transmission parameters, such as station bandwidth (occupancy) and how the other channels (sdc and msc) are modulated. The fac also crudely describes the radio station's genre and language. The second control channel, sdc, relates to the content. It describes the content of the data channel, msc, and how the receiver should present this content (e.g. audio codec and parameters) – if at all. The sdc may also provide labels that describe the content, through a concept called services – normally the station's name will be used as the label. The data channel, msc, is a multiplex of the different source content that is sent out; it holds the audio feed, and possibly other services. In order to know where the individual streams in the content start and where they stop, the sdc is required.

Streams and Services The drm specification was conceived with simulcast of multiple content types in mind, and in order to achieve this, the abstraction levels of streams and services were devised. A stream is a part of the msc that carries a specific type of content. The most common stream type for a radio station will thus be the audio stream. The complement to this is the data stream. The msc can hold up to 4 different streams, of different or similar types, as long as


Figure 4.1: An attempt to map the drm payload to the layers of the osi model

the bandwidth permits it. A service is a concept used to describe a stream; services are essentially pointers to stream content. Two services may point to the same audio stream but have different labels. Two services may also point at different content in the same data stream (e.g. a slide-show and a weather report), or they may point to different streams altogether. If two broadcast stations were to join up to buy a single broadcasting license, they could simulcast their two audio feeds in the same drm signal and label the two services with their respective station names (e.g. service one labelled "Danmarks Radio" and service two labelled "Radio Merkur"). Note that this scenario is very unlikely (though possible) because of the limited bandwidth.

4.2  fast access channel

The fac is the only channel that is always modulated and coded in the same way; hence any drm receiver needs to start by decoding it. The fac is short, simple and very robust, and can be used as a way of detecting the presence of a drm broadcast while scanning a frequency range. The contents of the fac are mainly used to find the spectrum occupancy and the modulation and coding scheme for the sdc and msc channels. The fac can also reveal superficial information about the station content: there are 16 enumerations for language indication and 30 enumerations for programme categories, so the fac will at best provide a crude description of the station's content. More detailed information can be extracted from the sdc later.


Cell Positions The fac is always positioned within the first 4.5 kHz (over dc). This enables a drm compatible receiver to decode the fac without making further assumptions on the bandwidth of the signal. The actual bandwidth is then signalled in the content of the fac. The location of fac cells depends on the robustness mode of the broadcast [13, pp. 134–135, Tables 88–91], but the number of cells is always the same (65). Once the fac is decoded the correct spectrum width must be fed back to the concerned processes.

4.3  service description channel

The sdc block extracted from a single transmission super frame is composed of several entities; some of the most important are described here
▶ Multiplex Description is a mandatory entity, and there shall always be one of these per block. It is crucial because it enables extraction from the msc. The multiplex description will indicate how many streams are in the msc as well as their lengths and coding. In case of uep, there are two lengths per stream (i.e. the higher protected part and the lower protected part).
▶ Label entities are used to label services; each service can have a label of at most 64 bytes, encoded with utf-8. Labels are optional, but if present they must be sent in every sdc block.
▶ Application Information is the entity related to non-audible content (i.e. data streams). If a stream carries multimedia content such as slideshows, it is signalled here that the content is images, and how it should be extracted (package or stream oriented). Application information is mandatory for each application service, but does not need retransmission in every sdc block.
▶ Time and Date may be sent in an sdc entity. If used, it is sent once per minute.
▶ Audio Information Data relates to the content of a particular audio stream. This metadata holds information about what codec the audio is encoded with, the parameters for the codec (rates, number of channels, etc.) and possibly some small text messages from the station. Audio information data is mandatory for each audio stream, and must be sent in every sdc block (to avoid latency for new listeners tuning in).
Other entities include conditional access, which is reserved for commercial encrypted subscription-based radio. Other entities are reserved for alternative frequency signalling – which signals the receiver about alternative ways to receive the station.


hpp: 384 bytes in total – Stream 0: 320 bytes, Stream 2: 64 bytes
lpp: 680 bytes in total – Stream 0: 575 bytes, Stream 1: 105 bytes

Table 4.1: The parameters used in Figure 4.2


Figure 4.2: The segmentation of an msc multiplex with two services, service A consisting of an uep audio stream and an eep data stream, and service B consisting of an eep data stream. The lengths of part A and part B are obtained from the multiplex description in the sdc and are given in Table 4.1.

Cell positions The cells that make up the sdc are all the available data carriers of the first ofdm symbol, s1, that are not used for the fac.

4.4  main service channel

The main service channel is used to transmit the actual content. The sdc's multiplex description is required in order to proceed with the extraction of the msc. The multiplex description will indicate the lengths of the hpp and the lpp of each stream. The procedure for extracting each stream is then to first copy the hpp part of the mandatory stream 0, then continue with the next stream and so forth until there are no more streams. Then the lpp part of stream 0 is extracted and appended to the hpp part, and so forth for the remaining streams. See Figure 4.2 for an example with the characteristics indicated in Table 4.1.
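The extraction order just described – all hpp parts first, then all lpp parts appended stream by stream – can be sketched as follows (hypothetical names; the lengths would come from the multiplex description):

```c
#include <stddef.h>
#include <string.h>

/* Per-stream part lengths as signalled by the multiplex description. */
struct stream_len { size_t hpp; size_t lpp; };

/* Extract each stream from a multiplex frame laid out as
 * [hpp of stream 0][hpp of stream 1]...[lpp of stream 0][lpp of stream 1]...
 * out[i] must be able to hold len[i].hpp + len[i].lpp bytes. */
void extract_streams(const unsigned char *mux,
                     const struct stream_len *len, int nstreams,
                     unsigned char **out)
{
    size_t pos = 0;
    for (int i = 0; i < nstreams; i++) {        /* higher protected parts */
        memcpy(out[i], mux + pos, len[i].hpp);
        pos += len[i].hpp;
    }
    for (int i = 0; i < nstreams; i++) {        /* lower protected parts */
        memcpy(out[i] + len[i].hpp, mux + pos, len[i].lpp);
        pos += len[i].lpp;
    }
}
```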


Figure 4.3: For each aac audio stream, every frame will have a structure like this: a header with border positions, the higher protected data, crc checksums, and the lower protected data. If the hpp is zero, all the checksums will still be immediately before the lpp.

Audio aac For the aac codec, two sample rates can be used: 12 kHz and 24 kHz. The aac stream is segmented into even smaller frames: if 12 kHz is chosen there are 5 aac frames per drm frame, and if 24 kHz is chosen there are 10 aac frames per drm frame (see Figure 4.3).

Multimedia – Other than Audio As sketched in Figure 4.1, there are different choices when it comes to data transmission via drm. Often several layers are used to wrap the data and ensure proper reception. Many of these technologies are taken directly from the dab specification. In the drm system, application data, as it is called, can be sent either in packages, or in an entity defined by the application (synchronous streams). If packages are chosen, an abstraction layer named data units can be used on top of the packages to assemble them into larger segments – these segments are then rejected if any of the packages are invalid. This can for example be used to transfer files (of up to 256 MByte). The concept of data units is the subject of the etsi [10] specification. There exist appendices [10] to the drm specification outlining the integration with dab technologies. A specific example is the mot dab technology [9] that basically works like a carousel, where chunks of different files are multiplexed for parallel transfer; when the files are complete they are presented to the user. The drm specification does not go into detail about carrying auxiliary content. The commercial transmitter spark and the free transceiver dream demonstrate that further layers are often used on top of the drm-specific ones.


This is confirmed by introspecting some of the batch signals1 that we have found on the internet.

Cell Positions The cells that make up the msc are all the available cells of all ofdm symbols other than the first, s_k, k ≠ 1, that are not used for the fac.

4.5  perspective

The fact that data can be simulcast, even at these low rates, could be very useful in situations where internet access is not available. The question is just what it should be used for. The Danish longwave transmitter in Kalundborg has traditionally sent information such as fish quotas and weather forecasts for the fishermen. This could now be done in the background with data, without interfering with the audio programming, and the fishermen and yacht owners could receive the weather forecasts on a special radio or a pc.

4.6  summary

In this chapter we have examined the source data that can be broadcast by the radio stations. We have addressed simulcast of several source feeds and briefly discussed the possibility of transmitting content other than audio. This concludes our bottom-up approach to the drm standard.

1 By batch signals, we mean sampled if signals that have been recorded from a real-world rf front-end


PART II DESIGN OF THE DRM DECODER


Chapter 5

ARCHITECTURE OF THE DRM DECODING SYSTEM

5.1  introduction

In the following section we present the architecture of the proposed drm decoder system. The architecture is split into blocks in much the same way as traditional radio receivers can be decomposed into block diagrams (e.g. mixers, filters and detectors). The interfaces and shared functionality between the blocks will be defined in this chapter. This part only concerns itself with design and we will give no references to code. For this we refer the reader to the Doxygen documentation on the accompanying usb disc.

5.2  target platform

The ultimate goal of this project is to have the receiver running on a small embedded processor. For this purpose a kit was borrowed from Linux In A Box, liab, that is based on an arm mcu. The kit, called nanoliab, does not have a way to sample analog signals. However, sampling is not strictly necessary for demultiplexing and decoding a drm signal; the lack of an adc will be compensated for by using other signal sources. Choosing the nanoliab as a processing platform suggests using Linux as an operating system. This comes with a number of advantages, not the least of which is portability – the entire decoder can be developed on a normal pc and simply moved


to the target platform for testing. Any choice of technology is essentially a trade-off and this is no different. All the comforts of a real operating system come at a price: overhead. The core of the nanoliab platform is a simple 200 Million Instructions Per Second, mips, mcu. Now this has to drive both the Linux kernel and the decoding process. Another drawback of this choice of processing platform is that the nanoliab's arm mcu has no Floating Point Unit, fpu. This means that the computational benefits of fast and effective floating point operations are not available, which naturally implies that all mathematical operations must be done in integer arithmetic. These integers can then be seen as fixed point numbers if this is desired. Despite the overhead of running Linux and the difficulty of making decimal calculations, the choice is still justifiable, as the trend for more than 30 years1 has been that ever faster micro-processors come to the market. The arm family of microprocessors has a concept of bi-endianness, which means that the instructions can be configured to use either little-endian or big-endian. In order to change from one endianness to another, all the programs (the operating system, the kernel and the radio receiver) need to be compiled for the chosen endianness. The consequence is that development can be done relying on pc endianness. The endianness of regular pc's (the x86 architecture specifically) is little endian.
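The integers-as-fixed-point idea mentioned above can be illustrated with a Q15 format – our choice for this example, not something the platform mandates: a 16-bit integer with an implicit scaling of 2^15, where multiplication must widen to 32 bits before the binary point is shifted back.

```c
#include <stdint.h>

/* Q15 fixed point: value = integer / 2^15, so 0x4000 is 0.5 and
 * 0x7FFF is just below 1.0.  Without an fpu, this maps fractional
 * arithmetic onto the integer ALU. */
typedef int16_t q15_t;

q15_t q15_mul(q15_t a, q15_t b)
{
    /* widen to 32 bits, multiply, then shift the binary point back */
    return (q15_t)(((int32_t)a * b) >> 15);
}

q15_t q15_add(q15_t a, q15_t b)
{
    /* saturating addition, since plain overflow would wrap silently */
    int32_t s = (int32_t)a + b;
    if (s > 32767)  return 32767;
    if (s < -32768) return -32768;
    return (q15_t)s;
}
```

For instance, 0.5 × 0.5 becomes q15_mul(16384, 16384), yielding 8192, i.e. 0.25.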

5.3  creating a drm decoder chain

Based on the original block diagram of the decoder, Figure 1.2, blocks are formed such that each block performs a finite function and the resulting signal stream is changed or manipulated into some new form relevant to the decoding process. The result is that functional blocks from the original block diagram are put into larger chunks; these new blocks are then connected through well-defined interfaces. We consider the point where sampled data is input as the starting point of the system. In general, functional blocks in the original block diagram (Figure 1.2) that read the same input are put into the same processing block. This relaxes some of the block dependencies. For instance the ofdm demultiplexing depends on symbol boundary synchronisation; these two blocks are closely related and are therefore integrated into a single block. The result is outlined in Figure 5.1. We have identified the following operational blocks: downmix estimates the if frequency, down-mixes the signal and produces the resulting I/Q stream.


1 Moore's law


[Figure 5.1 block labels: rf front-end, downmix, ofdm, match, decode, payload; freq. synchr., time synchr., frame synchr., resample, ofdm demux, equalise, cell extract., dedisperse, qam demap, deinterleave, fec decode, recombine, reinflate, present; feedback parameters: Mode, Occupancy]

Figure 5.1: An elaborated version of Figure 1.2, showing which of the function blocks are mapped to which processes in the design

ofdm identifies the ofdm robustness mode and symbol boundaries, and produces the cells that it extracts from the ofdm symbols. match detects and equalises errors in phase and magnitude of the cells, detects frame boundaries and generates a stream of frames. decode identifies super frames and decodes multiplex frames, producing a series of (sdc and msc) logical frames; it also detects the spectrum occupancy. payload decodes the content of the payload and produces output.

The design and architecture of the drm decoder chain is based on a few key principles well-known in the Unix philosophy.2

▶ Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new features.
▶ Rule of Modularity: Write simple parts connected by clean interfaces.
▶ Rule of Composition: Design programs to be connected to other programs.
▶ Rule of Transparency: Design for visibility to make inspection and debugging easier.

2 Today the word philosophy would most likely be replaced by design and/or architecture, but we uphold the tradition here.



▶ Rule of Optimization: Prototype before polishing. Get it working before you optimize it.

These are distilled versions of different guidelines given by the elders of the Unix community and have been brought together by Raymond [33]. While they may seem somewhat tongue-in-cheek, they are indeed practical advice to consider during both design and implementation of systems such as this. These guidelines are applied where relevant in the division of blocks into coherent and self-contained pieces. In total they form the kiss principle: Keep It Simple, Stupid.

[Figure 5.2: Funct. Block A and Funct. Block B connected through buffered IPC, with a Matlab probe tapping the stream between them]

Figure 5.2: Illustration of the interaction between subcomponents

The drm decoder chain described above consists of a series of independent processing blocks which communicate with each other. This permits each of these tasks to be run in parallel, with the output of one being fed into the next in a pipelined fashion. This suggests a multi-threaded paradigm. Multi-threaded applications have the benefit that while one thread of processing is blocked doing I/O (e.g. writing samples to the sound card) the others may still prepare data for presentation. In order to fully exploit the full-blown Linux kernel that the decoder chain will be running on, we let functional blocks be implemented as normal processes. This approach also relaxes the synchronisation requirements of multi-threading, which can be hard to get right. We let the operating system do what it does best and schedule tasks for us. An inter-block communication and synchronisation framework will be engineered to connect the inputs and outputs of the blocks so that they form a receiver. This framework must facilitate functionality such as feedback mechanisms and synchronisation of signals. We will use the mechanisms that the operating system offers to solve most of these problems. This is presented in Figure 5.1. For further details on the operation of the interface see Section 5.4. The technique of piping streams is what we particularly have in mind in Figure 5.2. The result of this block division is that we have a simple way to describe a multi-threaded solution for decoding of a drm signal. Another implication is a well-described interface for the involved blocks, making it easy to test them separately with test data and even to measure their input and output (as a radio engineer would do with an oscilloscope), see Figure 5.2.


5.4

linking the chain together

The actual data transfer between processes will take place using Unix streams.3 This means that stdin and stdout or any other Unix pipe can be used to connect two blocks. This is easy to configure for testing and can be “hidden” in a final product. All status information will be written to stderr. Using files, as opposed to shared memory, has the advantage of synchronising the input and output of the processes, since the read() and write() functions are blocking. We realise there is an overhead in using streams, but see this mitigated by the simplicity of input/output operations, i.e. no need to select() and handle timeouts. Using files also makes sense, since the main data flows in one direction only and there is always a single producer and a single consumer of data. The content of these streams will generally be binary data, raw values. This is done in spite of the fact that the Unix tradition is strongly based in textually oriented stream content. We choose binary streams for the main data flows here to avoid continuous conversion between representations. Instead we strive to provide tools that may “tap” the pipes between processing blocks and present the data to the user in some format. Signals in I-Q For quadrature signals, the in-phase and the quadrature parts are interleaved, but they do not have a separator. The first 16-bit word read is the real part and the next word is the imaginary part. Complex Signals Complex signals (e.g. data carriers in the frequency domain) may be sent between processes. For each part of the complex value a 32-bit little-endian integer is reserved. The complex number is sent with the real and imaginary values interleaved, the real part preceding the imaginary part.

Parameters Many of the blocks depend on drm parameters in order to decode a particular station. These drm parameters will, however, sometimes only be determined later in the chain, and hence they need to be fed back to each process that needs them. This can be done in many ways, but since the parameters do not contain large data entries and since they remain invariant for long periods of time, they can be written into files on a ramfs4, and when they are updated a signal (SIGINT, SIGUSR1 or SIGUSR2) can be emitted to all the processes that subscribe to that parameter. The

3 For historical reasons the type of the data structure is actually named FILE, which has caused some confusion. By stream we here mean FILE. 4 ram Filesystem – a ram filesystem can be used to write variables to, in order to avoid needless write cycles to the flash



processes that receive the signal then update the values based on the content in the file system, thus distributing the new parameters across the entire decoder chain. The parameter-variables that need to be fed back are shown in Table 5.1.

File name    Description
robustness   Which robustness mode (A, B, C, D)
bandwidth    Channel bandwidth (spectrum occupancy)

Table 5.1: Parameters that are used to configure the radio and must be somewhat globally available; the files are located in /var/lib/drm

Communication of parameters will be handled by the shared block interface (see below). Tuning If none of the blocks are able to detect that tuning has been done, an exterior source can signal all the blocks (i.e. reset them), and the whole acquisition can start over. This bit has been left out for the time being, and is not considered further in this report.

5.5

libraries

We employ a number of libraries in the implementation of the receiver. Some of these we will need to create ourselves, others will be readily available and require some encapsulation on our part. Generally libraries will be used where functionality can be shared or such libraries already exist. This is to avoid reinventing the wheel, which is not the purpose of this work.

Shared Block Functionality To make sure the implementation of the individual components is consistent with the framework specification that has so far been the subject of this chapter, a small library (module_common) is developed. Having a common library to handle setup and teardown of processes allows an abstract way of controlling the processes; it also simplifies the initial creation of new processes. A set of standard switches can be designed for controlling the behaviour of the blocks. A switch in this context is an optional argument given at execution time. The functionality that is then provided for each block throughout the receiver also involves a standardised input/output interface, and a magic word that can be used to delimit messages as a simple way of synchronising signals.


[Figure 5.3 block labels: downmix (LPF 5 kHz, then LPF n kHz once the mode is known), ofdm, match, decode, payload_audio (play audio); streams: fac, fac/sdc/msc; SIGUSR interrupts feed the mode and occup parameters back]

Figure 5.3: The data-flow between the blocks and the use of interrupts to signal parameter changes

To give an example of magic word synchronisation: a number of inputs may be expected before the next magic word appears (marking the start of another round of inputs). If the magic word cannot be read when expected, then clearly the settings on the preceding and current block differ. In this case data is simply dropped and the process searches for the next magic word. This is a pattern used throughout the designs and interactions in the drm receiver. The magic word is a 64-bit value chosen such that it is unlikely to occur naturally in most sequences (though it is of course possible) and is easily recognised while scanning through outputs. Using the functionality of the module_common library offers a way of signalling other processes that also rely on this library, that a change has happened and they should take action. This is depicted in Figure 5.3, where the ofdm block first detects the mode, after which the decode process reads the bandwidth of the signal from the fac (the fac is always carried in the 0 − 4.5 kHz part of the spectrum) and only then does the system start decoding audio. The module_common library maintains files with process identifiers for each of the running blocks in the receiver chain. These identifiers are used to send signals between processes. They are kept under /var/run/drm/ as per the File Hierarchy Standard [1], but this path is configurable. The mode and spectrum occupancy parameters are kept under /var/lib/drm/, since these are data that the applications are able to re-generate. Again this is in accordance with the file hierarchy standard, and is also configurable.

DRM Parameter Library The parameters that are needed to configure the system blocks are collected in a library (drm_params). These parameters are derived from other parameters, such as robustness mode and spectrum occupancy, that can be found with module_common.


When a mode change is issued, this library must be re-initialised so that the correct parameters are being used. Examples of what is included in this library are the number of cells to extract and the positions of reference cells and control cells, as well as the basic timing information such as the guard time Tg and the symbol time Tu. Basically it implements all relevant tables from the standard that relate to ofdm, as well as some of the tables related to decoding (e.g. control cell positions).

Fixed Point Arithmetic In the following text, the sets C and R represent a finite number of values, as can be represented by a fixed point number, instead of the usual uncountably infinite sets. Newer C compilers have a complex qualifier word [22] that can be used in combination with the fundamental types to indicate that they shall have both a real and an imaginary part. The complex qualifier can unfortunately not be utilised for fixed point arithmetic, as it is only defined for floating point based types. Therefore a set of functions for the basic fixed point operations will be created and integrated with the fft library. Operations to be implemented are (to name but a few)

▶ Real multiplication, ⊠ ∶ R2 ↦ R
▶ Real addition, ⊞ ∶ R2 ↦ R
▶ Real division, ⧄ ∶ R2 ↦ R
▶ Complex multiplication, ⊗ ∶ C2 ↦ C
▶ Complex addition, ⊕ ∶ C2 ↦ C
▶ Complex division, ⊘ ∶ C2 ↦ C

It is assessed that 16-bit data-types might be insufficient for representing the fixed point numbers, so to be safe, 32-bit data-types are chosen. The consequence is that the processor will have to use some 64-bit operations when multiplying and dividing. This is one of the points that can be addressed later on for optimisation. Operations will be implemented as either C-macros or in-line functions, so the program does not need to make a function call every time two numbers are added. Other important functions include trigonometric functions, especially sin(θ) = cos(π/2 − θ) ∶ R ↦ R, and the square root, √x ∶ R ↦ R. This library is named fxmath.


[Figure 5.4 directory labels: /, lib, include, downmix, ofdm, match, decode, payload, audio, data]

Figure 5.4: Organisation of the source code

Fast Fourier Transform The kiss fft library seems to have a size and simplicity that allow for relatively easy adaptation to use the trigonometric functions and arithmetic operations that will be implemented in the fixed point library. kiss can already use fixed point calculations, but we would rather have it use the same operations and functions as the rest of the code. This library is named fxfft in the distribution, and we stress that credit is still due to the kiss library, though it has been adapted here to the purpose of a drm receiver.

CODEC The only mpeg-4 codec that we have found to be capable of decoding aac is the Free AAC Audio Decoder, faad, which was created by the company Nero and released under the gnu General Public License version 2. The only implementations that we have found of the mpeg-4/celp codec and the mpeg-4/hvxc codec are those of the reference implementation (iso), and they are not trivial to use. Hence this functionality is left out. The faad codec has the advantage that it can be compiled to use fixed-point calculations, also on the arm family of processors. It is also the choice of other drm decoders, such as Fischer [16] and Poulsen [30].

5.6

organisation

The source code is organised in a tree, where each process has its own subdirectory and the common libraries are placed in the lib directory. See Figure 5.4.


5.7

perspective

Though it is found to be a good idea to separate the receiver into blocks and use the operating system's facilities to make them communicate, the goal is that they should be merged into one monolithic process in the future. This can be done by looking at all the module_xyz.c files in the distribution. These would probably best be run each in their own thread, and their internal communication should be adapted to use buffers instead of files. This should not, however, be such a daunting task.



Chapter

6

FREQUENCY ACQUISITION AND SIGNAL DOWN-MIXING

6.1

introduction

The purpose of this block is to take a real (i.e. mono) sampled signal, x_n, located at an if frequency, f_i, and re-sample it to a complex I/Q signal, s_n, located at 0 Hz. The real input source is

x_n = ℜ{ s_n ⋅ e^(jn2π f_i T_u) }    (6.1)

And we seek to extract the I/Q baseband, s_n. In order to do so, the block must first determine whether a drm signal is present at all, and what the intermediate frequency f_i is. The resulting signal will contain so-called negative frequencies if the drm spectrum is wider than 5 kHz.

6.2

interface

This block is designed to take a standard pcm signal of the form that is generated by sound cards (more specifically 16-bit signed little-endian1 mono), process it to an I/Q signal (i.e. 16-bit signed little-endian stereo, interleaved) and output this signal. This block will not signal any feedback lines, because the only inference it makes

1 For intel-based processors



state  description
q0     Locate f_i
q1     Down-mix the signal to 0 kHz
q2     No more input

Figure 6.1: A finite state machine describing the main states involved in the block. (a) State transitions: the block starts in q0 and moves to q1 once f_i is found; SIGUSR1 causes a re-analyse back to q0, and EOF leads to q2. (b) States involved.

from the input is the intermediate frequency2, which is not useful for any other blocks since the output signal is down-mixed to dc. The process has been prepared to also accept input from I/Q front-ends – using a special switch – but the functionality is not implemented yet.

6.3

process flow

The block will go through the states in Figure 6.1. In state q0 the block is analysing the signal to find the intermediate frequency. Once the intermediate frequency is found, the down-mixing will commence, and it will keep running until either the input-stream ends (EOF) or the rf front-end is tuned to a different frequency (should be indicated by a SIGUSR1 interrupt).

6.4

acquiring the dc offset

Corresponding to state q0 of Figure 6.1. To acquire f_i from the input signal, we note the characteristics of the three continual pilot references versus the scattered pilot references. Both types of ofdm pilots are boosted with a gain of 2. However, only the three continual pilots are located at fixed frequencies (Figure 6.2); the remaining pilots alternate between data cells (with a lower average power) and boosted cells. Therefore, the scattered pilots will be smaller than the continual pilot cells on average. So a discrete Fourier transform of several consecutive symbols is analysed. 2 As an experimental feature it also decides the bandwidth, but this is not signalled as it is read from the fac channel instead

70


[Figure 6.2 sketches the magnitude of the complex envelope from −5 kHz to 15 kHz, with the three boosted continual pilots d_z1, d_z2, d_z3 standing a factor 2 above the data carriers and dc marked at 0 Hz]

Figure 6.2: Complex envelope of the signal. The acquisition of the dc frequency is made from the known position of the pilots: κ_1^(c) is at 750 Hz, κ_2^(c) at 2250 Hz and κ_3^(c) at 3000 Hz.

The magnitude spectrum of a frequency transformation with a number of consecutive symbols will result in an envelope much like the one depicted on Figure 6.2. The algorithm used here is based upon

f̂_i = (f_s / N) ⋅ argmax_m { Σ_{i=1}^{3} |d_{m+κ_i^(c)}| } − κ_1^(c)    (6.2)

where κ_i^(c) is the carrier-index of continual pilot number i, and N is the number of points used in the discrete Fourier transform, where

d_k = Σ_{i=0}^{N−1} x_i ⋅ e^(−j2πki/N)    (6.3)

is the discrete Fourier transform. This is obviously calculated using a real-input fft. We set the fft-size to N = f_s ≡ 48000, so there is a resolution of Δf = 1 Hz between the Fourier bins.

There are shortcomings to this approach. For one thing, it does not take frequency drift into account; this would be a problem for a radio in a vehicle, where the doppler shift varies with the speed. It is also a problem that the sample rate is not corrected for: “sound cards can show high sample rate offsets of up to 50 Hz at 48 kHz nominal sample rate” (Fischer and Kurpiers [17]). To solve this problem fully, feedback information from the ofdm demultiplexing process can be used to estimate the frequency offset as well as the sample-rate offset [17].

6.5

down-mixing the signal to baseband equivalent

Corresponding to state q1 of Figure 6.1. 71


When the intermediate frequency is found, the signal is I/Q mixed to baseband equivalent. This is done by taking each of the samples and multiplying it by cos(−2π f_i n) and sin(−2π f_i n), for in-phase and quadrature respectively:

s_n = x_n ⋅ ( cos(−n2π f_i T_u) + j sin(−n2π f_i T_u) )    (6.4)

Filtering The down-mixing will result in a down-mixed signal centered at 0 Hz, as well as an image that wraps around the spectrum (folds) at f_s/2 and −f_s/2. The image must be filtered out for the time-domain correlation function used in the symbol synchronisation process to work. Because the bandwidth of the drm broadcast is still unknown, the signal could initially be filtered with a narrow low pass filter, i.e. with a cut-off frequency around 4.5 kHz. This would be a sufficient bandwidth to decode the fac channel and thus initialise the receiver (the fac channel is always present in the first 4.5 kHz and needs to get through). An interrupt is emitted by the decode block once the correct bandwidth has been found from the fac. This could be used to change the filter coefficients to an appropriate (wider) low pass filter (e.g. 10 kHz). Instead a single filter to fit all spectrum widths between 4.5 kHz and 10 kHz is designed. The consequence is that the symbol synchronisation may not be as accurate if the spectrum is only 5 kHz and a lot of noise is present. The filter is realised as a 15th order fir filter defined by a Hamming window. The 16 filter coefficients are calculated in matlab as

> fir1(15,9000/24000)

where 15 is the order, 9000 [Hz] is the cut-off frequency and 24000 [Hz] is the Nyquist frequency. The filter's phase and magnitude characteristics can be seen in Figure 6.3. The fir filter is implemented by:

y_n = Σ_{i=0}^{N} a_i ⋅ s_{n−i}    (6.5)

where n is the sample number, y is the output, s is the input, N is the filter order and a_i is the i-th filter tap or filter coefficient.


[Figure 6.3 plots magnitude [dB] and phase [rad] against frequency 0–20 kHz]

Figure 6.3: The Hamming low-pass filter used in the downmix process.

6.6

summary

This process could be improved by allowing tracking of f_i and by completing the I/Q-input capability. More importantly, frequency tracking may be implemented to take care of drift and offsets. It might also be useful to have several options for filtering, ensuring that at least fac signals can be decoded.



Chapter

7

OFDM DEMULTIPLEXING

7.1

introduction

This block identifies and demultiplexes all the ofdm symbols in a baseband equivalent signal. Moreover it estimates the robustness mode of a drm signal if this parameter is unknown. The resulting output is the extracted carriers of the ofdm symbols, i.e. the cells.

7.2

interface

The input to this block is the output of the downmix block, that is to say a baseband equivalent I/Q signal in 16-bit signed pcm little-endian with I and Q interleaved. The signal processing done by this block is mainly in the time domain, but its output is produced by the fft and hence is in the frequency domain. The output is the individual cells of each ofdm symbol in complex form, real part and imaginary part interleaved; each complex word is a pair of 32-bit signed integers, little-endian. Every symbol is delimited with a magic word, allowing following blocks to assure themselves that the robustness mode and spectrum parameters are consistent. Symbols are not enumerated. A feedback signal is emitted when the robustness mode is determined. This enables all other blocks in the decoder chain to synchronise and read out the robustness mode. Before the robustness mode is signalled, the decoder chain is in an unknown state, i.e. the validity of the robustness mode can be questioned. However, no data is flowing yet.


state  description
q0     Identify robustness mode
q1     Track symbol bounds, extract cells
q2     No more input, exit

Figure 7.1: fsm for the ofdm block. (a) State transitions: the block starts in q0 and moves to q1 once the Mode is found; SIGUSR1 returns it to q0, and EOF leads to q2. (b) States involved.

7.3

process flow

This block can be in one of two operating states: acquire robustness mode or track symbol boundaries. Once the robustness mode has been acquired, tracking of symbol boundaries begins. If an interrupt signal is received from another block, robustness mode acquisition starts over again. If the end of file marker is encountered, the process terminates. See Figure 7.1. All transitions in the process flow graph happen after the output buffer has been flushed.

7.4

estimating robustness mode

The robustness mode defines the timing constants and the number of data cells available1 in the drm ofdm symbols. Hence the robustness mode must be estimated before anything else can happen. The robustness mode cannot simply be read from the control channel: if the control channel were available, decoding would already be initiated and the robustness mode already known – without it, data extraction is not possible. A circular argument. Since ofdm demultiplexing is the first instance in the decoding chain where knowledge of the ofdm timing is needed, this is where estimation of the robustness mode belongs. The robustness mode of a drm signal can either be found by analysing the signal directly, or by guessing an arbitrary robustness mode (the set of modes is finite – and some are more likely given the carrier-frequency) and validating that the fac can be decoded. If the guess was wrong, the process re-iterates with a new guess. While the guessing method performs well if the first guess is right, it does require explicit


1 How many data cells are available also depends on the spectrum occupancy


[Figure 7.2 plots correlation (×10⁹) against sample index (×10³) for guesses of Mode A, Mode B, Mode C and Mode D]

Figure 7.2: Correlation outputs on the same input with different mode settings. The input in this case clearly appears to be mode C; the sample is Project QoSAM

feedbacks and timeouts to be set up, and the worst case tune-in time may be very long. Here we would like to avoid unnecessary feedback loops, since they can be hard to control. The more compelling method, analysing the signal directly, utilises the fact that the robustness modes have different timings for the useful part of the signal and the cyclic prefix. While this is what gives each mode its characteristic performance in the face of channel noise, it can also be used for robustness mode estimation. In this method we also guess a robustness mode, but the round-trip-time is smaller. Figure 7.2 shows one of the correlation functions, discussed in Section 7.5, under assumptions (guesses) of different robustness modes for the same input – in this case using Equation 7.2, but the particular correlation function is not important. It is easy to deduce from the figure which robustness mode applies to the input signal: the one with the deep valleys. By simply summing the valleys at symbol start points (the valleys of mode C in Figure 7.2) for each mode, the right mode must be the one with the lowest average. Naturally there are other methods than this – looking at the distance between peaks (or valleys) is another way. It is clear that only one of the four modes in Figure 7.2 will have a consistent average distance between the peaks. This average would approach the number of samples in Ts. Ties could be broken by examining the standard deviation (smallest σ wins).


We use the latter approach, looking at distances between peaks, with ties broken arbitrarily. Common to these techniques is that they apply an algorithm that must be developed anyway: start-of-symbol estimation. Another commonality is that these methods suffer from the random behaviour of the correlation parameters under the assumption of an incorrect robustness mode. This implies a small probability that the estimate of the robustness mode will be incorrect. In such situations a re-tune (SIGUSR1) would most likely find the correct mode. We simply note here that some combination of the two methods for mode estimation discussed above would be useful, since each of the methods has some probability of erroneous detection of the robustness mode; combined, they would be more capable of eliminating false positives. We will not deal with the concept of robustness mode detection further. When the robustness mode is found, the fft library is initialised for the estimated mode. This is the process of calculating all the twiddle factors for a particular fft-size.

7.5

symbol boundary synchronisation

The primary purpose of the guard interval as described in Section 2.4 is to prevent isi. If a cyclic prefix is applied in this guard interval it also mitigates ici. One more use for this interval is identification of ofdm symbol boundaries. Equation 7.1 shows a basic correlation function from which others can be derived. If we call the nth complex sample of the I/Q baseband signal r_n, then one way of correlating the signal would be that of Nee and Prasad [26], p. 80, expressed by (restated in discrete form here):

n_0 = n |_max(k_n) ,   k_n = [ Σ_{m=0}^{N_g} r_{n−m} ⋅ r_{n−m−N_u} ]_{n=1,…,N_u}    (7.1)

N_g = T_g ⋅ f_s is the number of samples per guard interval (i.e. the size of the sliding window), and N_u = T_u ⋅ f_s is the number of samples between symbols. m is a variable that indexes into the sliding window and n is the current sample number, while n_0 is the first sample of a symbol. That is, the product of the received signal and a copy delayed by the expected period T_u is integrated over a period of T_g; the process is depicted in a block diagram on Figure 2.6. The correlating function can, of course, be made in numerous ways; a different example is:

n_0 = n |_min(k_n) ,   k_n = [ Σ_{m=0}^{N_g} |r_{n−m} − r_{n−m−N_u}| ]_{n=1,…,N_u}    (7.2)


[Figure 7.3 sketches x(t) over time with the guard interval g_1 immediately preceding the symbol s_1]

Figure 7.3: Guard interval – also known as cyclic prefix, when the last part of the symbol data is copied into the guard interval. It ensures that even in the face of multipath delay an integer number of periods are present. The start of symbol is the point between g_1 and s_1; this is where the correlation is the highest.

Noting that the structure of Equation 7.1 and Equation 7.2 is a sum for each point in some input sequence, this naturally lends itself to be implemented as a sliding window over the same input: at each step adding the i-th sample and subtracting the (i − N)-th sample, where N is the window size. This reduces the running time per symbol from O(T_s T_g f_s²) to O(T_s f_s).

Equation 7.2 is chosen as correlation function. That is: to delimit the symbol we are looking for the minimum difference in values. Theoretically Equation 7.2 should be zero, but this is not the case when noise or latent carriers from previous symbols are present. Using the previously described sliding window technique we need to know when to terminate – when a symbol start has been identified. This applies to both the mode estimation routines and the tracking of symbol starts. For mode estimation we say that when roughly 2/3 of the symbol length has passed, the accumulators are reset. This is easy to implement and works well enough for a rough estimate of the number of symbols in a stream. For symbol tracking any error in offset directly affects the phase of the signal. Therefore the estimate of symbol bounds must not be too crude. Looking at the initial case we know that for a given robustness mode there must be a start of symbol (the transition between guard and symbol, see Figure 7.3) somewhere in the first T_s seconds, or equivalently n = T_s ⋅ f_s samples. Applying Equation 7.2 to each sample in the first n samples gives us the initial index, n_0, for the start of the first symbol (under the assumption that the robustness mode is guessed correctly). To find the next offset we could apply Equation 7.2 again for the next n samples, but because the timings are known, the contents of the symbol is simply skipped, i.e. we jump n samples ahead to where the next symbol is expected. With this new position, n_0 + n, the search for the optimal correlation value can be done in the ±δ samples around it. This is necessary to compensate for inaccuracies in sample rate: T_u ⋅ f_s may not be an integer. We set δ = 4 since it restricts the maximum size of the phase error somewhat, and allows for tracking of drifts in sample rate at both ends (encoder and decoder). The result of all this processing is a list of indexes of symbol starts for the buffered samples.

7.6

extracting sub-carriers

This crucial part of the processing is as easy as taking the fft of the input signal. The placement of the fft is at the most recently determined symbol boundary, as reported by the symbol tracking routines. The length of the fft must be the period2 of the useful part of the symbol, Tu, otherwise orthogonality is lost. Tu is defined by the robustness mode, and with a sample rate of fs = 48 kHz the lengths amount to some nice integers that are divisible by common fft radixes, as was outlined in Table 2.2, making the fft process efficient. For each symbol an fft is computed. This results in a list of complex, discrete frequencies. From this list the relevant carriers are extracted: from k_min to k_max. This is the only place in this block where the current spectrum occupancy of the drm signal comes into play: to determine the number of cells to extract and output. Note that until the fac is decoded by the decode process, the spectrum occupancy is assumed to be the minimum possible value for the current robustness mode. The result is that only the part of the signal containing the fac is forwarded. Once the fac is decoded, the right spectrum occupancy can be set, allowing the full spectrum to be forwarded. This implies that there is a moment of receiver “confusion” while the decoding is set up. Once the correct bandwidth is found, internal buffers in other blocks may contain data that was extracted and processed under the previous spectrum setting – not containing the right number of cells. This data will eventually be flushed through the system and discarded. The result of the ofdm demultiplexing is a sequence of ofdm cells. Ignoring channel effects and errors in symbol acquisition, these should be ready for demodulation. In practice these effects must be corrected first though.

7.7

summary

We have discussed design options for a processing block that transforms an IQ baseband signal into a stream of ofdm symbols. We have specified that mode detection will work using correlation values derived from the IQ stream, giving symbol lengths which in turn can be compared to known values. Symbol acquisition will be done by minimising a particular correlation function. Symbol extraction is done using the fft. The next step is to correct any phase and magnitude errors using known reference cells.

² In fact, any integer multiple of the period of the useful part of the symbol.



Chapter 8

CARRIER EQUALISATION AND FRAME ENUMERATION

8.1

introduction

Before the signal can be mapped back to digital data, it must be equalised and the frame boundaries identified; this block is responsible for just that. Once the frame bounds have been determined, with the help of the time reference cells, the symbols are enumerated relative to the frame bounds. Only once a frame bound is found can the channel estimation start, because the phases and positions of the scattered pilots are defined by the symbol number. A channel transfer function is estimated and the sub-carriers are equalised accordingly.

8.2

interface

The input of this block is the output of the ofdm block: a stream of extracted cells in complex format (real and imaginary parts interleaved, each a 32-bit signed little-endian integer), delimited at symbol boundaries by a magic word. The magic word is a special complex number (see Chapter 5). The output differs from the input only in that a symbol index (32-bit signed integer, little-endian) has been inserted immediately before each symbol – and, of course, in that all the cells have been equalised. If no frame bounds are found, there will be no output.


8.3

frame synchronisation

Phase Correction

With the symbol synchronisation strategy that has been laid out for the ofdm block (Chapter 7), timing errors will happen, resulting in phase changes of the extracted cells [26]. The relationship is:

    ∆φ_i = ω_i ⋅ τ                                                      (8.1)

Where ∆φ_i is the phase change of the i-th extracted sub-carrier, ω_i is the carrier wave's angular velocity, and τ is the timing offset. This problem is also encountered by Poulsen [30] and Pedersen [29], who solve it by interpolating with a linear least squares fit between the three continual pilots before further processing is done. The same approach will be taken here, until we find a better way of doing symbol synchronisation.

    β = [(∑_{i=1}^{3} (κ_i^(c))²) ⋅ (∑_{i=1}^{3} ∆d_i) − (∑_{i=1}^{3} κ_i^(c)) ⋅ (∑_{i=1}^{3} κ_i^(c) ⋅ ∆d_i)]
        / [3 ∑_{i=1}^{3} (κ_i^(c))² − (∑_{i=1}^{3} κ_i^(c))²]           (8.2)

    α = [3 ∑_{i=1}^{3} κ_i^(c) ⋅ ∆d_i − (∑_{i=1}^{3} κ_i^(c)) ⋅ (∑_{i=1}^{3} ∆d_i)]
        / [3 ∑_{i=1}^{3} (κ_i^(c))² − (∑_{i=1}^{3} κ_i^(c))²]           (8.3)

Where ∆d_i is the residual of the received cell with index κ_i^(c) (∆d_i = d_{κ_i^(c)} − ℵ_i^(c)), with ℵ_i^(c) being the known reference for continual pilot i, and κ_i^(c) being the index of the i-th continual pilot in the ofdm symbol¹ (e.g. κ_1^(c) is the index of continual pilot 1). The resulting coefficients can thus be used in estimating the timing error of the ofdm symbol as

    τ = α ⋅ i + β                                                       (8.4)

And all the cells d_i can be restored by rotating them back with ∆φ_i as seen in Equation 8.1 [26].

¹ Think: x value.
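A minimal sketch of this least squares fit over the three continual pilots, assuming the pilot indices and residuals are already available (function name and the floating-point types are ours; the receiver itself works in fixed point):

```c
#include <stdio.h>

/* Fit a line  delta_d = alpha * kappa + beta  through the three
 * continual-pilot residuals, per Equations 8.2 and 8.3. */
static void ls_fit3(const double kappa[3], const double dd[3],
                    double *alpha, double *beta)
{
    double sk = 0, sk2 = 0, sd = 0, skd = 0;
    for (int i = 0; i < 3; i++) {
        sk  += kappa[i];               /* sum of pilot indices        */
        sk2 += kappa[i] * kappa[i];    /* sum of squared indices      */
        sd  += dd[i];                  /* sum of residuals            */
        skd += kappa[i] * dd[i];       /* sum of index * residual     */
    }
    double den = 3.0 * sk2 - sk * sk;  /* common denominator          */
    *beta  = (sk2 * sd - sk * skd) / den;   /* Eq. 8.2 */
    *alpha = (3.0 * skd - sk * sd) / den;   /* Eq. 8.3 */
}
```

With exact line data the fit recovers the slope and intercept, which is all the timing estimator of Equation 8.4 needs.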


Synchronisation

The first symbol of a frame comprises a series of special time reference cells, inserted for frame synchronisation. A correlation based approach is an obvious way to go, where the symbol whose cells resemble the references the most is detected:

    n₀ = argmin_n ∑_{i=1}^{∣κ^(t)∣} ∣∠d_{n,κ_i^(t)} − ∠ℵ_i^(t)∣        (8.5)

Here n is the symbol index; assume that a buffer holds as many symbols as a frame contains (i.e. n = 1, 2, . . . , ⌈400 ms/T_s⌉). The expression ∠d_{n,κ_i^(t)} is the phase of cell number κ_i^(t) of symbol n, and ∠ℵ_i^(t) is the known phase of time reference number i. The advantage of only considering phases is that the cells do not have to be scaled, so we can postpone the scaling to the equalising.

A different approach is to complex conjugate the reference, ℵ_i^(t), multiply it by the cell, d_{n,κ_i^(t)}, and maximise the result:

    n₀ = argmax_n ∑_{i=1}^{∣κ^(t)∣} ∣d_{n,κ_i^(t)} ⋅ (ℵ_i^(t))*∣       (8.6)

This has the advantage that it relies solely on complex operations, whereas finding the phase (as in Equation 8.5) requires an arctangent function call. The dream [16] drm receiver uses a different approach, where it only examines pairs of consecutive pilot references.

8.4

channel estimation

Knowing the symbol number, the expected scattered pilots, ℵ_n^(s), n ∈ S, can be looked up, and the channel estimation can begin. The transfer function H(n) is found from the scattered pilots in the current symbol:

    H(n) = d_{κ_n^(s)} / ℵ_n^(s)   for n ∈ S
    H(n) = undefined               for n ∉ S                            (8.7)

Where d_{κ_n^(s)} is the n-th observed scattered pilot, and S spans the set of all scattered pilots for the current symbol. The density and pattern of the scattered pilots are defined by the robustness mode, and can be seen in Table 8.1 and again in Figure 8.1. We learn about the channel's behaviour at particular points, according to the scatter pattern of the mode.


Mode    Frequency recurrence, ρ_r    Symbol recurrence    Sub-carrier coverage
A       20                           5                    20/5 = 4
B       6                            3                    6/3 = 2
C       4                            2                    4/2 = 2
D       3                            3                    3/3 = 1

Table 8.1: The recurrence (reciprocal density) of scattered pilots. The coverage column shows how many of the total carriers that at some point become references (e.g. for mode A it is every 4th).

[Figure 8.1: Patterns used for scattered pilots, drawn on time (t) versus frequency (f) axes with spacings ρ_t and ρ_r marked: (a) pattern for mode B, (b) pattern for mode D. In mode D, every carrier becomes a reference every third symbol.]

The optimal way of finding the channel's transfer function is to use a statistical approach, so that noise in the scattered pilots gets filtered out and suppressed. This can be done with a Wiener filter [26] [29] [16], and that is how dream does channel estimation by default. However, since this is supposed to run on an embedded microprocessor, we would like to do without the heavy matrix inversions the Wiener filter requires. Therefore we stick to trivial interpolation, which can later be optimised or replaced by an adaptive filter. The channel interpolation techniques that are implemented are listed below:

▶ 1D Interpolation - no inter-symbol interpolation
  ⊳ Zero-padding inter-carrier interpolation (-I dft command line switch).
▶ 2D Interpolation - all inter-symbol interpolation is bi-linear
  ⊳ Bi-linear inter-carrier interpolation (-I linlin command line switch).
  ⊳ Zero-padding inter-carrier interpolation (-I lindft command line switch).


By zero-padding, we mean the process of inverse Fourier-transforming the uninterpolated transfer function to obtain its impulse response, zero-padding between the samples until the number of samples corresponds to the number of cells in a symbol, and transforming it back to the frequency domain with the Fourier transform. The 2D linear approach (-I linlin) is the default setting in the developed program.

1D Interpolation

The only form of 1D interpolation that is implemented uses the zero-padding technique. It is based on a combination of Poulsen [30] and Fischer [16]. Zero-padding will give a very smooth interpolation, albeit it will not filter noise out. The basic algorithm outline is as follows:

1. Say we have 34 scattered pilots in a symbol with 206 sub-carriers (mode B/10 kHz).
2. We find H(n) in such a way that the 34 values are placed at n = 1, 2, . . . , 34.
3. A windowing function is applied to minimise the effect of aliasing.
4. We transform H(n) to the time domain, using an inverse fast Fourier transform. This yields the impulse response.
5. We zero-pad between the samples of the impulse response until we have 206 + k carriers; the k extra carriers are needed because the scattered pilots are not aligned.
6. The window is undone again.

2D Interpolation

When 2D interpolating, a scheme is chosen where bi-linear inter-symbol interpolation is done before inter-carrier interpolation. This involves assembling the channel estimates from several symbols to form a matrix A:

    A = ⎡ d_{1,1}/ℵ_{1,1}^(s)   h_{1,2}               ⋯   d_{1,n}/ℵ_{1,n}^(s) ⎤
        ⎢ h_{2,1}               d_{2,2}/ℵ_{2,2}^(s)   ⋯   h_{2,n}             ⎥
        ⎢ h_{3,1}               h_{3,2}               ⋯   h_{3,n}             ⎥
        ⎣ d_{4,1}/ℵ_{4,1}^(s)   h_{4,2}               ⋯   d_{4,n}/ℵ_{4,n}^(s) ⎦      (8.8)

Where d_{i,j}/ℵ_{i,j}^(s) is the channel estimate for the i-th symbol and the j-th carrier. The above example looks like a mode D scatter, and the values h_{2,1} and h_{3,1} would be found by interpolating between d_{1,1}/ℵ_{1,1}^(s) and d_{4,1}/ℵ_{4,1}^(s).

Next the inter-carrier interpolation is done. This is either done using zero-padding as shown in the previous section, or using bi-linear interpolation.

8.5

equalising

The compensation process itself is elementary once the transfer function for the transmission medium is found:

    d̂_n = d_n / H(n)                                                   (8.9)

Where d̂_n is the estimate of the transmitted cell d_n.

8.6

summary

The techniques involved in this design can be improved substantially. First off, the phase compensation can be eliminated completely if the symbol (timing) synchronisation in the ofdm process is improved; then the equalisation does not have to be carried out in two steps. The equaliser can be improved by using a statistical filter, at the cost of computational complexity. A compromise could be made: instead of using an adaptive filter, one could average the channel history over a period. This would smooth the channel's transfer function without the need to calculate new filter coefficients all the time.



Chapter 9

CHANNEL DEMODULATION AND DECODING

9.1

introduction

The theoretical considerations described in Chapter 3 naturally sculpt the design we do here. But the design is primarily defined by the decisions and definitions made in the drm specification – we have to “reverse” the effects of the encoding, interleaving and modulation. This block demodulates and decodes the channel information and generates a steady stream of source data. The source data must then in turn be decoded and presented, which is beyond the scope of this processing block.

Demodulation and decoding have been brought together in one processing block to lessen the restraints of inter-block synchronisation. Should it be split into three “horizontal” blocks – one for each logical content channel – dependency issues would arise¹ and additional data flows would be necessary between these blocks. This would seriously increase the difficulty of configuration in both testing and deployment, since the number of possible block interactions rises rapidly, and for each block interaction a Unix pipe would need to be set up.

The decoding process could also be split into several “vertical” pieces, with each piece representing a sub-block of processing, e.g. qam-demodulation. This also introduces dependencies: qam-demodulation of the sdc and msc logical channels can not begin until the constellation has been determined (from the fac). The result is a system that may be easy to debug, but needs very large buffers (and not least referencing between these) to hold data while the previous logical channel is being decoded.

¹ The fac is needed for the sdc, which in turn is needed before the msc.


Large parts of the demodulation and decoding process are similar but varied. Splitting this into a library would result in very tight coupling: the library would need to know the type of channel that is being decoded, and the library and the program applying it would need large amounts of domain knowledge about each other's workings anyway. Therefore it makes as much sense to simply bundle the decoding library with the decoder. The result of these considerations is a design where demodulation and decoding are kept in a single block and data is input and output as a single stream. This gives the overall system layout a consistent design: each block holds an input and an output. It does ease the development somewhat, since all dependencies are handled internally, but it also seems “cleaner”: the user should not care that the fac must be decoded first.

9.2

interface

The decoder reads a stream of frames. These frames are constructed as a sequence of enumerated symbols: 0 . . . ⌈400 ms/T_s⌉. Each symbol is preceded by the magic word, then the symbol number, followed by the sequence of symbol cells. The number of cells in the symbol is dictated by the current mode and spectrum occupancy as reported by the shared module library. The input stream is discarded until a first symbol is seen and the stream is synchronised. The output of the decoding process is a stream of source data – the entire multiplex stream. In order to limit the scope of this block, the multiplex stream will be demultiplexed elsewhere. To do this, information from the sdc is required, hence the sdc is included as well. Finally there is a special case if a hierarchical stream is present (if mode was hmsym or hmmix), where this must be signalled in the output, since it is not possible to detect from the sdc whether or not hierarchical modulation is used.² In summary the output then consists of: magic word, hierarchical flag, length of the sdc, length of the multiplex frame, sdc content and finally msc content.
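The output record just described could be sketched as the following header struct; the field names and the fixed 32-bit widths are our assumptions for illustration, since the actual on-wire layout is defined by the implementation:

```c
#include <stdint.h>

/* Hypothetical layout of one output record of the decoder block,
 * matching the order described above: magic word, hierarchical flag,
 * sdc length, multiplex-frame length, then the sdc and msc payloads. */
struct decoder_out_header {
    uint32_t magic;        /* magic word delimiting records            */
    uint32_t hierarchical; /* non-zero if hmsym/hmmix was signalled    */
    uint32_t sdc_len;      /* length of the sdc content, in bytes      */
    uint32_t msc_len;      /* length of the multiplex frame, in bytes  */
    /* followed by sdc_len bytes of sdc, then msc_len bytes of msc     */
};
```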

9.3

process flow

The decoding process must read in an entire frame at a time. Each frame contains fac data which enumerates the frames. This way it is possible to synchronise the stream to the start of a super frame by discarding incorrectly numbered frames. Once the start of a super frame has been identified, the sdc can be decoded, since setup information for this stream can also be retrieved from the fac. The sdc is placed in the first symbol of the first frame of the super frame. This implies that the size of the sdc is dictated by mode and spectrum occupancy.

² The content of sdc entity type 0 must be interpreted differently if hierarchical modulation is used.


A correct decoding of the sdc gives the stream information needed to set up the msc decoder: whether unequal error protection is used, what the respective lengths of the higher and lower protected parts are, and the error codings used for each level and for the higher and lower protected parts. This is all given in sdc entity type 0. Whether the signal is hierarchically modulated is known from the fac. The only information this processing block needs from the sdc is thus given in entity type 0. It has however been chosen to abstract this away in a more generic sdc decoder library, such that the next processing block (payload) is also able to decode this and other relevant parts of the sdc. With the msc decoder set up, decoding of each of the multiplex frames can begin. There are 3 multiplex frames in a super frame. Note that these are not the same as the transport frames, which contain reference cells and fac and sdc cells as well. In fact the multiplex frames are not aligned to the boundaries of transport frames; they are aligned to the boundary of a super frame. The decoding process is very similar for all the logical channels fac, sdc and msc: first demodulation, followed by deinterleaving, depuncturing, Viterbi decoding and demultiplexing. These steps are alike for all channels, the difference being (primarily) the presentation and application of the data that follows these steps. This means that the steps described below are done identically for all logical channels unless otherwise stated.

9.4

creating logical channels

Since so many steps are similar in the decoding of each channel, it makes sense to abstract away (as much as possible) from which channel is being decoded. If we recall that qam-4 sm can be considered a special case of multi-level coding, with only one level, we can create an abstract view of the decoding process as multi-level decoding with an arbitrary number of levels. To this end we construct a single logical channel structure that holds all relevant information for any of the decoding steps to determine what to do. This structure needs to have information on coding rates for each level and buffers used to store data values. Creating constructors (and destructors) for these structures ensures that individual processing points, such as deinterleaving, can be agnostic about the logical channel they are manipulating. The individual operational blocks need only know information such as lengths, code rates and boundaries. In fact constructors and destructors are used heavily where it makes sense – this enables us to create a hierarchy of data structures. An example of this can be seen in Figure 9.1.
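A sketch of such a structure and its constructor/destructor pair (names and fields are illustrative, not the project's actual definitions; error handling is trimmed):

```c
#include <stdlib.h>

/* One entry per multi-level-coding level, so qam-4 sm is simply the
 * nlevels == 1 special case. */
struct level {
    int num, den;        /* code rate num/den for this level   */
    signed char *soft;   /* soft-bit buffer for this level     */
    size_t nsoft;        /* number of soft bits in the buffer  */
};

struct logical_channel {
    int nlevels;
    struct level *lvl;
};

static struct logical_channel *lc_new(int nlevels, size_t nsoft)
{
    struct logical_channel *lc = malloc(sizeof *lc);
    if (!lc)
        return NULL;
    lc->nlevels = nlevels;
    lc->lvl = calloc((size_t)nlevels, sizeof *lc->lvl);
    for (int i = 0; i < nlevels; i++) {
        lc->lvl[i].soft = malloc(nsoft);
        lc->lvl[i].nsoft = nsoft;
    }
    return lc;
}

static void lc_free(struct logical_channel *lc)
{
    for (int i = 0; i < lc->nlevels; i++)
        free(lc->lvl[i].soft);
    free(lc->lvl);
    free(lc);
}
```

A deinterleaver or depuncturer can then operate on any `struct logical_channel` without knowing whether it holds the fac, sdc or msc.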


[Figure 9.1: Object interaction between input, output and temporary structures (Cells → Symbols → Frame → Super Frame, yielding FAC, SDC and MSC). The pink blocks in the diagram represent structures that can be formed directly from input data, the green blocks represent “the glue”, and the orange blocks are decoded data ready for output.]

Figure 9.1 illustrates how data is treated in this process – it is to be read bottom up. Data is read in (the pink blocks) and forms frames. Once a frame is formed the fac can be decoded. Frames, in turn, form super frames. Once a super frame is generated the sdc and msc can be decoded. Naturally all decoding must access the cells read from the input as the first part of the decoding procedure.

9.5

decomposing the frame

One of the things that is unique to each logical channel is how to identify the cells belonging to that channel. For the fac this is done with table lookups (Tables 88–91 in the drm specification [13]). The sdc cells are all the non-reference cells in the first symbol of the first frame in a super frame. For the msc it is all the non-reference, non-control cells in a super frame. Cell locations with respect to symbol and frame numbers are needed both in this block and in the equaliser block: in this block we discard the reference cells; in the equaliser the references are what is relevant. Therefore it makes sense to create an interface that allows identification of cell type based on position in the super frame. This interface is included in the drm parameters library. Given that the logical channel is set up with the correct constellation, demodulation can commence.

Handling Decoded Content

For each logical channel the content must be handled differently: the msc content must be sent to the payload decoder. The sdc data is used later and should be cached (in its entirety), so it too can be forwarded to the payload decoder. The content of the fac is used to set up both the sdc and msc channels, so it too must be cached. Further, the fac is used to signal the true spectrum occupancy once it is first decoded.


9.6

cell demodulation

Since the constellation values on each axis are symmetric and equispaced we can, as previously discussed, divide each qam symbol into two parts, the real and the imaginary. This reduces the complexity of the demodulation by practically treating the signal as two identical √M-pam signals. Given a received symbol, s, from an M-ary³ rectangular qam constellation, Q, we can derive a confidence value, λ, for each bit modulated onto s. The closer λ is to −1, the more likely it seems that the value of s on that level is 0, and conversely for 1. λ is derived in the following way. Let Λ be the set of levels available for the given modulation, i.e. for qam-64 hmsym Λ = {L0, L1, L2, L3, L4, L5} with l = 6. Then let L_p ∈ Λ be the current demodulation level. L_{l−1} has the property that it decides the most significant bit(s)⁴ of the demodulated code-word, that is: the one that decides the first quadrant. L_{l−2} decides the next bit, and so on and so forth. We start the demodulation with level L_{l−1}. For each axis, k ∈ {R, I}, we define a shifted origin with respect to level L_{p−1}, based on level L_p, as

    o_k^{L_{p−1}} = o_k^{L_p} + Ξ_k^{L_p} ⋅ ((c_Q^{L_p} + 1) / 2)       (9.1)

Where L_{p−1} is taken to mean the next level and Ξ the sign of the constellation point with respect to the shifted origin (see Equation 9.2). Furthermore c_Q^{L_p} is a factor we call the coordinate scale (defined below for level L_p and constellation Q). The shifted origin is 0 at the first iteration, i.e. o^{L_{l−1}} = 0 + 0j always.

    Ξ_k^{L_p} = (s_k + ∣o_k^{L_p}∣) / ∣s_k + ∣o_k^{L_p}∣∣               (9.2)

This simply extracts the sign of the received symbol point with respect to the current shifted origin and discards the magnitude. We use it as a short-hand in the following and above. Equation 9.1 implies that on level two of a 64-qam sm constellation, the origin must be shifted to ±4a_{qam-64} ± 4a_{qam-64}j depending on the value of s. The coordinate scale is then defined as c_{qam-64 sm∪hmsym} = (1, 3, 7), c_{qam-64 hmmix} = (1, 1, 3, 3, 7, 7), c_{qam-16} = (1, 3) and c_{qam-4} = (1) for the respective constellations. I.e. p = 3 ⟹ c_{qam-64 sm}^{L_p} = 7.

³ M in our case is 4, 16 or 64.
⁴ For most modulations this is the first two bits. For hmsym it is only the most significant bit.


We can now calculate a metric for the reliability of the symbol position:

    λ_k^{L_p} = −q ⋅ Ξ_k^{L_p} ⋅ ∣∣s_k∣ − ∣o_k^{L_p}∣ ⋅ a_Q∣ / (c_Q^{L_p} ⋅ a_Q)      (9.3)

where q is a constant scaling factor that makes the soft bit representable in a finite number of bits, o_k^{L_p} is the shifted origin at level p, a_Q is the normalisation factor for constellation Q, and c_Q^{L_p} is the coordinate scale. Thus for each symbol two reliability estimates, λ_R^{L_p} and λ_I^{L_p}, are calculated for each level.

For our purpose the value of q is set to 127, as this allows each soft bit to be represented as a single octet (byte). Even though this is a higher resolution than the Viterbi decoder needs⁵, it offers numerous advantages over implementing each soft bit as a nibble (4 bit), especially when it comes to interleaving. After this we repeat for level p − 1, where p − 1 is taken to mean the level that decides the bit after p. The result is l different streams of reliability estimates. We call these confidence values soft bits. Each level contains 2N soft bits at this point, where N is the number of cells in an ofdm symbol.⁶

Figure 9.2 shows how a search for a point in a qam-64 constellation could look. The symbol is −5a − 3aj. The search can be thought of as successively dividing the complex plane into smaller quadrants, until the soft bit value of s is filled in all levels. If the constellation applies hierarchical modulation, this information is simply put as the first level (0, even though it is demodulated first). The logical channel is then marked as hierarchical, level 0 is treated as if it was equal error protected, and the codings that apply to hierarchical frames must be applied.

9.7

deinterleaving

Deinterleaving is done for each level in each logical channel, while ensuring that the higher protected part is kept apart from the lower protected part. This has the result that if unequal error protection is used, permutation must be done in two rounds. The permutation tables can be cached and only recomputed once channel reconfiguration takes place; however, this is considered an optimisation that can be postponed until the size of the overhead is clear. In the hierarchical case only hmsym needs special care, since the input length is one soft bit per cell instead of the usual two soft bits per cell.

⁵ Literature suggests that 3–4 bits are adequate [19].
⁶ hmsym will contain only N soft bits per level since it has three levels for each axis.


[Figure 9.2(a): Symbol search in the qam-64 sm constellation defined in the drm specification. The grey marks are intermediate, shifted origins; the red mark is the current symbol.]

    p    (o_R^{L_p}, o_I^{L_p})    c_{qam-64}^{L_p}    (λ_R^{L_p}, λ_I^{L_p})
    2    (0, 0)                    7                   (5/7, 3/7)
    1    (4, −4)                   3                   (1/3, −1/3)
    0    (2, −6)                   1                   (−1, 1)

Figure 9.2(b): Numerical values for the example in (a). The correct demodulated value is 011011 and this is achieved.

Figure 9.2: Example of symbol search in a qam-64 (Standard Mapping) constellation.
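As a sanity check of Equations 9.1–9.3, here is a per-axis sketch that reproduces the λ values of Figure 9.2(b) for the symbol −5a − 3aj. It is our own simplification: it tracks a signed origin and uses s − o in place of the ∣o∣ short-hand of Equation 9.2, with q = 1 and a_Q = 1 for readability, whereas the receiver scales to q = 127 in fixed point:

```c
#include <math.h>

/* Successive-quadrant soft demodulation of one PAM axis (sketch of
 * Equations 9.1-9.3, q = 1, a_Q = 1).  c[p] is the coordinate scale for
 * level p, e.g. {1, 3, 7} for qam-64 sm; lambda[p] receives the soft
 * value for level p. */
static void pam_soft_demod(double s, const int c[], int levels,
                           double lambda[])
{
    double o = 0.0;                              /* shifted origin, 0 at first level */
    for (int p = levels - 1; p >= 0; p--) {
        double xi = (s - o) >= 0.0 ? 1.0 : -1.0; /* sign w.r.t. origin (Eq. 9.2) */
        lambda[p] = -xi * fabs(s - o) / c[p];    /* reliability metric (Eq. 9.3) */
        if (p > 0)
            o += xi * (c[p] + 1) / 2.0;          /* move origin into quadrant (Eq. 9.1) */
    }
}
```

Running it on the real axis (s = −5) and the imaginary axis (s = −3) yields (5/7, 1/3, −1) and (3/7, −1/3, 1) for levels 2, 1, 0, matching the table above.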

Cell deinterleaving must be done separately for the msc, i.e. between cell extraction and demodulation.

9.8

depuncturing

The predefined coding setting for a logical channel maps a particular code rate to each level of that channel. As previously described, the perforation is what determines the effective code rate. This means that for each level, in each logical channel, a different pattern must be used. These erasure patterns are mapped to the individual levels during creation of the logical channel. The drm standard employs three different cases of puncturing that must be handled individually:

▶ Hierarchical levels must be depunctured using their own puncturing patterns. This mode applies the same puncturing pattern to both parts – similar to Equal Error Protection, eep.
▶ The higher protected part is (most often) punctured with a separate set of puncturing patterns from that of the lower protected part.
▶ The tail bits (the last 24 bits) are punctured with a special pattern.
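A sketch of the depuncturing step itself: punctured positions are refilled with neutral soft bits (value 0) so the Viterbi decoder treats them as erasures. The pattern below is a made-up example, not one of the drm puncturing tables:

```c
#include <stddef.h>

/* Depuncture nin soft bits into nout positions.  pattern[] repeats with
 * period plen: 1 means the bit was transmitted, 0 means it was punctured
 * and is refilled with the neutral soft value 0.  Returns the number of
 * input soft bits consumed. */
static size_t depuncture(const signed char *in, size_t nin,
                         const unsigned char *pattern, size_t plen,
                         signed char *out, size_t nout)
{
    size_t i = 0;                    /* consumed input soft bits */
    for (size_t o = 0; o < nout; o++) {
        if (pattern[o % plen]) {     /* transmitted: copy through   */
            if (i >= nin)
                return i;            /* input exhausted             */
            out[o] = in[i++];
        } else {
            out[o] = 0;              /* punctured: insert erasure   */
        }
    }
    return i;
}
```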

9.9

viterbi decoding

For the purposes of this project, an implementation of the Viterbi algorithm was handed down to us from Mikael Dich of liab. This soft-decision decoder generates the most likely sequence of inputs given a sequence of soft bits and the polynomials of the encoder that produced them. The Viterbi decoder only needed slight modification in order to correctly decode the data in each level in our setting. Decoding can then be carried out for each level in the logical channel individually. After decoding, the levels are merged to form the logical channels that make up the multiplex frame. All that remains is to re-disperse the energy by XOR'ing the entire logical frame with the pseudo-random binary stream.
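The final energy re-dispersal is a plain XOR with a PRBS. The sketch below assumes the generator polynomial x⁹ + x⁵ + 1 with an all-ones initial state (check these parameters against the drm specification before relying on them); since XOR with a fixed stream is an involution, applying it twice restores the input:

```c
#include <stdint.h>
#include <stddef.h>

/* Energy-dispersal (de)scrambler sketch: XOR the bit stream with a PRBS.
 * Assumed generator: x^9 + x^5 + 1, 9-bit shift register, all ones at
 * start.  bits[] holds one bit per byte for clarity. */
static void disperse(uint8_t *bits, size_t n)
{
    uint16_t reg = 0x1FF;                            /* all-ones initial state */
    for (size_t i = 0; i < n; i++) {
        uint8_t prbs = ((reg >> 8) ^ (reg >> 4)) & 1; /* taps at stages 9 and 5 */
        reg = (uint16_t)(((reg << 1) | prbs) & 0x1FF);
        bits[i] ^= prbs;                             /* XOR with pseudo-random stream */
    }
}
```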

9.10

summary

In this section we have discussed the design of the decoder block. This block takes a series of enumerated symbols and recovers a stream of source bits. These source bits must now be manipulated to recover the original audio data.



Chapter 10

PAYLOAD DECODING

10.1

introduction

There are three payload decoder blocks available. One presents only the sdc information, another decodes and plays audio, and the last intercepts drm data units and saves them in temporary files. These blocks can be used in combination to decode the main content of a drm stream. For example, if a drm stream has two streams, an audio stream (service 0) and a slide show (service 1), and we want to retrieve the slide show images, hear the music and read the sdc information, we can do it by chaining the processes together, i.e. payload_audio -s0 | payload_data -s1 | payload_sdc for unix-like command shells. This works because the payload processes repeat their input as output.

10.2

interface

The payload processes share the same interface (i.e. they are abi compatible), so the following applies to all of them. The input consists of both sdc and msc entities, of lengths ∣sdc∣ and ∣msc∣ respectively. Because there are three msc frames and only one sdc entity per super frame, the sdc entity is repeated for all the msc frames. The input stream thus consists of an sdc entity and a logic frame. The lengths of the respective entities are prepended before the entities. This sequence is then preceded by the magic word, M, that was also used by the previous blocks, in order to have a simple protocol structure.


All in all, this comes to: ⟨M, ∣sdc0 ∣, ∣msc0 ∣, sdc0 , msc0 , M, ∣sdc0 ∣, ∣msc1 ∣, sdc0 , msc1 , . . .⟩ The output stream is an identical replica of the input stream, so the payload blocks can be combined. As a side product, there is some output in the form of pcm samples (sound), files (data) or text (sdc). This is either directed to output devices (sound card/terminal) or stored in files.

10.3

audio decoding

Our focus has mainly been on the decoding of aac streams. This is the codec intended for musical broadcasts, and it is also the codec with which all of the test signals in stock (at least all those that contain audio) are encoded. The main reason we have used this codec exclusively is that a free implementation is available, and it can be compiled to use fixed-point arithmetic only. The other codecs proposed in the drm specification are celp and hvxc.

Extraction & Decoding

An aac frame consists of either 5 or 10 aac sub-frames, according to the sample rate of the aac stream (12 kHz and 24 kHz respectively). Before these sub-frames there is a header with a list of boundary positions of the sub-frames. The extraction process for aac streams involves copying each sub-frame and its checksum and invoking the codec. This produces a pcm stream that can be played back by a sound card.
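The slicing step can be sketched as follows, assuming the border list has already been parsed from the header (the struct and function names are ours, and checksum handling is omitted):

```c
#include <stddef.h>

/* Illustrative sub-frame extraction: the header yields the start offsets
 * ("borders") of sub-frames 2..n within the payload; sub-frame 1 starts
 * at offset 0 and the last one runs to the end of the payload. */
struct subframe {
    const unsigned char *p;
    size_t len;
};

static void split_subframes(const unsigned char *payload, size_t len,
                            const size_t *border, int n, /* n sub-frames */
                            struct subframe *out)
{
    for (int i = 0; i < n; i++) {
        size_t start = (i == 0) ? 0 : border[i - 1];
        size_t end   = (i == n - 1) ? len : border[i];
        out[i].p = payload + start;
        out[i].len = end - start;
    }
}
```

Each resulting slice (minus its checksum) is what gets handed to the aac codec.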

Playback

We will not worry too much about this; there exist many frameworks for playing pcm streams under linux, such as alsa and oss, or higher-level ones like jack and pulse. The final choice should only be made when the system is running on the target platform, though it should be noted that an oss kernel module is shipped with the nanoliab. For demonstration purposes the alsa framework is used to show that audio playback works; the configuration is not thought through or optimised – however, it works.


The codec is poorly documented, to say the least, and some quirks were encountered during development. For instance, it was found that a few of the batch signals we have collected need re-sampling when they are output from the aac codec; more specifically, they must be re-sampled from 24 kHz to 48 kHz to sound right. Others, however, played fine directly from the codec. A simple re-sampling routine was made, using bi-linear interpolation. This could also have been solved by configuring alsa to assume a different sample rate.

Text Application

Audio frames may carry four bytes of text data at the end. The text is coded in utf-8, so if the symbols are from the standard ascii alphabet, it amounts to a text rate of

    4 ⋅ (16/20) ⋅ (1 / 0.4 s) ≈ 8 baud

The factor of 16/20 adheres to the fact that at least 4 bytes out of 20 are used for header and crc sums.

Whether text messages are embedded in the audio stream or not is signalled in the sdc (entity type 9).

10.4

data decoding

A data decoder was implemented that is capable of receiving and decoding drm data units, an abstraction level higher than drm packets. The process payload_data works by saving each new data unit in a temporary file (/tmp/random-name). However, it was soon found that the data unit abstraction layer is not sufficient to decode the data from the batch signal files. Some of the batch signals contain slideshow applications and one contains written news (a format known as journaline). This is because a further abstraction layer is used when transmitting these applications, namely the mot [9] carousel, which was briefly mentioned in the theory chapter. We did not consider data applications important enough to implement a mot carousel, though the work has begun.


10.5

sdc presentation

A trivial process has been made that prints the content of sdc streams; there is not much to say about this.

10.6

summary

Three processes were designed and implemented to present the source content to the user.

▶ payload_audio Uses alsa to play back aac streams, and extracts and presents embedded text messages – it can optionally output a pcm stream to a file instead of playing it.
▶ payload_data Creates a file in /tmp/ for each data unit that is received – this has been found to be quite a lot in practice.
▶ payload_sdc Prints the content of the sdc stream (only the implemented entity types).



PART III DECODER PERFORMANCE AND CHARACTERISTICS


Chapter 11

TEST STRATEGIES

11.1

introduction

To validate that the receiver works, and in order to catch bugs, a set of tests was designed. Most of these tests are based on the process that would be used for normal radio listening, but some separate unit tests were also created to test individual functionality of particular blocks. The synchronisation between blocks is very important: when a block detects a new robustness mode or spectrum width, the other blocks must take that into consideration. The decoder is tested against a set of well-known signals (found on the internet) that can be decoded by the rival dream.

11.2

signals used for analysis

Ideally, we would have had a radio front-end capable of receiving some of the European drm broadcasts that are already on the air. However, we never succeeded in receiving anything with our Elektor front-end, except for a single am station. Without a functioning front-end, we had two other options for testing: using sampled signals from the real world, or synthesising signals and playing them in real time. Both options were tried.


filename on usb image                 timestamp           source      ok    streams aud/dat
dream_b3_rtl_slideshow                2004-12-06 14:31    dream ‡     !     A(0) D(1)
dream_b3_bouquet_flevo_NL             2004-01-09 11:04    dream ‡     !     A(0)
winradio_b3_luxembourg_rtl            2004-07-24 22:59    winradio †  !     A(0)
winradio_b3_deutsche_welle                                winradio †  !     A(0)
dream_b3_deutsche_welle               before 2005-05-13   dream ‡     !     A(0)
dream_a2_deutschlandradio             before 2005-05-13   dream ‡     %     A(0)
dream_b3_rtl                          before 2005-05-13   dream ‡     !     A(0)
dream_b3_voice_of_russia              before 2005-05-13   dream ‡     !     A(0)
dream_a3_synthetic_webcam             before 2005-05-13   dream ‡     !     D(0)
dream_c3_project_qos                  before 2005-05-13   dream ‡     !     A(0)
dream_c3_rnwb                         before 2005-05-13   dream ‡     (!)   A(0)
dream_b3_deutsche_welle_journaline                        dream ‡     !     A(0) D(1)
fredan_b3_deutsche_welle1             2002-10-03 09:45    fredan ◇    (!)   A(0)
fredan_b3_deutsche_welle2             2002-10-03 10:45    fredan ◇    (!)   A(0)

† http://www.winradio.com/home/g303-drm.htm
‡ http://drm.sourceforge.net/
◇ http://ftp.fredan.se/drm/

Table 11.1: The signals we have analysed. The timestamps were extracted from the sdc where possible; not all the signals contain a timestamp, so for some we have written the date they were uploaded instead.

Figure 11.1: Laboratory test-bench for streaming (real time) tests. (Block diagram: a virtual front-end – music files or a dab radio audio feed – drives a pc running spark; its sound card output (f s = 48 kHz, stereo coaxial cable) feeds the sound card (f s = 48 kHz, mono coaxial cable) of a pc running our receiver.)

Recorded if Signals Sampled drm signals are not easy to come across. Three sources of if streams have been found (the same as seen in Table 11.1). These files are also included on the accompanying usb-disk. The reason why these signals are such a scarce resource on the internet might be that they cannot be compressed by a lossy audio codec: lossy codecs remove (hence lossy) parts of the spectrum that the human ear cannot perceive, but which are crucial to the decoding of a drm transmission.


Real Time Signals The experiment that is set up in Figure 11.1 was carried out with one pc running a free¹ version of the German drm encoder spark [15] – the very same encoder that the Kalundborg Langbølge transmitter has been using for its drm field tests – and another pc running the developed drm decoder. Note the overwhelming number of a/d–d/a junctions in the signal path of Figure 11.1. The dream drm decoder can also be used as an encoder, and some tests were done with this configuration too, although it does not have as many configuration settings as spark, and we were getting tired of tweaking its source code.

11.3

test execution

Batch Testing The purpose of this test is to go through all the signals of Table 11.1. Whether they can be fully decoded or not is signified by the “OK” column. Files with a “!” symbol can be fully decoded, including the source audio. Files with a “(!)” can be decoded, but the codec is not able to decode the source audio.

The dream_a2_deutschlandradio sample does not decode; the decoder is not even able to find a single frame-start. This file is decodable by dream. It should be noted that because dream uses the same codec, it is also not able to decode the audio of the streams marked “(!)” in the “OK” column.

Continuance Testing The purpose of this test is to configure the real time test depicted in Figure 11.1, let it run for as long a period as possible, and then see if it will crash or run out of memory. To stress the decoder (and simulate signal dropout) the connection between the two pcs is disconnected at random times, and then reconnected again.

Results The process described above was carried out between 2010-01-22 and 2010-01-23, with poor results. The first thing to break down was the spark transmitter, but this is most likely due to the fact that spark is a Java program and the pc running it is somewhat outdated (and running Windows).

¹ As in a no-cost, non-commercial license.



The next event happened consistently in the following four tests: the receiver chain stopped after having run for approximately 2½ hours. It turned out that this was due to a memory leak in the payload_audio process that made the process claim ever more memory, in the end forcing the operating system to kill it (after 2½ hours). The bug was fixed, and the receiver can now relay DR P3 uninterrupted from a dab radio for 6 h 12 m. But then it dies, and we have not found that bug yet. This bug also occurs consistently, but the fix seems to be less trivial than freeing some memory (as was the case in the aforementioned scenario). Therefore we will leave it for now.
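The leak followed the classic per-frame allocation pattern sketched below. This is a hypothetical reconstruction, not the actual payload_audio code: a buffer allocated for every decoded audio frame was never released, so resident memory grew linearly with play time.

```c
/* Hypothetical sketch of the per-frame leak pattern found in
 * payload_audio, and its fix. All names are illustrative only. */
#include <stdlib.h>
#include <string.h>

struct audio_frame { unsigned char *pcm; size_t len; };

/* Decode one frame into a freshly allocated buffer (stub). */
static int decode_frame(struct audio_frame *f, size_t len)
{
    f->pcm = malloc(len);
    if (!f->pcm)
        return -1;
    memset(f->pcm, 0, len);
    f->len = len;
    return 0;
}

/* Play n frames. Before the fix, the per-frame buffer was never
 * released, so memory use grew by len bytes per frame until the
 * operating system killed the process. */
static int play_frames(size_t n, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        struct audio_frame f;
        if (decode_frame(&f, len) != 0)
            return -1;
        /* ... hand f.pcm to the sound card ... */
        free(f.pcm);   /* the one-line fix */
    }
    return 0;
}
```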

Unit Testing Some of the libraries (mostly the fixed-point library and the fft library) were causing a bit of trouble during development. In order to properly verify their functionality, unit tests were developed. At present we have the following unit tests (found under the tests/ directory). They were developed in an ad-hoc manner to support whatever task was at hand, have been adapted continually, and do not necessarily still do what they started out doing. The unit tests are

▶ drm_params_test.c Tests the drm-parameter library. The process prints the number of carriers and the pilot indexes for a given mode.
▶ ofdm_test.c Outputs the phase and magnitude of all the sub-carriers in a symbol.
▶ enc-test.c Is used to generate graphs for this report.
▶ filter_test.c Was created in order to compare fixed point fir filtering with a floating point implementation – but has also been used to analyse run-time differences.
▶ fxmathtrig_test.c and fxmath_test.c Are among the most important unit tests, as they have revealed (several times) that there were problems in the fixed point math library.
▶ ifft_test.c and fxfft_test.c Are also important tests. They run the kiss library's fft and a different library, the fastest fourier transform in the west, and compare the output of the two.
▶ sdc_test.c Tests the extraction of sdc entities.
▶ bitmanip_test.c Tests the bit-manipulation library that was implemented to extract certain parts of the sdc and fac.
▶ frame_tool.c Is used to by-pass the match process, by enumerating symbols without equalising them. Note that the symbol offset has to be defined manually, since there is no time reference correlation.


▶ count_interval.c Looks for the magic word in a bit stream and outputs the first occurrence of it (the offset).
▶ audiotext_test.c Extracts an audio text message from an msc stream.
▶ audiostream_test.c and audiostream_testdata.c Were created while the payload_audio process was being developed, to test whether the audio frames were assembled correctly.

Monkey Testing Different things were done in order to provoke the decoder into failing. Removing the signal was tried during the continuance testing session. Another way of stressing the blocks is to give them unexpected input and see if anything breaks down. This is called monkey testing, because a monkey could be used to generate the input. Lacking monkeys, we used a pseudo random number generator and piped its output into each of the blocks, one at a time, to see if anything would happen (e.g. segfaults or memory leaks).

    cat /dev/urandom | downmix

The example above shows monkey testing of the downmix process. Other blocks were tested in a similar manner. The result of these tests was that none of the blocks broke down.
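The same idea can be expressed as an in-process harness, sketched below: feed pseudo-random bytes to a block's input handler and check that it never leaves its legal state space. Here block_feed is a toy stand-in for a block's input state machine (e.g. downmix's), not actual receiver code.

```c
/* Minimal in-process monkey test: drive a block's input handler
 * with pseudo-random bytes and verify its state stays in bounds.
 * block_feed is a toy stand-in, not the actual receiver code. */
#include <stdlib.h>

enum state { HUNTING, LOCKED, STATE_MAX };

/* Toy state machine: "locks" when it sees the byte 0xA5. */
static enum state block_feed(enum state s, unsigned char byte)
{
    if (byte == 0xA5)
        return LOCKED;
    return s == LOCKED ? LOCKED : HUNTING;
}

/* Feed n pseudo-random bytes; return 0 iff the state stayed
 * within its legal range the whole time. */
static int monkey_test(unsigned long n, unsigned int seed)
{
    srand(seed);
    enum state s = HUNTING;
    while (n--) {
        s = block_feed(s, (unsigned char)(rand() & 0xFF));
        if (s >= STATE_MAX)
            return -1;
    }
    return 0;
}
```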

11.4

analysing the signal

A process was developed to plot the signal at intermediate points, using gnuplot; some examples of such plots can be seen in Chapter 12. This tool is called scope. The scope program has been a handy little tool, and was used to generate many of the graphs in this document. It can be thought of as a multi-function oscilloscope for sdr radios. Purely for the sake of accessibility, a graphical user interface was created for the scope process; it can be run from the accompanying usb-disk, and instructions can be found in Appendix A.



Chapter

12

SIGNAL PERFORMANCE AND CHARACTERISTICS

12.1

introduction

One major benefit of having the receiver divided into blocks is the ability to track the signal all the way through the chain. Some of the previously discussed batch signals (seen in Table 11.1) have been analysed in detail and will be discussed here. More signal evaluations are included in Appendix B.

12.2

downconversion


Figure 12.1: The first 48000 samples of the Voice of Russia signal allow us to determine the pilot frequencies to a resolution of 1 Hz. Left: the magnitude spectrum of the real (if) signal (as taken from the file). Right: the downmixed I/Q signal.

The down-conversion process finds the intermediate frequency, f i , of the Voice


Figure 12.2: The process of determining symbol starts by correlating the cyclic prefix with the received signal, here seen for Voice of Russia. (Axes: correlation vs. I/Q sample offset, with symbol starts marked.) The large downward linear slopes are due to a forward skip in the correlation search. The near-vertical lines are ramp-ups while summing the initial guard interval.

Figure 12.3: Left: an extracted ofdm symbol from Voice of Russia before equalisation (scatter plot of I/Q values). Right: 30 symbols that have been pre-equalised, i.e. linearly phase compensated.

of Russia file to be f i = 11964 Hz; the same frequency is found by the rival dream. Figure 12.1 shows the result of the 48000-point real fft that is used to identify the intermediate frequency, as well as the result after the signal has been I/Q downmixed to be centred at dc (48000-point complex fft).

12.3

ofdm demultiplexing and equalising

The symbol synchronisation can be seen in Figure 12.2. The sharp drops are where a correlation search starts. The symbol start is found in a small interval after the correlation value has been summed over the entire guard interval – that is, as the deepest “valley” in one of the peaks.
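The guard-interval correlation behind Figure 12.2 can be sketched as follows. This is an illustrative floating point version on real samples; find_symbol_start is our name, and the receiver's fixed point implementation on I/Q pairs differs in detail.

```c
/* Sketch of symbol-start detection by cyclic-prefix correlation.
 * An ofdm symbol of useful length tu carries a guard interval:
 * its last g samples are copied in front of it. Correlating
 * x[n..n+g) with x[n+tu..n+tu+g) therefore peaks at the symbol
 * start. (Real-valued samples here for simplicity.) */
static int find_symbol_start(const double *x, int len, int tu, int g)
{
    int best = 0;
    double best_corr = -1e300;
    for (int n = 0; n + tu + g <= len; n++) {
        double c = 0.0;
        for (int i = 0; i < g; i++)
            c += x[n + i] * x[n + tu + i];   /* prefix vs. symbol tail */
        if (c > best_corr) { best_corr = c; best = n; }
    }
    return best;
}
```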


Figure 12.4: An instance of 2D equalisation (linear by linear): magnitude and phase of the interpolated transfer function across the carriers of one symbol (sym no. 0) from the Voice of Russia signal.

When the ofdm demultiplexer has processed a symbol, it looks like the somewhat chaotic pattern of Figure 12.3 (left). We attribute the phase rotation of the symbols to the time synchronisation, and note that the scatter plot on the right, which has been phase compensated, looks much tidier. Both plots show the same symbol. Luckily the time displacement only results in a linear phase error, which we can correct for.

The equaliser utilises an interpolated channel transfer function to equalise all the sub-carriers in an ofdm symbol. Figure 12.4 shows a 2D bi-linearly interpolated channel envelope for Voice of Russia. It can also be seen why filtering of the transfer function might be a good idea: the phase transfer function has some quirks and spikes that could be filtered out.
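The bilinear ("linear by linear") interpolation between pilot cells can be sketched as below. This is an illustrative floating point scalar version; the receiver works in fixed point, and a complex channel would interpolate I and Q (or magnitude and phase) separately.

```c
/* Sketch of the bilinear pilot interpolation used by the
 * equaliser. Channel estimates are known at four pilot corners;
 * the estimate at any cell between them is interpolated first
 * along frequency, then along time. Floating point scalars for
 * clarity; the receiver uses fixed point complex values. */
static double bilinear(double h00, double h01,  /* pilots at t0: f0, f1 */
                       double h10, double h11,  /* pilots at t1: f0, f1 */
                       double u,   /* fractional position in frequency, 0..1 */
                       double v)   /* fractional position in time, 0..1 */
{
    double a = h00 + u * (h01 - h00);  /* along frequency at t0 */
    double b = h10 + u * (h11 - h10);  /* along frequency at t1 */
    return a + v * (b - a);            /* along time */
}
```

At the pilot positions (u and v equal to 0 or 1) the interpolation reproduces the pilot estimates exactly, so the equaliser is consistent with the measured cells.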

12.4

case studies - batch signals

Now follows a discussion of some of the signals that have been found on the internet, and their characteristics.

Voice of Russia
▶ Mode B, Occupancy 10 kHz
▶ qam Constellations, sdc: 16 states, msc: 64 states

The very powerful signal from Voice of Russia has almost become a legend in these drm projects [30] [29]. Its high snr allows for smooth decoding.


Figure 12.5: Linearly interpolated transfer function for Voice of Russia. The left graph shows attenuation (magnitude) and the right graph shows delay (phase ϕ). Both graphs have 200 symbols along the time direction and 206 used carriers along the frequency axis.

What is most noticeable about the transfer function (Figure 12.5) is that the channel behaviour is almost completely frequency dependent; there are no sudden bursts in the time direction during the 200 symbols shown. We may even conclude that the receiver has been somewhat stationary, as 200 symbols of mode B come to 200/(15/400 [ms⁻¹]) = 5.33 [s]; if it had been received in a car, we would probably have seen some effects of alternating multi-paths in the course of this time.

Figure 12.6: qam constellation for Voice of Russia (legend: msc, sdc, fac, unused, time, gain and frequency reference cells). It can be seen that the points form a nice constellation, and should be easily decodable.

The points in the qam constellation in Figure 12.6 indicate a very high snr, so we expect the decoding to run smoothly and without any problems – which is indeed the case.


Breaking down the individual points in the scatter plot (Figure 12.6), starting with the qam-modulated points: we note that the content channel, msc, is aligned in a qam-64 constellation, and that the control cells that carry the sdc are aligned in qam-16. Finally, the cells that carry the fac control channel are (as always) qam-4. The unused marker denotes unused sub-carriers – in this case the dc carrier, which can obviously not be modulated in an orthogonal way, so here it is set to zero. The reference cells also form a pattern, though not a qam constellation. First it is noted that all reference cells have one of two possible amplitudes. The scattered (gain) pilots, which are used for channel estimation, are “scattered” around in a circle, to cover many phase possibilities. It is also noted that a few of the gain cells have a larger amplitude than the rest; this is because the scattered pilots at the outer carriers of the ofdm symbols are boosted. The time references occur in the first symbol of a frame, in order to identify frame boundaries, and are also scattered around. Finally we see the three continual (frequency) pilots – mind you, the continual pilots are included in every symbol. Otherwise it can be said that Voice of Russia contains one audio service, labelled Voice of Russia; it is aac encoded at 24 kHz, mono, and has an embedded text message announcing: “Dear friends! Our frequency will be changed to 9490 kHz on 12-th of January. - Taldom, Russia (E-mail: rc-3-buch@mtu-net.ru).” The audio plays fine, but the spoken language is (probably) Russian.

Deutsche Welle
▶ Mode B, Occupancy 10 kHz
▶ qam Constellations, sdc: 4 states, msc: 64 states

This signal is interesting because it has a time-varying channel transfer function – at least in the phase, as can be seen in Figure 12.7. It is also interesting because the audio does not decode correctly: there are small interruptions within the audio stream. The resulting constellation of Figure 12.8 implies a very poor snr, yet the decoder is able to decode the frames from that messy msc constellation. This shows the importance of error coding. The only service in the signal has the label DW DRM, and the audio (aac, 24 kHz, mono) has an embedded text message saying “DW - RADIO live program from relay station Sines / Portugal operated by Deutsche Welle”. But the audio playback is not very good – it is choppy – and we have not had the time to track down the origin of this problem yet. Instinct tells us that it is a problem with the aac extraction or codec initialisation.


Figure 12.7: Linearly interpolated transfer function for Deutsche Welle. The left graph shows attenuation (magnitude) and the right graph shows delay (phase ϕ). Both graphs have 200 symbols along the time direction and 206 used carriers along the frequency axis.

Figure 12.8: qam constellation for Deutsche Welle (legend: msc, sdc, fac, unused, time, gain and frequency reference cells). Note in particular the noisy msc points; they are surprisingly decodable.

Radio Netherlands Worldwide – Bonaire
▶ Mode C, Occupancy 10 kHz
▶ qam Constellations, sdc: 16 states, msc: 16 states

The above examples were both mode B, which is indeed the mode of which we have the most signals. To show that the decoder can cope with other robustness modes as well, we present a mode C signal too. Note how robust the msc channel of Figure 12.10 is: this channel is coded to be sent long range, in spite of the good snr with which it has been received.


Figure 12.9: Linearly interpolated transfer function for Radio Netherlands Worldwide. The left graph shows attenuation (magnitude) and the right graph shows delay (phase ϕ). Both graphs have 200 symbols along the time direction and 206 used carriers along the frequency axis.

Figure 12.10: qam constellation for Radio Netherlands Worldwide (legend: msc, sdc, fac, unused, time, gain and frequency reference cells). This constellation seems very “sharp” and reveals a good snr.

This signal has one service with the label RNWB, and an embedded text message saying “This is a DRM test transmission originating from Bonaire, Netherlands Antilles, 12.1N 68.3W”. Unfortunately, at present, the text decoder prints the text message prematurely; therefore the part of the message marked in red is not printed out with the message unless a verbose operation mode is chosen.


The audio stream is signalled by the sdc to be aac, 12 kHz, mono, but it will not play at present. Again, we think the problem lies with the payload_audio process.

12.5

summary

A review has been given of a few of the batch signals that the receiver has been tested with. The tests have revealed some minor errors that need to be corrected, but we feel that they should be easy to fix.



Chapter

13

COMPUTATIONAL PERFORMANCE CHARACTERISTICS

“Testing can only prove the presence of bugs, not their absence.” – Edsger W. Dijkstra

13.1

introduction

In this chapter the run times of the drm decoder are evaluated. The theoretical worst-case run times of some algorithms have already been presented; here they are further substantiated, and the computational profile of each process is documented. In the following we try to get a grasp of the run time behaviour of the developed programs, their average running time and their memory consumption. These are not insignificant, especially in an embedded context, where memory and cpu time are scarce resources. We also compare the decoding time with that of another decoder, to see the performance benefits of the developed fixed point libraries.

13.2

experiences with the target platform

It was assumed that portability problems would be few and far between due to the choice of operating system. This assumption failed us. It seems that all processes behave differently, in terms of output, when executed on the target – or stated more concisely: the same input does not produce the same output. Problematic areas that produce incoherent results have been identified; they include, but are not limited to, the fft


library and the Viterbi decoder. Without these crucial operations the receiver chain cannot work: the fft is fundamental to retrieving cell values, and the Viterbi algorithm is crucial in recovering the data modulated onto them. We have not yet been able to determine what exactly causes the inconsistency, which was discovered by comparing the outputs from the target with those of the development platform. We have ruled out endianness issues, since both the development and the test platform make use of little endian memory access. This reduces the testing we are able to do on the platform to the “dumber” processes. In particular the first two programs, downmix and ofdm, are simple and extremely robust state machines. These processes generate data streams almost regardless of the input, and are not concerned with the correctness of their calculations¹. The computation involved in generating the faulty data stream is asymptotically the same irrespective of the “correctness” of the output. By this we mean that while minor variations may occur in the computational demands, the average computational effort is roughly the same – the processes simply do not handle the cases differently.
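A runtime probe such as the sketch below is one way to verify the little-endian assumption on both machines (an illustration we add here, not code from the receiver source):

```c
/* Sketch of a runtime endianness check, one way to verify the
 * little-endian assumption on both the development and the target
 * platform. Returns 1 on little endian machines, 0 otherwise. */
static int is_little_endian(void)
{
    unsigned int probe = 1;
    /* On little endian machines the least significant byte is
     * stored first, so the first byte of probe is 1. */
    return *(const unsigned char *)&probe == 1;
}
```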

Test Setup on the nanoliab Testing on the nanoliab is a delicate process: each program must be configured and compiled for the target, uploaded, and then executed on the platform, before test data can be analysed and compared against known good values. The kernel running on the platform is Linux version 2.6.16, as supplied by liab. For simplicity, usb mass storage and scsi disc support are compiled into the kernel running on the nanoliab. This allows code and data to be placed on an external flash drive, where data can be gathered for later analysis on a development machine. The flash drive is mounted in read-write mode. The ftp server supplied by liab is, with minor modifications, used to move binaries back and forth between the development machine and the target platform.

13.3

processing requirements

The processing requirement for a drm decoder – in fact for any real-time sound processing device – is very easily understood: decoding one second's worth of sound data must take less than one second. Of course there may be some initial overhead which delays the proper, real-time decoding, but this overhead may only be present during initialisation and channel acquisition. For the purposes of testing we examine the processing requirements on a well known sample input: “Voice of Russia”.


¹ Except when they are in their frequency- and symbol-synchronisation states, respectively.


Process    Time on dev [s]            Time on nanoliab [s]        Ratio
           real    user    system     real     user     system
downmix    1.041   0.956   0.004      42.094   40.860   0.761     42.74
ofdm       0.895   0.696   0.008      25.416   24.104   0.691     34.63
match∗     0.218   0.188   0.008       2.637    2.322   0.220     12.35
decode†    3.200   1.271   0.450       6.953    6.536   0.108      5.14
Chain‡     1.723   1.588   0.032      73.013   68.917   4.024     43.40

∗ Did not identify start of frame.
† Could not decode fac; not included in sum.
‡ The chain of pipes from downmix to match.

Table 13.1: Benchmark comparison between a laptop and the nanoliab. Programs were run identically on both laptop and nanoliab: <program> -PB3 -i <infile>, where the program can be found in the Process column and <infile> is identical on both platforms and an appropriate input file to the program.

The sample from Voice of Russia is 62 seconds long. From this it follows directly that the time allowed for decoding of this stream is less than 62 seconds. From this input file several manipulated versions were prepared: one input file for each program block. For the down-mixer block this was simply the regular input file; for the ofdm symbol extraction block the prepared file was the output of a correct down-mix run. This was repeated throughout the receiver chain. Table 13.1 shows the execution times for the processes that were able to run “correctly”. Only the first two rows can be taken at face value, since these are the only two processes that run in their entirety while actually processing their respective inputs. All times were generated with the time Unix tool, which reports the resource consumption of a process. The specifics of the development machine mentioned in Table 13.1 are not relevant; it is just used to produce a baseline processing time for each of the processes. The Ratio column is the ratio of the (user) time spent on the target platform over the (user) time spent on the development machine. The reason for only using the “user” column is that this excludes time when other processes were scheduled in or the process waited for I/O: it is the time the cpu was dedicated to this process. The time spent in the kernel on behalf of the process is thus ignored in this comparison; it is listed in the “system” column. The “real” column is the entire duration of the process – including the time it was not scheduled on the processor. As mentioned, there are some issues with the processes when they are executed on the target platform. This is also indicated in Table 13.1. downmix and ofdm generate output at the expected rates, even though the data is wrong. match never succeeded in finding the start-of-frame marker by matching the time reference cells – this implies that no equalising is performed, which explains the good performance ratio (12.35).


decode never correctly decoded the fac, so all symbols were simply discarded after this failure. Since no audio support was developed for the nanoliab, payload was not tested. The last line (Chain) displays the run time of a run where the output of downmix is piped through first ofdm and then match. As can be seen from Table 13.1, downmix is the slowest process, which is also indicated by the value of the chain's performance ratio – it is almost identical to that of downmix alone. This suggests that the rest of the processes are simply waiting for output from the down-mixer. It can be derived from this table that the target platform is roughly 40 times slower than the development platform on which these experiments were carried out. This value will be used in a little while.

13.4

computational performance of processes

In order to document the performance of the individual processes a test was conducted: resource consumption was logged for all processes every 0.2 seconds while an input file was being processed. This file was generated with spark [15] and contained only a single audio stream. It can be seen from Figure 13.1 that the demand for cpu time is higher in the first seconds of the receiver chain's operation; why this is the case is explained in Section 13.6. Figure 13.1a shows the cpu time for each process – how much time the process has been scheduled in for on the cpu. From this graph it can be seen that the process with the steepest gradient is the process with the highest computational load. Clearly this is the demodulation and decoding process. In fact this process consumes more than 20% of the cpu time on average². Bear in mind that this is on the development platform. 20% is also consistent with the slope seen in Figure 13.1a. Based on these findings it is obvious to examine decode in order to determine the cause of this higher slope. Table 13.2 reveals that the Viterbi decoding takes nearly 95% of the total running time of the decode process, since the first two functions are in effect the entire Viterbi decoding algorithm. It was not unexpected that Viterbi decoding would require large computational resources, but it was still assessed to be less than this. The remaining functions in Table 13.2 are the depuncturing routine and the demodulation. Tables similar to Table 13.2 were created for the other processes in the receiver chain. For downmix, ofdm and match the cpu time was primarily spent in the fixed point math library or the fft library built on top of it.
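The hot spot, single_metric, is the per-branch distance computation at the heart of the Viterbi inner loop. The sketch below shows what such a soft branch metric typically looks like; it is illustrative only, and the actual single_metric implementation may differ.

```c
/* Illustrative sketch of a Viterbi soft branch metric, the kind
 * of per-branch computation that dominates decode's run time
 * (the actual single_metric implementation may differ). Soft
 * bits range over 0..255, where 0 means "confident 0" and 255
 * "confident 1"; the metric sums each soft bit's distance from
 * the branch's expected hard bit. Smaller means a better match. */
static unsigned int branch_metric(const unsigned char *soft,
                                  const unsigned char *expected,
                                  int nbits)
{
    unsigned int m = 0;
    for (int i = 0; i < nbits; i++) {
        /* Distance from the ideal level: 0 for a 0-bit,
         * 255 for a 1-bit. */
        m += expected[i] ? 255u - soft[i] : soft[i];
    }
    return m;
}
```

With ≈ 139 million calls, even a few extra operations per bit in this loop are directly visible in the total run time, which is why it is the natural target for fixed point optimisation.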


² The average cpu percentage in Figure 13.1b is 21.22%.


Figure 13.1: cpu usage as a function of time for the processes downmix, ofdm, match, decode and payload. (a) Accumulated cpu time per process. (b) Instantaneous cpu occupancy.



Name                Time [%]   Cumulative [s]   Self [s]   Calls
single_metric       67.38      6.57             6.57       ≈ 139 ⋅ 10⁶
ViterbiDecoder†     27.49      9.25             2.68       703
depuncture           1.03      9.35             0.10       354
eval_softbyte        0.72      9.42             0.07       2174986
is_gain_ref_cell     0.72      9.49             0.07       499851
__divdi3             0.72      9.56             0.07
build_permutation    0.51      9.61             0.05       1109
demap_qam            0.51      9.66             0.05       354

† Renamed for legibility.

Table 13.2: The top-most cpu-absorbing functions in decode. This table was generated using the gprof profiling tool. All times are approximate; call counts are precise (except where otherwise stated).

Decoder         Time to decode, td [s]   Content duration, tc [s]   td/tc
Current impl.   7.557                    56.238                     0.134
Poulsen         5.652                    17.532                     0.322

Table 13.3: Decoding time per second of content. The current implementation compared to the implementation proposed by Poulsen [30].

13.5

processing performance comparisons

To determine the effectiveness of the implemented receiver we compare its decoding time with that of a decoder known to work – in this case the decoder developed by Poulsen [30]. We chose it because it does not produce audio data directly, which makes it possible to time the decoding alone. Table 13.3 deserves a few notes. First: the comparison cannot be made directly, since Poulsen's decoder produces data that has already been run through the audio codec, whereas we produce data that has not been run through the aac codec. However, tests where all references to the aac codec were removed from the comparison drm decoder indicate only a roughly 100 ms improvement in performance. Second: naturally we wanted to test against something a little more robust and commonly used, i.e. dream, but due to the design of the dream receiver modules this is not possible without severe modifications³. By close examination of Table 13.1 and Table 13.3 we can arrive at a very rough estimate of the speed increase necessary for the nanoliab to be able to –

³ Simply ensuring that audio is not generated is not enough: the input file is just read over and over again.



Figure 13.2: Memory consumption as a function of time for all processes (downmix, ofdm, match, decode and payload). This figure shows the total amount of memory allocated to each process.

successfully – decode an audio stream. Since the ratio between the performance of the nanoliab and the development machine is 43.40, and the development machine can decode 1 second's worth of content in 0.134 seconds, the nanoliab should be capable of decoding the same amount of audio in roughly 0.134 × 43.40 = 5.82 seconds. Thus a rough estimate is that a six-fold performance increase on the target platform should enable real-time decoding by the nanoliab. We would like to emphasise that this estimate is very rough – the data is extrapolated quite a lot. It is worth noting that the only other fully functional drm decoder we have at our disposal, dream, takes roughly 1/3 of the cpu time available on the development platform, which corresponds to well above the 200 MIPS of the nanoliab.

13.6

memory constraints

Having discussed the need for processing power needed to execute the receiver chain in a timely fashion, we now turn our attention to the constraints imposed on memory. 123


From Figure 13.2 it can be seen that most of the developed processes grab a large chunk of memory at start-up and do not release it. The memory handling of the decode process is a little different: here memory is allocated and freed in most decoding steps. This is due to the demands on buffer size being different for different parts of the decoding procedure (see Figure 3.2 for an illustration). As a result of the unique memory handling behaviour of decode we get a glimpse of how the underlying operating system works. In the first five or six seconds of Figure 13.2 the amount of memory reserved for decode is, on average, higher than during the remaining time on the diagram. It is important to remember that this is a discrete time diagram – the memory has been released and re-acquired between resource measurements. All processes have been tested with Valgrind⁴ to ensure that memory leaks do not occur – at least when running batch files. The duration tests discussed in Section 11.3 revealed shortcomings in the rigour with which this tool was applied. Aside from this, Valgrind has lessened the development and debugging time significantly. Figure 13.2 shows the bounds on memory consumption in the receiver chain – the data amount has been reduced to fit into the plot. During the tests mentioned in Section 11.3 memory utilisation was monitored, and it did not exceed the bounds shown here (asymptotically). The total memory consumption for the receiver configuration shown here is less than 15 MB all inclusive – containing both code and data segments. Coupling Figure 13.2 with Figure 13.1 it can be seen that in the first seconds the receiver chain processes more data than in the rest of the decoding period. This is possible because the input is a file: data can be read and processed as fast as possible – not just at sampling frequency speed. A side effect of this is that at some point the operating system buffers between the processes fill up. The process producing data (e.g. decode) will then block while trying to write data to the consuming process (payload in this case). Conceptually Unix pipes are shallow, but in fact they are implemented with a certain buffer size – often a page size, which is 4 kB in most cases⁵.

13.7 summary

Here we have discussed the computational requirements of the developed chain of processes that comprise the receiver. We have shown that a six-fold increase in processing power would, most likely, enable the receiver to run on the nanoliab. Further, we have shown that the Viterbi decoding is by far the most resource-intensive task in the entire receiver.



PART IV SUMMARY


Chapter 14

DISCUSSION

14.1 introduction

In this chapter we discuss the results presented previously, summarise the development of the receiver, and comment on it.

14.2 current status

Here we give a status summary of the implemented functionality as of this writing. The implemented set of features is a significant, though not complete, subset of the drm Standard Radio Receiver Profile [6]. The missing parts primarily relate to service following.

Signal Configurations Spectrum widths of 4.5, 5, 9 and 10 kHz all work and have been thoroughly and successfully tested. 18 and 20 kHz signals should work with small adaptations to the downmix process, as long as the Nyquist interval is respected. All robustness modes work and are (mostly) correctly identified.

Decoder All logical channels can be decoded: fac, sdc and msc. Standard mapping works. uep and eep can be applied to the streams. Work on hmsym has been started but is untested. hmmix will not work in its current state – special care must be taken while decoding content of this type. For sm all constellations have been tested to work:


qam-4 (for both sdc and fac), qam-16 (for both sdc and msc) and qam-64 (for msc only). All code rates should work but have not been exhaustively tested. Alternative Frequency Switching has not been considered, though a “Standard Radio Receiver Profile” must conform to it [6].

Payload aac with (and without) parametric stereo can be handled. Audio output is directed to an alsa sink. The text message application embedded in the audio frame can be handled. Data streams consisting of data units can be stored directly in files, but raw data units seem to be rarely used alone.

14.3 future work

Future work based on the proposed platform may start by examining the bug that leads to incorrect operation on the nanoliab board – current efforts have pinned it down to somewhere in the fft library or the Viterbi decoder. Having sorted this, the next avenue of improvement may lie in adding modular audio output support, allowing the nanoliab sound api to be supported.

Hierarchical modulation should be completed – it is a requirement in the specification. Only very little work remains for hmsym decoding to be correct. Building on top of this, mixed-mode hierarchical modulation should easily be added.

Finally, an interesting topic would be to investigate how to add support for mode E, the new robustness mode in drm+. This mode is fundamentally different: it uses 100 kHz channels, has no continual pilots, and places the fac cells across the entire 100 kHz spectrum – just a few examples of why adding mode E to the developed receiver will be a challenge.

In Chapter 13 we found that the nanoliab did not provide sufficient processing power for the decoding process to execute in a timely manner. To counter this problem, several of these processor boards could be coupled together to form a complete decoder – letting each do only one job, in a pipelined fashion. This should be nearly trivial with the developed architecture: it would only require nfs support in the kernels running the nanoliabs, thereby enabling the boards to communicate through a network file system.

14.4 perspectives

The future of drm as a broadcasting technology looks bright indeed. Russian authorities have decided to introduce drm on the medium and short wave bands in the beginning of 2009. Further, in 2009 All India Radio decided to expand its drm operations (source: the drm Consortium, http://www.drm.org/press/). Given the sheer size of the population of these countries, we will most likely see a rise in the production of receivers capable of decoding and presenting drm content.

Maintaining a network of transmitters used for radio broadcasts is expensive, so by reducing the number of transmitters more funding can go directly to content production rather than infrastructure. By using drm (and powerful transmitters) on the radio bands below 30 MHz, the coverage can become significantly larger (without too much loss in audio quality) than that of traditional fm transmissions in the vhf bands.

The work presented here offers a new way to approach this problem. By implementing the receiver as a software stack, flexibility is gained: flexibility to easily combine new blocks with the existing ones, and to analyse the signals. By allowing broadcasters a larger degree of freedom to distribute the content they wish, drm transforms itself into a truly versatile large-scale distribution platform for digital content.

While empowering the radio operators, drm also poses a number of interesting engineering problems in the design and construction of receivers. We feel that what we have presented here contributes to the exploration of a feasible receiver scheme for drm.



Chapter 15

CONCLUSIONS

Radio transmissions are nearly omnipresent in everyday life. Whether shopping, driving or camping, there is often a radio within earshot.

In this project a drm receiver was developed. The receiver was designed to allow for parallel execution by splitting the major processing blocks into separate processes that can be scheduled by the underlying operating system. This design promoted rigorous and simple interfaces between the blocks. A fixed-point math library was developed to avoid the use of floating-point operations. This was then integrated with an fft library.

The developed receiver is able to identify and decode most of the configurations for drm broadcasts – in particular, all robustness modes. Both control information and the actual content channel may be recovered from a given drm signal. Nearly all recordings of if streams found on the dream website can be decoded. Selected recordings of if signals have been analysed to show their time- and frequency-dependent behaviour. Real-time transmissions have been decoded successfully in a variety of ways through the use of the real-time drm modulator, spark.

Tests have been conducted on the feasibility of running the receiver on an embedded platform, the nanoliab. It was found that the embedded platform lacked the processing power needed to decode the signal in a timely fashion. The most processing-intensive operations in the receiver have been identified. Finally, it was also found that the developed receiver matched closely, in terms of features, the profile of a basic receiver published by the drm consortium.

Though the goal of developing a drm-compatible software radio on the nanoliab was not achieved, we feel that the work presented here is a success: real-time drm signals can be decoded with a parallel architecture using only fixed-point arithmetic.


BIBLIOGRAPHY

[1] Jan 2004. Filesystem Hierarchy Standard. The Linux Foundation. http://www.pathname.com/fhs/. Cited on p. 65.

[2] rajar. Oct 2009. rajar Quarterly Summary of Radio Listening. http://www.rajar.co.uk/docs/2009_09/2009_Q3_Quarterly_Summary_Figures.pdf. Retrieved on 2010-01-04. Cited on p. 1.

[3] Bahai, Ahmad R.S.; Saltzberg, Burton R.; and Bahai, Ahmad. 1999. Multi-Carrier Digital Communications – Theory and Applications of ofdm. Springer. Cited on p. 14.

[4] Borden, Lance. Build a World War II Foxhole Radio. In Electronics Handbook, vol. XVII, p. 47. Via http://www.wnyc.org/files/foxhole_radio.pdf. Cited on p. 4.

[5] Chang, R. W. 1966. Synthesis of band-limited orthogonal signals for multi-channel data transmission. In Bell System Technical Journal, vol. 46, pp. 1775–1796. Cited on p. 14.

[6] DRM Consortium. Sep 2009. Digital Radio Receiver Profiles. From: http://www.drm.org/uploads/media/DRM_Receiver_Profiles_Final.pdf. Cited on pp. 127 and 128.

[7] David, O. and Lyandres, V. 2000. Heapmod algorithm for computing the minimum free distance of convolutional codes. In Electrical and Electronic Engineers in Israel, 2000. The 21st IEEE Convention of the, pp. 435–438. doi:10.1109/EEEI.2000.924461. Cited on pp. 42 and 43.

[8] etsi. May 1997. etsi Technical Standard 300 401. Digital Audio Broadcasting. European Telecommunications Standards Institute, 0th edn. From: http://webapp.etsi.org/workprogram/Report_WorkItem.asp?WKI_ID=3891. Cited on p. 4.

[9] ———. Feb 1999. Technical Specification 301 234. Digital Audio Broadcasting (dab); Multimedia Object Transfer (mot) protocol. European Telecommunications Standards Institute. www.lrr.in.tum.de/Par/arch/dab/mpspecs/mot_spec.pdf. Cited on pp. 55 and 99.

[10] ———. Sep 2001. Technical Specification 101 968. Digital Radio Mondiale; Data Applications Directory. European Telecommunications Standards Institute. www.drm.org/fileadmin/media/downloads/ETSI_TS_101_968.pdf. Cited on p. 55.

[11] ———. Sep 2001. Technical Specification 101 980. Digital Radio Mondiale, System Specification. European Telecommunications Standards Institute. http://webapp.etsi.org/workprogram/Report_WorkItem.asp?WKI_ID=12593. Cited on p. 3.

[12] ———. Feb 2004. Technical Specification 101 968. Digital Radio Mondiale, Data Applications Directory. European Telecommunications Standards Institute, 1.2.1 edn. From: http://www.drm.org/uploads/media/spec_17.pdf. Cited on p. 7.

[13] ———. Feb 2008. European Standard 201 980. Digital Radio Mondiale, System Specification. European Telecommunications Standards Institute. From: http://webapp.etsi.org/workprogram/Report_WorkItem.asp?WKI_ID=27291. Cited on pp. 5, 6, 7, 38, 39, 53, and 92.

[14] ———. Aug 2009. European Standard 201 980. Digital Radio Mondiale, System Specification. European Telecommunications Standards Institute, 3rd edn. From: http://webapp.etsi.org/workprogram/Report_WorkItem.asp?WKI_ID=30464. Cited on p. 20.

[15] Feilen, Michael; Schad, Felix; and Steil, Andreas. 2007. Spark drm Transmitter. URL http://www.drm-sender.de/. Cited on pp. 105 and 120.

[16] Fischer, Volker. 2004. DReaM drm Receiver. URL http://www.drm.sourceforge.net/. Cited on pp. 6, 25, 67, 85, 86, and 87.

[17] Fischer, Volker and Kurpiers, Alexander. n.d. Frequency Synchronization Strategy for a PC-based DRM Receiver. Cited on p. 71.

[18] Gersho, A. Jun 1994. Advances in speech and audio compression. In Proceedings of the ieee, vol. 82, no. 6, pp. 900–918. ISSN 0018-9219. doi:10.1109/5.286194. Cited on p. 36.

[19] He, Kai and Cauwenberghs, G. 1999. Performance of analog Viterbi decoding. In Circuits and Systems, 1999. 42nd Midwest Symposium on, vol. 1, pp. 2–5. doi:10.1109/MWSCAS.1999.867194. Cited on p. 94.

[20] Host, S.; Johannesson, R.; Zigangirov, K.Sh.; and Zyablov, V.V. Mar 1999. Active distances for convolutional codes. In Information Theory, IEEE Transactions on, vol. 45, no. 2, pp. 658–669. ISSN 0018-9448. doi:10.1109/18.749009. Cited on p. 42.

[21] Imai, H. and Hirakawa, S. May 1977. A new multilevel coding method using error-correcting codes. In Information Theory, ieee Transactions on, vol. 23, no. 3, pp. 371–377. ISSN 0018-9448. Cited on p. 31.

[22] iso. 1999. iso C Standard 1999. Tech. rep. URL http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf. ISO/IEC 9899:1999 draft. Cited on p. 66.

[23] Jordan, R.; Pavlushkov, V.; and Zyablov, V.V. Oct 2004. Maximum slope convolutional codes. In Information Theory, IEEE Transactions on, vol. 50, no. 10, pp. 2511–2526. ISSN 0018-9448. doi:10.1109/TIT.2004.834780. Cited on p. 42.

[24] Link, Michael W., ed. 2009. How U.S. Adults Use Radio and Other Forms of Audio. The Nielsen Company. http://blog.nielsen.com/nielsenwire/wp-content/uploads/2009/11/VCM_Radio-Audio_Report_FINAL_29Oct09.pdf. Retrieved on 2009-01-04. Cited on p. 1.

[25] Martin, P.A. and Taylor, D.P. Nov 2001. On multilevel codes and iterative multistage decoding. In Communications, IEEE Transactions on, vol. 49, no. 11, pp. 1916–1925. ISSN 0090-6778. doi:10.1109/26.966068. Cited on p. 32.

[26] Nee, Richard van and Prasad, Ramjee. 2000. ofdm for Wireless Multimedia Communications. Artech House, Inc., Norwood, MA, USA. isbn 0890065306. Cited on pp. 25, 26, 78, 84, and 86.

[27] NTIA. Oct 2003. United States Frequency Allocations. http://www.ntia.doc.gov/osmhome/allochrt.pdf. Retrieved on 2009-01-04. Cited on p. 2.

[28] Ouchi, Shigeki and Volmat, Alain. 2004. Linux porting onto a digital camera. In Linux 2004 Conference. ukuug. Cited on p. 5.

[29] Pedersen, Dennis. Dec 2007. Anvendelsen af Coded Orthogonal Frequency-Division Multiplexing i Digital Radio Mondiale. Cited on pp. 26, 84, 86, and 111.

[30] Poulsen, Ole Gammelgaard. May 2008. Design and Implementation of a DRM Software Defined Radio Receiver in C. Cited on pp. 26, 67, 84, 87, 111, and 122.

[31] Press, W. H.; Flannery, B. P.; Teukolsky, S. A.; and Vetterling, W. T. 1992. Numerical Recipes in C. Cambridge University Press, 2nd edn. isbn 052135465X. URL http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/052135465X. From http://www.fizyka.umk.pl/nrbook/bookcpdf.html. Cited on p. 49.

[32] Proakis, John. Aug 2000. Digital Communications. McGraw-Hill Science/Engineering/Math, 4th edn. isbn 0072321113. URL http://www.amazon.com/exec/obidos/redirect?tag=citeulike07-20&path=ASIN/0072321113. Cited on p. 33.

[33] Raymond, Eric S. 2003. The Art of UNIX Programming. Pearson Education. isbn 0131429019. Cited on p. 62.

[34] Sari, H.; Karam, G.; and Jeanclaude, I. Feb 1995. Transmission techniques for digital terrestrial TV broadcasting. In Communications Magazine, ieee, vol. 33, no. 2, pp. 100–109. ISSN 0163-6804. doi:10.1109/35.350382. Cited on p. 14.

[35] Spragg, Donald. 2005. drm Transmitter Requirements and Applying drm Modulation to Existing Transmitters. Cited on p. 2.

[36] Vidkjær, Jens. Autumn 2007. Class Notes, 31415 RF-Communication Circuits. Tech. rep., Technical University of Denmark – Department of Electrical Engineering. Cited on p. 29.

[37] Viterbi, A. Apr 1967. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. In Information Theory, IEEE Transactions on, vol. 13, no. 2, pp. 260–269. ISSN 0018-9448. Cited on p. 44.

[38] Woerz, T. and Hagenauer, J. Dec 1992. Iterative decoding for multilevel codes using reliability information. In Global Telecommunications Conference, 1992. Conference Record., GLOBECOM '92. Communication for Global Users., ieee, vol. 3, pp. 1779–1784. doi:10.1109/GLOCOM.1992.276690. Cited on p. 32.

[39] Yaghmour, Karim; Masters, Jon; Ben-Yossef, Gilad; and Gerum, Philippe. Aug 2008. Building Embedded Linux Systems. O'Reilly Media, 2nd edn. Cited on p. 5.

[40] Zyablov, V. V.; Johannesson, R.; and Pavlushkov, V. A. Jul 2004. Detecting and Correcting Capabilities of Convolutional Codes. In Problems of Information Transmission, vol. 40, no. 3, pp. 187–194. From http://www.springerlink.com/content/t287333130127361. Cited on p. 42.


PART V APPENDICES


Appendix A

PRODUCT MANUAL

Accompanying this report is a bootable usb memory stick with a Linux distribution, including all the required libraries, some test files and a tool-chain. This means that the product can be tested without installation.

For a pc, make sure that the bios is configured to allow usb boot. For a Mac, booting from a newer (Intel-based) machine was tested using a live-cd. Several forums report that it should also work for live-usbs, but this is untested. However, when run from a cd, the “c” key must be pressed immediately after the chime sound. Macs also need to have the sound un-muted in a MacOSX session before booting (muting seems to be persistent in the mac-bios, and we found no way of un-muting from the boot image).

For both architectures the instructions are as follows:

1. In the boot-up menu, choose drm thesis 2009.
2. A second boot menu follows hereupon. Choose drm thesis 2009 again.
3. A license agreement appears. Accept it.
4. The graphical interface should now show up and present a menu with options. The options are presented in detail below.

a.1 overview of the contents

When the boot sequence is done, you should be greeted with a window similar to Figure A.1. Here is a walk-through of what the important buttons do:


Figure A.1: The main menu that will greet the user

Run drm Decoder (GUI) – asks for an input source (i.e. file or live), then runs the developed drm decoder and presents the output in a nice manner.

Analyze drm File – probably the next-most important tool. It decodes a file with the developed decoder and then plots the signal at intermediate stages. This tool (at least its core) was used to make many of the graphs in this report.

Goto drm Decoder directory (CLI) – launches a terminal in the source directory, so the source can be re-built.

Read project documentation – this report.

Control RF-frontend – controls the elektor sdr front-end.

2D Spectrum analyzer – for calibrating the elektor sdr front-end.

Browse code directory – launches a file manager to browse the source code.

man page for x – the help for process x.

Enjoy!

a.2 building the drm decoder

The programs that make up the drm decoder are already compiled. However, if you feel like playing with the source, or would just like to re-compile it, this can be done as follows: launch a terminal (click on the terminal icon in the panel), go to the source directory, configure and compile. The commands involved are given below:

    cd /home/drm/embedded-drm-receiver
    autoreconf --install
    ./configure
    make

Look in the shell script debug-run.sh for information on running the individual processes, or check the man pages: downmix(1), ofdm(1), match(1), decode(1), payload_audio(1).

Many processes (mainly for regression-test purposes) are placed in the directory /home/drm/embedded-drm-receiver/tests/, including one program named scope that will visualise drm signals.

The code is documented and commented using Doxygen markup, which means that code documentation can be auto-generated. This is done simply by executing the command doxygen in the code directory.

A toolchain for arm cross-compiling is also available on the live-usb; it is used by

a.3 using the drm decoder

A simple gui was made for your convenience. It is shown when the boot has completed, and allows you to configure the drm receiver for either live input (the sound card must run at fs = 48 kHz) or for file playback.

a.4 analysing the drm signal

Another gui was devised for tracing the signal path through the receiver. In this mode the program will not detect the mode and bandwidth of the analysed files; these have to be chosen manually, but the test files all use a naming convention that reveals the mode and spectrum.



Figure A.2: The decoder gui in its initial state. Note that the first letter on each line of the console corresponds to the process that emitted that message.

Figure A.3: The decoder-gui after it has run for a while



Figure A.4: To the sdr engineer, plotting data is what the oscilloscope is to the rf engineer. This front-end allows plotting the output data of the individual processes.



Appendix B

SIGNAL CHARACTERISTICS

Here we present some more results from the signal processing. This part was excluded from the main report for brevity.

Bouquet Flevo ▶ Mode B, Occupancy 10 kHz ▶ qam Constellations, sdc: 16 states, msc: 16 states


Figure B.1: Bouquet Flevo, 2D channel transfer function for 200 symbols

Bouquet Flevo is a Dutch station, apparently broadcasting in English. What must particularly be noticed about this recorded signal is the burst seen in the channel's transfer function, Figure B.1. There is some interference at some point, but it disappears again. As for the constellation, Figure B.2, it looks like the signal has a decent snr and that the channel ought to be possible to decode.



Figure B.2: Bouquet Flevo, single frame qam constellation after equalising

We note that the channel is indeed possible to decode, with minor outages. Try it yourself on the accompanying usb-stick.



Radio Luxembourg ▶ Mode B, Occupancy 10 kHz ▶ qam Constellations, sdc: 16 states, msc: 64 states

Figure B.3: RTL (Radio Luxembourg), 2D channel transfer function for 200 symbols


Figure B.4: RTL (Radio Luxembourg), single frame qam constellation after equalising

Now broadcast by rtl, the old hit-radio station Radio Luxembourg is once again in the ether. This was a station that was very popular in the seventies for broadcasting rock and pop music to European listeners. We have been able to track down an if recording of this station: http://www.winradio.com/home/g303-drm.htm. We note that the channel's transfer function, Figure B.3, indicates severe frequency-selective fading (maybe not deep fades, but fades nonetheless). Still, the audio decodes.


RTL - with Slideshow ▶ Mode B, Occupancy 10 kHz ▶ qam Constellations, sdc: 16 states, msc: 64 states

Figure B.5: RTL, 2D channel transfer function for 200 symbols


Figure B.6: RTL, single frame qam constellation after equalising

We are not quite sure, but this broadcast, found on the dream webpage, might also be Radio Luxembourg. This is speculation though, as we do not know the origins (carrier frequency and location) of this file. It is simply some random if recording that claims to be RTL – and RTL is a big network.



Deutsche Welle - with News Data ▶ Mode B, Occupancy 10 kHz ▶ qam Constellations, sdc: 4 states, msc: 64 states

Figure B.7: Deutsche Welle, 2D channel transfer function for 200 symbols


Figure B.8: Deutsche Welle, single frame qam constellation after equalisation

We have gathered many batch signals from Deutsche Welle, and what makes this one interesting is mostly the fact that it carries Journaline news, whereas all other data streams we have found exclusively contained images. We are not able to process Journaline yet. The qam constellation, Figure B.8, seems fine, and the transfer function, Figure B.7, indicates frequency-selective attenuation.


Appendix C

DIVISION OF LABOUR

In compliance with the regulations for exams in Denmark, we include a list of the division of responsibility within the project – who did what. In general, Anders Mørk-Pedersen did all work relating to downmixing, equalising and payload decoding. Brian Stengaard did all work related to symbol synchronisation and signal decoding. For all work not mentioned here we share the credit (or blame). More specifically, the work division for the main topics that are not shared is shown in Table C.1, though overlaps of course occur. As for the source code, the responsibilities are indicated in the header of each individual file.

Credit   Task

Anders   Chapter 2: Orthogonal Frequency Division Multiplexing
Brian    Chapter 3: Channel Decoding
Anders   Chapter 4: Content Channels in drm

Anders   Chapter 6: Frequency Acquisition and Signal Down-mixing
Brian    Chapter 7: ofdm Demultiplexing
Anders   Chapter 8: Design section Carrier Equalisation and Frame Enumeration
Brian    Chapter 9: Design section Channel Demodulation and Decoding
Anders   Chapter 10: Design section Payload Decoding

Table C.1: Responsibilities in relation to this report

