## **VIRTUDE**

Very High Rate Turbo Decoder with Interleaver in the TTCP



Trademarks "Elecnor Deimos", "Deimos" and the logo of Deimos (Elecnor Group) encompass Elecnor Group's companies of Aerospace, Technology and Information Systems: Deimos Space S.L.U. (Tres Cantos, Madrid, Spain), Deimos En *Deimos Space UK Ltd. (Harwell, Oxford, United Kingdom), Deimos Space S.R.L. (Bucharest, Romania).*

#### **Agenda**



#### **14:00 – Introduction**

- **14:10 – Synchronization and Channel Coding (S&CC) Sublayer**
- **14:35 – Software of the CCSDS S&CC Sublayer (SW4)**
- **14:45 – COTS Platform**
- **14:55 – Breadboard Design**
- **15:20 – Testing Tool**
- **15:25 – Results Overview**
- **15:45 – Q&A**
- **16:00 – Meeting closure**






#### • **Project overall objectives**

- Aimed at the **Envision mission**, to transmit data from a **SAR (Synthetic Aperture Radar)**
- Analyse the possible architectures to implement a **Very High Rate (VHR) Turbo Decoder** and the respective **test Turbo Encoder** for Turbo rates 1/2, 1/3, 1/4 and 1/6 (information block lengths of 1784, 3568, 7136 and 8920 bits), working in **real time at up to 80 Mbps** (75 Mbps + 5 Mbps of margin).
- Design and implementation of the **Channel Interleaver**
- **Implementation** of the transmitter and receiver chains in the TTCP platform, fulfilling the required performance of **80 Mbps** and **using only the current resources available at TTCP**
- **Validation of the transmitter and receiver chains in the TTCP platform,** all modules running in the single master processing FPGA (**ALTERA Stratix V**) of TTCP.









#### • **Actual schedule**



#### • KO date: 17<sup>th</sup> July 2020

## **Agenda**



#### **14:00 – Introduction**

#### **14:10 – Synchronization and Channel Coding (S&CC) Sublayer**

- **14:35 – Software of the CCSDS S&CC Sublayer (SW4)**
- **14:45 – COTS Platform**
- **14:55 – Breadboard Design**
- **15:20 – Testing Tool**
- **15:25 – Results Overview**
- **15:45 – Q&A**
- **16:00 – Meeting closure**



- VIRTUDE has implemented the TM Synchronization and Channel Coding (S&CC) sub-layer for the Turbo code case
	- □ Transmitter Chain on left and Receiver Chain on right





▪ VIRTUDE includes the Channel Interleaver module (shown in blue), which is placed after the Randomizer in order to overcome the burst vulnerability of the Turbo codes

❑ The CCSDS Blue Book may be updated soon with this updated organization



## Encoder and decoder design methodology



- Hierarchical approach:
	- the overall VHDL models are structured as multiple levels of description
	- the highest level reflects the component architecture presented in TN1,
	- the lower levels provide a detailed description of the internal units
- Each single unit has been individually verified, by means of functional simulations and comparisons of input and output patterns against C or Matlab behavioural models
- Integration to verify the correct description of all internal interfaces
- Synthesis:
	- Individual synthesis of the key internal units
	- Overall synthesis to estimate clock period and occupied resources

#### **Reference turbo decoder architecture**



- The architecture is built around *P* parallel SISO decoders that process separate trellis windows of the constituent convolutional encoders of size W.
- **The processors read the LLR values** from the memory storing the channel LLR and update (read/write) the extrinsic information stored in memory EXT.
- **Throughput (approximation)**
	- $N_D$ is the number of full turbo decoders working on separate frames,
	- $f_c$ is the clock speed of each SISO processor (trellis steps per second),
	- $i$ is the required number of iterations per received packet.
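The throughput expression itself did not survive on this slide. As a rough sketch only, the usual textbook approximation can be written with the symbols listed above, assuming (this is not stated in the slides) that each iteration consists of two half-iterations and that each half-iteration takes $K/P$ trellis steps. The function name and the operating point are hypothetical.

```python
def decoder_throughput_bps(n_d: int, p: int, f_c_hz: float, iterations: int) -> float:
    """Approximate information throughput in bit/s.

    Assumption (not from the slides): one frame of K bits needs
    2 * iterations * K / (p * f_c_hz) seconds on one decoder, so the
    block length K cancels out of the throughput.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return n_d * p * f_c_hz / (2.0 * iterations)


# Hypothetical operating point: 16 SISO processors, 150 MHz, 10 iterations.
example_bps = decoder_throughput_bps(n_d=1, p=16, f_c_hz=150e6, iterations=10)
```

Under this approximation only the product $N_D \cdot P$ matters, so trading decoders for SISO processors leaves the estimate unchanged.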



#### **Increase the parallelism (***P***)**



- Increase the number of SISO processors *(P)*
- Advantages
	- Memory requirements for EXT and LLR do not increase with $P$
	- Decoding latency does not increase with $P$
- Disadvantages
	- Memory collision and routing problem
	- The $P$ SISO processors must access a single memory in two different ways (natural and permuted order) with $P$ parallel data at frequency $f_c$

#### **Permutation decomposition**

- In [Tarable] it has been shown that, for any desired parallelism $P$ and any desired interleaving law $\pi$, it is possible to find a collision-free mapping
- In the decomposed architecture, a set of $P$ memory banks is accessed in parallel using a set of smaller permutations of size $K/P$, through the time-varying permutations $\rho(i)$ of size $P$ on the input parallel data
- The blocks denoted by $\rho_a(i)$ are then programmable crossbars of size $P$, routing the data from the SISO processors to the $P$ banks of memory and vice versa








## **FER performances K=3568**





## **FER performances K=7136**





#### **FER performances K=8920**





#### **Proposed architectural solution:**



- 16 parallel SISO processors, Window size 56 trellis steps
- **Delayed initialization of forward and backward recursion**
- 2 fully programmable 16 crossbars
- Per block parallelism:
	- $N_d = 4$ and $P = 4$ for $K = 1784$,
	- $N_d = 2$ and $P = 8$ for $K = 3568$,
	- $N_d = 1$ and $P = 16$ for $K = 7136, 8920$.

- Fixed point representation of quantization bits for LLR: 5 bits (3.2)
- Fixed point representation of quantization bits for EXT: 7 bits (5.2)







From/to SISO memories





- Given the specified throughput of 80 Mbit/s and the target clock frequency of 153 MHz, the encoding unit can proceed by encoding one information bit per cycle.
- No double input buffer is needed, as the module receives 8 unencoded bits per cycle (see next slide)
- Key components:
	- A single input buffer (one M20K, less logic complexity)
	- The encoding unit, which includes the convolutional encoder, the additional logic required to handle the termination bits, and the puncturing
	- A programmable counter to generate the write addresses for storing the received bytes in the buffer.
	- A second programmable counter to generate the in-order read addresses, which are used by the encoding unit to read the frame bits in the natural order
	- A permutation unit that implements the specified interleaving rules and generates the scrambled read addresses



Encoder input buffer:

- ❑ With the byte parallel input interface, a frame is loaded in K/8 cycles.
- ❑ The encoder generates one encoded bit per cycle
- ❑ By using a single input buffer and running sequentially both the buffer loading and the frame encoding, the achievable throughput is

$$
\frac{K}{K\left(1+\frac{1}{8}\right)T_{CK}} = \frac{8}{9}f_{CK}
$$

which is compliant with the throughput specification if the clock frequency is equal to 153.85 MHz.
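The arithmetic behind this figure can be checked with a short sketch. It only restates the formula above (K/8 cycles to load the frame, then K cycles to encode it); the function name is illustrative.

```python
def encoder_throughput_bps(f_ck_hz: float, input_width_bits: int = 8) -> float:
    """Throughput with a single buffer: load (K / width cycles), then encode (K cycles).

    K cancels out: K / ((K + K / width) * T_ck) = f_ck / (1 + 1 / width).
    """
    return f_ck_hz / (1.0 + 1.0 / input_width_bits)


# At 153.85 MHz the encoder sustains about 136.76 Mbit/s,
# which is above the 80 Mbit/s specification.
throughput = encoder_throughput_bps(153.85e6)
```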





- **The two convolutional encoders (green blocks) receive the in-order and scrambled** sequences from the buffer and generate the four-bit coded outputs. After completing the sequence of k symbols, four additional steps are required to terminate the trellis (multiplexers)
- The coded symbol generation unit (red block) provides the final coded symbols in a packetized way, based on the code rate. The output signal, BYTE\_OUT, is an eight-bit value for all supported code rates; however, the meaningful bits are aligned with the least significant positions of the byte



# CCSDS Turbo codes Channel interleaver: Design and Performance results with tumbling spacecraft and solar scintillation channel models

Guido Montorsi, POLITO CCSDS meeting, 28 May 2021





**DET Department of Electronics and Telecommunications**

## Background and contributions



- At the CCSDS fall meeting 2017, JPL presented the Stereo-B anomaly, which is a practical example of a "tumbling spacecraft"
- This anomaly highlighted that Turbo decoders inherit a vulnerability to burst errors from their constituent convolutional codes
- In a previous contribution we proposed another channel model to account for other physical phenomena affecting the transmission, such as the fading induced by solar scintillation
- A well-known potential solution to such impairments is to place a channel interleaver immediately after the turbo encoder
- In this contribution we provide performance results and design conclusions for the row-column interleaver and the golden angle interleaver on the two considered channel models



## **TUMBLING SPACECRAFT**

## Tumbling spacecraft

**First example of oscillations of SNR with tumbling spacecraft**

[Figure: 2016/249 Stereo-B DSS-63 X-band signal-to-noise ratio (dB-Hz) versus time (UTC-ERT), with the duration of a codeword marked.]

**Second example of oscillations of SNR with tumbling spacecraft**

[Figure: 2016/239 Stereo-B DSS-43 X-band signal-to-noise ratio.]



2022-11-25 CCSDS Meeting - Guido Montorsi - Channel interleaver design 32

## Tumbling spacecraft model



- The two examples show an approximately sinusoidal and deterministic behavior of the SNR (in dB).
- A general model for such behavior is:

$$
SNR(t) = SNR_0 + A \sin\left(\phi + 2\pi t \frac{\alpha}{T_f}\right) [dB],
$$

where:

- $A$ is the amplitude of the SNR oscillations in dB,
- $\alpha$ is the frequency of the SNR oscillations, normalized to the codeword rate ($1/T_f$),
- $T_f$ is the codeword duration, and $\phi$ is an arbitrary phase offset.
- As an example, for the first figure, $A \approx 7$ dB and $\alpha \approx 1$, whereas for the second, $A \approx 5$ dB and $\alpha \approx 3$.
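The model can be evaluated directly. The sketch below is a plain transcription of the formula for $SNR(t)$; the function and argument names are illustrative, and the sample values reuse the first example ($A \approx 7$ dB, $\alpha \approx 1$) with an arbitrary $SNR_0$.

```python
import math


def tumbling_snr_db(t: float, snr0_db: float, amp_db: float,
                    alpha: float, t_f: float, phi: float = 0.0) -> float:
    """SNR(t) = SNR_0 + A * sin(phi + 2*pi*t*alpha/T_f), all SNR values in dB."""
    return snr0_db + amp_db * math.sin(phi + 2.0 * math.pi * t * alpha / t_f)


# One oscillation per codeword (alpha = 1), sampled at 8 points of a codeword
# of unit duration; SNR_0 = 3 dB is an arbitrary illustrative value.
samples = [tumbling_snr_db(k / 8, snr0_db=3.0, amp_db=7.0, alpha=1.0, t_f=1.0)
           for k in range(8)]
```

With $\alpha = 1$ the SNR sweeps its full $\pm A$ range within a single codeword, which is the situation where spreading the bits in time matters most.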

## Tumbling spacecraft



- The value  $\alpha$  in a general scenario can vary in a range that depends on the relative values of the baud-rate and the oscillation frequency of spacecraft.
- We report the performances as a function of  $\alpha$  for four considered interleaver choices:
	- no interleaver,
	- golden angle interleaver,
	- two row-column interleavers with properly tuned depth.

## **Tumbling** spacecraft: performances

SNR thresholds  $(SNR_0)$  vs relative periodicity of tumbling spacecraft.

 $R_c = 1/6$ , short codewords, .



## **Tumbling** spacecraft: performances

SNR thresholds  $(SNR_0)$  vs relative periodicity of tumbling spacecraft.  $R_c = 1/2$ , long codewords, A=7.





Inter-frame interleaver

## **SOLAR SCINTILLATION MODEL**
# Solar Scintillation



• We proposed a non-deterministic fading channel for assessing the interleaver performance in the presence of solar scintillation phenomena.

$$
r_k = a_k s_k + n_k,
$$

• $a_k$ is a Gaussian correlated stationary process with given spectrum and possibly nonzero mean value, yielding a Rician first-order statistic characterized by the Rician factor $K$.

$$
S_a(f) \propto \frac{1}{1 + \left(\frac{f}{f_0}\right)^2} = \frac{1}{1 + \left(\underbrace{T_c R_s}_{\text{normalized coherence time}} \, \frac{f}{R_s}\right)^2}
$$

# Comments



In the solar scintillation model, both coherence time and Rician factor are related to the scintillation index *m*. The measured relationship with the coherence time is reported in [RD-2], while the Rician factor can be obtained as:

$$
K = \frac{\sqrt{1 - m^2}}{1 - \sqrt{1 - m^2}}
$$

- Thus, when the scintillation index increases, both the coherence time and the Rician factor decrease, with opposite effects on the performance.
- To jointly consider these effects in the next figure we report the SNR thresholds versus the *scintillation index*.
- The black curve reports the values of normalized coherence time  $T_c/T_s$  (right vertical axis). The corresponding Rician factor is indicated in the label. The considered symbol rate in this case is  $R_s = 10$  kbaud.
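The relation between the scintillation index and the Rician factor can be evaluated numerically. The sketch below transcribes the formula for $K$ given above; the function names are illustrative.

```python
import math


def rician_factor(m: float) -> float:
    """K = sqrt(1 - m^2) / (1 - sqrt(1 - m^2)) for a scintillation index 0 < m < 1."""
    if not 0.0 < m < 1.0:
        raise ValueError("scintillation index must satisfy 0 < m < 1")
    root = math.sqrt(1.0 - m * m)
    return root / (1.0 - root)


def rician_factor_db(m: float) -> float:
    """Rician factor expressed in dB."""
    return 10.0 * math.log10(rician_factor(m))


k_db_095 = rician_factor_db(0.95)   # about -3.4 dB
k_080 = rician_factor(0.80)         # 1.5 in linear scale
```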



- Both the Rician factor and the coherence time vary as a function of the scintillation index *m*
- The black curve reports the values of the coherence time (right vertical axis), while the corresponding Rician factor is reported in the label.
- The baud rate is $R_s = 10$ kbaud



**POLITECNICO DI TORINO** 

# **Comments**



- The performance degradation increases for larger scintillation indexes, that is for smaller values of the coherence time and smaller Rician factors.
- As an example, with $m = 0.95$, the Rician factor is $K = -3.4$ dB and the coherence time is around 10. The SNR threshold degradation is 0.4 dB for rate 1/6, 0.6 dB for rate 1/4, 0.7 dB for rate 1/3 and 1.0 dB for rate 1/2.
- The degradation is due to the transition from the performance associated with the AWGN channel (high values of K) to that associated with the Rayleigh fading channel (low values of K).
- In order to assess the impact on performance of the insertion of an inter-frame interleaver, we focus on the case with the highest loss, with R=1/2, K=1784.
- In the next figure we show the SNR thresholds vs the baud rate for two high values of the scintillation index, corresponding to low Rician factors

### SNR thresholds vs baud rate

for m=0.80 and m=0.95.

 $R_c = 1/2, K = 1784$ 



# Comments



- Increasing the baud rate increases the normalized coherence time  $R_s T_c$  and consequently reduces the effective codelength.
- The reduction of the effective codelength in turn degrades the slope of the FER performance curve and consequently the threshold.
- The figure shows that performance degradation occurs only when $R_s > 10$ kbaud.
- The interleaver depth $D$ has the effect of reducing the normalized coherence time
	- The plot makes it possible to quantify the threshold gain introduced by the insertion of an inter-frame interleaver.
	- The gain is obtained by comparing the threshold at a given rate $R_s$ with that at $R_s/D$
- For example, an interleaver depth $D=10$, with scintillation index 0.95, yields a 1.4 dB gain at a baud rate of 1 Mbaud, 0.9 dB at 100 kbaud, 0.2 dB at 10 kbaud and negligible gain at symbol rates below 1 kbaud. These gains would increase if smaller FER values were considered for the SNR threshold definition.
- Considering that typical mission baud rates are below 10 kbaud and that the considered case is the worst case (small block size, large code rate and large scintillation index), the adoption of an inter-frame interleaver is not recommended

# Summary of conclusions



- To summarize, the recommendations of this contribution are the following
- **The adoption of an intra-frame interleaver is strongly recommended**.
	- It has no impact on the memory and latency of the system and provides very large performance improvements in some realistic scenarios.
- **Row-column interleaver is mildly recommended**.
	- It provides similar performance to the "golden angle" interleaver, with a slight improvement in the permutation representation and memory access for the implementation.
- **The adoption of an inter-frame interleaver is not recommended.**
	- It has an impact on the memory and latency of the system and provides performance improvements only in scenarios corresponding to large baud rates and/or low SNR oscillation frequencies, which are not considered relevant at the moment under the current assumptions

# Interleaver specifications (proposal of standard modification)



- At the encoder side, the bits produced by the turbo encoder of rate $R_c = 1/n$, with $n = 2, 3, 4, 6$, are written on the interleaver rows of size $N_c = 4n = 8, 12, 16, 24$. Each row thus contains four consecutive outputs of the two constituent encoders of the turbo encoder.
- The bits are then read out from the interleaver column by column. The column length $N_r$ depends on the codeword length $N$ as $N_r = N/N_c$.



- **Rationale**: to have $p = 4$ consecutive trellis steps of the constituent encoders affected by fading amplitudes that are spread in time "as much as possible". This ensures maximum time diversity on the short error events for both constituent encoders. The choice of 4 is well matched to the numerology of the standard, as 4 is a divisor of all values of $N/n = (1788, 3572, 7140, 8924)$.
- Values larger than 4 for $p$ were also considered at the design stage but were found to provide negligible gains with respect to this simpler solution.
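The write-by-rows, read-by-columns rule can be sketched in a few lines. This is a behavioural model only, assuming the codeword is given as a flat list of $N = n(K+4)$ bits in encoder output order; the function names are illustrative and say nothing about the hardware memory organisation.

```python
def row_column_interleave(bits: list, n: int) -> list:
    """Write row by row into rows of Nc = 4n bits, read out column by column."""
    nc = 4 * n
    if len(bits) % nc != 0:
        raise ValueError("codeword length must be a multiple of Nc = 4n")
    nr = len(bits) // nc                      # column length Nr = N / Nc
    return [bits[r * nc + c] for c in range(nc) for r in range(nr)]


def row_column_deinterleave(bits: list, n: int) -> list:
    """Inverse operation: write column by column, read row by row."""
    nc = 4 * n
    nr = len(bits) // nc
    out = [None] * len(bits)
    idx = 0
    for c in range(nc):
        for r in range(nr):
            out[r * nc + c] = bits[idx]
            idx += 1
    return out


# Rate 1/2 (n = 2), K = 1784: N = 2 * 1788 = 3576 bits, Nc = 8, Nr = 447.
codeword = list(range(2 * (1784 + 4)))
interleaved = row_column_interleave(codeword, n=2)
```

Adjacent output bits come from positions $N_c$ apart in the encoder output, so the four trellis steps stored in one row end up a full column length apart on the channel.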



# **Agenda**



#### **14:00 – Introduction**

#### **14:10 – Synchronization and Channel Coding (S&CC) Sublayer**

#### **14:35 – Software of the CCSDS S&CC Sublayer (SW4)**

- **14:45 – COTS Platform**
- **14:55 – Breadboard Design**
- **15:20 – Testing Tool**
- **15:25 – Results Overview**
- **15:45 – Q&A**
- **16:00 – Meeting closure**

#### **SW4 Description**

#### *Functionality*

The "**SW4**" package, **delivered on 28th July 2021**, contains the self-contained software for the simulation and the test vector generation of the full VIRTUDE system

#### TX

- ❑ Turbo encoding
- ❑ Intra and inter frame interleaving
- ❑ Randomizer
- ❑ Modulation

#### Channel

- ❑ AWGN or two models of fading channel
#### RX
- ❑ Demodulation
- ❑ Frame synchronization
- ❑ Derandomizer
- ❑ Deinterleaving
- ❑ Turbo decoding

A target FER can be set. An SNR loop in steps of 0.1 dB ends when the measured FER falls below the target FER
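The stopping rule of the SNR loop can be sketched as follows. This is not the SW4 code: `measure_fer` is a hypothetical callback standing in for a full simulation run at one SNR point, and the start and stop values are illustrative.

```python
def find_snr_threshold(measure_fer, target_fer: float,
                       snr_start_db: float, snr_stop_db: float,
                       step_db: float = 0.1):
    """Step the SNR upwards and return the first value whose FER is below target.

    measure_fer(snr_db) -> float is assumed to run the simulation at one SNR.
    Returns None if the target is never reached within the scanned range.
    """
    steps = int(round((snr_stop_db - snr_start_db) / step_db))
    for k in range(steps + 1):
        snr_db = snr_start_db + k * step_db   # integer counter avoids drift
        if measure_fer(snr_db) < target_fer:
            return round(snr_db, 6)
    return None


# Toy FER curve used only to exercise the loop (not a simulation result).
def toy_fer(snr_db: float) -> float:
    return 10.0 ** (-4.0 * snr_db - 0.2)


threshold_db = find_snr_threshold(toy_fer, target_fer=1e-4,
                                  snr_start_db=0.0, snr_stop_db=3.0)
```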

If desired, by uncommenting a section of code in the main program, the user can write test vectors taken from encoder sections in an output file

The project CCSDS\_Sync can be used to check the performance of frame synchronizer

elecnor group




*File and Folder structure*

**CCSDS\_Full**: Main files and input text file for the simulation and BER statistic of the full VIRTUDE system

 $\Box$ The MSVC++ solution "CCSDS\_Full.sln" can be used to build all executables □ "test\_CCSDS\_PCCC.cpp" is the main file

❑"input\_PCCC.txt" is the input file for configuring the simulation parameters

**CCSDS\_Sync**: Main file and input text file for the simulation and performance analysis of the frame synchronizer

The project can be accessed from the previous solution CCSDS\_Full.sln

**Classes**: All TOPCOM++ classes (declarations and definitions) used by the programs

Output results are stored in the "Data" subfolders

# **Input file (input\_PCCC.txt)**





#### Structure of main simulation loop




# **Agenda**



#### **14:00 – Introduction**

- **14:10 – Synchronization and Channel Coding (S&CC) Sublayer**
- **14:35 – Software of the CCSDS S&CC Sublayer (SW4)**

#### **14:45 – COTS Platform**

- **14:55 – Breadboard Design**
- **15:20 – Testing Tool**
- **15:25 – Results Overview**
- **15:45 – Q&A**
- **16:00 – Meeting closure**



- Platforms available at the TTCP:
	- CPU
	- ARM-based FPGA
	- FPGA
- Integrated in two types of units/devices:
	- Data Processing Unit (DPU)
	- Signal Processing Modules (SPM)
- VIRTUDE modules will be integrated in the ALTERA Stratix V 5SGXAB FPGA within the SPM of the SPU with constrained resources:





### **Interface Requirements**



The breadboard shall receive soft-symbols at the maximum symbol rate of 320 MSym/s:

- ❑ Corresponds to a useful data rate of 80 Mbps at rate 1/4 or 53 Mbps at rate 1/6
- ❑ Thus, the input interface needs to support a bit rate of 2.56 Gbps (each symbol is 8 bits)
- ❑ Hence, the 10 Gb Ethernet interface is suitable for the *input interface*
- **Typical Ethernet interfaces** could be used either for the **output interface** or for the **monitoring and configuration interface**
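The interface sizing follows from simple arithmetic, sketched below with the figures quoted on this slide. The constant and function names are illustrative.

```python
SYMBOL_RATE_SPS = 320e6        # maximum soft-symbol rate
BITS_PER_SOFT_SYMBOL = 8       # each soft symbol is carried as 8 bits
LINK_CAPACITY_BPS = 10e9       # 10 Gb Ethernet


def useful_rate_bps(symbol_rate_sps: float, n: int) -> float:
    """Information rate for a rate-1/n code: one information bit per n symbols."""
    return symbol_rate_sps / n


rate_quarter = useful_rate_bps(SYMBOL_RATE_SPS, 4)    # 80 Mbps
rate_sixth = useful_rate_bps(SYMBOL_RATE_SPS, 6)      # about 53.3 Mbps
input_bit_rate = SYMBOL_RATE_SPS * BITS_PER_SOFT_SYMBOL   # 2.56 Gbps
fits_in_link = input_bit_rate < LINK_CAPACITY_BPS
```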





### **Output interface – TTCP-ICD-TM**



- The application frame has 2 possible formats for:
	- **Data Message**: includes the decoded frame alongside other header fields;
	- **Heartbeat Message:** sent periodically when no data is outputted to keep the connection active.



#### Symbol duration Timecode



- **Timecode**: reception time of first sample in UDP frame
- The length of the soft-symbol frame is not fixed!

**Input Interface – Protocols**

#### • **Transfer Frames**

- For VIRTUDE, the FECF field is mandatory!
- TM Transfer Frame: 223, 446, 892 or 1115 bytes


# **Selected COTS for the Project (1/2)**



- The "Stratix V GX FPGA Development Kit" was the COTS chosen, as it includes external memory
	- The external memory was foreseen in case of a shortage of M20K modules for the Interleaver, but it ended up not being required
- This COTS has a 40 Gbps (QSFP+) Ethernet interface and a 10/100/1000 Mbps (RJ45) Ethernet interface
	- The 40 Gb interface is split into 4x 10 Gbps Ethernet interfaces using an adapter cable from QSFP+ to 4x SFP+





# **Selected COTS for the Project (2/2)**



- The 3 physical interfaces between the selected COTS and the external computer then are:
	- 10 Gbps Input interface (to receive either Transfer Frames or received symbols);
	- 10 Gbps Output interface (Decoded Transfer Frames or Heartbeat Messages);
	- 10/100/1000 Mbps Monitoring and Configuration (M&C) interface.
- The M&C interface was implemented at 1 Gbps due to a limitation of the IP core used in the project
	- The output interface did not require 10 Gbps but the RJ-45 was already selected for the M&C interface



# **Agenda**



#### **14:00 – Introduction**

- **14:10 – Synchronization and Channel Coding (S&CC) Sublayer**
- **14:35 – Software of the CCSDS S&CC Sublayer (SW4)**
- **14:45 – COTS Platform**

#### **14:55 – Breadboard Design**

- **15:20 – Testing Tool**
- **15:25 – Results Overview**
- **15:45 – Q&A**
- **16:00 – Meeting closure**

#### **System Decomposition**

- **Breadboard:**
	- **External Interface**
	- Transmitter Chain
	- **Receiver Chain**
	- Auxiliary Modules

#### • **Testing Tool**

- Runs in external computer
- Injects Transfer Frames or Soft-Symbols
- Receives Decoded Transfer Frames
- Measures BER/FER and throughput



# **Data Flow and Throughput**

- Target bit rate of 80 Mbps
	- 320 MSps for the code rate of 1/4
	- 318 MSps for the code rate of 1/6
- Keep up with 136.75 Mbps from Encoder
	- 6 XOR in parallel in Randomizer
	- Write 2/3/4/6 bits in parallel in Interleaver
- Keep up x8 parallelism for Turbo Decoder
	- Read 8 bits from Interleaver concurrently
	- Add noise to 8 symbols in Channel Emulator
	- Write/read 8 symbols in parallel in De-Interleaver
	- Apply Boolean logic to 8 symbols in De-Randomizer
- Decoder outputs 1 byte per clock cycle
	- Validate FECF byte by byte



# **External Interfaces – IP Cores Evaluation**



- Evaluated open-source MAC IP cores for both 10G and 1G Ethernet Interfaces
	- The selected cores come from the same source, chosen for its permissive license and because it includes a UDP stack
- The MAC IP core chosen for the M&C interface only supports 1Gb connection



#### **10 Gb Interface 1 Gb Interface**

#### **External Interfaces – Message Flow**



- The 3 interfaces use the UDP protocol
	- The Input and Output Interfaces have no flow control
	- The M&C Interface adopted user-defined acknowledgment messages to ensure the configurations are properly set and to receive the results of the Unit Tests



#### **M&C Interface**

#### **Transmitter Chain – Turbo Encoder**

- First module of the transmitter chain based on a combination of two convolutional encoders
- For each received information bit, it generates the parity symbols outputting 2, 3, 4 or 6 symbols per clock cycle for the code rates of 1/2, 1/3, 1/4 and 1/6



#### **Block Diagram Actual Implementation**



DEG-CMS-SUPSC03-PRE-12-E © Copyright DEIMOS

#### **Transmitter Chain – Randomizer**



- Ensures sufficient bit transitions to keep receiver symbol synchronizers in lock by applying an exclusive-OR to each bit of the codeword with a standard pseudo-random sequence
	- This sequence is fixed, repeats every 255 bits and is generated by the following polynomial:

 $h(x) = x^8 + x^7 + x^5 + x^3 + 1$ 

- The implementation is parallelized to allow the computation of all input bits concurrently
	- 6 XORs to apply the operation to all 6 bits, when required


- The ROM stores the pseudo-random sequence and the shift register aligns its data based on the code rate
- The multiplexer allows the module to be bypassed if randomization is not desired
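The pseudo-random sequence can be reproduced in software from the polynomial $h(x) = x^8 + x^7 + x^5 + x^3 + 1$. The sketch below is a bit-serial behavioural model, not the ROM-based parallel implementation described above. It assumes the all-ones initial state used by the CCSDS randomizer, which is not stated on the slide; the function names are illustrative.

```python
def ccsds_pn_sequence(length: int = 255) -> list:
    """Bits of the sequence defined by h(x) = x^8 + x^7 + x^5 + x^3 + 1.

    Recurrence: a[k+8] = a[k] ^ a[k+3] ^ a[k+5] ^ a[k+7].
    Assumption: the register starts from the all-ones state.
    """
    seq = [1] * 8
    while len(seq) < length:
        k = len(seq) - 8
        seq.append(seq[k] ^ seq[k + 3] ^ seq[k + 5] ^ seq[k + 7])
    return seq[:length]


def randomize(bits: list) -> list:
    """XOR each codeword bit with the 255-bit sequence, restarted per codeword."""
    pn = ccsds_pn_sequence(255)
    return [b ^ pn[k % 255] for k, b in enumerate(bits)]


first_40_bits = ccsds_pn_sequence(40)
```

Because the operation is a plain XOR, applying `randomize` twice returns the original bits, which is why the hard-decision de-randomizer can reuse the same sequence.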



#### **Transmitter Chain – Channel Interleaver**



- Writes in rows and reads in columns.
- **Approach**: Store each Interleaver row in a different memory position
	- One symbol from each of 8 consecutive rows is read concurrently $\rightarrow$ data must be stored in 8 memories!
- The **writing process** is simple: data is accumulated for 4 cycles in a register before being written to the corresponding memory
- The **reading process** is mostly straightforward: the same address and the same bit within that address are read from all 8 memories in the same clock cycle
	- However, there is an alignment issue at the end of each column, as the number of rows is never divisible by 8 (447, 893, 1785 or 2231)
	- **Solution**: Use a valid signal for each of the 8 output bits (i.e., an 8-bit valid signal), which is then handled by the Align Module



#### **Transmitter Chain: ASM**



- Developed jointly with the Interleaver module
- The state machine for the reading part has an initial state that outputs the ASM first
- The number of clock cycles needed to read the data (ASM + encoded bits) must be less than or equal to the number needed for writing. This is guaranteed as:
	- Parallelism of writing is 2/3/4/6
	- Parallelism of reading is always 8



### **Transmitter Chain: Align Module**



- Solves the alignment issue when reading data from interleaver
- Ensures that either all or none of the output bits are valid in a given clock cycle:
	- If all 8 bits are valid, the valid output signal is asserted ('1')
	- Otherwise, it deasserts the valid output signal ('0') and stores the valid bits in a register
	- On the next clock cycle, the previously stored valid bits are joined with the current valid bits until 8 valid bits are formed again
	- The spare valid bits are then joined with the ones of the following clock cycle, and so on
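The realignment behaviour can be modelled at a functional level. The sketch below is a software illustration, not the RTL: each input item is the list of valid bits delivered by the interleaver in one clock cycle (at most 8), and the function name is illustrative.

```python
def align(cycles: list) -> tuple:
    """Regroup partially valid words into full 8-bit words.

    cycles: one list of valid bits per clock cycle, each with 0 to 8 bits.
    Returns (full_words, leftover), where leftover holds the bits still
    waiting in the register after the last cycle.
    """
    register = []
    full_words = []
    for valid_bits in cycles:
        if len(valid_bits) > 8:
            raise ValueError("at most 8 bits per clock cycle")
        register.extend(valid_bits)
        if len(register) >= 8:          # output valid asserted this cycle
            full_words.append(register[:8])
            register = register[8:]     # spare bits wait for the next cycle
    return full_words, register


# Two columns of Nr = 447 rows: each column is read as 55 full words
# followed by one word carrying only 7 valid bits.
stream = list(range(2 * 447))
column_cycles = []
for col in range(2):
    column = stream[col * 447:(col + 1) * 447]
    column_cycles += [column[i:i + 8] for i in range(0, 447, 8)]

words, leftover = align(column_cycles)
```

Since fewer than 8 bits are ever held back, at most one full word is produced per cycle and the register never needs more than 7 bits of storage.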





#### **Channel Emulator**

- Adds digital noise to the incoming bits and outputs 8-bit soft symbols
- The noise (N) corresponds to an AWGN dependent on the chosen $E_s/N_0$
- With X being the incoming bit, the output Y is: […]
- The receiver chain expects the LLR value of Y: $LLR_Y = Y \cdot \ldots$, where $\sigma = \ldots$ is expressed in terms of $\frac{E_s/N_0}{10}$
- For each LLR value:
	- 2 multiplications and 1 addition
- The AWGN IP core generates two independent AWGN signals
	- LLRs are then calculated in pairs
- Parallelism level of 8 is required
	- Each pair is instantiated 4 times!
- The second terms of both multiplications are pre-processed and stored in the same ROM
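The slide's own equations are not legible here, so the following is only a textbook sketch of an AWGN channel emulator producing LLRs, under assumptions that are not taken from the slides: BPSK-like mapping of bit 0 to +1 and bit 1 to -1, unit symbol energy, and $\sigma^2 = 1/(2 E_s/N_0)$. It is arranged as two multiplications and one addition with two precomputed constants, which is consistent with the operation count quoted above.

```python
import math


def sigma_from_esn0_db(esn0_db: float) -> float:
    """Noise standard deviation for unit-energy symbols (assumed convention)."""
    return math.sqrt(1.0 / (2.0 * 10.0 ** (esn0_db / 10.0)))


def emulate_llr(bit: int, unit_noise: float, esn0_db: float) -> float:
    """LLR of Y = s + sigma * w, with s = 1 - 2*bit and w a unit-variance sample.

    LLR = 2 * Y / sigma^2 = s * (2 / sigma^2) + w * (2 / sigma):
    two multiplications by precomputed constants and one addition.
    """
    sigma = sigma_from_esn0_db(esn0_db)
    c_signal = 2.0 / (sigma * sigma)    # constant that could be stored in a ROM
    c_noise = 2.0 / sigma               # second constant for the noise sample
    symbol = 1.0 - 2.0 * bit
    return symbol * c_signal + unit_noise * c_noise


noise_free_llr = emulate_llr(bit=0, unit_noise=0.0, esn0_db=0.0)
```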



#### **Proposed Frame Synchronizer Algorithm**



- [AD-2] specifies different possible metrics for the Frame Synchronizer algorithm.
- The proposed Frame Synchronizer algorithm uses the approximated optimal Massey metric, which provides the best complexity-performance trade-off (see [AD-2] for details).
- During the **sync acquisition phase**, the metrics of all candidate positions are accumulated at each frame. Acquisition is declared when the difference between the highest metric and the second highest one exceeds a threshold computed as

$$
T_h = S\sqrt{n} \cdot 2 \sqrt{\frac{M}{\sigma^2}} \text{erfc}^{-1} \left(\frac{2 \cdot 10^{-1}}{F}\right)
$$

- where $n$ is the number of accumulated metrics (received frames), $M$ the ASM length, $F$ the total frame length and $\sigma^2$ the noise variance.
- During the **sync tracking phase**, the running average of the last $P$ metrics is updated for the best position (estimated sync) and the two adjacent ones.
- A sync loss is declared if the metric relative to the estimated sync is worse than one of the other two.
- Parameters $P$ and $S$ of the adopted procedure are designed to guarantee a desired maximal False Lock Probability (FLP) and False Sync Loss Probability (FSLP), and a high True Sync Loss Probability (TSLP) when insertions or deletions occur.
- Conclusions on the design of the optimal parameters $P$ and $S$ of the adopted algorithm are provided in the next slide.
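The acquisition decision described above can be sketched as a small behavioural model. The metric computation itself is out of scope here: the per-frame metrics are taken as inputs, and `threshold_for` is a hypothetical callback that returns $T_h$ for a given number of accumulated frames $n$.

```python
def acquire(frames_metrics, threshold_for):
    """Accumulate per-position metrics frame by frame until lock is declared.

    frames_metrics: iterable of lists, one metric per candidate position
                    (at least two positions).
    threshold_for:  callable n -> threshold after n accumulated frames.
    Returns (best_position, n_frames) or None if no lock is declared.
    """
    accumulated = None
    for n, metrics in enumerate(frames_metrics, start=1):
        if accumulated is None:
            accumulated = [0.0] * len(metrics)
        for pos, value in enumerate(metrics):
            accumulated[pos] += value
        ranked = sorted(range(len(accumulated)),
                        key=accumulated.__getitem__, reverse=True)
        best, second = ranked[0], ranked[1]
        if accumulated[best] - accumulated[second] > threshold_for(n):
            return best, n
    return None


# Toy metrics for three candidate positions and an illustrative threshold
# growing as sqrt(n); the values are made up to exercise the logic.
toy_frames = [[1.0, 5.0, 2.0], [1.0, 6.0, 2.0], [0.0, 7.0, 1.0]]
result = acquire(toy_frames, threshold_for=lambda n: 4.0 * n ** 0.5)
```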

# **Design of parameters** *S* **and** *P*



- We performed an extensive simulation campaign to design the optimal values for the parameters *P* and *S*
	- Results are reported in TN3
- In the following we report the conclusions
- **S=0.4** guarantees a false lock probability below 1e-8 and an average lock time below one frame at the threshold of the code
	- While in principle the corresponding threshold $T_h$ depends on rate, code length and SNR, we propose to use a different value for each rate and size, while fixing the SNR to that corresponding to the code thresholds. This yields a table of 16 entries for $T_h$ (x.3 fixed point representation of LLR)
- **P=2** guarantees an average false sync loss probability below 1e-8 at the SNR corresponding to the code threshold, and an average of two frames for reacting to a true sync loss

# **Frame Synchronizer: Acquisition**



- The frame synchronization architecture is divided in two parts: acquisition and tracking
	- The goal of the acquisition phase is to obtain the two maximum correlation results (out of the total number of correlation results of a given frame) and to check if their difference is higher than the threshold
	- 4 correlation windows (one for each ASM pattern) are computed for each position in the symbol frame
	- Many slices are cascaded in a pipeline fashion so that several correlations can be performed in parallel


# **Frame Synchronizer: Input FIFO**

- To match the storage and computation time (for acquisition) of the frame synchronizer:
	- 192 slices would be needed; however, there are not enough resources to allocate that many
- For the acquisition phase, the correlation results need to be accumulated
	- Decided to use 64 slices (good trade-off between the memory requirements and the acceleration of the frame acquisition computation, at the expense of ALM consumption)
- To avoid losing frames, an input FIFO is required:
	- Stores a maximum of 3 of the biggest symbol frames









# **Frame Synchronizer: Tracking**



- In the tracking phase, 3 correlations are calculated (for the ASM position and its right and left adjacent positions)
	- Only the correlation results from the last 2 frames need to be accumulated
- New slice architecture to calculate correlations for tracking:
	- Keep level of parallelism of 8 and 2 pipeline stages
	- Each slice requires only about 55 ALMs => 3 slices are required in total!
- The synchronization is kept as long as the 2-frame accumulated correlation for the ASM position is greater than both adjacent ones
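The tracking rule can be expressed compactly in software. The sketch below is a behavioural model with illustrative names, taking the three correlation values of each frame as inputs rather than computing them.

```python
from collections import deque


class SyncTracker:
    """Keeps the last `window` frames of (left, centre, right) correlations."""

    def __init__(self, window: int = 2):
        self.history = deque(maxlen=window)

    def update(self, left: float, centre: float, right: float) -> bool:
        """Add one frame; return True while synchronization is kept."""
        self.history.append((left, centre, right))
        acc_left = sum(h[0] for h in self.history)
        acc_centre = sum(h[1] for h in self.history)
        acc_right = sum(h[2] for h in self.history)
        return acc_centre > acc_left and acc_centre > acc_right


tracker = SyncTracker(window=2)
status = [
    tracker.update(1.0, 5.0, 1.0),   # centre dominates
    tracker.update(4.0, 1.0, 1.0),   # one bad frame, still ahead over 2 frames
    tracker.update(6.0, 1.0, 1.0),   # left neighbour wins over the last 2 frames
]
```

Accumulating over two frames means a single poor frame does not by itself trigger a sync loss, while a persistent shift is detected within about two frames.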



#### **Receiver Chain: Phase Error Correction**



- Receives 2-bit "rot" signal from Synchronizer
- The constellation uses 2 bits per symbol:
	- If phase error is 0° -> I and Q in phase
	- If phase error is 90° -> I<sub>R</sub> = -Q<sub>T</sub> and Q<sub>R</sub> = I<sub>T</sub>
- Implementation design includes:
	- 8 signal inverters to compute 8 symbols in parallel
	- Multiplexer to choose order of output "IQ" samples
	- Inverter and MUX are controlled by "rot" signal
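
A minimal sketch of the inverter and multiplexer logic follows. It assumes that "rot" encodes the phase error in multiples of 90° (0 to 3); the slide only states that the signal is 2 bits wide, and the 180° and 270° cases are extrapolated from the 90° relation above.

```python
def correct_phase(i_r, q_r, rot):
    """Undo a phase error of rot * 90 degrees on one received (I, Q) sample.

    Only swaps (the multiplexer) and sign inversions (the inverters) are used.
    """
    if rot == 0:
        return i_r, q_r        # no phase error
    if rot == 1:
        return q_r, -i_r       # received I_R = -Q_T and Q_R = I_T
    if rot == 2:
        return -i_r, -q_r      # assumed 180 degree case
    if rot == 3:
        return -q_r, i_r       # assumed 270 degree case
    raise ValueError("rot is a 2-bit value (0..3)")


def correct_bus(samples, rot):
    """Apply the same correction to all symbols received in one clock cycle."""
    return [correct_phase(i, q, rot) for i, q in samples]
```

For example, a transmitted sample (3, -5) received with a 90° error arrives as (5, 3), and `correct_phase(5, 3, 1)` returns (3, -5).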









#### **Receiver Chain: De-Align Module**



- Goal: data to be stored in the same order as it is stored in the Interleaver
- The last data of one column can be received alongside the initial data of the next column
	- This jeopardizes the integrity of the order of the data stored in the de-interleaver memories
	- Solution: the last data from one column is sent in the current clock cycle, and the remaining data is stored in a FIFO-like register
	- In the next clock cycle, the previously stored data is joined with the data now being received to form an 8-symbol bus again
	- The remaining symbols are kept in the register to be sent in the next clock cycle, and so on
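
The regrouping can be sketched as a generator that emits one word per incoming word and never lets a word straddle a column boundary. The function name, the `column_len` parameter and the final flush loop (which stands in for idle clock cycles) are assumptions of this sketch, not details taken from the breadboard.

```python
def dealign(words, column_len, width=8):
    """Regroup a symbol stream so that no output word crosses a column end."""
    residue = []   # models the FIFO-like register
    filled = 0     # symbols of the current column already emitted

    def emit():
        nonlocal residue, filled
        take = min(width, column_len - filled, len(residue))
        out, residue = residue[:take], residue[take:]
        filled = (filled + take) % column_len
        return out

    for word in words:
        residue.extend(word)   # join stored symbols with the new ones
        yield emit()
    while residue:             # drain what is left after the last input word
        yield emit()


# 24 symbols arriving 8 per cycle, with hypothetical columns of 12 symbols.
stream = [list(range(0, 8)), list(range(8, 16)), list(range(16, 24))]
aligned = list(dealign(stream, column_len=12))
```

Here the second output word carries only the last 4 symbols of the first column, and the first 4 symbols of the next column wait in the register until the following cycle.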



# **Receiver Chain: De-Interleaver**

[Figure: De-Interleaver memories MEM<sub>1</sub> to MEM<sub>9</sub>]



- **Same Approach:** Store each De-Interleaver row in a different memory position
	- **Advantage:** data comes already well-aligned
	- At least 8 memories are needed to write in 8 different rows in the same clock cycle  $\rightarrow$  9 memories used to save M20Ks
- The **writing process** is more complex as there are 9 memories: there is one shift register that controls the write enable of each memory as well as the address counter of each memory
- The **reading process** is simple for the code rates of 1/2, 1/4 and 1/6, where only one memory is read per clock cycle. For 1/3, from time to time 2 memories are read in the same clock cycle.







DEG-CMS-SUPSC03-PRE-12-E

#### **Receiver Chain: De-Randomizer**



- Uses the same 255-bit pseudo-random sequence as the Randomizer. Two main differences:
	- Single level of parallelism of 8 -> ROM is configured as 255x8 (1 M20K)
	- Performs signal inversion operation rather than XOR operation
- The implementation is quite straightforward:
	- 8 signal inversion modules to ensure the proper parallelism
	- Output multiplexer that allows this module to be bypassed if randomization is not desired
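
A minimal sketch of de-randomization on soft symbols follows. The generator polynomial x^8 + x^7 + x^5 + x^3 + 1 with an all-ones seed is the CCSDS pseudo-randomizer and is not stated on the slide. The function names, the `start` offset and the `bypass` flag are assumptions for illustration.

```python
def ccsds_pn_sequence(n_bits=255):
    """Pseudo-random bits from h(x) = x^8 + x^7 + x^5 + x^3 + 1, seed all ones."""
    state = [1] * 8
    bits = []
    for _ in range(n_bits):
        bits.append(state[0])
        feedback = state[0] ^ state[3] ^ state[5] ^ state[7]
        state = state[1:] + [feedback]
    return bits


PN = ccsds_pn_sequence()  # plays the role of the ROM contents


def derandomize(soft_symbols, start=0, bypass=False):
    """Invert the sign of every soft symbol whose sequence bit is 1.

    On hard bits this would be an XOR; on soft symbols the equivalent
    operation is a sign inversion. `bypass` models the output multiplexer.
    """
    if bypass:
        return list(soft_symbols)
    return [-s if PN[(start + k) % 255] else s
            for k, s in enumerate(soft_symbols)]


first_bytes = int("".join(str(b) for b in PN[:32]), 2)  # 0xFF480EC0
```

With a parallelism of 8, one call with 8 symbols and `start` advanced by 8 per clock cycle corresponds to one ROM word.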



#### **Receiver Chain: Turbo Decoder**



- Most complex module of the design
	- Receives the codewords and outputs the Decoded Transfer Frames
- Achievable throughputs:



### **Receiver Chain: FECF Validation**



- The FECF validation is performed at the end of the Receiver Chain
	- The registers (S<sub>15</sub> to S<sub>0</sub>) are updated for each bit injected from the Decoded Transfer Frame
	- After injecting the last bit, the value of the registers will be zero if no errors are detected.
- By default, the diagram receives 1 bit and performs 3 XOR operations per clock cycle
	- Design was parallelized to receive 1 byte and perform 24 XOR operations per clock cycle!
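
The check can be sketched bit-serially and byte-wise as below. The polynomial x^16 + x^12 + x^5 + 1 and the all-ones preset are those of the CCSDS TM Frame Error Control Field; they are not stated on the slide but are consistent with the 3 XOR operations per bit. The function names are hypothetical.

```python
POLY = 0x1021  # x^16 + x^12 + x^5 + 1 (the x^16 term is implicit)


def crc_step_bit(reg, bit):
    """Shift one bit into the 16-bit register (S15..S0)."""
    feedback = ((reg >> 15) & 1) ^ bit
    reg = (reg << 1) & 0xFFFF
    return reg ^ POLY if feedback else reg


def crc_step_byte(reg, byte):
    """Shift in 8 bits, MSB first: 8 x 3 = 24 XOR operations in hardware."""
    for k in range(7, -1, -1):
        reg = crc_step_bit(reg, (byte >> k) & 1)
    return reg


def crc_register(data, preset=0xFFFF):
    """Register value after injecting all bytes of `data`."""
    reg = preset
    for byte in data:
        reg = crc_step_byte(reg, byte)
    return reg


def fecf_ok(frame_with_fecf):
    """True if the register is zero after the last bit, i.e. no error detected."""
    return crc_register(frame_with_fecf) == 0


payload = b"123456789"
fecf = crc_register(payload).to_bytes(2, "big")
```

In the FPGA the 8 bit-steps of `crc_step_byte` are unrolled into combinational logic, so one byte is consumed per clock cycle.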





- **Configuration module:** receives the parameters from the Testing Tool and stores them in registers which are accessed by all the other modules in the design
- **Monitoring module:** receives the status parameters from the other modules in the FPGA and periodically provides the status of the breadboard to the external computer via the M&C interface
- **Test data:** allows both the Transmitter and Receiver Chains of the breadboard to be tested internally, using predefined sequences of data stored in local memory and without using the external 10Gb interfaces. At the end of the Receiver Chain, the decoded sequence is checked against the transmitted data
	- The same data pattern can be sent several times to the transmission chain, which then inserts randomness via the AWGN added in the channel emulator
	- Decided to use only one M20K to save resources -> two different test vectors are stored



### **FPGA Final Resources**

- Although the resources available on the ALTERA Stratix V for the VIRTUDE project are limited, the final FPGA implementation uses no more than about 80% of the available ALMs and M20Ks:
	- About 80% for the ALMs;
	- About 79% for the M20Ks;
	- About 7% for the DSPs.
- All breadboard modules are implemented using only FPGA resources:
	- The previously considered option of using external memory for the Interleaver/De-Interleaver to release M20K blocks was not needed!






# **Testing Tool: Software Pre-requisites**



- The Testing Tool was developed and tested on a Debian GNU/Linux 9 (Stretch) machine
	- Other Linux distributions such as Ubuntu should be compatible
- The generation of Transfer Frames and Soft-Symbols is slow in the Testing Tool
	- To overcome this limitation, data is injected at the target bit rate in bursts of 1000 frames
	- Synchronization is required between the Input and Output Interfaces via *mkfifos*
- The **wondershaper** Linux tool is used to limit the speed of the input interface to achieve the configured bit rate
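
The synchronization between the Input and Output Interfaces through named pipes can be illustrated with the sketch below. It is not the Testing Tool code: the two interfaces are modelled as threads instead of separate processes, and the pipe names and messages are invented for the example. It requires a POSIX system, since it relies on `os.mkfifo`.

```python
import os
import tempfile
import threading


def notify(fifo_path):
    """Unblock the peer by writing one line into the named pipe."""
    with open(fifo_path, "w") as fifo:
        fifo.write("burst done\n")


def wait_for(fifo_path):
    """Block until the peer writes one line into the named pipe."""
    with open(fifo_path) as fifo:
        return fifo.readline().strip()


def run_handshake():
    """One burst: input notifies output, output acknowledges back."""
    events = []
    with tempfile.TemporaryDirectory() as workdir:
        in_to_out = os.path.join(workdir, "in_to_out")   # hypothetical names
        out_to_in = os.path.join(workdir, "out_to_in")
        os.mkfifo(in_to_out)
        os.mkfifo(out_to_in)

        def input_interface():
            events.append("input: burst of 1000 frames sent")
            notify(in_to_out)
            wait_for(out_to_in)
            events.append("input: may send the next burst")

        def output_interface():
            wait_for(in_to_out)
            events.append("output: burst processed")
            notify(out_to_in)

        threads = [threading.Thread(target=side, daemon=True)
                   for side in (input_interface, output_interface)]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join(timeout=5)
    return events


handshake_events = run_handshake()
```

Because opening a FIFO blocks until the other end is opened, neither side can run ahead of the other, which is the property needed between bursts.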










# **Testing Tool: Procedure (1/2)**



1) Edit the configuration file "*VIRTUDE\_Configurations.txt*" with the desired configurations

The Tool includes example configuration files for testing Transfer Frames, Soft-Symbols, Heartbeat Messages, BPSK modulation, etc.

Any of these files can be renamed to "*VIRTUDE\_Configurations.txt*" to be used in the Testing Tool

2) Run "*VIRTUDE\_TT conf*" to send a new configuration to the Breadboard

In this example, it is configured to send Soft-Symbols generated with the maximum Es/No

The monitoring messages are also enabled

    dcpp@nereida:~/virtude/TestingTools/bin$ ./VIRTUDE_TT conf
    Arguments obtained: conf
    Client will try to reach server on IP address 192.168.1.13 and port number 21006
    Configurations selected:
    Message length: 58
    Message type: 1
    Operating Mode: 1
    Input Type: 1
    Bit Rate: 80000000
    Code Rate: 2
    TF length: 0
    Randomizer: 1
    interleaver: 1
    emulator: 0
    Es/No: 140
    Monitor period: 1000
    Heartbeat period: 0
    Input Remote IP: 192.168.2.1
    Input Local IP: 192.168.2.100
    Input Local Port Number: 21001
    Output Local IP: 192.168.3.101
    Output Local Port Number: 21003
    Output Remote IP: 192.168.3.2
    Output Remote Port Number: 21004
    Monitor Local IP: 192.168.1.13
    Monitor Local Port Number: 21006
    Monitor Remote IP: 192.168.1.102
    Monitor Remote Port Number: 21005
    FPGA MAC: 0:7:237:29:20:69
    Modulation Type: 0
    Created server on IP address 192.168.1.102 and port number 21005
    Configuration message sent *
    Response received: 0x3a 0x11 0x01 0x01 0x04 0xc4 0xb4 0x00 0x02 0x00 0x01 0x01
    0x00 0x8c 0x03 0xe8 0x00 0x00 0xc0 0xa8 0x02 0x01 0xc0 0xa8 0x02 0x64 0x52 0x09
    0xc0 0xa8 0x03 0x65 0x52 0x0b 0xc0 0xa8 0x03 0x02 0x52 0x0c 0xc0 0xa8 0x01 0x0d
    0x52 0x0e 0xc0 0xa8 0x01 0x66 0x52 0x0d 0x00 0x07 0xed 0x1d 0x14 0x45 0x00
    Breadboard confirms new configurations *
    Configuration Interface closing...
    dcpp@nereida:~/virtude/TestingTools/bin$
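
The response echoes the configuration, which can be cross-checked by decoding a few fields from its first bytes. The field offsets below are inferred by comparing the printed configuration with the response bytes; they are an assumption of this sketch and not taken from the interface specification.

```python
import ipaddress
import struct

# First 28 bytes of the response shown in the console output.
response = bytes([
    0x3A, 0x11, 0x01, 0x01, 0x04, 0xC4, 0xB4, 0x00, 0x02, 0x00,
    0x01, 0x01, 0x00, 0x8C, 0x03, 0xE8, 0x00, 0x00, 0xC0, 0xA8,
    0x02, 0x01, 0xC0, 0xA8, 0x02, 0x64, 0x52, 0x09,
])


def decode_fields(msg):
    """Decode some big-endian fields at the offsets assumed above."""
    (bit_rate,) = struct.unpack_from(">I", msg, 4)
    (monitor_period,) = struct.unpack_from(">H", msg, 14)
    (input_port,) = struct.unpack_from(">H", msg, 26)
    return {
        "message_length": msg[0],
        "bit_rate": bit_rate,
        "code_rate": msg[8],
        "es_no": msg[13],
        "monitor_period": monitor_period,
        "input_remote_ip": str(ipaddress.IPv4Address(msg[18:22])),
        "input_local_ip": str(ipaddress.IPv4Address(msg[22:26])),
        "input_local_port": input_port,
    }


fields = decode_fields(response)
```

The decoded values match the configuration printed by the Tool (80000000 bps, Es/No 140, monitor period 1000, input addresses 192.168.2.1 and 192.168.2.100, port 21001).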

# **Testing Tool: Procedure (2/2)**



3) Run "*VIRTUDE\_TT mon*" to open the monitoring interface (if it is enabled). The counters of received UDP frames and of decoded Transfer Frames increase once the input interface is run

    dcpp@nereida:~/virtude/TestingTools/bin$ ./VIRTUDE_TT mon
    Arguments obtained: mon
    Reading configurations from file...
    Created server on IP address 192.168.1.102 and port number 21005
    Timestamp since last configuration: 1000 ms
    Number of received frames with wrong CRC: 0
    Number of received frames with wrong length: 0
    Latency of the turbo decoder: 70841 ns
    Number of Transfer Frames received since last configuration: 0
    Number of UDP frames received since last configuration: 0
    Number of decoded Transfer Frames since last configuration: 0
    Number of decoded Transfer Frames with errors since last configuration: 0
    …
    Timestamp since last configuration: 6000 ms
    Number of received frames with wrong CRC: 0
    Number of received frames with wrong length: 0
    Latency of the turbo decoder: 70841 ns
    Number of Transfer Frames received since last configuration: 0
    Number of UDP frames received since last configuration: 891
    Number of decoded Transfer Frames since last configuration: 1000
    Number of decoded Transfer Frames with errors since last configuration: 0

4) Run "*VIRTUDE\_TT out*" to open the output interface. Data starts being received after the input interface is run

    dcpp@nereida:~/virtude/TestingTools/bin$ ./VIRTUDE_TT out
    Arguments obtained:
    Save received decoded Transfer Frames: Yes
    Reading configurations from file...
    Server created on IP 192.168.3.2 and port number 21004
    Waiting for incoming messages...
    …
    --> First Frame after acquisition! (lost 0 frames before)
    Received 1 notification from Input Interface
    Saving 303000 bytes on the file, which correspond to 1000 TFs.
    Getting input TF from file...
    ................
    BER: 0
    FER: 0
    ****************
    cummulative average BER: 0
    cummulative average FER: 0
    time between first decoded Transfer Frame and last received: 22139.7us
    throughput (useful output bit rate) calculated (223000 bytes): 80.5791 Mbps
    cummulative average throughput (useful output bit rate): 80.5791 Mbps
    DLC value of last decoded Transfer Frame: 0
    Notified Input Interface for the 1 time
    *** Final Stats ***
    cummulative average BER: 0
    cummulative average FER: 0
    cummulative average throughput (useful output bit rate): 80.5791 Mbps
    Average lock time (number of frames lost per acquisition): 0
    Total number of frames received: 1000
    dcpp@nereida:~/virtude/TestingTools/bin$

5) Run "*VIRTUDE\_TT in -n 1000*" to send a burst of 1000 Soft-Symbol frames via the input interface



    time between first UDP frame (carrying SSs) and last sent: 22204.1us
    throughput calculated (7280000 bytes): 2622.93 Mbps
    useful input bit rate: 81.9667 Mbps
    Notified Output Interface for the 1 time
    Received 1 notification from Output Interface
    dcpp@nereida:~/virtude/TestingTools/bin$
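
The throughput figures reported by the Tool follow directly from the byte counts and elapsed times in the logs. A small check is shown below; the function name is hypothetical.

```python
def throughput_mbps(n_bytes, elapsed_us):
    """Bits per microsecond, which is numerically equal to Mbit/s."""
    return n_bytes * 8 / elapsed_us


# Output interface: 223000 useful bytes decoded in 22139.7 us.
output_rate = throughput_mbps(223000, 22139.7)    # about 80.58 Mbps

# Input interface: 7280000 bytes of Soft-Symbols sent in 22204.1 us.
input_rate = throughput_mbps(7280000, 22204.1)    # about 2622.9 Mbps
```

Both values agree with the logged 80.5791 Mbps and 2622.93 Mbps to within the rounding of the printed times, and the useful output rate is above the 80 Mbps target.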

# **Testing Tool: Unit Test**



1) Edit the configuration file "*VIRTUDE\_UT.txt*" with the desired unit test configurations

2) Run "*VIRTUDE\_TT ut*" to send a UT request and wait for the UT response

When there are no decoding errors (high Es/No), the Tool shows the message "PASSED"

When there are decoding errors (low Es/No), the Tool shows the message "FAILED" and reports the BER/FER metrics




#### **Breadboard Unit Tests**







### **Breadboard Performance Tests**











[Table residue: breadboard performance tests (external input, De-interleaver, according to the **TTCP/IFMS TM Transfer** ICD [AD-3]); code rates 1/2, 1/3, 1/4 and 1/6; recoverable values: 8920, 80 Mbps, 53 Mbps]

#### **Performance: Turbo Decoder**



• **PeR-2:** The technical loss of the Turbo decoder shall be less than 0.16 dB with respect to the reference figures of the Turbo decoders in [AD-2]









#### **Performance: Frame Synchronizer (QPSK)**



- Average number of frames lost during acquisition -> compared with simulation results
- Proved that the performance of the Frame Synchronizer is superior to that of the Decoder



#### **Performance: Frame Synchronizer (BPSK)**



- Average number of frames lost during acquisition -> compared with QPSK results
- Proved that the performance of the Frame Synchronizer is superior to that of the Decoder



#### **Performance: Channel Interleaver**



• **Gain between 2 and 5 dB** on the SNR threshold for the FER of 1e-3 when using the Channel Interleaver on the Tumbling Spacecraft scenario for normalized oscillation frequencies above 1!







- The whole breadboard implementation uses only the FPGA (i.e., without the external memories or external co-processor identified at the proposal stage)
- The channel interleaver was assessed and designed considering "tumbling spacecraft" and "solar scintillation" channel models
	- The row-column interleaver was successfully implemented and validated
- The Testing Tool (external tool) running on an external computer was developed to configure, monitor and evaluate the breadboard performance
- Main results:
	- The complete breadboard implementation supports the useful data rate of **80 Mbps**
	- Turbo Decoder complies with the maximum allowable implementation loss of **0.16 dB**
	- Channel Interleaver results are in line with the simulations allowing a gain between **2 and 5 dB** for the Tumbling Spacecraft model
- The project is ready to be integrated at TTCP




# **Thank you**

www.elecnor-deimos.com


DEG-CMS-SUPSC03-PRE-12-E © Copyright DEIMOS