Prototypage Rapide et Génération de Code pour DSP Multi-Coeurs Appliqués à la Couche Physique des Stations de Base 3GPP LTE


Prototypage Rapide et Génération de Code pour DSP Multi-Coeurs Appliqués à la Couche Physique des Stations de Base 3GPP LTE

Maxime Pelcat

To cite this version: Maxime Pelcat. Prototypage Rapide et Génération de Code pour DSP Multi-Coeurs Appliqués à la Couche Physique des Stations de Base 3GPP LTE. Réseaux et télécommunications []. INSA de Rennes, 2010. Français. NNT : 00ISAR00 . tel

HAL Id: tel
Submitted on 8 Mar 0

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Thesis (INSA Rennes), under the seal of the Université européenne de Bretagne, submitted to obtain the title of Doctor of INSA de Rennes
Speciality: Signal and Image Processing

Rapid Prototyping and Dataflow-Based Code Generation for the 3GPP LTE eNodeB Physical Layer Mapped onto Multi-Core DSPs

presented by Maxime Pelcat
Doctoral school: MATISSE
Laboratory: IETR

Thesis defended before a jury composed of:
  Shuvra BHATTACHARYYA, Professor at the University of Maryland (USA) / President
  Guy GOGNIAT, Professor at the Université de Bretagne Sud / Reviewer
  Christophe JEGO, Professor at the Institut Polytechnique de Bordeaux / Reviewer
  Sébastien LE NOURS, Associate Professor (Maître de conférences) at Polytech Nantes / Examiner
  Slaheddine ARIDHI, Doctor / Supervisor
  Jean-François NEZAN, Professor at INSA de Rennes / Thesis director

Contents

Acknowledgements

1 Introduction
  Overview
  Contributions of this Thesis
  Outline of this Thesis

Part I Background

2 3GPP Long Term Evolution
  Introduction
  Evolution and Environment of 3GPP Telecommunication Systems
  Terminology and Requirements of LTE
  Scope and Organization of the LTE Study
  From IP Packets to Air Transmission
  Network Architecture
  LTE Radio Link Protocol Layers
  Data Blocks Segmentation and Concatenation
  MAC Layer Scheduler
  Overview of LTE Physical Layer Technologies
  Signal Air transmission and LTE
  Selective Channel Equalization
  eNodeB Physical Layer Data Processing
  Multicarrier Broadband Technologies and Resources
  LTE Modulation and Coding Scheme
  Multiple Antennas
  LTE Uplink Features
  Single Carrier-Frequency Division Multiplexing
  Uplink Physical Channels
  Uplink Reference Signals
  Uplink Multiple Antenna Techniques
  Random Access Procedure
  LTE Downlink Features
  Orthogonal Frequency Division Multiplexing Access
  Downlink Physical Channels
  Downlink Reference Signals
  Downlink Multiple Antenna Techniques
  UE Synchronization

3 Dataflow Model of Computation
  Introduction
  Model of Computation Overview
  Dataflow Model of Computation Overview
  Synchronous Data Flow
  SDF Schedulability
  Single Rate SDF
  Conversion to a Directed Acyclic Graph
  Cyclo Static Data Flow
  CSDF Schedulability
  Dataflow Hierarchical Extensions
  Parameterized Dataflow Modeling
  Interface-Based Hierarchical Dataflow

4 Rapid Prototyping and Programming Multi-core Architectures
  Introduction
  The Middle-Grain Parallelism Level
  Modeling Multi-Core Heterogeneous Architectures
  Understanding Multi-Core Heterogeneous Real-Time Embedded DSP MPSoC
  Literature on Architecture Modeling
  Multi-core Programming
  Middle-Grain Parallelization Techniques
  PREESM Among Multi-core Programming Tools
  Multi-core Scheduling
  Multi-core Scheduling Strategies
  Scheduling an Application under Constraints
  Existing Work on Scheduling Heuristics
  Generating Multi-core Executable Code
  Static Multi-core Code Execution
  Managing Application Variations
  Conclusion of the Background Part

Part II Contributions

5 A System-Level Architecture Model
  Introduction
  Target Architectures
  Building a New Architecture Model
  The System-Level Architecture Model
  The S-LAM operators
  Connecting operators in S-LAM
  Examples of S-LAM Descriptions
  The route model
  Transforming the S-LAM model into the route model
  Overview of the transformation
  Generating a route step
  Generating direct routes from the graph model
  Generating the complete routing table
  Simulating a deployment using the route model
  The message passing route step simulation with contention nodes
  The message passing route step simulation without contention nodes
  The DMA route step simulation
  The shared memory route step simulation
  Role of S-LAM in the Rapid Prototyping Process
  Storing an S-LAM Graph
  Hierarchical S-LAM Descriptions

6 Enhanced Rapid Prototyping
  Introduction
  The Multi-Core DSP Programming Constraints
  Objectives of a Multi-Core Scheduler
  A Flexible Rapid Prototyping Process
  Algorithm Transformations while Rapid Prototyping
  Scenarios: Separating Algorithm and Architecture
  Workflows: Flows of Model Transformations
  The Structure of the Scalable Multi-Core Scheduler
  The Problem of Scheduling a DAG on an S-LAM Architecture
  Separating Heuristics from Benchmarks
  Proposed ABC Sub-Modules
  Proposed Actor Assignment Heuristics
  Advanced Features in Architecture Benchmark Computers
  The route model in the AAM process
  The Infinite Homogeneous ABC
  Minimizing Latency and Balancing Loads
  Scheduling Heuristics in the Framework
  Assignment Heuristics
  Ordering Heuristics
  Quality Assessment of a Multi-Core Schedule
  Limits in Algorithm Middle-Grain Parallelism
  Upper Bound of the Algorithm Speedup
  Lowest Acceptable Speedup Evaluation
  Applying Scheduling Quality Assessment to Heterogeneous Target Architectures

7 Dataflow LTE Models
  Introduction
  Elements of the Rapid Prototyping Framework
  SDF4J: A Java Library for Algorithm Graph Transformations
  Graphiti: A Generic Graph Editor for Editing Architectures, Algorithms and Workflows
  PREESM: A Complete Framework for Hardware and Software Codesign
  Proposed LTE Models
  Fixed and Variable eNodeB Parameters
  A LTE eNodeB Use Case
  The Different Parts of the LTE Physical Layer Model
  Prototyping RACH Preamble Detection
  Downlink Prototyping Model
  Uplink Prototyping Model
  PUCCH Decoding
  PUSCH Decoding

8 Generating Code from LTE Models
  Introduction
  Execution Schemes
  Managing LTE Specificities
  Static Code Generation for the RACH-PD algorithm
  Static Code Generation in the PREESM tool
  Method employed for the RACH-PD implementation
  Adaptive Scheduling of the PUSCH
  Static and Dynamic Parts of LTE PUSCH Decoding
  Parameterized Descriptions of the PUSCH
  A Simplified Model of Target Architectures
  Adaptive Multi-core Scheduling of the LTE PUSCH
  Implementation and Experimental Results
  PDSCH Model for Adaptive Scheduling
  Combination of Three Actor-Level LTE Dataflow Graphs

9 Conclusion, Current Status and Future Work
  Conclusion
  Current Status
  Future Work

A Available Workflow Nodes in PREESM

B French Summary
  Introduction
  State of the Art
  The 3GPP LTE Standard
  Dataflow Models
  Rapid Prototyping and Programming of Multi-core Architectures
  Contributions
  An Architecture Model for Rapid Prototyping
  Enhanced Rapid Prototyping
  Dataflow Models of LTE
  Implementing LTE from Dataflow Models
  Conclusion

Glossary
Personal Publications
Bibliography

Acknowledgements

I would like to thank my advisors Dr Slaheddine Aridhi and Pr Jean-François Nezan for their help and support during these three years. Slah, thank you for welcoming me at Texas Instruments in Villeneuve Loubet and for spending so many hours on technical discussions, advice and corrections. Jeff, thank you for being so open-minded, for your support and for always seeing the big picture behind the technical details.

I want to thank Pr Guy Gogniat and Pr Christophe Jego for reviewing this thesis. Thanks also to Pr Shuvra S. Bhattacharyya for presiding over the jury and to Dr Sébastien Le Nours for being a member of the jury.

It has been a pleasure to work with Matthieu Wipliez and Jonathan Piat. Thank you for your friendship, your constant motivation and for sharing valuable technical insights in computing and electronics. Thanks also to the IETR image and rapid prototyping team for being great co-workers. Thanks to Pr Christophe Moy for his LTE explanations and to Dr Mickaël Raulet for his help on dataflow. Thanks to Pierrick Menuet for his excellent internship and thanks to Jocelyne Tremier for her administrative support.

This thesis also benefited from many discussions with TIers: special thanks to Eric Biscondi, Sébastien Tomas, Renaud Keller, Alexandre Romana and Filip Moerman for these. Thanks to the High Performance and Multi-core Processing team for the way you welcomed me to your group.

This thesis benefited from many free or open source tools including Java, Eclipse, JFreeChart, JGraph, SDF4J, LaTeX... Thanks to the open source programmers who contribute to the general progress of knowledge.
I am also grateful to Pr Markus Rupp and his team for welcoming me at the Technical University of Vienna and to Pr Olivier Déforges for supporting this stay. The summer of 2009 was both instructive and fun and I thank you for that.

I am thankful to the many chocolate cheesecakes of the Nero café in Truro, Cornwall that were eaten while writing this document: you were delicious. Many thanks to Dr Cédric Herzet for his help on mathematics and his Belgian fries. Thanks to Karina for reading and correcting this entire document.

Finally, thanks to my friends, my parents and sister and to Stéphanie for their love and support during these three years.

Chapter 1
Introduction

1.1 Overview

The recent evolution of digital communication systems (voice, data and video) has been dramatic. Over the last two decades, low data-rate systems (such as dial-up modems, first and second generation cellular systems, 802.11 wireless local area networks) have been replaced or augmented by systems capable of data rates of several Mbps, supporting multimedia applications (such as DSL, cable modems, 802.11b/a/g/n wireless local area networks, 3G, WiMax and ultra-wideband personal area networks). One of the latest developments in wireless telecommunications is the 3GPP Long Term Evolution (LTE) standard. LTE enables data rates beyond hundreds of Mbit/s.

As communication systems have evolved, the resulting increase in data rates has necessitated higher system algorithmic complexity. A more complex system requires greater flexibility in order to function with different protocols in diverse environments.

In 1965, Moore observed that the density of transistors (number of transistors per square inch) on an integrated circuit doubled every two years. This trend has remained unmodified since then. Until 2003, processor clock rates followed approximately the same rule. Since 2003, manufacturers have stopped increasing chip clock rates in order to limit chip power dissipation.
Increasing the clock speed, combined with additional on-chip cache memory and more complex instruction sets, yielded increasingly fast single-core processors only as long as increases in both clock rate and power dissipation were acceptable. The only solution to continue increasing chip performance without increasing power consumption is now to use multi-core chips.

A base station is a terrestrial signal processing center that interfaces a radio access network with the cabled backbone. It is a computing system dedicated to the task of managing user communication. It constitutes a communication entity integrating power supply, interfaces, and so on. A base station is a real-time system because it processes continuous streams of data, the computation of which has hard time constraints.

An LTE network uses advanced signal processing features including Orthogonal Frequency Division Multiplexing Access (OFDMA), Single Carrier Frequency Division Multiplexing Access (SC-FDMA) and Multiple Input Multiple Output (MIMO). These features greatly increase the available data rates, cell sizes and reliability, at the cost of an unprecedented level of processing power. An LTE base station must therefore use powerful embedded hardware platforms.

Multi-core Digital Signal Processors (DSPs) are suitable hardware architectures to execute these complex operations in real time. They combine cores with processing flexibility and hardware coprocessors that accelerate repetitive processes.

A consequence of the evolution of standards and parallel architectures is an increased need for systems to support multiple standards and multicomponent devices. These two requirements greatly complicate the development of telecommunication systems, imposing the optimization of device parameters over varying constraints, such as performance, area and power.
Achieving this device optimization requires a good understanding of the application complexity and the choice of an appropriate architecture to support this application.

Rapid prototyping consists of studying the design tradeoffs at several stages of the development, including the early stages, when the majority of the hardware and software are not available. The inputs to a rapid prototyping process must then be models of system parts, and are much simpler than in the final implementation. In a perfect design process, programmers would refine the models progressively, heading towards the final implementation.

Imperative languages, and C in particular, are presently the preferred languages to program DSPs. Decades of compilation optimizations have made them a good tradeoff between readability, optimality and modularity. However, imperative languages have been developed to address sequential hardware architectures inspired by the Turing machine, and their ability to express algorithm parallelism is limited. Over the years, dataflow languages and models have proven to be efficient representations of parallel algorithms, allowing the simplification of their analysis.

In 1978, Ackerman explained the effectiveness of dataflow languages in parallel algorithm descriptions [Ack82]. He emphasized two important properties of dataflow languages:
  - data locality: data buffering is kept as local and as reduced as possible,
  - scheduling constraints reduced to data dependencies: the scheduler that organizes execution has minimal constraints.

The absence of remote data dependency simplifies algorithm analysis and helps to create dataflow code that is correct-by-construction. The minimal scheduling constraints express the algorithm parallelism maximally. However, good practices in the manipulation of imperative languages to avoid recalculations often go against these two principles.
For example, iterations in dataflow redefine the iterated data constantly to avoid sharing a state, whereas imperative languages promote the shared use of registers. But these iterations conceal most of the parallelism in the algorithms, parallelism that must now be exploited in multi-core DSPs. Parallelism is obtained when functions are clearly separated, and Ackerman gives a solution to that: to manipulate data structures in the same way scalars are manipulated. Instead of manipulating buffers and pointers, dataflow models manipulate tokens, abstract representations of a data quantum, regardless of its size.

It may be noted that digital signal processing consists of processing streams (or flows) of data. The most natural way to describe a signal processing algorithm is a graph with nodes representing data transformations and edges representing data flowing between the nodes. The extensive use of Matlab Simulink is evidence that a graphically editable diagram is a suitable input for a rapid prototyping tool.

The 3GPP LTE is the first application prototyped using the Parallel and Real-time Embedded Executives Scheduling Method (PREESM). PREESM is a rapid prototyping tool with code generation capabilities, initiated in 2007 and developed during this thesis with the first main objective of studying the LTE physical layer. For the development of this tool, an extensive literature survey yielded much useful research: the work on dataflow process networks from University of California, Berkeley, University of Maryland and Leuven Catholic University; the Algorithm-Architecture Matching (AAM) methodology and SynDEx tool from INRIA Rocquencourt; the multi-core scheduling studies at Hong Kong University of Science and Technology; and the dynamic multithreaded algorithms from Massachusetts Institute of Technology, among others. PREESM is a framework of plug-ins rather than a monolithic tool. PREESM is intended to prototype an efficient multi-core DSP development chain.
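The dataflow view described in this overview, with nodes that transform data and tokens that flow along edges, can be sketched in a few lines of Python. This is only an illustrative sketch and not the API of PREESM or SDF4J: the names `Edge`, `Actor`, `run` and `demo` are hypothetical, and the firing rule (one token per input edge) is a simplifying assumption. Each token is an opaque Python object, so a whole block of samples is handled like a scalar, and the only scheduling constraint is the availability of input tokens.

```python
from collections import deque


class Edge:
    """FIFO that carries tokens from one actor to another (hypothetical name)."""

    def __init__(self, initial_tokens=()):
        self.fifo = deque(initial_tokens)


class Actor:
    """Graph node: consumes one token per input edge, produces one per output edge."""

    def __init__(self, name, func, inputs, outputs):
        self.name = name
        self.func = func          # must return a tuple with one token per output edge
        self.inputs = inputs
        self.outputs = outputs

    def can_fire(self):
        # The only scheduling constraint: every input edge holds a token.
        return all(edge.fifo for edge in self.inputs)

    def fire(self):
        tokens = [edge.fifo.popleft() for edge in self.inputs]
        for edge, token in zip(self.outputs, self.func(*tokens)):
            edge.fifo.append(token)


def run(actors):
    """Fire actors until none is ready; return the list of ready sets per step."""
    trace = []
    while True:
        ready = [a for a in actors if a.can_fire()]
        if not ready:
            return trace
        # Actors that are ready in the same step share no data dependency,
        # so a multi-core scheduler would be free to run them concurrently.
        trace.append(sorted(a.name for a in ready))
        for actor in ready:
            actor.fire()


def demo():
    # One input token: a block of four samples, treated as a single data quantum.
    e_in = Edge([[1, 2, 3, 4]])
    e_a, e_b, e_a2, e_b2, e_out = Edge(), Edge(), Edge(), Edge(), Edge()
    actors = [
        Actor("split", lambda blk: (blk[:2], blk[2:]), [e_in], [e_a, e_b]),
        Actor("scale_a", lambda blk: ([2 * v for v in blk],), [e_a], [e_a2]),
        Actor("scale_b", lambda blk: ([10 * v for v in blk],), [e_b], [e_b2]),
        Actor("merge", lambda a, b: (a + b,), [e_a2, e_b2], [e_out]),
    ]
    trace = run(actors)
    return trace, list(e_out.fifo)


if __name__ == "__main__":
    print(demo())
```

In this sketch, `demo()` reports three steps: `split` alone, then `scale_a` and `scale_b` together, then `merge`. The second step is where the graph exposes parallelism that a sequential loop over a shared buffer would hide.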
One goal of this study is to use LTE as a complex and real use case for PREESM. In 2008, 68% of DSPs shipped worldwide were intended for the wireless sector [KAG+09]. Thus, a multi-core development chain must efficiently address new wireless application types such as LTE. The term multi-core is used in the broad sense: a base station multi-core system can embed several interconnected processors of different types, themselves multi-core and heterogeneous. These multi-core systems are becoming more common: even mobile phones are now such distributed systems.

When targeting a classic single-core Von Neumann hardware architecture, it must be noted that all DSP development chains have similar features, as displayed in Figure 1.1(a). These features systematically include:
  - A textual language (C, C++) compiler that generates sequential assembly code for functions/methods at compile-time. In the DSP world, the generated assembly code is native, i.e. it is specific to and optimized for the Instruction Set Architecture (ISA) of the target core.
  - A linker that gathers assembly code at compile-time into an executable code.
  - A simulator/debugger enabling code inspection.
  - An Operating System (OS) that launches the processes, each of which comprises several threads. The OS handles the resources shared by the concurrent threads.

[Figure 1.1: Comparing a Present Single-core Development Chain to a Possible Development Chain for Multi-core DSPs. Panels: (a) A Single-core DSP Development Chain; (b) A Multi-core DSP Development Chain.]

In contrast with the DSP world, the generic computing world is currently experiencing an increasing use of bytecode.
A bytecode is more generic than native code and is Just-In-Time (JIT) compiled or interpreted at run-time. It enables portability across several ISAs and OSs at the cost of lower execution speed. Examples of JIT compilers are the Java Virtual Machine (JVM) and the Low Level Virtual Machine (LLVM). Embedded systems are dedicated to a single functionality and, in such systems, compiled code portability is not advantageous enough to justify the performance loss. It is thus unlikely that JIT compilers will appear in DSP systems soon.

However, as embedded system designers often have the choice between many hardware configurations, a multi-core development chain must have the capacity to target these hardware configurations at compile-time. As a consequence, a multi-core development chain needs a complete input architecture model instead of a few textual compiler options, such as those used in single-core development chains. Extending the structure of Figure 1.1(a), a development chain for multi-core DSPs may be imagined with an additional input architecture model (Figure 1.1(b)). This multi-core development chain generates an executable for each core in additio