
Technologies for Single Cell Genome Analysis

Erik Borgström

Stockholm 2016
Royal Institute of Technology (KTH)
School of Biotechnology
Division of Gene Technology
Science for Life Laboratory
Solna, Sweden

Printed by Universitetsservice US-AB, Drottning Kristinas väg 53B, Stockholm, Sweden
TRITA-BIO Report 2016:1

Abstract

During the last decade, high throughput DNA sequencing of single cells has evolved from an idea to one of the most high-profile fields of research. Much of this development has been possible due to the dramatic reduction in costs for massively parallel sequencing. The four papers included in this thesis describe or evaluate technological advancements for high throughput DNA sequencing of single cells and single molecules.

As the sequencing technologies improve, more samples are analyzed in parallel. In Paper 1, an automated procedure for preparation of samples prior to massively parallel sequencing is presented. The method has been applied to several projects, and further development by others has enabled even higher sample throughputs.

Amplification of single cell genomes is a prerequisite for sequence analysis. Paper 2 evaluates four commercially available kits for whole genome amplification of single cells. The results show that coverage of the genome differs significantly among the protocols and, as expected, this has an impact on the downstream analysis.

In Paper 3, single cell genotyping by exome sequencing is used to confirm the presence of fat cells derived from donated bone marrow within the recipient's fat tissue. Close to one hundred single cells were exome sequenced and a subset was validated by whole genome sequencing.

In the last paper, a new method for phasing (i.e. determining the physical connection of variant alleles) is presented. The method barcodes amplicons from single molecules in emulsion droplets.
The barcodes can then be used to determine which variants were present on the same original DNA molecule. The method is applied to two variable regions in the bacterial 16S gene in a metagenomic sample.

Thus, two of the papers (1 and 4) present the development of new methods for increasing the throughput and information content of data from massively parallel sequencing. Paper 2 evaluates and compares currently available methods, and in Paper 3 a biological question is answered using some of these tools.

Keywords: DNA, sequencing, single molecule, single cell, whole genome amplification, exome sequencing, emulsions, barcoding, phasing.

Sammanfattning (summary in Swedish, translated)

Over the last decade, large-scale DNA sequencing of single cells has developed from a mere idea into one of the most high-profile fields of research. Much of this development has been made possible by the dramatic reduction in the cost of this type of analysis. The four papers included in this thesis describe or evaluate technological advances for large-scale DNA sequencing of single cells and single molecules.

As sequencing technologies improve, the number of samples that can be analyzed in parallel also increases. Paper 1 presents an automated method for preparing samples prior to large-scale DNA sequencing. The technique has been applied in several projects and has later been developed further (by colleagues), making it possible to analyze an even larger number of samples in parallel.

Amplification of DNA from single cells is a prerequisite for sequence analysis of their genomes. Paper 2 evaluates four commercially available kits for amplification of complete genomes from single cells. The results show that genome coverage differs substantially between the amplification products and, as expected, this affects subsequent analysis steps.

In Paper 3, single cells are genotyped by exome sequencing in order to confirm, in the fat tissue of a transplant recipient, the presence of fat cells derived from donated bone marrow. Close to one hundred single cells were exome sequenced. A subset of these cells was validated by whole genome sequencing.

The last paper presents a new method for determining the physical linkage of alleles of specific sequence variants across an entire DNA molecule. Amplicons from single molecules are generated in emulsion droplets and tagged with specific DNA sequences. After sequencing, these tags can be used to determine which variant alleles were present on the same original DNA molecule. The method is applied to two variable regions of the bacterial 16S gene in a metagenomic sample.

Two of the papers (1 and 4) present new methods developed to increase sample throughput and the information content of the data from the subsequent large-scale DNA sequencing. Paper 2 evaluates and compares available methods for amplification of the entire genomic DNA from single cells. Finally, in Paper 3, a biological question is answered using large-scale DNA sequencing of single cells.

List of publications

1. Erik Borgström, Sverker Lundin, Joakim Lundeberg. (2011). Large Scale Library Generation for High Throughput Sequencing. PLoS ONE 6(4): e doi: /journal.pone

2. Erik Borgström, Marta Paterlini, Jeff Mold, Jonas Frisén, Joakim Lundeberg. Comparison of Whole Genome Amplification Techniques for Human Single Cell (Exome) Sequencing. Manuscript.

3. Mikael Rydén, Mehmet Uzunel, Joanna L. Hård, Erik Borgström, Jeff E. Mold, Erik Arner, Niklas Mejhert, Daniel P. Andersson, Yvonne Widlund, Moustapha Hassan, Christina V. Jones, Kirsty L. Spalding, Britt-Marie Svahn, Afshin Ahmadian, Jonas Frisén, Samuel Bernard, Jonas Mattsson, Peter Arner. (2015). Transplanted Bone Marrow-Derived Cells Contribute to Human Adipogenesis.
Cell Metabolism, 22(3), doi: /j.cmet

4. Erik Borgström, David Redin, Sverker Lundin, Emelie Berglund, Anders F. Andersson, Afshin Ahmadian. (2015). Phasing of single DNA molecules by massively parallel barcoding. Nature Communications, 6, doi: /ncomms8173

Uniqueness and the Ultimate Resolution

Your fingerprint is unique; it can be used to unlock your smartphone or to identify you when you travel abroad. Likewise, your genome is unique and might in the future be used in the same manner as your fingerprint. Still, your genome contains much more information than your fingerprint. It is the basis for how the cells in your body function and respond to influences from the world around you.

Each cell in your body harbours a copy of your genome: two sets of all the 23 human chromosomes [1]. One set is inherited from each of your parents, and the two sets were enclosed together in the first of your cells. This cell has then divided thousands of times to form the approximately 37 trillion (3.72e13) cells in your body [2]. On rare occasions during cell division an error occurs while copying the genome, which leads to one of the two resulting cells getting a genome slightly different from all the others. Due to these so-called replication errors, each of the cells in your body is unique, slightly different from all the others. Most changes will never be noticed, though some of the cells might acquire changes that lead to cellular malfunction or disease, such as cancer.

Therefore it is important to be able to analyze not only the average human genome present in an individual's body but also the precise genome present in a given cell. Ultimately one would want to be able to deduce on which molecule, of the two sets of chromosomes within the cell, the error has occurred. This thesis aims to be a part of the technological advance leading to this ultimate resolution of biological data.
The four papers included in the thesis either address one of the main obstacles encountered during single cell DNA analysis or are applications thereof, or both. In Paper 1 the throughput of library generation for DNA sequencing is addressed, while in Paper 2 amplification techniques prior to sequencing of single cells are evaluated. In Paper 3 a high throughput single cell study is conducted, and in Paper 4 a method for single molecule analysis by massively parallel sequencing is presented.

The cell and its content

A Compartmentalization

The term cell stems from the book Micrographia, published by Robert Hooke. In Micrographia, Hooke describes his observations of various matters through microscopes. While examining wood and cork he mentions observing little boxes, pores, caverns and cells within the materials [3]. The word cell comes from the Latin cella, meaning small room [4], and is a very suitable name for these fundamental biological compartments. The cell harbours the biochemical reactions considered to be the basis for life and separates them from the surroundings by compartmentalization. This fundamental type of compartmentalization allows the cells to manipulate their microenvironment. It is the physical link between genotype and phenotype and thereby a fundamental building block in the process of evolution (it has also inspired molecular techniques such as emulsion PCR, described later).

Cellular life ranges from relatively simple unicellular prokaryotic life forms to very complex eukaryotic multicellular organisms like humans. The great diversity of cellular life is reflected by a diverse range of possible molecular components. The molecular components available to a cell are largely determined by the genes that are present within the cell. The full collection of genes in a cell is harboured within the genome and is encoded in the structured sequence of deoxyribonucleic acid (DNA) polymers.
Information from active parts of the genome is copied into ribonucleic acid (RNA) molecules called transcripts, collectively known as the transcriptome. Some molecules in the transcriptome are translated into amino acid polymers, known as proteins. The protein molecules are responsible for carrying out specific functions inside a cell, and the presence of a specific protein can, for example, determine whether an organism is able to metabolize a certain nutrient. The direction of the information flow within cells, from nucleic acids (the genome through the transcriptome) to functional protein molecules in the proteome, is called the central dogma of molecular biology [5].

Each cell in an organism encapsulates the three main components present in the encoding of life: the genes within the genome, the transcripts within the transcriptome and finally the proteins in the proteome. Compartmentalized together with a plethora of metabolites and other biologically active molecules, they form the basis for all cellular life as we know it. Each of the three components is described in brief below.

The DNA

Deoxyribonucleic acid, DNA, was first isolated by Friedrich Miescher in the middle of the 19th century. Many findings and theories about its chemical composition and role as the carrier of genetic information were presented during the latter half of the 19th century and the first half of the 20th. In 1953 Watson and Crick presented the structure of the DNA molecule and thereby paved the way for our understanding of how the genetic material is replicated and propagated to following generations [6]. As suggested by Watson and Crick, the (double stranded) DNA molecule consists of two helical chains, each coiled around the same axis. The chains consist of alternating phosphate and deoxyribose sugar groups, and a nucleobase is attached to each sugar.
The bases are located so that they pair with each other by hydrogen bonding and thereby hold the two chains together. The base pairing is specific: adenine (A) always pairs with thymine (T) and guanine (G) pairs with cytosine (C) [7].

The Genes and the Genome

Studies on inheritance were initially performed by Gregor Mendel in the mid 19th century. Without using the term gene, Mendel described some of the basic principles of inheritance, e.g. the concept of dominant and recessive traits [8]. The word gene was first used in 1909, after the rediscovery of Mendel's work. Initially the definition was "discrete unit of heredity". As the understanding deepened, the concept of a gene developed from this early idea through a number of intermediate and stricter definitions. At the end of the 20th century it ended up in more open definitions such as "a DNA segment that contributes to phenotype/function". A popular metaphor in recent decades has been to think of genes as subroutines within a computer operating system, with transcription and translation as the means to run the subroutines. A recent definition is "a gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products" [9]. The definition of a gene is constantly evolving: large parts of the genome are transcribed, and regulatory regions might also be considered part of a gene. Furthermore, the definition of function is also debated. Still, the common denominators among the definitions are that a gene is a subset of genomic sequence(s) with a function, and its inherent coupling to heritability.

A genome is an organism's complete set of DNA, including all of its genes [10] (there are also organisms with RNA as genetic material, e.g. RNA viruses). The first organism to get its genome sequenced was bacteriophage MS2 [11].
The bacteriophage has a 3.5 kb RNA genome, whose sequence was completed with the publication of its final gene. This was soon followed by the first DNA genome, the 5.4 kb genome of bacteriophage phiX, sequenced with Frederick Sanger's plus and minus method (prior to the invention of his chain termination method). The Human Genome Project (HGP) was subsequently initiated. The three giga-base human genome was 25 times larger than any previously sequenced genome and eight times larger than all previously sequenced genomes together. In February 2001, two draft assemblies of the sequence were published simultaneously by the public HGP [14] and a privately funded initiative [15]. The finished genome sequence was published subsequently.

Several projects aiming at variant discovery and functional annotation of the sequence followed the completion of the human genome. A few examples are the HapMap Project [17] and the 1000 Genomes Project [18], both with the aim of cataloging human variation, as well as the ENCODE project [19], which assigns functional annotations to large parts of the genome.

The RNA

The ribonucleic acid (RNA) molecules within a cell are generally the product of transcription from DNA. During transcription, active parts of the genome are copied into complementary RNA molecules called transcripts. RNA is structurally similar to DNA, though it has a very different role in the cell. The two fundamental chemical differences between RNA and DNA are, firstly, an extra hydroxyl group on each sugar moiety of the backbone and, secondly, the presence of uracil bases instead of thymine bases. Just like DNA, RNA molecules can hybridize to themselves, to other RNA species or to DNA, and function in a double stranded fashion. However, as opposed to DNA, RNA often appears in single stranded form (as is the case for messenger RNAs). The transcribed RNA molecules that are later translated into proteins are termed messenger RNA (mRNA).
Apart from mRNAs there are many additional types of functional RNA molecules in a cell, each with a distinct function, for example transfer RNAs, ribosomal RNAs, noncoding RNAs and many more [20].

The Transcripts and the Transcriptome

In a cell, RNA polymerase produces primary transcripts from active genes. Depending on the organism and the function of the gene, the primary transcripts can be modified into more mature forms by processes such as capping, poly-A tailing and splicing. The mature transcripts are then translated into proteins or fulfill other functions in the cell (for example in regulation).

All cells within an organism have the same overall genome (except for some rare and potentially impactful somatic variants). Still, cells from separate tissues may be very different from each other. This is because only a subset of the genome is transcribed in a cell at a given time point. The resulting set of transcripts changes dynamically during the lifetime of a cell in response to external and internal stimuli. The population of transcripts present in one cell at a given time point is collectively called the transcriptome [21]. The transcriptome is the basis for the state of that cell, i.e. which proteins are produced and at which levels, and thereby how the cell will appear and function.

The Proteins and the Proteome

Mature messenger RNA molecules are translated into amino acid chains. These polypeptides can be edited and post-translationally modified, for example by the addition of sugar moieties. Ultimately they form functional three-dimensional protein molecules. The proteins are the working force of the cell. To mention a few functions, they catalyze chemical reactions and act as structural building blocks, channels, transporters and signaling substances. The collection of all the proteins present in a cell is called the proteome and is studied in the field of proteomics.
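
As a minimal illustration of the information flow described in the sections above (base pairing in DNA, transcription into RNA and translation into protein), the following Python sketch walks a short sequence through each step. The function names, the example sequence and the deliberately truncated codon table are assumptions made for this illustration and are not taken from the thesis. Real transcription and translation involve many steps (promoter recognition, splicing, start codon selection) that are ignored here.

```python
# Illustrative sketch only: a toy model of DNA -> RNA -> protein.
# All names and the example sequence are hypothetical.

# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Small subset of the standard genetic code, just enough for the example.
CODON_TABLE = {
    "AUG": "M",  # methionine
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*",  # stop
    "UAG": "*",  # stop
    "UGA": "*",  # stop
}


def reverse_complement(dna: str) -> str:
    """Return the opposite strand, read in its own 5'-to-3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching the coding strand (uracil replaces thymine)."""
    return coding_strand.replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


coding = "ATGGCTTGGAAATAA"               # hypothetical coding strand
template = reverse_complement(coding)    # strand paired to the coding strand
mrna = transcribe(coding)
protein = translate(mrna)
print(template, mrna, protein)
```

A fuller treatment would use the complete 64-codon table, for instance through an established library, rather than the seven-entry subset assumed here.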
Communities and Tissues

Cells interact with the environment and with each other within multicellular communities such as biofilms or tissues. These multicellular structures are quite diverse, and analyzing this heterogeneity is important. For example, some of the organisms in environmental samples might possess interesting properties such as production of antibiotic substances [22] or the ability to metabolize specific compounds [23]. Still, these microorganisms live side by side with many other organisms, potentially in symbiosis or competing for the same resources. Thus, their function might be hard to locate and characterize in these communities. In the case of human tissues, analyzing intercellular heterogeneity could provide information about disease progression or deepen our understanding of fundamental processes in developmental biology [24].

Some microorganisms are not cultivable under laboratory conditions. Moreover, human tissues are traditionally analyzed as an average of multicellular samples. Direct investigation of single cells can avoid cultivation completely and allow a holistic view of environmental samples. The single cell analysis approach also provides an increase in resolution and sensitivity for human samples [25]. Throughout the rest of this thesis, the focus will be on analysis of DNA from single cells.

Analysis of Cells

One, Two or Several Molecules

High throughput analysis of DNA sequences from single cells is a relatively new research field. PCR was performed on single cells as early as the 1980s [26], and amplification of whole genomes soon followed [27]. It was not until recently that molecular techniques and the cost of sequencing reached a level that allowed for high throughput analysis of single cells.
Still, it is worth noting that other techniques for single cell analysis, such as histological observations, have been performed for hundreds of years, and more modern techniques like fluorescent in situ hybridization (FISH) [28] have been around for decades.

Genetic material for sequence analysis is generally extracted from samples containing a large number of cells (for example from an individual, a population or an environmental sample). This classical approach to DNA sequence analysis assays a mixture of hundreds to several thousands of cells (and/or molecules), and any conclusions drawn are thus based on population-wide averages. This introduces a risk of missing important information about sample heterogeneity, and rare entities (e.g. circulating tumor cells) are especially hard to detect [29]. The two main challenges when working with single cell DNA sequencing as comp