FACULTY OF LIFE SCIENCES
INSTITUTE OF BIOLOGY

BACHELOR'S THESIS SUBMITTED FOR THE ACADEMIC DEGREE OF BACHELOR OF SCIENCE

A mathematical model linking oscillatory gene expression in Mycobacterium tuberculosis with DNA supercoiling
(German title: Ein mathematisches Modell, das oszillatorische Genexpression in Mycobacterium tuberculosis mit DNA supercoiling verbindet)

submitted by Adrian Zachariae, born in Berlin
prepared in the Theoretical Biophysics group at the Institute of Biology
Berlin, January 2015

Abstract

The pathogenic bacterium Mycobacterium tuberculosis can endure long periods of stress by entering a dormant state, characterised by an almost complete, temporary halt of its metabolism and transcription. Re-aeration of dormant M.
tuberculosis did not, contrary to previous experiments, lead to immediate, synchronised cell growth; instead, resuscitation was delayed for about 42 h. During this time the expression time courses of two sets of genes seem to follow highly anti-correlated, seemingly oscillatory patterns. These patterns suggest a common regulatory mechanism, and the aim of this thesis is to differentiate between regulation by a shared transcription factor and regulation mediated by DNA supercoiling. To do so, the gene loci were searched for transcription factor binding sites as well as for unusual AT content and periodic AT tracts, both of which are associated with DNA supercoiling. Furthermore, three simple mathematical models of DNA supercoiling mediated regulation were proposed and fitted to the data.

Contents

1 Introduction
    DNA Binding Transcription Factors
    Chromosome Supercoiling
    Sequence Periodicity
    Goal
2 Methods
    Transcription Factor Binding Sites
    Periodic AT tracts
    AT Content
    Differential equations
    Stability of Steady States and Hopf bifurcations
3 Model
    Criteria for the Model Selection
    Model A
    Model B
4 Results
    Transcription Factor
    AT Content
    Sequence Periodicity
    Model
    Finding a Possible Regulator
5 Conclusion
6 Acknowledgments
7 Appendix
    AT-tract Spectra
    Bifurcation Diagrams
    Estimated Parameters
Declaration of authorship (Eigenständigkeitserklärung)

1 Introduction

Over 130 years after its discovery by Robert Koch, Mycobacterium tuberculosis remains a major health threat, killing almost 2 million people annually[1]. The SysteMTb project is a collaborative project, funded by the European Commission FP7, that aims to create a framework for understanding key features of M. tuberculosis. One of the key capacities of M. tuberculosis is its ability to enter a dormant state triggered by nutrient or oxygen depletion[2, 3]. By shutting down its central metabolism and transcription it can endure long periods of stress and also becomes extremely resistant to drug treatment[3].
Dormancy can be triggered by oxygen depletion, which Wayne and Hayes[3] used to synchronise cell growth and replication. They slowly depleted oxygen under constant, gentle stirring to trigger dormancy. After growth had ceased completely, resuscitation was triggered by dilution in a new, oxygen-rich medium. Wayne and Hayes observed a constant population size for 20 h after re-aeration, then an approximately 2-fold increase, followed by another interval of constant population size.

This experiment was repeated in the SysteMTb project in order to create a cell cycle model for M. tuberculosis. To ensure that enough cell mass was available for high-throughput experiments, the cultures were not diluted in a new medium. Instead the flasks were re-opened for re-aeration, restoring the oxygen tension in the medium by diffusion through the surface. The resuscitation took longer than expected, with no cell growth until 42 h had passed. During this period the expression of some genes is highly anti-correlated, specifically the time courses of dnaA and ftsZ expression.

Figure 1: Time course of dnaA and ftsZ expression (y-axis: expression, log2; x-axis: time, from day -15 to 46 h). The expression patterns are highly anti-correlated.

These patterns are present in all three iterations of the experiment and seem to have an oscillatory component. 230 genes have been found that show similar expression patterns, 200 similar to ftsZ and 30 similar to dnaA, suggesting a common transcriptional regulation mechanism.

Two possible mechanisms will be explored in greater detail:

1. DNA-binding transcription factors, a ubiquitous regulation mechanism.
2. DNA supercoiling, which is responsible for large-scale transcriptional regulation in cyanobacteria, leading to circadian oscillations[4].

1.1 DNA Binding Transcription Factors

Bacterial genes are organised in co-regulated clusters called operons[5]. An operon consists of four elements: a promoter, an operator, a set of genes and the terminator. The promoter is a sequence that allows the RNA polymerase to bind and is thus needed for the initiation of transcription. The operator is a regulatory sequence and the terminator causes the termination of transcription. All elements of the operon are located in cis (i.e., they are all located on the same strand of the DNA)[5]. Regulatory proteins, called transcription factors (TF), can bind to the operator and either upregulate gene expression (activation) or downregulate gene expression (repression)[5]. In prokaryotes, gene expression is usually non-restrictive, so the RNA polymerase can bind to the operons without a TF. Therefore most transcription factors are repressors (i.e., they downregulate gene expression)[6].

Transcription factors have DNA binding domains that bind to a specific DNA sequence in the operator. There are different types of DNA binding domains, such as the classical helix-turn-helix (HTH) type in prokaryotes, which consists of two α-helices that bind to the DNA grooves and are connected by a short polypeptide chain[7]. The structure of these DNA binding domains imposes certain similarities on the DNA motifs they bind. HTH-type TFs are usually symmetric homodimers, and therefore the binding motif is usually a palindrome (i.e., a symmetric sequence). Since the two binding sites are usually not directly adjacent, the centre of the motif is less conserved[7].

Activation of large sets of genes in bacteria is often controlled by alternative σ factors[8]. σ factors are subunits of the RNA polymerase complex and are necessary for the recognition of promoter sequences[8]. M. tuberculosis, like most bacteria, has one primary σ factor (σA) responsible for the so-called housekeeping genes, which are universally expressed[9].
Alternative σ factors can bind to different target promoter regions and therefore activate large sets of genes, usually as a direct response to environmental stress factors such as nutrient depletion or heat. M. tuberculosis has 13 σ factors, the highest number of alternative σ factors of all obligate human pathogens, even relative to its large chromosome[9].

1.2 Chromosome Supercoiling

A linear, unbound DNA molecule in the B-DNA conformation, most likely the predominant DNA conformation, forms a double helix with about 10.5 base pairs (bp) per turn[10]. This is a result of the hydrophobic effect, which minimises the contact of the hydrophobic base pairs with the water molecules. Such a DNA molecule is called relaxed.

A DNA molecule that is wound more tightly or more loosely than the relaxed form is considered supercoiled. Overwound DNA (i.e., fewer than 10.5 bp per turn) is positively supercoiled and underwound DNA (i.e., more than 10.5 bp per turn) is negatively supercoiled[11]. Most DNA is negatively supercoiled; the few exceptions are usually found in thermophiles[12]. The supercoiled state can be modified by DNA topoisomerases; notably, ATP-dependent gyrases can introduce negative supercoiling in bacteria[12].

While the mechanism remains unknown, oscillations of the supercoiled state of the chromosome seem to play an important role in cyanobacterial circadian expression[13, 14, 15, 16]. Supercoiling has also been proposed to regulate endobacterial growth[17, 4] and the oscillating metabolism of yeast[18].

Sequence Periodicity

Bacterial DNA sequences contain two periodic patterns[19]:

1. a strong one with a period of 3 bp due to the codon length[20, 21, 22]
2. a relatively weak one with a period of about 10 to 11 bp.

The second one can result from correlations in the corresponding protein sequences due to the amphipathic character of α-helices[23]. These patterns are about 35 bp long.
Less well understood are the patterns formed by short runs of A and T, called AT-tracts, with an average length of 100 bp[23, 24, 19], preferentially encoded in the 3rd codon position[24]. AT-tracts do not include TpA elements (with p = phosphate) and induce a bend in the minor groove of the DNA[25]. When phased with the length of a single turn of the DNA double helix, the individual bends can accumulate and induce an intrinsic curvature[25] and may aid DNA compaction[26]. Periods of AT-tracts slightly out of phase with the DNA curvature may indicate or induce DNA supercoiling[24], with the period of the AT-tracts corresponding to the period of the DNA turns. Alternatively, AT-tracts with periods deviating from 10.5 bp may correspond to plectonemes, twisted loops formed by negatively supercoiled DNA[27], or represent nucleosome-like structures with DNA-binding proteins such as the HU protein. The location of highly expressed genes is significantly biased towards segments lacking strong periodic signals[19], further suggesting a connection between periodic patterns and the regulation of gene expression.

1.3 Goal

The goal of this thesis is to differentiate between the two possible regulation mechanisms causing the anti-correlated, seemingly oscillatory gene expression patterns:

1. DNA supercoiling
2. DNA binding transcription factors.

Co-regulation by a common transcription factor requires shared transcription factor binding sites and therefore shared upstream sequence motifs. If shared motifs exist, they can be found and their significance can be assessed using bioinformatical sampling methods, which will be the subject of section 4.1.

DNA supercoiling has been associated with unusual AT content, both in the upstream region and in the coding region. The AT content can easily be computed as described in section 2.3 and is compared to the findings in other bacteria in section 4.2. DNA supercoiling has also been associated with periodic AT tracts, albeit much more loosely.
A method to assign each gene a value corresponding to the strength of the local AT tract periodicity will be described in section 2.2 and discussed in section 4.3.

Abstract models of a DNA supercoiling mediated regulatory mechanism capable of sustained oscillation are proposed in section 3. All models were fitted to the available data in order to verify that the time courses could be oscillatory time courses produced by DNA supercoiling at all. Furthermore, comparing the generated time courses with the experimental time course data could lead to the identification of other elements of the mechanism. The discussion of the different models and generated time courses can be found in the Model subsection of the Results.

2 Methods

2.1 Transcription Factor Binding Sites

The Gibbs Motif Sampler[28] was used to search for shared transcription factor (TF) binding sites in each of the two groups of genes. Two different models for the binding site were considered: one reflecting current knowledge of the binding sites of the classic bacterial helix-turn-helix transcription factor type, and one representing the 9 bp binding site[29] of DnaA, since DnaA is one of the proteins in the groups known to interact with DNA. The first is a palindromic motif with a possible gap between the first eight positions and their reverse complement, resulting in a variable overall length of 16 bp to 24 bp (referred to as the palindromic motif). The second is a non-palindromic motif with a length of 9 bp (referred to as the non-palindromic motif).

Genes on the same strand separated by an intergenic sequence of less than 50 bp were assumed to form an operon, and the upstream region of the leading gene was used for all genes of the operon. The definition of operons and the TF binding sites in M. tuberculosis were obtained from the tutorial on co-expression data analysis in M. tuberculosis on the Gibbs Motif Sampler web page[30].
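The operon rule stated above (same strand, intergenic distance below 50 bp, upstream region taken from the leading gene) could be sketched in Python roughly as follows. This is only an illustration: the `Gene` record, the function names and the coordinate convention (start < end on both strands) are assumptions made here, and the thesis itself obtained its operon definitions from the Gibbs Motif Sampler tutorial rather than from code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical record; coordinates are inclusive with start < end on both strands.
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def group_operons(genes, max_gap=50):
    """Group neighbouring same-strand genes whose intergenic gap is below max_gap bp."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            prev = operons[-1][-1]
            gap = gene.start - prev.end - 1  # negative if the genes overlap
            if gene.strand == prev.strand and gap < max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])
    return operons


def leading_gene(operon):
    """First transcribed gene: lowest coordinate on '+', highest coordinate on '-'."""
    return operon[0] if operon[0].strand == "+" else operon[-1]


genes = [
    Gene("a", 100, 400, "+"),
    Gene("b", 420, 900, "+"),    # 19 bp after a, same strand: same operon
    Gene("c", 1000, 1500, "+"),  # 99 bp after b: new operon
    Gene("d", 1520, 2000, "-"),  # strand change: new operon
    Gene("e", 2010, 2500, "-"),  # 9 bp after d, same strand: same operon
]
operons = group_operons(genes)
```

With these made-up coordinates the genes fall into three operons, and the leading gene of the reverse-strand operon is the one with the highest coordinate.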
A Wilcoxon signed-rank test[31] was performed to assess the significance of the alignments, using the study sequences randomly shuffled by the Gibbs Sampler as negative controls.

Exactly one σ factor is among the proteins encoded by the two groups of genes: σH, which belongs to the ftsZ group. σH is an alternative σ factor induced by oxidative stress and heat shock[9].

Figure 2: Time course of the sigH expression (y-axis: expression, log2; x-axis: time, from day -15 to 46 h). The gene is part of the ftsZ group and is the structural gene of the σH factor.

The promoter sequences of both groups of genes were searched using the web application GLAM2SCAN, which is part of the MEME toolbox[32]. A consensus sequence of the σH factor was proposed by Riccardo Manganelli et al.[33], and the alignment of the promoter sequences that Manganelli et al. used to build their consensus sequence was submitted to GLAM2SCAN as a motif. To determine whether the motif is enriched in the promoter sequences of the genes, the alignments were compared to alignments using shuffled sequences, and the significance was assessed by performing a Wilcoxon signed-rank test.

2.2 Periodic AT tracts

To compute the AT tract periodicity of a given sequence, the method described by Herzel et al.[24, 23] can be used. The computational tools provided by Mrazek et al.[19] expand on this method and allow periodicity signals to be assigned to parts of the chromosome. These tools were implemented in Matlab and are explained below.

First, starting from each occurrence of the motif, all further occurrences of the motif within the next 100 bp are stored in an array. A histogram N(s) is built by summing all these arrays, with s being the distance between the motifs.
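The histogram construction can be sketched as follows. The thesis tools were written in Matlab; this Python version is an illustrative assumption, including the function name `distance_histogram` and the choice to count overlapping motif occurrences by their start positions.

```python
import re


def distance_histogram(seq, motifs, max_dist=100):
    """Return counts where counts[s] is the number of motif pairs starting s bp apart."""
    seq = seq.upper()
    # A lookahead lets overlapping occurrences (e.g. AAAA inside AAAAA) all be found.
    pattern = re.compile("(?=(" + "|".join(re.escape(m) for m in motifs) + "))")
    positions = [m.start() for m in pattern.finditer(seq)]
    counts = [0] * (max_dist + 1)
    for i, pos in enumerate(positions):
        # Walk forward through later occurrences until they are more than max_dist away.
        j = i + 1
        while j < len(positions) and positions[j] - pos <= max_dist:
            counts[positions[j] - pos] += 1
            j += 1
    return counts


# Tetranucleotide motif set named in the text (Motif AT4).
AT4 = ["AAAA", "AAAT", "AATT", "ATTT", "TTTT"]
example = distance_histogram("AAAACCCCAAAA", AT4)
```

In the toy sequence the motif starts at positions 0 and 8, so the only non-zero entry of `example` is at distance 8.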
Multiple sequence motifs were used: a motif of single nucleotides A/T (Motif AT), a dinucleotide motif AA/TT (Motif A2T2) and a tetranucleotide motif AAAA/AAAT/AATT/ATTT/TTTT (Motif AT4), the same motifs that Mrazek et al. used.

The histogram was normalised in three ways. Firstly, the histogram counts $C(s)$ were converted to odds ratios $R(s) = C(s)/E(s)$ using expected counts $E(s) = n(s)\,p^2$, based on the probability $p$ of a motif at a specific position in a shuffled sequence with an average AT ratio $f_{A+T}$. Here $n(s) = L - s + 1$ (with $L$ being the length of the analysed sequence) is a correction factor that accounts for the incomplete arrays in the last 100 bp of the analysed sequence. The $p$ values were computed as $p = f_{A+T}$ (Motif AT), $p = \tfrac{1}{2} f_{A+T}^2$ (Motif A2T2) and $p = 5\,[\tfrac{1}{2} f_{A+T}]^4$ (Motif AT4). Secondly, the 3 bp periodicity was removed by averaging 3 bp wide windows (each value $P(s)$ is replaced by $(P(s-1) + P(s) + P(s+1))/3$). Thirdly, the histogram was fitted using a parabolic function and the linear and quadratic terms were subtracted from $R$, resulting in $R^*$. This eliminates bias induced by varying AT content.

Figure 3: Normalised histogram $R^*$ of the whole chromosome of Mycobacterium tuberculosis using the motif AT4 (x-axis: s [bp]; y-axis: $R^*$). $R^*(s)$ is the corrected odds ratio for another motif at distance s from each occurrence of the motif.

From this histogram $R^*$ a spectrum $S$ can be obtained by discrete Fourier transformation. To avoid periodicities caused by sequences coding for α-helices, the first 35 bp were omitted. The spectrum $S$ was normalised to an average of 1 over the relevant range of periods from 5 to 20 bp. In the following, $S$ denotes this normalised spectrum.

Figure 4: Normalised spectrum $S$ of the tetranucleotide motif (Motif AT4) tracts in the whole Mycobacterium tuberculosis chromosome (x-axis: periodicity [bp]; y-axis: normalised signal intensity).

Since the signal is relatively weak, only sequences longer than 1000 bp can be analysed, so the method cannot be applied to the sequence of a single gene or upstream region.
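The normalisation and spectrum steps might look roughly like the sketch below. It is not the Matlab implementation used in the thesis: the function names are invented here, the parabolic detrending step is left out for brevity, the Fourier sum is evaluated directly at chosen periods instead of using an FFT, and the grid of integer periods from 5 to 20 bp is an assumption.

```python
import cmath
import math


def motif_probability(f_at, motif="AT4"):
    """Probability p of the motif at a given position of a shuffled sequence."""
    if motif == "AT":
        return f_at
    if motif == "A2T2":
        return 0.5 * f_at ** 2
    if motif == "AT4":
        return 5 * (0.5 * f_at) ** 4
    raise ValueError("unknown motif: " + motif)


def odds_ratios(counts, seq_len, p):
    """R(s) = C(s) / (n(s) * p**2) with n(s) = L - s + 1, for s = 1 .. len(counts) - 1."""
    return [counts[s] / ((seq_len - s + 1) * p * p) for s in range(1, len(counts))]


def smooth3(values):
    """Average 3 bp wide windows to mask the codon periodicity (drops both end points)."""
    return [sum(values[i - 1:i + 2]) / 3 for i in range(1, len(values) - 1)]


def power(values, first_s, period):
    """Squared magnitude of the Fourier sum at one period; values[k] sits at s = first_s + k."""
    mean = sum(values) / len(values)
    total = sum((v - mean) * cmath.exp(-2j * math.pi * (first_s + k) / period)
                for k, v in enumerate(values))
    return abs(total) ** 2


def normalised_spectrum(values, first_s, periods):
    """Spectrum scaled to an average of 1 over the given periods (assumes a non-zero signal)."""
    raw = [power(values, first_s, p) for p in periods]
    mean_power = sum(raw) / len(raw)
    return [x / mean_power for x in raw]


# Synthetic check: a pure 11 bp oscillation sampled at s = 36 .. 100 (first 35 bp omitted).
signal = [math.cos(2 * math.pi * s / 11) for s in range(36, 101)]
periods = list(range(5, 21))
spectrum = normalised_spectrum(signal, 36, periods)
```

On the synthetic 11 bp signal the largest entry of `spectrum` sits at period 11, which is the behaviour one would expect from the method described above; real histograms would additionally need the detrending step.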
Instead, the chromosome was divided into overlapping windows and $S$ was computed independently for each window.

Figure 5: Periodicity scan of the Mycobacterium tuberculosis chromosome using the tetranucleotide motif AT4. The normalised strength of the signal $Q(P)$ is represented by the brightness of the colour. Black areas correspond to an aperiodic signal strength $Q(P)$ below 1.5 and white areas to a strongly periodic signal strength $Q(P)$ above 3. The 3 bp periodicity is masked by averaging 3 bp windows.

Subsequently the strongest genome-wide periodicity with a period above 10 bp and below 12 bp was determined, and each gene was assigned the maximum signal $S_{max}$ within 0.5 bp of this period over all windows containing parts of the gene.

2.3 AT Content

In cyanobacteria, the varying AT content of genes regulated by DNA supercoiling was found in the upstream region and in the coding region close to the translation start site. The operons defined in section 2.1 were used to assign a translation start site to each gene. The average AT content of the 4000 bp long window centred on the translation start site was computed for both groups of genes and compared to the average AT content of all genes, comparably to the method employed to assess the AT content in cyanobacteria[4]. The AT content of two additional windows was computed: the 500 bp upstream and the 500 bp downstream of the translation start site.

2.4 Differential equations

Ordinary differential equations (ODEs) are a frequently employed tool for characterising the temporal properties of biological systems[34].
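The window-based AT content of section 2.3 can be illustrated with a small Python sketch. The function names, the 0-based coordinates, the clipping at the sequence ends (the circular chromosome is ignored) and the strand handling are assumptions made for this illustration, not details taken from the thesis.

```python
def at_content(seq):
    """Fraction of A and T in seq; NaN for an empty sequence."""
    seq = seq.upper()
    if not seq:
        return float("nan")
    return (seq.count("A") + seq.count("T")) / len(seq)


def window_at_content(genome, start_site, upstream, downstream, strand="+"):
    """AT fraction of a window around a translation start site (0-based index).

    The AT content is identical on both strands, so for reverse-strand genes
    only the window extents have to be mirrored around the start site.
    """
    if strand == "-":
        upstream, downstream = downstream, upstream
    lo = max(0, start_site - upstream)
    hi = min(len(genome), start_site + downstream)
    return at_content(genome[lo:hi])


toy_genome = "GGGGGAAAAATTTTTCCCCC"
centred = window_at_content(toy_genome, 10, 5, 5)        # AAAAATTTTT
upstream_only = window_at_content(toy_genome, 10, 10, 0)  # GGGGGAAAAA
```

For the windows described in the text one would call `window_at_content(genome, site, 2000, 2000)` for the centred 4000 bp window and `(site, 500, 0)` or `(site, 0, 500)` for the upstream and downstream windows.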
Each of the $n$ dependent variables $x_i$ of the system, usually the concentrations of molecules, is described by a differential equation in the independent variable $t$ (time):

$$\frac{dx_i}{dt} = f_i(x_1, \dots, x_n, t)$$

Written in vector notation, with $\mathbf{x} = (x_1, \dots, x_n)^T$ and $\mathbf{f} = (f_1, \dots, f_n)^T$, the whole system is

$$\frac{d\mathbf{x}}{dt} = \mathbf{f}(\mathbf{x}, t)$$

An ODE system can have so-called steady states ($\mathbf{x}_{SS}$), where all $dx_i/dt = 0$. If a system attains a steady state, it will remain there indefinitely unless perturbed. Steady states can be characterised by their behaviour after small perturbations:

1. stable steady states attract close trajectories
2. unstable steady states repel close trajectories
3. metastable steady states do neither

If close trajectories converge to a stable steady state for $t \to \infty$, the steady state is called asymptotically stable. A linear approximation of the ODE system at the st