Description

Institutionen för systemteknik Department of Electrical Engineering Examensarbete Benchmarking Global Optimization Algorithms for Core Prediction Identification Examensarbete utfört i Reglerteknik vid

Information

Category:
## BusinessLaw

Publish on:

Views: 10 | Pages: 49

Extension: PDF | Download: 0

Share

Transcript

Institutionen för systemteknik Department of Electrical Engineering Examensarbete Benchmarking Global Optimization Algorithms for Core Prediction Identification Examensarbete utfört i Reglerteknik vid Tekniska högskolan vid Linköpings universitet av Oscar Samuelsson LiTH-ISY-EX--10/4341--SE Linköping 2010 Department of Electrical Engineering Linköpings universitet SE Linköping, Sweden Linköpings tekniska högskola Linköpings universitet Linköping Benchmarking Global Optimization Algorithms for Core Prediction Identification Examensarbete utfört i Reglerteknik vid Tekniska högskolan i Linköping av Oscar Samuelsson LiTH-ISY-EX--10/4341--SE Handledare: Examinator: Gunnar Cedersund ike, Linköpings universitet Christian Lyzell isy, Linköpings universitet Martin Enqvist isy, Linköpings universitet Linköping, 27 August, 2010 Avdelning, Institution Division, Department Division of Automatic Control Department of Electrical Engineering Linköpings universitet SE Linköping, Sweden Datum Date Språk Language Svenska/Swedish Engelska/English Rapporttyp Report category Licentiatavhandling Examensarbete C-uppsats D-uppsats Övrig rapport ISBN ISRN LiTH-ISY-EX--10/4341--SE Serietitel och serienummer Title of series, numbering ISSN URL för elektronisk version Titel Title Utvärdering av globala optimeringsmetoder för core-prediktionsidentifiering Benchmarking Global Optimization Algorithms for Core Prediction Identification Författare Author Oscar Samuelsson Sammanfattning Abstract Nyckelord Keywords Mathematical modeling has evolved from being a rare event to becoming a standard approach for investigating complex biological interactions. However, variations and uncertainties in experimental data usually result in uncertain estimates of the parameters of the model. It is possible to draw conclusions from the model despite uncertain parameters by using core predictions. A core prediction is a model property which is valid for all parameter vectors that fit data at an acceptable cost. By validating the core prediction with additional experimental measurements one can draw conclusions about the overall model despite uncertain parameter values. A prerequisite for identifying a core prediction is a global search for all acceptable parameter vectors. Global optimization methods are normally constructed to search for a single optimal parameter vector, but methods searching for several acceptable parameter vectors are required here. In this thesis, two metaheuristic optimization algorithms have been evaluated, namely Simulated annealing and Scatter search. In order to compare their differences, a set of functions has been implemented in Matlab. The Matlab functions include a statistical framework which is used to discard poorly tuned optimization algorithms, five performance measures reflecting the different objectives of locating one or several acceptable parameter vectors, and a number of test functions meant to reflect high-dimensional, multimodal problems. In addition to the test functions, a biological benchmark model is included. The statistical framework has been used to evaluate the performance of the two algorithms with the objective of locating one and several acceptable parameter vectors. For the objective of locating one acceptable parameter vector, the results indicate that Scatter search performed better than Simulated Annealing. The results also indicate that different search objectives require differently tuned algorithms. Furthermore, the results show that test functions with a suitable degree of difficulty are not a trivial task to obtain. A verification of the tuned optimization algorithms has been conducted on the benchmark model. The results are somewhat contradicting and in this specific case, it is not possible to claim that good configurations on test functions remain good in real applications. Core prediction, Global optimization, Metaheuristic, Scatter search, Simulated annealing, Systems biology Abstract Mathematical modeling has evolved from being a rare event to becoming a standard approach for investigating complex biological interactions. However, variations and uncertainties in experimental data usually result in uncertain estimates of the parameters of the model. It is possible to draw conclusions from the model despite uncertain parameters by using core predictions. A core prediction is a model property which is valid for all parameter vectors that fit data at an acceptable cost. By validating the core prediction with additional experimental measurements one can draw conclusions about the overall model despite uncertain parameter values. A prerequisite for identifying a core prediction is a global search for all acceptable parameter vectors. Global optimization methods are normally constructed to search for a single optimal parameter vector, but methods searching for several acceptable parameter vectors are required here. In this thesis, two metaheuristic optimization algorithms have been evaluated, namely Simulated annealing and Scatter search. In order to compare their differences, a set of functions has been implemented in Matlab. The Matlab functions include a statistical framework which is used to discard poorly tuned optimization algorithms, five performance measures reflecting the different objectives of locating one or several acceptable parameter vectors, and a number of test functions meant to reflect high-dimensional, multimodal problems. In addition to the test functions, a biological benchmark model is included. The statistical framework has been used to evaluate the performance of the two algorithms with the objective of locating one and several acceptable parameter vectors. For the objective of locating one acceptable parameter vector, the results indicate that Scatter search performed better than Simulated Annealing. The results also indicate that different search objectives require differently tuned algorithms. Furthermore, the results show that test functions with a suitable degree of difficulty are not a trivial task to obtain. A verification of the tuned optimization algorithms has been conducted on the benchmark model. The results are somewhat contradicting and in this specific case, it is not possible to claim that good configurations on test functions remain good in real applications. v Sammanfattning Användningsområdet för matematisk modellering har utökats från att ha varit sällsynt inom den biologiska forskningsvärlden, till att bli en standardmetod för att undersöka komplexa biologiska interaktioner. Ett vanligt problem är att variationer och osäkerheter i experimentella data leder till osäkra värden i modellparametrarna. Genom att använda core-prediktioner (eng. core predictions) är det möjligt att dra slutsatser från modellen trots osäkra parametervärden. En core-prediktion är en modellegenskap som gäller för alla parametervektorer som kan anpassas till experimentell data till en acceptabel kostnad. Genom att validera core-prediktionen med ytterligare mätningar kan man dra slutsatser om hela modellen trots osäkerheter i parametervärdena. En förutsättning för att kunna identifiera en core-prediktion är en global optimering där man söker efter alla acceptabla parametervektorer. Globala optimeringsmetoder är vanligtvis konstruerade med syftet att hitta en optimal parametervektor men det krävs här alltså metoder som söker efter flera godtagbara parametervektorer. De två metaheuristiska optimeringsmetoderna Simulated annealing och Scatter search har utvärderats här. För att kunna jämföra metoderna har en uppsättning med funktioner implementerats i Matlab. Matlabfunktionerna inkluderar ett statistiskt ramverk som används för att förkasta dåligt inställda optimeringsmetoder, fem prestandamått som mäter de olika optimeringsmålen att hitta en, respektive flera, acceptabla parametervektorer samt en uppsättning testfunktioner som efterliknar flerdimensionella, multimodala problem. Utöver testfunktionerna är även en biologisk jämförelsemodell (eng. benchmark model) inkluderad bland matlabfunktionerna. Det statistiska ramverket har använts för att utvärdera hur bra de två optimeringsmetoderna kan lokalisera en respektive flera acceptabla parametervektorer. Resultaten visar att Scatter search var bättre än Simulated annealing för optimeringsmålet att hitta en acceptabel parametervektor. Resultaten visar även att beroende på vilket optimeringkriterium som används, så fungerar olika inställningar på optimeringsmetoderna olika bra. Det visade sig även att det var svårt att konstruera testproblem med lämplig svårighetsgrad. En verifiering av de inställda optimeringsmetoderna genomfördes på den biologiska jämförelsemodellen. Resultaten var motstridiga vilket gör att det i detta specifika fall inte går att säga att bra inställningar för optimeringsmetoderna för testfunktionerna är bra inställningar för verkliga problem. vi Acknowledgments I would like to thank my examiner Martin Enqvist for structuring the work and giving me the opportunity to do the work at the Division of Automatic Control. I also would like to thank Christian Lyzell for help with the report. I am most grateful to Gunnar Cedersund, who gave me the opportunity to do my thesis in an interesting research field, and for providing good input to the thesis. Thanks to Anton Höghäll for help with finishing a tidy code and being a valuable discussion partner. I would like to give special thanks to David Gomez-Cabrero, not just for all your practical help during the work, but also for giving inspiration when things did not work out like expected. Also thanks to José A. Egea for providing the Multistart code and SSm updates, Eva Enqvist for statistical aid concerning Friedman s test, and to Marielle for your kind support during the work. Last, I would like to show my gratitude to the staff at ISB group for asking the right questions and providing a positive working environment. vii Contents 1 Introduction Background Problem description Objectives Limitations Thesis outline Related work Terminology Theoretical Background Modeling systems biology Systems biology Modeling biology Modeling framework Parameter estimation and core predictions Metaheuristics Background Simulated annealing Scatter search Multistart local solver Experimental Setup Statistically tuning metaheuristics Analytical test functions Benchmark model Assigning configurations Simulated annealing Scatter search Multistart local solver Performance measures Software ix x Contents 4 Experiments Finding one parameter vector Discarding configurations using M Success rate Finding several parameter vectors Performance on benchmark model Conclusions and Future Work Conclusions Future work Bibliography 43 A Simulated Annealing, Options and Configurations 47 B Scatter Search, Options and Configurations 49 Chapter 1 Introduction Mathematical models have been used extensively in science and for industrial purposes during modern time. In the emerging field of systems biology, mathematical models are used to analyze data from complex biochemical interactions. The complex interactions between cells and molecules consist of multiple interacting variables which make it hard to determine cause-reaction links. In order to understand the behavior of such interactions, mathematical modeling has proven to be a useful analysis tool. For readers who are unfamiliar with the concepts of biological modeling it is recommended to start out by reading Section 2.1, Modeling systems biology, for an introduction. 1.1 Background Modeling biological systems involves some subject specific issues. The quality and amount of data gained from biological experiments are typically low compared to the size and complexity of the desired system to describe. The system usually has unknown areas which are only characterized to a limited extent. Dynamic experiments, in the sense of time-dependency, are not traditionally conducted. These drawbacks make it difficult to construct reliable models with well-characterized properties. A model based on uncertain data can be hard to use unless there is a way to handle the uncertainties. However, if there are methods for dealing with the uncertainties so that no false conclusions are made, models can be a useful research tool. The Diabetes and Integrative Systems Biology group at Linköping University conducts modeling and experiments on insulin signaling pathways. A pathway is the route of biochemical reactions by which, e.g., a signal is transferred throughout the cell. Insulin resistance in fat and muscle cells are pre-steps to the well-known disease type-2 diabetes [33]. The main goal of the modeling and experimenting is to gain knowledge about the mechanisms of diabetes. At Linköping University, one central part of the modeling work is to gain knowledge about the mechanisms of insulin signaling by identifying core predictions. A core prediction is a model 1 2 Introduction property which is valid for all parameter vectors that fit data at an acceptable cost. Core predictions is one possible way to draw conclusions from the model, despite uncertainties in the parameters of the model. However, the construction of such predictions is limited by the available tools. The work presented in this thesis aims at presenting tools which facilitates the construction of such core predictions. 1.2 Problem description As previously mentioned, modeling biological systems usually involves noisy and limited amount of data. These limitations are revealed during the parameter estimation. Normally, the parameters of a model are unknown but can be estimated by comparing the output from the model with experimental data. The procedure when different parameters are tested to obtain a good fit between the model and experimental data is referred to as the parameter estimation. If there are only a few data points (compared to the model size), the uncertainties in the estimated parameters will be large which results in a model with the inherited uncertainties. In the systems biology community, one common way to deal with parameter uncertainties is to conduct a local parameter sensitivity analysis [24]. The outcome of such analysis is poor since nonlinearities and other parameter vectors which fit data equally well are not considered. There are examples of strategies to handle the sensitivity analysis in a more global fashion [17]. However, there is no established standard for sensitivity analysis when it is done globally with all the non-linear model properties included. Further discussion of the topic can be found in the review [29]. As mentioned in Section 1.1, one possible way of handling parameter uncertainties is to use core predictions. A prerequisite for identifying a core prediction is a search for all acceptable parameter vectors. In this thesis, an acceptable parameter vector is defined as a vector of the parameters of the model which gives the model an acceptable fit to the experimental data, according to some statistical measure. In [5], a chi-square test was used as statistical measure to define an acceptable threshold. A more detailed explanation of the methodology of core predictions is given in Section In order to search for acceptable parameter vectors, different kinds of optimization algorithms, also known as metaheuristic methods, are used. Optimization algorithms are normally constructed to find the optimal solution of a problem. In this thesis, the optimal solution would be equal to the optimal parameter vector. Even though some recent publications show that there exist some methods which have been developed with the purpose of locating several suboptimal parameter vectors [23, 16], the classical optimization objective is still to find the global or local optimum. This is in agreement with the lack of benchmarks for locating several parameter vectors in the literature. Thus, there is a need for finding appropriate optimization algorithms for the objective of locating several acceptable parameter vectors. An issue when using optimization algorithms is the issue of how to select proper settings for the algorithms. The settings are algorithm specific and affect the 1.3 Objectives 3 behaviour of the algorithms. It is not obvious how to select proper settings and methods for selecting good settings are required in order to make the best use of the specific optimization algorithm. 1.3 Objectives The objective of this thesis is divided into three main parts. Part one involves finding suitable optimization algorithms for parameter estimation in biological state-space models. Part two consists of developing a collection of Matlab functions for testing and evaluating the aforementioned optimization algorithms. The collection of Matlab functions will be referred to as the workbench which should also include test problems. The last part consists of an evaluation of the optimization algorithms in Part one by using the workbench from Part two. The optimization algorithms should be able to handle the following problems. First, find one acceptable parameter vector P which describe the data to some defined cost δ. P : V (Z, P ) δ (1.1) Second, find some, ideally all, parameter vectors which describe the data equally well according to the defined cost δ. {P : V (Z, P ) δ} (1.2) The purpose of the workbench is twofold. First, the workbench will be used to reveal weaknesses and strengths of the chosen optimization algorithms. Second, the workbench will be a tool for benchmarking and developing future optimization algorithms. In order to make the best use of each optimization algorithm, the algorithms should be evaluated with different settings. Tools for selecting good settings for the algorithms should be included in the workbench. 1.4 Limitations Some areas in this thesis are restricted by the following limitations. Only one implementation of each optimization algorithm is evaluated. A limited number of settings for each optimization algorithm are evaluated. Only default settings are used in the local solver Dynamic hill climbing. The performance of the algorithms is evaluated on a model based on ordinary differential equations. Other types of biological models are not considered. 4 Introduction 1.5 Thesis outline This thesis is written at a level suitable for a biological engineering student, and for a system theoretician interested in how systems theory can be applied to the biological field. The thesis is organized as follows. Chapter 2 includes some background theory from the field of biological modeling, together with a description of the optimization algorithms used in this thesis. In Chapter 3, a presentation of the methods used in the workbench is given together with the experimental setup. Chapter 4 includes the results from the evaluation which was done as described in Chapter 3. The thesis is concluded in Chapter 5 with conclusions and some proposals for future work. 1.6 Related work The work presented in this thesis is part of a project at the Diabetes and Integrative Systems Biology group, which aims at evaluating and developing optimization methods for systems biology purposes. Large parts of the work in this thesis have been conducted in collaboration with Anton Höghäll. Anton also pursued some unsolved questions, more specifically the construction of suitable test functions and the objective of measuring the spread of search in real problems in [19]. David Gomez-Cabrero should receive credit for the original idea of the methodology for tuning metaheuristics. David has also actively been involved in the work of this thesis, especially in the work of constructing the test functions. 1.7 Terminology ATP - Adenosine triphosphate, energy carrying molecule. Benchmark problem - A representative optimization problem which can be used to compare the performance of dif

Related Search

Similar documents

We Need Your Support

Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks