R for Beginners. Emmanuel Paradis. Institut des Sciences de l'Évolution, Université Montpellier II, F Montpellier cédex 05, France

I thank Julien Claude, Christophe Declercq, Élodie Gazave, Friedrich Leisch, Louis Luangkesorn, François Pinard, and Mathieu Ros for their comments and suggestions on earlier versions of this document. I am also grateful to all the members of the R Development Core Team for their considerable efforts in developing R and animating the discussion list rhelp. Thanks also to the R users whose questions or comments helped me to write R for Beginners. Special thanks to Jorge Ahumada for the Spanish translation.

© 2002, 2005, Emmanuel Paradis (12th September 2005)

Permission is granted to make and distribute copies, either in part or in full and in any language, of this document on any support provided the above copyright notice is included in all copies. Permission is granted to translate this document, either in part or in full, in any language provided the above copyright notice is included.

Contents

1 Preamble
2 A few concepts before starting
    2.1 How R works
    2.2 Creating, listing and deleting the objects in memory
    2.3 The on-line help
3 Data with R
    3.1 Objects
    3.2 Reading data in a file
    3.3 Saving data
    3.4 Generating data
        3.4.1 Regular sequences
        3.4.2 Random sequences
    3.5 Manipulating objects
        3.5.1 Creating objects
        3.5.2 Converting objects
        3.5.3 Operators
        3.5.4 Accessing the values of an object: the indexing system
        3.5.5 Accessing the values of an object with names
        3.5.6 The data editor
        3.5.7 Arithmetics and simple functions
        3.5.8 Matrix computation
4 Graphics with R
    4.1 Managing graphics
        4.1.1 Opening several graphical devices
        4.1.2 Partitioning a graphic
    4.2 Graphical functions
    4.3 Low-level plotting commands
    4.4 Graphical parameters
    4.5 A practical example
    4.6 The grid and lattice packages
5 Statistical analyses with R
    5.1 A simple example of analysis of variance
    5.2 Formulae
    5.3 Generic functions
    5.4 Packages
6 Programming with R in practice
    6.1 Loops and vectorization
    6.2 Writing a program in R
    6.3 Writing your own functions
7 Literature on R

1 Preamble

The goal of the present document is to give a starting point for people newly interested in R.
I chose to emphasize the understanding of how R works, with the aim of a beginner, rather than expert, use. Given that the possibilities offered by R are vast, it is useful for a beginner to get some notions and concepts in order to progress easily. I tried to simplify the explanations as much as I could to make them understandable by all, while giving useful details, sometimes with tables.

R is a system for statistical analyses and graphics created by Ross Ihaka and Robert Gentleman¹. R is both a software and a language considered as a dialect of the S language created by the AT&T Bell Laboratories. S is available as the software S-PLUS commercialized by Insightful. There are important differences in the designs of R and of S: those who want to know more on this point can read the paper by Ihaka & Gentleman (1996) or the R-FAQ, a copy of which is also distributed with R.

¹ Ihaka R. & Gentleman R. 1996. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics 5: 299-314.

R is freely distributed under the terms of the GNU General Public Licence; its development and distribution are carried out by several statisticians known as the R Development Core Team.

R is available in several forms: the sources (written mainly in C and some routines in Fortran), essentially for Unix and Linux machines, or some pre-compiled binaries for Windows, Linux, and Macintosh. The files needed to install R, either from the sources or from the pre-compiled binaries, are distributed from the internet site of the Comprehensive R Archive Network (CRAN) where the instructions for the installation are also available. Regarding the distributions of Linux (Debian, ...), the binaries are generally available for the most recent versions; look at the CRAN site if necessary.

R has many functions for statistical analyses and graphics; the latter are visualized immediately in their own window and can be saved in various formats (jpg, png, bmp, ps, pdf, emf, pictex, xfig; the available formats may depend on the operating system). The results from a statistical analysis are displayed on the screen; some intermediate results (P-values, regression coefficients, residuals, ...) can be saved, written in a file, or used in subsequent analyses.

The R language allows the user, for instance, to program loops to successively analyse several data sets. It is also possible to combine in a single program different statistical functions to perform more complex analyses. The R users may benefit from a large number of programs written for S and available on the internet; most of these programs can be used directly with R.

At first, R could seem too complex for a non-specialist. This may not be true actually. In fact, a prominent feature of R is its flexibility. Whereas a classical software displays immediately the results of an analysis, R stores these results in an "object", so that an analysis can be done with no result displayed. The user may be surprised by this, but such a feature is very useful. Indeed, the user can extract only the part of the results which is of interest. For example, if one runs a series of 20 regressions and wants to compare the different regression coefficients, R can display only the estimated coefficients: thus the results may take a single line, whereas a classical software could well open 20 results windows. We will see other examples illustrating the flexibility of a system such as R compared to traditional software.

2 A few concepts before starting

Once R is installed on your computer, the software is executed by launching the corresponding executable. The prompt, by default ">", indicates that R is waiting for your commands. Under Windows using the program Rgui.exe, some commands (accessing the on-line help, opening files, ...) can be executed via the pull-down menus. At this stage, a new user is likely to wonder "What do I do now?" It is indeed very useful to have a few ideas on how R works when it is used for the first time, and this is what we will see now.

We shall see first briefly how R works.
Then, I will describe the "assign" operator which allows creating objects, how to manage objects in memory, and finally how to use the on-line help which is very useful when running R.

2.1 How R works

The fact that R is a language may deter some users who think "I can't program". This should not be the case for two reasons. First, R is an interpreted language, not a compiled one, meaning that all commands typed on the keyboard are directly executed without requiring to build a complete program like in most computer languages (C, Fortran, Pascal, ...).

Second, R's syntax is very simple and intuitive. For instance, a linear regression can be done with the command lm(y ~ x) which means "fitting a linear model with y as response and x as predictor". In R, in order to be executed, a function always needs to be written with parentheses, even if there is nothing within them (e.g., ls()). If one just types the name of a function without parentheses, R will display the content of the function. In this document, the names of the functions are generally written with parentheses in order to distinguish them from other objects, unless the text indicates clearly so.

When R is running, variables, data, functions, results, etc, are stored in the active memory of the computer in the form of objects which have a name. The user can do actions on these objects with operators (arithmetic, logical, comparison, ...) and functions (which are themselves objects). The use of operators is relatively intuitive; we will see the details later (p. 25). An R function may be sketched as follows:

    arguments, options  ->  function (default arguments)  =>  result

The arguments can be objects ("data", formulae, expressions, ...), some of which could be defined by default in the function; these default values may be modified by the user by specifying options. An R function may require no argument: either all arguments are defined by default (and their values can be modified with the options), or no argument has been defined in the function. We will see later in more detail how to use and build functions (p. 67).
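As a rough illustration of the call syntax described above, the sketch below fits lm(y ~ x) on made-up data, stores the result in an object, and extracts only the coefficients. The vectors x and y and the helper scale_by() are hypothetical, invented for this example; they do not come from the original text.

```r
# Hypothetical data, invented for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Fit a linear model with y as response and x as predictor.
# The result is stored in an object; nothing is displayed yet.
fit <- lm(y ~ x)

# Extract only the part of the results that is of interest.
estimates <- coef(fit)
print(estimates)

# A function with one default argument (hypothetical helper).
scale_by <- function(v, mult = 2) v * mult

doubled <- scale_by(x)             # the default mult = 2 is used
tenfold <- scale_by(x, mult = 10)  # the default is modified by an option

# Typing the name of a function without parentheses displays its
# content instead of executing it.
scale_by
```

Keeping the fitted model in fit is what makes it possible to ask for the coefficients alone, rather than reading them off a full printed summary.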
The present description is sufficient for the moment to understand how R works.

All the actions of R are done on objects stored in the active memory of the computer: no temporary files are used (Fig. 1). The readings and writings of files are used for input and output of data and results (graphics, ...). The user executes the functions via some commands. The results are displayed directly on the screen, stored in an object, or written on the disk (particularly for graphics). Since the results are themselves objects, they can be considered as data and analysed as such. Data files can be read from the local disk or from a remote server through internet.

Figure 1: A schematic view of how R works. (Diagram: commands from the keyboard and mouse call functions and operators taken from the library of functions on the hard disk (.../library/base/, /stats/, /graphics/, ...); these act on "data" objects and produce "results" objects in the active memory; results go to the screen or to PS, JPEG, ... files, and data files come from the hard disk or the internet.)

The functions available to the user are stored in a library localised on the disk in a directory called R_HOME/library (R_HOME is the directory where R is installed). This directory contains packages of functions, which are themselves structured in directories. The package named base is in a way the core of R and contains the basic functions of the language, particularly for reading and manipulating data. Each package has a directory called R with a file named like the package (for instance, for the package base, this is the file R_HOME/library/base/R/base). This file contains all the functions of the package.

One of the simplest commands is to type the name of an object to display its content. For instance, if an object n contains the value 10:

    > n
    [1] 10

The digit 1 within brackets indicates that the display starts at the first element of n. This command is an implicit use of the function print and the above example is similar to print(n) (in some situations, the function print must be used explicitly, such as within a function or a loop).

The name of an object must start with a letter (A-Z and a-z) and can include letters, digits (0-9), dots (.), and underscores (_).
R discriminates between uppercase letters and lowercase ones in the names of the objects, so that x and X can name two distinct objects (even under Windows).

2.2 Creating, listing and deleting the objects in memory

An object can be created with the "assign" operator which is written as an arrow with a minus sign and a bracket; this symbol can be oriented left-to-right or the reverse:

    > n <- 15
    > n
    [1] 15
    > 5 -> n
    > n
    [1] 5
    > x <- 1
    > X <- 10
    > x
    [1] 1
    > X
    [1] 10

If the object already exists, its previous value is erased (the modification affects only the objects in the active memory, not the data on the disk). The value assigned this way may be the result of an operation and/or a function:

    > n <- 10 + 2
    > n
    [1] 12
    > n <- 3 + rnorm(1)
    > n

The function rnorm(1) generates a normal random variate with mean zero and variance unity (p. 17), so the value displayed for n differs from one run to the next. Note that you can simply type an expression without assigning its value to an object; the result is then displayed on the screen but is not stored in memory:

    > (10 + 2) * 5
    [1] 60

The assignment will be omitted in the examples if not necessary for understanding.

The function ls simply lists the objects in memory: only the names of the objects are displayed.

    > name <- "Carmen"; n1 <- 10; n2 <- 100; m <- 0.5
    > ls()
    [1] "m"    "n1"   "n2"   "name"

Note the use of the semi-colon to separate distinct commands on the same line. If we want to list only the objects which contain a given character in their name, the option pattern (which can be abbreviated with pat) can be used:

    > ls(pat = "m")
    [1] "m"    "name"

To restrict the list to the objects whose names start with this character:

    > ls(pat = "^m")
    [1] "m"

The function ls.str displays some details on the objects in memory:

    > ls.str()
    m : num 0.5
    n1 : num 10
    n2 : num 100
    name : chr "Carmen"

The option pattern can be used in the same way as with ls. Another useful option of ls.str is max.level which specifies the level of detail for the display of composite objects. By default, ls.str displays the details of all objects in memory, including the columns of data frames, matrices and lists, which can result in a very long display.
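The listing commands of this section can be tried without touching the objects already in memory. The sketch below is one way to do so, under the assumption that a separate environment (here called sandbox, a name chosen for this example) is an acceptable stand-in for the workspace; the text itself works directly in the workspace.

```r
# A separate environment keeps the example isolated (an assumption of
# this sketch; the text uses the workspace directly).
sandbox <- new.env()

# Create the same four objects as in the text, inside the sandbox.
local({
  name <- "Carmen"; n1 <- 10; n2 <- 100; m <- 0.5
}, envir = sandbox)

all_names  <- ls(sandbox)                  # every object name
with_m     <- ls(sandbox, pattern = "m")   # names containing "m"
starting_m <- ls(sandbox, pattern = "^m")  # names starting with "m"

print(all_names)
print(with_m)
print(starting_m)

# Details on each object, as ls.str() gives for the workspace.
print(ls.str(envir = sandbox))
```

The pattern is a regular expression, which is why "^m" restricts the match to names that begin with m.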
We can avoid displaying all these details with the option max.level = -1:

    > M <- data.frame(n1, n2, m)
    > ls.str(pat = "M")
    M : 'data.frame': 1 obs. of 3 variables:
     $ n1: num 10
     $ n2: num 100
     $ m : num 0.5
    > ls.str(pat = "M", max.level = -1)
    M : 'data.frame': 1 obs. of 3 variables:

To delete objects in memory, we use the function rm: rm(x) deletes the object x, rm(x, y) deletes both the objects x and y, rm(list = ls()) deletes all the objects in memory; the same options mentioned for the function ls() can then be used to delete selectively some objects: rm(list = ls(pat = "^m")).

2.3 The on-line help

The on-line help of R gives very useful information on how to use the functions. Help is available directly for a given function, for instance:

    > ?lm

will display, within R, the help page for the function lm() (linear model). The commands help(lm) and help("lm") have the same effect. The last one must be used to access help with non-conventional characters:

    > ?*
    Error: syntax error
    > help("*")
    Arithmetic          package:base          R Documentation
    Arithmetic Operators
    ...

Calling help opens a page (this depends on the operating system) with general information on the first line such as the name of the package where the documented function(s) or operators are found. Then comes a title followed by sections which give detailed information.

    Description: brief description.
    Usage: for a function, gives the name with all its arguments and the possible options (with the corresponding default values); for an operator gives the typical use.
    Arguments: for a function, details each of its arguments.
    Details: detailed description.
    Value: if applicable, the type of object returned by the function or the operator.
    See Also: other help pages close or similar to the present one.
    Examples: some examples which can generally be executed without opening the help with the function example.

For beginners, it is good to look at the section Examples. Generally, it is useful to read carefully the section Arguments. Other sections may be encountered, such as Note, References or Author(s).
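The quoted form of help() described above can also be used in a script. A minimal sketch, assuming a standard R installation with the utils and stats packages attached; the result of help() is stored rather than printed, so no help page opens.

```r
# Quoted topics work for ordinary names and for operators alike.
h_lm   <- help("lm")   # same topic as ?lm or help(lm)
h_star <- help("*")    # the quoted form is required for operators

# help() returns an object; it is printing that object which
# displays the page.
class(h_lm)

# Run this interactively to display the page itself:
# print(h_star)
```

Storing the result is only a convenience for non-interactive use; at the prompt, typing help("*") directly displays the page.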
By default, the function help only searches in the packages which are loaded in memory. The option try.all.packages, whose default is FALSE, allows searching in all packages if its value is TRUE:

    > help("bs")
    No documentation for 'bs' in specified packages and libraries:
    you could try 'help.search("bs")'
    > help("bs", try.all.packages = TRUE)
    Help for topic 'bs' is not in any loaded package but
    can be found in the following packages:

      Package               Library
      splines               /usr/lib/R/library

Note that in this case the help page of the function bs is not displayed. The user can display help pages from a package not loaded in memory using the option package:

    > help("bs", package = "splines")
    bs          package:splines          R Documentation
    B-Spline Basis for Polynomial Splines
    Description:
         Generate the B-spline basis matrix for a polynomial spline.
    ...

The help in html format (read, e.g., with Netscape) is called by typing:

    > help.start()

A search with keywords is possible with this html help. The section See Also has here hypertext links to other function help pages. The search with keywords is also possible in R with the function help.search. The latter looks for a specified topic, given as a character string, in the help pages of all installed packages. For instance, help.search("tree") will display a list of the functions whose help pages mention "tree". Note that if some packages have been recently installed, it may be useful to refresh the database used by help.search using the option rebuild (e.g., help.search("tree", rebuild = TRUE)).

The function apropos finds all functions whose name contains the character string given as argument; only the packages loaded in memory are searched:

    > apropos("help")
    [1] "help"          ".helpForCall"  "help.search"
    [4] "help.start"

3 Data with R

3.1 Objects

We have seen that R works with objects which are, of course, characterized by their names and their content, but also by attributes which specify the kind of data represented by an object.
In order to understand the usefulness of these attributes, consider a variable that takes the value 1, 2, or 3: such a variable could be an integer variable (for instance, the number of eggs in a nest), or the coding of a categorical variable (for instance, sex in some populations of crustaceans: male, female, or hermaphrodite).

It is clear that the statistical analysis of this variable will not be the same in both cases: with R, the attributes of the object give the necessary information. More technically, and more generally, the action of a function on an object depends on the attributes of the latter.

All objects have two intrinsic attributes: mode and length. The mode is the basic type of the elements of the object; there are four main modes: numeric, character, complex⁷, and logical (FALSE or TRUE). Other modes exist but they do not represent data, for instance function or expression. The length is the number of elements of the object. To display the mode and the length of an object, one can use the functions mode and length, respectively:

⁷ The mode complex will not be discussed in this document.

    > x <- 1
    > mode(x)
    [1] "numeric"
    > length(x)
    [1] 1
    > A <- "Gomphotherium"; compar <- TRUE; z <- 1i
    > mode(A); mode(compar); mode(z)
    [1] "character"
    [1] "logical"
    [1] "complex"

Whatever the mode, missing data are represented by NA (not available). A very large numeric value can be specified with an exponential notation:

    > N <- 2.1e23
    > N
    [1] 2.1e+23

R correctly represents non-finite numeric values, such as ±∞ with Inf and -Inf, or values which are not numbers with NaN (not a number).

    > x <- 5/0
    > x
    [1] Inf
    > exp(x)
    [1] Inf
    > exp(-x)
    [1] 0
    > x - x
    [1] NaN

A value of mode character is input with double quotes ". It is possible to include this latter character in the value if it follows a backslash \. The two characters altogether \" will be treated in a specific way by some functions such as cat for display on screen, or write.table to write on the disk (p. 14, the option qmethod of this function).

    > x <- "Double quotes \" delimitate R's strings."
    > x
    [1] "Double quotes \" delimitate R's strings."
    > cat(x)
    Double quotes " delimitate R's strings.

Alternatively, variables of mode character can be delimited with single quotes ('); in this case it is not necessary to escape double quotes with backslashes (but single quotes must be!):

    > x <- 'Double quotes " delimitate R\'s strings.'
    > x
    [1] "Double quotes \" delimitate R's strings."

The following table gives an overview of the type of objects representing data.

    object       modes                                        several modes possible
                                                              in the same object?
    vector       numeric, character, complex or logical       No
    factor       numeric or character                         No
    array        numeric, character, complex or logical       No
    matrix       numeric, character, complex or logical       No
    data frame   numeric, character, complex or logical       Yes
    ts           numeric, character, complex or logical       No
    list         numeric, character, complex, logical,        Yes
                 function, expression, ...

A vector is a variable in the commonly admitted meaning. A factor is a categorical variable. An array is a table with k dimensions, a matrix being a particular case of array with k = 2. Note that the elements of an array or of a matrix are all of the same mode. A data frame is a table composed of one or several vectors and/or factors all of the same length but possibly of different modes. A ts is a time series data set and so contains additional attributes such as frequency and dates. Finally, a list can contain any type of object, including lists!

For a vector, its mode and length are sufficient to describe the data. For other objects, other information is necessary and it is given by non-intrinsic attributes. Among these attributes, we can cite dim which corresponds to the dimensions of an object. For example, a matrix with 2 rows and 2 columns has for dim the pair of values [2, 2], but its length is 4.

3.2 Reading data in a file

For reading and writing in files, R uses the working directory. To find this directory, the command getwd() (get working directory) can be used, and the working directory can be changed with setwd("C:/data") or setwd("/home/paradis/R"). It is necessary to give the path to a file if it is not in the working directory.
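A short sketch of how the working directory interacts with relative and full paths. The file name demo.txt and the use of R's temporary directory are assumptions made for this example only.

```r
old_wd <- getwd()     # remember the current working directory
tmp    <- tempdir()   # a scratch directory provided by R

setwd(tmp)                         # change the working directory
writeLines("x y", "demo.txt")      # relative path: written in tmp
found_relative <- file.exists("demo.txt")

setwd(old_wd)                      # go back to where we started
# Outside the working directory, the path to the file must be given.
found_full <- file.exists(file.path(tmp, "demo.txt"))

print(c(found_relative, found_full))
unlink(file.path(tmp, "demo.txt"))  # clean up the example file
```

Restoring the previous directory at the end avoids surprising later commands that rely on relative paths.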
R can read data stored in text (ASCII) files with the following functions: read.table (which has several variants, see below), scan and read.fwf. R can also read files in other formats (Excel, SAS, SPSS, ...), and access SQL-type databases, but the functions needed for this are not in the package ba