Core Electron Binding Energy (CEBE) as Descriptors in Quantitative Structure–Activity Relationship (QSAR) Analysis of Cytotoxicities of a Series of Simple Phenols

Yuji Takahata a*, Masmoto Arakawa b, Kimito Funatsu b, Maria Cristina Andreazza Costa a and Maximiliano Segala a

a Department of Chemistry, State University of Campinas, CP 6154, Campinas, São Paulo, 13084-862, Brazil, E-mail: taka@iqm.unicamp.br
b Department of Chemical Engineering, Faculty of Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Japan

Keywords: CEBE, Cytotoxicity, DFT, PCA, Phenols, PLS, QSAR
Received: January 24, 2006; Accepted: July 25, 2006
DOI: 10.1002/qsar.200630007

Abstract

Core Electron Binding Energies (CEBEs) of ring carbon atoms in 4-X-phenols were calculated using density-functional theory with the scheme ΔE_KS (PW86-PW91)/TZP//HF/6-31G*. The phenols show toxicity to fast growing cells. Using CEBEs of four distinguished carbon atoms in the phenyl ring and the oxygen atom in −OH of the phenols, the compounds were well separated and grouped by Principal Component Analysis (PCA). Using three out of the five CEBEs, together with σ+ and log P of the phenols as descriptors, we established a QSAR model with Partial Least Squares (PLS) regression which resulted in Q² = 0.914. CEBE values, when used together with traditionally used descriptors such as log P and σ+, turned out to be useful descriptors in modeling the activity (QSAR) of the compounds with PLS.

1 Introduction

In SAR/QSAR studies, many descriptors have been suggested and employed [1–4]. Many of them are successful and well accepted. Some descriptors that can be calculated by quantum mechanical methods have been recognized as useful in QSAR [5–13].
However, there still remains room to search for alternative and/or better descriptors than those in use, especially descriptors that can be evaluated theoretically.

X-ray Photoelectron Spectroscopy (XPS), also known as Electron Spectroscopy for Chemical Analysis (ESCA) [14], enables one to measure the binding energies of inner shell electrons (Core Electron Binding Energies, CEBEs). The CEBE value of an atom in a molecule is intimately related to its chemical environment. Hence, the CEBE of an atom is known to be related to properties such as partial atomic charge, Hammett substituent (σ) constants [15, 16], electronegativity, electrostatic energy, polarizability, proton affinity [17], chemical equilibrium, and reactivity parameters, among others. Therefore, we can expect that CEBE may also be related to some biological activities of molecules. Recently, a technique to calculate accurate CEBEs in molecules that contain atoms of the first and second rows of the periodic table has been established [18]. It uses Density-Functional Theory (DFT). The Average Absolute Deviation (AAD) attained by the DFT method is less than 0.2 eV.

Linderberg et al. [16] derived Eq. (1), in which CEBE shifts (ΔCEBE) correlate linearly with Hammett substituent (σ) constants in substituted benzenes:

$$\Delta\mathrm{CEBE} \cong k\sigma \tag{1}$$

where k is a constant. ΔCEBE is defined as the CEBE of an atom in a substituted benzene minus the CEBE of the same atom in unsubstituted benzene. Recently, we examined the validity of Eq. (1) using a large number of theoretically calculated ΔCEBEs [19–21]. It was shown that ΔCEBE = 1.17σ − 0.17, where ΔCEBE is expressed in units of eV [21]. Using some selected ΔCEBEs of carbon atoms in the two phenyl rings of neolignans, the compounds were well separated into groups by Principal Component Analysis (PCA) [22–23]. If Eq. (1) is valid, CEBE itself also correlates linearly with σ. Selassie et al.
[24] established QSAR models for the cytotoxicities of a series of 4-, 3-, and 2-X-phenols to fast growing cells using four descriptors: σ+, the Brown variant of the Hammett electronic parameter; log P; the LUMO-HOMO gap (L-H gap); and Bond Dissociation Energies (BDE).

Recently [25], ΔCEBEs of carbon atoms in the phenyl ring of some selected phenols were calculated with the semiempirical HAM/3 method [26–28], and they were correlated with their cytotoxicities using PCA. The object of the present work is to extend and deepen the previous investigation of the phenols, this time using much more accurate CEBEs calculated with a nonempirical method, a different series of molecules, and different QSAR methods. We investigate relations between the CEBEs and the four descriptors employed by Selassie et al. [24]: σ+, log P, L-H gap, and BDE. We used only 4-X-phenols so that the type of molecules treated is uniform. This eliminates complex effects that may be caused by employing nonuniform types of molecules. We want to assess the applicability of CEBEs as descriptors in QSAR by comparing them with well-established descriptors of different types.

QSAR Comb. Sci. 26, 2007, No. 3, 378–384. © 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

2 Method of Calculation

Consider a core ionization process

$$M \rightarrow M^{+} + e^{-} \tag{2}$$

where M is a neutral 4-X-phenol and M+ is a core-ionized species in which a 1s core electron has been removed from a carbon or oxygen atom of M. The CEBE of the process in Eq. (2) can be calculated as the difference, ΔE, between E(M+), the total energy of the cation M+, and E(M), the total energy of the neutral molecule M, as given in Eq. (3):

$$\Delta E = E(M^{+}) - E(M) = \mathrm{CEBE} \tag{3}$$

The total energy (E_KS) calculated with DFT includes the correlation energy and is called the Kohn-Sham (KS) total energy.
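The arithmetic of Eq. (3) can be sketched in a few lines of Python. This is a minimal illustration, assuming the two total energies are available in hartree; the function name `cebe_ev` and the two energy values are hypothetical placeholders chosen for illustration and are not taken from the paper or from ADF output.

```python
# Minimal sketch of Eq. (3): CEBE = E(M+) - E(M).
# The energies below are hypothetical placeholders, not values from the paper.

HARTREE_TO_EV = 27.211386  # conversion factor, hartree -> eV


def cebe_ev(e_cation_hartree: float, e_neutral_hartree: float) -> float:
    """Return the core electron binding energy in eV from two total energies."""
    return (e_cation_hartree - e_neutral_hartree) * HARTREE_TO_EV


# Hypothetical total energies of the neutral molecule and the core-ionized cation
e_neutral = -307.500000
e_cation = -296.766500

cebe = cebe_ev(e_cation, e_neutral)
print(f"CEBE = {cebe:.2f} eV")
```

Because a CEBE is a small difference between two large total energies, both energies need to be computed at the same level of theory for the subtraction to be meaningful.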
If DFT provides the same degree of accuracy for the total energies E_KS(M+) and E_KS(M), one can expect to obtain an accurate ΔE_KS, which is the CEBE, using Eq. (3). The CEBEs of four distinguished ring carbons and the oxygen in −OH of substituted phenols (Figure 1) were calculated using DFT with the scheme called ΔE_KS (PW86-PW91)/TZP//HF/6-31G*, which is a shorthand notation of the method of calculation [18]. E_KS(M+) and E_KS(M) were calculated by DFT using a Triple-Zeta Polarized (TZP) basis set of Slater-Type Orbitals (STOs). The TZP basis consists of a double-zeta core orbital, a triple-zeta valence orbital, and one polarization function. The functional combination is the Perdew–Wang 1986 exchange functional [29] and the Perdew–Wang 1991 correlation functional [30]. The geometry of the molecules was optimized by ab initio HF/6-31G*. The Amsterdam Density Functional (ADF) package [31] was used for the calculation of CEBEs.

PCA is a useful exploratory tool that maps samples through scores and individual variables through loadings in a new vector space defined by the principal components. Score plots allow sample identification and a check of whether samples are similar or dissimilar, typical, or outliers. They also provide information about grouping. We use Partial Least Squares (PLS) regression as our modeling method. A Genetic Algorithm (GA) combined with PLS is used to select the best possible variables. This method of selection is called the GAPLS method. The Chemish package [32] was used for GAPLS and PLS.

3 Results and Discussions

Table 1 lists calculated CEBEs of 28 4-X-phenols. For unsubstituted phenol, observed CEBEs are available for C1 (292.0 eV), C2 (290.2 eV), C3 (290.6 eV), C4 (290.2 eV), and oxygen (538.9 eV); these can be compared with the calculated CEBEs of C1 (292.1 eV), C2 (290.4 eV), C3 (290.5 eV), C4 (290.2 eV), and oxygen (539.2 eV), respectively.
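The comparison for unsubstituted phenol can be condensed into per-atom deviations and an average absolute deviation. The sketch below uses only the ten values quoted above; the dictionary layout and variable names are illustrative choices, not part of the original work.

```python
# Observed and calculated CEBEs (eV) for unsubstituted phenol, as quoted in the text
observed = {"C1": 292.0, "C2": 290.2, "C3": 290.6, "C4": 290.2, "O": 538.9}
calculated = {"C1": 292.1, "C2": 290.4, "C3": 290.5, "C4": 290.2, "O": 539.2}

# Signed deviation (calculated minus observed) for each atom
deviations = {atom: calculated[atom] - observed[atom] for atom in observed}

# Average absolute deviation over the five atoms
aad = sum(abs(d) for d in deviations.values()) / len(deviations)

# Atom with the largest absolute deviation
worst_atom = max(deviations, key=lambda atom: abs(deviations[atom]))

for atom, dev in deviations.items():
    print(f"{atom}: {dev:+.1f} eV")
print(f"AAD = {aad:.2f} eV, largest deviation at {worst_atom}")
```

For these five atoms the average absolute deviation comes out at about 0.14 eV, with the largest single deviation (0.3 eV) at oxygen.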
Agreement between theory and experiment is satisfactory. No observed CEBEs are available in the literature for comparison with the calculated ones for most of the remaining phenols listed in the table. Since the present method of calculation, ΔE_KS (PW86-PW91)/TZP//HF/6-31G*, attained an AAD of 0.16 eV for 59 cases of small molecules [18], we expect similar accuracy for the CEBEs listed in Table 1.

Table 2 lists correlation coefficients among a total of nine variables: the five CEBEs listed in Table 1 plus the four descriptors σ+, log P, L-H gap, and BDE employed by Selassie et al. [24]. Correlations between the observed activity, log 1/C, and the nine variables are also listed. C1, C2, C3, C4, and oxygen in the table represent CEBE values at the respective atoms in 4-X-phenols (Figure 1). Very high correlation among C1, C2, C3, and oxygen can be seen. However, correlation between C4 and the remaining atoms is low. C4 is the position where the substituent −X binds. A 4-X substitution shifts the CEBE values in a concerted manner at all atoms of the molecule except C4. High correlation can be observed among the variables C1, C2, C3, oxygen, σ+, and BDE. They belong to the so-called electronic variables. Correlation between these electronic variables and activity, log 1/C, is fairly high, about 0.7 or greater in absolute value (0.68 to 0.80 in Table 2). Log P correlates less with all other variables. The L-H gap also correlates less with most of the variables.

Figure 2 is a PCA score plot for the 28 4-X-phenols in Table 1 using only the CEBE values at the five distinguished atoms C1, C2, C3, C4, and oxygen. The abscissa is the first principal component and the ordinate is the second. The number accompanying each black dot corresponds to the compound number defined in Table 1. The phenols can be grouped into four categories: A, B, C, and D. Group A consists mainly of 4-X-phenols, where X is
electron-donating, such as OR (OCH₃, OC₂H₅, etc.) and NH₂. The group A phenols are all of High Activity (HA), except compound 28 (X = NHCOCH₃). Group B consists of 4-X-phenols where X = R (C₅H₁₁, C₄H₉, C₃H₇, etc.). The activity of the group B phenols can be characterized as Medium Activity (MA). Group C consists of 4-X-phenols where X is a halogen. They are of medium activity. Group D consists mainly of 4-X-phenols where X is electron-withdrawing, such as NO₂ and CN. The phenols in group D are of Low Activity (LA). The score plot (Figure 2) of the PCA obtained using only CEBE values of the atoms in the molecules shows clear groupings of the molecules according to their types of substituent and their biological activities. The percent variance explained by the first component was 86.8%, while the corresponding value for the second component was 12.7%, the cumulative percent variance being 99.4%. This demonstrates that CEBEs can be used as convenient variables for PCA.

Figure 1. Numbering system of 4-X-phenol.

Table 1. Calculated CEBEs, in eV, of the 4-X-phenols. Observed and calculated log 1/C values and errors are also listed.

| No. | Substituent | C1 | C2 | C3 | C4 | Oxygen | Observed log 1/C | Calculated log 1/C | Error |
|---|---|---|---|---|---|---|---|---|---|
| 1 | H | 292.07 | 290.42 | 290.48 | 290.18 | 539.19 | 3.27 | 3.05 | −0.22 |
| 2 | 4-OCH₃ | 291.66 | 290.31 | 290.15 | 291.60 | 538.96 | 4.48 | 4.63 | 0.15 |
| 3 | 4-OC₂H₅ | 291.59 | 290.25 | 290.08 | 291.51 | 538.92 | 4.64 | 4.77 | 0.13 |
| 4 | 4-OC₃H₇ | 291.58 | 290.20 | 290.12 | 291.48 | 538.91 | 4.85 | 4.86 | 0.01 |
| 5 | 4-OC₄H₉ | 291.46 | 290.13 | 289.91 | 291.40 | 538.83 | 5.2 | 5.05 | −0.15 |
| 6 | 4-OC₆H₁₃ | 291.54 | 290.20 | 290.00 | 291.44 | 538.87 | 5.5 | 5.48 | −0.02 |
| 7 | 4-OC₆H₅ | 291.64 | 290.28 | 290.12 | 291.53 | 538.96 | 4.97 | 4.9 | −0.07 |
| 8 | 4-CH₃ | 291.82 | 290.27 | 290.21 | 290.23 | 539.04 | 3.85 | 3.69 | −0.16 |
| 9 | 4-C₂H₅ | 291.70 | 290.30 | 290.11 | 289.93 | 538.90 | 3.86 | 3.98 | 0.12 |
| 10 | 4-C₃H₇ | 291.75 | 290.20 | 290.12 | 290.05 | 539.00 | 4.04 | 3.96 | −0.08 |
| 11 | 4-C₄H₉ | 291.70 | 290.16 | 290.07 | 289.97 | 538.97 | 4.33 | 4.16 | −0.17 |
| 12 | 4-C₅H₁₁ | 291.68 | 290.13 | 290.03 | 289.90 | 538.96 | 4.47 | 4.29 | −0.18 |
| 13 | 4-C₇H₁₅ | 291.69 | 290.15 | 290.06 | 290.00 | 538.96 | 4.49 | 4.61 | 0.12 |
| 14 | 4-C₈H₁₇ | 291.69 | 290.15 | 290.06 | 290.01 | 538.96 | 4.62 | 4.79 | 0.17 |
| 15 | 4-C₉H₁₉ | 291.69 | 290.15 | 290.05 | 290.01 | 538.96 | 4.75 | 4.94 | 0.19 |
| 16 | 4-C(CH₃)₃ | 291.71 | 290.16 | 290.07 | 289.94 | 538.98 | 4.09 | 4.03 | −0.06 |
| 17 | 4-CONH₂ | 292.29 | 290.65 | 290.75 | 290.55 | 539.44 | 2.48 | 2.79 | 0.31 |
| 18 | 4-NO₂ | 292.87 | 291.17 | 291.26 | 291.83 | 539.90 | 3.45 | 3.33 | −0.12 |
| 19 | 4-I | 292.15 | 290.61 | 290.63 | 290.94 | 539.34 | 3.86 | 4.04 | 0.18 |
| 20 | 4-SO₂NH₂ | 292.52 | 290.83 | 290.90 | 290.90 | 539.62 | 2.5 | 2.54 | 0.04 |
| 21 | 4-CHO | 292.55 | 290.86 | 290.93 | 290.66 | 539.65 | 3.08 | 3.17 | 0.09 |
| 22 | 4-F | 292.24 | 290.75 | 290.78 | 292.50 | 539.36 | 3.83 | 4.01 | 0.18 |
| 23 | 4-NH₂ | 291.34 | 290.10 | 289.95 | 291.23 | 538.74 | 5.09 | 4.9 | −0.2 |
| 24 | 4-OH | 291.83 | 290.44 | 290.32 | 291.83 | 539.07 | 4.59 | 4.61 | 0.02 |
| 25 | 4-Cl | 292.21 | 290.69 | 290.72 | 291.62 | 539.37 | 4.29 | 3.98 | −0.31 |
| 26 | 4-Br | 292.20 | 290.67 | 290.70 | 291.34 | 539.38 | 4.2 | 3.98 | −0.22 |
| 27 | 4-CN | 292.68 | 291.05 | 291.14 | 291.59 | 539.76 | 3.44 | 3.37 | −0.07 |
| 28 | 4-NHCOCH₃ | 291.74 | 290.28 | 290.08 | 291.37 | 539.05 | 3.73 | 4.01 | 0.28 |

Table 2. Correlation matrix. C1, C2, C3, C4, and Oxygen represent CEBE values at the respective atoms.

| | C1 | C2 | C3 | C4 | C5 | C6 | Oxygen | σ+ | log P | L-H gap | BDE | log 1/C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C1 | 1 | 0.968 | 0.984 | 0.245 | 0.984 | 0.960 | 0.995 | 0.923 | −0.353 | −0.442 | 0.881 | −0.780 |
| C2 | 0.968 | 1 | 0.989 | 0.455 | 0.989 | 0.986 | 0.974 | 0.818 | −0.456 | −0.256 | 0.750 | −0.679 |
| C3 | 0.984 | 0.989 | 1 | 0.371 | 1.000 | 0.983 | 0.984 | 0.862 | −0.426 | −0.334 | 0.800 | −0.726 |
| C4 | 0.245 | 0.455 | 0.371 | 1 | 0.369 | 0.486 | 0.293 | −0.077 | −0.470 | 0.540 | −0.203 | 0.130 |
| C5 | 0.984 | 0.989 | 1.000 | 0.369 | 1 | 0.982 | 0.984 | 0.863 | −0.426 | −0.335 | 0.801 | −0.727 |
| C6 | 0.960 | 0.986 | 0.983 | 0.486 | 0.982 | 1 | 0.974 | 0.801 | −0.444 | −0.229 | 0.726 | −0.649 |
| Ox | 0.995 | 0.974 | 0.984 | 0.293 | 0.984 | 0.974 | 1 | 0.904 | −0.370 | −0.383 | 0.854 | −0.752 |
| σ+ | 0.923 | 0.818 | 0.862 | −0.077 | 0.863 | 0.801 | 0.904 | 1 | −0.093 | −0.656 | 0.983 | −0.796 |
| log P | −0.353 | −0.456 | −0.426 | −0.470 | −0.426 | −0.444 | −0.370 | −0.093 | 1 | −0.331 | −0.041 | 0.507 |
| L-H gap | −0.442 | −0.256 | −0.334 | 0.540 | −0.335 | −0.229 | −0.383 | −0.656 | −0.331 | 1 | −0.740 | 0.478 |
| BDE | 0.881 | 0.750 | 0.800 | −0.203 | 0.801 | 0.726 | 0.854 | 0.983 | −0.041 | −0.740 | 1 | −0.803 |
| log 1/C | −0.780 | −0.679 | −0.726 | 0.130 | −0.727 | −0.649 | −0.752 | −0.796 | 0.507 | 0.478 | −0.803 | 1 |

We classify the nine variables into three categories: the first consists of the five CEBEs (C1, C2, C3, C4, and oxygen) listed in Table 1; the second consists of the four variables from the literature [24] (σ+, log P, L-H gap, and BDE); the third is the union of the first two categories, totaling nine variables. We applied GAPLS to each of the three categories to select the best variables to model each of the datasets. Data were preprocessed by mean-centering, variance-scaling, and rescaling. When GAPLS was applied to the first category of variables, the highest predictive explained variance, Q², was 0.644, with three variables (C1, C3, and C4) and two components selected. When GAPLS was applied to the second category, Q² was 0.823, with two variables (log P and BDE) and two components selected. Lastly, we applied GAPLS to the third category to select the optimum variables out of the nine.
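To make the distinction between fitted and predictive explained variance concrete, the sketch below computes R² and a leave-one-out Q² for a deliberately simplified model. It is not the GAPLS procedure: ordinary least squares on a single descriptor (the C1 CEBE from Table 1) is swapped in for PLS so the example stays self-contained, and leave-one-out cross-validation is an assumption, since the text does not state which validation scheme Chemish uses. The function names are illustrative.

```python
# C1 CEBEs (eV) and observed log 1/C for compounds 1-28, copied from Table 1
c1 = [292.07, 291.66, 291.59, 291.58, 291.46, 291.54, 291.64, 291.82, 291.70,
      291.75, 291.70, 291.68, 291.69, 291.69, 291.69, 291.71, 292.29, 292.87,
      292.15, 292.52, 292.55, 292.24, 291.34, 291.83, 292.21, 292.20, 292.68,
      291.74]
activity = [3.27, 4.48, 4.64, 4.85, 5.20, 5.50, 4.97, 3.85, 3.86, 4.04, 4.33,
            4.47, 4.49, 4.62, 4.75, 4.09, 2.48, 3.45, 3.86, 2.50, 3.08, 3.83,
            5.09, 4.59, 4.29, 4.20, 3.44, 3.73]


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_sum_of_squares(ys):
    mean_y = sum(ys) / len(ys)
    return sum((y - mean_y) ** 2 for y in ys)


def r_squared(xs, ys):
    """Explained variance of the model fitted on all samples."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out Q2 = 1 - PRESS / SS_tot (SS_tot about the overall mean)."""
    press = 0.0
    for i in range(len(xs)):
        # Refit without sample i, then predict the held-out sample
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


r2 = r_squared(c1, activity)
q2 = q_squared_loo(c1, activity)
print(f"R2 = {r2:.3f}, Q2(LOO) = {q2:.3f}")
```

In this setup Q² is always lower than R², because each prediction is made by a model that has not seen the sample being predicted. That gap is the reason Q² rather than R² is the natural criterion when a variable-selection method compares descriptor subsets.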
The maximum number of components was limited to five, which is a reasonable number considering that the total number of samples is 28. Table 3 lists five candidates selected by GAPLS. The first candidate gave Q² = 0.914, which is superior to Q² = 0.823 obtained with the second category. The first candidate employs five variables and five components. The five variables selected are C1, C2, oxygen, σ+, and log P. The Q² values of candidates 2 and 3 are 0.905 and 0.895, respectively. These values are close to the 0.914 of the first candidate. We used the five selected variables for the PLS calculation. Figure 3 plots R² and Q² values as ordinates against the number of components as abscissa. The R² and Q² values increase sharply until the number of components reaches 3, after which the increase becomes gradual, reaching the maximum when the number of components is 5. At the maximum, R² = 0.950 and Q² = 0.914.

Figure 2. Score plot, t1(PC1) vs. t2(PC2), of the 4-X-phenols using CEBEs of C1, C2, C3, C4, and oxygen atoms.

Table 3. Results of GAPLS with the maximum number of components limited to 5.

| Candidate | C1 | C2 | C3 | C4 | Ox | σ+ | log P | L-H gap | BDE | Q² | Number of variables | Optimum number of components |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0.914 | 5 | 5 |
| 2 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0.905 | 4 | 3 |
| 3 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0.895 | 3 | 3 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0.823 | 2 | 2 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0.579 | 1 | 1 |

Figure 4 shows a plot of the observed log 1/C value as ordinate against the score along the first principal component as abscissa. Almost all high-activity compounds are located on the positive side of the abscissa, whereas low-activity phenols are grouped on the negative side. Calculated log 1/C values of the 28 phenols obtained by the PLS calculation are listed in Table 1 alongside the observed ones.
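The observed and calculated log 1/C columns of Table 1 allow a quick consistency check on the fit quality quoted above. The sketch below recomputes R² as 1 − SS_res/SS_tot and the root-mean-square error from the tabulated values; the function names are illustrative, and because the table entries are rounded to two decimals the result can differ slightly from the published figure.

```python
from math import sqrt

# Observed and calculated log 1/C for compounds 1-28, copied from Table 1
observed = [3.27, 4.48, 4.64, 4.85, 5.20, 5.50, 4.97, 3.85, 3.86, 4.04,
            4.33, 4.47, 4.49, 4.62, 4.75, 4.09, 2.48, 3.45, 3.86, 2.50,
            3.08, 3.83, 5.09, 4.59, 4.29, 4.20, 3.44, 3.73]
calculated = [3.05, 4.63, 4.77, 4.86, 5.05, 5.48, 4.90, 3.69, 3.98, 3.96,
              4.16, 4.29, 4.61, 4.79, 4.94, 4.03, 2.79, 3.33, 4.04, 2.54,
              3.17, 4.01, 4.90, 4.61, 3.98, 3.98, 3.37, 4.01]


def r_squared(y_obs, y_calc):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(y_obs) / len(y_obs)
    ss_res = sum((o - c) ** 2 for o, c in zip(y_obs, y_calc))
    ss_tot = sum((o - mean_obs) ** 2 for o in y_obs)
    return 1.0 - ss_res / ss_tot


def rmse(y_obs, y_calc):
    """Root-mean-square error of the calculated values."""
    return sqrt(sum((o - c) ** 2 for o, c in zip(y_obs, y_calc)) / len(y_obs))


r2 = r_squared(observed, calculated)
error_rms = rmse(observed, calculated)
print(f"R2 = {r2:.3f}, RMSE = {error_rms:.3f} log units")
```

The recomputed R² should land close to the reported 0.950.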
Errors of the calculated values are listed in the last column of the table. Figure 5 plots calculated log 1/C (Y_calc) as ordinate and observed log 1/C (Y_obs) as abscissa. Good correlation (R² = 0.950) between calculated and observed values can be seen. These results demonstrate that CEBE values at C1, C2, and oxygen in −OH of phenols, together with σ+ and log P, can serve as useful descriptors to model their activity by PLS.

Figure 6 is a plot of C1 CEBEs, in eV, of 15 selected 4-X-phenols, corresponding to compound numbers 1, 2, 8, 9, and 17–27 (Table 1), as ordinate against the Hammett substituent constant (σp) at the para position as abscissa. The 15 4-X-phenols were selected because reliable Hammett σp constants are available for them in the literature [33]. No Hammett σp constants are available for most of the remaining compounds. The carbon C1 is para with respect to the substituent X (Figure 1). A linear least-squares fit of Figure 6 gave the regression equation CEBE = 292.01 + 1.01σp, with r = 0.97 and a standard deviation of 0.10. The slope of the fitted line is almost unity, and the high r indicates a linear correlation between CEBE and σp. The linear correlation between CEBE and σ is expected from Eq. (1). Broadly similar linear correlations were obtained between C2 (and C3) CEBEs and σ at the meta (and ortho) positions, respectively. The Hammett substituent (σ) constant, as an electronic variable, is one of the key descriptors used in QSAR [1–4].

Figure 3. R² and Q² as a function of the number of components resulting from PLS.

Figure 4. Plot of observed log 1/C vs. t1(PC1) resulting from PLS.
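The regression reported for Figure 6, CEBE = 292.01 + 1.01σp, can be written as a pair of small helper functions. The forward direction is the published fit; using it in reverse to estimate σp from a calculated C1 CEBE is an illustrative assumption and not something the text reports doing. The function names are hypothetical, and the example inputs are the C1 CEBEs of compounds 1 and 18 from Table 1.

```python
# Coefficients of the least-squares fit reported for Figure 6
INTERCEPT_EV = 292.01  # C1 CEBE at sigma_p = 0, in eV
SLOPE_EV = 1.01        # eV per unit of sigma_p


def predict_c1_cebe(sigma_p: float) -> float:
    """Forward use of the fit: C1 CEBE (eV) from a Hammett sigma_p constant."""
    return INTERCEPT_EV + SLOPE_EV * sigma_p


def estimate_sigma_p(c1_cebe_ev: float) -> float:
    """Inverse use of the fit (illustrative): sigma_p from a C1 CEBE in eV."""
    return (c1_cebe_ev - INTERCEPT_EV) / SLOPE_EV


# C1 CEBEs from Table 1: unsubstituted phenol (compound 1) and 4-NO2 (compound 18)
for label, c1_cebe in [("H", 292.07), ("4-NO2", 292.87)]:
    print(f"{label}: estimated sigma_p = {estimate_sigma_p(c1_cebe):.2f}")
```

With a standard deviation of 0.10 eV around the fitted line and a slope near 1 eV per σ unit, any σp estimate obtained this way carries an uncertainty of roughly 0.1, so it is best read as a rough indication only.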