Toxicity caused by para-substituted phenols on Tetrahymena pyriformis: The structure-activity relationships
Sorana D. Bolboacă
Financial support: The research was supported by UEFISCSU Romania through research grants ID_458 and ID_1051.
Keywords: para-substituted phenol derivatives, structure-activity relationships, Tetrahymena pyriformis, toxicity.
The toxicity of thirty para-substituted phenols on Tetrahymena pyriformis was modelled using an original methodology that uses the complex structural information of the compounds. Two models were built. The methodology relates atomic properties to toxicity through the selection of pairs of descriptors from an entire family, called the Molecular Descriptors Family (MDF). One model has two independent structural descriptors and the other has four. The model with four descriptors proved to have high estimation and predictive abilities (over 97% of the toxicity variation could be explained by structural information). The partial charge distribution, through bond (molecular topology) and space (molecular geometry) interactions, proved to be related to the toxicity of para-substituted phenols on Tetrahymena pyriformis. The predictive ability of the model was tested using leave-one-out cross-validation and training versus test experiments. The models were compared using the correlated correlations method. Embedding the complex structural information through the MDF methodology can support further investigations of the mechanism of chemical toxicity on Tetrahymena pyriformis.
The development of information and computing technologies has led to the development of structure-activity/property relationships (qSARs) methods with a focus on informatics and modelling (Diudea et al. 2001). The qSARs methods are used for the quantitative characterization of the relationships between the structure of compounds and their activity or property in many fields, such as drug design (Duch et al. 2007; Prathipati et al. 2007), environmental sciences (Li and Xi, 2007; Knauer et al. 2007; Jager et al. 2007), biotechnology (Li et al. 2007), and all the fields of chemistry (Niu et al. 2007; Malík et al. 2007; Scotti et al. 2007; Lubbers et al. 2007).
The toxicity of para-substituted phenols on Tetrahymena pyriformis (a non-pathogenic unicellular protozoan) has been studied by many researchers. The toxicity has been analyzed by using the octanol/water partition coefficient (Schultz, 1987a), the hydrophobicity/ionization surface (Schultz, 1987b; Schultz et al. 1996), and electrophilicity (Roy et al. 2006). Different approaches have been used: quantitative neighbourhoods of atoms (Lagunin et al. 2007), core electron binding energy (Takahata et al. 2007), quantum topological molecular similarity (Loader et al. 2007), neural networks (Ivanciuc, 1998) and back-propagation artificial neural networks (Yang et al. 2006).
The main objective of the present study was to characterize the toxicity caused by para-substituted phenols on Tetrahymena pyriformis by using the molecular descriptors family on the structure-activity relationships approach. This approach has proved its estimation and predictive abilities on different classes of chemical compounds, for both properties and activities (Jäntschi and Bolboacă, 2007).
A sample of thirty para-substituted phenols (HO-C6H4-R) was included into the study. The experimental toxicities on Tetrahymena pyriformis (Toxexp), expressed as the logarithm of the inverse of the IGC (inhibitory growth concentration) value in mmol/l, were taken from a previously reported research (Schultz, 1987b).
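The toxicity scale described above can be illustrated with a short Python sketch. It assumes a base-10 logarithm, which the text does not state explicitly, and the function name is hypothetical.

```python
import math


def toxicity_from_igc(igc_mmol_per_l):
    """Return log(1/IGC) for a concentration given in mmol/l.

    Assumption: the logarithm is base 10; the source only says "logarithm".
    """
    if igc_mmol_per_l <= 0:
        raise ValueError("concentration must be positive")
    return math.log10(1.0 / igc_mmol_per_l)


# A lower inhibitory concentration gives a higher toxicity value.
for igc in (10.0, 1.0, 0.1):
    print(igc, toxicity_from_igc(igc))
```

On this scale a tenfold decrease in the inhibitory concentration raises the toxicity value by one unit.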
Step 1: The topological (2D) and geometrical (3D) model of investigated para-substituted phenols was obtained using the HyperChem software. The geometry of the compounds was optimized by applying the semi-empirical Extended Hückel model (Hoffmann, 1963) and the quantum mechanics model (Cornell et al. 1995). The output files were stored as *.hin files.
Step 2: The experimental data were collected and stored in a *.txt file.
Step 3: This step includes the construction, generation, calculation and filtration of the molecular descriptors family. The *.hin files, which contain information about the topology, geometry and charge distribution of each para-substituted phenol, were the primary data files required to construct, generate, and calculate the molecular descriptors family. A set of five PHP programs generated the MDF for the para-substituted phenols: ▪ 0_mdf_prepare.php creates the structure of tables for the investigated compounds; ▪ 1_mdf_generate.php generates the MDF of the para-substituted phenols and stores them into a table; ▪ 2_mdf_linearize.php applies the linearizing operator and stores valid records into tables; ▪ 3_mdf_bias.php sorts the descriptors by squared correlation coefficient and deletes identical entries; ▪ 4_mdf_order.php orders the descriptors from highest to lowest by the squared correlation coefficient again and creates a new table. The results are stored on an intranet FreeBSD server [IP 172.27.211.5] running a MySQL database server.
Each molecular descriptor has a name consisting of seven letters that describes how it was constructed. The description of each possible character is presented in Table 1.
Step 4: This step searches for and identifies the most significant SAR models. The following criteria were used (Bolboacă and Jäntschi, 2007): the squared correlation coefficient (a value close to 1 indicates a good model), the standard error of the estimate (a value close to 0 indicates a good model) and the statistical parameters associated with the model (the Fisher parameter, with a probability of type I error lower than 5%, the confidence intervals for the intercept and slope, the standard error of the intercept and slope, and the Student parameter and its probability of type I error).
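The filtering and ordering performed by the last two programs can be sketched in Python as follows. This is an illustration only, not the PHP implementation: it assumes that "identical entries" means descriptors with identical value vectors, and the descriptor names and values are hypothetical.

```python
def squared_correlation(x, y):
    """Squared Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy * sxy / (sxx * syy)


def rank_descriptors(descriptors, activity):
    """Drop descriptors with duplicate value vectors, then sort by r2, highest first."""
    seen = set()
    ranked = []
    for name, values in descriptors.items():
        key = tuple(values)
        if key in seen:
            continue  # identical entry: keep only the first occurrence
        seen.add(key)
        ranked.append((name, squared_correlation(values, activity)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical descriptor values for four compounds (not taken from the study).
descriptors = {"d_a": [1, 2, 3, 4], "d_b": [1, 2, 3, 4], "d_c": [4, 1, 3, 2]}
activity = [2, 4, 6, 8]
print(rank_descriptors(descriptors, activity))
```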
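As an illustration of the first criteria listed above, the following Python sketch computes the squared correlation coefficient, the standard error of the estimate and the Fisher parameter from experimental and estimated values. It assumes a least-squares model with an intercept; the function name and the sample values are hypothetical, and the probability associated with the Fisher parameter is not computed because it requires the F distribution.

```python
import math


def fit_statistics(observed, estimated, n_descriptors):
    """Return (r2, standard error of the estimate, F) for a fitted linear model.

    Assumes the estimates come from a least-squares fit with an intercept,
    so that r2 = 1 - SS_res / SS_tot.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    df_res = n - n_descriptors - 1  # residual degrees of freedom
    r2 = 1.0 - ss_res / ss_tot
    s_est = math.sqrt(ss_res / df_res)
    f_value = (r2 / n_descriptors) / ((1.0 - r2) / df_res)
    return r2, s_est, f_value


# Hypothetical values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.1, 1.9, 3.0, 4.1, 4.9]
print(fit_statistics(observed, estimated, n_descriptors=1))
```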
Step 5: The models were validated in order to characterize their estimation and predictive abilities. The leave-one-out cross-validation analysis (Baumann, 2003) was conducted (Leave-one-out Analysis, 2005). The resulting score (abbreviated as r2loo-cv), the standard error of prediction and the Fisher parameter were obtained and interpreted.
Step 6: The analysis of the models was performed by assessing the following: ▪ model stability (the model is considered more stable the closer the difference between the squared correlation coefficient and the leave-one-out cross-validation score is to 0); ▪ the predictive ability of the model with the higher squared correlation coefficient, assessed in training and test experiments (Training vs. Test Experiment, 2005); ▪ comparison with previously reported models (where appropriate) through a correlated correlation analysis (Steiger, 1980). A difference between the squared correlation coefficient (r2) and the leave-one-out cross-validation score (r2loo-cv) lower than 0.3 indicates the absence of an overfitted model, irrelevant independent variables, and/or outliers (Bolboacă and Jäntschi, 2007). Moreover, graphical representation methods were used to identify outliers among the investigated compounds (Bolboacă and Jäntschi, 2007).
Note that the MDF SAR approach uses a genetic algorithm to select descriptors from the descriptor pool (Jäntschi et al. 2007).
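The leave-one-out procedure can be sketched in Python as below. The sketch refits a multiple linear regression with each compound left out in turn and predicts the omitted compound. It assumes that r2loo-cv is the squared correlation between experimental and leave-one-out predicted values, which the text does not define explicitly; all function names and data are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a.x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] from the normal equations."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def loo_predictions(rows, y):
    """Predict each compound with a model fitted on all the other compounds."""
    preds = []
    for i in range(len(y)):
        coef = fit_ols(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, rows[i]))
    return preds


def squared_correlation(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    saa = sum((p - ma) ** 2 for p in a)
    sbb = sum((q - mb) ** 2 for q in b)
    return sab * sab / (saa * sbb)


# Hypothetical descriptor rows and activities (exactly linear, for illustration).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 3]]
y = [1 + 2 * a - 3 * b for a, b in rows]
preds = loo_predictions(rows, y)
print(squared_correlation(y, preds))
```

Solving the normal equations directly keeps the sketch dependency-free; a numerically more robust solver would be preferable for large or nearly collinear descriptor sets.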
Ŷ2v = -2.261 + 0.037·ASMmVQt - 0.216·lfDdOQg    (1)
Ŷ4v = -3.295 + 0.035·ASMmVQt - 0.326·lfDdOQg + 0.079·InMrLQg - 0.346·LsDMpQg    (2)
where: Ŷ2v = toxicity estimated by Eq(1); Ŷ4v = toxicity estimated by Eq(2); ASMmVQt, lfDdOQg, InMrLQg, and LsDMpQg = molecular descriptors.
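For readers who want to apply the two equations, a minimal Python sketch follows. The coefficients are those of Eq(1) and Eq(2); the descriptor values in the usage lines are placeholders chosen for illustration and are not taken from Table 2.

```python
# Coefficients of Eq(1) and Eq(2) as reported in the text.
EQ1 = {"intercept": -2.261, "ASMmVQt": 0.037, "lfDdOQg": -0.216}
EQ2 = {
    "intercept": -3.295,
    "ASMmVQt": 0.035,
    "lfDdOQg": -0.326,
    "InMrLQg": 0.079,
    "LsDMpQg": -0.346,
}


def estimate_toxicity(model, descriptor_values):
    """Evaluate a linear SAR model for one compound."""
    total = model["intercept"]
    for name, coefficient in model.items():
        if name == "intercept":
            continue
        total += coefficient * descriptor_values[name]
    return total


# Placeholder descriptor values (hypothetical, not from the study).
example = {"ASMmVQt": 100.0, "lfDdOQg": 5.0, "InMrLQg": 2.0, "LsDMpQg": 1.0}
print(estimate_toxicity(EQ1, example))
print(estimate_toxicity(EQ2, example))
```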
The values of the experimental determinations (Toxexp), of the calculated descriptors and of the toxicity estimated by Eq(1) and Eq(2) are presented in Table 2.
The values of the squared correlation coefficients between each descriptor and the experimental toxicity (Toxexp) as well as between pairs of descriptors were as follows:
model with two descriptors - Eq(1):
SAR model with four descriptors - Eq(2):
The statistics associated with the models with two - Eq(1) - and four - Eq(2) molecular descriptors are presented in Table 3.
The graphical representation of the relation among the toxicity of para-substituted phenols on Tetrahymena pyriformis estimated by Eq(1), Eq(2) and the neural network (Ivanciuc, 1998), and the experimental toxicity (Schultz, 1987b) is presented in Figure 1.
The statistics on the similarity of the activity estimated by Eq(1) (Ŷ2v-Eq(1)) and by Eq(2) (Ŷ4v-Eq(2)) as well as the experimental toxicity (Toxexp) of para-substituted phenols are presented in Table 4. In Table 4 the best estimation values, expressed as the lowest value of the difference between experimental and estimated toxicity, are shaded in gray.
The validation results of the model with four descriptors in training versus test experiments (for the sample size that varied from 18 to 22 in training) are presented in Table 5.
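The partitioning behind such an experiment can be sketched in Python as follows. The sketch only builds the training and test index sets for training sizes from 18 to 22 out of thirty compounds; the random assignment and the fixed seed are assumptions, since the text does not say how compounds were allocated to each set.

```python
import random


def training_test_splits(n_compounds=30, training_sizes=range(18, 23), seed=0):
    """Build one (training, test) index split per requested training size.

    Assumption: compounds are assigned at random; the seed is arbitrary.
    """
    rng = random.Random(seed)
    indices = list(range(n_compounds))
    splits = []
    for size in training_sizes:
        training = sorted(rng.sample(indices, size))
        chosen = set(training)
        test = [i for i in indices if i not in chosen]
        splits.append((training, test))
    return splits


for training, test in training_test_splits():
    print(len(training), "training compounds,", len(test), "test compounds")
```

Each split would then be used to fit the four-descriptor model on the training compounds and to evaluate it on the test compounds.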
The comparison between the SAR model with four descriptors and the previously reported MLR (Multiple Linear Regression, (Ivanciuc, 1998)) and Neural Network (NN, (Ivanciuc, 1998)) models is presented in Table 6.
The integration of the structural information obtained from the para-substituted phenol compounds allows the estimation and prediction of toxicity on Tetrahymena pyriformis. Two models proved to have good estimation and predictive abilities: one with two descriptors (Eq(1)) and the other with four descriptors (Eq(2)).
The analysis of the results presented in Table 2 reveals the influence of the substituent on the toxicity of para-substituted phenols. Thus, the phenyl group was associated with a higher toxicity of para-substituted phenols (between 1.01237 for 4-hydroxybenzophenone - compound no. 21, Table 2, and 1.6547 for 4-hydroxybenzene - compound no. 23, Table 2). A high toxicity was also associated with the nitro group, as in the case of 4-nitrophenol (1.4257, Table 2).
Both SAR models were statistically significant, the significance level being lower than 0.0001 (Table 3). In toxicity modelling, three descriptors refer to molecular geometry (lfDdOQg, InMrLQg and LsDMpQg) and one refers to molecular topology (ASMmVQt). All descriptors consider the partial electric charge as the atomic property (ASMmVQt, lfDdOQg, InMrLQg, LsDMpQg).
The value of the correlation coefficient obtained by the model with two descriptors (r = 0.9472, Table 3) supports the role of these two descriptors in the estimation of toxicity. Almost ninety percent of the toxicity variation of the studied para-substituted phenols can be explained by its linear relationship with the ASMmVQt and the lfDdOQg descriptors. The predictive ability of the model with two variables is supported by the results obtained in the leave-one-out cross-validation analysis: the leave-one-out cross-validation score (r2loo-cv = 0.8745, Table 3), the standard error of prediction (sloo = 0.2613, Table 3), and the Fisher parameter and its associated significance (ppred = 7.58·10^-13, Table 3). The analysis of the model with two variables showed that the molecular descriptors are not able to provide relevant models individually (Eq(3)). Note also that there is no collinearity between the descriptors used by the model with two descriptors (r2(ASMmVQt, lfDdOQg) = 0.12152). The model with two variables reveals that the toxicity of the studied para-substituted phenols on Tetrahymena pyriformis is of geometrical and topological nature and that it also depends on the partial electric charges.
Both descriptors used by the model with two descriptors are found again in the model with four descriptors (Eq(2)). Ninety-seven percent of the toxicity variation of the para-substituted phenols could be explained by its linear relationship with the molecular descriptors used by this model. The value of the multiple correlation coefficient (r = 0.9868, Table 3) supports the estimation ability of the SAR model. The predictive ability of the model with four descriptors is supported by the following: the value of the leave-one-out cross-validation score (r2loo-cv = 0.9650, Table 3), the type I error of the Fisher parameter (ppred = 1.50·10^-21, Table 3), the standard error of prediction (sloo = 0.1429, Table 3) and the stability of the model (r2 - r2loo-cv = 0.0086, Table 3). No significant correlation was identified either between the individual descriptors and the experimental toxicity or between the pairs of descriptors (Eq(4)). The toxicity of the para-substituted phenols on Tetrahymena pyriformis is of geometrical and topological nature. It also depends on the partial electric charge of the compounds.
The analysis of the results presented in Table 4 indicates that the closest agreement between estimated and experimental toxicity was obtained by the SAR model with four variables (for twenty-one out of thirty compounds its estimated value was the closest to the experimental value), followed by the model with two variables (closest for five compounds out of thirty) and the neural network (Ivanciuc, 1998) (closest for four compounds out of thirty).
The predictive ability of the model with four descriptors was studied on training and test sets. With one exception, all investigated sample sizes gave statistically significant models at a significance level of 1% (Table 5). The exception was observed in the experiment with twenty-one compounds in the training set and nine compounds in the test set. For this model the type I error was 1.4·10^-2 and 1.6·10^-14, respectively. The average of the squared correlation coefficient obtained in the training sets was almost identical to the average of the squared correlation coefficient in the test sets (0.971 vs. 0.972, Table 5). The dispersion of the correlation coefficients in both sets was low (see Table 5). The above-mentioned results support the validity of the SAR model with four descriptors as well as its power to predict the toxicity of para-substituted phenols. The molecular descriptors of a new para-substituted phenol can be calculated using the online DC Demo Calculator (DC Demo Calculator, 2005). For this purpose, the 2D and 3D structure of the compound has to be constructed using the HyperChem software. As a result, the calculated values of the molecular descriptors are displayed. Moreover, the 2D and 3D structure of a new para-substituted phenol can be used to predict its activity (MDF SAR Predictor, 2005). The following steps must be followed: ▪ selecting the name of the learning set (RRC443_ for the para-substituted phenols set); ▪ selecting the predictor equation (the model with two or four molecular descriptors); and ▪ browsing for and submitting the *.hin file of the new compound proposed for investigation. Consequently, the equation used for prediction, the calculated values of the molecular descriptors for the new compound and the activity predicted by the model are displayed.
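The stability criterion can be checked directly from the values reported in Table 3, as in the Python sketch below. Because the correlation coefficients are quoted to four decimals, the recomputed difference for Eq(2) comes out near 0.009, which agrees with the reported 0.0086 within rounding. The function name is hypothetical.

```python
def stability(r, r2_loo):
    """Difference between the squared correlation coefficient and r2loo-cv."""
    return r * r - r2_loo


# (r, r2loo-cv) pairs as reported in the text for each model.
models = {"Eq(1)": (0.9472, 0.8745), "Eq(2)": (0.9868, 0.9650)}

for label, (r, r2_loo) in models.items():
    diff = stability(r, r2_loo)
    # The 0.3 threshold is the one quoted in the methods (Step 6).
    print(f"{label}: r2 = {r * r:.4f}, r2 - r2loo-cv = {diff:.4f}, below 0.3: {diff < 0.3}")
```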
The comparison between the SAR model with four descriptors and the previously reported models (Ivanciuc, 1998) (Table 6) showed that the probability of coincidence between the SAR model and the MLR model is 1.14·10^-2, while that between the SAR model and the NN model is 4.51·10^-2. It can be concluded that the correlation coefficient obtained by the SAR model with four descriptors is significantly higher than the correlation coefficients obtained by the previously reported models (Ivanciuc, 1998).
Many approaches have been developed in order to translate the chemical information of a compound into a useful numerical value (Todeschini and Consonni, 2000). The radial basis functions (Hemmer et al. 1999), GETAWAY (Consonni et al. 2002), 3D-MoRSE electron diffraction (Todeschini and Consonni, 2000) and other descriptors represent similar approaches. These approaches are useful for further investigations if their application leads to statistically significant models. The difference between models in terms of structure-activity relationships could then be investigated using the correlated correlation analysis (Steiger, 1980).
The above-mentioned results support the estimation and predictive abilities of the SAR model with four descriptors in characterizing the toxicity of para-substituted phenols on Tetrahymena pyriformis. In conclusion, the toxicity of the studied para-substituted phenols on Tetrahymena pyriformis is of both geometrical and topological nature and depends on the partial electric charges of the compounds. Furthermore, the application of the SAR method in modelling the toxicity of para-substituted phenols on Tetrahymena pyriformis could be the first step in discovering and characterizing new compounds. Such further investigations could lead to the discovery of compounds with higher activity at lower costs.
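A comparison of two correlated correlation coefficients can be sketched in Python as below. The sketch uses one commonly cited form of Steiger's Z test for two correlations that share a variable (here, the experimental toxicity) with a two-sided p-value. Whether this exact variant and sidedness were used in the study is an assumption, and the input values are hypothetical because Table 6 is not reproduced here.

```python
import math


def steiger_z(r_y1, r_y2, r_12, n):
    """Steiger-type Z test for two dependent correlations sharing one variable.

    r_y1, r_y2: correlations of each model's estimates with the experimental values
    r_12: correlation between the two models' estimates
    n: number of compounds
    Returns (z, two-sided p-value).
    """
    z1, z2 = math.atanh(r_y1), math.atanh(r_y2)  # Fisher transformation
    r_bar_sq = ((r_y1 + r_y2) / 2.0) ** 2
    psi = r_12 * (1 - 2 * r_bar_sq) - 0.5 * r_bar_sq * (1 - 2 * r_bar_sq - r_12 ** 2)
    c = psi / (1 - r_bar_sq) ** 2  # covariance term of the transformed correlations
    z = (z1 - z2) * math.sqrt(n - 3) / math.sqrt(2 - 2 * c)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


# Hypothetical correlations for thirty compounds (not taken from Table 6).
print(steiger_z(0.95, 0.80, 0.70, 30))
```

A small p-value indicates that the two correlation coefficients are unlikely to be equal, which is the sense in which a low probability of coincidence supports the difference between models.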
BAUMANN, K. Cross-validation as the objective function for variable-selection techniques. Trends in Analytical Chemistry, 2003, vol. 22, no. 6, p. 395-406. [CrossRef]
BOLBOACĂ, S.D. and JÄNTSCHI, L. Modelling the property of compounds from structure: Statistical methods for models validation. Environmental Chemistry Letters, October 2007. [CrossRef]
CONSONNI, V.; TODESCHINI, R. and PAVAN, M. Structure/Response Correlations and Similarity/Diversity Analysis by GETAWAY Descriptors. 1. Theory of the Novel 3D Molecular Descriptors. Journal of Chemical Information and Computer Sciences, 2002, vol. 42, no. 3, p. 682-692. [CrossRef]
CORNELL, W.D.; CIEPLAK, P.; BAYLY C.I.; GOULD I.R.; MERZ, K.M. JR.; FERGUSON D.M.; SPELLMEYER D.C.; FOX, T.; CALDWELL J.M. and KOLLMAN, P.A. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society, 1995, vol. 117, p. 5179-5197. [CrossRef]
DC Demo Calculator [online]. ©2005, Virtual Library of Free Software [cited 20 November 2007]. Available from Internet: http://l.academicdirect.org/Chemistry/SARs/MDF_SARs/j_mdf_demo.php.
DUCH, W.; SWAMINATHAN, K. and MELLER, J. Artificial intelligence approaches for rational drug design and discovery. Current Pharmaceutical Design, 2007, vol. 13, no. 14, p. 1497-1508. [CrossRef]
HEMMER, M.C.; STEINHAUER, V. and GASTEIGER, J. Deriving the 3D structure of organic molecules from their infrared spectra. Vibrational Spectroscopy, 1999, vol. 19, p. 151-164. [CrossRef]
HOFFMANN, R. An extended Hückel theory. I. Hydrocarbons. Journal of Chemical Physics, 1963, vol. 39, p. 1397-1412. [CrossRef]
IVANCIUC, O. Artificial Neural Networks Applications. Part 4. Quantitative structure-activity relationships for the estimation of relative toxicity of phenols for Tetrahymena. Revue Roumaine de Chimie, 1998, vol. 43, no. 3, p. 255-260.
JAGER, T.; POSTHUMA, L.; de ZWART, D. and van de MEENT, D. Novel view on predicting acute toxicity: Decomposing toxicity data in species vulnerability and chemical potency. Ecotoxicology and Environmental Safety, 2007, vol. 67, no. 3, p. 311-322. [CrossRef]
JÄNTSCHI, L. and BOLBOACĂ, S. Results from the use of molecular descriptors family on structure property/activity relationships. International Journal of Molecular Sciences, 2007, vol. 8, no. 3, p. 189-203.
JÄNTSCHI, L.; BOLBOACĂ, S. and DIUDEA M.V. Chromatographic retention times of polychlorinated biphenyls: from structural information to property characterization. International Journal of Molecular Sciences, 2007, vol. 8, no. 11, p. 1125-1157.
KNAUER, K.; LAMPERT, C. and GONZALEZ-VALERO, J. Comparison of in vitro and in vivo acute fish toxicity in relation to toxicant mode of action. Chemosphere, 2007, vol. 68, no. 8, p. 1435-1441. [CrossRef]
LAGUNIN, A.A.; ZAKHAROV, A.V.; FILIMONOV, D.A. and POROIKOV, V.V. A new approach to QSAR modelling of acute toxicity. SAR and QSAR in Environmental Research, 2007, vol. 18, no. 3-4, p. 285-298. [CrossRef]
Leave-one-out Analysis [online]. ©2005, Virtual Library of Free Software [cited 20 July 2007]. Available from Internet: http://l.academicdirect.org/Chemistry/SARs/MDF_SARs/loo/
LI, Y. and XI, D.-l. Quantitative structure-activity relationship study on the biodegradation of acid dyestuffs. Journal of Environmental Sciences, 2007, vol. 19, no. 7, p. 800-804. [CrossRef]
LI, Z.R.; HAN, L.Y.; XUE, Y.; YAP, C.W.; LI, H.; JIANG, L. and CHEN, Y.Z. MODEL - Molecular descriptor lab: A web-based server for computing structural and physicochemical features of compounds. Biotechnology and Bioengineering, 2007, vol. 97, no. 2, p. 389-396. [CrossRef]
LOADER, R.J.; SINGH, N.; O'MALLEY, P.J. and POPELIER, P.L.A. The cytotoxicity of ortho alkyl substituted 4-X-phenols: A QSAR based on theoretical bond lengths and electron densities. Bioorganic and Medicinal Chemistry Letters, 2007, vol. 16, no. 5, p. 1249-1254. [CrossRef]
LUBBERS, S.; DECOURCELLE, N.; MARTINEZ, D.; GUICHARD, E. and TROMELIN, A. Effect of thickeners on aroma compound behavior in a model dairy gel. Journal of Agricultural and Food Chemistry, 2007, vol. 55, no. 12, p. 4835-4841. [CrossRef]
MALÍK, I.; SEDLÁROVÁ, E.; CSÖLLEI, J.; ANDRIAMAINTY, F. and ČIŽMÁRIK, J. Relationship between physicochemical properties, lipophilicity parameters, and local anesthetic activity of dibasic esters of phenylcarbamic acid. Chemical Papers, 2007, vol. 61, no. 3, p. 206-213. [CrossRef]
MDF SAR Predictor [online]. ©2005, Virtual Library of Free Software [cited 20 July 2007]. Available from Internet: http://l.academicdirect.org/Chemistry/SARs/MDF_SARs/sar/.
NIU, B.; LU, W.-C.; YANG, S.-S.; CAI, Y.-D. and LI, G.-Z. Support vector machine for SAR/QSAR of phenethyl-amines. Acta Pharmacologica Sinica, 2007, vol. 28, no. 7, p. 1075-1086. [CrossRef]
PRATHIPATI, P.; DIXIT, A. and SAXENA, A.K. Computer-aided drug design: Integration of structure-based and ligand-based approaches in drug design. Current Computer-Aided Drug Design, 2007, vol. 3, no. 2, p. 133-148. [CrossRef]
ROY, D.R.; PARTHASARATHI, R.; SUBRAMANIAN, V. and CHATTARAJ, P.K. An electrophilicity based analysis of toxicity of aromatic compounds towards Tetrahymena pyriformis. QSAR and Combinatorial Science, 2006, vol. 25, no. 2, p. 114-122. [CrossRef]
SCHULTZ, T.W. Relative toxicity of para-substituted phenols: log KOW and pKa-dependent structure-activity relationships. Bulletin of Environmental Contamination and Toxicology, 1987b, vol. 38, no. 6, p. 994-999. [CrossRef]
SCHULTZ, T.W.; BEARDEN A.P. and JAWORSKA, J.S. A novel QSAR approach for estimating toxicity of phenols. SAR and QSAR in Environmental Research, 1996, vol. 5, no. 2, p. 99-112. [CrossRef]
SCOTTI, L.; SCOTTI, M.T.; ISHIKI, H.M.; FERREIRA, M.J.P.; EMERENCIANO, V.P.; de S. MENEZES, C.M. and FERREIRA, E.I. Quantitative elucidation of the structure-bitterness relationship of cynaropicrin and grosheimin derivatives. Food Chemistry, 2007, vol. 105, no. 1, p. 77-83.
TAKAHATA, Y.; ARAKAWA, M.; FUNATSU, K.; COSTA, M.C.A. and SEGALA, M. Core Electron Binding Energy (CEBE) as descriptors in Quantitative Structure - Activity Relationship (QSAR) analysis of cytotoxicities of a series of simple phenols. QSAR and Combinatorial Science, 2007, vol. 26, no. 3, p. 378-384. [CrossRef]
TODESCHINI, R. and CONSONNI, V. Handbook of Molecular Descriptors. Wiley-VCH, Weinheim, 2000, 688 p. ISBN: 978-3527299133. [CrossRef]
Training vs. Test Experiment [online]. ©2005, Virtual Library of Free Software [cited 20 July 2007]. Available from Internet: http://l.academicdirect.org/Chemistry/SARs/MDF_SARs/qsar_qspr_s/.
YANG, L.; WANG, P.; JIANG, Y.-L. and XIA, B. QSAR for toxicities of phenols using improved genetic algorithm combined with BP artificial neural network. Journal of Harbin Institute of Technology, 2006, vol. 38, no. 2, p. 216-218.
Note: Electronic Journal of Biotechnology is not responsible if on-line references cited on manuscripts are not available any more after the date of publication.