Solubility

Of all properties that determine a drug's ultimate in vivo ADME behaviour, solubility is one of the most important and deserves close attention in early discovery. Indeed, a drug's propensity to dissolve in aqueous media is key to its successful administration and absorption.

Why is Solubility Important?
Orally administered drugs need to be absorbed from the intestine, and solubility plays a key role in this. Dose formulation is also a problem for poorly soluble compounds, which makes the determination of pharmacokinetics during in vivo studies difficult and unreliable. Currently, most high-throughput (HTS) in vitro solubility assays are run in 2-5% DMSO/buffer, and their results do not necessarily correlate well with aqueous solubility. The ability to predict aqueous solubility is therefore important for the early identification of compounds that are less likely to pose future difficulties in formulation and administration.

Solubility Models
Asteris has two models that predict aqueous solubility: a logS model that predicts intrinsic water solubility, and a logS7.4 model that predicts the apparent solubility of charged compounds at pH 7.4.

LogS

The logS model predicts intrinsic water solubility, i.e. the solubility of uncharged compounds in water. The model is based on more than 3300 aqueous solubility data points for intrinsic water solubility, S in µM, defined as the thermodynamic solubility of the uncharged compound in water between 20 and 30°C. The data come from the Syracuse database; it is noteworthy that most in silico models for the prediction of intrinsic aqueous solubility are based on the same commercial database. The model was trained on 2650 compounds using a radial basis function technique and then validated on a test set of 663 compounds. Observed and predicted values for this set are well correlated, with an R² value of 0.82. Predictions for compounds within the chemical space of the model have an RMSE of 0.70 log units. Estimated logS values for compounds outside the chemical space have an RMSE of 1.03 log units.
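The two statistics quoted above can be reproduced for any set of observed and predicted logS values. The following is a minimal sketch, not the code used to validate the model. It assumes that R² means the coefficient of determination (the source does not say whether it is this or the squared correlation coefficient), and the function names and example values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    return 1.0 - ss_res / ss_tot


# Hypothetical logS values (S in µM); not taken from the model's test set.
observed_logs = [1.0, 2.0, 3.0, 4.0]
predicted_logs = [1.5, 2.0, 2.5, 4.0]

example_rmse = rmse(observed_logs, predicted_logs)
example_r2 = r_squared(observed_logs, predicted_logs)
print(f"RMSE = {example_rmse:.2f} log units, R2 = {example_r2:.2f}")
```

Because the errors are expressed in log units, an RMSE of 0.70 corresponds to a typical deviation of roughly a factor of five (10^0.70) in S itself.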

Observed vs Predicted logS

logS at pH 7.4
At physiological pH, many drug-like compounds exist in partially dissociated or ionised form. As part of the modelling process, a set of rules was defined to identify molecules that are neutral (uncharged) at pH 7.4, for which logS7.4 is equal to logS. In these cases, the prediction is generated by the logS model described above. The logS7.4 model is based on 322 charged drug-like compounds, using a compilation of high-quality solubility data measured in buffered solution at pH 7.4 (logS7.4, with S7.4 in µM) gathered from ChEMBLdb. Only measurements made between 25°C and 35°C were considered. The model was built by the automatic procedure implemented within the StarDrop Auto-Modeller using standard settings. The initial dataset was split into three subsets using cluster analysis at a Tanimoto level of 0.7. The model was trained on 226 compounds and evaluated on validation and test sets of 48 compounds each. The best model was produced by the Radial Basis Function technique coupled with a genetic algorithm for descriptor selection (GA-RBF). On the combined validation and test sets, the model achieved an R² value of 0.74 and an RMSE of 0.61 log units.
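The idea behind a cluster-based split is that structurally similar compounds are kept together, so the validation and test sets are not trivially similar to the training set. The sketch below illustrates this with a simple leader clustering at a Tanimoto threshold of 0.7. It is not the StarDrop Auto-Modeller's procedure, which is not described here: the clustering algorithm, the representation of fingerprints as sets of on-bits, the rule for assigning clusters to subsets, and all function names are assumptions made for illustration.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def leader_clusters(fingerprints, threshold=0.7):
    """Put each compound in the first cluster whose leader is at least
    `threshold` similar to it; otherwise start a new cluster."""
    leaders = []   # fingerprint of the first member of each cluster
    clusters = []  # lists of compound indices
    for index, fingerprint in enumerate(fingerprints):
        for cluster_id, leader in enumerate(leaders):
            if tanimoto(fingerprint, leader) >= threshold:
                clusters[cluster_id].append(index)
                break
        else:
            leaders.append(fingerprint)
            clusters.append([index])
    return clusters


def split_by_cluster(clusters, fractions=(0.70, 0.15, 0.15)):
    """Assign whole clusters, largest first, to the subset that is
    currently furthest below its target size (an assumed rule)."""
    total = sum(len(cluster) for cluster in clusters)
    targets = [fraction * total for fraction in fractions]
    subsets = [[] for _ in fractions]
    for cluster in sorted(clusters, key=len, reverse=True):
        deficits = [t - len(s) for t, s in zip(targets, subsets)]
        subsets[deficits.index(max(deficits))].extend(cluster)
    return subsets


# Toy fingerprints, purely hypothetical.
fingerprints = [
    {1, 2, 3, 4}, {1, 2, 3, 4, 5},   # similarity 0.80, same cluster
    {10, 11, 12}, {10, 11, 12, 13},  # similarity 0.75, same cluster
    {20, 21}, {30, 31},              # singletons
]
clusters = leader_clusters(fingerprints, threshold=0.7)
train, validation, test = split_by_cluster(clusters)
print(clusters, train, validation, test)
```

The default fractions approximate the 226/48/48 division of the 322 compounds reported above.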

Observed vs Predicted logS