The computational prediction of drug-disease interactions using the dual-network L2,1-CMF method


On average, it takes over a dozen years and approximately 1.8 billion dollars to develop a drug [1]. In addition, most drugs have strong side effects or other undesirable effects on patients, so they cannot be placed on the market. Therefore, many pharmaceutical companies resort to repositioning drugs that are already on the market [2], because many known drugs turn out to be effective against other diseases. Drug repurposing has two advantages. First, known drugs have already been approved by the US FDA (Food and Drug Administration) [3], so their safety has already been assessed. Second, the side effects of these drugs are known to medical scientists, so they can be better controlled to achieve the desired therapeutic effect. Drug repurposing can thus accelerate and facilitate the research and development process in the drug discovery pipeline [4].

Online biological databases are the most important resource for drug repositioning. Many public databases, such as KEGG [5], STITCH [6], OMIM [7], DrugBank [8] and ChEMBL [9], store large amounts of information related to drugs and diseases. These databases contain detailed information such as a drug’s chemical structure, side effects, and genomic sequences [10].

In general, the goal of drug repositioning is to discover novel drug-disease interactions (DDIs) using existing drugs. Because a drug is often not specific to one disease, most drugs can treat a variety of diseases. As research has deepened, more methods have been proposed for drug repositioning, including machine learning [11], text mining [12], network analysis [13] and many other effective methods [14, 15]. Opposition-based learning particle swarm optimization can also be used to predict interactions, such as SNP-SNP interactions [16]. For instance, Gottlieb et al. proposed a computational method to discover potential drug indications by constructing classification features from drug-drug and disease-disease similarities [17]. The predicted score of each novel DDI is then calculated by a logistic regression classifier. Napolitano et al. calculated drug similarities using combined drug datasets [18] and proposed a multi-class SVM (Support Vector Machine) classifier to predict novel DDIs.

Moreover, some researchers use network-based models for drug repositioning. The advantage of network models is that they can exploit large-scale high-throughput data to build complex networks of biological interactions. Wang et al. proposed a method called TL-HGBI to infer novel treatments for diseases [19]. These authors constructed a heterogeneous network that integrates datasets about drugs, diseases and drug targets. Another network-based prioritization method called DrugNet was proposed by Martinez et al. [20]. This method can predict not only novel drugs but also novel treatments for diseases. Similar to TL-HGBI, DrugNet uses a heterogeneous network to predict novel DDIs from information about drugs, diseases, and targets. Luo et al. developed a computational method to predict novel interactions of known drugs [21]. This method, MBiRW, applies comprehensive similarity measures and a Bi-Random Walk algorithm. Luo et al. later proposed a drug repositioning recommendation system (DRRS) that predicts new DDIs by integrating data sources for drugs and diseases [14]. In DRRS, a heterogeneous drug-disease interaction network is constructed by integrating drug-drug, disease-disease and drug-disease networks. This heterogeneous network can be represented as a large drug-disease adjacency matrix containing drug pairs, disease pairs, known drug-disease pairs, and unknown drug-disease pairs. The fast SVT (Singular Value Thresholding) algorithm [22] is then used to complete the adjacency matrix, which yields predicted scores for the unknown drug-disease pairs. According to previous studies, each method has its own advantages for predicting DDIs. However, when the predictions of these methods are compared, DRRS currently performs best, achieving the highest AUC (area under the curve) value [14].

Recently, matrix factorization methods have also been used to identify novel DDIs [23]. Matrix factorization takes one input matrix and seeks two other matrices whose product approximates the input matrix [23]. Because predicting DDIs amounts to finding missing interactions in the input matrix, matrix factorization is a good technique for solving the prediction problem. Examples of such methods are the kernel Bayesian matrix factorization method (KBMF2K) [24] and the collaborative matrix factorization method (CMF) [25].
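
As a rough illustration of the factorization idea described above, the following Python sketch approximates a small binary drug-disease matrix by the product of two low-rank factor matrices, fitted with regularized stochastic gradient descent. The toy matrix, the function names `factorize` and `reconstruct`, and all hyperparameters are assumptions chosen for illustration. This is plain matrix factorization, not the KBMF2K, CMF, or Dual-Network L2,1-CMF algorithms.

```python
import random


def factorize(Y, rank=2, steps=2000, lr=0.05, reg=0.01, seed=0):
    """Approximate Y (list of rows) by A * B^T with A: n x rank, B: m x rank.

    Hypothetical helper: plain regularized SGD, for illustration only.
    """
    rng = random.Random(seed)
    n, m = len(Y), len(Y[0])
    # Small random starting values for both factor matrices.
    A = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n)]
    B = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(m)]
    for _ in range(steps):
        for i in range(n):
            for j in range(m):
                pred = sum(A[i][k] * B[j][k] for k in range(rank))
                err = Y[i][j] - pred
                for k in range(rank):
                    a, b = A[i][k], B[j][k]
                    # Gradient step on squared error plus an L2 penalty.
                    A[i][k] += lr * (err * b - reg * a)
                    B[j][k] += lr * (err * a - reg * b)
    return A, B


def reconstruct(A, B):
    """Multiply the factors back together to get the score matrix A * B^T."""
    return [[sum(a * b for a, b in zip(row_a, row_b)) for row_b in B]
            for row_a in A]


if __name__ == "__main__":
    # Toy matrix (assumed data): rows are drugs, columns are diseases,
    # 1 marks a known interaction and 0 an unknown one.
    Y = [[1, 0, 1],
         [1, 0, 1],
         [0, 1, 0]]
    A, B = factorize(Y)
    for row in reconstruct(A, B):
        print([round(v, 2) for v in row])
```

In a prediction setting, the entries of the reconstructed matrix at positions where the input held 0 would be read as scores for candidate interactions. With a rank lower than that of the input, the reconstruction no longer copies the input exactly, which is what allows unknown pairs to receive nonzero scores.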

In this work, a simple yet effective matrix factorization model called Dual-Network L2,1-CMF (dual-network L2,1-collaborative matrix factorization) is proposed to predict new DDIs based on existing DDIs. However, many interactions are unknown and therefore missing from the original matrix, so a pre-processing step called weighted K nearest known neighbours (WKNKN) [26] is used to address this problem. Specifically, the original matrix only records whether there is an interaction between each drug-disease pair, and WKNKN brings each element closer to a reliable value rather than simply 0 or 1. Thus, WKNKN has a positive impact on the final prediction. Furthermore, unlike previous matrix factorization methods, the L2,1-norm [2] and GIP (Gaussian interaction profile) kernels are added to the CMF method. The L2,1-norm can avoid over-fitting and eliminate some unattached disease pairs [27]. The GIP kernels are used to calculate the drug similarity matrix and the disease similarity matrix [28]. Cross validation is used to evaluate our experimental results. The final experimental results show that, after some of the interactions are removed, our proposed method is superior to other methods. In addition, a simulation experiment is conducted to predict new interactions.
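
The two ingredients named above can be sketched in a few lines of Python. The sketch assumes the row-wise convention for the L2,1-norm (the sum of the Euclidean norms of the rows) and the commonly used form of the GIP kernel, K(i, j) = exp(-gamma * ||y_i - y_j||^2), with the bandwidth gamma normalized by the mean squared norm of the interaction profiles. The function names, the `gamma_prime` parameter, and the toy matrix are illustrative assumptions, and the exact definitions used by the proposed method may differ.

```python
import math


def l21_norm(X):
    """L2,1-norm under the row-wise convention: sum of Euclidean row norms."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in X)


def gip_kernel(Y, gamma_prime=1.0):
    """Gaussian interaction profile similarity between the rows of Y.

    Each row of Y is the interaction profile of one drug (pass the
    transpose of Y to compare diseases). gamma_prime is an assumed
    bandwidth parameter, here divided by the mean squared profile norm.
    """
    n = len(Y)
    mean_sq_norm = sum(sum(v * v for v in row) for row in Y) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are zero")
    gamma = gamma_prime / mean_sq_norm
    return [[math.exp(-gamma * sum((a - b) ** 2
                                   for a, b in zip(Y[i], Y[j])))
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Toy drug-disease matrix (assumed data): rows are drugs.
    Y = [[1, 0, 1],
         [1, 0, 1],
         [0, 1, 0]]
    print("L2,1-norm:", l21_norm(Y))
    for row in gip_kernel(Y):
        print([round(v, 3) for v in row])
```

Drugs with identical interaction profiles receive a similarity of exactly 1, and the similarity decays towards 0 as the profiles diverge. Because the L2,1-norm sums unsquared row norms, penalizing it tends to drive whole rows to zero rather than shrinking all entries uniformly.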

The results are described in Section 2, including the datasets used in our study and experimental results. The corresponding discussions are presented in Section 3. The conclusion is described in Section 4. Finally, Section 5 describes our proposed method, including specific solution steps and iterative processes.
