Drug R+: A comprehensive database for drug repurposing applications




Please cite us as
Masoudi-Sobhanzadeh, Y., Omidi, Y., Amanlou, M., & Masoudi-Nejad, A. (2019). DrugR+: A comprehensive relational database for drug repurposing, combination therapy, and replacement therapy. Computers in Biology and Medicine.

Drug R+
Drug R+ is the first database to provide drug repurposing capabilities based on drug-target interactions, adverse drug reactions, and the mechanisms of action of drugs on their targets. Expert users can write complex queries, such as nested queries, to retrieve the results they need, while non-expert users can build their queries through a simple interface. The results can then be exported to an Excel file.

The first page of Drug R+, shown in Fig.1, consists of four parts: Search, CDR, Datasets, and DB structure.


Fig.1: The first page of Drug R+
Each of these sections is described below.



SEARCH
Two search strategies are available in Drug R+. In the first strategy, non-expert users select tables and add constraints that limit the search space. When a user selects a table and adds a constraint by clicking the “add constraint” button, the corresponding query is built automatically. The first condition begins with “where”, and each subsequent condition begins with “and”. To obtain the results, click the “show results” button. If no results are displayed, click the “clear” button and try again. Clicking the “export to excel” button sends the results to an Excel file. In the second strategy, expert users write their own queries, such as nested queries (see the Examples section). The parts of the search section are depicted in Fig.2. Only queries that retrieve information are permitted; delete, update, insert, and any other operations that change the database are not allowed.


Fig.2: The search section of Drug R+
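The rule that the first constraint starts with “where” and later ones with “and”, together with the read-only restriction, can be sketched in Python as follows. This is only an illustration: the function names build_query and is_read_only and the keyword list are assumptions, and the guide does not show how Drug R+ itself builds or validates queries.

```python
import re

# Hypothetical keyword list; the checks Drug R+ really applies are not documented here.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|merge|exec)\b",
    re.IGNORECASE,
)

def build_query(table, constraints):
    """Prefix the first constraint with WHERE and every later one with AND."""
    query = f"select * from {table}"
    for index, constraint in enumerate(constraints):
        keyword = "where" if index == 0 else "and"
        query += f" {keyword} {constraint}"
    return query

def is_read_only(query):
    """Naive check: the statement starts with SELECT and has no modifying keyword."""
    stripped = query.strip()
    return stripped.lower().startswith("select") and not FORBIDDEN.search(stripped)

query = build_query("drugs", ["drug_type='small molecule'", "drug_name like 'a%'"])
print(query)
print(is_read_only(query))                # True
print(is_read_only("delete from drugs"))  # False
```

The keyword check is deliberately conservative: it also rejects a harmless query whose string literal happens to contain a word such as “update”.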



CDR
In addition to the search section, the CDR section is another main part of Drug R+. It suggests a list of drugs that have a meaningful relation to a drug of interest. The user first selects the drug by clicking on the corresponding field, then selects the target type and clicks the “result” button. In the target box, main targets, enzymes, transporters, and carriers can be selected. The user can also narrow the search space with the following options:

i) Drug type: users can limit the search space to FDA-approved drugs or to drugs that are not FDA-approved.
ii) Known action: “yes”, “no”, and “unknown” denote, respectively, targets that are directly related to the drug’s clinical targets, targets that are off-targets, and targets whose mechanisms of action have not been reviewed.

After obtaining the list, users should analyze it to identify any results of interest. Fig.3 shows the CDR section of Drug R+. The results in Fig.3 are for armodafinil, which is used to treat excessive daytime sleepiness.


Fig.3: CDR of Drug R+
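The guide does not define how CDR decides that two drugs are related. As a rough sketch, one plausible reading is that drugs are related when they share targets of the selected kind. The code below ranks candidates that way and applies the two filters described above. The function name related_drugs, the record layout, and all data values are invented for illustration.

```python
from collections import Counter

# Toy records: (drug, target, target_kind, known_action, fda_approved). All invented.
INTERACTIONS = [
    ("drug_A", "T1", "target", "yes", True),
    ("drug_A", "T2", "enzyme", "no", True),
    ("drug_B", "T1", "target", "yes", True),
    ("drug_B", "T2", "enzyme", "unknown", True),
    ("drug_C", "T1", "target", "yes", False),
    ("drug_D", "T3", "carrier", "yes", True),
]

def related_drugs(query_drug, target_kind, approved=None, known_action=None):
    """Rank other drugs by how many targets of target_kind they share with query_drug."""
    # Keep interactions of the requested kind and, if given, the requested known-action value.
    rows = [r for r in INTERACTIONS
            if r[2] == target_kind and known_action in (None, r[3])]
    query_targets = {target for drug, target, *_ in rows if drug == query_drug}
    shared = Counter(
        drug for drug, target, _, _, is_approved in rows
        if drug != query_drug
        and target in query_targets
        and approved in (None, is_approved)  # the drug-type filter applies to candidates only
    )
    return shared.most_common()

print(related_drugs("drug_A", "target"))
print(related_drugs("drug_A", "target", approved=True))
```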



Datasets
Drug R+ includes four datasets that can be used to build models for predicting drug-target interactions. In the supplementary file, three machine learning approaches have been applied to them and their results reported. The datasets are:

i) Enzymes: macromolecular catalysts that accelerate chemical reactions.
ii) Ion channel proteins: pore-forming proteins that allow ions to pass through the channel pore.
iii) GPCR: G protein-coupled receptors are a large class of proteins that activate signal transduction pathways.
iv) Nuclear receptors: proteins that sense certain types of molecules, such as steroid and thyroid hormones, and in response regulate the expression of specific genes.

All of these proteins play major roles in the cell and are important in drug design. Fig.4 shows the dataset section of Drug R+.


Fig.4: The datasets of Drug R+



DB structure
Drug R+ was developed in several steps. First, the flat file of drugs was taken from the DrugBank database and divided into small files using the Python programming language. Based on our analysis of these files, we performed semantic modeling and then implemented the database, which is normalized to third normal form (3NF). The database script is available in the DB structure section of Drug R+. Next, we used Python with parallel processing and the MapReduce method to transfer the data from the flat file into the database. The Python code is also available in this part of the web interface, together with the entity relationship diagram (ERD) of the database. Fig.5 shows the DB structure section. Users can obtain a copy of the database by following these steps:

i) Download the flat file from DrugBank (https://www.drugbank.ca/releases/latest).
ii) Divide the flat file into small files using the Python code.
iii) Create the database using the DDL script.
iv) Run the Python code to transfer the data into the relevant tables and fields.
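Steps ii) and iv) can be pictured with the following Python sketch. It is not the code distributed with Drug R+. The one-record-per-line, tab-separated input is a made-up stand-in for the real flat file, the function names are hypothetical, and a thread pool stands in for whatever parallel mechanism the authors used. The final step of inserting the merged rows into the database is left out.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from itertools import islice
from pathlib import Path

def split_file(source, out_dir, lines_per_chunk):
    """Stream the source file and write consecutive groups of lines to chunk files."""
    chunk_paths = []
    with open(source, encoding="utf-8") as handle:
        while True:
            lines = list(islice(handle, lines_per_chunk))
            if not lines:
                break
            path = Path(out_dir) / f"chunk_{len(chunk_paths):04d}.txt"
            path.write_text("".join(lines), encoding="utf-8")
            chunk_paths.append(path)
    return chunk_paths

def parse_chunk(path):
    """Map step: turn one chunk into rows grouped by destination table."""
    rows = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        table, *fields = line.split("\t")
        rows.setdefault(table, []).append(tuple(fields))
    return rows

def merge(left, right):
    """Reduce step: combine two per-table row dictionaries without mutating either."""
    merged = {table: list(values) for table, values in left.items()}
    for table, values in right.items():
        merged.setdefault(table, []).extend(values)
    return merged

with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "flat.txt"
    # Invented toy content: table name, then field values, separated by tabs.
    source.write_text(
        "drugs\tD1\tname one\nstructures\tD1\tC2H6\ndrugs\tD2\tname two\n",
        encoding="utf-8",
    )
    chunks = split_file(source, tmp, lines_per_chunk=2)
    with ThreadPoolExecutor(max_workers=2) as pool:
        partials = list(pool.map(parse_chunk, chunks))
    rows_by_table = reduce(merge, partials, {})

print(len(chunks))
print(rows_by_table)
```

For CPU-bound parsing, a ProcessPoolExecutor could replace the thread pool, since both expose the same map interface.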




Fig.5: Structure files of Drug R+



Examples
This section presents several SQL queries and explains what each one does. The examples serve different purposes, and combining them can produce a variety of results.



The results of these query statements are shown in Figs. 6 through 19.

1- Obtain the structure information of drugs whose original resource is not PubChem:
select * from structures where original_resource not like ('%pubchem%')


Fig.6: An example of like operation



2- Obtain the 3D structure information of drugs that were created in 2018:
select * from structures where left(drug_id,7) in (select id from drugs where datepart(year,create_date)=2018)


Fig.7: A nested query example
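The nested query above can be tried offline with Python's built-in sqlite3 module, as in the sketch below. The two-table schema, the rows, and the “_a” suffix on structures.drug_id are assumptions made for the demonstration. The only point taken from the query itself is that the first seven characters of structures.drug_id match drugs.id. Because SQLite does not provide the left and datepart functions used in the examples, substr and strftime are used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table drugs (id text primary key, drug_name text, create_date text);
    create table structures (drug_id text, formula text);
    insert into drugs values ('DB00001', 'toy drug one', '2018-03-01'),
                             ('DB00002', 'toy drug two', '2017-06-15');
    insert into structures values ('DB00001_a', 'C2H6'), ('DB00002_a', 'CH4');
""")

# left(drug_id,7)            -> substr(drug_id, 1, 7)
# datepart(year,create_date) -> strftime('%Y', create_date), which returns text
rows = conn.execute("""
    select * from structures
    where substr(drug_id, 1, 7) in
          (select id from drugs where strftime('%Y', create_date) = '2018')
""").fetchall()
print(rows)  # [('DB00001_a', 'C2H6')]
```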



3- A list of drug IDs with their SMILES strings and InChI keys:
select drug_id,smile,inchi_key from structures


Fig.8: A list of drugs’ id and their 3D information



4- A list of drug IDs and their chemical formulas, restricted to formulas that contain “c6”:
select drug_id,formula from structures where formula like '%c6%'


Fig.9: drugs and their chemical formula



5- Pairs of drugs which have identical chemical formulas:
select a.drug_id,b.drug_id,a.formula from structures as a join structures as b on a.formula=b.formula and a.drug_id<>b.drug_id


Fig.10: Drugs having an identical chemical formula
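A self-join with “<>” returns every matching pair twice, once in each order. The sketch below, which uses Python's sqlite3 module with invented drug IDs, shows the difference when “<” is used instead so that each pair appears once. This is offered as an optional variation, not as a correction of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table structures (drug_id text, formula text);
    insert into structures values ('X1', 'C2H6O'), ('X2', 'C2H6O'), ('X3', 'CH4');
""")

# {op} is filled with the comparison that excludes a row pairing with itself.
join_sql = """
    select a.drug_id, b.drug_id, a.formula
    from structures as a join structures as b
      on a.formula = b.formula and a.drug_id {op} b.drug_id
    order by a.drug_id, b.drug_id
"""
both_orders = conn.execute(join_sql.format(op="<>")).fetchall()
unique_pairs = conn.execute(join_sql.format(op="<")).fetchall()
print(both_orders)   # [('X1', 'X2', 'C2H6O'), ('X2', 'X1', 'C2H6O')]
print(unique_pairs)  # [('X1', 'X2', 'C2H6O')]
```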



6- The chemical formulas and their frequencies, sorted in descending order:
select formula,COUNT(*) as 'the total number' from structures group by formula order by 'the total number' desc


Fig.11: Formulas and their frequencies (descending)



7- The chemical formulas and their frequencies, sorted in ascending order:
select formula,COUNT(*) as 'the total number' from structures group by formula order by 'the total number' asc


Fig.12: Formulas and their frequencies (ascending)



8- The chemical formulas whose frequency is greater than 5, sorted in ascending order:
select formula,COUNT(*) as 'the total number' from structures group by formula having count(*)>5 order by 'the total number' asc


Fig.13: Formulas whose frequency is greater than 5 (ascending)
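The grouping queras above follow one pattern: group by formula, optionally filter the groups with having, then sort by the count. A minimal sketch with Python's sqlite3 module and invented rows is given below. The threshold is lowered from 5 to 1 so that the toy table can stay small, and the alias is written without spaces so that it needs no quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table structures (drug_id text, formula text)")

# Invented data: three drugs with C2H6O, two with CH4, one with H2O.
formulas = ["C2H6O"] * 3 + ["CH4"] * 2 + ["H2O"]
conn.executemany(
    "insert into structures values (?, ?)",
    [(f"X{i}", formula) for i, formula in enumerate(formulas)],
)

rows = conn.execute("""
    select formula, count(*) as total
    from structures
    group by formula
    having count(*) > 1
    order by total asc
""").fetchall()
print(rows)  # [('CH4', 2), ('C2H6O', 3)]
```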



9- A list of drug names and their molecular weights:
select a.drug_name,b.molecular_weight from drugs as a join structures as b on a.id=left(b.drug_id,7)


Fig.14: list of drugs and their molecular weights



10- A list of drugs whose molecular weight is greater than 50 daltons, with their molecular weights:
select a.drug_name,b.molecular_weight from drugs as a join structures as b on a.id=left(b.drug_id,7) and b.molecular_weight>'50'


Fig.15: list of drugs and their molecular weights which are greater than 50 daltons



11- The total number of small molecules:
select COUNT(*) from drugs where drug_type='small molecule'


Fig.16: The total number of small molecules



12- The total number of drugs which are not small molecules:
select COUNT(*) from drugs where drug_type<>'small molecule'


Fig.17: The total number of drugs which are not small molecules



13- A list of the names of drugs whose 3D information is not available:
select drug_name from drugs where id not in (select LEFT(drug_id,7) from structures)


Fig.18: The names of drugs whose 3D information is not available



14- The total number of drugs which do not have 3D structures:
select COUNT(*) from drugs where id not in (select LEFT(drug_id,7) from structures)


Fig.19: The total number of drugs whose 3D information is not available
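One general SQL caveat applies to the “not in” pattern used in examples 13 and 14: if the subquery returns a NULL, “not in” matches no rows at all. Whether structures.drug_id can be NULL in Drug R+ is not stated in this guide, so the sketch below only demonstrates the behaviour on invented data with Python's sqlite3 module, alongside an equivalent “not exists” form that is not affected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table drugs (id text, drug_name text);
    create table structures (drug_id text);
    insert into drugs values ('DB00001', 'toy one'), ('DB00002', 'toy two');
    insert into structures values ('DB00001_a'), (NULL);
""")

# The NULL produced by the subquery makes every NOT IN comparison unknown or false.
not_in = conn.execute("""
    select drug_name from drugs
    where id not in (select substr(drug_id, 1, 7) from structures)
""").fetchall()

# NOT EXISTS checks each drug against matching rows only, so the NULL is ignored.
not_exists = conn.execute("""
    select drug_name from drugs as d
    where not exists (select 1 from structures as s
                      where substr(s.drug_id, 1, 7) = d.id)
""").fetchall()

print(not_in)      # []
print(not_exists)  # [('toy two',)]
```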



Acronyms
We have tried to use words that are clear and understandable. The acronyms used in this guide are listed in Table 1.




Laboratory of systems biology and bioinformatics (L.B.B). update: 2019:11