Report updated as of 03/28/2024

General aims

Find molecules in a known drug database that are similar to the molecules of interest, and analyze these connections to identify potential protein targets for drug discovery.


This report outlines the process and results of identifying molecular similarities between a predefined set of molecules, represented in SMILES notation, and known drugs in the CHEMBL database. The comparative analysis identifies known drugs whose properties resemble those of the target molecules and which can therefore point to potential therapeutic candidates.


Materials and Methods

Dataset_JF: initial dataset of 228 molecules from the file Compounds for library development.xlsx

The Excel file consists of five sheets:

  • Quassinoids – 60 SMILES
  • Steroidal Alkaloids – 20 SMILES
  • Cycloeudesmane sesquiterpenoids – 44 SMILES
  • Isoquinoline alkaloids – 73 SMILES
  • Naphthoquinones – 31 SMILES

Reference drug set: in this initial phase, we used an internal database of 10,295 known drug molecules sourced from CHEMBL. This database could be expanded to as many as 2 million molecules.


Similarity selection: we used the MoDiCa (Molecular Distance Calculation) tool to calculate molecular properties for Dataset_JF and for the reference drug set. For each molecule, these calculations include generating a molecular fingerprint based on the types of atoms present, assigning AM1BCC atom types, calculating logP (the logarithm of the partition coefficient of a compound between an organic phase and an aqueous phase), counting the hydrogen-bond acceptors and donors, and determining an approximate molecular charge from the types of atoms present.
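
To illustrate what a fingerprint "based on the type of atoms present" can look like, the following is a minimal sketch that counts heavy atoms per element directly from a SMILES string. It is not MoDiCa's implementation: the element list ELEMENTS, the function name atom_count_fingerprint, and the regular-expression tokenizer are all assumptions made for illustration. The sketch does not cover AM1BCC typing, logP, or hydrogen-bond counts, which in practice require a cheminformatics toolkit.

```python
import re
from collections import Counter
from typing import List

# Hypothetical element order for the fingerprint vector (an assumption,
# not the order or atom typing used by MoDiCa).
ELEMENTS = ["C", "N", "O", "S", "P", "F", "Cl", "Br", "I"]

# Naive SMILES atom tokenizer:
#  - bracket atoms such as [nH], [Na+], [13C]: capture only the element symbol
#  - organic-subset atoms outside brackets; lowercase letters are aromatic atoms
_ATOM = re.compile(
    r"\[\d*([A-Z][a-z]?|[a-z]{1,2})[^\]]*\]"
    r"|Cl|Br|[BCNOSPFI]|[cnosp]"
)

def atom_count_fingerprint(smiles: str) -> List[int]:
    """Return heavy-atom counts per element, in the order given by ELEMENTS."""
    counts = Counter()
    for match in _ATOM.finditer(smiles):
        # group(1) is set for bracket atoms; otherwise use the whole match.
        symbol = match.group(1) or match.group(0)
        # Fold aromatic (lowercase) symbols into their element: "c" -> "C".
        counts[symbol.capitalize()] += 1
    # Elements outside ELEMENTS (e.g. Na, B, explicit H) are ignored.
    return [counts[element] for element in ELEMENTS]

if __name__ == "__main__":
    print(atom_count_fingerprint("CC(=O)Oc1ccccc1C(=O)O"))  # aspirin
```

A count vector of this kind can be extended with further numeric properties (for example logP or donor and acceptor counts) to form the property vector that is compared between molecules.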

Once the fingerprints were selected, we calculated the distance between each molecule in Dataset_JF and each molecule in the drug database as the Euclidean distance between the corresponding fingerprint vectors. This metric accounts for differences in all selected properties at once, so a small distance indicates that two molecules are similar across the whole property set.
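
The distance step can be sketched as follows. This is a minimal illustration, not the MoDiCa code: the function names euclidean_distance and nearest_drugs, the parameter k, and the example vectors and drug names are hypothetical. It assumes every molecule has already been converted to a numeric vector of the same length.

```python
import math
from typing import Dict, List, Sequence, Tuple

def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two property vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_drugs(
    query: Sequence[float],
    drugs: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the k drugs closest to the query vector, nearest first."""
    ranked = sorted(
        ((name, euclidean_distance(query, vector)) for name, vector in drugs.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]

if __name__ == "__main__":
    # Made-up vectors for illustration only; they are not results from this study.
    reference = {
        "drug_A": [9.0, 0.0, 4.0, 1.2],
        "drug_B": [20.0, 2.0, 5.0, 3.5],
        "drug_C": [10.0, 1.0, 4.0, 1.0],
    }
    query_molecule = [10.0, 0.0, 4.0, 1.1]
    for name, distance in nearest_drugs(query_molecule, reference, k=2):
        print(name, round(distance, 3))
```

Because the Euclidean distance sums squared differences, properties with a large numeric range dominate the result unless the vectors are rescaled first. Whether MoDiCa applies such a rescaling is not stated here.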








Target identification on the basis of similar molecules


Search for new targets with the matisse tool