Overview of PIDGINv4

Introduction

PIDGINv4 predicts protein targets using Random Forests (RFs) trained on bioactivity data from PubChem (extracted March 2021) and ChEMBL (version 28). It is built with RDKit and Scikit-learn, and employs a modification of the reliability-density neighbourhood Applicability Domain (AD) analysis by Aniceto et al. [1]. This project is the successor to PIDGIN version 1 [2] and PIDGIN version 2 [3], and is the updated and retrained version of PIDGIN version 3. Target prediction with extended NCBI pathway and DisGeNET disease enrichment calculation is available, as implemented in [4].

Molecular descriptors: 2048-bit RDKit Extended Connectivity FingerPrints (ECFP) [5]
Algorithm: Random Forests with a dynamic number of trees (see docs for details), class weight = ‘balanced’, sample weight = ratio Inactive:Active
Models generated at four different cut-offs: 100μM, 10μM, 1μM and 0.1μM
Models generated both with and without mapping to orthologues
Pathway information from NCBI BioSystems
Disease information from DisGeNET
Target/pathway/disease enrichment calculated using Fisher’s exact test and the Chi-squared test
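
Three items in the list above (the activity cut-offs, the Inactive:Active ratio, and Fisher’s exact test) can be illustrated with a minimal sketch in plain Python. This is not PIDGINv4's own code. The function names are hypothetical, treating a potency at or below the cut-off as active is an assumption, and the one-sided form of the test is chosen only to keep the example short.

```python
from math import comb

# The four activity cut-offs listed above, in micromolar.
CUTOFFS_UM = (100.0, 10.0, 1.0, 0.1)


def label_at_cutoffs(potency_um):
    """Label one measurement at every cut-off: 1 = active, 0 = inactive.

    Assumption of this sketch: a potency at or below the cut-off counts
    as active.
    """
    return {cutoff: int(potency_um <= cutoff) for cutoff in CUTOFFS_UM}


def inactive_to_active_ratio(labels):
    """Ratio Inactive:Active for a list of 0/1 labels (needs >= 1 active)."""
    actives = sum(labels)
    inactives = len(labels) - actives
    return inactives / actives


def fisher_exact_greater(query_hits, query_misses,
                         background_hits, background_misses):
    """One-sided Fisher's exact test p-value for over-representation.

    Sums hypergeometric probabilities of seeing at least `query_hits`
    hits in the query set, with all table margins held fixed.
    """
    row1 = query_hits + query_misses          # size of the query set
    col1 = query_hits + background_hits       # total number of hits
    total = row1 + background_hits + background_misses
    upper = min(row1, col1)
    numerator = sum(
        comb(col1, k) * comb(total - col1, row1 - k)
        for k in range(query_hits, upper + 1)
    )
    return numerator / comb(total, row1)


# A 5 uM compound is active at the 100 uM and 10 uM cut-offs only.
labels_5um = label_at_cutoffs(5.0)

# 3 of 4 query compounds vs 1 of 4 background compounds hit a target.
p_value = fisher_exact_greater(3, 1, 1, 3)  # 17/70, about 0.243
```

In practice a library routine such as `scipy.stats.fisher_exact` would normally be used for the test; the hand-written version here only makes the calculation explicit.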

Details for sizes across all activity cut-offs (needs to be updated for new version):

                                                  Without orthologues        With orthologues
Distinct Models                                   11,782                     16,772
Distinct Targets [exhaustive total]               3,698 [11,782]             17,021 [63,140]
Total Bioactivities over all models               50,210,041                 437,574,005
Actives                                           4,079,996                  4,087,155
Inactives [of which are Sphere Exclusion (SE)]    46,130,045 [35,119,663]    463,237,781 [314,117,438]

Full details on all models are provided in the uniprot_information.txt files in the ortho and no_ortho directories (to be downloaded).

Contributing

Development occurs on GitHub, and the documentation is hosted on Read the Docs. Contributions, feature requests, and bug reports are welcome; please consult the issue tracker.

License

PIDGINv4 is released under the GNU Lesser General Public License version 3.0.

Broadly, this means PIDGINv4 can be used in any manner without modification, provided proper attribution is given. Modifications to the source code must also be released under the same license so that the community may benefit.

Citing PIDGIN

To cite PIDGINv4, please reference either previous versions [2] [3] or use .

References

[1]	Aniceto, N, et al. A novel applicability domain technique for mapping predictive reliability across the chemical space of a QSAR: Reliability-density neighbourhood. J. Cheminform. 8: 69 (2016)

[2]	Mervin, L H., et al. Target prediction utilising negative bioactivity data covering large chemical space. J. Cheminform. 7: 51 (2015)

[3]	Mervin, L H., et al. Orthologue chemical space and its influence on target prediction. Bioinformatics. 34: 72–79 (2018)

[4]	Mervin, L H., et al. Understanding Cytotoxicity and Cytostaticity in a High-Throughput Screening Collection. ACS Chem. Biol. 11: 11 (2016)

[5]	Rogers D & Hahn M. Extended-connectivity fingerprints. J. Chem. Inf. Model. 50: 742-54 (2010)
