Script documentation

BioCompoundML’s primary script is bcml.py. It accepts the following options:

--input: Training File Input

--datain: Saved Data Input

--test_input: Testing File Input

--train: Train the model

--test: Test the model

--model: Output the model to a file

--dataout: Output all data structures

--pred: Prediction feature

--proxy: URL of an HTTP/HTTPS proxy

--cluster: Cluster the training data

--split_value: Threshold for classification of prediction feature

--random: User-defined random seed

--verbose: Verbose output

--experimental: Extract experimental/computed features from PubChem

--fingerprint: Extract CACTVS fingerprints from PubChem

--chemofeatures: Compute chemical descriptors with PaDEL-Descriptor

--user: User features are provided in training and/or test files

--distance: Calculate compound vs. compound distance matrix

--impute: Impute missing data using k-nearest-neighbors (KNN) imputation

--selection: Run Boruta Feature Selection to reduce uninformative features

--cv: Run 50% hold-out cross-validation 100 times
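
As an illustration of how these options might be combined, the sketch below assembles a bcml.py command line in Python. It is not taken from the BioCompoundML sources. The helper build_bcml_command, the file name training.txt, the feature name RON, and the numeric values are all hypothetical. It also assumes that --input, --pred, --split_value and --random each take a value, while --train, --fingerprint, --impute, --selection and --cv are plain switches; the list above does not state this, so check the script's own help output before relying on it.

```python
import shlex


def build_bcml_command(valued=None, switches=()):
    """Assemble an argument list for bcml.py (hypothetical helper).

    valued:   mapping of option name -> value, emitted as "--name value"
    switches: option names emitted as bare "--name" flags
    Which options take values is an assumption, not documented behavior.
    """
    command = ["python", "bcml.py"]
    for name, value in (valued or {}).items():
        command += ["--" + name, str(value)]
    for name in switches:
        command.append("--" + name)
    return command


# Hypothetical file name, feature name, threshold, and seed (illustration only).
command = build_bcml_command(
    valued={
        "input": "training.txt",
        "pred": "RON",
        "split_value": 98,
        "random": 12345,
    },
    switches=["train", "fingerprint", "impute", "selection", "cv"],
)

# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run once the option names and values have been confirmed against the installed version of the script.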