Database module

Submodules

Database.initialize_database

class Database.initialize_database.Createdb(database, inchidb)

Bases: object

Generates a new SQLite database and creates its tables and indices

create_model_tbls()

Creates tables and indices in the SQLite database
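As a rough illustration of what creating tables and indices in SQLite involves, the sketch below uses only the standard-library sqlite3 module. It is not the Createdb implementation: the helper name create_tables, the table and column names, and the index names are all assumptions made for the example.

```python
import sqlite3


def create_tables(connection):
    """Create two illustrative tables and their indices (hypothetical schema)."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS compound (ID TEXT, name TEXT)"
    )
    connection.execute(
        "CREATE TABLE IF NOT EXISTS reaction_compound ("
        "reaction_ID TEXT, cpd_ID TEXT, is_prod INTEGER, stoichiometry REAL)"
    )
    # Indices speed up lookups by compound ID and by reaction/compound pair.
    connection.execute(
        "CREATE INDEX IF NOT EXISTS compound_idx ON compound (ID)"
    )
    connection.execute(
        "CREATE INDEX IF NOT EXISTS reaction_compound_idx "
        "ON reaction_compound (reaction_ID, cpd_ID)"
    )
    connection.commit()


# An in-memory database keeps the example free of side effects.
conn = sqlite3.connect(":memory:")
create_tables(conn)

table_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
```

Using IF NOT EXISTS makes the helper safe to call more than once on the same database.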

Database.query

class Database.query.Connector(database)

Bases: object

Connects to a generated database

connect_to_database()
custom_query(query)

Takes a custom query and returns the results
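The following sketch shows one way a method like custom_query could behave: run the SQL string it is given and return every row. SketchConnector is a hypothetical stand-in written for this example, not the Connector class itself, and the model table with its columns is assumed.

```python
import sqlite3


class SketchConnector:
    """Hypothetical stand-in for illustration; not the actual Connector class."""

    def __init__(self, database):
        # database is a path to an SQLite file, or ":memory:" for a scratch DB.
        self.conn = sqlite3.connect(database)

    def custom_query(self, query):
        """Run an arbitrary SQL query and return all result rows as tuples."""
        cursor = self.conn.execute(query)
        return cursor.fetchall()


connector = SketchConnector(":memory:")
# Assumed table layout, used only to give the query something to return.
connector.conn.execute("CREATE TABLE model (ID TEXT, file_name TEXT)")
connector.conn.executemany(
    "INSERT INTO model VALUES (?, ?)",
    [("m1", "a.xml"), ("m2", "b.xml")],
)
rows = connector.custom_query("SELECT ID FROM model ORDER BY ID")
```

Each row comes back as a tuple, so a single-column query yields one-element tuples.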

get_all_compound_keggIDs()

Retrieves all compound KEGG IDs

get_all_compounds()

Retrieves all compounds in the database

get_all_cpd_chemicalformulas()

Retrieves all chemical formulas

get_all_cpd_with_chemicalformula(cf)

Retrieves all compounds with a given chemical formula

Retrieves compound name for given search term (name/formula)

get_all_fba_models()

Retrieves all model IDs in the database

get_all_keggIDs()

Retrieves all KEGG IDs in the database

get_all_models()

Retrieves all model IDs in the database

get_all_reactions()

Retrieves all reactions in the database

get_catalysts(reaction_ID)

Retrieves the catalyst of reaction

get_compartment(compartment)

Retrieves the compartment ID

get_compound_ID(compound_name, strict=False)

Retrieves compound ID given a compound name

get_compound_compartment(compound_ID)

Retrieves the compartment that the compound is in

get_compound_name(compound_ID)

Retrieves compound name given a compound ID

get_compounds_in_model(organism_ID)

Retrieves all compounds in a metabolic model given a model ID

get_cpd_casnumber(ID)

Retrieves the CAS number for a compound ID

get_cpd_chemicalformula(ID)

Retrieves the chemical formula for a compound ID

get_genes(reaction_ID, organism_ID)

Retrieves gene associations for a reaction of a given metabolic network (model ID)

get_kegg_cpd_ID(ID)

Retrieves the KEGG ID for a compound based on its main ID

get_kegg_reaction_ID(ID)

Retrieves the KEGG ID for a reaction based on its main ID

get_model_ID(file_name)

Retrieves model ID for given file_name

get_models_from_cluster(cluster)

Retrieves model IDs from a specified cluster in the database

get_organism_ID(organism_name)

Retrieves ID of metabolic model given a specific model name

get_organism_name(organism_ID)

Retrieves name of metabolic model given a specific model ID

get_pressure(reaction_ID)

Retrieves the pressure at which the reaction is performed

get_products(reaction_ID)

Retrieves products (compound IDs) of a given reaction

get_products_reactions(compound_ID)

Retrieves reactions that have a given compound (ID) as a product

get_proteins(reaction_ID, organism_ID)

Retrieves protein associations for a reaction of a given metabolic network (model ID)

get_reactants(reaction_ID)

Retrieves reactants (compound IDs) of a given reaction

get_reactants_reactions(compound_ID)

Retrieves reactions that have a given compound (ID) as a reactant

get_reaction_name(reaction_ID)

Retrieves name of the reaction given the reaction ID

get_reaction_species(reaction_ID)

Retrieves the compound IDs that are in a given reaction

get_reaction_type(rxn)

Retrieves the type of a given reaction

get_reactions(compound_ID, is_prod)

Retrieves reaction IDs that have a given compound ID as a reactant or product

get_reactions_based_on_type(rxntype)

Retrieves reactions based on type

get_reactions_in_model(organism_ID)

Retrieves all reactions in a metabolic model given model ID

get_reference(reaction_ID)

Retrieves the reference of reaction

get_solvents(reaction_ID)

Retrieves solvent of reaction

get_stoichiometry(reaction_ID, compound_ID, is_prod)

Retrieves stoichiometry of a compound for a given reaction
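Several methods above take an is_prod flag to choose between the reactant side and the product side of a reaction. The sketch below illustrates that pattern against an in-memory SQLite table. The reaction_compound table, its columns, and the two helper functions are assumptions for the example and may not match the real schema or the Connector methods.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per (reaction, compound) pair, is_prod stored as 0/1.
conn.execute(
    "CREATE TABLE reaction_compound ("
    "reaction_ID TEXT, cpd_ID TEXT, is_prod INTEGER, stoichiometry REAL)"
)
conn.executemany(
    "INSERT INTO reaction_compound VALUES (?, ?, ?, ?)",
    [
        ("rxn1", "cpdA", 0, 2.0),  # cpdA is a reactant of rxn1
        ("rxn1", "cpdB", 1, 1.0),  # cpdB is a product of rxn1
        ("rxn2", "cpdB", 0, 1.0),  # cpdB is a reactant of rxn2
    ],
)


def reactions_for_compound(connection, compound_id, is_prod):
    """Return reaction IDs where the compound is a product (True) or reactant (False)."""
    rows = connection.execute(
        "SELECT reaction_ID FROM reaction_compound "
        "WHERE cpd_ID = ? AND is_prod = ? ORDER BY reaction_ID",
        (compound_id, int(is_prod)),
    )
    return [row[0] for row in rows]


def stoichiometry_for(connection, reaction_id, compound_id, is_prod):
    """Return the stoichiometric coefficient, or None if no matching row exists."""
    row = connection.execute(
        "SELECT stoichiometry FROM reaction_compound "
        "WHERE reaction_ID = ? AND cpd_ID = ? AND is_prod = ?",
        (reaction_id, compound_id, int(is_prod)),
    ).fetchone()
    return None if row is None else row[0]


as_product = reactions_for_compound(conn, "cpdB", True)
as_reactant = reactions_for_compound(conn, "cpdB", False)
coefficient = stoichiometry_for(conn, "rxn1", "cpdA", False)
```

Parameterized queries (the ? placeholders) keep IDs out of the SQL string itself.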

get_temperature(reaction_ID)

Retrieves the temperature at which the reaction is performed

get_time(reaction_ID)

Retrieves the time that is required to perform reaction

get_uniq_metabolic_clusters()

Retrieves unique metabolic clusters (organisms with the exact same metabolism) in the database

get_yield(reaction_ID)

Retrieves yield that was reported with reaction

is_reversible(organism_ID, reaction_ID)

Retrieves reversibility information of a reaction in a specified metabolic model (model ID)

is_reversible_all(reaction_ID)

Retrieves reversibility information of a reaction independent of model

Database.query.fetching_all_query_results(Q, conn, cnx, db, query, count)
Database.query.fetching_one_query_results(Q, conn, cnx, db, query, count)
Database.query.test_db_4_error(conn, cnx, query, db, count)

Database.build_ATLAS_db

Database.build_ATLAS_db.build_atlas(atlas_dir, DBPath, inchidb, processors, rxntype='bio')

Add atlas database to RSA metabolic database

Database.build_ATLAS_db.extract_KEGG_data(url)

Extract Kegg db info

Database.build_ATLAS_db.fill_arrays_4_db(rxn_keggids, rxn_atlas, DBPath, inchidb, processors, rxntype)

fill arrays with ATLAS reactions to add to database

Database.build_ATLAS_db.fill_database(cnx, reactions, reaction_reversible, model_reaction, reaction_protein, reaction_genes, model_compound, compound, reaction_compound, original_db_cpd_new)

fill database with ATLAS information

Database.build_ATLAS_db.fill_dictionary(larray, dictionary, KEGGID=False)

Fill dictionaries with ATLAS reactions

Database.build_ATLAS_db.fill_dictionary_atlasbiochem(larray, dictionary)

Fill dictionaries with ATLAS reactions

Database.build_ATLAS_db.get_inchi_4_cpd(cpd, INCHI)

Get inchi for a compound if inchidb is True

Database.build_ATLAS_db.open_atlas_files(atlas_files)

open ATLAS files

Database.build_ATLAS_db.process_reactions(rxninfo, currentcpds, dbcpds, original_db_cpd_current, inchidb, rxntype, INCHI, output_queue)

Process ATLAS reactions

Database.build_ATLAS_db.process_substrates(rxn, cpd, is_prod, currentcpds, dbcpds, inchidb, original_db_cpd_current, model_compound_temp, reaction_compound_temp, model_reaction_temp, compound_temp, original_db_cpd_temp, INCHI)

Process ATLAS compounds

Database.build_kbase_db

Database.build_kbase_db.BuildKbase(sbml_dir, kbase2keggCPD_translate_file, kbase2keggRXN_translate_file, inchi, DBpath, rxntype='bio')

Inserts values from metabolic network XML files into the SQLite database

Database.build_kbase_db.extract_KEGG_data(url)

Extract Kegg db info

Database.build_kbase_db.get_KEGG_IDs(ID, KEGGdict)

Retrieve KEGG IDs

Database.build_kbase_db.get_compartment_info(ID)

Retrieve compartment information for compound

Database.build_kbase_db.get_inchi_values(file_name, inchi_pubchem, inchi_cf, inchi_cas, total_set, CPD2KEGG, CT, INCHI)

Retrieve InChI values for compounds in metabolic networks

Database.build_kbase_db.get_metabolic_clusters(org_cpds, org_rxns, model_id, metabolic_clusters, cluster_org)

Retrieves metabolic cluster info

Database.build_kbase_db.insert_comprehensive_model_results_2_db(DBpath, modelcompartments, modelcompounds_allinfo, rxn_info, all_rxn_cpds, keggdict, inchi, inchi_pubchem, filenum, rxntype)

Inserts comprehensive compound and reaction information into tables

Database.build_kbase_db.insert_individual_model_results_2_db(DBpath, modelcompounds, modelreactions, genelist, proteinlist, mi, filename)

Inserts individual model information into tables

Database.build_kbase_db.kegg2pubcheminchi(cpd)

Using KEGG ID to get inchi

Database.build_kbase_db.load_file_info_2_db(args)

Open and insert information from metabolic networks (xml files) into database

Database.build_kbase_db.open_translation_file(file_name)

opens and stores KEGG translation files

Database.build_kbase_db.parse_data_sbmlfile(inchi, CPD2KEGG, RXN2KEGG, file_name, inchi_pubchem, inchi_cf, inchi_cas)

Open metabolic network file (xml file) and parse information

Database.build_kbase_db.process_compartments(compartmentsoup)

Get compartments in xml file

Database.build_kbase_db.process_compounds(speciessoup, CPD2KEGG, mi, inchi, inchi_pubchem, inchi_cf, inchi_cas)

Parse compound information from metabolic network file (xml file)

Database.build_kbase_db.process_reactions(reaction_soup, RXN2KEGG, CPD2KEGG, mi, inchi, inchi_pubchem)

Parse reaction information from metabolic network file (xml file)

Database.build_kbase_db.retrieve_exact_inchi_values(m, total_set, inchi_pubchem, inchi_cf, inchi_cas, CPD2KEGG, CT, INCHI)

Retrieve InChI values

Database.build_kbase_db.retrieve_metabolic_clusters(DBpath)

Identifies and inserts metabolic clusters (organisms with the same compounds and reactions) into database
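A metabolic cluster is described here as a set of organisms with the same compounds and reactions. One simple way to find such groups, shown below, is to key each model by the pair of its compound set and reaction set. The function group_into_clusters and the sample model IDs are hypothetical, and the sketch does not cover inserting the clusters into the database.

```python
from collections import defaultdict


def group_into_clusters(model_compounds, model_reactions):
    """Group model IDs whose compound sets and reaction sets are identical."""
    clusters = defaultdict(list)
    for model_id in sorted(model_compounds):
        # frozenset ignores ordering and duplicates, so equal content gives equal keys.
        signature = (
            frozenset(model_compounds[model_id]),
            frozenset(model_reactions.get(model_id, ())),
        )
        clusters[signature].append(model_id)
    return sorted(clusters.values())


model_compounds = {
    "m1": ["cpdA", "cpdB"],
    "m2": ["cpdB", "cpdA"],  # same compounds as m1, different order
    "m3": ["cpdA"],
}
model_reactions = {
    "m1": ["rxn1"],
    "m2": ["rxn1"],
    "m3": ["rxn1"],
}
clusters = group_into_clusters(model_compounds, model_reactions)
```

In this sample, m1 and m2 end up in the same cluster and m3 forms a cluster of its own.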

Database.build_modelseed

class Database.build_modelseed.BuildModelSeed(username, password, rxntype, inchidb, DBpath, output_folder, media='Complete', newdb=True, tokentype='patric', sbml_output=False, processors=4, verbose=False, patricfile='/Users/lwhitmo/software/RetSynth_lt/rs/Database/data/PATRIC_genome_complete_07152018.csv', previously_built_patric_models=False)

Bases: object

get_model_from_patric()

Builds models in PATRIC and converts them to COBRA models, which are then imported into the RetSynth database

load_complete_genomes()

Loads a list of patric genome IDs that are complete genomes

process_cobra_model(model, genome_id)

Adds information from cobra model into appropriate arrays

class Database.build_modelseed.LoadIntoDB(DBpath, verbose, inchidb)

Bases: object

add_all_info_existing(allcompounds, originalIDs, allreactions, reaction_reversibility, model_ids, model_compartments)

Loads unique compound, reaction, compartment and model information into preexisting database

add_all_info_new(allcompounds, originalIDs, allreactions, reaction_reversibility, model_ids, model_compartments)

Loads unique compound, reaction, compartment and model information into new database

add_cluster_info()

Loads cluster information into database

add_model_compounds(model_compounds)

Loads model compound information into database

add_model_reactions(model_reactions, reaction_genes, reaction_protein)

Loads reaction information into database

add_reaction_compound(reaction_compound, newdb)

Loads reaction compound information into database

get_model_sorted_cpds(model)
get_model_sorted_rxns(model)
Database.build_modelseed.build_patric_models(genome_id, genome_name, media, username)

Builds patric models on the patric server

Database.build_modelseed.extract_KEGG_data(url, verbose)

Extract Kegg db info

Database.build_modelseed.generate_sbml_output_folder(output_folder)

Generate output folder for sbml fba models if user has specified this option

Database.build_modelseed.get_KEGG_IDs(ID, compartment, KEGGdict)

Retrieve KEGG IDs

Database.build_modelseed.kegg2pubcheminchi(cpd, verbose)

Convert KEGG ID to InChI value

Database.build_modelseed.open_translation_file(file_name)

opens and stores KEGG translation files

Database.build_modelseed.retrieve_exact_inchi_values(new_cpd_keggid, raw_cpd_keggid, cpd_name, compart_info, inchi_pubchem, inchi_cf, inchi_cas, CT, INCHI, verbose)

Retrieve InChI values

Database.build_modelseed.verbose_print(verbose, line)

verbose print function
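A helper with this signature is commonly a thin wrapper around print. The sketch below assumes it prints the line only when verbose is truthy. That behavior is inferred from the name and arguments, not confirmed by the source.

```python
import contextlib
import io


def verbose_print(verbose, line):
    """Print line only when verbose is truthy (assumed behavior)."""
    if verbose:
        print(line)


# Capture stdout to show which calls actually produce output.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    verbose_print(True, "loading model")
    verbose_print(False, "this line is suppressed")
captured = buffer.getvalue()
```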

Database.build_user_rxns_db

class Database.build_user_rxns_db.AddUserRxns2DB(DBPath, file_name, model_id='UserAdded', rxntype='bio')

Bases: object

add_data_2_db()
check_cpd_in_db(rxn_comps, rxn_stoich, rxn_comps_name, rxn_id, is_prod)
get_fp_cf(cpd)
get_rxn_components(rxn, ids)
get_stoichometry(cpds)
open_user_file()

Database.build_KEGG_db

Database.build_KEGG_db.BuildKEGG(types_orgs, inchidb, processors, currentcpds, num_organisms='all', num_pathways='all')

Build metabolic database from KEGG DB

class Database.build_KEGG_db.CompileKEGGIntoDB(database, type_org, inchidb, processors, num_organisms, num_pathways, rxntype, add)

Bases: object

Add KEGG info to sqlite database

add_to_preexisting_db()

Add KEGG info to already developed database

fill_new_database()

Fill database

Database.build_KEGG_db.add_metabolite(reactionID, cpd, stoichiometry, is_prod, reactioninfo)

add metabolites to dictionary

Database.build_KEGG_db.extract_KEGG_data(url)

Extract Kegg db info

Database.build_KEGG_db.extract_KEGG_orgIDs(types_orgs, num_organisms)

Retrieve organism IDs in KEGG

Database.build_KEGG_db.extract_pathwayIDs(orgID, num_pathways, output_queue)

Retrieve pathway IDs

Database.build_KEGG_db.extract_reactionIDs(pathway, output_queue)

Extract reactions in pathways

Database.build_KEGG_db.process_compound(cpd, reactionID, reactioninfo, is_prod, inchidb, compoundinfo, cpd2inchi, inchi_cf, inchi_cas, currentcpds)

Extract compound info

Database.build_KEGG_db.process_reaction(reactionID, inchidb, compoundinfo, cpd2inchi, inchi_cf, inchi_cas, currentcpds, output_queue)

Extract reaction info

Database.build_metacyc_db

class Database.build_metacyc_db.MetaCyc(DB, inchidb, cnx, verbose)

Bases: object

Opens and parses metacyc xml file

arrays(cpdID, compartment, name, KEGG_ID, chemicalformula, cas)

Adds compound to compound arrays

check_db(cpdID, kegg_id)

Checks the database for the table original_db_cpdIDs. If the table exists and contains the cpd ID, the InChI value for that compound is retrieved

check_reaction_difference(rxnID, rxn_compounds)

Checks whether promiscuous metabolites are the only difference between reactions

compound_translator(compound_ID, biocyc_ID, inchi_ID, KEGG_ID, name, compartment)

Checks whether a metacyc compound is in the database and adds it if it is not

fill_compound_arrays(ID, inchi, cpdID, KEGG_ID, name, compartment)

Determines whether or not to add compound to compound arrays which will later be added to the sqlite database

fill_temp_array(cpdID, is_prod, stoic, temp_all_rxn_compound)

Adds reaction information from metacyc xml file to temporary reaction list which later gets inserted into database

get_compounds_4_rxn(species)

get reaction compounds

get_fp_cf_info(inchi)
get_promiscuous_cpds(file_name)
multiple_copies_of_rxns(rxnID, rxn, name, genes, proteins, kegg, temp_all_rxn_compound)

Deals with rxns that have same catalytic enzyme but different substrates

read_metacyc_file(BIOCYC_translator, file_name)

Reads and parses metacyc SBML file

retrieve_compartment_4_compartment(temp_compartment, rxnID)
retrieve_rxn_info(rxn, rxnID, genes, proteins, kegg_ID, biocycID)

Parses reaction information from metacyc xml file

rxn_translator(reaction_ID, temp_all_rxn_compound, revers, name, genes, proteins, kegg)

Checks whether a metacyc reaction is in the database and adds it if it is not

class Database.build_metacyc_db.Translate(DBPath, file_name, inchidb, rxntype, verbose, add=True)

Bases: object

Translates metacyc compound and reaction IDs to Kbase compound and reaction IDs if a kbase database is used (ONLY WORKS WITH KBASE, ModelSeed/patric)

add_metacyc_to_db()

Adds metacyc information to the database

Database.build_metacyc_db.extract_KEGG_data(url)

Extract Kegg db info

Database.build_metacyc_db.get_inchi_from_kegg_ID(cpd)
Database.build_metacyc_db.verbose_print(verbose, line)

Database.build_SPRESI_db

Database.build_SPRESI_db.RDF_Reader(file_directory, DBpath, rxntype, compartment, processors, temp_option=False, pressure_option=False, yield_option=False, time_option=False, catalyst_option=True, solvent_option=True)

Adds data from RDF files into the database (works specifically with SPRESI-formatted RDF files)

Database.build_SPRESI_db.add_individual_file_info(text_file, cnx, conn, rxntype, compartment, eliminate_duplicates, identification)

Adds information from an individual file to the database

Database.build_SPRESI_db.add_info_2_database(DBpath, rxntype, compartment)

Adds data from text files to the database

Database.build_SPRESI_db.check_refs(table, rxn_id, test_info_ref, new_rxn_info, larray, cnx)

check if reference info is already in database

Database.build_SPRESI_db.generate_mol_file(compounds, filenumber, substrates=False)

Generates a mol file and then reads it in using the Indigo API to get the SMILES string

Database.build_SPRESI_db.get_complex_reference_details(item, reference_parameters, full_citation_string, reference_details_array, reference_details_bool)

retrieves details for references with many parameters

Database.build_SPRESI_db.get_data(match, datatype, item, result_array, type_bool, DATA_TYPE=None)

Retrieves a variety of other information for a reaction (yield, reference, etc.)

Database.build_SPRESI_db.get_mol_structure(match, item, type_bool, result_dict, count_item=0, GET_RXN=True)

retrieves rxn, solvents and catalyst information

Database.build_SPRESI_db.open_file(args)

Opens an RDF file

Database.build_SPRESI_db.parse_file(output_file, RDF_dict, filenumber, file_name, options)

Parses elements of an RDF file and outputs them to a new text file

Database.build_SPRESI_db.process_rxntext(rxntext_array)

processes string containing reaction information

Database.build_MINE_db

class Database.build_MINE_db.BuildMINEdb(dumpdirectory, database, inchidb, rxntype)

Bases: object

Builds the MINE database or adds it to an existing metabolic database

add2dictionary(temp)

Adds compound to file dictionary

extract_cpd_information(compoundid, INFO, tp)

Get compound information

extract_source_information(compoundid, tp)

extract operator information (EC number)

fill_database()

Generate arrays of database information and fill database with information

fill_reaction_components_dict(rxn, compound, typecpd)

Fill reaction info (substrates) into dictionary

generate_reactions()

Get reactions from MINE files

open_mspfile(filename)

Opens and reads msp files

Database.remove_duplicate_cpds

class Database.remove_duplicate_cpds.OverlappingCpdIDs(database)

Bases: object

identify_overlapping_ids()

Finds compounds that have more than one ID (i.e. both an inchi and a cpd ID for the same compound) and fixes them