Database module¶

Submodules¶

Database.initialize_database¶

class Database.initialize_database.Createdb(database, inchidb)¶

Bases: object

Generates a new SQLite database and creates its tables and indices

create_model_tbls()¶: Creates tables and indices in the SQLite database
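
As a rough illustration of what creating tables and indices in SQLite involves, the sketch below uses Python's standard `sqlite3` module. The table, column, and index names (`compound`, `reaction`, `reaction_compound`, `is_prod`, `rc_reaction_idx`, and so on) are assumptions made for this example; they are not taken from the schema that `Createdb` actually builds.

```python
import sqlite3


def create_example_tables(path=":memory:"):
    """Create a small, hypothetical reaction/compound schema with indices."""
    cnx = sqlite3.connect(path)
    cnx.executescript(
        """
        CREATE TABLE IF NOT EXISTS compound (ID TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE IF NOT EXISTS reaction (ID TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE IF NOT EXISTS reaction_compound (
            reaction_ID TEXT, cpd_ID TEXT, is_prod INTEGER, stoichiometry REAL
        );
        CREATE INDEX IF NOT EXISTS rc_reaction_idx ON reaction_compound (reaction_ID);
        CREATE INDEX IF NOT EXISTS rc_cpd_idx ON reaction_compound (cpd_ID);
        """
    )
    cnx.commit()
    return cnx


cnx = create_example_tables()
# List what was created by reading SQLite's built-in catalog table.
tables = sorted(
    row[0] for row in cnx.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
indices = sorted(
    row[0]
    for row in cnx.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'rc_%'"
    )
)
print(tables)
print(indices)
```

Using `IF NOT EXISTS` keeps the function safe to call again on a database file that already has the tables.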

Database.query¶

class Database.query.Connector(database)¶

Bases: object

Connects to a generated database

connect_to_database()¶

custom_query(query)¶: Takes a custom query and returns the results

get_all_compound_keggIDs()¶: Retrieves all compound Kegg IDs

get_all_compounds()¶: Retrieves all compounds in the database

get_all_cpd_chemicalformulas()¶: Retrieves all chemical formulas

get_all_cpd_with_chemicalformula(cf)¶: Retrieves all compounds that have the given chemical formula

get_all_cpd_with_search(search)¶: Retrieves compound name for given search term (name/formula)

get_all_fba_models()¶: Retrieves all model IDs in the database

get_all_keggIDs()¶: Retrieves all KEGG IDs in the database

get_all_models()¶: Retrieves all model IDs in the database

get_all_reactions()¶: Retrieves all reactions in the database

get_catalysts(reaction_ID)¶: Retrieves the catalyst of reaction

get_compartment(compartment)¶: Retrieves the compartment ID

get_compound_ID(compound_name, strict=False)¶: Retrieves compound ID given a compound name

get_compound_compartment(compound_ID)¶: Retrieves the compartment that the compound is in

get_compound_name(compound_ID)¶: Retrieves compound name given a compound ID

get_compounds_in_model(organism_ID)¶: Retrieves all compounds in a metabolic model given model ID

get_cpd_casnumber(ID)¶: Retrieves casnumber for compound ID

get_cpd_chemicalformula(ID)¶: Retrieves chemicalformula for compound ID

get_genes(reaction_ID, organism_ID)¶: Retrieves gene associations for a reaction of a given metabolic network (model ID)

get_kegg_cpd_ID(ID)¶: Retrieves kegg ID for a compound based on main ID

get_kegg_reaction_ID(ID)¶: Retrieves kegg ID for a reaction based on main ID

get_model_ID(file_name)¶: Retrieves model ID for given file_name

get_models_from_cluster(cluster)¶: Retrieves model IDs from a specified cluster in the database

get_organism_ID(organism_name)¶: Retrieves ID of metabolic model given a specific model name

get_organism_name(organism_ID)¶: Retrieves name of metabolic model given a specific model ID

get_pressure(reaction_ID)¶: Retrieves the pressure at which the reaction is performed

get_products(reaction_ID)¶: Retrieves products (compound IDs) of a given reaction

get_products_reactions(compound_ID)¶: Retrieves reactions that have a given compound (ID) as a product

get_proteins(reaction_ID, organism_ID)¶: Retrieves protein associations for a reaction of a given metabolic network (model ID)

get_reactants(reaction_ID)¶: Retrieves reactants (compound IDs) of a given reaction

get_reactants_reactions(compound_ID)¶: Retrieves reactions that have a given compound (ID) as a reactant

get_reaction_name(reaction_ID)¶: Retrieves name of the reaction given the reaction ID

get_reaction_species(reaction_ID)¶: Retrieves compound IDs that are in a given reaction

get_reaction_type(rxn)¶: Retrieves the type of a given reaction

get_reactions(compound_ID, is_prod)¶: Retrieves reaction IDs that have a given compound ID as a reactant or product

get_reactions_based_on_type(rxntype)¶: Retrieves reactions based on type

get_reactions_in_model(organism_ID)¶: Retrieves all reactions in a metabolic model given model ID

get_reference(reaction_ID)¶: Retrieves the reference of reaction

get_solvents(reaction_ID)¶: Retrieves solvent of reaction

get_stoichiometry(reaction_ID, compound_ID, is_prod)¶: Retrieves stoichiometry of a compound for a given reaction

get_temperature(reaction_ID)¶: Retrieves the temperature at which the reaction is performed

get_time(reaction_ID)¶: Retrieves the time that is required to perform reaction

get_uniq_metabolic_clusters()¶: Retrieves unique metabolic clusters (organisms with the exact same metabolism) in the database

get_yield(reaction_ID)¶: Retrieves yield that was reported with reaction

is_reversible(organism_ID, reaction_ID)¶: Retrieves reversibility information of a reaction in a specified metabolic model (model ID)

is_reversible_all(reaction_ID)¶: Retrieves reversibility information of a reaction independent of model
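
Several of the lookups above (`get_reactions`, `get_stoichiometry`, and the reactant/product variants) take an `is_prod` flag. The sketch below shows one way such lookups could be written with parameterized `sqlite3` queries. The `ExampleConnector` class, the `reaction_compound` table, and its columns are hypothetical stand-ins invented for this example; the real `Connector` and its schema may differ.

```python
import sqlite3


class ExampleConnector:
    """Hypothetical stand-in for a query class; not the real Connector."""

    def __init__(self, cnx):
        self.cnx = cnx

    def get_reactions(self, compound_id, is_prod):
        # Reaction IDs where the compound is a product (is_prod=True)
        # or a reactant (is_prod=False).
        rows = self.cnx.execute(
            "SELECT reaction_ID FROM reaction_compound "
            "WHERE cpd_ID = ? AND is_prod = ? ORDER BY reaction_ID",
            (compound_id, int(is_prod)),
        )
        return [row[0] for row in rows]

    def get_stoichiometry(self, reaction_id, compound_id, is_prod):
        # Returns None when the compound does not appear on that side.
        row = self.cnx.execute(
            "SELECT stoichiometry FROM reaction_compound "
            "WHERE reaction_ID = ? AND cpd_ID = ? AND is_prod = ?",
            (reaction_id, compound_id, int(is_prod)),
        ).fetchone()
        return None if row is None else row[0]


cnx = sqlite3.connect(":memory:")
cnx.execute(
    "CREATE TABLE reaction_compound "
    "(reaction_ID TEXT, cpd_ID TEXT, is_prod INTEGER, stoichiometry REAL)"
)
cnx.executemany(
    "INSERT INTO reaction_compound VALUES (?, ?, ?, ?)",
    [
        ("rxn1", "cpdA", 0, 1.0),  # rxn1: cpdA -> 2 cpdB
        ("rxn1", "cpdB", 1, 2.0),
        ("rxn2", "cpdB", 0, 1.0),  # rxn2: cpdB -> cpdC
        ("rxn2", "cpdC", 1, 1.0),
    ],
)
db = ExampleConnector(cnx)
as_product = db.get_reactions("cpdB", True)
as_reactant = db.get_reactions("cpdB", False)
stoich = db.get_stoichiometry("rxn1", "cpdB", True)
print(as_product, as_reactant, stoich)
```

Passing values through `?` placeholders rather than string formatting avoids quoting problems with compound names and IDs.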

Database.query.fetching_all_query_results(Q, conn, cnx, db, query, count)¶

Database.query.fetching_one_query_results(Q, conn, cnx, db, query, count)¶

Database.query.test_db_4_error(conn, cnx, query, db, count)¶

Database.build_ATLAS_db¶

Database.build_ATLAS_db.build_atlas(atlas_dir, DBPath, inchidb, processors, rxntype='bio')¶: Add atlas database to RSA metabolic database

Database.build_ATLAS_db.extract_KEGG_data(url)¶: Extract Kegg db info

Database.build_ATLAS_db.fill_arrays_4_db(rxn_keggids, rxn_atlas, DBPath, inchidb, processors, rxntype)¶: fill arrays with ATLAS reactions to add to database

Database.build_ATLAS_db.fill_database(cnx, reactions, reaction_reversible, model_reaction, reaction_protein, reaction_genes, model_compound, compound, reaction_compound, original_db_cpd_new)¶: fill database with ATLAS information

Database.build_ATLAS_db.fill_dictionary(larray, dictionary, KEGGID=False)¶: Fill dictionaries with ATLAS reactions

Database.build_ATLAS_db.fill_dictionary_atlasbiochem(larray, dictionary)¶: Fill dictionaries with ATLAS reactions

Database.build_ATLAS_db.get_inchi_4_cpd(cpd, INCHI)¶: Get inchi for a compound if inchidb is True

Database.build_ATLAS_db.open_atlas_files(atlas_files)¶: open ATLAS files

Database.build_ATLAS_db.process_reactions(rxninfo, currentcpds, dbcpds, original_db_cpd_current, inchidb, rxntype, INCHI, output_queue)¶: Process ATLAS reactions

Database.build_ATLAS_db.process_substrates(rxn, cpd, is_prod, currentcpds, dbcpds, inchidb, original_db_cpd_current, model_compound_temp, reaction_compound_temp, model_reaction_temp, compound_temp, original_db_cpd_temp, INCHI)¶: Process ATLAS compounds

Database.build_kbase_db¶

Database.build_kbase_db.BuildKbase(sbml_dir, kbase2keggCPD_translate_file, kbase2keggRXN_translate_file, inchi, DBpath, rxntype='bio')¶: Inserts values from metabolic network XML files into the SQLite database

Database.build_kbase_db.extract_KEGG_data(url)¶: Extract Kegg db info

Database.build_kbase_db.get_KEGG_IDs(ID, KEGGdict)¶: Retrieve KEGG IDs

Database.build_kbase_db.get_compartment_info(ID)¶: Retrieve compartment information for compound

Database.build_kbase_db.get_inchi_values(file_name, inchi_pubchem, inchi_cf, inchi_cas, total_set, CPD2KEGG, CT, INCHI)¶: Retrieve InChI values for compounds in metabolic networks

Database.build_kbase_db.get_metabolic_clusters(org_cpds, org_rxns, model_id, metabolic_clusters, cluster_org)¶: Retrieves metabolic cluster info

Database.build_kbase_db.insert_comprehensive_model_results_2_db(DBpath, modelcompartments, modelcompounds_allinfo, rxn_info, all_rxn_cpds, keggdict, inchi, inchi_pubchem, filenum, rxntype)¶: Inserts comprehensive compound and reaction information into tables

Database.build_kbase_db.insert_individual_model_results_2_db(DBpath, modelcompounds, modelreactions, genelist, proteinlist, mi, filename)¶: Inserts individual model information into tables

Database.build_kbase_db.kegg2pubcheminchi(cpd)¶: Using KEGG ID to get inchi

Database.build_kbase_db.load_file_info_2_db(args)¶: Open and insert information from metabolic networks (xml files) into database

Database.build_kbase_db.open_translation_file(file_name)¶: opens and stores KEGG translation files

Database.build_kbase_db.parse_data_sbmlfile(inchi, CPD2KEGG, RXN2KEGG, file_name, inchi_pubchem, inchi_cf, inchi_cas)¶: Open metabolic network file (xml file) and parse information

Database.build_kbase_db.process_compartments(compartmentsoup)¶: Get compartments in xml file

Database.build_kbase_db.process_compounds(speciessoup, CPD2KEGG, mi, inchi, inchi_pubchem, inchi_cf, inchi_cas)¶: Parse compound information from metabolic network file (xml file)

Database.build_kbase_db.process_reactions(reaction_soup, RXN2KEGG, CPD2KEGG, mi, inchi, inchi_pubchem)¶: Parse reaction information from metabolic network file (xml file)

Database.build_kbase_db.retrieve_exact_inchi_values(m, total_set, inchi_pubchem, inchi_cf, inchi_cas, CPD2KEGG, CT, INCHI)¶: Retrieve InChI values

Database.build_kbase_db.retrieve_metabolic_clusters(DBpath)¶: Identifies and inserts metabolic clusters (organisms with the same compounds and reactions) into database
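
The idea of a metabolic cluster, as described above (organisms with the same compounds and reactions), can be sketched in plain Python by grouping models on their compound and reaction sets. The function name, the input dictionaries, and the cluster numbering are assumptions for illustration; this is not the module's implementation.

```python
from collections import defaultdict


def group_metabolic_clusters(model_cpds, model_rxns):
    """Group model IDs whose compound and reaction sets are identical."""
    clusters = defaultdict(list)
    for model_id in sorted(model_cpds):
        # frozenset ignores ordering and duplicates, so two models with the
        # same compounds and reactions produce the same key.
        key = (frozenset(model_cpds[model_id]), frozenset(model_rxns[model_id]))
        clusters[key].append(model_id)
    # Number clusters in order of first appearance.
    return {i: ids for i, ids in enumerate(clusters.values(), start=1)}


model_cpds = {"m1": ["a", "b"], "m2": ["b", "a"], "m3": ["a", "c"]}
model_rxns = {"m1": ["r1"], "m2": ["r1"], "m3": ["r1"]}
clusters = group_metabolic_clusters(model_cpds, model_rxns)
print(clusters)
```

Here `m1` and `m2` land in the same cluster because their compound lists differ only in order, while `m3` forms its own cluster.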

Database.build_modelseed¶

class Database.build_modelseed.BuildModelSeed(username, password, rxntype, inchidb, DBpath, output_folder, media='Complete', newdb=True, tokentype='patric', sbml_output=False, processors=4, verbose=False, patricfile='/Users/lwhitmo/software/RetSynth_lt/rs/Database/data/PATRIC_genome_complete_07152018.csv', previously_built_patric_models=False)¶

Bases: object

get_model_from_patric()¶: Builds models in patric and converts them to cobra models which are then imported into retsynth database

load_complete_genomes()¶: Loads a list of patric genome IDs that are complete genomes

process_cobra_model(model, genome_id)¶: Adds information from cobra model into appropriate arrays

class Database.build_modelseed.LoadIntoDB(DBpath, verbose, inchidb)¶

Bases: object

add_all_info_existing(allcompounds, originalIDs, allreactions, reaction_reversibility, model_ids, model_compartments)¶: Loads unique compound, reaction, compartment and model information into preexisting database

add_all_info_new(allcompounds, originalIDs, allreactions, reaction_reversibility, model_ids, model_compartments)¶: Loads unique compound, reaction, compartment and model information into new database

add_cluster_info()¶: Loads cluster information into database

add_model_compounds(model_compounds)¶: Loads model compound information into database

add_model_reactions(model_reactions, reaction_genes, reaction_protein)¶: Loads reaction information into database

add_reaction_compound(reaction_compound, newdb)¶: Loads reaction compound information into database

get_model_sorted_cpds(model)¶

get_model_sorted_rxns(model)¶

Database.build_modelseed.build_patric_models(genome_id, genome_name, media, username)¶: Builds patric models on the patric server

Database.build_modelseed.extract_KEGG_data(url, verbose)¶: Extract Kegg db info

Database.build_modelseed.generate_sbml_output_folder(output_folder)¶: Generate output folder for sbml fba models if user has specified this option

Database.build_modelseed.get_KEGG_IDs(ID, compartment, KEGGdict)¶: Retrieve KEGG IDs

Database.build_modelseed.kegg2pubcheminchi(cpd, verbose)¶: Convert KEGG ID to InChI value

Database.build_modelseed.open_translation_file(file_name)¶: opens and stores KEGG translation files

Database.build_modelseed.retrieve_exact_inchi_values(new_cpd_keggid, raw_cpd_keggid, cpd_name, compart_info, inchi_pubchem, inchi_cf, inchi_cas, CT, INCHI, verbose)¶: Retrieve InChI values

Database.build_modelseed.verbose_print(verbose, line)¶: verbose print function

Database.build_user_rxns_db¶

class Database.build_user_rxns_db.AddUserRxns2DB(DBPath, file_name, model_id='UserAdded', rxntype='bio')¶

Bases: object

add_data_2_db()¶

check_cpd_in_db(rxn_comps, rxn_stoich, rxn_comps_name, rxn_id, is_prod)¶

get_fp_cf(cpd)¶

get_rxn_components(rxn, ids)¶

get_stoichometry(cpds)¶

open_user_file()¶

Database.build_KEGG_db¶

Database.build_KEGG_db.BuildKEGG(types_orgs, inchidb, processors, currentcpds, num_organisms='all', num_pathways='all')¶: Build metabolic database from KEGG DB

class Database.build_KEGG_db.CompileKEGGIntoDB(database, type_org, inchidb, processors, num_organisms, num_pathways, rxntype, add)¶

Bases: object

Add KEGG info to sqlite database

add_to_preexisting_db()¶: Add KEGG info to already developed database

fill_new_database()¶: Fill database

Database.build_KEGG_db.add_metabolite(reactionID, cpd, stoichiometry, is_prod, reactioninfo)¶: add metabolites to dictionary

Database.build_KEGG_db.extract_KEGG_data(url)¶: Extract Kegg db info

Database.build_KEGG_db.extract_KEGG_orgIDs(types_orgs, num_organisms)¶: Retrieve organism IDs in KEGG

Database.build_KEGG_db.extract_pathwayIDs(orgID, num_pathways, output_queue)¶: Retrieve pathway IDs

Database.build_KEGG_db.extract_reactionIDs(pathway, output_queue)¶: Extract reactions in pathways

Database.build_KEGG_db.process_compound(cpd, reactionID, reactioninfo, is_prod, inchidb, compoundinfo, cpd2inchi, inchi_cf, inchi_cas, currentcpds)¶: Extract compound info

Database.build_KEGG_db.process_reaction(reactionID, inchidb, compoundinfo, cpd2inchi, inchi_cf, inchi_cas, currentcpds, output_queue)¶: Extract reaction info
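
KEGG reaction entries write the equation as compound IDs joined by ` + `, with `<=>` between the two sides and optional leading coefficients. The sketch below shows how such a string could be split into reactants and products and collected in a dictionary, in the spirit of `add_metabolite`. The helper names, the layout of `reactioninfo`, and the reaction ID `R_example` are assumptions for this example. Symbolic coefficients such as `n` are not handled and are treated as 1.

```python
def add_metabolite_example(reaction_id, cpd, stoichiometry, is_prod, reactioninfo):
    """Record one compound under the reactant or product side of a reaction."""
    side = "products" if is_prod else "reactants"
    entry = reactioninfo.setdefault(reaction_id, {"reactants": {}, "products": {}})
    entry[side][cpd] = stoichiometry


def parse_equation(reaction_id, equation, reactioninfo):
    """Split 'a + 2 b <=> c' into compounds with integer stoichiometry."""
    left, right = equation.split(" <=> ")
    for side, is_prod in ((left, False), (right, True)):
        for term in side.split(" + "):
            parts = term.split()
            if len(parts) == 2 and parts[0].isdigit():
                stoichiometry, cpd = int(parts[0]), parts[1]
            else:
                # No explicit integer coefficient: assume 1 and take the
                # last token as the compound ID.
                stoichiometry, cpd = 1, parts[-1]
            add_metabolite_example(reaction_id, cpd, stoichiometry, is_prod, reactioninfo)
    return reactioninfo


reactioninfo = parse_equation("R_example", "2 C00001 + C00002 <=> C00003", {})
print(reactioninfo)
```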

Database.build_metacyc_db¶

class Database.build_metacyc_db.MetaCyc(DB, inchidb, cnx, verbose)¶

Bases: object

Opens and parses metacyc xml file

arrays(cpdID, compartment, name, KEGG_ID, chemicalformula, cas)¶: Adds compound to compound arrays

check_db(cpdID, kegg_id)¶: Checks the database for the table original_db_cpdIDs; if the table exists and the compound ID is in it, the InChI value for that compound is retrieved

check_reaction_difference(rxnID, rxn_compounds)¶: Checks whether promiscuous metabolites are the only difference between reactions

compound_translator(compound_ID, biocyc_ID, inchi_ID, KEGG_ID, name, compartment)¶: Checks if the metacyc compound is in the database and adds it if it is not

fill_compound_arrays(ID, inchi, cpdID, KEGG_ID, name, compartment)¶: Determines whether or not to add compound to compound arrays which will later be added to the sqlite database

fill_temp_array(cpdID, is_prod, stoic, temp_all_rxn_compound)¶: Adds reaction information from metacyc xml file to temporary reaction list which later gets inserted into database

get_compounds_4_rxn(species)¶: get reaction compounds

get_fp_cf_info(inchi)¶

get_promiscuous_cpds(file_name)¶

multiple_copies_of_rxns(rxnID, rxn, name, genes, proteins, kegg, temp_all_rxn_compound)¶: Deals with rxns that have same catalytic enzyme but different substrates

read_metacyc_file(BIOCYC_translator, file_name)¶: Reads and parses metacyc SBML file

retrieve_compartment_4_compartment(temp_compartment, rxnID)¶

retrieve_rxn_info(rxn, rxnID, genes, proteins, kegg_ID, biocycID)¶: Parses reaction information from metacyc xml file

rxn_translator(reaction_ID, temp_all_rxn_compound, revers, name, genes, proteins, kegg)¶: Checks if the metacyc reaction is in the database and adds it if it is not
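
One possible reading of the check described for `check_reaction_difference` is a set comparison: two reactions count as near-duplicates when every compound that appears in only one of them is on a list of promiscuous compounds. The function name, arguments, and compound labels below are hypothetical, and the actual method may apply different rules.

```python
def differ_only_by_promiscuous(cpds_a, cpds_b, promiscuous):
    """True if the compound sets differ and every differing compound is promiscuous."""
    # Symmetric difference: compounds present in exactly one of the reactions.
    difference = set(cpds_a) ^ set(cpds_b)
    return bool(difference) and difference <= set(promiscuous)


promiscuous = {"h2o"}
only_water = differ_only_by_promiscuous({"glc", "atp", "h2o"}, {"glc", "atp"}, promiscuous)
real_change = differ_only_by_promiscuous({"glc", "atp"}, {"fru", "atp"}, promiscuous)
identical = differ_only_by_promiscuous({"glc", "atp"}, {"glc", "atp"}, promiscuous)
print(only_water, real_change, identical)
```

Identical reactions return `False` here because there is no difference to attribute to promiscuous compounds; a caller that wants to treat them as matches would test for equality separately.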

class Database.build_metacyc_db.Translate(DBPath, file_name, inchidb, rxntype, verbose, add=True)¶

Bases: object

Translates metacyc compound and reaction IDs to Kbase compound and reaction IDs if a kbase database is used (ONLY WORKS WITH KBASE, ModelSeed/patric)

add_metacyc_to_db()¶: Adds metacyc information to the database

Database.build_metacyc_db.extract_KEGG_data(url)¶: Extract Kegg db info

Database.build_metacyc_db.get_inchi_from_kegg_ID(cpd)¶

Database.build_metacyc_db.verbose_print(verbose, line)¶

Database.build_SPRESI_db¶

Database.build_SPRESI_db.RDF_Reader(file_directory, DBpath, rxntype, compartment, processors, temp_option=False, pressure_option=False, yield_option=False, time_option=False, catalyst_option=True, solvent_option=True)¶: Adds data from RDF files into database (specifically works with SPRESI-formatted RDF files)

Database.build_SPRESI_db.add_individual_file_info(text_file, cnx, conn, rxntype, compartment, eliminate_duplicates, identification)¶: Adds information from an individual file to the database

Database.build_SPRESI_db.add_info_2_database(DBpath, rxntype, compartment)¶: Adds data from text files to the database

Database.build_SPRESI_db.check_refs(table, rxn_id, test_info_ref, new_rxn_info, larray, cnx)¶: check if reference info is already in database

Database.build_SPRESI_db.generate_mol_file(compounds, filenumber, substrates=False)¶: Generates a mol file and then reads it in using the Indigo API to get the SMILES string

Database.build_SPRESI_db.get_complex_reference_details(item, reference_parameters, full_citation_string, reference_details_array, reference_details_bool)¶: retrieves details for references with many parameters

Database.build_SPRESI_db.get_data(match, datatype, item, result_array, type_bool, DATA_TYPE=None)¶: retrieves a variety of other information for rxn (yield, reference etc…)

Database.build_SPRESI_db.get_mol_structure(match, item, type_bool, result_dict, count_item=0, GET_RXN=True)¶: retrieves rxn, solvents and catalyst information

Database.build_SPRESI_db.open_file(args)¶: Opens an RDF file

Database.build_SPRESI_db.parse_file(output_file, RDF_dict, filenumber, file_name, options)¶: Parses elements of an RDFile and outputs them to a new text file

Database.build_SPRESI_db.process_rxntext(rxntext_array)¶: processes string containing reaction information

Database.build_MINE_db¶

class Database.build_MINE_db.BuildMINEdb(dumpdirectory, database, inchidb, rxntype)¶

Bases: object

Builds a MINE database or adds MINE data to an existing metabolic database

add2dictionary(temp)¶: Adds compound to file dictionary

extract_cpd_information(compoundid, INFO, tp)¶: Get compound information

extract_source_information(compoundid, tp)¶: extract operator information (EC number)

fill_database()¶: Generate arrays of database information and fill database with information

fill_reaction_components_dict(rxn, compound, typecpd)¶: Fill reaction info (substrates) into dictionary

generate_reactions()¶: Get reactions from MINE files

open_mspfile(filename)¶: Opens and reads msp files

Database.remove_duplicate_cpds¶

class Database.remove_duplicate_cpds.OverlappingCpdIDs(database)¶

Bases: object

identify_overlapping_ids()¶: Finds compounds that are stored under different IDs (i.e. an InChI ID and a cpd ID for the same compound) and fixes them
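
The page does not say how overlapping IDs are detected. As a sketch under the assumption that two database IDs refer to the same compound when they share an external key (a KEGG ID in this example), the grouping step could look like the following. The function name, the row layout, and the placeholder ID `InChI_example_c0` are hypothetical.

```python
from collections import defaultdict


def find_overlapping_ids(rows):
    """Return shared keys that are referenced by more than one database ID."""
    by_key = defaultdict(set)
    for db_id, shared_key in rows:
        if shared_key:  # skip compounds that have no shared key at all
            by_key[shared_key].add(db_id)
    return {key: sorted(ids) for key, ids in by_key.items() if len(ids) > 1}


rows = [
    ("cpd00027_c0", "C00031"),
    ("InChI_example_c0", "C00031"),  # placeholder for an InChI-based ID
    ("cpd00001_c0", "C00001"),
    ("cpd99999_c0", None),
]
overlaps = find_overlapping_ids(rows)
print(overlaps)
```

Each entry in the result lists the IDs that would need to be merged into one; which ID to keep is a separate decision that this sketch leaves open.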
