CytoSQL - Help

This documentation assumes that system requirements are fulfilled, that the plugin is succesfully installed, and that the user is familiar with SQL and the databases to be queried. Below you can find an explanation of the most typical ways of using CytoSQL. We recommend that you also have a look at the other documentation pages, available from the left menu, that describe how columns are mapped and offers schemes of its modes of operation. The toy example helps you to rapidly test common use cases with a simple artifical database.

1. Database definition

The database definitions are accessible via the "Adjust database connection" button on the main CytoSQL plugin panel. When clicked, a connection definition panel is brought up, which contains a series of fields:

1.1. Preformatted database specifications: Use this part of the form if you are using the built-in MySQL or PostGreSQL drivers. If you want to use a different driver, select the "Custom" option in the Driver dropdown box, and proceed to 1.2.

- Driver: This defines the JAVA driver used to connect CytoSQL to the database. To use the built in drivers you can use the dropdown box and just select the appropriate driver.

for MySQL: com.mysql.jdbc.Driver
for postgreSQL: org.postgresql.Driver
(for other drivers, you should select the "Custom" driver option and proceed to 1.2)

- Database host: hostname

for example localhost

- Port Number: (may be empty with default ports)

for example 3306

- Database name:

for example MyDatabase

1.2. Custom database specification

Only relevant if you specified "custom jdbc database driver" in the driver field above.

Driver: reference to the jdbc driver.

For example: org.sqlite.JDBC

URL: This defines the location of the database server. The hostname should be preceded by the jdbc path. For example (in case the database is on your local machine):

for MySQL: jdbc:mysql://localhost/MyDatabase
for postgreSQL: jdbc:postgresql://localhost/MyDatabase
for sqlite: jdbc:sqlite:/Applications/MAMP/db/sqlite/MyDatabase

In some cases it may be necessary to specify the port used for the connection, for example:
for PostgreSQL: jdbc:postgresql://localhost:5432

(note: these examples are for illustration purposes only. For Postgresql and MySQL, it is easier to use the fields described in 1.1, and ignore section 1.2)

1.3. Username: the name of the user with the appropriate permissions.

1.4. Password: the password associated with the username specified above.

2. Executing SQL queries

CytoSQL links relational databases to Cytoscape through SQL statements. An SQL statement returns a results set as a table, consisting of rows and columns. Rows correspond to network components, i.e. nodes or edges. Columns correspond to the attributes of these components: they will be interpreted as node or edge properties, depending on the mode of operation and the user selected mappings.

Launching a query is simple:
- specify the database connection if you did not yet do that
- Enter an SQL query
- Select a mode of operation
- Preview the results in the preview window
- Press "import" if you like the results.

In the meanwhile, the plugin will prompt to specify how the different columns should be mapped to Cytoscape network elements (see further).

CytoSQL supports four modes of operation.

The first one enables you to generate a completely new network using an SQL statement (de novo queries). This approach is the simplest: it converts the output of a query to new Cytoscape network components.

The other three allow you to extend an existing Cytoscape network (extension queries), either by adding extra node or edge attributes, or by adding additional nodes / edges that connect with already present nodes. Any of these three modes need SQL queries that use some properties of the existing Cytoscape network as arguments. For that purpose, CytoSQL employs a dedicated feature, called "bind variables". Bind variables are replacers for certain Cytoscape attributes: you can use them in a query, and then define which Cytosscape attribute should be mapped upon each bind variable. CytoSQL supports an unlimited number of bind variables.

2.1. De novo queries

Mode 1: Create Network

The network will be built using two columns of the results table. From version 1.0, the user is free which of the columns cary the new node identifiers. Each row in the results set corresponds to one edge. Nodes are created as necessary using two preferred columns as a unique ID. The remaining columns in the query will be added to the network as attributes of either the nodes or the edges. The user will be prompted by a dialog box to specify for each column whether it should become a source node, target node, an attribute of the source or target node, or an attribute of the edge between them.

Example Query 1:

SELECT protein1, protein2, protein1Description, protein2Description, interactionEvidence FROM interactions

Upon launching this query, the plugin will prompt the user to define the right mappings. The first two retrieved columns may be imported as source and target nodes, respectively. In this case, you would probably map protein1Description as an attribute for the source node, protein2Description as an attribute of the target node, and interactionEvidence as an attribute of the edge.

3.2. Extension queries

Extension queries proceed from an already existing network and a set of selected nodes (to update node attributes) or edges (to update edge attributes). If no nodes or edges are selected, the tool auto-selects all nodes / edges.

Mode 2: Update node attributes

The query is executed once for each selected node, with the bind variable properly replaced, and the results are processed.

Bind variables are used to match query result columns to attributes of the selected nodes. In most cases, at least one bind-variable will be necessary and used as the unique identifier of the node in the queried table(s). Bind variables are entered in the query using a question mark ("?"). The software will ask for the attribute(s) mapped to each bind variable in the order that they are present in the query.

Example Query 2:

SELECT accession, proteinLocalization FROM localization WHERE accession=?

This query selects all accession numbers and protein localizations from a hypothetical localization table, with the bind variable ? in the WHERE statement replaced by a Cytoscape attribute. In this case we may map the Cytoscape node ID attribute to bind variable 0.

Remember that the query will be executed once for each selected node. If an attribute update query yields more than one results row, the attributes get overwritten by the last results row in the set. It is up to the user to verify the use of unique identifiers as bind variables.

Mode 3: Update edge attributes

The query is executed once for each selected edge. Bind variables shouls be used to specify the unique edge characteristics in the WHERE statement. An edge is ususally defined either by the combination of a source and a target node attribute, or by a (unique) edge attribute.

Example Query 3:

SELECT protein1, protein2, evidence from interactions where protein1=? and protein2=?

In this case, the query uses two bind variables. Bind variable 0 should be mapped to a source node identifier, and bind variable 1 should be mapped to a target node identifier. If directionality is not relevant, you may execute the query twice, with source and target attributes reversed.

Keep in mind that an edge attribute update query only yields new edge attributes. If you used columns to identify an edge by its node attributes, they will be re-imported as edge attributes. You may like to delet them afterwards.

Example Query 4:

SELECT interactionID, protein1, protein2, evidence from interactions where interactionID=?

Here only one bind variable is used: an interaction identifier.

Mode 4: Extend network

This type of query will result in additional edges or nodes in the existing network. One of the results columns is used as a unique node identifier. If it does not exist in the network yet, a new node is created using this ID. For each selected node the query is run and for each row in the resultset an edge is added between the current "query-node" and the node identified by the selected results column. The user will be asked whether the "query-node" is source or target of the new edge. Similarly to the create network query extra columns in the result will become attributes in the network belonging to either the "query-node", the newly added node or the newly created edge.

Example Query 5:

SELECT protein2 FROM preys WHERE protein1 = ?