The small-scale structure of a graph contains important information about that graph’s function. *Network motifs* [1] are defined as subgraphs of a larger graph that appear significantly more often in that large graph than would be expected in a purely random graph with the same number of nodes and edges. It was found that motifs can easily be used to distinguish networks of different functions from each other, like genetic transcription networks from ecosystem food webs [1]. To know whether a certain subgraph is a motif in a specific explored graph, its expected number of appearances in random graphs should be known. This means the subgraph must be counted in a large number of random graphs [1]. This may be sped up by sampling [2] or parallellisation [3], but the random graphs are still needed.

*graphlets*. All graphlets on 2 to 5 nodes are shown in Fig. 1. They are all assigned a unique identification number, which can be used to refer to that specific graphlet (shown under each graphlet in Fig. 1). Originally, graphlets were restricted to 5 or fewer nodes because of limitations in computing power. Later research [5, 6] increased that number of nodes. Regardless of the exact computing power available, the exponentially exploding number of graphlets enforces the use of some cut-off on the number of nodes in a graphlet. Therefore, if the term “all graphlets” is used in this paper, it means “all graphlets on

*k*or fewer nodes”, with

*k*some arbitrary but fixed cut-off.

Each graphlet’s nodes can be subdivided into symmetrically equivalent sets, called *orbits* [7]. For example, consider graphlet 1, the three-node path, in Fig. 1. By mirroring the graphlet, the two outside nodes can be swapped, which does not change the graphlet’s structure. Therefore, these two nodes are contained in the same orbit. The middle node can never be interchanged with either of the outside nodes without changing the graphlet’s structure. This means that this node is in a different orbit. In Fig. 1, the outer nodes of graphlet 1 are coloured black, whereas the middle node is coloured white, to indicate their respective orbits. Likewise, in all other graphlets in Fig. 1, the nodes are shaded according to their orbit. Like graphlets, orbits are numbered. These numbers are unique across all graphlets, so an orbit number can be used to identify both the orbit and its graphlet. In Fig. 1, the number of each orbit is indicated close to one of its nodes.

When a graphlet is found in an explored graph, it is possible to determine which graph node is in what orbit of that graphlet. That orbit is then said to *touch* that node. The number of times orbit *i* touches node *v* is called *v*’s *i*th *graphlet degree* [7]. Graphlet degrees are an extension of a node’s degree, which itself is equivalent to the 0th graphlet degree, i.e. the graphlet degree of the 2-node graphlet. As an extension of a graph’s degree distribution, the distribution of all of the graphlet degrees can be calculated, which is called the graph’s *graphlet degree distribution* or *GDD*. These graphlet degree distributions themselves are a useful tool for network comparison [7] and identification of nodes’ function [8].

To calculate a graph’s GDD, it seems at first sight that all graphlets within that graph need to be found. However, it was proved in [9] that this is not the case. They showed that, to calculate a node’s graphlet degrees (using graphlets on up to *k* nodes), a system of linear equations can be composed from all graphlets on *k*−1 nodes and all common neighbours of up to *k*−2 nodes in the explored graph. Solving this system of equations then results in the node’s graphlet degrees. Using equations allows part of the calculation to be done in advance and reused while counting multiple different orbits.

The ORCA algorithm (ORbit Counting Algorithm) [9] is the fastest available algorithm to calculate all nodes’ graphlet degrees. ORCA can count the orbits of graphlets up to either 4 or 5 nodes and uses such a system of equations to reduce this to finding graphlets on 3 or 4 nodes, respectively.

The *Jesse* algorithm, which managed to automatically generate equations for orbits of arbitrary order, as well as use them to count those orbits, was described in [5]. To this end, a non-ambiguous, extendable graphlet and orbit numbering was introduced. This numbering does not follow the original numbering in Fig. 1, but is constructed in an algorithmical, consistent way. Likewise, a novel type of graphlet that represents a specific orbit was introduced. In these graphlets, a single node is marked, which singles out that node’s orbit. These so-called *orbit representatives* were used as the nodes of a directed tree, in which the arcs were the addition of nodes and edges to these orbit representatives. This tree can then be used to find orbit representatives on *k*−1 nodes within a graph, after which automatically generated [10] equations are used to calculate the number of orbits of graphlets on *k* nodes.

However, many redundant equations are generated, so a large linearly dependent system of equations arises. Originally, the equations used in Jesse were chosen on a first-come-first-served basis. If Jesse generated more than 1 equation that could be used to count orbit *i*, only the one that was generated first was saved and used. All other generated equations are just as correct, though.

Once the system of equations is completed, it can be straightforwardly solved as its left-hand side, where graphlet degrees are related to each other, is an upper triangular matrix. However, filling in the right-hand side of all equations is more time-consuming, as this depends on the graph’s structure. Therefore, different, but equally correct sets of equations can be used and have an impact on Jesse’s running time.

Other techniques to count graphlets have been developed: ranging from combinatorial counting that uses graphlets’ symmetries and substructures to count them without finding any graphlet, which is currently limited to graphlets on 4 nodes [11], to sampling techniques that do not count all graphlets but identify randomly sampled subgraphs [6], or incremental counting [12].

In this paper, we investigate whether it is possible to select a set of equations that allows the graphlet degrees to be calculated faster than when using other sets, and whether this set depends on the explored graph’s properties. First, a number of graph theory terms need to be defined. A high-level explanation of Jesse’s internal structure follows. Then, the structure of a graphlet counting equation is explained, and the variable properties of equations are identified. With all of these, the effect of equation selection on Jesse’s speed can be investigated, accounting for different graph types, orders and sizes. Finally, the Cytoscape version of Jesse will be presented.

### Formal graph theory definitions

This section is a collection of formal definitions of terms which will be used later on in the paper. These are formal definitions for all terms used in this paper, but for more information on how and why these terms were constructed we refer to [10].

###
**Definition 1**

*undirected graph*

$$G = (V,E),$$

where *V* is the collection of *nodes* and *E* is the collection of *edges* of *G*, such that

$$E subseteq {{v,w} | (v, w in V wedge v neq w)}.$$

Similarly, *V*(*G*) and *E*(*G*) are used to indicate *G*’s node and edge set, respectively. *n*=|*V*| is called the graph’s order, while *m*=|*E*| is called its size.

###
**Definition 2**

*G*’s

*density*

$$D(G) = frac{|E|}{binom{|V|}{2}} = frac{2|E|}{|V|(|V|-1)}.$$

As it is the graph’s size divided by the maximal size that graphs of that order can have,

$$0 leq D(G) leq 1.$$

###
**Definition 3**

*isomorphisms*between two graphs

*G*and

*H*are the one-to-one functions that map

*G*’s nodes to

*H*’s nodes, such that their edge sets are mapped onto each other as well.

$$begin{aligned}{}text{Iso}(G,H) &= {f:{V}(G) rightarrow {V}(H)|(f text{ is bijective }\ &quad wedge (forall v, w in {V}(G): ({v,w}\&quad in {E}(G) iff { f(v), f(w) } in {E}(V))))}. end{aligned} $$

*isomorphic*if they have at least one isomorphism.

$$G simeq H iff text{Iso}(G,H) neq emptyset $$

###
**Definition 4**

*automorphisms*of a graph

*G*are the isomorphisms of

*G*to itself.

$$text{Aut} (G) = text{Iso}(G,G)$$

###
**Definition 5**

$$text{Ind} (v) = text{Ind} (w) iff v = w. $$

This index is chosen arbitrarily at graph creation. As a form of shorthand, we note *v*<*w* for Ind(*v*)<Ind(*w*).

###
**Definition 6**

*induced subgraph*on a subset

*V*

_{s}⊆

*V*(

*G*)of a graph

*G*’s nodes is the graph formed by

*V*

_{s}and all edges between those nodes that are present in

*G*.

$$G[V_{s}] = (V_{s}, { {v,w} in E(G) : v, w in V_{s}}) $$

###
**Definition 7**

*number of common neighbours*of a subset

*V*

_{s}⊆

*V*(

*G*) of a graph

*G*’s nodes will be noted as

*c*(

*V*

_{s}).

$$c(V_{s}) = | {v in V(G) |(forall w in V_{s} : {v,w} in E(G))} | $$

Following [9], *c*({*u*,*v*,…}) will also be noted as *c*(*u*,*v*,…).

###
**Definition 8**

A *graphlet**G* is a connected graph. Graphlets are assigned an unique numbering, such that the notation *G*_{i} denotes a unique graphlet for each value of *i*∈*ℕ*. A *k**-graphlet* is a graphlet with *k* nodes.

###
**Definition 9**

The *instances* of a graphlet *G*_{i} in a graph *H* are the induced subgraphs of *H* that are isomorphic with *G*_{i}.

###
**Definition 10**

*automorphism orbit*, or

*orbit*in short, of a node

*v*in a graphlet

*G*

_{i}is the set of nodes to which

*v*can be mapped by an automorphism of

*G*

_{i}.

$$text{Orb}(G_{i},v) = left{w in V(G_{i}) | left(exists f in text{Aut} (G_{i}) : f(v) = w right)right} $$

Orbits are assigned a unique numbering, spanning over all graphlets, such that the notation Orb_{j} can be used to identify both the orbit and its graphlet.

###
**Definition 11**

*orbit representative*is a graphlet with a marked node

*x*.

$$Omega = (V,E,x) $$

with

$$x in V wedge E subseteq {{v,w} | (v, w in V wedge v neq w) } $$

Orbit representatives follow the same numbering as orbits, such that *Ω*_{i} is the orbit representative that corresponds with Orb_{i}. The marked node of an orbit representative *Ω* will be noted as *x*(*Ω*).

###
**Definition 12**

*Ω*and

*Ψ*are the isomorphisms of

*Ω*and

*Ψ*’s graphlets that map

*Ω*’s marked node to

*Ψ*’s marked node.

$$begin{aligned} text{Iso}(Omega,Psi) &= {f:{V}(Omega) rightarrow {V}(Psi) |(f text{ is bijective} \ &quad wedge (forall v, w in {V}(Omega): ({v,w}\ &quad in {E}(Omega) iff { f(v), f(w)} in E(Psi)))\ &quad wedge (f({x}(Omega)) = {x}(Psi)))} end{aligned} $$

Two orbit representatives are said to be isomorphic if they have at least one isomorphism.

$$Omega simeq Psi iff text{Iso}(Omega,Psi) neq emptyset $$

###
**Definition 13**

*Ω*are the isomorphisms of

*Ω*to itself.

$$text{Aut}(Omega) = text{Iso}(Omega,Omega) $$

###
**Definition 14**

*induced orbit representative*on a subset

*V*

_{s}⊆

*V*(

*G*) of a graph

*G*’s nodes and a marked node

*x*∈

*V*

_{s}is the orbit representative formed by

*V*

_{s}and all edges between those nodes that are present in

*G*, with

*x*as the orbit representative’s marked node.

$$G[V_{s}, x] = left(V_{s}, left{ {v,w } in E(G) : v, w in V_{s} right}, x right) $$

###
**Definition 15**

The *instances* of an orbit representative *Ω*_{i} in a graph *H* are the induced subgraphs of *H* that are isomorphic with *Ω*_{i}.

An orbit Orb_{i}*touches* a node *v* in a graph *G* if its corresponding orbit representative *Ω*_{i} has an instance in *G* in which *v* is the marked node.

###
**Definition 16**

*ith graphlet degree*

*o*

_{i}(

*v*)of a node

*v*in a graph

*G*is the number of times orbit

*i*touches

*v*.

$$o_{i}(v) = | {S subseteq V(G) : G[S,v] simeq Omega_{j} } | $$

Once more, we refer the interested reader to [10] for more information.