# Dependency graph
For accuracy and performance, HyperFormula needs to process cells in a correct and optimal order. For example: in formula C1=A1+B1
, cells A1
and B1
need to be evaluated before C1
.
To find the right order of processing cells, HyperFormula builds a dependency graph (opens new window) which captures relationships between cells.
# Cells in the dependency graph
In the dependency graph, each spreadsheet cell is represented by a separate node.
Nodes X
and Y
are connected by a directed edge if and only if the formula in cell X
includes the address of cell Y
.
# Ranges in the dependency graph
If formulas in the spreadsheet include ranges, each range is represented by a separate node. The dependency graph may also contain ranges that are not used by any formula, for better optimization.
Range nodes can be connected to cell nodes and to other range nodes.
# Optimizations for large ranges
In many applications, you may want to use formulas that depend on a
large range of cells. For example, the formula SUM(A1:A100)+B5
depends on 101 cells, and it needs to be represented in the dependency graph accordingly.
An interesting optimization challenge arises when there are multiple cells that depend on large ranges. For example, consider the following use-case:
B1=SUM(A1:A1)
B2=SUM(A1:A2)
B3=SUM(A1:A3)
- ...
B100=SUM(A1:A100)
The problem is that there are 1+2+3+...+100 = 5050
dependencies
for such a simple situation. In general, for n
such rows, the
engine would need to add n*(n+1)/2 ā nĀ²
arcs in the graph. This
value grows much faster than the size of data, meaning the engine
would not be able to handle large data sets efficiently.
A solution to this problem comes from the observation that there is a way to rewrite the above formulas to equivalent ones, which will be more compact to represent. Specifically, the following formulas would compute the same values as the ones provided previously:
B1=A1
B2=B1+A2
B3=B2+A3
- ...
B100=B99+A100
Whereas this example is too specialized to provide a useful rule for optimization, it shows the main idea behind efficient handling of multiple ranges: to represent a range as a composition of smaller ranges.
In the adopted implementation, every time the engine encounters a
range, say B5:D20
, it checks if it has already considered the
range which is one row shorter. In this example, it would be B5:D19
.
If so, then it represents B5:D20
as the composition of a range
B5:D19
and three cells in the last row: B20
,C20
and D20
.
More generally, the result of any associative operation is obtained
as the result of operations for these small rows. There are many
examples of such associative functions: SUM
, MAX
, COUNT
, etc.
As one range can be used in different formulas, we can reuse its
node and avoid duplicating the work during computation.
# Getting the immediate precedents of a cell or a range
To get the immediate precedents of a cell or a range (the in-neighbors of the cell node or the range node), use the getCellPrecedents()
method:
const hfInstance = HyperFormula.buildFromArray([[ '1', '2', '=A1', '=B1+C1' ]]);
hfInstance.getCellPrecedents({ sheet: 0, col: 3, row: 0 });
// returns [{ sheet: 0, col: 1, row: 0 }, { sheet: 0, col: 2, row: 0 }]
# Getting the immediate dependents of a cell or a range
To get the immediate dependents of a cell or a range (the out-neighbors of the cell node or the range node), use the getCellDependents()
method:
const hfInstance = HyperFormula.buildFromArray([[ '1', '=A1', '=A1+B1', '=B1+C1' ]])
hfInstance.getCellDependents({ sheet: 0, col: 0, row: 0 })
// returns [{ sheet: 0, col: 1, row: 0 }, { sheet: 0, col: 2, row: 0 }]
# Getting all precedents of a cell or a range
To get all precedents of a cell or a range (all precedent nodes reachable from the cell node or the range node), use the getCellPrecedents()
method to implement a Breadth-first search (BFS) (opens new window) algorithm:
AllCellPrecedents={start}
let Q be an empty queue
Q.enqueue(start)
while Q is not empty do
cell := Q.dequeue()
S := getCellPrecedents(cell)
for all cells c in S do:
if c is not in AllCellPrecedents then:
insert w to AllCellPrecedents
Q.enqueue(c)
# Getting all dependents of a cell or a range
To get all dependents of a cell or a range (all dependent nodes reachable from the cell node or the range node), use the getCellDependents()
method to implement a Breadth-first search (BFS) (opens new window) algorithm:
AllCellDependents={start}
let Q be an empty queue
Q.enqueue(start)
while Q is not empty do
cell := Q.dequeue()
S := getCellDependents(cell)
for all cells c in S do:
if c is not in AllCellDependents then:
insert w to AllCellDependents
Q.enqueue(c)
ā Key concepts Building & testing ā