Spark - Neighbor Discovery¶
Introduction¶
Spark is the neighbor discovery module of Open/R. It uses IPv6 link-local
multicast over a UDP socket to discover and maintain adjacencies, aka “Neighbor
Relationships”. The discovered neighbors, aka the “Local Topology” of the node, are
fed into the system for KvStore database synchronization and SPF computation.
Inter Module Communication¶

[Producer] ReplicateQueue<NeighborInitEvent>: sends out neighbor events via NeighborUpdatesQueue to LinkMonitor, which includes: UP/DOWN/RESTART/RTT-CHANGE events or InitializationEvent NEIGHBOR_DISCOVERED.
[Consumer] RQueue<thrift::InterfaceDatabase>: receives interface database update via InterfaceUpdatesQueue from LinkMonitor. Neighbor discovery will be applied on those interfaces ONLY.
Operations¶
Spark relies on LinkMonitor for interface and address notifications from the
underlying system. Neighbor states and timers then decide which types of packet
are sent periodically. See the Deep Dive section for details. Upon a link event,
the corresponding interface is updated inside the interface database:
Interface UP: start performing neighbor discovery and record neighbor state;
Interface DOWN: remove the neighbors tracked over this interface and generate a neighbor down notification.
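The interface handling above can be sketched roughly as follows. This is a minimal Python illustration, not Open/R's C++ implementation; `InterfaceTracker`, its method names, and the `NEIGHBOR_DOWN` event tuple are hypothetical.

```python
from collections import defaultdict


class InterfaceTracker:
    """Tracks neighbors per interface (hypothetical helper, not Open/R code)."""

    def __init__(self):
        self.active = set()                 # interfaces with discovery running
        self.neighbors = defaultdict(set)   # ifName -> neighbor node names
        self.events = []                    # emitted (event, ifName, neighbor)

    def interface_up(self, if_name):
        # Start discovery; neighbors are recorded as they are heard.
        self.active.add(if_name)

    def neighbor_seen(self, if_name, node_name):
        # Record neighbor state only on interfaces where discovery is active.
        if if_name in self.active:
            self.neighbors[if_name].add(node_name)

    def interface_down(self, if_name):
        # Drop every neighbor tracked over this interface and notify for each.
        self.active.discard(if_name)
        for node_name in sorted(self.neighbors.pop(if_name, set())):
            self.events.append(("NEIGHBOR_DOWN", if_name, node_name))
```

Neighbors heard on an interface that was never reported UP are ignored, which mirrors the rule that discovery applies only to interfaces received from LinkMonitor.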
Areas¶
Spark only forms adjacencies with nodes in the same area. This is configured in
the area stanza of the Open/R config. For example, the following node is in
two areas:
"areas": [
  {
    "area_id": "area1",
    "include_interface_regexes": [
      "eth[0-9]"
    ]
  },
  {
    "area_id": "area2",
    "include_interface_regexes": [
      "eth[0-9]"
    ]
  }
],
Wildcard Area¶
Open/R has a special area that is treated as the wildcard area. This is area “0”. Interfaces configured into this area will form adjacencies with any other node, without validating which area that node claims to be in. Example config:
"areas": [
  {
    "area_id": "0",
    "include_interface_regexes": [
      "eth0"
    ]
  }
]
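The wildcard rule can be illustrated with a small check. This is a sketch under assumptions: `area_accepted` is a hypothetical helper, and the case where only the remote side sits in the wildcard area is not described here, so the sketch simply treats it as an ordinary comparison.

```python
WILDCARD_AREA = "0"  # the wildcard area ID named in the text


def area_accepted(local_area: str, claimed_area: str) -> bool:
    """Return True if an interface in local_area may peer with a node
    that claims to be in claimed_area (hypothetical helper)."""
    if local_area == WILDCARD_AREA:
        # Wildcard: the neighbor's claimed area is not validated.
        return True
    # Assumption: outside the wildcard area, both sides must name the same area.
    return local_area == claimed_area
```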
Deep Dive¶
Spark Packet¶
Spark communicates with peer Spark instances by sending a UDP packet to the
link-local multicast address ff02::1. The packet is sent over every configured
interface. The content of the packet evolves as neighboring Spark instances
start to learn about each other.
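For readers unfamiliar with link-local multicast, the following Python sketch shows how a UDP socket can join ff02::1 on one interface and send to it. It illustrates the socket mechanics only and is not Open/R's code; the port number and all function names are assumptions.

```python
import socket
import struct

MCAST_ADDR = "ff02::1"   # link-local all-nodes group, from the text
UDP_PORT = 6666          # assumption: illustrative port, check your own config


def build_mreq(group: str, if_index: int) -> bytes:
    """Pack an ipv6_mreq: 16-byte group address followed by the interface index."""
    return socket.inet_pton(socket.AF_INET6, group) + struct.pack("@I", if_index)


def open_mcast_socket(if_name: str) -> socket.socket:
    """Open a UDP socket that receives the group's traffic on one interface."""
    if_index = socket.if_nametoindex(if_name)
    sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("::", UDP_PORT))
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_JOIN_GROUP,
                    build_mreq(MCAST_ADDR, if_index))
    # Outgoing multicast leaves through the same interface.
    sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_MULTICAST_IF, if_index)
    return sock


def send_packet(sock: socket.socket, if_name: str, payload: bytes) -> int:
    # The scope id (4th tuple element) pins the link-local destination to an interface.
    dest = (MCAST_ADDR, UDP_PORT, 0, socket.if_nametoindex(if_name))
    return sock.sendto(payload, dest)
```

Because the destination is link-local, the interface must always be named explicitly, which is why one such socket setup is needed per configured interface.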
The content of the packet is a serialized thrift object of type SparkPacket.
This internally consists of three main messages as its attributes. There will be
only one of them populated at a time to reduce control plane traffic.
SparkHelloMsg
SparkHandshakeMsg
SparkHeartbeatMsg
NOTE: By using thrift serialization/deserialization we completely avoid the encoding and decoding complexity of data exchange. Thrift further provides a good backward compatibility support as message structure evolves.
Check out if/Types.thrift for detailed message structure.
At a high level:
SparkHelloMsg => Sent out periodically over all configured interfaces. It advertises the neighbors discovered so far and solicits future neighbors.
SparkHandshakeMsg => Negotiation message to establish a neighbor relationship, aka the Adjacency. Similar to the TCP 3-way handshake process. Negotiation includes version, timers, and area configuration, discussed below.
SparkHeartbeatMsg => Sent out periodically for keep-alive purposes.
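The "only one message populated at a time" rule can be pictured with plain Python dataclasses. This is a sketch, not the thrift definition: the class and field names are simplified stand-ins, and the real structure lives in if/Types.thrift.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HelloMsg:        # stand-in for SparkHelloMsg
    node_name: str


@dataclass
class HandshakeMsg:    # stand-in for SparkHandshakeMsg
    node_name: str


@dataclass
class HeartbeatMsg:    # stand-in for SparkHeartbeatMsg
    node_name: str


@dataclass
class Packet:          # stand-in for SparkPacket
    hello: Optional[HelloMsg] = None
    handshake: Optional[HandshakeMsg] = None
    heartbeat: Optional[HeartbeatMsg] = None

    def kind(self) -> str:
        """Name the single populated message; reject empty or mixed packets."""
        present = [name for name in ("hello", "handshake", "heartbeat")
                   if getattr(self, name) is not None]
        if len(present) != 1:
            raise ValueError(f"expected exactly one message, got {present}")
        return present[0]
```

A receiver can dispatch on `kind()` and treat anything else as malformed; whether the real implementation rejects such packets is not stated here.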
Finite State Machine¶

Spark uses a Finite State Machine (FSM) to model neighbor state and its
transitions on events. The FSM formulation ensures the correctness of state
handling and also greatly simplifies the implementation.
SparkNeighState
- IDLE
- WARM
- NEGOTIATE
- ESTABLISHED
- RESTART
SparkNeighEvent
- HELLO_RCVD_INFO => SparkHelloMsg received with node's self-info inside
- HELLO_RCVD_NO_INFO => SparkHelloMsg received without node's self-info inside;
- HELLO_RCVD_RESTART => neighbor is going down and signals graceful restart (GR);
- HEARTBEAT_RCVD => keep alive msg received to refresh hold timer;
- HANDSHAKE_RCVD => handshake acknowledgement;
- HEARTBEAT_TIMER_EXPIRE => hold time expired;
- NEGOTIATE_TIMER_EXPIRE => negotiate procedure timed out;
- GR_TIMER_EXPIRE => graceful restart timer expired;
- NEGOTIATION_FAILURE => negotiate procedure failed (e.g. area negotiation failure);
State Transition Map¶
---------------------------------------------------------------------------------------
| EVENTs/STATEs          | IDLE | WARM      | NEGOTIATE   | ESTABLISHED | RESTART     |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HELLO_RCVD_INFO        | WARM | NEGOTIATE |             |             | ESTABLISHED |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HELLO_RCVD_NO_INFO     | WARM |           |             | IDLE        |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HELLO_RCVD_RESTART     |      |           |             | RESTART     |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HEARTBEAT_RCVD         |      |           |             | ESTABLISHED |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HANDSHAKE_RCVD         |      |           | ESTABLISHED |             |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HEARTBEAT_TIMER_EXPIRE |      |           |             | IDLE        |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| NEGOTIATE_TIMER_EXPIRE |      |           | WARM        |             |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| GR_TIMER_EXPIRE        |      |           |             |             | IDLE        |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| NEGOTIATION_FAILURE    |      |           | WARM        |             |             |
---------------------------------------------------------------------------------------
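The transition map translates directly into a lookup table. The Python sketch below encodes the twelve populated cells. Treating an empty cell as "stay in the current state" is an assumption of this sketch, since the table does not say how undefined combinations are handled.

```python
from enum import Enum

State = Enum("State", ["IDLE", "WARM", "NEGOTIATE", "ESTABLISHED", "RESTART"])
Event = Enum("Event", [
    "HELLO_RCVD_INFO", "HELLO_RCVD_NO_INFO", "HELLO_RCVD_RESTART",
    "HEARTBEAT_RCVD", "HANDSHAKE_RCVD", "HEARTBEAT_TIMER_EXPIRE",
    "NEGOTIATE_TIMER_EXPIRE", "GR_TIMER_EXPIRE", "NEGOTIATION_FAILURE",
])

# (current state, event) -> next state, one entry per non-empty table cell.
TRANSITIONS = {
    (State.IDLE, Event.HELLO_RCVD_INFO): State.WARM,
    (State.WARM, Event.HELLO_RCVD_INFO): State.NEGOTIATE,
    (State.RESTART, Event.HELLO_RCVD_INFO): State.ESTABLISHED,
    (State.IDLE, Event.HELLO_RCVD_NO_INFO): State.WARM,
    (State.ESTABLISHED, Event.HELLO_RCVD_NO_INFO): State.IDLE,
    (State.ESTABLISHED, Event.HELLO_RCVD_RESTART): State.RESTART,
    (State.ESTABLISHED, Event.HEARTBEAT_RCVD): State.ESTABLISHED,
    (State.NEGOTIATE, Event.HANDSHAKE_RCVD): State.ESTABLISHED,
    (State.ESTABLISHED, Event.HEARTBEAT_TIMER_EXPIRE): State.IDLE,
    (State.NEGOTIATE, Event.NEGOTIATE_TIMER_EXPIRE): State.WARM,
    (State.RESTART, Event.GR_TIMER_EXPIRE): State.IDLE,
    (State.NEGOTIATE, Event.NEGOTIATION_FAILURE): State.WARM,
}


def next_state(state: State, event: Event) -> State:
    # Assumption: combinations with an empty cell leave the state unchanged.
    return TRANSITIONS.get((state, event), state)
```

For example, a neighbor that is first heard with our own name in its hello walks IDLE, WARM, NEGOTIATE, and then ESTABLISHED once the handshake arrives.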
SparkHelloMsg¶
SparkHelloMsg contains the node name and the list of neighbors it has heard from on this interface. This allows ALL neighbors on a segment to agree on bidirectional visibility.
Functionality:
To advertise its own existence and basic neighbor information;
To ask for an immediate response for quick adjacency establishment;
To notify neighbors of its own “RESTART”;
SparkHelloMsg is sent per interface.
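The bidirectional-visibility check maps a received hello onto the HELLO_RCVD_* events listed earlier. The following Python sketch shows one way to express it. The `Hello` fields are hypothetical stand-ins for the thrift attributes, and checking the restart flag first is an assumption.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Hello:  # hypothetical stand-in for SparkHelloMsg
    node_name: str
    heard_neighbors: Set[str] = field(default_factory=set)
    restarting: bool = False


def classify_hello(my_name: str, msg: Hello) -> str:
    """Map a received hello to the FSM event name it should raise."""
    if msg.restarting:
        # The sender announces its own restart.
        return "HELLO_RCVD_RESTART"
    if my_name in msg.heard_neighbors:
        # The sender has heard us: visibility is bidirectional.
        return "HELLO_RCVD_INFO"
    # The sender is visible to us, but has not heard us yet.
    return "HELLO_RCVD_NO_INFO"
```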
SparkHandshakeMsg¶
SparkHandshakeMsg contains the rest of the parameters necessary to establish an adjacency with a neighbor, besides what has been learned from SparkHelloMsg.
Functionality:
Exchange peerAddr and port info for the peer to establish a TCP connection;
areaId negotiation;
hold time and GR time negotiation;
SparkHandshakeMsg is sent per (interface, neighbor) combination.
NOTE: SparkHandshakeMsg has a destination node attribute. Neighbors on the same interface will ignore this message if it is NOT destined to them.
SparkHeartbeatMsg¶
SparkHeartbeatMsg contains the node name and a sequence number.
Functionality: notify its own aliveness;
SparkHeartbeatMsg is sent per interface.
Timers¶
To keep the state machine running smoothly, Spark uses several kinds of timers:
helloTimer: timer to control the frequency of helloMsg. It is set per ifName;
negotiateTimer: timer to control the frequency of handshakeMsg. It is set per neighbor;
heartbeatTimer: timer to control the frequency of heartbeatMsg. It is set per ifName;
heartbeatHoldTimer: maximum hold time for a neighbor adjacency. SparkHeartbeatMsg will extend it;
negotiateHoldTimer: maximum time within the NEGOTIATE state, to avoid a high volume of negotiate packets being sent;
gracefulRestartHoldTimer: maximum time to hold a neighbor adjacency under GR.
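To illustrate how a hold timer such as heartbeatHoldTimer interacts with incoming heartbeats, here is a minimal deadline-based sketch in Python. `HoldTimer` is a hypothetical class, not Open/R's timer implementation, and the injectable clock exists only to make the behavior easy to test.

```python
import time


class HoldTimer:
    """Deadline that is pushed out on every refresh (hypothetical sketch)."""

    def __init__(self, hold_time_s: float, clock=time.monotonic):
        self._hold = hold_time_s
        self._clock = clock
        self._deadline = clock() + hold_time_s

    def refresh(self) -> None:
        # Called when a keep-alive arrives: extend the hold time from now.
        self._deadline = self._clock() + self._hold

    def expired(self) -> bool:
        # True once the hold time has elapsed without a refresh.
        return self._clock() >= self._deadline
```

In FSM terms, a refresh corresponds to HEARTBEAT_RCVD and expiry to HEARTBEAT_TIMER_EXPIRE.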
For typical configuration of above timer, please refer to SparkConfig section
defined in
Area Configuration¶
Area negotiation happens by default between Spark instances, so a neighbor adjacency will ONLY be formed if both sides reach agreement on the area.
For instance, nodeA and nodeB negotiate an area over ethernet1.
On nodeA:
AreaConfig = {
  area_id : "1",
  interface_regexes : ["ethernet1", "port-channel.*"],
  neighbor_regexes : ["nodeB"]
}
On nodeB:
AreaConfig = {
  area_id : "1",
  interface_regexes : ["ethernet1", "port-channel.*"],
  neighbor_regexes : ["nodeA"]
}
Both nodes apply the combination of neighbor_regexes (i.e. regexes for the
neighbor's node_name) and interface_regexes (i.e. regexes for the interface on
which the neighbor is discovered) to identify which area the neighbor should
fall into. In the example above, both nodeA and nodeB conclude that the neighbor
belongs in area 1, so the negotiation succeeds.
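The area derivation described above can be sketched in Python as follows. This is an illustration under assumptions: the class and function names are hypothetical, regexes are applied as full matches, and the first matching area wins. None of these details are confirmed by this section.

```python
import re
from typing import List, Optional


class AreaConfig:
    """Hypothetical stand-in for one area stanza."""

    def __init__(self, area_id: str, interface_regexes: List[str],
                 neighbor_regexes: List[str]):
        self.area_id = area_id
        self._if_res = [re.compile(p) for p in interface_regexes]
        self._nbr_res = [re.compile(p) for p in neighbor_regexes]

    def matches(self, if_name: str, neighbor_name: str) -> bool:
        # Both the interface and the neighbor name must match some regex.
        return (any(r.fullmatch(if_name) for r in self._if_res)
                and any(r.fullmatch(neighbor_name) for r in self._nbr_res))


def derive_area(areas: List[AreaConfig], if_name: str,
                neighbor_name: str) -> Optional[str]:
    """Return the area the neighbor falls into, or None if no area matches."""
    for area in areas:
        if area.matches(if_name, neighbor_name):
            return area.area_id
    return None


def negotiation_succeeds(area_a: Optional[str], area_b: Optional[str]) -> bool:
    # Both sides must derive an area, and the two must agree.
    return area_a is not None and area_a == area_b
```

With the nodeA/nodeB example, each side derives area "1" for the other over ethernet1, so `negotiation_succeeds("1", "1")` holds. A neighbor seen on an interface that matches no regex yields `None` and no adjacency.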
NOTE: negotiation failure will trigger a state transition from
NEGOTIATE to WARM and stop the sending of SparkHandshakeMsg. See the State Transition Map for details.
RTT Measurement¶
With spark exchanging multicast packets for neighbor discovery we can easily
deduce the RTT between neighbors (reflection time). To reduce noise in RTT
measurements we use Kernel Timestamps. To avoid noisy RTT_CHANGED events we
use StepDetector so that small changes in RTT measurements are ignored.
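As a rough illustration of the two ideas above, the Python sketch below computes an RTT from four timestamps and suppresses small changes. Both parts are assumptions: the four-timestamp formula is the usual way to subtract the neighbor's turnaround time, and `ThresholdFilter` is a deliberately simplified stand-in, not the StepDetector algorithm.

```python
from typing import Optional


def compute_rtt_us(t1_sent: int, t2_nbr_rcvd: int,
                   t3_nbr_sent: int, t4_rcvd: int) -> int:
    """RTT = local elapsed time minus the time the neighbor held the packet.

    t1/t4 are read from the local clock and t2/t3 from the neighbor's clock,
    so the two clocks do not need to be synchronized.
    """
    return (t4_rcvd - t1_sent) - (t3_nbr_sent - t2_nbr_rcvd)


class ThresholdFilter:
    """Report an RTT only when it moves far enough from the last reported one."""

    def __init__(self, min_step_us: int):
        self.min_step_us = min_step_us
        self.reported: Optional[int] = None

    def update(self, rtt_us: int) -> Optional[int]:
        # Return the new RTT if it should be reported, else None.
        if self.reported is None or abs(rtt_us - self.reported) >= self.min_step_us:
            self.reported = rtt_us
            return rtt_us
        return None
```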
Fast Neighbor Discovery¶
When a node starts or a new link comes up, we perform fast initial neighbor
discovery by sending SparkHelloMsg with the solicitResponse bit set. This
requests an immediate reply, which allows quicker discovery of new
neighbors (configurable).
References¶
Link-local address: https://en.wikipedia.org/wiki/Link-local_address
Multicast address: https://en.wikipedia.org/wiki/Multicast_address