Spark - Neighbor Discovery¶
Introduction¶
Spark is the neighbor discovery module of Open/R. It uses IPv6 link-local
multicast over a UDP socket to discover and maintain adjacencies, aka “Neighbor
Relationships”. The discovered neighbors, aka the “Local Topology” of the node,
are fed into the system for KvStore database synchronization and SPF computation.
Inter Module Communication¶
[Producer] ReplicateQueue<NeighborInitEvent>: sends out neighbor events via NeighborUpdatesQueue to LinkMonitor, which includes UP/DOWN/RESTART/RTT-CHANGE events or the InitializationEvent NEIGHBOR_DISCOVERED.
[Consumer] RQueue<thrift::InterfaceDatabase>: receives interface database updates via InterfaceUpdatesQueue from LinkMonitor. Neighbor discovery is applied on those interfaces ONLY.
Operations¶
Spark relies on LinkMonitor for interface and address notifications from the
underlying system. Neighbor states and timers then decide which types of packets
are sent periodically. See the Deep Dive section for details. Upon a link event,
the corresponding interface is updated inside the database:
Interface UP: start performing neighbor discovery and record neighbor state;
Interface DOWN: remove tracked neighbor over this interface and generate neighbor down notification;
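The two link events above can be sketched as a small per-interface neighbor table. This is only an illustration of the described behavior, not Open/R code: the class `NeighborTracker`, its method names, and the `("DOWN", ifName, neighbor)` tuple shape are all hypothetical.

```python
from typing import Dict, List, Set, Tuple


class NeighborTracker:
    """Hypothetical per-interface neighbor table (illustration only)."""

    def __init__(self) -> None:
        # interface name -> names of neighbors discovered on it
        self._neighbors: Dict[str, Set[str]] = {}

    def interface_up(self, if_name: str) -> None:
        # Interface UP: start tracking neighbors on this interface.
        self._neighbors.setdefault(if_name, set())

    def neighbor_seen(self, if_name: str, neighbor: str) -> bool:
        # Neighbors are only recorded on interfaces that are being tracked.
        if if_name not in self._neighbors:
            return False
        self._neighbors[if_name].add(neighbor)
        return True

    def interface_down(self, if_name: str) -> List[Tuple[str, str, str]]:
        # Interface DOWN: drop every tracked neighbor on this interface and
        # return one "DOWN" notification per removed neighbor.
        removed = self._neighbors.pop(if_name, set())
        return [("DOWN", if_name, n) for n in sorted(removed)]
```

Returning the notifications from `interface_down` keeps the sketch free of any queue type; in a real system they would be pushed to whatever consumes neighbor events.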
Areas¶
Spark only forms adjacencies with nodes in the same area. This is configured in
the area stanza of the Open/R config. For example, the following node is in
two areas:
"areas": [
  {
    "area_id": "area1",
    "include_interface_regexes": [
      "eth[0-9]"
    ]
  },
  {
    "area_id": "area2",
    "include_interface_regexes": [
      "eth[0-9]"
    ]
  }
],
Wildcard Area¶
Open/R has a special area that is treated as the wildcard area. This is area “0”. Interfaces configured into this area will form adjacencies with any other node, without validating which area that node claims to be in. Example config:
"areas": [
  {
    "area_id": "0",
    "include_interface_regexes": [
      "eth0"
    ]
  }
]
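The acceptance rule for the wildcard area can be sketched as a single predicate. This is a minimal illustration under stated assumptions, not Open/R's implementation: the names `WILDCARD_AREA` and `accepts_neighbor_area` are hypothetical, and the sketch assumes that only the local interface's area decides whether validation is skipped.

```python
WILDCARD_AREA = "0"  # the wildcard area ID described above


def accepts_neighbor_area(local_area: str, neighbor_area: str) -> bool:
    """Return True if an adjacency may form (hypothetical helper).

    Assumption: an interface configured into the wildcard area does not
    validate the area claimed by the neighbor; any other area requires
    both sides to claim the same area ID.
    """
    if local_area == WILDCARD_AREA:
        return True
    return local_area == neighbor_area
```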
Deep Dive¶
Spark Packet¶
Spark communicates with peer Spark instances by sending a UDP packet to the
link-local multicast address ff02::1. The packet is sent over every configured
interface. The content of the packet evolves as neighboring Spark instances
start to learn about each other.
The content of the packet is a serialized thrift object of type SparkPacket.
It consists of three main messages as its attributes. Only one of them is
populated at a time to reduce control plane traffic.
SparkHelloMsg
SparkHandshakeMsg
SparkHeartbeatMsg
NOTE: By using thrift serialization/deserialization we avoid the encoding and decoding complexity of data exchange. Thrift also provides good backward compatibility as the message structure evolves.
Check out if/Types.thrift for detailed message structure.
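The "only one message populated at a time" rule can be illustrated with plain Python dataclasses standing in for the thrift types. The field names (`hello_msg`, `node_name`, `seq_num`, and so on) and the helper `populated_message` are hypothetical; the real structure is whatever if/Types.thrift defines.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional, Tuple


@dataclass
class SparkHelloMsg:
    node_name: str  # hypothetical field


@dataclass
class SparkHandshakeMsg:
    node_name: str  # hypothetical field


@dataclass
class SparkHeartbeatMsg:
    node_name: str  # hypothetical field
    seq_num: int    # hypothetical field


@dataclass
class SparkPacket:
    # Stand-in for the thrift struct: three optional messages.
    hello_msg: Optional[SparkHelloMsg] = None
    handshake_msg: Optional[SparkHandshakeMsg] = None
    heartbeat_msg: Optional[SparkHeartbeatMsg] = None


def populated_message(packet: SparkPacket) -> Tuple[str, Any]:
    """Return (field name, message) for the single populated message.

    Raises ValueError if zero or more than one message is set.
    """
    present = [
        (f.name, getattr(packet, f.name))
        for f in fields(packet)
        if getattr(packet, f.name) is not None
    ]
    if len(present) != 1:
        raise ValueError(f"expected exactly one message, got {len(present)}")
    return present[0]
```

A receiver could call `populated_message` first and dispatch on the returned field name.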
At a high level:
SparkHelloMsg => Sent out periodically over all configured interfaces. It advertises the neighbors discovered so far and solicits future neighbors.
SparkHandshakeMsg => Negotiation message to establish a neighbor relationship, aka the Adjacency. Similar to the TCP 3-way handshake process. Negotiation includes version, timers, and area configuration, which are discussed below.
SparkHeartbeatMsg => Sent out periodically for keep-alive purposes.
Finite State Machine¶
Spark uses a Finite State Machine (FSM) to model neighbor state and its
transitions on events. The FSM formulation ensures the correctness of state
handling and also greatly simplifies the implementation.
SparkNeighState
- IDLE
- WARM
- NEGOTIATE
- ESTABLISHED
- RESTART
SparkNeighEvent
- HELLO_RCVD_INFO => SparkHelloMsg received with node's self-info inside;
- HELLO_RCVD_NO_INFO => SparkHelloMsg received without node's self-info inside;
- HELLO_RCVD_RESTART => neighbor is going down and signals graceful restart (GR);
- HEARTBEAT_RCVD => keep alive msg received to refresh hold timer;
- HANDSHAKE_RCVD => handshake acknowledgement;
- HEARTBEAT_TIMER_EXPIRE => hold time expired;
- NEGOTIATE_TIMER_EXPIRE => negotiate procedure timed out;
- GR_TIMER_EXPIRE => graceful restart timer expired;
- NEGOTIATION_FAILURE => negotiate procedure failed (e.g. area negotiation failure);
State Transition Map¶
---------------------------------------------------------------------------------------
| EVENTs/STATEs          | IDLE | WARM      | NEGOTIATE   | ESTABLISHED | RESTART     |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HELLO_RCVD_INFO        | WARM | NEGOTIATE |             |             | ESTABLISHED |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HELLO_RCVD_NO_INFO     | WARM |           |             | IDLE        |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HELLO_RCVD_RESTART     |      |           |             | RESTART     |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HEARTBEAT_RCVD         |      |           |             | ESTABLISHED |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HANDSHAKE_RCVD         |      |           | ESTABLISHED |             |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HEARTBEAT_TIMER_EXPIRE |      |           |             | IDLE        |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| NEGOTIATE_TIMER_EXPIRE |      |           | WARM        |             |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| GR_TIMER_EXPIRE        |      |           |             |             | IDLE        |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| NEGOTIATION_FAILURE    |      |           | WARM        |             |             |
---------------------------------------------------------------------------------------
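The transition map above translates directly into a lookup table keyed by (state, event). The sketch below is an illustration in Python rather than the actual Open/R implementation; the names `State`, `Event`, `TRANSITIONS` and `next_state` are chosen here, and returning `None` for a blank cell is an assumption (a real implementation may treat such events as errors or ignore them).

```python
from enum import Enum, auto
from typing import Dict, Optional, Tuple


class State(Enum):
    IDLE = auto()
    WARM = auto()
    NEGOTIATE = auto()
    ESTABLISHED = auto()
    RESTART = auto()


class Event(Enum):
    HELLO_RCVD_INFO = auto()
    HELLO_RCVD_NO_INFO = auto()
    HELLO_RCVD_RESTART = auto()
    HEARTBEAT_RCVD = auto()
    HANDSHAKE_RCVD = auto()
    HEARTBEAT_TIMER_EXPIRE = auto()
    NEGOTIATE_TIMER_EXPIRE = auto()
    GR_TIMER_EXPIRE = auto()
    NEGOTIATION_FAILURE = auto()


# One entry per non-empty cell of the State Transition Map.
TRANSITIONS: Dict[Tuple[State, Event], State] = {
    (State.IDLE, Event.HELLO_RCVD_INFO): State.WARM,
    (State.IDLE, Event.HELLO_RCVD_NO_INFO): State.WARM,
    (State.WARM, Event.HELLO_RCVD_INFO): State.NEGOTIATE,
    (State.NEGOTIATE, Event.HANDSHAKE_RCVD): State.ESTABLISHED,
    (State.NEGOTIATE, Event.NEGOTIATE_TIMER_EXPIRE): State.WARM,
    (State.NEGOTIATE, Event.NEGOTIATION_FAILURE): State.WARM,
    (State.ESTABLISHED, Event.HELLO_RCVD_NO_INFO): State.IDLE,
    (State.ESTABLISHED, Event.HELLO_RCVD_RESTART): State.RESTART,
    (State.ESTABLISHED, Event.HEARTBEAT_RCVD): State.ESTABLISHED,
    (State.ESTABLISHED, Event.HEARTBEAT_TIMER_EXPIRE): State.IDLE,
    (State.RESTART, Event.HELLO_RCVD_INFO): State.ESTABLISHED,
    (State.RESTART, Event.GR_TIMER_EXPIRE): State.IDLE,
}


def next_state(state: State, event: Event) -> Optional[State]:
    """Return the next state, or None if the table has no entry (blank cell)."""
    return TRANSITIONS.get((state, event))
```

For example, a neighbor that is first heard without self-info, then with self-info, and then completes the handshake walks IDLE, WARM, NEGOTIATE, ESTABLISHED.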
SparkHelloMsg¶
SparkHelloMsg contains the node name and the list of neighbors it has heard from on this interface. This allows ALL neighbors on a segment to agree on bidirectional visibility;
Functionality:
To advertise its own existence and basic neighbor information;
To ask for immediate response for quick adjacency establishment;
To notify neighbors of its own “RESTART”;
SparkHelloMsg is sent per interface;
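The bidirectional-visibility check can be sketched as a function that maps a received hello to one of the HELLO_RCVD_* events. This is a simplified illustration: the parameter names are hypothetical rather than actual SparkHelloMsg fields, and giving the restart flag priority over the other two cases is an assumption.

```python
from typing import Iterable


def classify_hello(
    my_node_name: str,
    heard_neighbors: Iterable[str],
    restarting: bool = False,
) -> str:
    """Map a received hello to an FSM event name (hypothetical helper).

    my_node_name:    name of the receiving node
    heard_neighbors: neighbor names the sender reports having heard from
    restarting:      True if the sender announces its own restart
    """
    if restarting:
        return "HELLO_RCVD_RESTART"
    if my_node_name in set(heard_neighbors):
        # The sender has already heard us: visibility is bidirectional.
        return "HELLO_RCVD_INFO"
    return "HELLO_RCVD_NO_INFO"
```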
SparkHandshakeMsg¶
SparkHandshakeMsg contains the rest of the parameters necessary to establish an adjacency with a neighbor, besides what has been learned from SparkHelloMsg;
Functionality:
Exchange peerAddr and port info for the peer to establish a TCP connection;
areaId negotiation;
hold time and GR time negotiation;
SparkHandshakeMsg is sent per (interface, neighbor) combination;
NOTE: SparkHandshakeMsg has a destination node attribute. Neighbors on the same interface will ignore this message if it is NOT destined to them.
SparkHeartbeatMsg¶
SparkHeartbeatMsg contains the node name and a sequence number;
Functionality: notify neighbors of its own aliveness;
SparkHeartbeatMsg is sent per interface;
Timers¶
To keep the state machine running smoothly, several kinds of timers are used.
helloTimer: timer to control the frequency of helloMsg. It is set per ifName;
negotiateTimer: timer to control the frequency of handshakeMsg. It is set per neighbor;
heartbeatTimer: timer to control the frequency of heartbeatMsg. It is set per ifName;
heartbeatHoldTimer: maximum hold time for a neighbor adjacency. SparkHeartbeatMsg will extend it;
negotiateHoldTimer: maximum time within the NEGOTIATE state, to avoid a high volume of negotiate packets being sent;
gracefulRestartHoldTimer: maximum time to hold a neighbor adjacency under GR;
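The hold-timer behavior, where each received heartbeat extends the deadline and expiry corresponds to HEARTBEAT_TIMER_EXPIRE, can be sketched as below. The class `HoldTimer` is hypothetical, and the clock is passed in explicitly to keep the example deterministic; a real implementation would be driven by an event loop.

```python
class HoldTimer:
    """Hypothetical deadline-based hold timer (illustration only)."""

    def __init__(self, hold_time_s: float, now: float) -> None:
        self.hold_time_s = hold_time_s
        self.deadline = now + hold_time_s

    def refresh(self, now: float) -> None:
        # Called when a keep-alive arrives: push the deadline out again.
        self.deadline = now + self.hold_time_s

    def expired(self, now: float) -> bool:
        # True once the hold time has elapsed without a refresh.
        return now >= self.deadline
```

For example, with a 3 second hold time started at t=0, a heartbeat at t=2 moves the deadline from t=3 to t=5.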
For typical configuration of the above timers, please refer to the SparkConfig section
defined in
Area Configuration¶
As area negotiation happens by default between Spark instances, a neighbor adjacency will ONLY be formed if both sides reach agreement on the area.
For instance, nodeA and nodeB negotiate the area over ethernet1.
On nodeA:
AreaConfig = {
  area_id : "1",
  interface_regexes : ["ethernet1", "port-channel.*"],
  neighbor_regexes : ["nodeB"]
}
On nodeB:
AreaConfig = {
  area_id : "1",
  interface_regexes : ["ethernet1", "port-channel.*"],
  neighbor_regexes : ["nodeA"]
}
Both nodes apply the combination of neighbor_regex (i.e. regex for node_name)
and interface_regex (i.e. interface on which the neighbor is discovered) to
identify which area the neighbor should fall into. In the above example, both
nodeA and nodeB conclude that the neighbor should be in area 1. Hence the
negotiation will go through.
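The area lookup described above can be sketched as a regex match over the configured areas. This is a minimal illustration, not the Open/R code: the `AreaConfig` dataclass mirrors the pseudo-config above, `find_area` is a hypothetical helper, and the choices of full-string matching and "first matching area wins" are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AreaConfig:
    area_id: str
    interface_regexes: List[str]
    neighbor_regexes: List[str]


def find_area(
    areas: List[AreaConfig], neighbor_name: str, if_name: str
) -> Optional[str]:
    """Return the area ID a neighbor falls into, or None if nothing matches.

    Assumption: both the interface and the neighbor name must fully match
    at least one regex of the same area, and the first such area is used.
    """
    for area in areas:
        if_ok = any(re.fullmatch(p, if_name) for p in area.interface_regexes)
        nbr_ok = any(re.fullmatch(p, neighbor_name) for p in area.neighbor_regexes)
        if if_ok and nbr_ok:
            return area.area_id
    return None


# nodeA's view of the example above.
node_a_areas = [AreaConfig("1", ["ethernet1", "port-channel.*"], ["nodeB"])]
```

With this configuration, `find_area(node_a_areas, "nodeB", "ethernet1")` yields area `"1"`, while an unknown neighbor name yields `None`, which would correspond to a negotiation failure.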
NOTE: negotiation failure will trigger a state transition from NEGOTIATE to WARM and stop sending SparkHandshakeMsg. See the FSM transition part for details.
RTT Measurement¶
With Spark exchanging multicast packets for neighbor discovery, we can easily
deduce the RTT between neighbors (reflection time). To reduce noise in RTT
measurements we use Kernel Timestamps. To avoid noisy RTT_CHANGED events we
use StepDetector so that small changes in RTT measurements are ignored.
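As a rough illustration of both ideas, the sketch below computes a reflected RTT from four timestamps and filters small changes with a simple threshold. Both parts are assumptions made for illustration: the timestamp parameter names are hypothetical, the formula is the usual reflection calculation (time away from the local node minus time spent inside the neighbor), and `SimpleStepDetector` is a simplified stand-in, not the algorithm of Open/R's StepDetector.

```python
from typing import Optional


def reflected_rtt_us(my_sent: int, nbr_recv: int, nbr_sent: int, my_recv: int) -> int:
    """RTT in microseconds, excluding the neighbor's processing time.

    my_sent / my_recv are local clock readings; nbr_recv / nbr_sent are
    the neighbor's clock readings, so only differences on the same clock
    are used and the two clocks need not be synchronized.
    """
    return (my_recv - my_sent) - (nbr_sent - nbr_recv)


class SimpleStepDetector:
    """Report a sample only if it differs enough from the last reported one."""

    def __init__(self, threshold_pct: float = 10.0) -> None:
        self.threshold_pct = threshold_pct
        self.reported: Optional[int] = None

    def add(self, sample: int) -> Optional[int]:
        # The first sample is always reported.
        if self.reported is None:
            self.reported = sample
            return sample
        # Report only when the change exceeds threshold_pct of the last
        # reported value; smaller fluctuations are ignored.
        if abs(sample - self.reported) * 100.0 > self.threshold_pct * self.reported:
            self.reported = sample
            return sample
        return None
```

With the default 10% threshold, a sequence of 1000, 1050, 1300 microseconds reports 1000 and 1300 and suppresses 1050.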
Fast Neighbor Discovery¶
When a node starts or a new link comes up, we perform fast initial neighbor
discovery by sending SparkHelloMsg with the solicitResponse bit set. This
requests an immediate reply, which allows quicker discovery of new
neighbors (configurable).
References¶
Link-local address: https://en.wikipedia.org/wiki/Link-local_address
Multicast address: https://en.wikipedia.org/wiki/Multicast_address