Spark - Neighbor Discovery

Introduction


Spark is the neighbor discovery module of Open/R. It uses IPv6 link-local multicast over a UDP socket to discover and maintain adjacencies, aka “Neighbor Relationships”. The discovered neighbors, aka the “Local Topology” of the node, are fed into the system for KvStore database synchronization and SPF computation.

Inter Module Communication


[Figure: Spark Intermodule Communication]

  • [Producer] ReplicateQueue<NeighborInitEvent>: publishes neighbor events to LinkMonitor via NeighborUpdatesQueue. Events include UP, DOWN, RESTART and RTT-CHANGE, as well as the InitializationEvent NEIGHBOR_DISCOVERED.

  • [Consumer] RQueue<thrift::InterfaceDatabase>: receives interface database updates from LinkMonitor via InterfaceUpdatesQueue. Neighbor discovery is performed on those interfaces ONLY.

Operations


Spark relies on LinkMonitor for interface and address notifications from the underlying system. Neighbor states and timers then decide which types of packets are sent periodically. See Deep Dive for details. Upon a link event, the corresponding interface is updated in Spark's interface database:

  • Interface UP: start performing neighbor discovery and record neighbor state;

  • Interface DOWN: remove every neighbor tracked over this interface and generate a neighbor-down notification for each;
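
The two cases above can be sketched as follows. This is a minimal illustration, not Spark's actual implementation: the class name `NeighborTracker`, its fields, and the `NEIGHBOR_DOWN` tuple format are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NeighborTracker:
    """Hypothetical bookkeeping: interface name -> set of neighbor names."""

    tracked: dict = field(default_factory=dict)
    notifications: list = field(default_factory=list)

    def on_interface_update(self, if_name: str, is_up: bool) -> None:
        if is_up:
            # Interface UP: start tracking; discovery would begin here.
            self.tracked.setdefault(if_name, set())
        else:
            # Interface DOWN: drop every neighbor seen on it and notify.
            for neighbor in sorted(self.tracked.pop(if_name, set())):
                self.notifications.append(("NEIGHBOR_DOWN", if_name, neighbor))


tracker = NeighborTracker()
tracker.on_interface_update("eth0", is_up=True)
tracker.tracked["eth0"].add("nodeB")  # pretend nodeB was discovered on eth0
tracker.on_interface_update("eth0", is_up=False)
```

After the last call, `tracker.notifications` holds one down notification for nodeB and eth0 is no longer tracked.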

Areas

Spark only forms adjacencies with nodes in the same area. This is configured in the area stanza of the Open/R config. For example, the following node is in two areas:

"areas": [
  {
    "area_id": "area1",
    "include_interface_regexes": [
      "eth[0-9]"
    ]
  },
  {
    "area_id": "area2",
    "include_interface_regexes": [
      "eth[0-9]"
    ]
  }
],

Wildcard Area

Open/R treats one special area, area “0”, as the wildcard area. Interfaces configured into this area will form adjacencies with any other node, without validating which area that node claims to be in. Example config:

"areas": [
  {
    "area_id": "0",
    "include_interface_regexes": [
      "eth0"
    ]
  }
]
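
The wildcard rule can be expressed as a small predicate. This is a sketch under the assumption that validation reduces to comparing area IDs; the function and constant names are hypothetical.

```python
WILDCARD_AREA_ID = "0"  # the wildcard area described above


def accepts_neighbor_area(local_area_id: str, claimed_area_id: str) -> bool:
    """Return True if an interface in local_area_id may peer with a
    neighbor that claims to be in claimed_area_id."""
    if local_area_id == WILDCARD_AREA_ID:
        # Wildcard area: the neighbor's claimed area is not validated.
        return True
    return local_area_id == claimed_area_id
```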

Deep Dive


Spark Packet

Spark communicates with peer Spark instances by sending a UDP packet to the link-local multicast address ff02::1. The packet is sent over every configured interface. The content of the packet evolves as neighboring Spark instances start to learn about each other.

The content of the packet is a serialized thrift object of type SparkPacket. It has three main messages as its attributes. Only one of them is populated at a time, to reduce control plane traffic.

  • SparkHelloMsg

  • SparkHandshakeMsg

  • SparkHeartbeatMsg

NOTE: By using thrift serialization/deserialization we avoid hand-written encoding and decoding logic for data exchange. Thrift also provides good backward compatibility as the message structure evolves.

Check out if/Types.thrift for detailed message structure.

At a high level:

  • SparkHelloMsg => Sent out periodically over all configured interfaces. It advertises the neighbors discovered so far and solicits new ones.

  • SparkHandshakeMsg => Negotiation message to establish a neighbor relationship, aka the Adjacency. The process is similar to the TCP 3-way handshake. Negotiation covers version, timers, and area configuration, which we’ll discuss below.

  • SparkHeartbeatMsg => Sent out periodically for keep-alive purposes.
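
The "only one message populated at a time" rule can be illustrated with a plain Python stand-in for the thrift object. The class name `SparkPacketSketch` and its field names are hypothetical; see if/Types.thrift for the real structure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SparkPacketSketch:
    """Stand-in for SparkPacket: three optional messages, one populated."""

    hello_msg: Optional[dict] = None
    handshake_msg: Optional[dict] = None
    heartbeat_msg: Optional[dict] = None

    def populated(self) -> list:
        # Names of the fields that carry a message.
        return [name for name, value in vars(self).items() if value is not None]

    def kind(self) -> str:
        # Reject packets that carry zero or several messages.
        names = self.populated()
        if len(names) != 1:
            raise ValueError(f"expected exactly one message, got {names}")
        return names[0]


packet = SparkPacketSketch(heartbeat_msg={"node_name": "nodeA", "seq_num": 1})
message_kind = packet.kind()
```

A receiver can dispatch on `message_kind` and treat anything else as malformed.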

Finite State Machine

[Figure: State Transition Diagram]

Spark uses a Finite State Machine (FSM) to model neighbor state and its transitions on events. The FSM formulation ensures that state handling is correct and also greatly simplifies the implementation.

SparkNeighState
- IDLE
- WARM
- NEGOTIATE
- ESTABLISHED
- RESTART

SparkNeighEvent
- HELLO_RCVD_INFO           => SparkHelloMsg received with node's self-info inside;
- HELLO_RCVD_NO_INFO        => SparkHelloMsg received without node's self-info inside;
- HELLO_RCVD_RESTART        => neighbor is going down and signals for graceful restart (GR);
- HEARTBEAT_RCVD            => keep-alive msg received to refresh hold timer;
- HANDSHAKE_RCVD            => handshake acknowledgement;
- HEARTBEAT_TIMER_EXPIRE    => hold time expired;
- NEGOTIATE_TIMER_EXPIRE    => negotiate procedure timed out;
- GR_TIMER_EXPIRE           => graceful restart timer expired;
- NEGOTIATION_FAILURE       => negotiate procedure failed (e.g. area negotiation failure);

State Transition Map

---------------------------------------------------------------------------------------
| EVENTs/STATEs          | IDLE |   WARM    | NEGOTIATE   | ESTABLISHED | RESTART     |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HELLO_RCVD_INFO        | WARM | NEGOTIATE |             |             | ESTABLISHED |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HELLO_RCVD_NO_INFO     | WARM |           |             | IDLE        |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HELLO_RCVD_RESTART     |      |           |             | RESTART     |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HEARTBEAT_RCVD         |      |           |             | ESTABLISHED |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HANDSHAKE_RCVD         |      |           | ESTABLISHED |             |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| HEARTBEAT_TIMER_EXPIRE |      |           |             | IDLE        |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| NEGOTIATE_TIMER_EXPIRE |      |           | WARM        |             |             |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| GR_TIMER_EXPIRE        |      |           |             |             | IDLE        |
| ---------------------- | ---- | --------- | ----------- | ----------- | ----------- |
| NEGOTIATION_FAILURE    |      |           | WARM        |             |             |
---------------------------------------------------------------------------------------
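
The transition map above can be encoded as a lookup table keyed by (state, event). This is a sketch for illustration only: the names `TRANSITIONS` and `next_state` are hypothetical, and the table does not say how empty cells are handled, so the sketch simply returns `None` for them.

```python
from typing import Optional

# (current state, event) -> next state, copied from the table above.
TRANSITIONS = {
    ("IDLE", "HELLO_RCVD_INFO"): "WARM",
    ("IDLE", "HELLO_RCVD_NO_INFO"): "WARM",
    ("WARM", "HELLO_RCVD_INFO"): "NEGOTIATE",
    ("NEGOTIATE", "HANDSHAKE_RCVD"): "ESTABLISHED",
    ("NEGOTIATE", "NEGOTIATE_TIMER_EXPIRE"): "WARM",
    ("NEGOTIATE", "NEGOTIATION_FAILURE"): "WARM",
    ("ESTABLISHED", "HELLO_RCVD_NO_INFO"): "IDLE",
    ("ESTABLISHED", "HELLO_RCVD_RESTART"): "RESTART",
    ("ESTABLISHED", "HEARTBEAT_RCVD"): "ESTABLISHED",
    ("ESTABLISHED", "HEARTBEAT_TIMER_EXPIRE"): "IDLE",
    ("RESTART", "HELLO_RCVD_INFO"): "ESTABLISHED",
    ("RESTART", "GR_TIMER_EXPIRE"): "IDLE",
}


def next_state(state: str, event: str) -> Optional[str]:
    """Return the next state, or None where the table has an empty cell."""
    return TRANSITIONS.get((state, event))


# Walk a neighbor from first contact to an established adjacency.
state = "IDLE"
for event in ("HELLO_RCVD_NO_INFO", "HELLO_RCVD_INFO", "HANDSHAKE_RCVD"):
    state = next_state(state, event)
```

The walk visits WARM and NEGOTIATE before ending in ESTABLISHED.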

SparkHelloMsg

  • SparkHelloMsg contains the node name and the list of neighbors the node has heard from on this interface. This allows ALL neighbors on a segment to agree on bidirectional visibility;

  • Functionality:

    1. To advertise its own existence and basic neighbor information;

    2. To ask for immediate response for quick adjacency establishment;

    3. To notify neighbors of its own “RESTART”;

  • SparkHelloMsg is sent per interface;
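
How a received hello maps onto the three HELLO_RCVD_* events can be sketched as below. The function name and parameters are hypothetical, and checking the restart flag first is an assumption of this sketch rather than documented behavior.

```python
def classify_hello(my_name: str, heard_neighbors: list, restarting: bool = False) -> str:
    """Map a received SparkHelloMsg onto an FSM event name.

    heard_neighbors is the list of node names the sender has heard from
    on this interface; restarting is the sender's restart signal.
    """
    if restarting:
        return "HELLO_RCVD_RESTART"
    if my_name in heard_neighbors:
        # The sender has already seen us: visibility is bidirectional.
        return "HELLO_RCVD_INFO"
    return "HELLO_RCVD_NO_INFO"
```

For example, `classify_hello("nodeA", ["nodeA", "nodeC"])` yields `HELLO_RCVD_INFO`, because the sender lists nodeA among the neighbors it has heard from.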

SparkHandshakeMsg

  • SparkHandshakeMsg contains the remaining parameters needed to establish an adjacency with a neighbor, beyond what has been learned from SparkHelloMsg;

  • Functionality:

    1. Exchange peerAddr and port info for peer to establish TCP connection;

    2. areaId negotiation;

    3. hold time and GR time negotiation;

  • SparkHandshakeMsg is sent per (interface, neighbor) combination;

NOTE: SparkHandshakeMsg has a destination node attribute. A neighbor on the same interface ignores the message if it is NOT destined for that neighbor.

SparkHeartbeatMsg

  • SparkHeartbeatMsg contains the node name and a sequence number;

  • Functionality: signal that the node is still alive;

  • SparkHeartbeatMsg is sent per interface;

Timers

To keep the state machine running smoothly, Spark uses several kinds of timers.

  1. helloTimer: timer to control frequency of helloMsg. It is set per ifName;

  2. negotiateTimer: timer to control frequency of handshakeMsg. It is set per neighbor;

  3. heartbeatTimer: timer to control frequency of heartbeatMsg. It is set per ifName;

  4. heartbeatHoldTimer: maximum hold time for neighbor adjacency. SparkHeartbeatMsg will extend it;

  5. negotiateHoldTimer: maximum time within NEGOTIATE state to avoid high volume of negotiate packets being sent;

  6. gracefulRestartHoldTimer: maximum time to hold neighbor adjacency under GR;
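
The interplay between periodic heartbeats and heartbeatHoldTimer can be sketched with an explicit clock. The class `HoldTimer` is hypothetical and takes timestamps as arguments so the example stays deterministic; the real timers are driven by the event loop.

```python
class HoldTimer:
    """Deadline-based hold timer; times are in seconds."""

    def __init__(self, hold_time_s: float, now: float) -> None:
        self.hold_time_s = hold_time_s
        self.deadline = now + hold_time_s

    def refresh(self, now: float) -> None:
        # Called when a SparkHeartbeatMsg arrives: push the deadline out.
        self.deadline = now + self.hold_time_s

    def expired(self, now: float) -> bool:
        # Expiry corresponds to the HEARTBEAT_TIMER_EXPIRE event.
        return now >= self.deadline


timer = HoldTimer(hold_time_s=10.0, now=0.0)
timer.refresh(now=9.0)  # heartbeat received at t=9s, new deadline is t=19s
```

Without the refresh the adjacency would expire at t=10s; with it, the adjacency is held until t=19s.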

For typical configuration of the above timers, please refer to the SparkConfig section defined in

Area Configuration

As area negotiation happens by default between spark instances, neighbor adjacency will ONLY be formed if they can reach agreement on area.

For instance, nodeA and nodeB negotiate the area over ethernet1.

On nodeA:
AreaConfig = {
    area_id : "1",
    interface_regexes : ["ethernet1", "port-channel.*"],
    neighbor_regexes : ["nodeB"]
}
On nodeB:
AreaConfig = {
    area_id : "1",
    interface_regexes : ["ethernet1", "port-channel.*"],
    neighbor_regexes : ["nodeA"]
}

Both nodes apply the combination of neighbor_regexes (i.e. regexes matched against the neighbor's node name) and interface_regexes (i.e. regexes matched against the interface on which the neighbor is discovered) to identify which area a neighbor falls into. In the example above, both nodeA and nodeB conclude that the neighbor is in area 1, so the negotiation succeeds.
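
The lookup each node performs can be sketched as follows. Several things here are assumptions of the sketch: the helper names are hypothetical, patterns are written as separate strings and must match the whole name (`re.fullmatch`), the first matching area wins, and both ends use the same interface name as in the example.

```python
import re
from typing import Optional

NODE_A_AREAS = [
    {"area_id": "1",
     "interface_regexes": ["ethernet1", "port-channel.*"],
     "neighbor_regexes": ["nodeB"]},
]
NODE_B_AREAS = [
    {"area_id": "1",
     "interface_regexes": ["ethernet1", "port-channel.*"],
     "neighbor_regexes": ["nodeA"]},
]


def _matches(patterns: list, value: str) -> bool:
    return any(re.fullmatch(pattern, value) for pattern in patterns)


def area_for_neighbor(areas: list, neighbor_name: str, if_name: str) -> Optional[str]:
    """Return the first area whose neighbor AND interface regexes match."""
    for area in areas:
        if _matches(area["neighbor_regexes"], neighbor_name) and \
                _matches(area["interface_regexes"], if_name):
            return area["area_id"]
    return None


def negotiation_succeeds(local_areas, remote_areas, local_name, remote_name, if_name) -> bool:
    # Each side derives an area for the other; they must agree.
    mine = area_for_neighbor(local_areas, remote_name, if_name)
    theirs = area_for_neighbor(remote_areas, local_name, if_name)
    return mine is not None and mine == theirs


agreed = negotiation_succeeds(NODE_A_AREAS, NODE_B_AREAS, "nodeA", "nodeB", "ethernet1")
```

With the example configuration both sides resolve to area "1", so `agreed` is true. A neighbor named nodeC would match no area on nodeA and the lookup would return `None`.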

NOTE: A negotiation failure triggers a state transition from NEGOTIATE to WARM, and the node stops sending SparkHandshakeMsg. See the State Transition Map for details.

RTT Measurement

Because Spark exchanges multicast packets for neighbor discovery, we can easily deduce the RTT between neighbors (reflection time). To reduce noise in RTT measurements we use kernel timestamps. To avoid noisy RTT_CHANGED events we use StepDetector, so that small changes in RTT measurements are ignored.
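
One common way to derive an RTT from a reflected packet is to subtract the neighbor's processing time from the locally observed round trip. The sketch below assumes four timestamps are available and uses a plain threshold filter as a simplified stand-in for StepDetector; it is not the StepDetector algorithm, and all names are hypothetical.

```python
from typing import Optional


def rtt_us(t1_sent: int, t2_nbr_rcvd: int, t3_nbr_sent: int, t4_rcvd: int) -> int:
    """RTT in microseconds.

    t1/t4 are read on the local clock, t2/t3 on the neighbor's clock,
    so the two clocks do not need to be synchronized.
    """
    return (t4_rcvd - t1_sent) - (t3_nbr_sent - t2_nbr_rcvd)


class ThresholdFilter:
    """Report a new RTT only when it moves by at least min_change_us."""

    def __init__(self, min_change_us: int) -> None:
        self.min_change_us = min_change_us
        self.reported: Optional[int] = None

    def update(self, sample_us: int) -> bool:
        if self.reported is None or abs(sample_us - self.reported) >= self.min_change_us:
            self.reported = sample_us
            return True  # would trigger an RTT change event
        return False  # small change, ignored


sample = rtt_us(t1_sent=1000, t2_nbr_rcvd=50000, t3_nbr_sent=50200, t4_rcvd=1700)
```

Here the local round trip is 700 µs and the neighbor held the packet for 200 µs, so `sample` is 500 µs.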

Fast Neighbor Discovery

When a node starts or a new link comes up, we perform fast initial neighbor discovery by sending SparkHelloMsg with the solicitResponse bit set. This requests an immediate reply, which allows quicker discovery of new neighbors (configurable).

References


  • Link-local address: https://en.wikipedia.org/wiki/Link-local_address

  • Multicast address: https://en.wikipedia.org/wiki/Multicast_address