
Synopsis

This section describes the internal workings of OpenVirteX and its various subsystems. These subsystems allow OVX to manipulate the components described in Section II in order to virtualize OpenFlow networks.

3.1 System Overview
3.2 Startup and Shutdown
3.3 The Event Loops
3.4 Network Discovery and Presentation
3.5 Virtualization and De-virtualization
3.6 State Synchronization
3.7 Resilience
3.8 Persistence
3.9 The JSONRPC API


3.1 System Overview

OVX is separated into several major parts:

  • A network-facing southbound half that builds and maintains a representation of the infrastructure (PhysicalNetwork), and manages OpenFlow channels between OVX and the datapaths
  • A tenant-facing northbound half that presents each tenant with a virtual network of software switches (OVXSwitches) and links (OVXLinks), and manages OpenFlow channels between OVXSwitches and tenant controllers
  • Global maps (OVXMap, PhysicalPort.ovxPortMap) that map PhysicalNetwork and OVXNetwork Components onto each other, bridging the two halves
  • An API server that listens for JSONRPC calls for system configuration and system/network state information

The global mappings are populated as OVXNetworks are created, and channel management is done for the two halves in separate IO Loops. The loops are joined on a per-message basis for each message that 1) has a source or destination OVXNetwork, and 2) must cross the north-south gap i.e. has its virtualize() or devirtualize() method called. This ‘decoupled by default’ state of OVX’s two halves makes it possible to dynamically reconfigure OVXNetworks during runtime by manipulating the global mappings on the fly via API calls, and allows OVX to remain connected to the network even without the presence of tenants.

The rest of Section 3 is devoted to the subsystems and mechanisms that enable OVX to function in this way, and importantly, to create isolated virtual OpenFlow networks.



3.2 Startup and Shutdown

This section describes the startup and shutdown process of OVX.

3.2.1 Main process startup
3.2.2 PhysicalNetwork population / Southbound channel initialization
3.2.3 Tenant Network (OVXNetwork) / Northbound channel initialization
3.2.4 System shutdown

3.2.1 Main process startup

The main method of OVX is found in OpenVirteX.java [package net.onrc.openvirtex.core]. The main method parses the command line arguments for system settings, and launches OpenVirteXController. OpenVirteXController implements the core OVX runnable, and keeps track of system configurations such as:

  • Path to system configuration files (currently unused by OVX)
  • Host and port OVX’s southbound half listens on for OpenFlow connections
  • Host and port OVX should connect to for persistent storage
  • Max number of worker threads
  • Max number of tenant networks
  • Polling rate of PhysicalNetwork statistics

For the full list, refer to the OpenVirteXController constructor or OVX’s help functions.

In addition to initializing these settings, OpenVirteXController performs the following steps, in order:

  1. initializes the single PhysicalNetwork instance
  2. attempts to connect to the database to recover earlier virtual network configurations (if any exist)
  3. starts up the API server, and
  4. initializes southbound channel handlers

OVX begins listening for API calls at step 3, and connections from datapaths at step 4, at which point OVX is considered to be initialized.
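As a rough sketch of this ordering (not the actual OpenVirteXController code), the boot sequence amounts to the following; only DBManager.init() mirrors a documented signature (Section 3.8.3.1), and the other helper names are hypothetical stand-ins.

// Illustrative Java sketch of the startup ordering described above.
public final class StartupSketch {

    public void run(String dbHost, int dbPort, boolean clearDb) {
        initPhysicalNetwork();                         // 1. single PhysicalNetwork instance
        DBManagerSketch.init(dbHost, dbPort, clearDb); // 2. recover stored OVXNetwork configs, if any
        startApiServer();                              // 3. OVX now accepts JSON-RPC calls
        startSouthboundListener();                     // 4. OVX now accepts OpenFlow connections
        // At this point OVX is considered initialized.
    }

    private void initPhysicalNetwork() { /* PhysicalNetwork setup */ }
    private void startApiServer() { /* launch the API server thread */ }
    private void startSouthboundListener() { /* bind the OpenFlow listen host/port */ }

    static final class DBManagerSketch {
        static void init(String host, Integer port, boolean clear) { /* connect and read back configs */ }
    }
}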

3.2.2 PhysicalNetwork population / Southbound channel initialization

The structures in PhysicalNetwork are populated as switches connect to OVX and links are discovered through them. OVX initializes network topology discovery on a per-datapath basis, creating a PhysicalSwitch and SwitchDiscoveryManager for each new switch. As mentioned before, from a switch’s perspective, OVX appears to be a controller. Therefore, connection establishment follows the OpenFlow handshake, with OVX taking on the role of the controller. Fig 3.1 shows the state machine associated with the switch.


Fig. 3.1 : State machine of southbound handshake with a datapath, with OVX acting as a controller.

The SwitchChannelHandler [net.onrc.openvirtex.core.io] implements this state machine in the ChannelState enum. Each ChannelState value represents a state that the datapath can take, and defines state-specific message handling methods. ChannelState also defines a default behavior, but the state-specific methods override it whenever a datapath is in a specific state.

OVX maps a datapath to a PhysicalSwitch only when the datapath reaches the WAIT_DESCRIPTION_STAT_REPLY state. OVX configures the PhysicalSwitch with the information provided by the datapath during the handshake, and adds it to the PhysicalNetwork. A datapath is deemed to be in ACTIVE state once a SwitchDiscoveryManager is mapped to its PhysicalSwitch, and the PhysicalSwitch’s StatisticsManager is enabled. An ACTIVE PhysicalSwitch may participate in network discovery and the event loop. The details of state and topology discovery are discussed in Section 3.4, and the event loop in Section 3.3.
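The ChannelState pattern can be sketched as follows; the state set is trimmed down and the handler signatures are simplified assumptions, not the actual SwitchChannelHandler code.

// Condensed sketch of an enum-based FSM with per-state overrides of default
// handlers, in the style described above. Real states and message types differ.
enum ChannelStateSketch {

    WAIT_DESCRIPTION_STAT_REPLY {
        @Override
        void processStatsReply(HandlerSketch h, Object descStats) {
            // build the PhysicalSwitch from the description statistics,
            // add it to the PhysicalNetwork, then advance the FSM
            h.setState(ACTIVE);
        }
    },

    ACTIVE {
        @Override
        void processPacketIn(HandlerSketch h, Object packetIn) {
            // hand the message to PhysicalSwitch.handleIO() (see Section 3.3)
        }
    };

    // default behavior: a state only overrides the messages it cares about
    void processStatsReply(HandlerSketch h, Object msg) {
        h.illegalMessage(msg);
    }

    void processPacketIn(HandlerSketch h, Object msg) {
        h.illegalMessage(msg);
    }
}

// minimal collaborator so the sketch stands on its own
class HandlerSketch {
    void setState(ChannelStateSketch next) { /* ... */ }
    void illegalMessage(Object msg) { /* log a warning and drop */ }
}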

3.2.3 Tenant network (OVXNetwork) / Northbound channel initialization

A tenant network is created, configured, and initialized via API calls. Procedurally, OVXNetwork creation involves the following:

  1. Declare an OVXNetwork, the Address block used, and tenant controller(s) to connect the OVXNetwork to
  2. Create OVXSwitches from available PhysicalSwitches
  3. Add OVXPorts to the OVXSwitches
  4. Add OVXLinks, Hosts, and, for BVSes (OVXBigSwitches), SwitchRoutes
  5. If manual, specify paths for OVXLinks and SwitchRoutes
  6. Optionally, add backup paths for OVXLinks and SwitchRoutes
  7. Initialize the OVXNetwork

Internally, these commands cause OVX to:

  1. Instantiate virtual Components
  2. Map the virtual Components onto PhysicalNetwork components in global mappings, and
  3. Bring virtual Components to ACTIVE state, in dependency-imposed order, booting the OVXNetwork as the end result

The steps in the first list are associated with one or more calls to the API server [net.onrc.openvirtex.api.server]. These calls are handled by tenant handlers [api.server.handlers.tenant]. Table 1 shows the (current) tenant handlers, the virtual elements that they instantiate, and the parameters used for mapping:

Element     | Parameters                                                                                                            | Tenant Handler(s)
OVXNetwork  | tenant controller host/port, IP block                                                                                 | CreateOVXNetwork
OVXSwitch   | tenant ID, PhysicalSwitch(es)                                                                                         | CreateOVXSwitch
OVXPort     | tenant ID, PhysicalSwitch, PhysicalPort                                                                               | CreateOVXPort
OVXLink     | tenant ID, source/destination OVXSwitch and OVXPort, intermediate PhysicalSwitches and PhysicalPorts (PhysicalLinks)  | ConnectOVXLink, SetOVXLinkPath
SwitchRoute | tenant ID, source/destination OVXPort, OVXSwitch, PhysicalLinks                                                       | ConnectOVXRoute
Host        | tenant ID, OVXSwitch, OVXPort, MAC address                                                                            | ConnectHost

Note that calls that create virtual Components and paths require knowledge about the available physical Components and network topology. Also note that the above table is not a complete list of API handlers, just those that play a role in OVXNetwork initialization. The API documentation provides a full list of API calls and their syntax, and Section 3.9 covers the API Server in depth.

Step 3 of the second list initializes the OVXNetwork through the process illustrated in Fig.3.2.


Fig 3.2 : The startup process for a tenant network, beginning with a call to the API Server.

As discussed before, each class that implements the Component interface (OVXNetwork, OVXSwitch, OVXPort) contains register() and boot() methods that implement parts of the initialization process. Note that initialization follows a certain order, starting with components whose states influence those of the subsequently initialized components. The state dependencies between various OVX objects are illustrated in Fig.3.3.


Fig 3.3 : The dependency graph for OVX components. Arrow direction indicates an “influences the state of” relationship, or, if the arrow is followed in the reverse direction, an “influences the mappings within” relationship. For example, removal of an OVXPort implies that any OVXLinks, Hosts, and SwitchRoutes attached to the port are deleted. In turn, an OVXSwitch containing the port must remove it from its portMap, and the OVXNetwork must remove any deleted OVXLinks from its linkSet. The black arrows indicate lookups restricted to either the physical or virtual halves of the split, and green, dotted arrows indicate dependencies that cross the gap.

We refer back to Fig.3.3 when we talk about internal state synchronization.
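Returning to the register()/boot() contract, a trimmed-down sketch is shown below; the real Component interface may declare additional lifecycle methods (e.g. for teardown), so treat this only as an illustration of the register()/boot() split and the dependency-imposed boot ordering.

// Sketch only: the register()/boot() lifecycle and dependency-ordered boot.
import java.util.ArrayList;
import java.util.List;

interface ComponentSketch {
    void register();     // add this element to the global mappings (and storage)
    boolean boot();      // bring this element to ACTIVE state; assumes register() has run
}

class OVXNetworkSketch implements ComponentSketch {

    private final List<ComponentSketch> switches = new ArrayList<>();

    @Override
    public void register() {
        // populate OVXMap entries and write the network document to storage
    }

    @Override
    public boolean boot() {
        // boot dependencies first: the OVXNetwork is ACTIVE only once the
        // OVXSwitches (and, below them, their OVXPorts) are ACTIVE
        boolean up = true;
        for (ComponentSketch sw : switches) {
            up &= sw.boot();
        }
        return up;
    }
}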

The ControllerChannelHandler [net.onrc.openvirtex.core.io] implements the state machine associated with the handshake between OVXSwitch instances and the tenant controller, shown in Fig.3.4. An OVXNetwork is considered to be in ACTIVE state once all of its switches have successfully connected to the controller.


Fig 3.4 : The controller state machine, from the perspective of OVX as a datapath.

3.2.4 System shutdown

Shutdown is handled through OpenVirtexShutdownHook [net.onrc.openvirtex.core.io], which calls OpenVirteXController.terminate(). This method closes network and tenant-facing channels, deregisters the PhysicalNetwork (i.e. brings it to STOPPED state), and disconnects OVX from its database.

When the PhysicalNetwork is populated and OVXNetworks are present, the shutdown process takes down components in the order dictated by the dependency graph in Fig.3.3.



3.3 The Event Loops

This section gives an overview of the operation of the core I/O loop.

3.3.1 Overview
3.3.2 Message handling and (de)virtualization

3.3.1 Overview

The OVX event loop handles the processing of OpenFlow messages. The primary roles of the event loop are:

  • Carrying out the OpenFlow handshake with datapaths and tenant controllers (initialization) : As discussed in the previous section, OVX implements controller- and switch-side OpenFlow handshakes to establish control channels between it and the datapaths and tenant controllers.

  • Virtualization/devirtualization of OpenFlow messages : For each OpenFlow message that must ‘cross’ the physical-virtual split, OVX must be able to correctly look up which controller channel(s) it must be written to, and, where needed, re-write message fields for consistency with the network views of different tenants. It must also be able to reverse this procedure: finding the datapaths that should receive a network-bound message and re-writing it so that it is consistent not only with the PhysicalNetwork view, but also with the traffic separation between tenants, resolving overlaps between tenant header spaces.

  • Handling keep-alives to/from datapaths and controllers : datapaths and controllers exchange echo request/reply messages while idle. OVX handles these messages on a per-channel basis.

The current implementation of the event loop relies on Netty for asynchronous I/O and Java Executors for thread pools. This is a separate loop from that which handles API calls.
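The snippet below illustrates that combination for the southbound listener: a Netty 3.x ServerBootstrap fed by Executor-backed thread pools. It is a bare-bones approximation, not OVX's actual bootstrap; the real pipeline adds OpenFlow encoders/decoders, idle-state handlers, and the SwitchChannelHandler, and the port shown is only an example.

// Minimal sketch: a Netty 3.x server bootstrap plus Java Executors for the
// worker pools, as used for the network-facing (southbound) event loop.
import java.net.InetSocketAddress;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.jboss.netty.bootstrap.ServerBootstrap;
import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.channel.socket.nio.NioServerSocketChannelFactory;

public class SouthboundLoopSketch {

    public static void main(String[] args) {
        ExecutorService bosses = Executors.newCachedThreadPool();
        ExecutorService workers = Executors.newCachedThreadPool();

        ServerBootstrap bootstrap = new ServerBootstrap(
                new NioServerSocketChannelFactory(bosses, workers));

        bootstrap.setPipelineFactory(new ChannelPipelineFactory() {
            @Override
            public ChannelPipeline getPipeline() {
                ChannelPipeline p = Channels.pipeline();
                // p.addLast("ofmessagedecoder", ...);
                // p.addLast("handler", new SwitchChannelHandler(...));
                return p;
            }
        });

        // listen for OpenFlow connections from datapaths
        bootstrap.bind(new InetSocketAddress(6633));
    }
}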

3.3.2 Message handling and (de)virtualization

OVXMessages implement either one or both of the two following interfaces:

Virtualizable : virtualize(PhysicalSwitch sw) : controller-bound messages
Devirtualizable : devirtualize(OVXSwitch sw) : network-bound messages

The argument to both of these interface methods is the Switch instance that has received the given message on its channel. Messages that never cross the virtual-physical gap, such as handshake messages and keep-alives (OVXEchoRequest/Reply), have empty virtualize() and devirtualize() methods. Messages that do cross the gap implement any virtual-to-physical and physical-to-virtual translation processes in their devirtualize() and virtualize() methods, respectively.
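The pattern looks roughly like this; the interface names and method arguments follow the description above, while PhysicalSwitch/OVXSwitch are reduced to stand-in classes and the message bodies are only indicative.

// Sketch of the Virtualizable/Devirtualizable split described above.
class PhysicalSwitch { }   // stand-in for the real class in net.onrc.openvirtex.elements.datapath
class OVXSwitch { }        // stand-in for the real class in net.onrc.openvirtex.elements.datapath

interface Virtualizable {
    void virtualize(PhysicalSwitch sw);    // controller-bound (northbound) translation
}

interface Devirtualizable {
    void devirtualize(OVXSwitch sw);       // network-bound (southbound) translation
}

// Keep-alives never cross the physical-virtual gap: both methods are no-ops.
class EchoRequestSketch implements Virtualizable, Devirtualizable {
    public void virtualize(PhysicalSwitch sw) { /* handled per channel, nothing to translate */ }
    public void devirtualize(OVXSwitch sw)    { /* handled per channel, nothing to translate */ }
}

// A PacketIn only crosses the gap northbound, so it implements Virtualizable.
class PacketInSketch implements Virtualizable {
    public void virtualize(PhysicalSwitch sw) {
        // look up the tenant(s) this message belongs to, rewrite addresses and
        // ports to the tenant's view, and write it to the controller channel(s)
    }
}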

These methods are called from handleIO(), a Switch abstract method implemented in PhysicalSwitch and OVXSwitch:

@Override
public void handleIO(OFMessage msg, Channel channel) {
    this.state.handleIO(this, msg, channel);
}

The actual call to the OVXMessage methods occurs under the ACTIVE state of the SwitchState FSMs of PhysicalSwitch and OVXSwitch:

PhysicalSwitch.SwitchState.ACTIVE:

public void handleIO(PhysicalSwitch psw, final OFMessage msg, Channel ch) {
    try {
        ((Virtualizable) msg).virtualize(psw);
    } catch (final ClassCastException e) {
        psw.log.error("Received illegal message : " + msg);
    }
}

OVXSwitch.SwitchState.ACTIVE:

public void handleIO(OVXSwitch vsw, final OFMessage msg, Channel ch) {
    /*
     * Save the channel the msg came in on
     */
    msg.setXid(vsw.channelMux.translate(msg.getXid(), ch));
    try {
        /*
         * Check whether this channel (ie. controller) is permitted
         * to send this msg to the dataplane
         */

        if (vsw.roleMan.canSend(ch, msg) )
            ((Devirtualizable) msg).devirtualize(vsw);
        else
            vsw.denyAccess(ch, msg, vsw.roleMan.getRole(ch));
    } catch (final ClassCastException e) {
        OVXSwitch.log.error("Received illegal message : " + msg);
    }
}

The default behavior of the FSM is to issue a warning and to drop the message.

Figure 3.5 summarizes the high-level view of the event loop.


Fig 3.5 : The core OVX event loop showing the main message handling paths through OVX. The blue, green, and orange blocks denote procedures that logically reside in/interact with the virtual, global, and physical components, respectively. The gray steps denote those in SwitchChannelHandler (orange region) and in ControllerChannelHandler (blue region). The blue arrows represent the OpenFlow channel.

The specifics of how each message is handled in virtualize() or devirtualize() depend on how its OVXMessage class defines these methods. We do not cover every single message here, but focus on specific messages wherever they come up.



3.4 Network Discovery and Presentation

To carry out accurate virtualization, OVX must keep its view of the network state up-to-date. This involves:

  1. Detecting topology and flow table changes
  2. Applying those changes to the PhysicalNetwork/PhysicalSwitches, and
  3. Detecting, and if necessary, applying, changes that affect tenant networks.

OVX both carries out topology discovery for itself and presents virtual topologies to its tenants; both are achieved by manipulating LLDP messages. The first two subsections describe how OVX keeps its PhysicalNetwork synchronized with the network’s topology and state. The last describes how OVX presents OVXNetworks to tenant controllers in order to give the illusion that they are managing “real” networks.

3.4.1 Topology Discovery/LLDP Handling
3.4.2 PhysicalSwitch Statistics Collection
3.4.3 OVXNetwork Presentation

3.4.1 Topology discovery/LLDP handling

Physical LLDP handling. LLDP messages to/from the network are handled by the SwitchDiscoveryManager instance paired with each PhysicalSwitch. As mentioned before, the pairs are found in PhysicalNetwork.discoveryManager. Every proberate milliseconds, each SwitchDiscoveryManager sends out an LLDP via the switch that it is paired with. The default proberate is 1000 ms, defined in the SwitchDiscoveryManager constructor. LLDPs intercepted at adjacent switches are passed up to OVX, where the SwitchChannelHandler handles them by invoking PhysicalNetwork.handleLLDP(). This method is defined by the LLDPEventHandler [net.onrc.openvirtex.core.io] interface and implemented by the Network superclass. handleLLDP() invokes the SwitchDiscoveryManager paired with the PhysicalSwitch that received the LLDP. The SwitchDiscoveryManager’s overall behavior is illustrated below.

[Figure: the SwitchDiscoveryManager probe loop]

Updating the PhysicalNetwork. The topology is updated in SwitchDiscoveryManager.run() every proberate milliseconds. OVX identifies two types of ports:

  • fast : ports whose LLDPs are successfully received back (e.g. the port is an endpoint of a link)
  • slow : ports whose LLDPs have not been acknowledged for MAX_PROBE_COUNT probes sent (e.g. the port is an edge port or not part of a link). MAX_PROBE_COUNT is currently 3.

A port whose LLDP was received is considered fast, and added to Set fastPorts of the SwitchDiscoveryManager of the recipient switch. For each LLDP sent by a fast port, its probe count is incremented; conversely, the count is decremented by an acknowledgement. The probe count is stored in Map<Short, AtomicInteger> portProbeCount. When a port’s probe count exceeds MAX_PROBE_COUNT, it is moved to Set slowPorts. The update from slow to fast port and decrement of probe count occurs in SwitchDiscoveryManager.ackProbe(), and the reverse occurs in run().
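A condensed sketch of this bookkeeping is given below; the field and method names mirror the description (fastPorts, slowPorts, portProbeCount, ackProbe(), run()), but the bodies are simplified and LLDP construction/sending is omitted.

// Sketch of the fast/slow port bookkeeping described above.
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class DiscoverySketch {
    static final int MAX_PROBE_COUNT = 3;

    private final Set<Short> fastPorts = new HashSet<>();
    private final Set<Short> slowPorts = new HashSet<>();
    private final Map<Short, AtomicInteger> portProbeCount = new ConcurrentHashMap<>();

    // called when an LLDP sent out of 'port' is received back via a neighbor
    void ackProbe(short port) {
        if (slowPorts.remove(port)) {
            portProbeCount.put(port, new AtomicInteger(0));
        }
        fastPorts.add(port);
        AtomicInteger count = portProbeCount.get(port);
        if (count != null && count.get() > 0) {
            count.decrementAndGet();    // acknowledgement decrements the probe count
        }
    }

    // called every proberate ms: send probes and demote unresponsive ports
    void run() {
        for (Short port : new HashSet<>(fastPorts)) {
            AtomicInteger count = portProbeCount.computeIfAbsent(port, p -> new AtomicInteger(0));
            if (count.incrementAndGet() > MAX_PROBE_COUNT) {
                fastPorts.remove(port);
                slowPorts.add(port);    // treated as an edge port / not on a link
            }
            // sendProbe(port);  -- emit an LLDP out of this port
        }
    }
}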

3.4.2 PhysicalSwitch Statistics Collection

As mentioned earlier, statistics associated with PhysicalSwitches are held in two structures:

  • AtomicReference<Map<Short, OVXPortStatisticsReply>> portStats;
  • AtomicReference<Map<Integer, List>> flowStats;

These maps are populated by each PhysicalSwitch’s instance of StatisticsManager [net.onrc.openvirtex.elements.datapath.statistics], which polls the corresponding datapath with OFFlowStatisticsRequests and OFPortStatisticsRequests and stores the contents of the replies. The current polling interval is 30 seconds, set by refreshInterval.

The flowStats structure doubles as OVX’s representation of the datapath’s flow table; its role in flow table synchronization is discussed in Section 3.6.3.

3.4.3 OVXNetwork Presentation

Virtual topology presentation. OVX receives PacketOuts containing LLDPs from tenant controllers running their own topology discovery. LLDPs from tenants are handled within their OVXNetwork. By handling LLDPs in the virtual domain, OVX cuts down significantly on the LLDPs that hit the physical network.

For each output port indicated in such probe packets, OVXNetwork:

  1. looks up the destination port via its neighborPortMap
  2. constructs a PacketIn with InPort set to the destination, and
  3. sends the PacketIn back to the NOS via the OVXSwitch containing the destination port

In other words, OVX emulates the broadcast/reception of LLDP packets within a network, for each tenant’s topology. This routine is implemented in OVXNetwork.handleLLDP(). The figure below summarizes this behavior.

[Figure: tenant topology resolution via LLDP emulation]
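A minimal sketch of this emulation is shown below, with OVX types replaced by stand-ins and only the neighbor lookup and PacketIn construction steps indicated.

// Sketch of the tenant-facing LLDP emulation described above; the map and
// helper names (neighborPortMap, sendPacketIn) follow the description.
import java.util.HashMap;
import java.util.Map;

class OVXNetworkLldpSketch {
    static class Port { }  // stand-in for OVXPort

    // virtual topology: which OVXPort faces which
    private final Map<Port, Port> neighborPortMap = new HashMap<>();

    // called when a tenant controller sends a PacketOut carrying an LLDP
    void handleLLDP(byte[] lldp, Port outPort) {
        // 1. look up the far end of the virtual link
        Port dstPort = neighborPortMap.get(outPort);
        if (dstPort == null) {
            return;  // edge port: nothing to echo back
        }
        // 2./3. wrap the probe in a PacketIn with in_port = dstPort and send it
        //       to the tenant controller via the OVXSwitch owning dstPort
        sendPacketIn(lldp, dstPort);
    }

    private void sendPacketIn(byte[] lldp, Port inPort) { /* ... */ }
}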

Multiple-controller tenants (role management).
TODO



3.5 Network Virtualization

3.5.1 Switch Representation Translation
3.5.2 OpenFlow field translation – Cookies, Buffer IDs, XIDs
3.5.3 Address virtualization
3.5.4 Link and Route virtualization

Overview

In OVX, virtualization and devirtualization are the logical actions of moving across the virtual-physical split. In terms of operations on OpenFlow messages, this means:

  1. modification of source and destination network addresses
  2. translations of host attachment points to/from OVXSwitch/OVXPort and PhysicalSwitch/PhysicalPort
  3. dropping of messages originating from/destined to invalid points (hosts, switches) given virtual and physical network topologies

Item 1 stems from the different addressing schemes used by the physical and virtual networks, for both Hosts (IP addresses) and Switches (DPIDs and port numbers). Item 2 follows logically from item 1, since host attachment points have different designations depending on the network view. Item 3 serves to isolate the traffic between virtual networks. This section describes the various mechanisms that play a role in these three processes.

3.5.1 Switch Representation Translation

A key function in the virtualization process is the translation between OVXSwitches and PhysicalSwitches during message handling.

OVXSwitch -> PhysicalSwitch (Southbound)
OVXSwitches intercept southbound messages sent by tenant controllers. Two methods are used for looking up the destination PhysicalSwitch:

  • By ingress OVXPort: The PhysicalSwitch is found through the PhysicalPort mapped to the OVXPort. The OVXPort is found from a port value field in the message, e.g. the in_port field of OFMatch structures. For an OVXBigSwitch, any message without an ingress port value is ignored.

  • By OVXMap lookup: For an OVXSingleSwitch, the 1:1 mapping allows OVX to do a direct lookup on the physicalSwitchMap by tenant ID.

This lookup occurs in the OVXSwitch.sendSouth() method implemented in each of the OVXSwitch subclasses.

PhysicalSwitch -> OVXSwitch (Northbound)
The reverse lookup process exploits how OVX defines tenant networks, and the conventions used by OpenFlow conversations.

tenant networks: Hosts cannot be attached to more than one OVXNetwork, and can be uniquely identified by MAC address. The tenant ID can be fetched from OVXMap’s macMap using the MAC address recovered from an OFMatch field.

OpenFlow conversations: OpenFlow uses the same values for certain fields across multiple messages that are part of the same conversation (transaction). OVX may replace cookie, XID, and/or bufferId fields of request (southbound) messages with new values that either encode context (e.g. tenant ID) or can be mapped back to the origin when it receives the corresponding reply (northbound). We discuss field translation in further detail in the next section.

3.5.2 OpenFlow field translation – Cookies, Buffer IDs, XIDs

OVX uses several structures to hold mappings used in field translations:

  • XidTranslator [net.onrc.openvirtex.elements.datapath] : LRULinkedHashMap<Integer, XidPair> xidMap
  • OVXFlowTable [net.onrc.openvirtex.elements.datapath] : ConcurrentHashMap<Long, OVXFlowMod> flowmodMap
  • OVXSwitch : LRULinkedHashMap<Integer, OVXPacketIn> bufferMap

XidTranslator. XID values must be unique within each datapath. The XidTranslator uses the OpenFlow XID to multiplex/demultiplex conversations between a datapath and multiple tenants. For each southbound OVXMessage, the XidTranslator:

  1. generates a new XID
  2. creates an XidPair to store the original XID and source OVXSwitch
  3. stores the XidPair in xidMap, using the new XID as the key
  4. returns the new XID value to the caller

XidTranslator.translate() implements the above actions. The caller (PhysicalSwitch) replaces the XID of the message with this new value so that a datapath only receives messages with unique XIDs. Conversely, for northbound messages, the XidTranslator:

  1. recovers the XidPair given the XID of the message
  2. returns the XidPair to the caller

The tenant that initiated the conversation can be recovered from the OVXSwitch found in the XidPair. This reverse process is implemented in XidTranslator.untranslate().
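The round trip can be sketched as follows; a plain HashMap stands in for the size-bounded LRULinkedHashMap, and whether untranslate() evicts the entry is an implementation detail not shown here.

// Sketch of the XID translation round trip described above.
import java.util.HashMap;
import java.util.Map;

class XidTranslatorSketch<T> {

    static class XidPair<T> {
        final int originalXid;
        final T source;               // the OVXSwitch that originated the request
        XidPair(int xid, T src) { this.originalXid = xid; this.source = src; }
    }

    private int nextXid = 1;
    private final Map<Integer, XidPair<T>> xidMap = new HashMap<>();  // LRU-bounded in OVX

    // southbound: hand out a datapath-unique XID, remembering the origin
    synchronized int translate(int xid, T source) {
        int newXid = nextXid++;
        xidMap.put(newXid, new XidPair<T>(xid, source));
        return newXid;
    }

    // northbound: recover the original XID and source switch for a reply
    synchronized XidPair<T> untranslate(int xid) {
        return xidMap.get(xid);
    }
}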

OVXFlowTable. The OVXFlowTable stores flow entries in the form of a map holding unmodified OVXFlowMods keyed on cookies generated by OVX. Each generated cookie encodes the tenant ID of the origin:

private long generateCookie() {
    ...
    final int cookie = this.cookieCounter.getAndIncrement();
    return (long) this.vswitch.getTenantId() << 32 | cookie;
}

The cookieCounter ensures cookie uniqueness within an OVXSwitch. Specifically, a FlowMod is assigned a new cookie in OVXFlowMod.devirtualize() when it is added to the flow table, i.e. has a command value of OFPFC_ADD. The new cookie serves as a matching mechanism in the flow table, e.g. when OVX receives a FlowRemoved and has to remove entries to maintain state consistency with the network. Table state maintenance is discussed in Section 3.6.3.
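Put differently, the tenant ID lives in the upper 32 bits of the cookie and a per-switch counter in the lower 32 bits, so the tenant can be recovered from any cookie the network reports back (as OVXFlowRemoved.virtualize() does in Section 3.6.3). A minimal round trip, with an explicit mask added here to avoid sign extension of the counter:

// Cookie encode/decode sketch: tenant ID in the upper 32 bits, counter below.
class CookieSketch {
    static long makeCookie(int tenantId, int counter) {
        return ((long) tenantId << 32) | (counter & 0xFFFFFFFFL);
    }
    static int tenantOf(long cookie) {
        return (int) (cookie >> 32);
    }
}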

bufferMap. PacketIn/PacketOut pairs reference the same bufferId. OVX associates PacketIns with newly generated bufferIds before they are written north, and stores them, keyed on this new value, in the bufferMap. When OVX intercepts a PacketOut, it can recover the corresponding original PacketIn using the PacketOut’s buffer ID as the key. In OVXPacketOut.devirtualize():

// use the bufferId of the PacketOut to recover the original (unmodified) PacketIn
final OVXPacketIn cause = sw.getFromBufferMap(this.bufferId);

// recover the original OFMatch and packet data
this.match = new OFMatch().loadFromPacket(cause.getPacketData(), this.inPort);
this.setBufferId(cause.getBufferId());
ovxMatch = new OVXMatch(match);
ovxMatch.setPktData(cause.getPacketData());

As shown above, the contents of the stored PacketIns are used to reverse the address, port, and buffer ID translations that were applied to the corresponding initial PacketIn (and therefore applied to this PacketOut) when it was virtualized by OVX. The mechanics of address virtualization are discussed next.

3.5.3 Address virtualization

3.5.3.1 Overview

OVX avoids address space collisions between tenant traffic flows by creating virtual (OVXIPAddress) and physical (PhysicalIPAddress) addresses for each Host. The former is unique within an OVXNetwork, and the latter is unique in the full PhysicalNetwork. Translation between virtual and physical IP addresses guarantees that each tenant controller can handle flows in terms of its network’s addressing scheme (despite possible overlaps with other tenants’ schemes), and that the datapaths are able to distinguish traffic from different tenants.

Address translation separates datapaths into two groups:

  • edges: datapaths that are host attachment points
  • core: datapaths only connected to other datapaths

Edge datapaths are charged with rewriting IP addresses. Specifically, edge switches:

  1. Match on OVXIPAddress values in nw_src and nw_dst fields, rewriting them to PhysicalIPAddress values, for network-bound traffic
  2. Match on PhysicalIPAddress values, rewriting them to OVXIPAddress values, for host-bound traffic

Core switches, in contrast, match and forward in terms of PhysicalIPAddresses.

OVX intercepts and alters FlowMods in order to impose these behaviors onto the datapaths. In addition to FlowMods, OVX also alters PacketIns and PacketOuts such that core datapaths only ‘see’ PhysicalIPAddress values, and controllers, OVXIPAddress values. Fig.3.6 illustrates the address translation process, and Table 2 shows a possible set of FlowMods pushed by the tenant controller to OVX, and by OVX to the datapaths, in order to achieve this behavior.


a) The PacketIn is sent to the tenant controller without modifications, with OVXIP values.
b) The corresponding FlowMod instructs matching on OVXIP, and rewrite of those values to PhysicalIPs.
c) The virtual link is mapped back to the two-hop path across psw2.
d) The PacketIn at the destination edge is translated similarly to those in the core network.
e) OVX installs FlowMods that match on PhysicalIPs and rewrite them to OVXIPs.

Fig 3.6 : The address virtualization process across three datapaths. The numbers next to the switches denote port numbers. The port numbering may differ in the tenant and actual networks, even for exact mappings, such as psw1 and vsw1. Here, h1 begins sending packets to h2. OVX handles PacketIns (tenant-bound arrows) and FlowMods (network-bound arrows) differently according to the location of the target datapath with respect to the source and destination hosts.

Table 2 : A possible set of FlowMods pushed to each OVXSwitch and datapath.

Tenant NOS -> OVXSwitch (vsw1):
  OFMatch: nw_src=10.0.0.1, nw_dst=10.0.0.3, in_port=1 | OFAction: output=2
  OFMatch: nw_src=10.0.0.3, nw_dst=10.0.0.1, in_port=2 | OFAction: output=1
OpenVirteX -> datapath (sw1):
  OFMatch: nw_src=10.0.0.1, nw_dst=10.0.0.3, in_port=1 | OFAction: nw_src=1.0.0.1, nw_dst=1.0.0.2, output=2
  OFMatch: nw_src=1.0.0.2, nw_dst=1.0.0.1, in_port=2   | OFAction: nw_src=10.0.0.3, nw_dst=10.0.0.1, output=1

Tenant NOS -> OVXSwitch (vlink1): (none – the tenant pushes no FlowMods for the path internal to the virtual link)
OpenVirteX -> datapath (sw2):
  OFMatch: nw_src=1.0.0.1, nw_dst=1.0.0.2, in_port=2   | OFAction: output=3
  OFMatch: nw_src=1.0.0.2, nw_dst=1.0.0.1, in_port=3   | OFAction: output=2

Tenant NOS -> OVXSwitch (vsw3):
  OFMatch: nw_src=10.0.0.1, nw_dst=10.0.0.3, in_port=2 | OFAction: output=1
  OFMatch: nw_src=10.0.0.3, nw_dst=10.0.0.1, in_port=1 | OFAction: output=2
OpenVirteX -> datapath (sw3):
  OFMatch: nw_src=1.0.0.1, nw_dst=1.0.0.2, in_port=2   | OFAction: nw_src=10.0.0.1, nw_dst=10.0.0.3, output=1
  OFMatch: nw_src=10.0.0.3, nw_dst=10.0.0.1, in_port=1 | OFAction: nw_src=1.0.0.2, nw_dst=1.0.0.1, output=2

(OFActions that list nw_src/nw_dst values denote address-rewrite (set-field) actions.)

A caveat to this behavior is in the handling of ARP messages; ARP is further discussed with link and route virtualization in Section 3.5.4.

3.5.3.2 Implementations

The translation procedure is implemented across several OVXMessage classes:

PhysicalIPAddress -> OVXIPAddress:
* OVXPacketIn

OVXIPAddress -> PhysicalIPAddress:
* OVXPacketOut
* OVXFlowMod
* OVXActionNetworkLayerSource/Destination

Figures 3.7, 3.8, and 3.9 illustrate, in order, the (de)virtualization process for OVXPacketIn, OVXPacketOut, and OVXFlowMod messages.


Fig 3.7: PacketIn virtualization.


Fig 3.8: PacketOut devirtualization.


Fig 3.9: FlowMod devirtualization.

3.5.4 Link and Route virtualization

TODO



3.6 State Synchronization

3.6.1 Component State Coordination
3.6.2 Error/Event Escalation
3.6.3 Flow Table State Synchronization

3.6.1 Component State Coordination

3.6.2 Error Escalation

OVX uses errors intercepted from the network to keep its PhysicalNetwork synchronized with the topology of the network.

Errors in the network – e.g. ports, links, and switches going down – are propagated to OVX as OFPortStatus messages. The current implementation of OVX expects PortStatus messages with OFPortReason fields of value OFPPR_DELETE to be sent by a failing switch. These PortStatus messages are handled as OVXPortStatus [net.onrc.openvirtex.messages] instances by OVX.

The handling of OVXPortStatus messages depends on OVX’s state. In the simplest case, no tenant networks exist and only ports, links, and switches in the PhysicalNetwork are removed. Even with tenants, OVX is capable of hiding away error conditions in the network given virtual topologies with certain properties:

  • Networks of OVXBigSwitches: a failure of a port not mapped to an OVXPort is analogous to a failure within the switch fabric. The loss of such a port in a BVS mapped to a well-connected network can be completely hidden from a tenant if alternate paths exist between the OVXPorts of the BVS, or if no SwitchRoutes use the failed port.

  • Redundant OVXLinks: If multiple paths are available between the OVXPorts defining an OVXLink, failure of a Port in one path may be suppressed by failing over to another path.

Additionally, failures of unmapped ports reduce to the simplest case. Figure 3.10 illustrates the failure scenarios that can be suppressed by OVX.


Fig 3.10 : Three scenarios where errors can be suppressed. Left) PhysicalSwitches b and c are not mapped to the OVXNetwork. The tenant is completely ignorant of b and c and of any errors associated with them. Middle) Multiple physical paths map onto the OVXLink between vs1 and vs2 ([a-b,b-d],[a-d],[a-c,c-d]…), providing plenty of backup paths. No link failures are reported to the tenant unless all paths between a and d, or PhysicalPorts mapped to OVXPorts, fail. Right) The whole PhysicalNetwork maps to one BVS and its crossbars. Failures of PhysicalLinks, Ports, and Switches may be hidden unless the SwitchRoute between a and d runs out of paths, or OVXPorts fail.

OVXBigSwitch and OVXLink resiliency are discussed in detail in Section 3.7.

Conversely, error escalation only comes into the picture when the affected PhysicalPorts are:

  • mapped to OVXPorts of OVXLinks and SwitchRoutes
  • parts of non-resilient paths
  • mapped to OVXSwitch edge ports

The removal process for a deleted PhysicalPort follows the dependency graph described in Fig.3.3 (Section 3.2.3). The full error escalation process is shown in Figure 3.10 below, and is implemented in OVXPortStatus.virtualize().


Figure 3.10 : The algorithm used to modify network representations according to OFPortStatus message contents.
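In code form, the decision logic reduces to something like the sketch below; all helper names are invented for illustration, only the OFPPR_DELETE case is considered, and the real OVXPortStatus.virtualize() is considerably more involved.

// Simplified decision logic for an incoming OFPortStatus with reason OFPPR_DELETE.
class PortStatusSketch {

    void handlePortDelete(PhysicalPortSketch p) {
        if (!isMapped(p)) {
            // simplest case: only the PhysicalNetwork is updated
            removeFromPhysicalNetwork(p);
            return;
        }
        for (VirtualPathSketch path : pathsUsing(p)) {
            if (hasBackup(path)) {
                switchOver(path);       // suppressed: the tenant sees nothing (Section 3.7)
            } else {
                tearDown(path);         // escalate along the Fig.3.3 dependency graph
            }
        }
        removeFromPhysicalNetwork(p);
    }

    // --- illustrative stand-ins ---
    static class PhysicalPortSketch { }
    static class VirtualPathSketch { }
    boolean isMapped(PhysicalPortSketch p) { return false; }
    Iterable<VirtualPathSketch> pathsUsing(PhysicalPortSketch p) { return java.util.Collections.emptyList(); }
    boolean hasBackup(VirtualPathSketch v) { return false; }
    void switchOver(VirtualPathSketch v) { }
    void tearDown(VirtualPathSketch v) { }
    void removeFromPhysicalNetwork(PhysicalPortSketch p) { }
}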

3.6.3 Flow Table State Synchronization

OVXFlowTable synchronization. An OVXFlowTable stores southbound FlowMods before they are altered by the devirtualization process, and represents the flow table that a tenant controller would see if it were to query a switch (in reality, an OVXSwitch) for its table contents. OVX keeps an up-to-date flow table for each OVXSwitch by handling OVXFlowMods [net.onrc.openvirtex.messages] as if it were a datapath handling FlowMod messages:

/* Within class OVXFlowMod */
public void devirtualize(final OVXSwitch sw) {
    ...
    FlowTable ft = this.sw.getFlowTable();
    ...
    long cookie = ((OVXFlowTable) ft).getCookie();
    //Store the virtual flowMod and obtain the physical cookie
    ovxMatch.setCookie(cookie);
    /* update sw's OVXFlowTable */
    boolean pflag = ft.handleFlowMods(this, cookie);

OVXFlowTable.handleFlowMods() modifies the entries in an OVXFlowTable instance according to the command field value of a FlowMod. The flow entry matching mechanism is implemented by OVXFlowEntry [net.onrc.openvirtex.elements.datapath], a wrapper class for OVXFlowMods.

After the virtual flow table is updated, the devirtualization process sends the FlowMod south.

Physical flow table synchronization. As described in Section 3.4.2, OVX’s view of a datapath’s flow table is the PhysicalSwitch.flowStats structure, which each PhysicalSwitch’s StatisticsManager [net.onrc.openvirtex.elements.datapath.statistics] refreshes every refreshInterval (currently 30 seconds) by polling the datapath with OFFlowStatisticsRequests.

Synchronization between flow tables. The physical flow table is implicitly synchronized with the OVXFlowTables that map to it via devirtualized FlowMods. Each FlowMod sent south also has the OFPFF_SEND_FLOW_REM flag set so that its expiration is reported back to OVX as an OFFlowRemoved. The virtualize() method of OVXFlowRemoved [net.onrc.openvirtex.messages] uses the cookie value to determine and remove the matching FlowMods:

public void virtualize(final PhysicalSwitch sw) {
    /* determine tenant from cookie */
    int tid = (int) (this.cookie >> 32);
    ...
    try {
        /* find which OVXSwitch's flow table is affected */
        OVXSwitch vsw = sw.getMap().getVirtualSwitch(sw, tid);
        if (vsw.getFlowTable().hasFlowMod(this.cookie)) {
            OVXFlowMod fm = vsw.getFlowMod(this.cookie);
            vsw.deleteFlowMod(this.cookie);
            /* send north ONLY if the tenant controller wanted a FlowRemoved for the FlowMod */
            if (fm.hasFlag(OFFlowMod.OFPFF_SEND_FLOW_REM)) {
                writeFields(fm);
                vsw.sendMsg(this, sw);
            }
        }
    }
    ...
}



3.7 Resilience

Network elements inevitably fail. OVX attempts to reduce the impact of infrastructure failures on OVXNetworks by allowing certain Components to be mapped redundantly onto the PhysicalNetwork:

  • OVXLinks : multiple paths
  • SwitchRoute : multiple paths
  • OVXBigSwitch : multiple SwitchRoutes, sets of PhysicalSwitches, or SwitchRoutes with multiple paths

Note: The last case has yet to be implemented, and is hypothetical. Future releases are expected to support BVS resilience.

A Component mapped to multiple paths can switch to alternate paths when ports and links fail in the network. This allows continued traffic handling with minimal disruption. Components that support failover mappings implement the Resilient [net.onrc.openvirtex.elements] interface. This interface provides two methods:

  • public boolean tryRecovery(Component c) : Given the failure of c, attempt to switch over to any backup mappings, if possible
  • public boolean tryRevert(Component c) : Given the resumed function of c, attempt to switch back to the original (favored) mapping

Currently, the two Components that implement Resilient are OVXLink and SwitchRoute. Both utilize similar mechanisms to implement resilience. Fig.3.11 and 3.12 illustrate the flowcharts for tryRecovery() and tryRevert() for these two Components, respectively.


Fig.3.11: The failover process when a PhysicalLink goes down. The highest-priority path not containing the failed link replaces the current path. The displaced path is added to the ‘broken’ list, and the new path is removed from the available backups.


Fig.3.12: The recovery process, after a failed PhysicalLink comes back up. A Component will try to revert to the mappings that it started with. For virtual links (OVXLinks and SwitchRoutes), this is assumed to be the path with the highest priority value. Paths that were broken earlier are moved from the ‘broken’ to the ‘backups’ list.

In the above figures, the ‘broken’ and ‘backups’ lists correspond to the previously-discussed unusableLinks/Routes and backupLinks/Routes TreeMap<Byte, List> structures, respectively. All paths available to a Virtual link are moved between these two TreeMaps as links fail and recover, with the exception of the currently functional path, which is moved to the global mapping for the Virtual link.

Traffic flow disruption is reduced when switching between paths by reinstalling the sets of FlowMods that guide traffic through the new path. This is implemented in the switchPath() method of both OVXLink and SwitchRoute.
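The bookkeeping behind Figs. 3.11 and 3.12 can be approximated as follows; priorities, method arguments, and the switchPath() hook are simplified relative to the real OVXLink and SwitchRoute code.

// Sketch of the failover/revert bookkeeping: 'backups' and 'broken' TreeMaps
// keyed on path priority, with the currently installed path held separately.
import java.util.List;
import java.util.TreeMap;

class ResilientPathSketch<L> {

    private byte currentPriority;
    private List<L> currentPath;
    private final TreeMap<Byte, List<L>> backups = new TreeMap<>(); // usable alternates
    private final TreeMap<Byte, List<L>> broken = new TreeMap<>();  // paths taken down by failures

    // the current path lost a link: fail over to the highest-priority backup
    boolean tryRecovery() {
        if (backups.isEmpty()) {
            return false;                              // nothing left; escalate to the tenant
        }
        broken.put(currentPriority, currentPath);      // displaced path joins the 'broken' list
        currentPriority = backups.lastKey();
        currentPath = backups.remove(currentPriority);
        switchPath(currentPath);                       // reinstall FlowMods along the new path
        return true;
    }

    // a failed link came back up: restore its path and prefer it if it outranks the current one
    boolean tryRevert(byte recoveredPriority, List<L> recoveredPath) {
        broken.remove(recoveredPriority);
        if (recoveredPriority > currentPriority) {
            backups.put(currentPriority, currentPath); // demote the current path to a backup
            currentPriority = recoveredPriority;
            currentPath = recoveredPath;
            switchPath(currentPath);
        } else {
            backups.put(recoveredPriority, recoveredPath);
        }
        return true;
    }

    private void switchPath(List<L> newPath) { /* push FlowMods for the new path */ }
}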



3.8 Persistence

This section describes the subsystem that implements the persistence of virtual network configurations.

3.8.1 Overview
3.8.2 Parameters
3.8.3 Related Packages and Classes
3.8.4 Storing Configurations
3.8.5 Updating Configurations
3.8.6 Restoring Configurations

3.8.1 Overview

As mentioned in the previous section, OVX supports the persistence of administratively configured network topologies. When provided with a storage backend (database) that it can connect to, OVX saves the network topology to the database, and rebuilds it from the stored data when it restarts. Currently, MongoDB is used as the database backend.

Across restarts, OVX preserves not only the network topology but also all IDs (tenant ID, DPID, port number, link ID, route ID, and host ID). However, SwitchRoutes in OVXBigSwitches with the routing algorithm set to “spf” are not kept; they are automatically regenerated after a restart.

Note: Currently, the flow entries in virtual switches and physical switches are not restored except for initial flow entries. We are now developing “live migration and snapshotting” to support system continuity, where all necessary flows are preserved.

3.8.2 Configuration Parameters

Command line options can be used to configure how OVX interacts with the storage backend:

Option              Argument   Comments
-dh or --db-host    hostname   default: "127.0.0.1"
-dp or --db-port    port       default: 27017

Note that there are two cases where OVX starts up without pre-configured virtual topologies:

  • If OVX can’t connect to the database: Currently, this generates error messages in the log. These messages won’t interfere with the regular operation of OVX.
  • Using the option "--db-clear": All persisted data is deleted from storage.

3.8.3 Related Packages and Classes

In addition to the Persistable interface, [net.onrc.openvirtex.db] is also associated with persistence. This package contains classes that define the document for OVXNetworks, and wrappers that allow OVX to interface with MongoDB.

The rest of this section will give overviews of the member classes in [net.onrc.openvirtex.db].

3.8.3.1 class DBManager

DBManager implements the read/write operations to the storage backend. It is instantiated as a singleton when OVX is started.

Fields

// Database collection names
public static final String DB_CONFIG = "CONFIG";
public static final String DB_USER = "USER";
public static final String DB_VNET = "VNET";

// Database object
private DBConnection dbConnection;

// Map of collection names and collection objects
private Map<String, DBCollection> collections;

// Mapping between physical dpids and a list of vnet managers
private Map<Long, List<OVXNetworkManager>> dpidToMngr;
// Mapping between physical links and a list of vnet managers
private Map<DPIDandPortPair, List<OVXNetworkManager>> linkToMngr;
// Mapping between physical ports and a list of vnet managers
private Map<DPIDandPort, List<OVXNetworkManager>> portToMngr;

Methods

// Initialize database connection
public void init(String host, Integer port, boolean clear)

// Create a document in database from persistable object obj
public void createDoc(Persistable obj)
// Remove a document
public void removeDoc(Persistable obj)

// Save an element to the list of specified key in document
public void save(Persistable obj)
// Remove an element from the list of specified key in document
public void remove(Persistable obj)

// Reads all virtual networks from the database and spawns an OVXNetworkManager
// for each.
private void readOVXNetworks()

// Reads virtual components from a list of maps in db format and registers the
// physical components in their manager.
private void readOVXSwitches(List<Map<String, Object>> switches,
                        OVXNetworkManager mngr)
private void readOVXLinks(List<Map<String, Object>> links,
                        OVXNetworkManager mngr)
private void readOVXPorts(List<Map<String, Object>> ports,
                        OVXNetworkManager mngr)
private void readOVXRoutes(List<Map<String, Object>> routes,
                        OVXNetworkManager mngr)

3.8.3.2 class OVXNetworkManager

OVXNetworkManager recreates a tenant network from storage, and is created per virtual network. The way in which it rebuilds a tenant network is described in Section 3.8.6.

Fields

// Document of virtual network
private Map<String, Object> vnet;

private Integer tenantId;

// Set of offline and online physical switches
private Set<Long> offlineSwitches;
private Set<Long> onlineSwitches;

// Set of offline and online physical links identified as (dpid, port number)-pair
private Set<DPIDandPortPair> offlineLinks;
private Set<DPIDandPortPair> onlineLinks;

// Set of offline and online physical ports
private Set<DPIDandPort> offlinePorts;
private Set<DPIDandPort> onlinePorts;

private boolean bootState;

Methods

// Register a physical component to offline list
public void registerSwitch(final Long dpid)
public void registerLink(final DPIDandPortPair dpp)
public void registerPort(final DPIDandPort port)

// Delete a physical component from offline list,
// add it to online list,
// and then, if all physical components are online,
// create a virtual network.
public synchronized void setSwitch(final Long dpid)
public synchronized void unsetSwitch(final Long dpid)
public synchronized void setLink(final DPIDandPortPair dpp)

3.8.3.3 interface DBConnection

DBConnection is an interface that defines the methods that must be implemented in order for OVX to interact with various storage backends. Class MongoConnection implements this interface with MongoDB-specific methods to connect() and disconnect() from the database.

3.8.4 Storing Configurations

3.8.4.1 Overview
3.8.4.2 Mechanism
3.8.4.3 Persistable Components

3.8.4.1 Overview

When virtual Components are instantiated, their information is added to the database as documents. Currently, the Components stored in the database are the following:

  • OVXNetwork
  • OVXSingleSwitch
  • OVXBigSwitch
  • OVXPort
  • OVXLink
  • SwitchRoute
  • Host

The remainder of this section describes the mechanisms and structures involved in storage.

3.8.4.2 Mechanism

When persistable Components are instantiated, their register() method is called. Within register(), DBManager.save() is called with an object that implements Persistable. The method save():

  1. Gets target collection by getDBName() e.g. “VNET”
  2. Gets query index by getDBIndex() e.g. { “tenantId”:1 }
  3. Gets key by getDBKey() and value by getDBObject() e.g. key is “switches”, value is { “dpids”:[4], “vdpid”:400 }
  4. Adds (updates) this value into the list of this key by using MongoDB’s $addToSet operator. If the initial set is {“switches”:[{“dpids”:[1], “vdpid”:100}]}, this becomes {“switches”:[{“dpids”:[1], “vdpid”:100}, {“dpids”:[4], “vdpid”:400}]}

Note that $addToSet does not allow duplicates in the list. Refer to MongoDB’s documentation for further detail.
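As a concrete illustration of step 4 using the legacy MongoDB Java driver (which the DBCollection fields in 3.8.3.1 suggest OVX uses), the update could look roughly like the following; the exact query and document construction inside DBManager.save() may differ.

// Rough illustration of step 4: add one element to the list stored under
// 'key' in the document selected by 'index' (upsert so the first save
// creates the document).
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import java.util.Map;

class SaveSketch {
    void save(DBCollection vnetCollection, Map<String, Object> index,
              String key, Map<String, Object> value) {
        DBObject query = new BasicDBObject(index);     // e.g. { "tenantId" : 1 }
        DBObject element = new BasicDBObject(value);   // e.g. { "dpids" : [4], "vdpid" : 400 }
        DBObject update = new BasicDBObject("$addToSet", new BasicDBObject(key, element));
        vnetCollection.update(query, update, true, false);
    }
}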

3.8.4.3 Persistable Components

The Components that implement Persistable and are stored to the database are the OVXSwitch subclasses (OVXSingleSwitch, OVXBigSwitch), OVXLink, SwitchRoute, OVXPort, and Host.

Note that OVXNetwork, Link, and Port also implement Persistable but are not stored to the database, i.e. aren’t stored by DBManager.save(). This is to allow PhysicalLink (extends Link) and PhysicalPort (extends Port), as well as OVXNetwork, to use some of Persistable‘s methods.

TODO: add tables, or links to them, in the storage API section.

3.8.5 Updating (Deleting) Configurations

When components (switches, links, ports, hosts) are updated, OVX deletes the old instance and replaces it with a new instance. Elements in the database are deleted and re-created at the corresponding times. The procedure differs between OVXNetworks and other Components:

OVXNetworks : DBManager.removeDoc() deletes a document of the specified virtual network. This method is called by OVXNetwork.unregister().

Other Elements : DBManager.remove() deletes an element from the list stored under the specified key, using MongoDB’s $pull operator. This method is called by component inactivation methods:

  • unregisterDP() – OVXSwitch
  • unregister() – OVXPort, OVXLink, SwitchRoute, OVXHost

3.8.6 Restoring Configurations

Upon booting, OVX adds the physical Components that were previously stored in the DB to an “offline list”. This list is a checklist tracking whether physical entities (switches, links, ports) are offline. When OVX detects that a physical element is active, it creates the corresponding physical Component instance (PhysicalSwitch, PhysicalPort, PhysicalLink). Once all physical entities are live, OVX restores the saved OVXNetwork(s), complete with their virtual components (OVXSwitch, OVXPort, OVXLink, Host, etc.).
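A reduced sketch of this checklist, showing only the switch path (the real OVXNetworkManager tracks links and ports as well, per Section 3.8.3.2):

// Sketch of the offline/online checklist; createNetwork() is a placeholder
// for the actual OVXNetwork reconstruction.
import java.util.HashSet;
import java.util.Set;

class OVXNetworkManagerSketch {

    private final Set<Long> offlineSwitches = new HashSet<>();
    private final Set<Long> onlineSwitches = new HashSet<>();
    private boolean bootState = false;

    // called while reading the stored OVXNetwork document: everything starts offline
    public void registerSwitch(final Long dpid) {
        offlineSwitches.add(dpid);
    }

    // called when the corresponding datapath connects and its PhysicalSwitch is created
    public synchronized void setSwitch(final Long dpid) {
        offlineSwitches.remove(dpid);
        onlineSwitches.add(dpid);
        if (offlineSwitches.isEmpty() && !bootState) {
            createNetwork();            // all physical prerequisites are live
            bootState = true;
        }
    }

    private void createNetwork() { /* rebuild the OVXNetwork and its virtual components */ }
}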



3.9 JSONRPC API

3.9.1 The API Server
3.9.2 The OVX GUI
3.9.3 The Network Embedder

3.9.1 The API Server

TODO

3.9.2 The OVX GUI

TODO

3.9.3 The Network Embedder

TODO





Please send feedback and questions to ovx-discuss – at – googlegroups.com