[ Previous Section | Documentation Home ]
Synopsis
This section describes the internal workings of OpenVirteX and its various subsystems. These subsystems allow OVX to manipulate the components described in Section II in order to virtualize OpenFlow networks.
3.1 System Overview
3.2 Startup and Shutdown
3.3 The Event Loops
3.4 Network Discovery and Presentation
3.5 Virtualization and De-virtualization
3.6 State Synchronization
3.7 Resilience
3.8 Persistence
3.9 The JSONRPC API
3.1 System Overview
OVX is separated into several major parts:
- A network-facing southbound half that builds and maintains a representation of the infrastructure (PhysicalNetwork), and manages OpenFlow channels between OVX and the datapaths
- A tenant-facing northbound half that presents each tenant with a virtual network of software switches (OVXSwitches) and links (OVXLinks), and manages OpenFlow channels between OVXSwitches and tenant controllers
- Global maps (OVXMap, PhysicalPort.ovxPortMap) that map PhysicalNetwork and OVXNetwork Components onto each other, bridging the two halves
- An API server that listens for JSONRPC calls for system configuration and system/network state information
The global mappings are populated as OVXNetworks are created, and channel management is done for the two halves in separate IO Loops. The loops are joined on a per-message basis for each message that 1) has a source or destination OVXNetwork, and 2) must cross the north-south gap i.e. has its virtualize() or devirtualize() method called. This ‘decoupled by default’ state of OVX’s two halves makes it possible to dynamically reconfigure OVXNetworks during runtime by manipulating the global mappings on the fly via API calls, and allows OVX to remain connected to the network even without the presence of tenants.
The rest of Section 3 is devoted to the subsystems and mechanisms that enable OVX to function in this way, and importantly, to create isolated virtual OpenFlow networks.
3.2 Startup and Shutdown
This section describes the startup and shutdown process of OVX.
3.2.1 Main process startup
3.2.2 PhysicalNetwork / Southbound channel initialization
3.2.3 Tenant Network (OVXNetwork) / Northbound channel initialization
3.2.4 System shutdown
3.2.1 Main process startup
The main method of OVX is found in OpenVirteX.java [package net.onrc.openvirtex.core]. The main method parses the command line arguments for system settings, and launches OpenVirteXController. OpenVirteXController implements the core OVX runnable, and keeps track of system configurations such as:
- Path to system configuration files (currently unused by OVX)
- Host and port OVX’s southbound half listens on for OpenFlow connections
- Host and port OVX should connect to for persistent storage
- Max number of worker threads
- Max number of tenant networks
- Polling rate of PhysicalNetwork statistics
For the full list, refer to the OpenVirteXController constructor or OVX’s help functions.
In addition to initializing these settings, OpenVirteXController, in order:
- initializes the single PhysicalNetwork instance
- attempts to connect to the database to recover earlier virtual network configurations (if any exist)
- starts up the API server, and
- initializes southbound channel handlers
OVX begins listening for API calls at step 3, and connections from datapaths at step 4, at which point OVX is considered to be initialized.
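The sketch below illustrates this boot order. The PhysicalNetwork singleton and DBManager.init() follow the names used elsewhere in this document, but the singleton accessors and the startApiServer()/startSouthboundServer() methods are illustrative placeholders rather than the actual OpenVirteXController API.

public void run() {
    PhysicalNetwork.getInstance();                              // 1. the single PhysicalNetwork instance
    DBManager.getInstance().init("127.0.0.1", 27017, false);    // 2. recover stored OVXNetworks, if any
    startApiServer();                                           // 3. OVX now accepts JSONRPC API calls
    startSouthboundServer();                                    // 4. OVX now accepts datapath connections
}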
3.2.2 PhysicalNetwork population / Southbound channel initialization
The structures in PhysicalNetwork are populated as switches connect to OVX and links are discovered through them. OVX initializes network topology discovery on a per-datapath basis, creating a PhysicalSwitch and SwitchDiscoveryManager for each new switch. As mentioned before, from a switch’s perspective, OVX appears to be a controller. Therefore, connection establishment follows the OpenFlow handshake, with OVX taking on the role of the controller. Fig 3.1 shows the state machine associated with the switch.
Fig. 3.1 : State machine of southbound handshake with a datapath, with OVX acting as a controller.
The SwitchChannelHandler [net.onrc.openvirtex.core.io] implements this state machine in the ChannelState enum. Each ChannelState value represents a state that the datapath can take, and defines state-specific message handling methods. ChannelState also defines a default behavior, but the state-specific methods override it whenever a datapath is in a specific state.
OVX maps a datapath to a PhysicalSwitch only when the datapath reaches the WAIT_DESCRIPTION_STAT_REPLY state. OVX configures the PhysicalSwitch with the information provided by the datapath during the handshake, and adds it to the PhysicalNetwork. A datapath is deemed to be in ACTIVE state once a SwitchDiscoveryManager is mapped to its PhysicalSwitch, and the PhysicalSwitch’s StatisticsManager is enabled. An ACTIVE PhysicalSwitch may participate in network discovery and the event loop. The details of state and topology discovery are discussed in Section 3.4, and the event loop in Section 3.3.
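The state-per-enum-value pattern can be sketched as follows; the message handling bodies are reduced to comments, and the handler/message parameter types are simplified stand-ins for the actual SwitchChannelHandler internals.

enum ChannelState {
    INIT,
    WAIT_HELLO,
    WAIT_FEATURES_REPLY,
    WAIT_DESCRIPTION_STAT_REPLY {
        @Override
        void processOFMessage(final Object handler, final Object msg) {
            // build the PhysicalSwitch from the description stats,
            // add it to the PhysicalNetwork, then advance toward ACTIVE
        }
    },
    ACTIVE;

    // default behavior, overridden by states that handle the message specially
    void processOFMessage(final Object handler, final Object msg) {
        // e.g. log and drop unexpected messages
    }
}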
3.2.3 Tenant network (OVXNetwork) / Northbound channel initialization
A tenant network is created, configured, and initialized via API calls. Procedurally, OVXNetwork creation involves the following:
- Declare an OVXNetwork, the Address block used, and tenant controller(s) to connect the OVXNetwork to
- Create OVXSwitches from available PhysicalSwitches
- Add OVXPorts to the OVXSwitches
- Add OVXLinks, Hosts, and for BVSes, SwitchRoutes
- If manual, specify paths for OVXLinks and SwitchRoutes
- Optionally, add backup paths for OVXLinks and SwitchRoutes
- Initialize the OVXNetwork
Internally, these commands cause OVX to:
- Instantiate virtual Components
- Map the virtual Components onto PhysicalNetwork components in global mappings, and
- Bring virtual Components to ACTIVE state, in dependency-imposed order, booting the OVXNetwork as the end result
The steps in the first list are associated with one or more calls to the API server [net.onrc.openvirtex.api.server]. These calls are handled by tenant handlers [api.server.handlers.tenant]. Table 1 shows the (current) tenant handlers, the virtual elements that they instantiate, and the parameters used for mapping:
Elements | Parameters | Tenant Handler(s) |
---|---|---|
OVXNetwork | tenant controller host/port, IP block | CreateOVXNetwork |
OVXSwitch | tenant ID, PhysicalSwitch(es) | CreateOVXSwitch |
OVXPort | tenant ID, PhysicalSwitch, PhysicalPort | CreateOVXPort |
OVXLink | tenant ID, source/destination OVXSwitch and OVXPort, intermediate PhysicalSwitches and PhysicalPorts (PhysicalLinks) | ConnectOVXLink, SetOVXLinkPath |
SwitchRoute | tenant ID, source/destination OVXPort, OVXSwitch, PhysicalLinks | ConnectOVXRoute |
Host | tenant ID, OVXSwitch, OVXPort, MAC Address | ConnectHost |
Note that calls that create virtual Components and paths require knowledge about the available physical Components and network topology. Also note that the above table is not a complete list of API handlers, just those that play a role in OVXNetwork initialization. This page provides a full list of API calls and their syntax, and Section 3.9 covers the API Server in depth.
Step 3 of the second list initializes the OVXNetwork through the process illustrated in Fig.3.2.
Fig 3.2 : The startup process for a tenant network, beginning with a call to the API Server.
As discussed before, each class that implements the Component interface (OVXNetwork, OVXSwitch, OVXPort) contains a register() and boot() method that implements parts of the initialization process. Note that initialization follows a certain order, starting with components whose states influence those of the subsequently initialized components. The state dependencies between various OVX objects are illustrated in Fig.3.3.
Fig 3.3 : The dependency graph for OVX components. Arrow direction indicates an “influences the state of” relationship, or, if the arrow is followed in the reverse direction, an “influences the mappings within” relationship. For example, removal of an OVXPort implies that any OVXLinks, Hosts, and SwitchRoutes attached to the port are deleted. In turn, an OVXSwitch containing the port must remove it from its portMap, and the OVXNetwork must remove any deleted OVXLinks from its linkSet. The black arrows indicate lookups restricted to either the physical or virtual halves of the split, and green, dotted arrows indicate dependencies that cross the gap.
We refer back to Fig.3.3 when we talk about internal state synchronization.
The ControllerChannelHandler [net.onrc.openvirtex.core.io] implements the state machine associated with the handshake between OVXSwitch instances and the tenant controller, shown in Fig.3.4. An OVXNetwork is considered to be in ACTIVE state once all of its switches have successfully connected to the controller.
Fig 3.4 : The controller state machine, from the perspective of OVX as a datapath.
3.2.4 System shutdown
Shutdown is handled through OpenVirtexShutdownHook [net.onrc.openvirtex.core.io], which calls OpenVirteXController.terminate(). This method closes the network- and tenant-facing channels, deregisters the PhysicalNetwork (i.e. brings it to STOPPED state), and disconnects OVX from its database.
If the PhysicalNetwork is populated and OVXNetworks are present, the shutdown process takes down components in the order dictated by the dependency graph in Fig.3.3.
3.3 The Event Loops
This section gives an overview of the operation of the core I/O loop.
3.3.1 Overview
3.3.2 Message handling and (de)virtualization
3.3.1 Overview
The OVX event loop handles the processing of OpenFlow messages. The primary roles of the event loop are:
- Carrying out the OpenFlow handshake with datapaths and tenant controllers (initialization) : As discussed in the previous section, OVX implements controller- and switch-side OpenFlow handshakes to establish control channels between it and the datapaths and tenant controllers.
- Virtualization/devirtualization of OpenFlow messages : For each OpenFlow message that must ‘cross’ the physical-virtual split, OVX must be able to correctly look up which controller channel(s) the message must be written to, and where needed, re-write message fields for consistency with the network views of different tenants. It must also be able to reverse this procedure: find the datapaths that should receive a network-bound message, and re-write the message so that it is consistent both with the PhysicalNetwork view and with the traffic separation between tenants, resolving overlaps between tenant header spaces.
- Handling keep-alives to/from datapaths and controllers : Datapaths and controllers exchange echo request/reply messages while idle. OVX handles these messages on a per-channel basis.
The current implementation of the event loop relies on Netty for asynchronous I/O and Java Executors for thread pools. This is a separate loop from that which handles API calls.
3.3.2 Message handling and (de)virtualization
OVXMessages implement either one or both of the two following interfaces:
- Virtualizable : virtualize(PhysicalSwitch sw) : controller-bound messages
- Devirtualizable : devirtualize(OVXSwitch sw) : network-bound messages
The argument to both of these interface methods is the Switch instance that has received the given message on its channel. Messages that never cross the virtual-physical gap, such as handshake messages and keep-alives (OVXEchoRequest/Reply) have empty virtualize() and devirtualize() methods. Messages that do cross the gap implement any virtual-to-physical and physical-to-virtual translation processes in their devirtualize() and virtualize() methods, respectively.
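Reduced to a sketch (return types are assumed to be void), the two interfaces look like this:

public interface Virtualizable {
    // physical -> virtual translation for a controller-bound message
    void virtualize(PhysicalSwitch sw);
}

public interface Devirtualizable {
    // virtual -> physical translation for a network-bound message
    void devirtualize(OVXSwitch sw);
}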
These methods are called from handleIO(), a Switch abstract method implemented in PhysicalSwitch and OVXSwitch:
@Override
public void handleIO(OFMessage msg, Channel channel) {
this.state.handleIO(this, msg, channel);
}
The actual call to the OVXMessage methods occurs under the ACTIVE state of the SwitchState FSMs of PhysicalSwitch and OVXSwitch:
PhysicalSwitch.SwitchState.ACTIVE:
public void handleIO(PhysicalSwitch psw, final OFMessage msg, Channel ch) {
try {
((Virtualizable) msg).virtualize(psw);
} catch (final ClassCastException e) {
psw.log.error("Received illegal message : " + msg);
}
}
OVXSwitch.SwitchState.ACTIVE:
public void handleIO(OVXSwitch vsw, final OFMessage msg, Channel ch) {
/*
* Save the channel the msg came in on
*/
msg.setXid(vsw.channelMux.translate(msg.getXid(), ch));
try {
/*
* Check whether this channel (ie. controller) is permitted
* to send this msg to the dataplane
*/
if (vsw.roleMan.canSend(ch, msg) )
((Devirtualizable) msg).devirtualize(vsw);
else
vsw.denyAccess(ch, msg, vsw.roleMan.getRole(ch));
} catch (final ClassCastException e) {
OVXSwitch.log.error("Received illegal message : " + msg);
}
}
The default behavior of the FSM is to issue a warning and to drop the message.
Figure 3.5 summarizes the high-level view of the event loop.
Fig 3.5 : The core OVX event loop showing the main message handling paths through OVX. The blue, green, and orange blocks denote procedures that logically reside in/interact with the virtual, global, and physical components, respectively. The gray steps denote those in SwitchChannelHandler (orange region) and in ControllerChannelHandler (blue region). The blue arrows represent the OpenFlow channel.
The specifics of how each message is handled in virtualize() or devirtualize() depend on how its OVXMessage class defines these methods. We do not cover every single message here, but will focus on specific messages wherever they crop up.
3.4 Network Discovery and Presentation
To carry out accurate virtualization, OVX must keep its view of the network state up-to-date. This involves:
- Detecting topology and flow table changes
- Applying the changes duly to the PhysicalNetwork/PhysicalSwitches, and
- Detecting, and if necessary, applying, changes that affect tenant networks.
OVX both carries out topology discovery for itself and presents virtual topologies to its tenants. Both are achieved by manipulating LLDPs. The first two subsections describe how OVX keeps its PhysicalNetwork synchronized with the network’s topology and state; the third describes how OVX presents OVXNetworks to tenant controllers in order to give the illusion that they are managing “real” networks.
3.4.1 Topology Discovery/LLDP Handling
3.4.2 PhysicalSwitch Statistics Collection
3.4.3 OVXNetwork Presentation
3.4.1 Topology discovery/LLDP handling
Physical LLDP handling. LLDP messages to/from the network are handled by SwitchDiscoveryManager instances paired with each PhysicalSwitch. As mentioned before, the pairs are found in PhysicalNetwork.discoveryManager. Every proberate milliseconds, each SwitchDiscoveryManager sends out an LLDP via the switch that it is paired with. The default proberate is 1000 ms, defined in the SwitchDiscoveryManager constructor. As LLDPs are intercepted at adjacent switches and passed up to OVX, the SwitchChannelHandler intercepts them, invoking PhysicalNetwork.handleLLDP(). This method is defined by the LLDPEventHandler [net.onrc.openvirtex.core.io] interface, implemented by the Network superclass. handleLLDP() invokes the SwitchDiscoveryManager paired with the PhysicalSwitch that received the LLDP. The SwitchDiscoveryManager’s overall behavior is illustrated below.
Updating the PhysicalNetwork. The topology is updated in SwitchDiscoveryManager.run() every proberate milliseconds. OVX identifies two types of ports:
- fast : ports whose LLDPs were successfully received at the other end (e.g. the port is an endpoint of a link)
- slow : ports whose LLDPs have not been acknowledged for MAX_PROBE_COUNT probes sent (e.g. the port is an edge port or not part of a link). MAX_PROBE_COUNT is currently 3.
A port whose LLDP was received is considered fast, and is added to the Set fastPorts of the SwitchDiscoveryManager of the recipient switch. For each LLDP sent by a fast port, its probe count is incremented; conversely, the count is decremented by an acknowledgement. The probe count is stored in Map<Short, AtomicInteger> portProbeCount. When a port’s probe count exceeds MAX_PROBE_COUNT, it is moved to the Set slowPorts. The promotion from slow to fast and the decrement of the probe count occur in SwitchDiscoveryManager.ackProbe(), and the reverse occurs in run().
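The bookkeeping can be sketched roughly as follows; the field and method names mirror the text above, while the LLDP I/O and locking of the real SwitchDiscoveryManager are omitted.

import java.util.*;
import java.util.concurrent.atomic.AtomicInteger;

class ProbeBookkeepingSketch {
    static final int MAX_PROBE_COUNT = 3;
    private final Set<Short> fastPorts = new HashSet<>();
    private final Set<Short> slowPorts = new HashSet<>();
    private final Map<Short, AtomicInteger> portProbeCount = new HashMap<>();

    // An LLDP sent out of 'port' was received by a neighbor and reported back to OVX.
    void ackProbe(final short port) {
        if (slowPorts.remove(port)) {            // promote slow -> fast
            fastPorts.add(port);
            portProbeCount.put(port, new AtomicInteger(0));
        } else if (fastPorts.contains(port)) {   // acknowledge an outstanding probe
            portProbeCount.get(port).decrementAndGet();
        }
    }

    // Called every proberate ms from run(): demote ports whose probes go unanswered.
    void demoteStalePorts() {
        for (final Short port : new HashSet<>(fastPorts)) {
            if (portProbeCount.get(port).incrementAndGet() > MAX_PROBE_COUNT) {
                fastPorts.remove(port);
                slowPorts.add(port);
                portProbeCount.remove(port);
            }
        }
    }
}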
3.4.2 PhysicalSwitch Statistics Collection
As mentioned earlier, statistics associated with PhysicalSwitches are held in two structures:
- AtomicReference<Map<Short, OVXPortStatisticsReply>> portStats;
- AtomicReference<Map<Integer, List>> flowStats;
These maps are populated by each PhysicalSwitch’s instance of StatisticsManager [net.onrc.openvirtex.elements.datapath.statistics], which periodically polls the corresponding datapath with OFFlowStatisticsRequests and OFPortStatisticsRequests and stores the replies. The current polling interval is 30 seconds, set by refreshInterval. The use of flowStats for physical flow table synchronization is discussed in Section 3.6.3.
3.4.3 OVXNetwork Presentation
Virtual topology presentation. OVX receives PacketOuts containing LLDPs from tenant controllers running their own topology discovery. LLDPs from tenants are handled within their OVXNetwork. By handling LLDPs in the virtual domain, OVX cuts down significantly on the LLDPs that hit the physical network.
For each output port indicated in such probe packets, OVXNetwork:
- looks up the destination port via its neighborPortMap
- constructs a PacketIn with InPort set to the destination, and
- sends the PacketIn back to the NOS via the OVXSwitch containing the destination port
In other words, OVX emulates the broadcast/reception of LLDP packets within a network, for each tenant’s topology. This routine is implemented in OVXNetwork.handleLLDP(). The figure below summarizes this behavior.
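A heavily simplified sketch of this emulation is shown below. VirtualPort and sendPacketInToTenant() are placeholders standing in for OVXPort and the OVXSwitch/OVXPacketIn machinery; only the neighborPortMap lookup mirrors the logic of the actual OVXNetwork.handleLLDP().

import java.util.Map;

class LldpPresentationSketch {

    static class VirtualPort {
        final String parentSwitch;   // name of the OVXSwitch that owns this port
        final short portNumber;
        VirtualPort(final String sw, final short port) {
            this.parentSwitch = sw;
            this.portNumber = port;
        }
    }

    private final Map<VirtualPort, VirtualPort> neighborPortMap;

    LldpPresentationSketch(final Map<VirtualPort, VirtualPort> neighbors) {
        this.neighborPortMap = neighbors;
    }

    // The tenant sent a PacketOut(LLDP) out of srcPort: emulate its reception.
    void handleLLDP(final byte[] lldpPayload, final VirtualPort srcPort) {
        final VirtualPort dstPort = neighborPortMap.get(srcPort);
        if (dstPort == null) {
            return;   // edge port: no virtual neighbor, nothing to emulate
        }
        // construct a PacketIn carrying the same LLDP with in_port = dstPort,
        // and send it to the tenant NOS via dstPort's parent OVXSwitch
        sendPacketInToTenant(lldpPayload, dstPort);
    }

    private void sendPacketInToTenant(final byte[] payload, final VirtualPort inPort) {
        // placeholder for building an OVXPacketIn and calling OVXSwitch.sendMsg()
    }
}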
Multiple-controller tenants (role management).
TODO
3.5 Network Virtualization
3.5.1 Switch Representation Translation
3.5.2 OpenFlow field translation – Cookies, Buffer IDs, XIDs
3.5.3 Address virtualization
3.5.4 Link and Route virtualization
Overview
In OVX, virtualization and devirtualization are the logical actions of moving across the virtual-physical split. In terms of operations on OpenFlow messages, this means:
1. modification of source and destination network addresses
2. translations of host attachment points to/from OVXSwitch/OVXPort and PhysicalSwitch/PhysicalPort
3. dropping of messages originating from/destined to invalid points (hosts, switches) given virtual and physical network topologies
(1) stems from the different addressing schemes used by the physical and virtual networks, for both Hosts (IP Addresses) and Switches (DPIDs and Port numbers). (2) follows logically from (1), since host attachment points will have different designations depending on the network view. (3) serves to isolate the traffic between virtual networks. This section provides descriptions of the various mechanisms that play a role in these three processes.
3.5.1 Switch Representation Translation
A key function in the virtualization process is the translation between OVXSwitches and PhysicalSwitches during message handling.
OVXSwitch -> PhysicalSwitch (Southbound)
OVXSwitches intercept southbound messages sent by tenant controllers. Two methods are used for looking up the destination PhysicalSwitch:
- By ingress OVXPort : The PhysicalSwitch is found through the PhysicalPort mapped to the OVXPort. The OVXPort is found from a port value field in the message, e.g. the in_port field of OFMatch structures. For an OVXBigSwitch, any message without an ingress port value is ignored.
- By OVXMap lookup : For an OVXSingleSwitch, the 1:1 mapping allows OVX to do a direct lookup on the physicalSwitchMap by tenant ID.
This lookup occurs in the OVXSwitch.sendSouth() method implemented in each of the OVXSwitch subclasses.
PhysicalSwitch -> OVXSwitch (Northbound)
The reverse lookup process exploits how OVX defines tenant networks, and the conventions used by OpenFlow conversations.
Tenant networks : Hosts cannot be attached to more than one OVXNetwork, and can be uniquely identified by MAC address. The tenant ID can be fetched from OVXMap’s macMap using the MAC address recovered from an OFMatch field.
OpenFlow conversations : OpenFlow uses the same values for certain fields across multiple messages that are part of the same conversation (transaction). OVX may replace the cookie, XID, and/or bufferId fields of request (southbound) messages with new values that either encode context (e.g. tenant ID) or can be mapped back to the origin when OVX receives the corresponding reply (northbound). We discuss field translation in further detail in the next section.
3.5.2 OpenFlow field translation – Cookies, Buffer IDs, XIDs
OVX uses several structures to hold mappings used in field translations:
- XidTranslator [net.onrc.openvirtex.datapath] : LRULinkedHashMap<Integer, XidPair> xidMap
- OVXFlowTable [net.onrc.openvirtex.datapath] : ConcurrentHashMap<Long, OVXFlowMod> flowmodMap
- OVXSwitch : LRULinkedHashMap<Integer, OVXPacketIn> bufferMap
XIDTranslator. XID values must be unique within each datapath. The XIDTranslator uses the OpenFlow XID to multiplex/demultiplex conversations between a datapath and multiple tenants. For each southbound OVXMessage, the XIDTranslator:
- generates a new XID
- creates an XidPair to store the original XID and source OVXSwitch
- stores the XidPair in xidMap, using the new XID as the key
- returns the new XID value to the caller
XidTranslator.translate() implements the above actions. The caller (PhysicalSwitch) replaces the XID of the message with this new value so that a datapath only receives messages with unique XIDs. Conversely, for northbound messages, the XIDTranslator:
- recovers the XidPair given the XID of the message
- returns the XidPair to the caller
The tenant that initiated the conversation can be recovered from the OVXSwitch found in the XidPair. This reverse process is implemented in XidTranslator.untranslate().
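A compact sketch of this scheme, assuming a plain LinkedHashMap in place of OVX’s LRULinkedHashMap and a simplified XidPair, is shown below; it is illustrative only, not the actual XidTranslator implementation.

import java.util.LinkedHashMap;
import java.util.Map;

class XidTranslatorSketch<T> {

    static final class XidPair<T> {
        final int originalXid;
        final T source;          // the OVXSwitch the request came from
        XidPair(final int xid, final T src) {
            this.originalXid = xid;
            this.source = src;
        }
    }

    private int nextXid = 1;
    private final Map<Integer, XidPair<T>> xidMap = new LinkedHashMap<>();

    // Southbound: remember the original XID/source and hand back a fresh, unique XID.
    synchronized int translate(final int xid, final T source) {
        final int newXid = this.nextXid++;
        this.xidMap.put(newXid, new XidPair<>(xid, source));
        return newXid;
    }

    // Northbound: recover the original XID and source switch for a reply.
    synchronized XidPair<T> untranslate(final int xid) {
        return this.xidMap.get(xid);
    }
}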
OVXFlowTable. The OVXFlowTable stores flow entries in the form of a map holding unmodified OVXFlowMods, keyed on cookies generated by OVX. The generated cookie encodes the tenant ID of the origin:
private long generateCookie() {
    ...
    final int cookie = this.cookieCounter.getAndIncrement();
    return (long) this.vswitch.getTenantId() << 32 | cookie;
}
The cookieCounter ensures cookie uniqueness within an OVXSwitch. Specifically, a FlowMod is assigned a new cookie in OVXFlowMod.devirtualize() when it is added to the flow table, i.e. has a command value of OFPFC_ADD. The new cookie serves as a matching mechanism in the flow table, e.g. when OVX receives a FlowRemoved and has to remove entries to maintain state consistency with the network. Table state maintenance is discussed in Section 3.6.3.
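As a quick worked example of the cookie layout: the tenant ID occupies the upper 32 bits and the per-switch counter the lower 32 bits (the variable names here are placeholders).

public class CookieSketch {
    public static void main(final String[] args) {
        final int tenantId = 7;
        final int counter = 42;
        final long cookie = ((long) tenantId << 32) | (counter & 0xFFFFFFFFL);  // as in generateCookie()
        final int recoveredTenant = (int) (cookie >> 32);          // e.g. in OVXFlowRemoved.virtualize()
        final int recoveredCounter = (int) (cookie & 0xFFFFFFFFL);
        System.out.println(recoveredTenant + " / " + recoveredCounter);         // prints 7 / 42
    }
}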
bufferMap. PacketIn/PacketOut pairs reference the same buffer ID. OVX associates PacketIns with newly generated buffer IDs before they are written north, and stores them, keyed on this new value, in the bufferMap. When OVX intercepts a PacketOut, it can recover the corresponding original PacketIn using the PacketOut’s buffer ID as the key. In OVXPacketOut.devirtualize():
// use bufferID of PacketOut to recover original (unmodified) PacketIn
final OVXPacketIn cause = sw.getFromBufferMap(this.bufferId);
…
// recover original OFMatch, packet data
this.match = new OFMatch().loadFromPacket(cause.getPacketData(),
this.inPort);
this.setBufferId(cause.getBufferId());
ovxMatch = new OVXMatch(match);
ovxMatch.setPktData(cause.getPacketData());
…
As shown above, the contents of the stored PacketIns are used to reverse the address, port, and buffer ID translations that were applied to the corresponding initial PacketIn (and therefore applied to this PacketOut) when it was virtualized by OVX. The mechanics of address virtualization are discussed next.
3.5.3 Address virtualization
3.5.3.1 Overview
OVX avoids address space collisions between tenant traffic flows by creating virtual (OVXIPAddress) and physical (PhysicalIPAddress) addresses for each Host. The former is unique within an OVXNetwork, and the latter is unique in the full PhysicalNetwork. Translation between virtual and physical IP addresses guarantees that each tenant controller can handle flows in terms of its network’s addressing scheme (despite possible overlaps with other tenants’ schemes), and that the datapaths are able to distinguish traffic from different tenants.
Address translation separates datapaths into two groups:
- edges: datapaths that are host attachment points
- core: datapaths only connected to other datapaths
Edge datapaths are charged with rewriting IP addresses. Specifically, edge switches:
- Match on OVXIPAddress values in nw_src and nw_dst fields, rewriting them to PhysicalIPAddress values, for network-bound traffic
- Match on PhysicalIPAddress values, rewriting them to OVXIPAddress values, for host-bound traffic
Core switches, meanwhile, match and forward in terms of PhysicalIPAddresses.
OVX intercepts and alters FlowMods in order to impose these behaviors onto the datapaths. In addition to FlowMods, OVX also alters PacketIns and PacketOuts such that core datapaths only ‘see’ PhysicalIPAddress values, and controllers, OVXIPAddress values. Fig.3.6 illustrates the address translation process, and Table 2 shows a possible set of FlowMods pushed by the tenant controller to OVX, and by OVX to the datapaths, in order to achieve this behavior.
a) The PacketIn is sent to the tenant controller without modifications, with OVXIP values.
b) The corresponding FlowMod instructs matching on OVXIP, and rewrite of those values to PhysicalIPs.
c) The virtual link is mapped back to the two-hop path across psw2.
d) The PacketIn at the destination edge is translated similarly to those in the core network.
e) OVX installs FlowMods that match on PhysicalIPs and rewrite them to OVXIPs.
Fig 3.6 : The address virtualization process across three datapaths. The numbers next to the switches denote port numbers. The port numbering may differ in the tenant and actual networks, even for exact mappings, such as psw1 and vsw1. Here, h1 begins sending packets to h2. OVX handles PacketIns (tenant-bound arrows) and FlowMods (network-bound arrows) differently according to the location of the target datapath with respect to the source and destination hosts.
Table 2 : A possible set of FlowMods pushed to each OVXSwitch and datapath.
OVXSwitch | OFMatch (tenant NOS -> OVXSwitch) | OFAction (tenant NOS -> OVXSwitch) | Datapath | OFMatch (OpenVirteX -> datapath) | OFAction (OpenVirteX -> datapath) |
---|---|---|---|---|---|
vsw1 | nw_src=10.0.0.1 nw_dst=10.0.0.3 in_port=1 | output=2 | sw1 | nw_src=10.0.0.1 nw_dst=10.0.0.3 in_port=1 | nw_src=1.0.0.1 nw_dst=1.0.0.2 output=2 |
vsw1 | nw_src=10.0.0.3 nw_dst=10.0.0.1 in_port=2 | output=1 | sw1 | nw_src=1.0.0.2 nw_dst=1.0.0.1 in_port=2 | nw_src=10.0.0.3 nw_dst=10.0.0.1 output=1 |
vlink1 | | | sw2 | nw_src=1.0.0.1 nw_dst=1.0.0.2 in_port=2 | output=3 |
vlink1 | | | sw2 | nw_src=1.0.0.2 nw_dst=1.0.0.1 in_port=3 | output=2 |
vsw3 | nw_src=10.0.0.1 nw_dst=10.0.0.3 in_port=2 | output=1 | sw3 | nw_src=1.0.0.1 nw_dst=1.0.0.2 in_port=2 | nw_src=10.0.0.1 nw_dst=10.0.0.3 output=1 |
vsw3 | nw_src=10.0.0.3 nw_dst=10.0.0.1 in_port=1 | output=2 | sw3 | nw_src=10.0.0.3 nw_dst=10.0.0.1 in_port=1 | nw_src=1.0.0.2 nw_dst=1.0.0.1 output=2 |
A caveat to this behavior is in the handling of ARP messages; ARP is further discussed with link and route virtualization in Section 3.5.4.
3.5.3.2 Implementations
The translation procedure is implemented across several OVXMessage classes:
PhysicalIPAddress -> OVXIPAddress:
* OVXPacketIn
OVXIPAddress -> PhysicalIPAddress:
* OVXPacketOut
* OVXFlowMod
* OVXActionNetworkLayerSource/Destination
Figures 3.7, 3.8, and 3.9 illustrate, in order, the (de)virtualization process for OVXPacketIn, OVXPacketOut, and OVXFlowMod messages.
Fig 3.7 : PacketIn virtualization.
Fig 3.8 : PacketOut devirtualization.
Fig 3.9 : FlowMod devirtualization.
3.5.4 Link and Route virtualization
TODO
3.6 State Synchronization
3.6.1 Component State Coordination
3.6.2 Error/Event Escalation
3.6.3 Flow Table State Synchronization
3.6.1 Component State Coordination
3.6.2 Error Escalation
OVX uses errors intercepted from the network to keep its PhysicalNetwork synchronized with the network topology.
Errors in the network – e.g. ports, links, and switches going down – are propagated to OVX as OFPortStatus messages. The current implementation of OVX expects PortStatus messages with OFPortReason fields of value OFPPR_DELETE to be sent by a failing switch. These PortStatus messages are handled as OVXPortStatus [net.onrc.openvirtex.messages] instances by OVX.
The handling of OVXPortStatus messages depends on OVX’s state. In the simplest case, no tenant networks exist and only ports, links, and switches in the PhysicalNetwork are removed. Even with tenants, OVX is capable of hiding error conditions in the network, given virtual topologies with certain properties:
- Networks of OVXBigSwitches : A port failure on a non-OVXPort port is analogous to a failure within the fabric. The loss of a Port in a BVS mapped to a well-connected network can be completely hidden from a tenant if alternate paths exist between the OVXPorts of the BVS, or if no SwitchRoutes use the failed port.
- Redundant OVXLinks : If multiple paths are available between the OVXPorts defining an OVXLink, the failure of a Port in one path may be suppressed by failing over to another path.
Additionally, failures of unmapped ports reduce to the simplest case. Figure 3.10 illustrates the failure scenarios that can be suppressed by OVX.
Fig 3.10 : Three scenarios where errors can be suppressed. Left) PhysicalSwitches b and c are not mapped to the OVXNetwork. The tenant is completely ignorant of b and c and of any errors associated with them. Middle) Multiple physical paths map onto the OVXLink between vs1 and vs2 ([a-b,b-d],[a-d],[a-c,c-d]…), providing plenty of backup paths. No link failures are reported to the tenant unless all paths between a and d fail, or PhysicalPorts mapped to OVXPorts fail. Right) The whole PhysicalNetwork maps to one BVS and its crossbars. Failures of PhysicalLinks, Ports, and Switches may be hidden unless the SwitchRoute between a and d runs out of paths, or OVXPorts fail.
OVXBigSwitch and OVXLink resiliency are discussed in detail in Section 3.7.
Conversely, error escalation only comes into the picture when the affected PhysicalPorts are:
- mapped to OVXPorts of OVXLinks and SwitchRoutes
- parts of non-resilient paths
- mapped to OVXSwitch edge ports
The removal process for a deleted PhysicalPort follows the dependency tree described in Fig.3.3 (Section 3.2.3). The full error escalation process is shown in Figure 3.10 below, and is implemented in OVXPortStatus.virtualize().
Figure 3.10 : The algorithm used to modify network representations according to OFPortStatus message contents.
3.6.3 Flow Table State Synchronization
OVXFlowTable Synchronization An OVXFlowTable stores southbound FlowMods before they are altered by the devirtualization process, and represents the flow table that a tenant controller would see if it were to query a switch (in reality, an OVXSwitch) for its table contents. OVX keeps an up-to-date flow table for an OVXSwitch by handling OVXFlowMods [net.onrc.openvirtex.messages] as if it were a datapath handling FlowMod messages:
/* Within class OVXFlowMod */
public void devirtualize(final OVXSwitch sw) {
...
FlowTable ft = this.sw.getFlowTable();
...
long cookie = ((OVXFlowTable) ft).getCookie();
//Store the virtual flowMod and obtain the physical cookie
ovxMatch.setCookie(cookie);
/* update sw's OVXFlowTable */
boolean pflag = ft.handleFlowMods(this, cookie);
OVXFlowTable.handleFlowMods() modifies the entries in an OVXFlowTable instance according to the command field value of a FlowMod. The flow entry matching mechanism is implemented by OVXFlowEntry [net.onrc.openvirtex.elements.datapath], a wrapper class for OVXFlowMods.
After the virtual flow table is updated, the devirtualization process sends the FlowMod south.
Physical flow table synchronization The flow table in the PhysicalSwitch is represented by the PhysicalSwitch.flowStats structure. As described in Section 3.4.2, this structure is populated with the replies to the OFFlowStatisticsRequests that the PhysicalSwitch’s StatisticsManager [net.onrc.openvirtex.elements.datapath.statistics] sends every refreshInterval (currently 30 seconds).
Synchronization between flow tables The physical flow table is implicitly synchronized with the OVXFlowTables that map to it via devirtualized FlowMods. Each FlowMod sent south also has the OFPFF_SEND_FLOW_REM flag set, so that its expiration is reported back to OVX as an OFFlowRemoved. The virtualize() method of OVXFlowRemoved [net.onrc.openvirtex.messages] determines and removes the matching FlowMods using the cookie value:
public void virtualize(final PhysicalSwitch sw) {
/* determine tenant from cookie */
int tid = (int) (this.cookie >> 32);
...
try {
/* find which OVXSwitch's flowtable is affected */
OVXSwitch vsw = sw.getMap().getVirtualSwitch(sw, tid);
if (vsw.getFlowTable().hasFlowMod(this.cookie)) {
OVXFlowMod fm = vsw.getFlowMod(this.cookie);
vsw.deleteFlowMod(this.cookie);
/* send north ONLY if tenant controller wanted a FlowRemoved for the FlowMod*/
if (fm.hasFlag(OFFlowMod.OFPFF_SEND_FLOW_REM)) {
writeFields(fm);
vsw.sendMsg(this, sw);
}
}
...
}
3.7 Resilience
Network elements inevitably fail. OVX attempts to reduce the impact of infrastructure failures on OVXNetworks by allowing certain Components to be mapped redundantly onto the PhysicalNetwork:
- OVXLinks : multiple paths
- SwitchRoute : multiple paths
- OVXBigSwitch : multiple SwitchRoutes, sets of PhysicalSwitches, or SwitchRoutes with multiple paths
Note: The last case has yet to be implemented, and is hypothetical. Future releases are expected to support BVS resilience.
A Component mapped to multiple paths can switch to alternate paths when ports and links fail in the network. This allows continued traffic handling with minimal disruption. Components that support failover mappings implement the Resilient [net.onrc.openvirtex.elements] interface. This interface provides two methods:
- public boolean tryRecovery(Component c) : Given the failure of c, attempt to switch over to any backup mappings, if possible
- public boolean tryRevert(Component c) : Given the resumed function of c, attempt to switch back to the original (favored) mapping
Currently, the two Components that implement Resilient are OVXLink and SwitchRoute. Both utilize similar mechanisms to implement resilience. Fig.3.11 and 3.12 illustrate the flowcharts for tryRecovery() and tryRevert() for these two Components, respectively.
Fig.3.11 : The failover process, triggered when a PhysicalLink goes down. The highest-priority path not containing the failed link replaces the current path. The displaced path is added to the list of broken paths, and the newly selected path is removed from the available backups.
Fig.3.12: The recovery process, after a failed PhysicalLink comes back up. A Component will try to revert to using the mappings that it started with. In Virtual links like OVXLink and SwitchRoute, this is assumed to be the path with the highest priority value. Paths that were broken earlier are moved from the ‘broken’ to the ‘backups’ list.
In the above figures, the ‘broken’ and ‘backups’ lists correspond to the previously-discussed unusableLinks/Routes and backupLinks/Routes TreeMap<Byte, List> structures, respectively. All paths available to a Virtual link are moved between these two TreeMaps as links fail and recover, with the exception of the currently functional path, which is moved to the global mapping for the Virtual link.
Traffic flow disruption is reduced by transferring flows between switched paths: sets of FlowMods are reinstalled to guide the traffic through the new path. This is implemented in the switchPath() method for both OVXLinks and SwitchRoutes.
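The bookkeeping behind this failover can be sketched as below, assuming the ‘broken’ and ‘backups’ lists are TreeMaps keyed on path priority (as with unusableLinks/backupLinks) and representing a path simply as a list of link names; this is illustrative only, not OVX’s tryRecovery()/switchPath() code.

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ResilientPathSketch {
    private byte currentPriority;
    private List<String> currentPath;                                 // the mapping in use
    private final TreeMap<Byte, List<String>> backups = new TreeMap<>();
    private final TreeMap<Byte, List<String>> broken = new TreeMap<>();

    // A PhysicalLink failed: fail over to the highest-priority backup avoiding it.
    boolean tryRecovery(final String failedLink) {
        broken.put(currentPriority, currentPath);   // displaced path joins the 'broken' list
        for (final Byte prio : new ArrayList<>(backups.descendingKeySet())) {
            if (!backups.get(prio).contains(failedLink)) {
                currentPath = backups.remove(prio); // no longer a backup
                currentPriority = prio;
                return true;   // caller reinstalls FlowMods along currentPath (switchPath)
            }
        }
        return false;          // no usable backup: the failure must be escalated
    }

    // The failed PhysicalLink came back: broken paths become backups again, and the
    // highest-priority path (if it now outranks the current one) is restored.
    boolean tryRevert(final String recoveredLink) {
        for (final Byte prio : new ArrayList<>(broken.keySet())) {
            if (broken.get(prio).contains(recoveredLink)) {
                backups.put(prio, broken.remove(prio));
            }
        }
        if (!backups.isEmpty() && backups.lastKey() > currentPriority) {
            final Byte best = backups.lastKey();
            backups.put(currentPriority, currentPath);
            currentPath = backups.remove(best);
            currentPriority = best;
            return true;
        }
        return false;
    }
}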
3.8 Persistence
This section describes the subsystem that implements the persistence of virtual network configurations.
3.8.1 Overview
3.8.2 Parameters
3.8.3 Related Packages and Classes
3.8.4 Saving Configurations
3.8.5 Updating Configurations
3.8.6 Restoring Configurations
3.8.1 Overview
As mentioned in the previous section, OVX supports the persistence of administratively configured network topologies. When provided with a storage backend (database) that it can connect to, OVX saves virtual network topologies to the database, and rebuilds them from the stored data when it restarts. Currently MongoDB is used as the database backend.
Across OVX restarts, not only the network topology but also all IDs (tenant ID, DPID, port number, link ID, route ID and host ID) are preserved. However, SwitchRoutes in OVXBigSwitches with their routing algorithm set to “spf” are not kept, i.e. they are automatically regenerated across restarts.
Note: Currently, the flow entries in virtual switches and physical switches are not restored except for initial flow entries. We are now developing “live migration and snapshotting” to support system continuity, where all necessary flows are preserved.
3.8.2 Configuration Parameters
Command line options can be used to configure how OVX interacts with the storage backend:
Option | Argument | Comments |
---|---|---|
-dh or --db-host | hostname | default: "127.0.0.1" |
-dp or --db-port | port | default: 27017 |
Note, there are two cases where OVX starts up without pre-configured virtual topologies:
- If OVX can’t connect to the database: Currently, this generates error messages in the log. These messages won’t interfere with the regular operation of OVX.
- Using the option "--db-clear": all persisted data is deleted from storage.
3.8.3 Related Packages and Classes
In addition to the Persistable interface, the package [net.onrc.openvirtex.db] is also associated with persistence. This package contains classes that define the document representation of OVXNetworks, and wrappers that allow OVX to interface with MongoDB.
The rest of this section will give overviews of the member classes in [net.onrc.openvirtex.db].
3.8.3.1 class DBManager
DBManager implements the read/write operations to the storage backend. It is instantiated as a singleton when OVX is started.
Fields
// Database collection names
public static final String DB_CONFIG = "CONFIG";
public static final String DB_USER = "USER";
public static final String DB_VNET = "VNET";
// Database object
private DBConnection dbConnection;
// Map of collection names and collection objects
private Map<String, DBCollection> collections;
// Mapping between physical dpids and a list of vnet managers
private Map<Long, List<OVXNetworkManager>> dpidToMngr;
// Mapping between physical links and a list of vnet managers
private Map<DPIDandPortPair, List<OVXNetworkManager>> linkToMngr;
// Mapping between physical ports and a list of vnet managers
private Map<DPIDandPort, List<OVXNetworkManager>> portToMngr;
Methods
// Initialize database connection
public void init(String host, Integer port, boolean clear)
// Create a document in database from persistable object obj
public void createDoc(Persistable obj)
// Remove a document
public void removeDoc(Persistable obj)
// Save an element to the list of specified key in document
public void save(Persistable obj)
// Remove an element from the list of specified key in document
public void remove(Persistable obj)
// Reads all virtual networks from database and spawn an OVXNetworkManager
// for each.
private void readOVXNetworks()
// Reads virtual components from a list of maps in db format and registers the
// physical components in their manager.
private void readOVXSwitches(List<Map<String, Object>> switches,
OVXNetworkManager mngr)
private void readOVXLinks(List<Map<String, Object>> links,
OVXNetworkManager mngr)
private void readOVXPorts(List<Map<String, Object>> ports,
OVXNetworkManager mngr)
private void readOVXRoutes(List<Map<String, Object>> routes,
OVXNetworkManager mngr)
3.8.3.2 class OVXNetworkManager
OVXNetworkManager recreates a tenant network from storage; one instance is created per virtual network. The way in which it rebuilds a tenant network is described in Section 3.8.6.
Fields
// Document of virtual network
private Map<String, Object> vnet;
private Integer tenantId;
// Set of offline and online physical switches
private Set<Long> offlineSwitches;
private Set<Long> onlineSwitches;
// Set of offline and online physical links identified as (dpid, port number)-pair
private Set<DPIDandPortPair> offlineLinks;
private Set<DPIDandPortPair> onlineLinks;
// Set of offline and online physical ports
private Set<DPIDandPort> offlinePorts;
private Set<DPIDandPort> onlinePorts;
private boolean bootState;
Methods
// Register a physical component to offline list
public void registerSwitch(final Long dpid)
public void registerLink(final DPIDandPortPair dpp)
public void registerPort(final DPIDandPort port)
// Delete a physical component from offline list,
// add it to online list,
// and then, if all physical components are online,
// create a virtual network.
public synchronized void setSwitch(final Long dpid)
public synchronized void unsetSwitch(final Long dpid)
public synchronized void setLink(final DPIDandPortPair dpp)
3.8.3.3 interface DBConnection
DBConnection is an interface that defines the methods that must be implemented in order for OVX to interact with various storage backends. The class MongoConnection implements this interface with MongoDB-specific methods to connect() and disconnect() from the database.
3.8.4 Storing Configurations
3.8.4.1 Overview
3.8.4.2 Mechanism
3.8.4.3 Persistable Components
3.8.4.1 Overview
When virtual Components are instantiated, their information is added to the database as documents. Currently, the Components stored in the database are the following:
- OVXNetwork
- OVXSingleSwitch
- OVXBigSwitch
- OVXPort
- OVXLink
- SwitchRoute
- Host
The remainder of this section describes the mechanisms and structures involved in storage.
3.8.4.2 Mechanism
When persistable Components are instantiated, their register() method is called. register() in turn calls DBManager.save() with an object that implements Persistable. The save() method:
- Gets target collection by getDBName() e.g. “VNET”
- Gets query index by getDBIndex() e.g. { “tenantId”:1 }
- Gets key by getDBKey() and value by getDBObject() e.g. key is “switches”, value is { “dpids”:[4], “vdpid”:400 }
- Adds (updates) this value into the list for this key using MongoDB’s $addToSet operator. If the initial set is {“switches”:[{“dpids”:[1], “vdpid”:100}]}, this becomes {“switches”:[{“dpids”:[1], “vdpid”:100}, {“dpids”:[4], “vdpid”:400}]}
Note that $addToSet does not allow duplicate entries in the list. Refer to MongoDB’s documentation for further detail.
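For illustration, the update that save() issues can be approximated with the legacy MongoDB Java driver as below. The collection and field names follow the examples above, while the database name ("OVX") is assumed; this is not the actual DBManager code.

import java.util.Arrays;
import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;

public class AddToSetSketch {
    public static void main(final String[] args) {
        final DB db = new MongoClient("127.0.0.1", 27017).getDB("OVX");   // database name assumed
        final DBCollection vnet = db.getCollection("VNET");               // DB_VNET collection

        final BasicDBObject index = new BasicDBObject("tenantId", 1);     // getDBIndex()
        final BasicDBObject value = new BasicDBObject("dpids", Arrays.asList(4L))
                .append("vdpid", 400L);                                    // getDBObject()
        final BasicDBObject update = new BasicDBObject("$addToSet",
                new BasicDBObject("switches", value));                     // getDBKey() = "switches"

        // upsert so the document is created on first save; single-document update
        vnet.update(index, update, true, false);
    }
}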
3.8.4.3 Persistable Components
The Components that implement Persistable and are stored to the database are the OVXSwitch subclasses (OVXSingleSwitch, OVXBigSwitch), OVXLink, SwitchRoute, OVXPort and Host.
Note that OVXNetwork, Link, and Port also implement Persistable but are not stored by DBManager.save(). This is to allow PhysicalLink (extends Link) and PhysicalPort (extends Port), as well as OVXNetwork, to use some of Persistable’s methods.
TODO: Add tables or links to them in the storage API section.
3.8.5 Updating (Deleting) Configurations
When components (switches, links, ports, hosts) are updated, OVX deletes the old instance and replaces it with a new instance. The corresponding elements in the database are deleted and recreated at the same time. The procedure differs between OVXNetworks and other Components:
OVXNetworks : DBManager.removeDoc() deletes a document of the specified virtual network. This method is called by OVXNetwork.unregister().
Other Elements : DBManager.remove() deletes an element from the list stored under the specified key, using MongoDB’s $pull operation. This method is called by component inactivation methods:
- unregisterDP() – OVXSwitch
- unregister() – OVXPort, OVXLink, SwitchRoute, OVXHost
3.8.6 Restoring Configurations
Upon booting, OVX adds the physical Components that were previously stored in the DB to an “offline list”. This list is a checklist tracking whether the physical entities (switches, links, ports) are offline or not. When OVX detects that a physical element is active, it creates the corresponding physical Component instance (PhysicalSwitch, PhysicalPort, PhysicalLink) and moves the entry to the online list. Once all physical entities are live, OVX restores the saved OVXNetwork(s), complete with their virtual components (OVXSwitch, OVXPort, OVXLink, Host, etc.).
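A minimal sketch of this checklist, reduced to switches only (links and ports follow the same pattern), is shown below; it is illustrative rather than the actual OVXNetworkManager code.

import java.util.HashSet;
import java.util.Set;

class NetworkRestoreSketch {
    private final Set<Long> offlineSwitches = new HashSet<>();
    private final Set<Long> onlineSwitches = new HashSet<>();

    // Called while reading the stored OVXNetwork document: everything starts offline.
    public void registerSwitch(final long dpid) {
        offlineSwitches.add(dpid);
    }

    // Called once the corresponding datapath connects and its PhysicalSwitch exists.
    public synchronized void setSwitch(final long dpid) {
        if (offlineSwitches.remove(dpid)) {
            onlineSwitches.add(dpid);
        }
        if (offlineSwitches.isEmpty()) {
            createNetwork();   // all required physical entities are live: rebuild the OVXNetwork
        }
    }

    private void createNetwork() {
        // re-create the OVXNetwork and its OVXSwitches, OVXPorts, OVXLinks, Hosts, ...
    }
}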
3.9 JSONRPC API
3.9.1 The API Server
3.9.2 The OVX GUI
3.9.3 The Network Embedder
3.9.1 The API Server
TODO
3.9.2 The OVX GUI
TODO
3.9.3 The Network Embedder
TODO
[ Previous Section | Documentation Home ]
Please send feedback and questions to ovx-discuss – at – googlegroups.com