Camano 3.0

Release Notes

Configuration file changes

Thanks to the new NDL bridging code, several attributes and properties in the authority actor config.xml file have become unnecessary. It is still possible to specify them manually, but this should be reserved for special cases. In most cases the bridging code automatically generates the attributes and properties needed by policy controls from the NDL-OWL resource descriptions used by authority actors. For authorities relying on NDL-OWL substrate descriptions, the following attributes should be removed from the authority resource pool specifications in configuration files written for previous releases:

  • resource.domain
  • resource.memory
  • resource.cpu

The following properties should be removed from pool specifications:

  • vlan.tag.start
  • vlan.tag.end

The bridging code is easily extensible for authorities using the pool factory orca.boot.inventory.NdlResourcePoolFactory.

ORCA Actor Registry Integration

Camano 3.0 tightly integrates the actor registry with ORCA operation. Specifically, the <topology> section of actor configuration files is no longer needed under most circumstances. If the container.properties file specifies using the remote registry, then all certificate information and the locations of other actors are gleaned from the registry, and edges in the actor topology are created automatically. See this page for more information.

GENI AM API and ProtoGENI AM API support

The XMLRPC controller included in the distribution exposes the GENI AM API and the ProtoGENI AM API. To enable it, create a container with a service manager, make sure the actor is registered with the ORCA Actor Registry, and then start the XMLRPC controller from the user tab of the ORCA GUI. Once the controller is started, the internal XMLRPC server is initialized and exposes the APIs, which can then be used by standard tools such as GPO's omni (part of the GCF package).

By default this controller uses the 'ndl-broker' broker hosted at RENCI. The broker to use is set by a configuration parameter in container.properties.

The ProtoGENI AM API interface accepts ProtoGENI RSpec v2, while the GENI AM API accepts NDL-OWL requests. The GENI AM API can be exercised using Python scripts located under $ORCA_SRC/controllers/xmlrpc/resources/scripts/.
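To illustrate what a client does when talking to the controller, the sketch below calls the GENI AM API GetVersion method with Python's standard xmlrpc.client. The controller's actual endpoint URL, port, and authentication (GENI aggregates normally require SSL client certificates, which tools like omni handle for you) are not given in these notes, so the example starts a local stand-in server instead. The stub, its URL, and its reply are hypothetical and only mimic the shape of a call.

```python
import threading
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer


def start_stub_aggregate(host="127.0.0.1"):
    """Start a local stand-in for the controller's XMLRPC server.

    Hypothetical stub: it only implements GetVersion and returns a made-up
    reply. It is NOT the ORCA XMLRPC controller.
    """
    server = SimpleXMLRPCServer((host, 0), logRequests=False, allow_none=True)
    server.register_function(lambda: {"geni_api": 1}, "GetVersion")
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    port = server.server_address[1]  # port 0 above means "pick a free port"
    return server, f"http://{host}:{port}/"


def get_version(url):
    """Call the GetVersion method exposed at the given XMLRPC endpoint."""
    with xmlrpc.client.ServerProxy(url, allow_none=True) as proxy:
        return proxy.GetVersion()


if __name__ == "__main__":
    server, url = start_stub_aggregate()
    try:
        print(get_version(url))  # {'geni_api': 1}
    finally:
        server.shutdown()
        server.server_close()
```

Against a real deployment you would point get_version at the controller's URL and supply the required credentials, or simply use omni, which wraps these calls.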

Redeploying site actors

Camano 3.0 properly closes tickets from authorities to brokers, which allows for a clean redeployment of authority actors without affecting the rest of the ORCA actors. To redeploy an authority (to change the configuration or delegated resources):

  • Make sure no user slices hold resources delegated by the authority actor.
  • Shut down the authority actor.
  • In the broker actor(s), close all tickets from this authority actor.
  • Remove the $ORCA_HOME/state_recovery.lock file on the authority host, modify the authority's configuration, and restart the container with the authority actor.
  • Claim delegations from this authority on one or more brokers.

Note that this procedure is only needed if you need to change the resources or resource descriptions exported by an authority site. ORCA includes container recovery code that restores actors to their pre-shutdown state after the container hosting them is shut down. If you want recovery rather than a clean restart after shutting down a container with an authority, do NOT remove the state_recovery.lock file and do not close the tickets in the broker; simply restart the container.
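The choice between a clean redeployment and recovery comes down to whether the state_recovery.lock file survives the restart. The Python sketch below covers only that lock-file step; closing tickets in the brokers and claiming delegations are done through ORCA itself and are not modeled. The helper name prepare_restart is hypothetical.

```python
import tempfile
from pathlib import Path

# File named in the release notes: $ORCA_HOME/state_recovery.lock
LOCK_FILE_NAME = "state_recovery.lock"


def prepare_restart(orca_home, clean_redeploy):
    """Prepare an authority host before restarting its container.

    clean_redeploy=True removes the lock file so the container starts fresh;
    the broker tickets from this authority must already be closed.
    clean_redeploy=False leaves the lock file so the container recovers
    its actors' pre-shutdown state.

    Returns True if a lock file was removed.
    """
    lock = Path(orca_home) / LOCK_FILE_NAME
    if clean_redeploy and lock.exists():
        lock.unlink()
        return True
    return False


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real $ORCA_HOME.
    with tempfile.TemporaryDirectory() as home:
        (Path(home) / LOCK_FILE_NAME).touch()
        print(prepare_restart(home, clean_redeploy=False))  # False (recovery: lock kept)
        print(prepare_restart(home, clean_redeploy=True))  # True (clean redeploy: lock removed)
```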

Recovery

The recovery code (restarting a container with actors while retaining their state) has been updated. Recovery of actors has been tested and verified to work under most circumstances.

Where to get the code

Please read the top-level documentation. A binary release is available. To get the source, use the tag Camano-3.0 and follow these instructions for building from source.

Known Issues

  • Tomcat does not always exit after $ORCA_HOME/tomcat/stop.sh is invoked. The ORCA container shuts down, but Tomcat continues running.
    • Workaround: kill Tomcat manually.

Planned features

  • Redeploying a site requires redeploying the broker
    • When you redeploy, don't wipe the state recovery file: this preserves existing reservations. However, a bug currently prevents expanding the packages if the state recovery lock file is present (#145). Every actor is designed to work this way: if a broker is blown away, service managers continue working until they need to extend. By design redeploying should be fine; what remains is fixing any recovery bugs. Recovery is most problematic on service managers and authorities. Eucalyptus is designed to recover, but this has not been well tested.
    • This issue has been resolved (#184).
    • Related but not critical: #187, #182, #183, #185, #186
  • How can we pass port configuration from the switches into the log when operations fail?
    • If you specify a task property as 'shirako.save.XXXX' (see the Config class), it is passed back with the 'shirako.save.' prefix stripped, and the property is attached to the unit for which the handler was executed (shirako.save.blah -> blah). However, failed units aren't sent back to, e.g., the SM. Logging the information in the handler is possible and probably easiest. It is also possible to attach it to the reservation and pass it as a property, and there is a notice mechanism that can be used to pass it back to the reservation.
    • The problem is related to using Ant as the basis for handler scripts. To switch to Jython as the engine, start with the Config class. One issue may be checking the progress of an operation; Jeff has details on this (it was added for one of the demos). Some code in the AntConfig class may be generically useful and would have to be pushed up. Implement a generic config; look at javax.script (JSR 223) and BSF (Bean Scripting Framework).
    • This issue has been partially addressed through better logging.
  • Do we still have an inter-actor deadlock problem?
    • Yes, and it needs fixing. Every inter-actor call holds the big lock. Across multiple containers, a call and its response may run in different threads, which can cause deadlock. The solution is to make calls across actors without holding the big lock; the difficulty is handling exceptions.
  • If a parent reservation fails, should children be allowed to go on? #189
    • There is a concept of 'deferral': if there are not enough resources, partial resources can be sent back, which is why letting children proceed can make sense. We may need to add a new 'non-deferrable' property on the control for the resource pool. A predicate determines whether a redeem is possible (yes redeem / no redeem / no redeem and release). The child reservations get tickets, and they need to be cleaned up if a parent fails on redeem; this needs to be a transaction. Predecessor reservations need to be closed, but we may need to wait until they are done redeeming, since closing a reservation in the redeeming state is difficult.
  • Recovery issue between multiple containers: #205
  • Trust root at the actor registry (to enable configuration without specifying the topology section in the config file): r3258
  • Need to re-enable ticket validation #181
  • Site consolidation (is it needed?): NO
    • Can/should we get rid of the RENCI-net and UNC-net authorities, so that a single authority controls the EX3200? With NDL exporting multiple resource types, we can now do the consolidation.
    • We could modify configuration processor to create resource pools based on NDL input to avoid repeating delegation in config.xml (and NDL). To be discussed.
    • Since we are dealing with a simplified layout of the substrate, in the longer term it is beneficial to continue using Net authorities, since they will typically correspond to intermediate network providers.
  • NDL code:
    • #194, #195 (Yufeng)
    • Create a separate log file for NDL controller to aid debugging (and stop sending stuff to stdout)
  • Term-related issues
    • Is the reservation time from the RDF request properly used when creating the ORCA reservation? Yes, as of r3210.
    • How long are delegations from an authority to a broker valid? Where is this set? Is a 21-day term hard-coded somewhere?
  • config.xml contains replicated information that should be gleaned from NDL, the same way abstract NDL is created and placed in properties today: #199, #200, #201
  • Old bugs:
    • Adding resource pools from GUI/extending management API #120
    • Referencing actors by GUIDs instead of names #121
    • Emulation mode build #192
    • Node Agent install fail #193
    • Closing a reservation from the reservation screen (for single VMs used without a controller) fails: #196, #116 (related)
    • GUI hangs in some situations: #206, #207
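The 'shirako.save.XXXX' mechanism from the port-configuration item in the planned-features list maps task properties to unit properties by stripping a prefix. The Python sketch below illustrates only that mapping (shirako.save.blah -> blah); it does not reproduce ORCA's Config class, and the function name extract_saved_properties and the example keys and values are hypothetical.

```python
SAVE_PREFIX = "shirako.save."


def extract_saved_properties(task_properties):
    """Collect task properties marked for return and strip their prefix.

    Illustrates the mapping described in the notes (shirako.save.blah -> blah).
    Keys without the prefix, or consisting of the bare prefix, are ignored.
    """
    return {
        key[len(SAVE_PREFIX):]: value
        for key, value in task_properties.items()
        if key.startswith(SAVE_PREFIX) and len(key) > len(SAVE_PREFIX)
    }


if __name__ == "__main__":
    props = {
        "shirako.save.port.config": "ge-0/0/1 vlan 100",  # hypothetical value
        "other.property": "not returned",  # no shirako.save. prefix
    }
    print(extract_saved_properties(props))  # {'port.config': 'ge-0/0/1 vlan 100'}
```

Because failed units are not sent back to the SM, properties saved this way never reach it on failure, which is why logging in the handler remains the simplest route for failure diagnostics.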
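The fix proposed for the inter-actor deadlock item in the planned-features list (do not hold the big lock while calling into another actor) can be sketched with Python threads. The Actor class below is hypothetical and far simpler than an ORCA actor: the response to a call is delivered on a different thread, as can happen across containers. If call() kept its big lock held during peer.handle(), that response thread would block on the lock while call() waited for it, which is the deadlock described in the notes.

```python
import threading


class Actor:
    """Hypothetical actor illustrating the proposed fix: never hold the
    actor-wide ("big") lock while calling into another actor."""

    def __init__(self, name):
        self.name = name
        self.big_lock = threading.Lock()
        self.received = []

    def handle(self, message, reply_to=None):
        # Incoming calls take this actor's big lock only to touch its own state.
        with self.big_lock:
            self.received.append(message)
        if reply_to is not None:
            # Deliver the response on a different thread, as can happen
            # when actors live in different containers.
            responder = threading.Thread(target=reply_to.handle, args=(f"ack:{message}",))
            responder.start()
            responder.join()

    def call(self, peer, message):
        # Prepare the request under the lock (local state only)...
        with self.big_lock:
            request = f"{self.name}:{message}"
        # ...but make the cross-actor call with the lock released. Holding it
        # here would deadlock: the response thread above would block on our
        # lock while we wait for it to finish.
        try:
            peer.handle(request, reply_to=self)
        except Exception as exc:
            # Exception handling is the open issue in the notes; this sketch
            # simply returns the error to the caller.
            return exc
        return None


if __name__ == "__main__":
    sm, broker = Actor("sm"), Actor("broker")
    print(sm.call(broker, "ticket"))  # None
    print(broker.received)  # ['sm:ticket']
    print(sm.received)  # ['ack:sm:ticket']
```

The sketch also shows why exceptions are the hard part: once the lock is released, any state changes made before the call are already visible, so an error from the peer has to be reconciled afterwards rather than rolled back under the lock.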
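The cleanup policy discussed under the parent/child reservation item (#189) in the planned-features list can be sketched as follows: when a parent fails on redeem, its ticketed children are closed, but a child that is still redeeming is only marked and closed once its redeem finishes. The Reservation class, its state names, and the function names are hypothetical, and this single-threaded sketch does not provide the transactional guarantees the notes call for.

```python
from dataclasses import dataclass, field


@dataclass
class Reservation:
    """Hypothetical reservation record; state names are illustrative only."""
    name: str
    state: str = "ticketed"
    children: list = field(default_factory=list)
    close_pending: bool = False


def close_or_defer(reservation):
    """Close now, or mark for closing if a redeem is still in progress."""
    if reservation.state == "redeeming":
        reservation.close_pending = True  # closing mid-redeem is the hard case
    elif reservation.state != "closed":
        reservation.state = "closed"


def on_parent_redeem_failed(parent):
    """Clean up the children of a parent that failed on redeem."""
    parent.state = "failed"
    for child in parent.children:
        close_or_defer(child)


def on_redeem_finished(reservation):
    """Called when a redeem completes; honors any deferred close."""
    reservation.state = "closed" if reservation.close_pending else "active"
    reservation.close_pending = False


if __name__ == "__main__":
    c1 = Reservation("vm-1")
    c2 = Reservation("vm-2", state="redeeming")
    parent = Reservation("vlan", state="redeeming", children=[c1, c2])
    on_parent_redeem_failed(parent)
    print(c1.state, c2.state, c2.close_pending)  # closed redeeming True
    on_redeem_finished(c2)
    print(c2.state)  # closed
```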