Cosmic Todo List

This list can serve as a basis for planning and prioritizing work going forward. It starts as an attempt at an exhaustive list (July 2011).

Strategic items

  • Two-pass stitching and dynamic stitching, modify support
    • Pieces of the slice described by separate pieces of NDL. Pieces of aggregates coming and going to/from the slice.
    • modify() entry point
      • sliver restart as example of modify
      • workflow-driven slice evolution
  • Group Allocation (atomic allocations of resources)
    • Scheduling of a group allocation around the bottleneck resource (advance reservations)
  • Make ResourceSet event-driven (per Aydan) #219
    • Introduce queues in ResourceSets; modifications to ResourceSets should be done using a queue to avoid locking.
    • Will clean up a lot of code.
  • Fix renewable problems #211
    • If a renew arrives while a close is in progress, it causes a panic.
  • Administration and robustness
    • reset inventory delegations at the broker: forget old delegations, or try to renew them
    • AM full restart or hard reset: clean the substrate, and reissue delegations. Rejects old tickets and leases?
    • AM full restart: how to rebuild delegations?
    • slice reclamation on AM and broker: time out an empty slice
    • SM query/discovery of reservations in the slice; poll/request lease status
    • AM query interfaces: slices per user, leases per slice, identify by IP or vlan
    • GMOC monitoring feed: faithful AMs publish an asynchronous stream "feed" of sliver create/renewal notifications to ST.
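
The queue idea under "Make ResourceSet event-driven" can be illustrated with a minimal sketch. It is not ORCA code: the class name, the `units` field, and the `submit`/`drain`/`close` methods are all hypothetical, and the only point shown is that callers enqueue modifications while a single worker thread applies them, so no lock is needed around the state itself.

```python
import queue
import threading


class ResourceSet:
    """Hypothetical stand-in: state is mutated only by one worker thread."""

    def __init__(self):
        self.units = 0
        self._events = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, delta):
        """Enqueue a modification instead of locking and mutating directly."""
        self._events.put(delta)

    def _run(self):
        # Apply queued modifications one at a time, in arrival order.
        while True:
            delta = self._events.get()
            try:
                if delta is None:  # sentinel used by close()
                    return
                self.units += delta
            finally:
                self._events.task_done()

    def drain(self):
        """Block until every queued modification has been applied."""
        self._events.join()

    def close(self):
        """Stop the worker after it has processed everything queued so far."""
        self._events.put(None)
        self._worker.join()
```

Several threads can call `submit()` concurrently; after `drain()` returns, `units` reflects every queued change.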

Administration

  • remoting the management interface
  • Add 'exportAll' to the AM config file
  • Add instantiating a controller from config file vs. GUI

Features to test, verify, cleanup, document

  • extension packages
    • AM side: handlers for new resource types
    • SM side: controller/handler/view
  • SM controller classes and interfaces
  • SM-side handler and stitching, e.g., for elastic Condor
  • pushbutton Euca site deploys with NDL cookbook, including IP connectivity, canned xCAT images, etc.
  • Error reporting/logging from NDL processors
  • error reporting/logging from handlers
  • error reporting distinguishes authorization failure vs. resource limit vs. internal error vs. user error
  • Ticket validation, including signature failure and oversubscription, and rejection path
  • Ticket rejection and cleanup at SM and broker
  • Stitch token validation
  • Multiple pools and multiple delegations per pool on an AM; register through portal
  • Failed renewal and cleanup in SM
  • Full SM-side slice abort (e.g., due to partial failure)
  • broker renews delegations
  • broker absorbs updated delegations
  • AM probe detects resource failure, updates containing lease
  • interdomain path computation across multiple brokers
  • openflow handler
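
As a rough illustration of the error-reporting item above (distinguishing authorization failure, resource limit, internal error, and user error), one option is to tag every reported error with an explicit category. The names `ErrorCategory`, `HandlerError`, and `report` are assumptions made for this sketch, not existing ORCA classes.

```python
from enum import Enum


class ErrorCategory(Enum):
    """The four categories named in the list above."""
    AUTHORIZATION = "authorization failure"
    RESOURCE_LIMIT = "resource limit"
    INTERNAL = "internal error"
    USER = "user error"


class HandlerError(Exception):
    """Hypothetical exception type that always carries a category."""

    def __init__(self, category, message):
        super().__init__(message)
        self.category = category


def report(err):
    """Format an error so the category is visible in logs and to the user."""
    return f"[{err.category.value}] {err}"
```

A handler or NDL processor would then raise `HandlerError(ErrorCategory.RESOURCE_LIMIT, "...")` and the caller could branch on `err.category` rather than parsing message text.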

Advance reservations

  • Resource counts/vectors and integration with controller calendars and SM/broker policy for SPARQL (e.g., VLAN tag tracking)
  • broker-based advance reservations in conjunction with inter-domain stitching
  • auto-launch at the reservation time

Interoperability

  • PG as an aggregate: requires new handler and PG cert pass-through from XMLRPC controller. Wait for PG-ABAC?
  • GUSH (in progress with Jeannie Albrecht)

Representations

  • NDL requests and manifest using the new edge standard. Propertylist cleanup and doc

Broker resource policies: wish list

  • generic web view for admin approval of slices and reservations
  • broker resource menu with fixed prices and budgets
  • attribute-based shares or quotas, with policy plugin view to manipulate them
  • congestion pricing
  • stackable broker policies
  • euca instance size selection (small/medium/large properties/handlers)
  • map global type attributes specified by SM into candidate pool IDs (site selection)
  • VM placement (site selection) based on data location

Identity management and portal

  • multi-user web portal with per-slice access control, requester ID in outgoing requests, proper authtoken slamming
  • shib: the user ID needs to be registered locally to get into the web portal.
  • users can upload certs, which are stored indexed by user

Enhancements to ABAC-based authorization (when integration is complete)

  • per-site policies: ACLs by idp.attribute (test case: RENCI cluster)
  • signed security attributes on images transported by Image Proxy
  • GMOC back door: attribute-based authorization for slice shutdown
  • SA module and/or actor

Handler infrastructure

  • Config handler invocation: threadpool, synchronization, and new scripting support
  • dynamic interposition/withdrawal of perfsonar
  • handler-driven sliver stitching: storage volume create/attach
  • small/medium/large (independent of broker policy)
  • in-progress additions to handler catalog: EBS, sunfish storage, xcat, I2/ION
  • Support Q-in-Q.
  • EC2/Euca
    • Allow passing instance size as a parameter from controllers (and NDL)
  • Sherpa needs support for dealing with a pool of predefined VLANs
  • Network drivers
    • Improve 6509 driver performance by caching login sessions
    • Consider separating adding a QoS profile to vlan from vlan creation. This may be needed to deal with vlan delays and in general give more flexibility.
    • Add returning basic configuration errors, e.g., existing vlan or vlan mapping.
    • Add vlan translation to ex3200
      ckh# set vlans ORCA-test-Hadoop interface ge-0/0/0.0 mapping 10 ?
      Possible completions:
      + apply-groups         Groups from which to inherit configuration data
      + apply-groups-except  Don't inherit configuration data from these groups
        push                 Push additional tag on packet
        swap                 Translate VLAN tag for the packet
      [edit]
      

Extensions to staged core workflow

  • controllers as a separate stage
  • auth checks as a separate stage
  • resourceSet and below as a separate stage

Pushbutton slices/demos

  • Hadoop
  • elastic Condor w/local DAGman
  • triangle and/or star, intradomain and interdomain cases
  • Harold's three-tier cloudscale
  • netfence

Ideas to discuss

  • integrate puppet configuration service to AM
  • integrate nagios monitoring service to AM
  • move inter-domain path computation into the broker
  • broker policy for bin-packing computons (small/medium/large)
  • multiple SMs per slice: SM owner per-reservation, not per-slice
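
To make the bin-packing idea above concrete for discussion, here is a sketch using first-fit decreasing, a standard heuristic that is not necessarily what a broker policy would end up using. The size weights (small=1, medium=2, large=4) and the host capacity are assumptions chosen only for the example.

```python
# Assumed relative weights of the computon sizes; not taken from ORCA.
SIZES = {"small": 1, "medium": 2, "large": 4}


def first_fit_decreasing(requests, host_capacity):
    """Pack requested computon sizes onto hosts, largest first.

    Returns a list of hosts, each a list of the size names placed on it.
    """
    hosts = []  # size names placed on each host
    free = []   # remaining capacity of each host
    for name in sorted(requests, key=lambda n: SIZES[n], reverse=True):
        need = SIZES[name]
        if need > host_capacity:
            raise ValueError(f"{name} does not fit on any host")
        for i, room in enumerate(free):
            if room >= need:
                hosts[i].append(name)
                free[i] -= need
                break
        else:
            # No existing host has room, so open a new one.
            hosts.append([name])
            free.append(host_capacity - need)
    return hosts
```

With a host capacity of 4, the request `["small", "large", "medium", "small", "medium"]` is packed onto three hosts.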

Proposed new component: secure image repo with simple web interface

  • content-addressable fetch by HTTP and bittorrent (by any ImageProxy? that knows URL and hash)
  • generate/retire a random token to allow a user to request approval to put images in the repo
  • user form to request approval to put images in the repo: requires access token
  • admin interface to approve image put, set storage quota for image elements, generate random image token
  • put image or update image, named by image token (server generates/checks hashes)
  • images optionally discoverable and browsable by short description, if allowed by user
  • we need to keep three maps: user tokens to image tokens, image tokens to image elements and storage consumed, and content hashes to their objects
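
The three maps and the content-addressable fetch described above could look roughly like the following in-memory sketch. Everything here is an assumption for illustration: the class and method names are hypothetical, SHA-256 is picked arbitrarily as the content hash, and tokens, quotas, and approval are left out.

```python
import hashlib


class ImageRepo:
    """Hypothetical in-memory model of the three maps."""

    def __init__(self):
        self.user_to_images = {}     # user token -> set of image tokens
        self.image_to_elements = {}  # image token -> element hashes and bytes used
        self.objects = {}            # content hash -> stored bytes

    def put(self, user_token, image_token, data):
        """Store an image element; the server computes the hash."""
        digest = hashlib.sha256(data).hexdigest()
        self.objects[digest] = data
        self.user_to_images.setdefault(user_token, set()).add(image_token)
        entry = self.image_to_elements.setdefault(
            image_token, {"hashes": [], "bytes": 0}
        )
        if digest not in entry["hashes"]:
            # Count storage once per distinct element of this image.
            entry["hashes"].append(digest)
            entry["bytes"] += len(data)
        return digest

    def fetch(self, digest):
        """Content-addressable fetch: look up by hash and re-check it."""
        data = self.objects[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored object does not match its hash")
        return data
```

The per-image byte count is what an admin-set storage quota would be checked against.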

NDL Integration

  • Describe site delegations in NDL (ILIA)
    • What I'm proposing is that, in addition to the RDF file describing the substrate, there are one or more RDF files describing delegations. This is not dissimilar to the current request in RDF, where there is a topology description and then a Reservation object that lists the members (URLs, unique names) of the nodes and links that belong to the request. We can have a similar Delegation object class that is described in a separate file referencing individuals from the substrate description. You can have several files describing several delegations (e.g., to different brokers). Since in RDF every delegation is a unique object with a unique name/URL, all that needs to go in the export stanza of the config.xml file is the URL of the delegation and a reference to the RDF file describing it. Note that there is no one-to-one correspondence between files and delegations: they can all be in separate files or in one file. The config processor would need to parse the substrate RDF and the delegation RDFs that reference objects in the substrate RDF, convert the delegations into unit counts, and form the traditional ORCA delegations that are used internally. This has the advantage of not impacting internal logic, while bringing order to the use of RDF as a resource description mechanism in ORCA, without manually replicating delegation information in config.xml as is done today.
  • Review the code for static members and general structure
  • Multipoint BEN and Sherpa
  • Represent IP address assignment for multiple VMs per site in the RDF request. Need to parse IP range in the controller and pass to the VMControl policy.
  • Improve port-to-port provisioning to allow the request to specify the specific port names.
  • Investigate persistent triple store from BBN http://parliament.semwebcentral.org/
  • Get port payload information out of the substrate RDF for the DTN switches
  • For BEN, when there are multiple connections between different pairs of devices, a connection may consist of portions of cross-layer segments and existing Ethernet virtual connection segments. Need to review and run more tests to make sure release actions performed in different orders are correct.
  • Can we have the controller (ID controller) query NDL on demand instead of only at the beginning?
  • Should improve the performance of the label assignment policy and label range update utility in the model.
  • May need to get rid of the domain name entry in the config.xml, and get it from the NDL when delegating resource pools.
  • May need to get rid of the build.property in the handler to get the device information (name/address) out of the NDL file.
  • Can we use cytoscape to visualize our RDFs in a useful way (example: in the registry add an option to show a visualization of the delegated resources)?
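
For the delegation proposal under "Describe site delegations in NDL", the step of converting a delegation into unit counts can be sketched as below. This is only an illustration under heavy assumptions: plain dictionaries stand in for the parsed substrate and delegation RDF, the URLs and type names are made up, and no real RDF parsing or ORCA config processing is shown.

```python
from collections import Counter


def delegation_to_unit_counts(substrate, delegation):
    """Count delegated units per resource type.

    substrate:  maps an individual's URL to its resource type
                (stand-in for the parsed substrate RDF).
    delegation: {"url": ..., "members": [...]} listing substrate
                individuals (stand-in for a Delegation object).
    """
    counts = Counter()
    for member in delegation["members"]:
        if member not in substrate:
            raise KeyError(f"{member} is not described in the substrate")
        counts[substrate[member]] += 1
    return dict(counts)


# Made-up example data.
substrate = {
    "http://example.org/site#vm1": "vm",
    "http://example.org/site#vm2": "vm",
    "http://example.org/site#vlan100": "vlan",
}
delegation = {
    "url": "http://example.org/site#delegation1",
    "members": [
        "http://example.org/site#vm1",
        "http://example.org/site#vm2",
        "http://example.org/site#vlan100",
    ],
}
unit_counts = delegation_to_unit_counts(substrate, delegation)
```

The resulting per-type counts are what would feed the traditional unit-count delegations used internally, leaving the internal logic untouched.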