Version 7 (modified by ibaldin, 8 years ago)

--

Cosmic Todo List

This list can serve as a basis for planning and prioritizing work going forward. It starts as an attempt at an exhaustive list (July 2011).

Strategic items

  • Two-pass stitching and dynamic stitching
    • Pieces of the slice described by separate pieces of NDL. Pieces of aggregates coming and going to/from the slice.
  • Group Allocation
    • Scheduling of a group allocation around the bottleneck resource (advance reservations)

Administration and robustness

  • reset inventory delegations at the broker: forget old delegations, or try to renew them
  • AM full restart or hard reset: clean the substrate, and reissue delegations. Rejects old tickets and leases?
  • AM full restart: how to rebuild delegations?
  • slice reclamation on AM and broker: time out an empty slice
  • remoting the management interface
  • SM query/discovery of reservations in the slice; poll/request lease status
  • AM query interfaces: slices per user, leases per slice, identify by IP or vlan
  • GMOC monitoring feed: faithful AMs publish an asynchronous stream "feed" of sliver create/renewal notifications to ST.

Features to test, verify, cleanup, document

  • extension packages
    • AM side: handlers for new resource types
    • SM side: controller/handler/view
  • SM controller classes and interfaces
  • SM-side handler and stitching, e.g., for elastic Condor
  • pushbutton Euca site deploys with NDL cookbook, including IP connectivity, canned xCAT images, etc.
  • Error reporting/logging from NDL processors
  • error reporting/logging from handlers
  • error reporting distinguishes authorization failure vs. resource limit vs. internal error vs. user error
  • Ticket validation, including signature failure and oversubscription, and rejection path
  • Ticket rejection and cleanup at SM and broker
  • Stitch token validation
  • Multiple pools and multiple delegations per pool on an AM; register through portal
  • Failed renewal and cleanup in SM
  • Full SM-side slice abort (e.g., due to partial failure)
  • broker renews delegations
  • broker absorbs updated delegations
  • AM probe detects resource failure, updates containing lease
  • interdomain path computation across multiple brokers
  • openflow handler

Advance reservations

  • Resource counts/vectors and integration with controller calendars and SM/broker policy for SPARQL (e.g., VLAN tag tracking)
  • broker-based advance reservations in conjunction with inter-domain stitching
  • auto-launch at the reservation time

Interoperability

  • PG as an aggregate: requires new handler and PG cert pass-through from XMLRPC controller. Wait for PG-ABAC?
  • GUSH (in progress with Jeannie Albrecht)

Representations

  • NDL requests and manifest using the new edge standard. Propertylist cleanup and doc

Broker resource policies: wish list

  • generic web view for admin approval of slices and reservations
  • broker resource menu with fixed prices and budgets
  • attribute-based shares or quotas, with policy plugin view to manipulate them
  • congestion pricing
  • stackable broker policies
  • euca instance size selection (small/medium/large properties/handlers)
  • map global type attributes specified by SM into candidate pool IDs (site selection)
  • VM placement (site selection) based on data location

Identity management and portal

  • multi-user web portal with per-slice access control, requester ID in outgoing requests, proper authtoken slamming
  • shib: requires the user ID to be registered locally to get in on the web portal.
  • users can upload certs, which are stored indexed by user

Enhancements to ABAC-based authorization (when integration is complete)

  • per-site policies: ACLs by idp.attribute (test case: RENCI cluster)
  • signed security attributes on images transported by Image Proxy
  • GMOC back door: attribute-based authorization for slice shutdown
  • SA module and/or actor

Handler infrastructure

  • Config handler invocation: threadpool, synchronization, and new scripting support
  • modify() entry point
  • sliver restart as example of modify
  • dynamic interposition/withdrawal of perfsonar
  • handler-driven sliver stitching: storage volume create/attach
  • small/medium/large (independent of broker policy)
  • in-progress additions to handler catalog: EBS, sunfish storage, xcat, I2/ION

Extensions to staged core workflow

  • controllers as a separate stage
  • auth checks as a separate stage
  • resourceSet and below as a separate stage

Pushbutton slices/demos

  • Hadoop
  • elastic Condor w/local DAGman
  • triangle and/or star, intradomain and interdomain cases
  • Harold's three-tier cloudscale
  • netfence

Ideas to discuss

  • integrate puppet configuration service to AM
  • integrate nagios monitoring service to AM
  • move inter-domain path computation into the broker
  • broker policy for bin-packing computons (small/medium/large)
  • multiple SMs per slice: SM owner per-reservation, not per-slice
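
As a starting point for the bin-packing discussion above, here is a minimal sketch of a first-fit-decreasing placement of small/medium/large requests onto hosts. The unit sizes (1/2/4), the uniform host capacity, and the function name are assumptions made for illustration only; they are not taken from any existing broker policy.

```python
# Hypothetical unit sizes for each instance class (assumption, not an ORCA constant).
SIZES = {"small": 1, "medium": 2, "large": 4}


def first_fit_decreasing(requests, host_capacity):
    """Pack requests onto hosts of equal capacity, largest requests first.

    Returns a list of hosts, each a list of the request classes placed on it.
    """
    hosts = []  # request classes placed on each host
    free = []   # remaining units on each host, parallel to `hosts`
    for name in sorted(requests, key=lambda r: SIZES[r], reverse=True):
        need = SIZES[name]
        if need > host_capacity:
            raise ValueError(f"{name} does not fit on any host")
        for i, room in enumerate(free):
            if room >= need:
                hosts[i].append(name)
                free[i] -= need
                break
        else:
            # No existing host has room, so open a new one.
            hosts.append([name])
            free.append(host_capacity - need)
    return hosts


placement = first_fit_decreasing(
    ["small", "large", "medium", "small", "medium"], host_capacity=4
)
```

First-fit decreasing is only one heuristic; a real policy would also need to account for existing leases and for hosts of differing capacity.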

Proposed new component: secure image repo with simple web interface

  • content-addressable fetch by HTTP and bittorrent (by any ImageProxy? that knows URL and hash)
  • generate/retire a random token to allow a user to request approval to put images in the repo
  • user form to request approval to put images in the repo: requires access token
  • admin interface to approve image put, set storage quota for image elements, generate random image token
  • put image or update image, named by image token (server generates/checks hashes)
  • images optionally discoverable and browsable by short description, if allowed by user
  • we need to keep three maps: user tokens to image tokens, image tokens to image elements and storage consumed, and content hashes to their objects
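
The three maps listed above could be sketched roughly as follows. This is an in-memory illustration only: the class and method names, the use of SHA-256 as the content hash, and the 16-byte random tokens are all assumptions, not part of the proposal.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


@dataclass
class ImageRecord:
    """Elements of one image and the storage they consume (hypothetical layout)."""
    quota_bytes: int
    element_hashes: list = field(default_factory=list)
    bytes_used: int = 0


class ImageRepo:
    def __init__(self):
        self.user_to_images = {}  # user token -> set of image tokens
        self.images = {}          # image token -> ImageRecord
        self.objects = {}         # content hash -> object bytes

    def new_user_token(self):
        """Generate a random token that lets a user request approval."""
        token = secrets.token_hex(16)
        self.user_to_images[token] = set()
        return token

    def approve_image(self, user_token, quota_bytes):
        """Admin step: issue a random image token with a storage quota."""
        image_token = secrets.token_hex(16)
        self.user_to_images[user_token].add(image_token)
        self.images[image_token] = ImageRecord(quota_bytes=quota_bytes)
        return image_token

    def put_element(self, image_token, data):
        """Store one image element; the server computes the hash."""
        record = self.images[image_token]
        if record.bytes_used + len(data) > record.quota_bytes:
            raise ValueError("storage quota exceeded")
        digest = hashlib.sha256(data).hexdigest()
        self.objects[digest] = data
        record.element_hashes.append(digest)
        record.bytes_used += len(data)
        return digest

    def fetch(self, digest):
        """Content-addressable fetch: re-check the hash before returning."""
        data = self.objects[digest]
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored object does not match its hash")
        return data
```

Keeping the content-hash map separate from the image map means identical elements uploaded under different image tokens are stored once, while each image is still charged against its own quota.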

NDL Integration

  • Describe site delegations in NDL (ILIA)
    • What I'm proposing is that, in addition to the RDF file describing the substrate, there are one or more RDF files describing delegations. These are not dissimilar to the current request in RDF, where there is a topology description and then a Reservation object that lists the members (URLs, unique names) of the nodes and links that belong to the request. We can have a similar Delegation object class, described inside a separate file, that references individuals from the substrate description. You can have several files describing several reservations (e.g., to different brokers). Since in RDF every delegation is a unique object with a unique name/URL, all that needs to be put in the export stanza of the config.xml file in this case is the URL of the delegation and a reference to the RDF file describing it. Note that there is no one-to-one correspondence between files and delegations: they can all be in separate files or in one file. The config processor would need to parse the substrate RDF and the delegation RDFs that reference objects in the substrate RDF, convert the delegations into unit counts, and form the traditional ORCA delegations that are used internally. This has the advantage of not impacting internal logic while bringing order to the use of RDF as a resource description mechanism in ORCA, without manually replicating delegation information in config.xml as is done today.
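
The conversion step described above (delegations that reference substrate individuals, reduced to unit counts) might look roughly like the sketch below. To stay self-contained it uses plain dictionaries in place of parsed RDF; the URLs, resource type names, and function name are hypothetical, and a real config processor would read these from the RDF files.

```python
from collections import Counter


def delegation_unit_counts(substrate, delegations):
    """Reduce each delegation to a count of units per resource type.

    substrate:   maps individual URL -> resource type
    delegations: maps delegation URL -> list of member individual URLs
    """
    counts = {}
    for name, members in delegations.items():
        missing = [m for m in members if m not in substrate]
        if missing:
            raise KeyError(f"{name} references unknown individuals: {missing}")
        counts[name] = Counter(substrate[m] for m in members)
    return counts


# Hypothetical substrate description (assumed names, for illustration).
substrate = {
    "http://example.org/substrate#vm1": "vm",
    "http://example.org/substrate#vm2": "vm",
    "http://example.org/substrate#vlan100": "vlan",
    "http://example.org/substrate#vlan101": "vlan",
}

# Two delegations, e.g. to different brokers, referencing the same substrate.
delegations = {
    "http://example.org/delegations#brokerA": [
        "http://example.org/substrate#vm1",
        "http://example.org/substrate#vm2",
        "http://example.org/substrate#vlan100",
    ],
    "http://example.org/delegations#brokerB": [
        "http://example.org/substrate#vlan101",
    ],
}

units = delegation_unit_counts(substrate, delegations)
```

With this shape, the export stanza in config.xml would only need to name the delegation URL; the counts are derived rather than replicated by hand.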