Ticket #205 (closed defect: fixed)

Opened 8 years ago

Last modified 8 years ago

Reservations in active state cannot be fully closed after recovery

Reported by: ibaldin Owned by: ibaldin
Priority: major Milestone: Camano 3.0
Component: ORCA: Shirako Core Version: Camano 3.0
Keywords: Cc:

Description

Recovery (with trust-root modifications) works except in one case involving several containers: if an SM in one container creates a reservation but does not close it, shuts down, then restarts (with recovery) and tries to close it, the reservation is closed only locally (the broker- and authority-side reservations in the other containers are not closed). This is due to a NullPointerException in Ticket.java around line 150 (see below):

    public synchronized Properties encode(String protocol) throws Exception {
        Properties enc = new Properties();

        enc.setProperty(PropertyClassName, this.getClass().getCanonicalName());
        if (getType() != null) {
            enc.setProperty(PropertyResourceType, getType().toString());
        }
        PropList.setProperty(enc, PropertyUnits, getUnits());

        // THE PROBLEM IS IN THE LINE BELOW BECAUSE authority is null in this case
        IProxy proxyToUse = ActorRegistry.getProxy(protocol, authority.getName());
        // the proxy to the site
        PropList.setProperty(enc, PropertyTicketAuthorityProxy, proxyToUse.save());
        // the resource ticket XML
        enc.setProperty(PropertyTicketResourceTicket, plugin.getTicketFactory().toXML(getTicket()));
        return enc;
    }

In this case, the Ticket.encode(protocol) method is invoked from:

    protected orca.shirako.proxies.soapaxis2.beans.Reservation passBrokerReservation(IReservation reservation, AuthToken auth) {
        IClientReservation r = (IClientReservation) reservation;
        orca.shirako.proxies.soapaxis2.beans.Reservation rsvn = new orca.shirako.proxies.soapaxis2.beans.Reservation();

        rsvn.setSlice(Translate.translate(r.getSlice()));
        rsvn.setTerm(Translate.translate(r.getRequestedTerm()));
        rsvn.setReservationID(r.getReservationID().toString());
        rsvn.setSequence(r.getTicketSequenceOut());

        orca.shirako.proxies.soapaxis2.beans.ResourceSet rset = Translate.translate(r.getRequestedResources(), Translate.DirectionAgent);

        // THIS IS WHERE THE REFERENCE TO THE FAILING TICKET IS ACQUIRED
        IConcreteSet cset = r.getRequestedResources().getResources();

        if (cset != null) {
            Plist encoded = null;
            try {
                // THIS IS THE Ticket.encode("soapaxis2") CALL THAT FAILS
                Properties enc = cset.encode(ProtocolNames.SoapAxis2);
                encoded = encodePropertiesSoap(enc);
            } catch (Exception e) {
                throw new RuntimeException("Cannot encode concrete set", e);
            }

            if (encoded == null) {
                throw new RuntimeException("Unsupported IConcreteSet: " + cset.getClass().getCanonicalName());
            }

            rset.setConcrete(encoded);
        }

        rsvn.setResourceSet(rset);

        return rsvn;
    }

I ran the debugger on it. There are some properties on the reservation, but I didn't see any related to authority or recovery. However, reservation.authority appears to be set properly and is NOT null, so the issue is that whatever recovered the reservation did not, for some reason, put an authority on the resource set within it. The problem is in the Ticket, which appears to have the unit count set, but whose authority and other references are null.
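To make the failure mode concrete, here is a minimal, self-contained Java sketch. It is not the ORCA code: PartialTicketSketch, SketchTicket, and AuthorityProxy are hypothetical stand-ins, and the explicit null check is only an illustration of how a partially recovered ticket (units set, authority null) could be reported with a descriptive error instead of a bare NullPointerException. It is not a description of the actual fix.

```java
import java.util.Properties;

public class PartialTicketSketch {
    /** Hypothetical stand-in for the authority proxy (not ORCA's IProxy). */
    static class AuthorityProxy {
        private final String name;

        AuthorityProxy(String name) {
            this.name = name;
        }

        String getName() {
            return name;
        }
    }

    /** Hypothetical stand-in for Ticket: units survive recovery, authority may not. */
    static class SketchTicket {
        int units;
        AuthorityProxy authority; // stays null if recovery restored only part of the state

        Properties encode(String protocol) {
            // Fail fast with context instead of dereferencing a null authority.
            if (authority == null) {
                throw new IllegalStateException(
                        "Ticket has no authority proxy (units=" + units
                        + "); cannot encode for protocol " + protocol);
            }
            Properties enc = new Properties();
            enc.setProperty("units", Integer.toString(units));
            enc.setProperty("authority", authority.getName());
            return enc;
        }
    }

    public static void main(String[] args) {
        // Simulate the partially recovered ticket seen in the debugger.
        SketchTicket recovered = new SketchTicket();
        recovered.units = 3;
        try {
            recovered.encode("soapaxis2");
        } catch (IllegalStateException e) {
            System.out.println("encode failed: " + e.getMessage());
        }

        // Once the authority reference is restored, encoding succeeds.
        recovered.authority = new AuthorityProxy("site-authority");
        System.out.println(recovered.encode("soapaxis2"));
    }
}
```

A guard like this would not repair recovery, but it would point directly at the missing reference rather than at a line number in encode().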

I do not think it is my code (trust-root mods), since I do not interfere with recovery and only check not to create edges twice (once from recovery and once from the remote registry). I also don't think we've tested this before.

Other types of recovery work fine, i.e., if the SM closes all reservations, restarts, and tries to create a new reservation, all is well. There is a similar failure in the same place (Ticket.java) between a Broker and an Authority in case of recovery. As I said, this works without problems in a single container (all three actors).

Change History

Changed 8 years ago by ibaldin

  • status changed from new to accepted

From Chase:

OK, so:

- We can infer that save/reset of the authority proxy works correctly, since it has been restored from ReservationClient, and as I understand it the SM is able to contact the authority, so at least one saved/reset proxy is good.

- I think I have verified that save/reset of the Reservation.ResourceSet (where the ticket is) goes all the way through to the Ticket class.

- I understand that the Ticket itself is only partially recovered.

- I think I have verified that the authority proxy save/reset code used in the Ticket class is the same as the code used in the ReservationClient class, so it should work.

- BUT: one difference is that the authority proxy in the Ticket class is recovered out of revisit(), while most other state is recovered out of reset().

It is possible that revisit() isn't being called correctly or isn't being propagated down the tree correctly. It is not clear that we would know, since the main state that is recovered from revisit() in ReservationClient is the stitching DAG state (redeem and join predecessors etc.), and it's not clear that we would have noticed if this wasn't recovered correctly.
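The hypothesis above can be sketched in a few lines of Java. This is an illustration only, under the assumption that recovery runs in two phases (reset() restores plain state, revisit() restores references such as proxies); the classes TwoPhaseRecoverySketch, SketchTicket, and SketchResourceSet and the propagateRevisit flag are hypothetical and do not mirror the real ORCA class hierarchy.

```java
public class TwoPhaseRecoverySketch {
    /** Hypothetical leaf object: units come back in reset(), the proxy in revisit(). */
    static class SketchTicket {
        int units;
        String savedAuthorityName; // serialized form restored by reset()
        String authorityProxy;     // live reference rebuilt only by revisit()

        void reset(int savedUnits, String savedName) {
            units = savedUnits;
            savedAuthorityName = savedName;
        }

        void revisit() {
            authorityProxy = "proxy:" + savedAuthorityName;
        }
    }

    /** Hypothetical container that owns the ticket and must forward both phases. */
    static class SketchResourceSet {
        final SketchTicket ticket = new SketchTicket();
        final boolean propagateRevisit;

        SketchResourceSet(boolean propagateRevisit) {
            this.propagateRevisit = propagateRevisit;
        }

        void reset(int savedUnits, String savedName) {
            ticket.reset(savedUnits, savedName);
        }

        void revisit() {
            // If this call is skipped, the ticket stays half recovered.
            if (propagateRevisit) {
                ticket.revisit();
            }
        }
    }

    /** Runs both recovery phases and returns the recovered resource set. */
    static SketchResourceSet recover(boolean propagateRevisit) {
        SketchResourceSet set = new SketchResourceSet(propagateRevisit);
        set.reset(3, "site-authority");
        set.revisit();
        return set;
    }

    public static void main(String[] args) {
        SketchTicket broken = recover(false).ticket;
        SketchTicket complete = recover(true).ticket;
        System.out.println("without propagation: units=" + broken.units
                + ", proxy=" + broken.authorityProxy);
        System.out.println("with propagation: units=" + complete.units
                + ", proxy=" + complete.authorityProxy);
    }
}
```

In the non-propagating case the ticket ends up with its unit count set and a null proxy, which matches the partially recovered state reported in the description; whether this is what actually happened in the ORCA code is not established here.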

Perhaps Aydan can tell us why the Ticket's proxy reference is recovered out of revisit instead of reset.

Another possibility is that somehow the ActorRegistry is choking when multiple proxies for the same actor are recovered.

Changed 8 years ago by ibaldin

  • status changed from accepted to closed
  • resolution set to fixed

Tested and now fixed by r3278
