Ticket #367 (closed defect: duplicate)

Opened 5 years ago

Last modified 4 years ago

Intermittent delegation failure on startup

Reported by: anirban Owned by: ibaldin
Priority: major Milestone:
Component: ORCA: Shirako Core Version: baseline
Keywords: Cc: ibaldin, anirban, pruth, yxin, vjo

Description

Some resource pools fail to be delegated to the rack broker during startup. Restarting cures the problem. This might be a race condition that manifests on racks with faster disks, e.g. TAMU.

Change History

Changed 5 years ago by anirban

2014-09-29 20:02:41,554 [RPC] ERROR orca - An error occurred while performing RPC. Error type=LocalError
orca.shirako.util.RPCException: orca.shirako.proxies.soapaxis2.SoapAxis2StubException: An error occurred while obtaining service stub

at orca.shirako.proxies.soapaxis2.SoapAxis2Return.execute(SoapAxis2Return.java:73)
at orca.shirako.kernel.RPCManager$RPCExecutor.run(RPCManager.java:808)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

Caused by: orca.shirako.proxies.soapaxis2.SoapAxis2StubException: An error occurred while obtaining service stub

at orca.shirako.proxies.soapaxis2.SoapAxis2Proxy.getServiceStub(SoapAxis2Proxy.java:271)
at orca.shirako.proxies.soapaxis2.SoapAxis2Return.execute(SoapAxis2Return.java:64)
... 4 more

Caused by: org.apache.axis2.AxisFault: Two services can not have same name, a service with ActorService247100307 already exists in the system

at org.apache.axis2.engine.AxisConfiguration.addServiceGroup(AxisConfiguration.java:254)
at org.apache.axis2.engine.AxisConfiguration.addService(AxisConfiguration.java:206)
at org.apache.axis2.client.ServiceClient.configureServiceClient(ServiceClient.java:128)
at org.apache.axis2.client.ServiceClient.<init>(ServiceClient.java:114)
at orca.shirako.proxies.soapaxis2.services.ActorServiceStub.<init>(ActorServiceStub.java:95)
at orca.shirako.proxies.soapaxis2.StubManager.createStub(StubManager.java:110)
at orca.shirako.proxies.soapaxis2.StubManager.getStub(StubManager.java:83)
at orca.shirako.proxies.soapaxis2.SoapAxis2Proxy.getServiceStub(SoapAxis2Proxy.java:269)
... 5 more
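
The innermost cause above is consistent with a check-then-act race: two RPC threads may both find no cached stub for the same actor and both try to create one, and the second creation would then fail because a service with that name is already registered. The following is only a minimal sketch of one way to close such a window, assuming stubs are cached per endpoint URL. `StubCache` and its factory argument are hypothetical names for illustration; this is not the actual ORCA `StubManager` code, which may be structured differently.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical per-endpoint stub cache (illustration only).
 * S stands in for whatever stub type the proxy layer uses.
 */
final class StubCache<S> {
    private final ConcurrentMap<String, S> stubs = new ConcurrentHashMap<>();
    private final Function<String, S> factory;

    /** factory creates a new stub for an endpoint URL (assumed to be the expensive step). */
    StubCache(Function<String, S> factory) {
        this.factory = factory;
    }

    /**
     * Returns the cached stub for the endpoint, creating it if needed.
     * ConcurrentHashMap.computeIfAbsent applies the factory at most once
     * per key, so concurrent callers cannot both create a stub.
     */
    S getStub(String endpointUrl) {
        return stubs.computeIfAbsent(endpointUrl, factory);
    }
}
```

With this shape, every thread asking for the same endpoint gets the same stub instance, and the creation step runs once per endpoint rather than once per caller.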

Changed 4 years ago by ibaldin

Also seen as stuff like this in pequod:

pequod:show>show reservations for current actor psc-broker
Reservations for slice 7cabd0c7-4d62-4046-91d8-ad1ac217c395:
36eaaab2-560c-4577-b4d3-903c67d0ab75 psc-broker

Slice: 7cabd0c7-4d62-4046-91d8-ad1ac217c395
500 pscvmsite.vlan [ ticketed, nascent]
Notices: Reservation 36eaaab2-560c-4577-b4d3-903c67d0ab75 (Slice psc-broker) is in state [Ticketed,None]
Start: Fri Jan 29 19:00:00 EST 2010 End:Wed Jan 29 19:00:00 EST 2031

c12dc53f-cccb-4925-bfbb-63d9a04786e2 psc-broker

Slice: 7cabd0c7-4d62-4046-91d8-ad1ac217c395
52 pscvmsite.vm [ ticketed, nascent]
Notices: Reservation c12dc53f-cccb-4925-bfbb-63d9a04786e2 (Slice psc-broker) is in state [Ticketed,None]
Start: Fri Jan 29 19:00:00 EST 2010 End:Wed Jan 29 19:00:00 EST 2031

dc793afd-3df7-42fe-ba1f-124240fed8fd psc-broker

Slice: 7cabd0c7-4d62-4046-91d8-ad1ac217c395
0 6e69b6ad-5cf1-43b7-b007-71fbd7812a08 [ failed, nascent]
Notices: Reservation dc793afd-3df7-42fe-ba1f-124240fed8fd (Slice psc-broker) is in state [Failed,None], err=Failing reservation due to non-recoverable RPC error (LocalError), message=orca.shirako.proxies.soapaxis2.SoapAxis2StubException: An error occurred while obtaining service stub, stack=Exception stack trace:

orca.shirako.proxies.soapaxis2.SoapAxis2BrokerProxy.execute(SoapAxis2BrokerProxy.java:91)
orca.shirako.proxies.soapaxis2.SoapAxis2AuthorityProxy.execute(SoapAxis2AuthorityProxy.java:109)
orca.shirako.kernel.RPCManager$RPCExecutor.run(RPCManager.java:808)
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
java.lang.Thread.run(Unknown Source)
java.lang.Thread.run(Unknown Source)

Changed 4 years ago by ibaldin

  • cc vjo added

Changed 4 years ago by ibaldin

From Victor (for UMass rack, after getting multiple consecutive failures in claiming storage):

OK - between:
1) Clearing foreign keys out of the keystores
2) Starting AM/Broker container while SM container not running
I was able to get everything online.
Neither of those may have anything to do with anything
but - that’s what I did, before it

Changed 4 years ago by ibaldin

  • status changed from new to closed
  • resolution set to duplicate