Ticket #433 (closed task: fixed)

Opened 4 years ago

Last modified 4 years ago

UH head node drive replacement

Reported by: ibaldin
Owned by: vjo
Priority: major
Milestone:
Component: Infrastructure: Rack hardware
Version: baseline
Keywords:
Cc: vjo, jonmills

Description

One of the drives in the RAID1 array has failed, and the logical disk built on the array is offline. It is not clear whether the logical disk will operate properly after the drive is replaced.

First we need to replace the failed disk and see if the logical drive comes back.

Requires opening a ticket with IBM.

Change History

Changed 4 years ago by ibaldin

The drive is live; however, the data has been lost due to fsck. This was likely caused by a controller malfunction.

Changed 4 years ago by ibaldin

The head node needs to be reinstalled.

Changed 4 years ago by ibaldin

The head node is being reinstalled using software RAID, since we are no longer confident in the hardware RAID controller.

From Jonathan:

It's now possible, once again, to ssh to uh-hn.exogeni.net and be authenticated by LDAP. The system is puppetized, and xCAT has been mostly set up. I need to fine-tune xCAT, and then I can start playing with OpenStack.
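
Since the head node now depends on software RAID, a periodic health check may be useful. The following is a minimal sketch only, assuming the software RAID is Linux md and that its status is readable from /proc/mdstat; the ticket does not say which implementation was used. The function name degraded_arrays and the SAMPLE text are hypothetical and for illustration only.

```python
import re

# Hypothetical sample in the usual /proc/mdstat layout (not taken from uh-hn).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      488279488 blocks [2/1] [U_]

unused devices: <none>
"""


def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose member map (e.g. [U_]) shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # An array section starts with a line such as "md0 : active raid1 ...".
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # The status line ends with a member map: 'U' = member up, '_' = member missing.
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if status and current is not None:
            if "_" in status.group(1):
                degraded.append(current)
            current = None
    return degraded


if __name__ == "__main__":
    print(degraded_arrays(SAMPLE))
```

On a live host the same function could be fed the contents of /proc/mdstat instead of SAMPLE, for example from a cron job that sends mail when the returned list is not empty.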

Changed 4 years ago by ibaldin

From Jonathan:
UH is once again running OpenStack. 7 of 8 worker nodes are up, puppetized, and *might* even have correct labels for dataplane ports. (Remember, the UH Chelsio ports are cabled the exact opposite way from the FIU Chelsios; if I could reach FIU, this game would be easy!)

The iSCSI storage is also mounted on uh-hn.

Remaining items:
-- ORCA
-- Imageproxy
-- storage service

Changed 4 years ago by ibaldin

VLANs to UH appear to be working.

Changed 4 years ago by ibaldin

  • status changed from new to closed
  • resolution set to fixed

The rack is back to normal.
