MAP4960 Recovery actions for special PCIe-related I/O enclosure errors (Model 941)

This MAP handles SRCs that require special repair actions to be performed by the service representative or the next level of support.

MAP4960 Section-1

Procedure

  1. Does the FRU list in the serviceable event that sent you here contain a symbolic FRU similar to "Invalid-MTMS-cpssebay**"?
  2. When the FRU list contains a symbolic FRU similar to "Invalid-MTMS-cpssebay**", the location code is invalid and cannot be used to determine the failing I/O enclosure.
  3. To determine the cpssebay** value from the symbolic FRU of Invalid-MTMS-cpssebay**, use the first column of Table 1.
    Table 1. Symbolic FRU location to type-location translation
    Symbolic FRU location code   Location   Type location   I/O enclosure number
    cpssebay00                   1B1        1400-1B1        0
    cpssebay01                   1B2        1400-1B2        1
    cpssebay02                   1B3        1400-1B3        2
    cpssebay03                   1B4        1400-1B4        3
    cpssebay04                   2B1        1400-2B1        4
    cpssebay05                   2B2        1400-2B2        5
    cpssebay06                   2B3        1400-2B3        6
    cpssebay07                   2B4        1400-2B4        7
  4. In the second column of Table 1, find the location code that corresponds to the symbolic FRU location code in the FRU list.
  5. Translate the three-character location code from the prior step to a physical location of the I/O enclosure in the rack. See Figure 1.
    Figure 1. I/O enclosure locations in front of rack
  6. Determine the serial number of the I/O enclosure by reading the MTMS label.
  7. Launch the Advanced System Management (ASM) menu:
    1. From the navigation area, click Storage Facility Management > Server view.
    2. Select the server associated with the serviceable event that sent you here.
    3. From the bottom Task area, click Operations > Launch Advanced System Management (ASM).
    4. On the launch ASM interface confirmation, click OK.
    5. The management console Web browser will be launched, and the ASM login panel will appear.
  8. Log in as admin with a password of admin2107.
    Notes:
    1. If you are logged in and not active for 15 minutes, your session expires.
    2. If you make five invalid login attempts, your user account is locked out for five minutes and none of the other accounts is affected.
  9. Reset the I/O enclosure MTMS from the ASM menu:
    1. Expand System Configuration.
    2. Select Configure IO Enclosures.
    3. Observe the Type-Model column in the displayed Enclosure Configuration table.
    4. Find the row that has the Type-Model determined from step 3.
    5. Select the radio button for that I/O enclosure.
    6. Click Change settings.
    7. Modify the Type-Model field to match the Type location value in Table 1 for the I/O enclosure.
    8. Modify the Serial number field to match the serial number read from the I/O enclosure machine/type/model/serial number label.
    9. Click Save Settings.
  10. Update the HMC microcode objects for the I/O enclosure machine/type/model/serial number by using a pseudo repair of the PCIe and SPCN card FRU, which will cause the I/O enclosure to be power cycled.
    1. From the navigation area, click Storage Facility Management > storage facility.
    2. From the Task area, click Exchange Parts > Exchange IO Enclosure and Components.
    3. Click Show I/O Enclosures and select the enclosure location.
    4. Click Show FRUS.
    5. Select I/O Enclosure PCIe/SPCN Card and then click Exchange FRU.
    6. When prompted to replace the FRU, do not disconnect the PCIe and SPCN cables from the card. Do not remove the card. Verify the card is properly seated.

      Continue with the repair.

    7. If the repair is successful, exit this MAP and ensure that any related serviceable events are closed.
    8. If the repair fails with the same error, replace the I/O enclosure PCIe / SPCN card.
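
The Table 1 lookup in steps 3 through 5 can be sketched as a small helper. This is only an illustration and is not part of any HMC or ASM tooling: the function name translate_symbolic_fru and the dictionary TABLE_1 are hypothetical, and the rows are copied directly from Table 1.

```python
import re

# Rows copied from Table 1:
# symbolic FRU location code -> (location, type location, I/O enclosure number)
TABLE_1 = {
    "cpssebay00": ("1B1", "1400-1B1", 0),
    "cpssebay01": ("1B2", "1400-1B2", 1),
    "cpssebay02": ("1B3", "1400-1B3", 2),
    "cpssebay03": ("1B4", "1400-1B4", 3),
    "cpssebay04": ("2B1", "1400-2B1", 4),
    "cpssebay05": ("2B2", "1400-2B2", 5),
    "cpssebay06": ("2B3", "1400-2B3", 6),
    "cpssebay07": ("2B4", "1400-2B4", 7),
}


def translate_symbolic_fru(symbolic_fru):
    """Return (location, type location, enclosure number) for a symbolic FRU.

    Hypothetical helper: accepts text such as 'Invalid-MTMS-cpssebay05',
    extracts the cpssebay** part, and looks it up in Table 1.
    """
    match = re.search(r"cpssebay\d{2}", symbolic_fru)
    if match is None or match.group(0) not in TABLE_1:
        raise ValueError(f"No Table 1 entry for {symbolic_fru!r}")
    return TABLE_1[match.group(0)]
```

For example, translate_symbolic_fru("Invalid-MTMS-cpssebay05") returns ("2B2", "1400-2B2", 5), which matches the cpssebay05 row of Table 1.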

MAP4960 Section-2

Procedure

  1. Find your SRC in Table 2.
    Table 2. Repair actions for special SRCs
    SRCs Action
    BE1E2546
    1. If this serviceable event was logged during an MES to install host adapter(s), go to the next step. Otherwise contact your next level of support.
    2. The host adapter(s) you were installing should be listed in the FRU list of the serviceable event(s) that sent you here. Unlatch only the host adapter(s) you were installing in the affected I/O enclosure, and remove them from the I/O enclosure slot(s).
      Note: If multiple host adapters are affected, multiple serviceable events are generated, which could affect multiple I/O enclosures. In that case, you would remove all affected host adapters, then do the following pseudo-repair on the affected PCIe and SPCN cards, one I/O enclosure at a time.
    3. Use the Exchange Parts process to perform a pseudo-repair of this I/O enclosure's PCIe and SPCN card. Refer to MAP2600 (Pseudo FRU exchange needed to reset existing FRU) for more information.
    4. Run View Storage Facility State (end of call) to verify that all adapters in this I/O enclosure are available. Refer to MAP1100 View storage facility state (end of call) for more information.
    5. Re-attempt MES to install host adapter(s).
    BE370012 PCIe I/O enclosure discovery failure (missing I/O enclosure). Go to MAP4960 Section-4.
    BE38256B PCIe enclosure discovery/configuration failure. Could not initialize path from local server to I/O enclosure. Go to MAP4960 Section-3.
    BE38256C I/O enclosure FPGA update image corrupted on local server. Contact your next level of support.
    BE38256D PCIe I/O enclosure FPGA error. Contact your next level of support.
    BE38256E PCIe I/O enclosure MTMS unknown/invalid. Contact your next level of support.
    BE38256F PCIe I/O enclosure mis-cable detected. Go to MAP4960 Section-3.
    BE382572 Error occurred during I/O enclosure error data collection. Go to MAP4960 Section-3.
    BE38257B PCIe interface to PCIe I/O enclosure down. Go to MAP4960 Section-3.
    BE382563 Multi PCIe link degraded detected on the local server. Contact your next level of support.
    BE382566 PCIe I/O enclosure discovery/configuration failure. Go to MAP4960 Section-3.
    BE382567 Invalid server config. Contact your next level of support.
    BE382574 One LPAR cannot communicate with the I/O enclosure; a system failover is required. Go to MAP4960 Section-3.
    BE382575 PCIe I/O enclosure discovery failure (missing an I/O enclosure). Go to MAP4960 Section-4.
    Any other SRC Contact your next level of support.
  2. Use the Action column entry to continue the repair.
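
The routing in Table 2 can be summarized as a lookup from SRC to next action. The sketch below is illustrative only: SRC_ACTIONS, DEFAULT_ACTION, and action_for_src are hypothetical names, the action strings are abbreviated from Table 2, and the BE1E2546 entry only points back to the detailed steps in the table.

```python
SECTION_3 = "Go to MAP4960 Section-3"
SECTION_4 = "Go to MAP4960 Section-4"
DEFAULT_ACTION = "Contact your next level of support"

# SRC -> action, abbreviated from Table 2.
SRC_ACTIONS = {
    "BE1E2546": "Follow the BE1E2546 steps in Table 2",
    "BE370012": SECTION_4,
    "BE38256B": SECTION_3,
    "BE38256C": DEFAULT_ACTION,
    "BE38256D": DEFAULT_ACTION,
    "BE38256E": DEFAULT_ACTION,
    "BE38256F": SECTION_3,
    "BE382572": SECTION_3,
    "BE38257B": SECTION_3,
    "BE382563": DEFAULT_ACTION,
    "BE382566": SECTION_3,
    "BE382567": DEFAULT_ACTION,
    "BE382574": SECTION_3,
    "BE382575": SECTION_4,
}


def action_for_src(src):
    """Return the Table 2 action for an SRC; unlisted SRCs get the default."""
    return SRC_ACTIONS.get(src.strip().upper(), DEFAULT_ACTION)
```

Any SRC that is not listed falls through to the "Any other SRC" row, so action_for_src never fails on unknown input.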

MAP4960 Section-3

About this task

The serviceable event FRU list that sent you here contains one or more cables and possibly additional FRUs.
Important: Both ends of each PCIe cable will appear in the FRU list. Only the first cable location code is available to select for repair or replace for each cable in the FRU list. The subsequent CBLCONT location code shows where a cable continues to connect to, but is not available to select for repair or replace.

Procedure

  1. Inspect both ends of each PCIe cable listed in the FRU list.
    1. Do not plug or unplug the cable.
    2. Refer to Figure 2, Figure 3, and Figure 4 cabling diagrams based on the number of installed I/O enclosures in the machine. The CBLCONT location code listed is the port on the I/O enclosure where the cable is supposed to be connected.

      Based on the appropriate cable figure, visually check each end of the cable listed on the screen that sent you here to ensure that it is properly plugged into the correct connector.

    3. Observe the body of the cable to ensure that it is not damaged.
    Figure 2. Model 941, two I/O enclosures
    Figure 3. Model 941, four I/O enclosures
    Figure 4. Model 941, eight I/O enclosures
  2. Is the PCIe cable properly plugged and not damaged?
    • Yes, go to the next step.
    • No, go to step 5.
  3. The cable is properly plugged and is not damaged.
    Did you reach this step after replacing both the I/O enclosure PCIe and SPCN card and the I/O enclosure backplane?
    • No, go to the next step.
    • Yes, a pseudo-repair of the PCIe and SPCN card might recover this condition. Continue with the following steps:
      1. Return to the screen that sent you here.
      2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
      3. To the question, "Did you exchange any parts?" click No and then click Next.
      4. To the question, "Did you isolate the problem?" click Yes and then click Next.
      5. The current repair action will be terminated, but the serviceable event will be left open.
        Use the Exchange Parts menu to perform a pseudo-repair of the I/O enclosure PCIe and SPCN card:
        • Storage Facility Management > storage facility > Exchange Parts

        You will remove I/O enclosure power when instructed to do so in the exchange procedure, but you do not need to uncable or remove the PCIe and SPCN card.
  4. The cable is properly plugged and is not damaged.
    The I/O enclosure PCIe and SPCN card and the I/O enclosure backplane were not both replaced.
    1. Return to the screen that sent you here.
    2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
    3. To the question, "Did you exchange any parts?" click No and then click Next.
    4. To the question, "Did you isolate the problem?" click No and then click Next.
    5. The next FRU in the list will be displayed. Continue the repair by replacing the remaining FRUs until the problem is fixed. Exit this MAP.
  5. The cable is incorrectly plugged or damaged.
    1. Return to the screen that sent you here.
    2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
    3. To the question, "Did you exchange any parts?" click No and then click Next.
    4. To the question, "Did you isolate the problem?" click No and then click Next.
    5. The next FRU in the list will be displayed. Treat the other FRUs in the prior FRU list as if they are not available onsite to be replaced.
    6. When asked if the FRU is available to be replaced, answer No. This will cause each FRU in the list to be displayed until the incorrectly plugged cable or the damaged cable is displayed.

      When the incorrectly plugged cable or the damaged cable is displayed, do a normal FRU replace.

    7. When the repair is complete, exit this MAP.
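
The branching in steps 2 through 5 of this section can be sketched as a small decision function. This is a reading aid, not a service tool: section3_branch and its parameter names are hypothetical, and the returned answers are the responses this procedure gives to the three repair-screen questions (result of the service procedure, parts exchanged, problem isolated).

```python
def section3_branch(cable_ok, replaced_card_and_backplane):
    """Return the Section-3 step to follow and the answers to give on the repair screen.

    cable_ok: True if the PCIe cable is properly plugged and not damaged.
    replaced_card_and_backplane: True if both the PCIe and SPCN card and the
        I/O enclosure backplane were already replaced before reaching step 3.
    """
    if not cable_ok:
        # Step 5: walk the FRU list until the bad cable is shown, then replace it.
        return {
            "step": 5,
            "answers": ("Problem not fixed", "No", "No"),
            "then": "Answer No to FRU availability until the cable is displayed",
        }
    if replaced_card_and_backplane:
        # Step 3, Yes branch: a pseudo-repair of the card might recover the condition.
        return {
            "step": 3,
            "answers": ("Problem not fixed", "No", "Yes"),
            "then": "Pseudo-repair of the I/O enclosure PCIe and SPCN card",
        }
    # Step 4: continue replacing the remaining FRUs in the list.
    return {
        "step": 4,
        "answers": ("Problem not fixed", "No", "No"),
        "then": "Replace the remaining FRUs until the problem is fixed",
    }
```

Only the Yes branch of step 3 answers Yes to "Did you isolate the problem?", which ends the current repair action while leaving the serviceable event open.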

MAP4960 Section-4

Procedure

  1. Observe the FRU list in the serviceable event details that sent you here. It should include one or more of the following FRUs:
    • I/O enclosure PCIe and SPCN card
    • I/O enclosure backplane
  2. Display open serviceable events that need repair. Is there any other serviceable event that lists either of the FRUs determined in step 1, or other FRUs from this I/O enclosure such as a power supply or fan?
    • Yes, exit this MAP and attempt to repair that serviceable event first.

      If that repair does not correct this problem, return here and continue with the next step.

      If that repair does correct this problem, remember to also close this serviceable event.

    • No, go to the next step.
  3. Inspect both ends of both PCIe cables that are associated with the I/O enclosure listed in the FRU list, that is, the cables intended to be connected to this I/O enclosure.
    1. Do not plug or unplug the cables.
    2. Refer to Figure 2, Figure 3, and Figure 4 cabling diagrams based on the number of installed I/O enclosures in the machine. Based on the appropriate cable figure, visually check each end of both cables intended to be connected to this I/O enclosure to see if they are properly plugged into the correct connector.
    3. Observe the body of the cable to ensure it is not damaged.
  4. Are the PCIe cables to the I/O enclosure properly plugged and not damaged?
    • Yes, go to the next step.
    • No, go to step 6.
  5. The cables are properly plugged and are not damaged.
    1. Return to the screen that sent you here.
    2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
    3. To the question, "Did you exchange any parts?" click No and then click Next.
    4. To the question, "Did you isolate the problem?" click No and then click Next.
    5. The next FRU in the list will be displayed. Continue the repair by replacing the remaining FRUs until the problem is fixed.

      Exit this MAP.

  6. One or both cables are incorrectly plugged or damaged.
    1. Return to the screen that sent you here.
    2. To the question, "What was the result of using the service procedure from Infocenter?" click Problem not fixed and then click Next.
    3. To the question, "Did you exchange any parts?" click No and then click Next.
    4. To the question, "Did you isolate the problem?" click No and then click Next.
    5. The next FRU in the list will be displayed. Continue the repair on this FRU, but when instructed to replace it, do not replace that FRU. Instead, replace the damaged cables connected to the I/O enclosure.
    6. If the repair completes successfully, exit this MAP. Otherwise, contact your next level of support.