On August 26th, at 12:00 PM CDT, LEARN engineers started a warm restart of one of the waveservers located at the Dallas Infomart. According to the vendor, a warm restart is not supposed to have a service impact. However, a license had expired on the waveserver, and the warm restart brought down the services on that particular waveserver.

This waveserver was one of the two waveservers that serve the MUS-IX infrastructure between Dallas Akard and Infomart. The LEARN MPLS backbone also uses that same infrastructure between Akard and Infomart. The loss of the 100G circuit triggered a cascade of unintended consequences. The MUS-IX infrastructure relies on an EVPN/VXLAN overlay network. A redundant path was built for this overlay network, but the outage also caused the underlay network to flap. The resulting instability spread to the MPLS network. Once the 100G circuit was restored, services came back, but routes took some time to converge on some of the MX104 hardware in the Denton Loop, the West Texas Loop, and the Tyler, Beaumont, and Victoria spurs.

To rectify the situation, LEARN is removing the redundant path for the MUS-IX EVPN overlay and replacing it with a protected 100G circuit. LEARN will also move the MPLS backbone off the MUS-IX infrastructure onto a dedicated 100G circuit between Akard and Infomart. When this is complete, we will schedule a failover test to confirm that an outage between Akard and Infomart does not create the same network instability. We are also mandating that, from this point forward, warm restarts be scheduled after work hours during declared maintenance windows.

We apologize for the impact this outage had on your institutions. Thank you for your patience as we continually work to improve our processes and documentation. If you would like to discuss this in greater detail, please feel free to reach out to me directly.
Subject: Re: Ticket# 56102 - Emergency maintenance on MUSIX/LONI Waveserver Unit - Completed

UPDATE 2: Services have stabilized and should be normal. If you continue experiencing problems, please call the LEARN NOC at .

UPDATE: There is a larger issue that is causing widespread stakeholder impacts. LEARN staff is aware and working to restore services.

SUBJECT: Emergency maintenance on MUSIX/LONI Waveserver Unit
AFFECTED: 100G MUSIX (Dallas AKARD-Dallas INFOMART)
START TIME: Friday, August 26th, 2022, 1215H CST
END TIME: Friday, August 26th, 2022, 1500H CST
DESCRIPTION: LEARN will perform a warm reset on the LONI/MUSIX waveserver units at Akard and Infomart. This is a non-service-affecting event. Visibility will be lost while the systems restart, but traffic should remain unaffected.
TICKET NO.: 56102
TIMESTAMP: Friday, August 26th, 2022, 1154H CST

--
NOTE: LEARN Services may experience intermittent or no connectivity during the above-stated period of time. This maintenance work is essential to ensure continued high-quality network performance and stability.

Replies to this list are dropped to avoid SPAMing the list. Please submit problems, requests, and questions to the LEARN NOC by calling with your ticket number.

Thank You,
The LEARN Network Operations Center Team