#11 - Netdev 0x18

PC conflicts

Jason Gunthorpe

Michael S. Tsirkin

Shannon Nelson

Submitted

Abstract

This talk explores our ongoing efforts to reduce vDPA-net live migration downtime. Downtime is the period during which a device cannot process packets because of the live migration switchover. It can cause large disturbances for applications running inside a virtual machine (timeouts, dropped packets, etc.). Furthermore, the downtime varies with the device setup, such as the number of devices and queues.

First, concepts like VirtIO, vDPA and live migration are defined. Next, the vDPA device characteristics, the challenges they cause for live migration, and the solutions we developed are explored. Finally, future optimizations that are expected to considerably reduce live migration downtime are discussed.

vDPA is an abstraction layer that allows using the standardized VirtIO hardware datapath with a flexible software implementation of the control path. This enables vendors to use the existing control path implementation by simply writing an adaptation layer for the hardware configuration, while achieving full bandwidth for the dataplane.

The majority of the downtime is spent in two areas of device initialization: handling memory maps and configuring the device. We will show how we reduced these costs or moved them out of the downtime window. On the source side, we reduced the time needed to configure the device to track dirty memory [1]. On the destination side, QEMU pre-warms the IOMMU [2] and configures the virtio-net features while the VM is still running at the source [3] [4]. The first changes already reduce the downtime from about 20 seconds to 200 milliseconds for a 128G VM with 2 net devices, each with 2 combined queues [5].
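As a rough illustration of the "move it out of the downtime window" idea, the sketch below models a migration as a list of phases and counts only those that run while the guest is paused. The phase names are paraphrased from the abstract; the `Phase` type, the helper functions, and all durations are hypothetical placeholders, not measurements and not QEMU code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    name: str
    seconds: float      # placeholder cost, NOT a measured value
    in_downtime: bool   # True if the phase runs while the guest is paused


def downtime(phases):
    """Sum only the phases that run while the guest is paused."""
    return sum(p.seconds for p in phases if p.in_downtime)


def move_out_of_window(phases, names):
    """Return a new plan where the named phases run before the switchover."""
    return [
        Phase(p.name, p.seconds, p.in_downtime and p.name not in names)
        for p in phases
    ]


# Hypothetical plan: every phase happens inside the downtime window.
baseline = [
    Phase("map guest memory in the IOMMU", 10.0, True),
    Phase("configure virtio-net features", 5.0, True),
    Phase("switch over and resume", 0.5, True),
]

# Same work, but the two expensive phases run while the VM is still
# executing at the source, so they no longer count as downtime.
optimized = move_out_of_window(
    baseline,
    {"map guest memory in the IOMMU", "configure virtio-net features"},
)

print(downtime(baseline), downtime(optimized))
```

The total work is unchanged in this model; only the part that overlaps with the paused guest shrinks, which is the effect the pre-warming and early configuration changes aim for.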

We expect to hold a round table with vendors after the talk to discuss what is missing from VirtIO / vDPA.

[1] vq descriptor mapping (https://lore.kernel.org/lkml/20230928164550.980832-2-dtatulea@nvidia.com/T)

[2] map memory ahead of downtime (Eugenio, https://lists.nongnu.org/archive/html/qemu-devel/2024-01/msg02136.html)

[3] Sending the device state ahead of time RFC (Yajun Wu, https://lore.kernel.org/qemu-devel/6c8ebb97-d546-3f1c-4cdd-54e23a566f61@nvidia.com/T/)

[4] VQ suspend/resume (https://lore.kernel.org/virtualization/20231225151203.152687-1-dtatulea@nvidia.com)

[5] qemu LM improvements RFC (Si-Wei, https://lore.kernel.org/qemu-devel/1701970793-6865-1-git-send-email-si-wei.liu@oracle.com)

Authors (blind)

Eugenio Perez Martin (Red Hat) <eperezma@redhat.com>

Dragos Tatulea (NVIDIA) <dtatulea@nvidia.com>

Si-Wei Liu (Oracle) <si-wei.liu@oracle.com>

Submission Type

Talk

Submission Label

Nuts and Bolts
