Xen Issues

NR_IRQS

Running many networked Xen machines requires a large number of virtual IRQs. If you run out of them, you will see the message Kernel panic - not syncing: No available IRQ to bind to: increase NR_IRQS!. On an x86_64 kernel, edit the file include/asm-x86_64/mach-xen/irq_vectors.h, raise #define NR_DYNIRQS to e.g. 1024, and rebuild the kernel.
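The header edit can be scripted. The following is a minimal Python sketch, assuming the header defines the value on a single line of the form #define NR_DYNIRQS <number>. The function name set_nr_dynirqs is made up for illustration and is not part of the Xen tree.

```python
import re

def set_nr_dynirqs(header_text, value):
    """Return header_text with the NR_DYNIRQS define set to value.

    Assumption: the header holds a single-line
    '#define NR_DYNIRQS <number>' definition.
    Raises ValueError if no such line is found.
    """
    pattern = re.compile(r"^(\s*#\s*define\s+NR_DYNIRQS\s+)\d+", re.MULTILINE)
    # A callable replacement avoids ambiguity between the group
    # reference and the digits of the new value.
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value), header_text)
    if count == 0:
        raise ValueError("no '#define NR_DYNIRQS <number>' line found")
    return new_text
```

To apply it, read irq_vectors.h into a string, pass it through set_nr_dynirqs(text, 1024), and write the result back before rebuilding the kernel.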

Memory address space size

By default, Xen can handle at most 4GB of memory. Use the Xen boot option max_addr=16G to raise the limit. I am not sure whether this is actually necessary.

Heap Size on X86_64

The heap size can only be adjusted on Xen x86_64, by setting the Xen boot option xenheap_megabytes. The default is 16; I set it to 64 for 254 machines. If you run out of heap space, the message (XEN) Cannot handle page request order 4! may appear.

Serial console

To get serial console access over com2, configure grub like this:

serial --unit=1 --speed=57600
terminal --timeout=10 serial

title Xen 3.0 / Debian / Kasuari
root    (hd0,0)
kernel /boot/xen-3.0-amd64.gz com2=57600,8n1 console=com2
module /boot/xen0-linux-kasuari root=/dev/sda1 ro noapic console=tty0 xencons=ttyS1 console=ttyS1
module /boot/xen0-linux-kasuari-initrd
boot

powernow-k8

CPU throttling does not work on my AMD64 X2. The screen gets flooded with kernel messages. Fortunately, there is a patch: http://lists.xensource.com/archives/html/xen-devel/2006-03/msg01410.html

Install sysfsutils and edit /etc/sysfs.conf. Set both CPUs to the “ondemand” governor.
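The sysfs.conf entries can be generated rather than typed by hand. This is a minimal Python sketch, assuming the usual sysfs.conf syntax of "path relative to /sys = value" and the standard cpufreq attribute scaling_governor. The helper name governor_lines is hypothetical.

```python
def governor_lines(cpu_count, governor="ondemand"):
    """Build sysfs.conf entries that set the cpufreq governor per CPU.

    Assumption: each CPU exposes cpufreq/scaling_governor under
    /sys/devices/system/cpu/cpuN/.
    """
    return [
        "devices/system/cpu/cpu%d/cpufreq/scaling_governor = %s" % (cpu, governor)
        for cpu in range(cpu_count)
    ]

if __name__ == "__main__":
    # Two cores, as on the AMD64 X2 mentioned above.
    print("\n".join(governor_lines(2)))
```

The printed lines can be appended to /etc/sysfs.conf.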

CPU clock cycling breaks ns2 nam! Disable it for now.

TX offload

In Linux, the checksum of a TCP or UDP packet is sometimes added by the NIC or NIC driver and sometimes by the kernel. This depends on a flag of the network interface, which can be viewed using ethtool -k eth0. The Xen netfront and netback drivers advertise this capability, but Ethereal shows incorrect checksums and TCP or UDP retransmissions. TCP and UDP connections cannot be established between nodes when the traffic runs through ns2, as ns2 apparently discards incorrectly checksummed packets.
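To check the flag on many interfaces, the ethtool output can be parsed. This is a minimal Python sketch, assuming ethtool -k prints one "name: on" or "name: off" line per offload feature, including a line named tx-checksumming. The function names are illustrative only.

```python
import subprocess

def parse_offload_flags(ethtool_output):
    """Parse 'name: on|off' lines from `ethtool -k` output into a dict of bools.

    Lines without an on/off value (such as the header line) are skipped.
    """
    flags = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        words = value.split()
        if sep and words and words[0] in ("on", "off"):
            flags[name.strip()] = (words[0] == "on")
    return flags

def tx_checksumming_enabled(interface):
    """Run `ethtool -k <interface>` and return the tx-checksumming flag.

    Returns None if the flag is not present in the output.
    """
    output = subprocess.check_output(["ethtool", "-k", interface]).decode()
    return parse_offload_flags(output).get("tx-checksumming")
```

Calling tx_checksumming_enabled("eth0") requires ethtool to be installed; parse_offload_flags works on any captured output.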

This patch for the Xen DomU and Dom0 kernel fixed that problem for me:

--- linux-2.6-xen-sparse/drivers/xen/netback/interface.c~       2006-05-31 17:02:08.000000000 +0200
+++ linux-2.6-xen-sparse/drivers/xen/netback/interface.c        2006-05-31 17:02:29.000000000 +0200
@@ -109,7 +109,7 @@
        dev->get_stats       = netif_be_get_stats;
        dev->open            = net_open;
        dev->stop            = net_close;
-       dev->features        = NETIF_F_IP_CSUM;
+       dev->features        = 0;

        SET_ETHTOOL_OPS(dev, &network_ethtool_ops);

--- linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c~       2006-05-31 17:02:46.000000000 +0200
+++ linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c        2006-05-31 17:03:00.000000000 +0200
@@ -1298,7 +1298,7 @@
        netdev->set_multicast_list = network_set_multicast_list;
        netdev->uninit          = netif_uninit;
        netdev->weight          = 64;
-       netdev->features        = NETIF_F_IP_CSUM;
+       netdev->features        = 0;

        SET_ETHTOOL_OPS(netdev, &network_ethtool_ops);
        SET_MODULE_OWNER(netdev);

All netdev features are explained here.

MTU > 1500

The NSE people (http://www-ivs.cs.uni-magdeburg.de/EuK/forschung/projekte/nse/) suggest using an MTU of 2312 on the devices connecting to NS2 in an 802.11b simulation. They provide a kernel patch for the TUN/TAP device, which is of no use to Xen, as Xen uses its own virtual interface driver instead of TUN/TAP. I'll stick with MTU 1500 for now.

This has been asked on the mailing list, but no answer was given.

Invalid argument

I see this error message after upgrading to the latest Xen unstable changeset 9925 (I did an “hg pull -u” today). With my previous Xen version (changeset 9800) I had no problems. Going back to 9800 is a workaround for now.

Console error message:

Error: (22, 'Invalid argument')

Config file:

kernel = "/boot/vmlinuz-2.6.16-xenU"
memory = 80

/var/log/xend.log:

[2006-05-04 18:00:04 xend.XendDomainInfo] DEBUG (XendDomainInfo:1373) XendDomainInfo.destroy: domid=4
[2006-05-04 18:00:04 xend.XendDomainInfo] DEBUG (XendDomainInfo:1381) XendDomainInfo.destroyDomain(4)
[2006-05-04 18:00:04 xend] ERROR (xmlrpclib2:124) (22, 'Invalid argument')
Traceback (most recent call last):
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/util/xmlrpclib2.py", line 103, in _marshaled_dispatch
    response = self._dispatch(method, params)
  File "/usr/lib/python2.3/SimpleXMLRPCServer.py", line 407, in _dispatch
    return func(*params)
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/server/XMLRPCServer.py", line 63, in domain_create
    info = XendDomain.instance().domain_create(config)
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomain.py", line 228, in domain_create
    dominfo = XendDomainInfo.create(config)
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 189, in create
    vm.initDomain()
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 1269, in initDomain
    self.createChannels()
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 1416, in createChannels
    self.store_port = self.createChannel()
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 1424, in createChannel
    return xc.evtchn_alloc_unbound(dom=self.domid, remote_dom=0)
error: (22, 'Invalid argument')

Incorrect TCP checksums

In a dom0↔bridge↔domU setup, ping from dom0 to domU and vice versa works, but SSH and other TCP traffic does not. Ethereal reports incorrect TCP checksums. Solution: on dom0, turn off the bridge's TX offload with ethtool -K xenbr0 tx off

Concurrency Issues

It seems that starting or shutting down many Xen VMs at the same time is not a good idea. When VMs are started simultaneously, some of them cannot access the root fs on their block device. When they are shut down simultaneously, the kernel sometimes crashes, and the error messages indicate that the crash is related to cowloop. Lockfiles for cowloop operations don't fix this, as cowloop by itself is not the cause of these problems. It is rather a Xen/cowloop concurrency issue.

As a workaround, I found it relatively safe to boot the VMs sequentially (with a 2 second delay) and to shut them down sequentially (with a 2 second delay).
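The sequential workaround could be scripted roughly as follows. This is a minimal Python sketch, assuming the domains are managed with xm create <config> and xm shutdown <name>. The helper names and the injectable run and sleep parameters are illustrative choices, not part of any Xen tool.

```python
import subprocess
import time

def run_sequentially(commands, delay_seconds=2.0, run=subprocess.call, sleep=time.sleep):
    """Run each command in order, pausing between consecutive commands.

    Returns the list of exit codes, one per command.
    """
    exit_codes = []
    for index, command in enumerate(commands):
        if index > 0:
            # Space the operations out; no pause before the first one.
            sleep(delay_seconds)
        exit_codes.append(run(command))
    return exit_codes

def start_commands(config_paths):
    """Build one `xm create` invocation per domain config file."""
    return [["xm", "create", path] for path in config_paths]

def shutdown_commands(domain_names):
    """Build one `xm shutdown` invocation per domain name."""
    return [["xm", "shutdown", name] for name in domain_names]
```

For example, run_sequentially(start_commands(paths)) boots the domains one after another. Note that a shutdown request may return before the guest has finished halting, so the delay only spaces out the requests.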

Memory considerations

Dom0 memory needs to be at least 32MB for the hypervisor to work. Generally I found that a minimum of 128MB is required to avoid swapping.

Memory is not freed completely after shutting down VMs. I was able to start 69 machines with 12MB RAM each on a 1GB RAM host with 256MB Dom0 RAM. After shutting them down and starting another 69 VMs, the creation of some of them failed and later on the Dom0 kernel crashed.

 
xen/issues.txt · Last modified: 2006/08/18 19:00 by Christoph
 
Except where otherwise noted, content on this wiki is licensed under the following license: GNU Free Documentation License 1.2