Running lots of networked Xen machines requires a large number of virtual IRQs. If you run out of them, you will see the message "Kernel panic - not syncing: No available IRQ to bind to: increase NR_IRQS!". On an x86_64 kernel, edit the file include/asm-x86_64/mach-xen/irq_vectors.h and increase #define NR_DYNIRQS to e.g. 1024.
By default, Xen can handle at most 4GB of memory. Use the Xen boot option max_addr=16G to raise that limit. I am not sure whether that is necessary.
The heap size can only be adjusted on Xen x86_64 by setting the Xen boot option xenheap_megabytes. The default is 16; I set it to 64 for 254 machines. If you run out of heap space, the message "(XEN) Cannot handle page request order 4!" may appear.
Serial console access over com2. Configure grub like this:
serial --unit=1 --speed=57600
terminal --timeout=10 serial

title Xen 3.0 / Debian / Kasuari
root (hd0,0)
kernel /boot/xen-3.0-amd64.gz com2=57600,8n1 console=com2
module /boot/xen0-linux-kasuari root=/dev/sda1 ro noapic console=tty0 xencons=ttyS1 console=ttyS1
module /boot/xen0-linux-kasuari-initrd
boot
CPU throttling does not work on my AMD64 X2. The screen gets flooded with kernel messages. Fortunately, there is a patch: http://lists.xensource.com/archives/html/xen-devel/2006-03/msg01410.html
Install sysfsutils and edit /etc/sysfs.conf. Set both CPUs to the “ondemand” governor.
CPU clock cycling breaks ns2 nam! Disable it for now.
In Linux, the checksum of a TCP or UDP packet is computed either by the NIC (or its driver) or by the kernel. Which one does it depends on a feature flag of the network interface, which can be viewed using ethtool -k eth0. The Xen netfront and netback drivers advertise this offload capability, so the kernel leaves the checksum to them. As a result, Ethereal shows incorrect checksums and TCP or UDP retransmissions. When the traffic is routed through ns2, TCP and UDP connections cannot be established between nodes, apparently because ns2 discards the incorrectly checksummed packets.
This patch for the Xen DomU and Dom0 kernel fixed that problem for me:
--- linux-2.6-xen-sparse/drivers/xen/netback/interface.c~ 2006-05-31 17:02:08.000000000 +0200
+++ linux-2.6-xen-sparse/drivers/xen/netback/interface.c 2006-05-31 17:02:29.000000000 +0200
@@ -109,7 +109,7 @@
dev->get_stats = netif_be_get_stats;
dev->open = net_open;
dev->stop = net_close;
- dev->features = NETIF_F_IP_CSUM;
+ dev->features = 0;
SET_ETHTOOL_OPS(dev, &network_ethtool_ops);
--- linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c~ 2006-05-31 17:02:46.000000000 +0200
+++ linux-2.6-xen-sparse/drivers/xen/netfront/netfront.c 2006-05-31 17:03:00.000000000 +0200
@@ -1298,7 +1298,7 @@
netdev->set_multicast_list = network_set_multicast_list;
netdev->uninit = netif_uninit;
netdev->weight = 64;
- netdev->features = NETIF_F_IP_CSUM;
+ netdev->features = 0;
SET_ETHTOOL_OPS(netdev, &network_ethtool_ops);
SET_MODULE_OWNER(netdev);
All netdev feature flags, including NETIF_F_IP_CSUM, are defined in the kernel header include/linux/netdevice.h.
The NSE people (http://www-ivs.cs.uni-magdeburg.de/EuK/forschung/projekte/nse/) suggest using an MTU of 2312 on the devices connecting to NS2 in an 802.11b simulation. They provide a kernel patch for the TUN/TAP device, which is of no use to Xen, as Xen uses its own virtual interface driver instead of TUN/TAP. I'll stick with MTU 1500 for now.
This has been asked on the mailing list, but no answer was given.
I see this error message after upgrading to the latest Xen unstable changeset 9925 (I did a “hg pull -u” today). With my previous Xen version (changeset 9800) I had no problems. Going back to 9800 is a workaround for now.
Console error message:
Error: (22, 'Invalid argument')
Config file:
kernel = "/boot/vmlinuz-2.6.16-xenU"
memory = 80
/var/log/xend.log:
[2006-05-04 18:00:04 xend.XendDomainInfo] DEBUG (XendDomainInfo:1373) XendDomainInfo.destroy: domid=4
[2006-05-04 18:00:04 xend.XendDomainInfo] DEBUG (XendDomainInfo:1381) XendDomainInfo.destroyDomain(4)
[2006-05-04 18:00:04 xend] ERROR (xmlrpclib2:124) (22, 'Invalid argument')
Traceback (most recent call last):
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/util/xmlrpclib2.py", line 103, in _marshaled_dispatch
    response = self._dispatch(method, params)
  File "/usr/lib/python2.3/SimpleXMLRPCServer.py", line 407, in _dispatch
    return func(*params)
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/server/XMLRPCServer.py", line 63, in domain_create
    info = XendDomain.instance().domain_create(config)
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomain.py", line 228, in domain_create
    dominfo = XendDomainInfo.create(config)
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 189, in create
    vm.initDomain()
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 1269, in initDomain
    self.createChannels()
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 1416, in createChannels
    self.store_port = self.createChannel()
  File "/usr/src/xen-unstable.hg/dist/install/usr/lib/python/xen/xend/XendDomainInfo.py", line 1424, in createChannel
    return xc.evtchn_alloc_unbound(dom=self.domid, remote_dom=0)
error: (22, 'Invalid argument')
In a dom0↔bridge↔domU setup, ping from dom0 to domU and vice versa works, but SSH and other TCP traffic do not. Ethereal reports incorrect TCP checksums. Solution: on dom0, switch off TX checksum offload on the bridge with ethtool -K xenbr0 tx off
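To confirm that the offload setting really changed, the output of ethtool -k can be checked before and after. The following is only a sketch: the function names are made up for illustration, and it assumes that ethtool prints one "name: on" or "name: off" line per feature, with TX checksumming labelled tx-checksumming.

```python
import subprocess
from typing import Dict


def parse_offload_settings(ethtool_output: str) -> Dict[str, bool]:
    """Turn the text printed by `ethtool -k <iface>` into {feature: enabled}.

    Assumption: every feature is printed as "name: on" or "name: off",
    optionally followed by extra words such as "[fixed]". Lines that do not
    match (for example the header line) are ignored.
    """
    settings: Dict[str, bool] = {}
    for line in ethtool_output.splitlines():
        name, separator, value = line.partition(":")
        if not separator:
            continue
        words = value.split()
        if words and words[0] in ("on", "off"):
            settings[name.strip()] = words[0] == "on"
    return settings


def tx_checksum_offload_enabled(interface: str) -> bool:
    """Ask ethtool whether TX checksum offload is enabled on `interface`.

    Hypothetical helper: needs ethtool installed and usually root rights.
    """
    output = subprocess.check_output(
        ["ethtool", "-k", interface], universal_newlines=True
    )
    return parse_offload_settings(output).get("tx-checksumming", False)
```

On dom0, tx_checksum_offload_enabled("xenbr0") should return False once ethtool -K xenbr0 tx off has been applied.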
It seems that starting and shutting down many Xen VMs at the same time is not a good idea. When VMs are started simultaneously, some of them cannot access the root fs on their block device. When they are shut down simultaneously, the kernel sometimes crashes, and the error messages indicate that the crash is related to cowloop. Lockfiles for cowloop operations don't fix this, as cowloop by itself is not the cause of these problems. It's rather a Xen/cowloop concurrency issue.
As a workaround, I found it relatively safe to boot the VMs sequentially and to shut them down sequentially, with a 2-second delay between VMs in both cases.
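The sequential start and shutdown with a fixed pause could be scripted roughly like this. It is a minimal sketch, not the script I actually use: the helper names are invented, and it assumes the VMs are managed with xm create <config> and xm shutdown <domain>.

```python
import subprocess
import time
from typing import Callable, Iterable, List, Sequence


def run_sequentially(
    commands: Iterable[Sequence[str]],
    delay_seconds: float = 2.0,
    run: Callable[[Sequence[str]], int] = subprocess.call,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Run the commands one after another, pausing between consecutive ones.

    `run` and `sleep` can be replaced, which allows a dry run without Xen.
    Returns the exit code of every command in order.
    """
    exit_codes: List[int] = []
    for index, command in enumerate(commands):
        if index > 0:
            # Pause only between VMs, not before the first one.
            sleep(delay_seconds)
        exit_codes.append(run(list(command)))
    return exit_codes


def start_commands(config_paths: Iterable[str]) -> List[List[str]]:
    """Build one `xm create` command per domain config file."""
    return [["xm", "create", path] for path in config_paths]


def shutdown_commands(domain_names: Iterable[str]) -> List[List[str]]:
    """Build one `xm shutdown` command per domain name."""
    return [["xm", "shutdown", name] for name in domain_names]
```

For example, run_sequentially(start_commands(["/etc/xen/node1.cfg", "/etc/xen/node2.cfg"])) would start two VMs two seconds apart; the config paths are placeholders.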
Dom0 memory needs to be at least 32MB for the hypervisor to work. Generally I found that a minimum of 128MB is required to avoid swapping.
Memory is not freed completely after shutting down VMs. I was able to start 69 machines with 12MB RAM each on a 1GB RAM host with 256MB Dom0 RAM. After shutting them down and starting another 69 VMs, the creation of some of them failed and later on the Dom0 kernel crashed.