I have a bunch of OpenStack VMs running on Grizzly. I need to change their domain, which is currently managed by cloud-init. How do I update the user-data?
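For reference, this is how I check what user-data a running VM currently sees; the metadata URL is the EC2-compatible endpoint, and the /var/lib/cloud path is what my cloud-init version uses, so adjust as needed:

# from inside a guest: what the metadata service serves right now
curl http://169.254.169.254/latest/user-data
# cached copy from the last cloud-init run
cat /var/lib/cloud/instance/user-data.txt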
I'm running OpenStack Grizzly on CentOS, installed by Mirantis Fuel.
[root@controller-20 ~]# cat /etc/redhat-release
CentOS release 6.4 (Final)
[root@controller-20 ~]# rpm -qa | grep -i openstack-nova
openstack-nova-console-2013.1.1.fuel3.0-mira.2.noarch
openstack-nova-common-2013.1.1.fuel3.0-mira.2.noarch
openstack-nova-scheduler-2013.1.1.fuel3.0-mira.2.noarch
openstack-nova-conductor-2013.1.1.fuel3.0-mira.2.noarch
openstack-nova-objectstore-2013.1.1.fuel3.0-mira.2.noarch
openstack-nova-novncproxy-0.4-8.el6.noarch
openstack-nova-cert-2013.1.1.fuel3.0-mira.2.noarch
openstack-nova-api-2013.1.1.fuel3.0-mira.2.noarch
The topology is currently one controller node and three compute nodes, all running on modern Dell rackmount hardware. I had provisioned roughly 25 VMs today before the issues started.
For some reason, while creating and deleting VMs, a fixed IP got stuck in an indeterminate state. Now I'm having trouble creating new VMs: OpenStack tries to use IPs that it thinks still belong to an old VM and fails to build the new one.
My fixed network is 10.129.0.0/24.
Here's a list of the problem IPs from the nova-manage command line:
# nova-manage fixed list | grep -E 'network|WARNING' -A 1
network IP address hostname host
10.129.0.0/24 10.129.0.0 None None
--
WARNING: fixed ip 10.129.0.20 allocated to missing instance
10.129.0.0/24 10.129.0.20 None None
--
WARNING: fixed ip 10.129.0.23 allocated to missing instance
10.129.0.0/24 10.129.0.23 None None
--
WARNING: fixed ip 10.129.0.25 allocated to missing instance
10.129.0.0/24 10.129.0.25 None None
WARNING: fixed ip 10.129.0.26 allocated to missing instance
10.129.0.0/24 10.129.0.26 None None
WARNING: fixed ip 10.129.0.27 allocated to missing instance
10.129.0.0/24 10.129.0.27 None None
--
WARNING: fixed ip 10.129.0.30 allocated to missing instance
10.129.0.0/24 10.129.0.30 None None
WARNING: fixed ip 10.129.0.31 allocated to missing instance
10.129.0.0/24 10.129.0.31 None None
WARNING: fixed ip 10.129.0.32 allocated to missing instance
10.129.0.0/24 10.129.0.32 None None
WARNING: fixed ip 10.129.0.33 allocated to missing instance
10.129.0.0/24 10.129.0.33 None None
WARNING: fixed ip 10.129.0.34 allocated to missing instance
10.129.0.0/24 10.129.0.34 None None
WARNING: fixed ip 10.129.0.35 allocated to missing instance
10.129.0.0/24 10.129.0.35 None None
WARNING: fixed ip 10.129.0.36 allocated to missing instance
10.129.0.0/24 10.129.0.36 None None
WARNING: fixed ip 10.129.0.37 allocated to missing instance
10.129.0.0/24 10.129.0.37 None None
WARNING: fixed ip 10.129.0.38 allocated to missing instance
10.129.0.0/24 10.129.0.38 None None
WARNING: fixed ip 10.129.0.39 allocated to missing instance
10.129.0.0/24 10.129.0.39 None None
WARNING: fixed ip 10.129.0.40 allocated to missing instance
10.129.0.0/24 10.129.0.40 None None
WARNING: fixed ip 10.129.0.41 allocated to missing instance
10.129.0.0/24 10.129.0.41 None None
WARNING: fixed ip 10.129.0.42 allocated to missing instance
10.129.0.0/24 10.129.0.42 None None
WARNING: fixed ip 10.129.0.43 allocated to missing instance
10.129.0.0/24 10.129.0.43 None None
WARNING: fixed ip 10.129.0.44 allocated to missing instance
10.129.0.0/24 10.129.0.44 None None
WARNING: fixed ip 10.129.0.45 allocated to missing instance
10.129.0.0/24 10.129.0.45 None None
WARNING: fixed ip 10.129.0.46 allocated to missing instance
10.129.0.0/24 10.129.0.46 None None
--
WARNING: fixed ip 10.129.0.48 allocated to missing instance
10.129.0.0/24 10.129.0.48 None None
WARNING: fixed ip 10.129.0.49 allocated to missing instance
10.129.0.0/24 10.129.0.49 None None
WARNING: fixed ip 10.129.0.50 allocated to missing instance
10.129.0.0/24 10.129.0.50 None None
--
WARNING: fixed ip 10.129.0.52 allocated to missing instance
10.129.0.0/24 10.129.0.52 None None
WARNING: fixed ip 10.129.0.53 allocated to missing instance
10.129.0.0/24 10.129.0.53 None None
--
WARNING: fixed ip 10.129.0.55 allocated to missing instance
10.129.0.0/24 10.129.0.55 None None
WARNING: fixed ip 10.129.0.56 allocated to missing instance
10.129.0.0/24 10.129.0.56 None None
WARNING: fixed ip 10.129.0.57 allocated to missing instance
10.129.0.0/24 10.129.0.57 None None
--
WARNING: fixed ip 10.129.0.59 allocated to missing instance
10.129.0.0/24 10.129.0.59 None None
WARNING: fixed ip 10.129.0.60 allocated to missing instance
10.129.0.0/24 10.129.0.60 None None
WARNING: fixed ip 10.129.0.61 allocated to missing instance
10.129.0.0/24 10.129.0.61 None None
I know that the 10.129.0.20 IP marks the VM instantiation that started the problems. The issue manifests itself in a failure to provision new VMs.
[root@controller-20 ~]# nova --os-username demetri --os-tenant-name admin --os-auth-url http://localhost:5000/v2.0/ fixed-ip-get 10.129.0.20
OS Password:
+-------------+---------------+----------+-----------------------+
| address | cidr | hostname | host |
+-------------+---------------+----------+-----------------------+
| 10.129.0.20 | 10.129.0.0/24 | devdbl9 | compute-21.domain.tld |
+-------------+---------------+----------+-----------------------+
The nova-manage commands don't seem to offer any tool to reclaim these IPs; I've tried reserve/unreserve, but that doesn't do the trick. These IPs are also represented in a nova MySQL table called fixed_ips. Example:
+---------------------+---------------------+------------+-----+--------------+------------+-----------+--------+----------+----------------------+-----------------------+--------------------------------------+---------+
| created_at | updated_at | deleted_at | id | address | network_id | allocated | leased | reserved | virtual_interface_id | host | instance_uuid | deleted |
+---------------------+---------------------+------------+-----+--------------+------------+-----------+--------+----------+----------------------+-----------------------+--------------------------------------+---------+
| 2013-08-05 11:10:19 | 2013-10-16 11:32:20 | NULL | 21 | 10.129.0.20 | 1 | 0 | 0 | 0 | NULL | NULL | df2e9214-78cf-49d3-b256-e35d48818f29 | 0 |
To further confirm that the problem relates to fixed-IP networking, the UI shows the VM's IP address incrementing, say from .21 to .22 to .23, before the build ultimately fails with state "ERROR".
All this is by way of saying that, since this started happening, most (but not all) attempts to launch a new VM fail. How can I troubleshoot this further, and ultimately, how can I return to smoothly provisioning new VMs?
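For what it's worth, the only workaround I can think of is clearing the stale rows in the fixed_ips table by hand, something like the sketch below (column names taken from the table dump above; 10.129.0.20 is just the first stuck address). I haven't run it because I'm not sure it's safe:

# review the orphaned rows first (assumes mysql client credentials are set up)
mysql nova -e "SELECT id, address, allocated, leased, instance_uuid FROM fixed_ips WHERE address LIKE '10.129.0.%' AND instance_uuid IS NOT NULL;"
# then, per stuck address, release it back to the pool
mysql nova -e "UPDATE fixed_ips SET instance_uuid = NULL, virtual_interface_id = NULL, allocated = 0, leased = 0 WHERE address = '10.129.0.20';"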
Thanks.
We're experiencing a frustrating problem on our LAN. Periodically, DNS queries to our ISP nameservers time out, forcing a 5 second delay. Even if I bypass /etc/resolv.conf by running dig directly against one of our DNS servers, I still encounter the problem. Here's an example:
mv-m-dmouratis:~ dmourati$ time dig www.google.com @209.81.9.1
; <<>> DiG 9.8.3-P1 <<>> www.google.com @209.81.9.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14473
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 4, ADDITIONAL: 4
;; QUESTION SECTION:
;www.google.com. IN A
;; ANSWER SECTION:
www.google.com. 174 IN A 74.125.239.148
www.google.com. 174 IN A 74.125.239.147
www.google.com. 174 IN A 74.125.239.146
www.google.com. 174 IN A 74.125.239.144
www.google.com. 174 IN A 74.125.239.145
;; AUTHORITY SECTION:
google.com. 34512 IN NS ns2.google.com.
google.com. 34512 IN NS ns1.google.com.
google.com. 34512 IN NS ns3.google.com.
google.com. 34512 IN NS ns4.google.com.
;; ADDITIONAL SECTION:
ns2.google.com. 212097 IN A 216.239.34.10
ns3.google.com. 207312 IN A 216.239.36.10
ns4.google.com. 212097 IN A 216.239.38.10
ns1.google.com. 212096 IN A 216.239.32.10
;; Query time: 8 msec
;; SERVER: 209.81.9.1#53(209.81.9.1)
;; WHEN: Fri Jul 26 14:44:25 2013
;; MSG SIZE rcvd: 248
real 0m5.015s
user 0m0.004s
sys 0m0.002s
Other times, the queries respond instantly, in under 20 ms or so. I've done a packet trace and discovered something interesting: the DNS server is responding, but the client ignores the initial response and sends a second, identical query, which is answered immediately.
See the packet trace linked below; note the identical source port on both queries (62076).
Question: what is causing the first DNS query to fail?
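For anyone who wants to reproduce this, a loop like the one below shows the pattern: most queries return in milliseconds, but the bad ones take about 5 seconds (209.81.9.1 is the ISP resolver from the example above; en0 is an assumed interface name):

# time 50 queries; watch for wall-clock times far above the reported query time
for i in $(seq 1 50); do
  ( time dig www.google.com @209.81.9.1 > /dev/null ) 2>&1 | grep ^real
  sleep 1
done

# in another terminal, capture the DNS traffic for the packet trace
sudo tcpdump -n -i en0 -w dns-delay.pcap host 209.81.9.1 and port 53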
UPDATE
Resources:
Packet trace:
http://www.cloudshark.org/captures/8b1c32d9d015
dtruss (strace for Mac):
https://gist.github.com/dmourati/6115180
Related question on apple.stackexchange.com: Mountain Lion firewall is randomly delaying DNS requests
UPDATE 2
System Software Overview:
System Version: OS X 10.8.4 (12E55)
Kernel Version: Darwin 12.4.0
Boot Volume: Macintosh HD
Boot Mode: Normal
Computer Name: mv-m-dmouratis
User Name: Demetri Mouratis (dmourati)
Secure Virtual Memory: Enabled
Time since boot: 43 minutes
Hardware Overview:
Model Name: MacBook Pro
Model Identifier: MacBookPro10,1
Processor Name: Intel Core i7
Processor Speed: 2.7 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
Firewall Settings:
Mode: Limit incoming connections to specific services and applications
Services:
Apple Remote Desktop: Allow all connections
Screen Sharing: Allow all connections
Applications:
com.apple.java.VisualVM.launcher: Block all connections
com.getdropbox.dropbox: Allow all connections
com.jetbrains.intellij.ce: Allow all connections
com.skype.skype: Allow all connections
com.yourcompany.Bitcoin-Qt: Allow all connections
org.m0k.transmission: Allow all connections
org.python.python: Allow all connections
Firewall Logging: Yes
Stealth Mode: No
I have an IP address assigned as an AWS Elastic IP (EIP). I'd like to know which DNS records in my Route 53 domain point at that address.
In a traditional DNS service, I would run dig -x $EIP and get back a PTR record and be done.
Amazon only allows actual PTR records if you fill out a form.
Otherwise, the PTR records point to amazonaws.com.
The Route 53 API doesn't seem to support the dig -x approach either. Moreover, the data comes back as XML, which makes it a bit challenging to work with from the command line.
So, how can I get this data?
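For context, the kind of thing I'm trying to avoid hand-rolling is a full zone dump filtered client-side, e.g. with the AWS CLI (Z123EXAMPLE is a placeholder hosted zone ID and $EIP holds the address):

# list all record sets in the zone and keep the names whose value matches the EIP
aws route53 list-resource-record-sets --hosted-zone-id Z123EXAMPLE \
  --query "ResourceRecordSets[?ResourceRecords[?Value=='${EIP}']].Name" --output text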
We have a server that acts as a "dropbox" for (outside) users to upload data to us over sftp/ssh. We need to process these files (gpg decrypt, unzip, etc.) as they come in. In the past, we simply processed every file in each user's home directory, regardless of whether we had already processed it. That turned out to be wasteful. I updated (rewrote) our processing script to rely on a mechanism like:
FILESTOPROCESS=$(find -H /home/$CUST -type f -newer /home/$CUST/marker-file)
This, combined with a touch /home/$CUST/marker-file, has worked great, and our workload was dramatically reduced.
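Roughly, the processing loop has this shape (process_upload is a stand-in for the gpg decrypt / unzip steps, and the touch ordering is simplified here):

FILESTOPROCESS=$(find -H /home/$CUST -type f -newer /home/$CUST/marker-file)
for f in $FILESTOPROCESS; do
  process_upload "$f"    # stand-in for gpg --decrypt, unzip, etc.
done
touch /home/$CUST/marker-file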
A day or so ago, we had a configuration issue with our SSH server that temporarily prevented users from uploading files to us. When the script next ran, it overlooked a file that a user had initially failed to upload but then re-uploaded via sftp with the "-p" (preserve timestamp) option. That set the file's timestamps to a day or so older than the marker file, so it was ignored.
I'd like to disallow users from uploading with "-p" so that files are created with current timestamps.
Can I do this in sshd_config?
We use keepalived to manage our Linux Virtual Server (LVS) load balancer. The LVS VIPs are setup to use a FWMARK as configured in iptables.
virtual_server fwmark 300000 {
delay_loop 10
lb_algo wrr
lb_kind NAT
persistence_timeout 180
protocol TCP
real_server 10.10.35.31 {
weight 24
MISC_CHECK {
misc_path "/usr/local/sbin/check_php_wrapper.sh 10.10.35.31"
misc_timeout 30
}
}
real_server 10.10.35.32 {
weight 24
MISC_CHECK {
misc_path "/usr/local/sbin/check_php_wrapper.sh 10.10.35.32"
misc_timeout 30
}
}
real_server 10.10.35.33 {
weight 24
MISC_CHECK {
misc_path "/usr/local/sbin/check_php_wrapper.sh 10.10.35.33"
misc_timeout 30
}
}
real_server 10.10.35.34 {
weight 24
MISC_CHECK {
misc_path "/usr/local/sbin/check_php_wrapper.sh 10.10.35.34"
misc_timeout 30
}
}
}
http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/LVS-HOWTO.fwmark.html
[root@lb1 ~]# iptables -L -n -v -t mangle
Chain PREROUTING (policy ACCEPT 182G packets, 114T bytes)
190M 167G MARK tcp -- * * 0.0.0.0/0 w1.x1.y1.4 multiport dports 80,443 MARK set 0x493e0
62M 58G MARK tcp -- * * 0.0.0.0/0 w1.x1.y2.4 multiport dports 80,443 MARK set 0x493e0
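(The mangle rules above correspond to iptables commands along these lines; the VIPs are masked as w1.x1.y1.4 / w1.x1.y2.4 in the output, and 300000 decimal is the 0x493e0 shown in the counters.)

iptables -t mangle -A PREROUTING -d w1.x1.y1.4 -p tcp -m multiport --dports 80,443 -j MARK --set-mark 300000
iptables -t mangle -A PREROUTING -d w1.x1.y2.4 -p tcp -m multiport --dports 80,443 -j MARK --set-mark 300000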
[root@lb1 ~]# ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
FWM 300000 wrr persistent 180
-> 10.10.35.31:0 Masq 24 1 0
-> dis2.domain.com:0 Masq 24 3 231
-> 10.10.35.33:0 Masq 24 0 208
-> 10.10.35.34:0 Masq 24 0 0
At the time the realservers were set up, DNS was misconfigured for some hosts in the 10.10.35.0/24 network. We have since fixed the DNS, but the hosts above still show up only by their IP addresses (10.10.35.31, 10.10.35.33, 10.10.35.34).
[root@lb1 ~]# host 10.10.35.31
31.35.10.10.in-addr.arpa domain name pointer dis1.domain.com.
The OS is CentOS 6.3, ipvsadm is ipvsadm-1.25-10.el6.x86_64, the kernel is kernel-2.6.32-71.el6.x86_64, and keepalived is keepalived-1.2.7-1.el6.x86_64.
How can we get ipvsadm -L to list all realservers by their proper hostnames?
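In case it's relevant, this is the quick check I use to confirm reverse resolution works for all four realservers, both via straight DNS (host) and via NSS (getent), which I assume is the path ipvsadm takes when resolving names:

for ip in 10.10.35.31 10.10.35.32 10.10.35.33 10.10.35.34; do
  host $ip          # straight DNS PTR lookup
  getent hosts $ip  # reverse lookup via NSS (/etc/nsswitch.conf)
done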
We run a web application serving web APIs for an increasing number of clients. To start, the clients were generally home, office, or other wireless networks submitting chunked HTTP uploads to our API. We've now branched out into handling more mobile clients. The files range from a few KB to several GB and are broken into smaller chunks and reassembled on our API servers.
Our current load balancing is performed at two layers. First, we use round-robin DNS to advertise multiple A records for our api.company.com address. At each IP, we host a Linux LVS (http://www.linuxvirtualserver.org/) load balancer that looks at the source IP address of a request to decide which API server to hand the connection to. These LVS boxes are configured with heartbeatd to take over external VIPs and internal gateway IPs from one another.
Lately, we've seen two new error conditions.
The first error is that clients oscillate or migrate from one LVS to another mid-upload. This in turn causes our load balancers to lose track of the persistent connection and send the traffic to a new API server, breaking the chunked upload across two or more servers. Our intent was for the round-robin DNS TTL for api.company.com (which we've set to 1 hour) to be honored by the downstream caching nameservers, OS caching layers, and client application layers. This error occurs for approximately 15% of our uploads.
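As a sanity check, one way to see whether a given downstream resolver actually honors the 1-hour TTL is to watch the returned record set and the TTL counting down (8.8.8.8 is just an example resolver here):

dig +noall +answer api.company.com @8.8.8.8
sleep 60
dig +noall +answer api.company.com @8.8.8.8   # same record set, TTL ~60s lower if the cache honors it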
The second error we've seen much less commonly. A client initiates traffic to an LVS box and is routed to realserver A behind it. Thereafter, the client comes in from a new source IP address, which the LVS box does not recognize, and its ongoing traffic is routed to realserver B, also behind that LVS.
Given our architecture as described in part above, I'd like to hear about people's experiences with a better approach that would let us handle each of these error cases more gracefully.
Edit 5/3/2010:
This looks like what we need. Weighted GSLB hashing on the source IP address.