Julien's questions -server

Julien

Asked: 2014-11-19 21:13:26 +0800 CST

Broken RabbitMQ cluster won't restart

4

I run RabbitMQ on 3 servers, same version of Erlang and RabbitMQ: RabbitMQ 3.4.1, Erlang 17.3

One node crashed on server 2. The two other nodes did not stay connected to each other:

server 1:

[CentOS-62-64-minimal ~]$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@CentOS-62-64-minimal' ...
[{nodes,[{disc,['rabbit@CentOS-62-64-minimal',rabbit@de3,rabbit@mysql]}]},
 {running_nodes,['rabbit@CentOS-62-64-minimal']},
 {cluster_name,<<"rabbit@CentOS-62-64-minimal">>},
 {partitions,[]}]

server 3:

[de3 ~]$ sudo rabbitmqctl cluster_status
Cluster status of node rabbit@de3 ...
[{nodes,[{disc,['rabbit@CentOS-62-64-minimal',rabbit@de3,rabbit@mysql]}]},
 {running_nodes,[rabbit@de3]},
 {cluster_name,<<"rabbit@CentOS-62-64-minimal">>},
 {partitions,[]}]

After restarting and resetting rabbitmq on server 3, it finally connected to server1:

[CentOS-62-64-minimal ~]$ sudo rabbitmqctl cluster_status
Cluster status of node 'rabbit@CentOS-62-64-minimal' ...
[{nodes,[{disc,['rabbit@CentOS-62-64-minimal',rabbit@de3,rabbit@mysql]}]},
 {running_nodes,['rabbit@CentOS-62-64-minimal']},
 {cluster_name,<<"rabbit@CentOS-62-64-minimal">>},
 {partitions,[]}]

Why did the cluster "break" with just 1 node down? server 3 was working fine, but server 1 was not: "Queue is located on a server that is down".

As for server 2, it did not restart. After a manual restart, I cannot make it reconnect to the cluster, even after multiple resets and removing /var/lib/rabbitmq/mnesia/:

[root@mysql ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@mysql ...
[{nodes,[{disc,[rabbit@mysql]}]},
 {running_nodes,[rabbit@mysql]},
 {cluster_name,<<"[email protected]">>},
 {partitions,[]}]

[root@mysql ~]# rabbitmqctl stop_app
Stopping node rabbit@mysql ...
[root@mysql ~]# rabbitmqctl force_reset
Forcefully resetting node rabbit@mysql ...
[root@mysql ~]# rabbitmqctl join_cluster rabbit@CentOS-62-64-minimal
Clustering node rabbit@mysql with 'rabbit@CentOS-62-64-minimal' ...
Error: {ok,already_member}
[root@mysql ~]# rabbitmqctl start_app
Starting node rabbit@mysql ...
[root@mysql ~]# rabbitmqctl cluster_status
Cluster status of node rabbit@mysql ...
[{nodes,[{disc,[rabbit@mysql]}]},
 {running_nodes,[rabbit@mysql]},
 {cluster_name,<<"[email protected]">>},
 {partitions,[]}]

I have no idea what went wrong. Last time this happened, I upgraded RabbitMQ and Erlang to the latest version.
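
To compare the nodes quickly, the cluster_status output above can be checked mechanically. This is only a minimal sketch: it assumes the Erlang-term output format shown in this question (RabbitMQ 3.4.x), and the function name parse_cluster_status is made up for illustration. It lists which configured nodes a given node does not see as running.

```python
import re


def parse_cluster_status(output):
    """Return (configured, running) node-name sets from cluster_status text.

    Assumes the Erlang-term layout printed by rabbitmqctl in RabbitMQ 3.4.x,
    as pasted in the question; other versions may print something different.
    """
    def names(block):
        # Node names look like rabbit@host, optionally in single quotes.
        return set(re.findall(r"'?([\w.-]+@[\w.-]+)'?", block))

    nodes = re.search(r"\{nodes,\[(.*?)\]\}\]\}", output, re.S)
    running = re.search(r"\{running_nodes,\[(.*?)\]\}", output, re.S)
    configured = names(nodes.group(1)) if nodes else set()
    up = names(running.group(1)) if running else set()
    return configured, up


if __name__ == "__main__":
    status = """[{nodes,[{disc,['rabbit@CentOS-62-64-minimal',rabbit@de3,rabbit@mysql]}]},
 {running_nodes,['rabbit@CentOS-62-64-minimal']},
 {cluster_name,<<"rabbit@CentOS-62-64-minimal">>},
 {partitions,[]}]"""
    configured, up = parse_cluster_status(status)
    # Nodes that are cluster members but not seen as running from this node.
    print(sorted(configured - up))
```

Running the same check on each server shows at a glance whether the nodes agree on membership: in the last output above, rabbit@mysql lists only itself as a member, while server 1 still lists it as a member.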

Julien

Asked: 2014-02-09 19:08:01 +0800 CST

Randomly slow MySQL queries

1

I know this type of question comes up often. I have done a lot of research and tried a lot of different settings, but I still have the same issue: queries that are usually very fast can take 3s to 5s, seemingly at random.

The server is an i7-3770 (8 cores) with 32GB RAM. The CPU is about 50% idle, with no CPU spikes. No swap is used, and free memory is about 10GB on average. I run MySQL 5.5.32 on CentOS 6.

9GB of RAM has been allocated to MySQL, but it uses only about 2GB. All data should fit in memory (600MB of data, 700MB of index).

Number of queries per second on average (no real spike):

1.5 SELECT
0.2 UPDATE
0.05 INSERT

Here is an example of a query that normally takes just a few ms, but sometimes more than 3s:

# Query_time: 4.337884  Lock_time: 0.050146 Rows_sent: 1  Rows_examined: 1
SELECT me.id, me.url, me.filename, me.instance_id, me.virtual_id, me.status, me.user_id, me.time_added, me.time_finished, me.priority, me.size, me.delay, me.flash_delay, me.tries, me.details, me.json_file, me.html, me.shots, me.shot_interval, me.screen_width, me.screen_height FROM Screenshots me WHERE ( me.id = '5992705' );

id is a primary key.

Although I have more SELECT than INSERT queries, I have more slow INSERT queries than slow SELECT queries.

What I have tried and tested:

Made sure all required indexes are there, with no redundant or unused ones
Checked that there is no CPU spike, no IO spike and no swap at the time
Set up a 2nd instance of MySQL as a slave; most SELECT queries are done on the slave
Removed any TEXT and equivalent data types
Tuned my.cnf

Tuning my.cnf helped a lot. I tried with query cache enabled and disabled, not much difference.

Using a slave for SELECT actually made things worse: I had fewer slow queries on the master, but they could go up to 12s!

Here is my current my.cnf (with query cache in this case):

tmp_table_size                 = 32M
max_heap_table_size            = 32M
query_cache_type               = 1
query_cache_size               = 1M
thread_cache_size              = 50
open_files_limit               = 65535
table_definition_cache         = 1024
table_open_cache               = 4096

innodb_flush_method            = O_DIRECT
innodb_log_files_in_group      = 2
innodb_log_file_size           = 256M
innodb_log_buffer_size         = 8M
innodb_thread_concurrency      = 8
innodb_flush_log_at_trx_commit = 0
innodb_file_per_table          = 1
innodb_buffer_pool_size        = 9G

max_connections                = 1000
transaction-isolation          = READ-UNCOMMITTED
innodb_locks_unsafe_for_binlog = 1
innodb_io_capacity             = 1000
innodb_change_buffering        = inserts
innodb_fast_shutdown           = 0

key_buffer_size                = 2G

I'm out of ideas. I could not find any patterns (frequency, interval, etc.) that would explain these slow queries.
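
To look for patterns, the slow query log can be summarised with a short script. This is a rough sketch only: it assumes header lines in the same format as the "# Query_time: ..." line quoted above, with the statement on a following line, and the name slow_statements and the 3s threshold are arbitrary choices for illustration.

```python
import re
from collections import Counter

# Matches header lines like the one quoted in the question.
HEADER = re.compile(
    r"# Query_time: (?P<query>[\d.]+)\s+Lock_time: (?P<lock>[\d.]+)\s+"
    r"Rows_sent: (?P<sent>\d+)\s+Rows_examined: (?P<examined>\d+)"
)


def slow_statements(lines, threshold=3.0):
    """Yield (query_time, lock_time, statement_type) for entries over threshold."""
    pending = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue
        match = HEADER.match(stripped)
        if match:
            pending = (float(match["query"]), float(match["lock"]))
        elif pending and not stripped.startswith(("#", "SET ", "use ")):
            # First real statement after a header: classify by its first word.
            if pending[0] >= threshold:
                yield pending[0], pending[1], stripped.split(None, 1)[0].upper()
            pending = None


if __name__ == "__main__":
    sample = [
        "# Query_time: 4.337884  Lock_time: 0.050146 Rows_sent: 1  Rows_examined: 1",
        "SELECT me.id FROM Screenshots me WHERE ( me.id = '5992705' );",
    ]
    # Count slow statements per type (SELECT, INSERT, UPDATE, ...).
    print(Counter(kind for _, _, kind in slow_statements(sample)))
```

Grouping the results by statement type, or comparing Lock_time with Query_time, may show whether the slow entries cluster around writes or around lock waits.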

Julien

Asked: 2013-09-25 18:03:54 +0800 CST

Why is $request_time sometimes much bigger than $upstream_response_time?

0

I have an HTTPS website where sometimes, for the same clients, the $request_time is 10x the $upstream_response_time, or even 100x. I understand the difference between the 2 times: $request_time is the duration between the first byte received and the last byte sent.

Some users have told me they experienced a Connection Timeout, so I think these long $request_time are real problems.

These long $request_time happen for GET requests (typical request size: 185 bytes). The upstream is a fastcgi process. I wonder under which scenario the $request_time could be too high:

No fastcgi worker is accepting connections, so $request_time includes the "waiting time" for a fastcgi process
The response is incorrect (wrong Content-Length, chunked response) and the client is waiting for data that is not coming
SSL certificate: the client gets our SSL certificate, asks for the OCSP response and then finishes the SSL connection

I wonder which options are actually possible and how I would find out what is actually creating long $request_time.
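
One way to start narrowing this down is to log both values and look at the gap per request. The sketch below is hypothetical: it assumes a custom log_format in which every access log line ends with "rt=$request_time urt=$upstream_response_time", which is not in my current configuration, and the function name client_side_time is made up.

```python
import re

# Assumed (hypothetical) line ending: "rt=<request_time> urt=<upstream_response_time>"
PAIR = re.compile(r"rt=(?P<rt>[\d.]+) urt=(?P<urt>[\d.,\s:-]+)$")


def client_side_time(line):
    """Return request_time minus total upstream time, or None if not parsable.

    $upstream_response_time may be "-" (no upstream contacted) or hold several
    values separated by commas or colons when more than one upstream was tried.
    """
    match = PAIR.search(line.strip())
    if not match:
        return None
    request_time = float(match["rt"])
    parts = re.split(r"[,:]", match["urt"])
    upstream = sum(float(p) for p in parts if p.strip() not in ("", "-"))
    return request_time - upstream


if __name__ == "__main__":
    line = '1.2.3.4 "GET /do?arg=x HTTP/1.1" 200 rt=2.500 urt=0.250'
    print(client_side_time(line))
```

A large gap means the time was spent outside the upstream response itself, for example reading the request from the client or sending the response back to it, so a slow fastcgi script would not explain such requests.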

Julien

Asked: 2013-02-27 20:48:49 +0800 CST

Why does Nginx remove Content-Length header for chunked content?

12

I use nginx 1.2.3 to proxy to a script:

proxy_set_header Host $host;
proxy_pass http://127.0.0.1:8880;
proxy_buffering off;
proxy_read_timeout 300s;
gzip off;

The script sends both Transfer-encoding: chunked and Content-Length: 251:

HTTP/1.0 307 Temporary Redirect
Content-length: 251
Pragma: no-cache
Location: /...
Cache-control: no-cache
Transfer-encoding: chunked

I need both, but nginx automatically removes the Content-Length:

HTTP/1.1 302 Found
Server: nginx/1.2.3
Content-Type: application/json; charset=utf-8
Content-Length: 58
Connection: keep-alive
Location: /...

As a result, the clients do not wait for the chunks to be sent. This used to work with an earlier version of nginx.
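
For context, here is a minimal sketch of what chunked framing looks like on the wire. It is illustrative only, not what my script or nginx actually runs, and the name encode_chunked is made up. With chunked transfer coding the body length is carried by the chunk sizes and the terminating zero-length chunk, and the HTTP/1.1 specification says a sender must not send Content-Length together with Transfer-Encoding, which may be why a proxy drops one of the two.

```python
def encode_chunked(parts):
    """Frame an iterable of byte strings using HTTP/1.1 chunked transfer coding."""
    out = b""
    for part in parts:
        if part:
            # Each chunk: size in hexadecimal, CRLF, the data, CRLF.
            out += f"{len(part):x}\r\n".encode("ascii") + part + b"\r\n"
    # A zero-length chunk followed by an empty line ends the body.
    return out + b"0\r\n\r\n"


if __name__ == "__main__":
    print(encode_chunked([b"hello", b"world!"]))
```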

Julien

Asked: 2012-01-03 22:21:48 +0800 CST

How to insert a delay in response with nginx?

3

I have a fast-cgi application that does some work at http://myapp.com/do?arg=x. The job may take a few minutes, so I want to keep redirecting the users to the same URL (with 307) until the job is done, and then serve the results with a 200 OK. I would like to slow down the response rate, so that it takes the client 30s (for example) to receive a 307. Obviously, I don't want to do it in the back-end because I need to handle other jobs. Is it possible to tell nginx to insert a delay, or to rate limit the response for a particular URL?

I've looked at limit_req, but I don't think it can be used to rate limit unique URLs (http://myapp.com/do?arg=x versus http://myapp.com/do?arg=y) but rather unique IP addresses.

Julien

Asked: 2011-02-25 14:33:53 +0800 CST

Which services provide multiple outbound IP addresses?

1

I'm looking for a service that would allow me to NAT HTTP traffic through multiple outbound IP addresses. I've seen services that can offer ~100 different public IP addresses and let users change the address every minute, for example. Can you suggest one such provider?

Julien

Asked: 2010-08-03 22:26:07 +0800 CST

How to create fake sound card on Linux server

1

I'm running Firefox with Xvfb on a Linux server (CentOS 5.4). I also have the Flash plugin loaded, but it does not work because I do not have a sound card:

$ alsamixer
alsamixer: function snd_ctl_open failed for default: No such file or directory

Is it possible to create a fake sound card on the server?

Julien

Asked: 2009-08-12 16:26:15 +0800 CST

Source NAT into a GRE tunnel

1

On a Linux box, I have created a GRE tunnel called gre1 172.17.1 -> 172.17.2. The Linux box IP is 10.10.100.100, the end point IP is 10.10.101.101.

I am trying to do a source NAT (NOT destination NAT) so that traffic leaving the Linux box goes through the tunnel if the destination port is 80. I have tried things along these lines without success:

iptables -t nat -A OUTPUT -p tcp --dport 80 -j SNAT --to 172.17.1.1
iptables -t nat -A FORWARD -p tcp --dport 80 -j SNAT --to 172.17.1.1

Most examples I found for GRE tunneling are for DNAT, not SNAT. Is there an example that would work for my case?

Julien

Asked: 2009-06-28 19:37:45 +0800 CST

Cannot recover from failed RAID

3

My situation is different from this one.

I have a CentOS system with 3 hard drives, and the following software RAID arrays:

/boot on RAID 1 over 2 disks
/ on RAID 5 over 3 disks
swap on RAID 0 over 2 disks (I believe)

My 3rd drive failed. At first it was no big deal: the array was still working. But after 1 day, when I got ready to swap the bad disk, the system could no longer boot with the new disk in:

md: md2: raid array is not clean -- starting background reconstruction
raid5: cannot start dirty degraded array for md2
raid5: failed to run raid set md2
[...]
Kernel panic

It stops there. I have no shell. I've tried to boot from the rescue disk, but I don't know how to go from there: my arrays are not seen, so I cannot rebuild them. I get the exact same issue if I boot with 2 disks, or with the bad disk as my 3rd drive.

How can I fix the array now that I have a new drive?

Julien

Asked: 2009-06-10 14:39:02 +0800 CST

Experience in migrating from Apache to nginx?

7

I'd like to get some feedback about a migration from Apache to nginx. My goal is to reduce the memory footprint of the web server. Currently, I use the following modules/features on Apache:

multiple virtual hosts
Server Side Include
Fast CGI

Please share your experience: problems during migration, benefits after migration (was it worth it?), useful modules for nginx, etc.
