
Locust master + 130 locust workers, all tests suddenly fail  #13

@razghe

I have a Locust setup running on a 3-minion cluster. While watching the CPU usage of the minions and running 1300 simulated users with a hatch rate of 10 across 130 workers, all tests suddenly stop working and record only failures:

Type    Name        # requests    # fails    Median    Average    Min    Max    Content Size    # reqs/sec
POST    /login      0             352        0         0          0      0      0               0
POST    /metrics    0             367376     0         0          0      0      0               0
Total               0             367728     0         0          0      0      0               0
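
For context, the workers run something like the following task set (a minimal sketch of /locust-tasks/tasks.py against the Locust 0.7.x API; the exact payloads, credentials, and wait times here are hypothetical):

from locust import HttpLocust, TaskSet, task

class MetricsTaskSet(TaskSet):
    def on_start(self):
        # Log in once per simulated user (hypothetical credentials payload)
        self.client.post("/login", {"username": "test", "password": "test"})

    @task
    def post_metrics(self):
        # The bulk of the load: repeated metrics submissions
        self.client.post("/metrics", {"data": "example"})

class MetricsLocust(HttpLocust):
    task_set = MetricsTaskSet
    min_wait = 1000   # ms between tasks
    max_wait = 1000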

I expected that something had gone wrong on the machines, but all workers are running and the Locust master is accessible without any issue:


....
locust-4069721582-xfcky     1/1       Running   4          20h       172.20.52.6     razvan-kube-minion1.openstack.local
locust-4069721582-xwwa6     1/1       Running   0          1h        172.20.140.35   razvan-kube-minion2.openstack.local
locust-4069721582-y3ij0     1/1       Running   0          1h        172.20.52.46    razvan-kube-minion1.openstack.local
locust-4069721582-y93zt     1/1       Running   0          20h       172.20.50.7     razvan-kube-minion0.openstack.local
locust-4069721582-yhjce     1/1       Running   0          20h       172.20.140.17   razvan-kube-minion2.openstack.local
locust-4069721582-ynj9r     1/1       Running   0          20h       172.20.52.23    razvan-kube-minion1.openstack.local
locust-4069721582-z3yte     1/1       Running   0          1h        172.20.52.36    razvan-kube-minion1.openstack.local
locust-4069721582-z5s3r     1/1       Running   0          20h       172.20.52.20    razvan-kube-minion1.openstack.local
locust-4069721582-z9k5l     1/1       Running   0          20h       172.20.140.12   razvan-kube-minion2.openstack.local
locust-4069721582-zkn79     1/1       Running   0          20h       172.20.50.19    razvan-kube-minion0.openstack.local
locust-4069721582-zkq1l     1/1       Running   1          20h       172.20.52.12    razvan-kube-minion1.openstack.local
locust-4069721582-zr8ox     1/1       Running   0          20h       172.20.140.15   razvan-kube-minion2.openstack.local
locust-4069721582-zt6e8     1/1       Running   0          1h        172.20.50.40    razvan-kube-minion0.openstack.local
locust-4069721582-zwpu2     1/1       Running   0          20h       172.20.140.14   razvan-kube-minion2.openstack.local
locust-master-wxpwd         1/1       Running   0          21h       172.20.140.86   razvan-kube-minion2.openstack.local

I presumed that the network had some issues, so I pinged TARGET_HOST=http://workload-simulation-webapp.appspot.com; the workers can ping the host:

host-44-11-1-22:~ # kubectl exec locust-4069721582-zt6e8 -- ping -c 3 google.com 
PING google.com (74.125.133.113): 56 data bytes
64 bytes from 74.125.133.113: icmp_seq=0 ttl=42 time=104.528 ms
64 bytes from 74.125.133.113: icmp_seq=1 ttl=42 time=70.861 ms
64 bytes from 74.125.133.113: icmp_seq=2 ttl=42 time=71.639 ms
--- google.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 70.861/82.343/104.528/15.691 ms
host-44-11-1-22:~ # 

host-44-11-1-22:~ # kubectl exec locust-4069721582-zt6e8 -- ping -c 3 workload-simulation-webapp.appspot.com
PING appspot.l.google.com (74.125.133.141): 56 data bytes
64 bytes from 74.125.133.141: icmp_seq=0 ttl=42 time=14.486 ms
64 bytes from 74.125.133.141: icmp_seq=1 ttl=42 time=215.952 ms
64 bytes from 74.125.133.141: icmp_seq=2 ttl=42 time=14.178 ms
--- appspot.l.google.com ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max/stddev = 14.178/81.539/215.952/95.045 ms
host-44-11-1-22:~ # 
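
That said, ping only proves ICMP and DNS work, not HTTP. Since every request reports 0 median response time and 0 content size, the failures are presumably at the HTTP layer (timeouts, connection resets, or the target refusing the cluster's egress traffic). A quick check from inside a worker pod could look like this (a sketch; it assumes python and the requests library are available in the worker image, and the login payload is hypothetical):

# Save as /tmp/check_http.py inside the pod, then run:
#   kubectl exec locust-4069721582-zt6e8 -- python /tmp/check_http.py
import requests

url = "http://workload-simulation-webapp.appspot.com/login"
try:
    r = requests.post(url, data={"username": "test", "password": "test"}, timeout=10)
    print("status:", r.status_code, "body length:", len(r.content))
except requests.exceptions.RequestException as exc:
    # A connection error or timeout here would explain the all-failure stats
    print("request failed:", exc)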

kubectl logs on a random worker reports a normal status:

host-44-11-1-22:~ # kubectl logs locust-4069721582-y93zt
/usr/local/bin/locust -f /locust-tasks/tasks.py --host=http://workload-simulation-webapp.appspot.com --slave --master-host=172.20.140.86
[2016-11-28 14:18:21,053] locust-4069721582-y93zt/INFO/locust.main: Starting Locust 0.7.2
[2016-11-28 15:38:18,295] locust-4069721582-y93zt/INFO/locust.runners: Hatching and swarming 5 clients at the rate 0.1 clients/s...
[2016-11-28 15:39:08,468] locust-4069721582-y93zt/INFO/locust.runners: All locusts hatched: MetricsLocust: 5
[2016-11-28 15:39:08,469] locust-4069721582-y93zt/INFO/locust.runners: Resetting stats

[2016-11-29 08:07:14,152] locust-4069721582-y93zt/INFO/locust.runners: Hatching and swarming 5 clients at the rate 0.1 clients/s...
[2016-11-29 08:08:04,239] locust-4069721582-y93zt/INFO/locust.runners: All locusts hatched: MetricsLocust: 5
[2016-11-29 08:08:04,241] locust-4069721582-y93zt/INFO/locust.runners: Resetting stats

[2016-11-29 09:06:41,863] locust-4069721582-y93zt/INFO/locust.runners: Hatching and swarming 10 clients at the rate 0.0769231 clients/s...
[2016-11-29 09:08:52,472] locust-4069721582-y93zt/INFO/locust.runners: All locusts hatched: MetricsLocust: 10
[2016-11-29 09:08:52,504] locust-4069721582-y93zt/INFO/locust.runners: Resetting stats

[2016-11-29 10:14:32,405] locust-4069721582-y93zt/INFO/locust.runners: Hatching and swarming 10 clients at the rate 0.0769231 clients/s...
[2016-11-29 10:16:43,046] locust-4069721582-y93zt/INFO/locust.runners: All locusts hatched: MetricsLocust: 10
[2016-11-29 10:16:43,145] locust-4069721582-y93zt/INFO/locust.runners: Resetting stats

host-44-11-1-22:~ # 
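
The logs only show hatching events, so they don't say why requests fail. Locust does record the exception behind each failure (visible in the Failures tab of the master's web UI); to surface it per worker, an event hook like this could be added to tasks.py (a sketch against the Locust 0.7.x event API, where request_failure fires with request_type, name, response_time, and exception):

from locust import events

def log_failure(request_type, name, response_time, exception, **kwargs):
    # Print the exception behind each failed request to the worker's log
    print("FAILED %s %s after %sms: %r" % (request_type, name, response_time, exception))

events.request_failure += log_failure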

What could be the issue here? I've run out of ideas :)
