python - redis.exceptions.ConnectionError after celery has been running for approximately one day -


This is the full trace:

    Traceback (most recent call last):
      File "/home/server/backend/venv/lib/python3.4/site-packages/celery/app/trace.py", line 283, in trace_task
        uuid, retval, SUCCESS, request=task_request,
      File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 256, in store_result
        request=request, **kwargs)
      File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/base.py", line 490, in _store_result
        self.set(self.get_key_for_task(task_id), self.encode(meta))
      File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 160, in set
        return self.ensure(self._set, (key, value), **retry_policy)
      File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 149, in ensure
        **retry_policy
      File "/home/server/backend/venv/lib/python3.4/site-packages/kombu/utils/__init__.py", line 243, in retry_over_time
        return fun(*args, **kwargs)
      File "/home/server/backend/venv/lib/python3.4/site-packages/celery/backends/redis.py", line 169, in _set
        pipe.execute()
      File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2593, in execute
        return execute(conn, stack, raise_on_error)
      File "/home/server/backend/venv/lib/python3.4/site-packages/redis/client.py", line 2447, in _execute_transaction
        connection.send_packed_command(all_cmds)
      File "/home/server/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 532, in send_packed_command
        self.connect()
      File "/home/pserver/backend/venv/lib/python3.4/site-packages/redis/connection.py", line 436, in connect
        raise ConnectionError(self._error_message(e))
    redis.exceptions.ConnectionError: Error 0 connecting to localhost:6379. Error.
    [2016-09-21 10:47:18,814: WARNING/Worker-747] Data collector is not contactable. This can be because of a network issue or because of the data collector being restarted. In the event that contact cannot be made after a period of time then please report this problem to New Relic support for further investigation. The error raised was ConnectionError(ProtocolError('Connection aborted.', BlockingIOError(11, 'Resource temporarily unavailable')),).

I searched for this ConnectionError, but there was no problem matching mine.

My platform is Ubuntu 14.04. Below is part of my Redis config. (I can share the whole redis.conf file if needed. By the way, the parameters in the LIMITS section are all left commented out.)

    # By default Redis listens for connections from all the network interfaces
    # available on the server. It is possible to listen to just one or multiple
    # interfaces using the "bind" configuration directive, followed by one or
    # more IP addresses.
    #
    # Examples:
    #
    # bind 192.168.1.100 10.0.0.1
    bind 127.0.0.1

    # Specify the path for the Unix socket that will be used to listen for
    # incoming connections. There is no default, so Redis will not listen
    # on a unix socket when not specified.
    #
    # unixsocket /var/run/redis/redis.sock
    # unixsocketperm 755

    # Close the connection after a client is idle for N seconds (0 to disable)
    timeout 0

    # TCP keepalive.
    #
    # If non-zero, use SO_KEEPALIVE to send TCP ACKs to clients in absence
    # of communication. This is useful for two reasons:
    #
    # 1) Detect dead peers.
    # 2) Take the connection alive from the point of view of network
    #    equipment in the middle.
    #
    # On Linux, the specified value (in seconds) is the period used to send ACKs.
    # Note that to close the connection the double of the time is needed.
    # On other kernels the period depends on the kernel configuration.
    #
    # A reasonable value for this option is 60 seconds.
    tcp-keepalive 60

This is my mini Redis wrapper:

    import redis
    from django.conf import settings

    # module-level pool, created once at import time and shared by all callers
    redis_pool = redis.ConnectionPool(host=settings.REDIS_HOST, port=settings.REDIS_PORT)

    def get_redis_server():
        return redis.Redis(connection_pool=redis_pool)

And this is how I use it:

    from redis_wrapper import get_redis_server

    # the view and the task run in different, independent processes

    def sample_view(request):
        rs = get_redis_server()
        # get-set stuff with redis

    @shared_task
    def sample_celery_task():
        rs = get_redis_server()
        # get-set stuff with redis

Package versions:

    celery==3.1.18
    django-celery==3.1.16
    kombu==3.0.26
    redis==2.10.3

So the problem is that the connection error occurs some time after the Celery workers are started, and after the first occurrence of the error, all tasks end with this error until I restart the Celery workers. (Interestingly, Celery Flower also fails during the problematic period.)

I suspect my Redis connection pool usage method, or the Redis configuration, or, less likely, a network issue. Any ideas about the reason? What am I doing wrong?
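
One direction I am considering is to create the pool with TCP keepalive and retry-on-timeout enabled, so that half-dead connections are noticed and retried instead of blowing up inside the task. This is only a sketch of the wrapper above, assuming redis-py 2.10.x accepts these keyword arguments:

    import redis
    from django.conf import settings

    # Sketch only: socket_keepalive, socket_connect_timeout and retry_on_timeout
    # are passed through the pool to each Connection (assumed available in 2.10.x).
    redis_pool = redis.ConnectionPool(
        host=settings.REDIS_HOST,
        port=settings.REDIS_PORT,
        socket_keepalive=True,      # keep idle connections alive at the TCP level
        socket_connect_timeout=5,   # fail fast if the server is unreachable
        retry_on_timeout=True,      # retry a command once after a read timeout
    )

    def get_redis_server():
        return redis.Redis(connection_pool=redis_pool)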

(PS: I will add the redis-cli info results when I see the error again today.)

UPDATE:

I temporarily solved the problem by adding the --maxtasksperchild parameter to the worker start command, set to 200. Of course this is not the proper way to solve the problem, it is a symptomatic cure. It refreshes the worker instance periodically (closes the old process and creates a new one when the old one has reached 200 tasks), and so refreshes the global Redis pool and its connections. So I think I should focus on the way the global Redis connection pool is used, and I'm still waiting for new ideas and comments.
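
Another idea, instead of recycling whole worker processes, would be to rebuild the pool inside each forked child so that no connections are inherited from the parent. This is an untested sketch on top of my wrapper, using Celery's worker_process_init signal:

    import redis
    from celery.signals import worker_process_init
    from django.conf import settings

    redis_pool = None

    def get_redis_server():
        # Build the pool lazily, so each process creates its own connections.
        global redis_pool
        if redis_pool is None:
            redis_pool = redis.ConnectionPool(host=settings.REDIS_HOST,
                                              port=settings.REDIS_PORT)
        return redis.Redis(connection_pool=redis_pool)

    @worker_process_init.connect
    def reset_redis_pool(**kwargs):
        # Drop anything inherited over fork; the next get_redis_server()
        # call in this child process will create a fresh pool.
        global redis_pool
        redis_pool = None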

Sorry for my bad English, and thanks in advance.

Have you enabled the RDB background save method in Redis?
If so, check the size of the dump.rdb file in /var/lib/redis.
That file grows in size until it fills up the root directory, and then the Redis instance cannot save the file anymore.

You can stop Redis from rejecting writes after a failed background save by issuing the
    config set stop-writes-on-bgsave-error no
command in redis-cli.
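
If you prefer to check and change this from Python instead of redis-cli, a rough equivalent with redis-py could look like this (a sketch; rdb_last_bgsave_status is the INFO persistence field that reports whether the last background save failed):

    import redis

    r = redis.Redis(host='localhost', port=6379)

    # "ok" or "err": whether the last RDB background save succeeded.
    print(r.info('persistence').get('rdb_last_bgsave_status'))

    # Stop Redis from rejecting writes when a background save fails.
    r.config_set('stop-writes-on-bgsave-error', 'no')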

