Redis Keyspace Notification Monitoring use case
Recently I came across a scenario where our managed Redis instance on Azure was alerting on high CPU usage during peak business hours. It is a single Redis instance (I am not sure of the hosting details, as it is managed by the infra team). Internally, hundreds of services use this Redis instance for caching, so I was not sure what was being written and by which service. The same connection string was stored in multiple key vaults that the apps had access to, which made it difficult to drill down to the exact service and the data being written to a particular key. Redis does have the MONITOR command for debugging, but according to the docs a single MONITOR client can reduce throughput by more than 50%.
I had 2 options to investigate from the Redis server side: the MONITOR command and keyspace notifications.
Since this was only about identifying the apps and capturing the key operations, I opted for keyspace notifications.
As per the docs, I can check whether keyspace notifications are enabled using the command:
config get notify-keyspace-events
Similarly, to listen for most of the events, as stated in the docs, we can use:
config set notify-keyspace-events AEK
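To make the AEK string less magical, here is a small sketch that expands a flag string into the event classes it turns on. The flag table follows the Redis keyspace notifications documentation as I understand it for recent versions (where A is an alias for "g$lshzxetd"); older versions have a shorter list, so treat the table as an assumption to check against your server. The names describe_flags and delivers_events are hypothetical helpers, not part of any library.

```python
# Flag meanings for notify-keyspace-events (assumed from recent Redis docs).
FLAG_MEANINGS = {
    "K": "keyspace channels (__keyspace@<db>__:<key>)",
    "E": "keyevent channels (__keyevent@<db>__:<event>)",
    "g": "generic commands such as DEL, EXPIRE, RENAME",
    "$": "string commands",
    "l": "list commands",
    "s": "set commands",
    "h": "hash commands",
    "z": "sorted set commands",
    "x": "expired events",
    "e": "evicted events",
    "t": "stream commands",
    "d": "module key type events",
    "m": "key-miss events (not covered by A)",
    "n": "new-key events (not covered by A)",
}
CHANNEL_FLAGS = "KE"
ALL_CLASSES = "g$lshzxetd"  # what the "A" alias expands to


def describe_flags(flags):
    """Expand a flag string such as 'AEK' into readable descriptions."""
    expanded = flags.replace("A", ALL_CLASSES)
    seen = []
    for ch in expanded:
        if ch not in FLAG_MEANINGS:
            raise ValueError(f"unknown flag: {ch!r}")
        if ch not in seen:
            seen.append(ch)
    return [f"{ch}: {FLAG_MEANINGS[ch]}" for ch in seen]


def delivers_events(flags):
    """True only if a channel type (K or E) and an event class are both set."""
    expanded = flags.replace("A", ALL_CLASSES)
    has_channel = any(ch in CHANNEL_FLAGS for ch in expanded)
    has_class = any(ch in FLAG_MEANINGS and ch not in CHANNEL_FLAGS for ch in expanded)
    return has_channel and has_class


if __name__ == "__main__":
    for line in describe_flags("AEK"):
        print(line)
```

The point of delivers_events is the gotcha that a string without K or E delivers nothing, no matter which event classes are listed.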
Now let's write some Python code to listen for all the AEK events.
Create a new Python environment and install the redis package:
pip install redis==5.2.1
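The Python script is as follows:
import redis

if __name__ == '__main__':
    # decode_responses=True returns str instead of bytes
    r_client = redis.Redis(host='localhost', port=6379, decode_responses=True)
    pubsub = r_client.pubsub()
    # Subscribe to keyevent notifications for database 0 only
    pubsub.psubscribe('__keyevent@0__:*')
    while True:
        # Wait up to 1 second for a message instead of busy-polling
        message = pubsub.get_message(timeout=1.0)
        if message:
            print(f'key updates: {message}')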
Notice that I only wanted to listen for keyevent notifications and not keyspace notifications, which is why the script subscribes with the following line:
pubsub.psubscribe('__keyevent@0__:*')
Now go back to the terminal and execute some commands in redis-cli:
set user_1 "Ashish"
set user_2 "Nitesh"
set user_3 "Gaurav"

del user_3
Once we execute these commands, we get the following output:
key updates: {'type': 'psubscribe', 'pattern': None, 'channel': '__keyevent@0__:*', 'data': 1}
key updates: {'type': 'pmessage', 'pattern': '__keyevent@0__:*', 'channel': '__keyevent@0__:set', 'data': 'user_1'}
key updates: {'type': 'pmessage', 'pattern': '__keyevent@0__:*', 'channel': '__keyevent@0__:set', 'data': 'user_2'}
key updates: {'type': 'pmessage', 'pattern': '__keyevent@0__:*', 'channel': '__keyevent@0__:set', 'data': 'user_3'}
key updates: {'type': 'pmessage', 'pattern': '__keyspace@0__:*', 'channel': '__keyspace@0__:user_3', 'data': 'del'}
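In the keyevent messages, data holds the key that was touched and the channel name ends with the operation. The last line shows the keyspace form, as delivered to a __keyspace@0__:* subscription, where it is the other way around: the key is in the channel name and data holds the operation. The key is what we are after. Identifying the key is the first step, because it gives us a good direction for drilling down to the application that uses it, and later we can look at what payload is being written.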
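Since the two channel types put the key and the operation in opposite places, a small normalizer helps before storing anything. This is a minimal sketch, assuming messages shaped like the redis-py dicts printed above with decode_responses=True; parse_notification is a hypothetical helper name, not a redis-py function.

```python
import re

# Matches __keyevent@<db>__:<event> and __keyspace@<db>__:<key>
CHANNEL_RE = re.compile(r"^__(keyevent|keyspace)@(\d+)__:(.+)$", re.DOTALL)


def parse_notification(message):
    """Return (db, event, key) for a notification dict, or None otherwise."""
    if not message or message.get("type") not in ("message", "pmessage"):
        return None  # e.g. the initial psubscribe confirmation
    match = CHANNEL_RE.match(str(message["channel"]))
    if match is None:
        return None
    kind, db, suffix = match.groups()
    if kind == "keyevent":
        # Channel suffix is the operation, data is the key
        event, key = suffix, message["data"]
    else:
        # Channel suffix is the key, data is the operation
        event, key = message["data"], suffix
    return int(db), event, key


if __name__ == "__main__":
    sample = {
        "type": "pmessage",
        "pattern": "__keyevent@0__:*",
        "channel": "__keyevent@0__:set",
        "data": "user_1",
    }
    print(parse_notification(sample))
```

Both forms come out as the same (db, event, key) tuple, so the rest of the pipeline does not need to care which pattern was subscribed.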
We then stored this key data in an append-only SQL table and ran a simple aggregation to find which keys were being written most often or deleted frequently. Once we know which application a key belongs to, we can drill down further into how big the payload is.
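The storage and aggregation step can be sketched roughly as below. This is only an illustration: it uses SQLite from the Python standard library as a stand-in for whatever SQL database you have, and the table name key_events, its columns, and the helper functions are all assumptions made for the example.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open the database and create the append-only events table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS key_events (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            seen_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
            db_index INTEGER NOT NULL,
            event_name TEXT NOT NULL,
            key_name TEXT NOT NULL
        )
        """
    )
    return conn


def record_event(conn, db_index, event_name, key_name):
    """Append one notification; rows are never updated or deleted."""
    conn.execute(
        "INSERT INTO key_events (db_index, event_name, key_name) VALUES (?, ?, ?)",
        (db_index, event_name, key_name),
    )
    conn.commit()


def busiest_keys(conn, limit=10):
    """Return (key, event, count) rows, most frequent first."""
    return conn.execute(
        """
        SELECT key_name, event_name, COUNT(*) AS hits
        FROM key_events
        GROUP BY key_name, event_name
        ORDER BY hits DESC, key_name ASC, event_name ASC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


if __name__ == "__main__":
    conn = open_log()
    for _ in range(3):
        record_event(conn, 0, "set", "user_1")
    record_event(conn, 0, "del", "user_3")
    for row in busiest_keys(conn):
        print(row)
```

Grouping by both key and event keeps frequent writes and frequent deletes of the same key visible as separate rows.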
Redis is a cache. Don't use it as a primary database if you are relying on a cloud-managed instance 🤪