An introduction to Redis – PyCon Singapore 2014.

The following is the transcript of the talk “Redis – What, why and where” that I gave at PyCon Singapore 2014. You can find the slides down below. Try as I might, I was not able to embed the slides from slides.com. So I have shared the links.

My talk was on Friday, 20th June, 2014 at 1:00PM.

—-

Ladies and gentlemen,

Do you know what my prayer was the moment I knew I got my talk selected? That I would not be allocated a slot right after lunch. Yet here we are.

You must be wondering why a dude from India has come all the way over to Singapore and is giving a talk on Redis at a Python conference. Well, I believe you’ll have the answers to those questions by the time I am done with my talk. This is intended for a beginner level audience and as such, if you have already implemented redis in your stack, then you might be a little disappointed.

There are times when, in your Django web application, you need a specific piece of data to be saved. Let me give you an example. Let us say you are gathering all the tweets for the Football World Cup. You hit the Twitter API and tweets are pouring in by the second. How do you keep a counter? Of course, put a Python variable in the loop and keep incrementing.

tweets = fetch_tweets(hashtag = "#WorldCup2014") #Use the Twython Library
count = 0
for tweet in tweets:
    entities = process_tweet(tweet)
    count = count + 1

The only problem is that if another process/view wants to display it, it won’t be able to access it.

Which means you should have persistence. If you’re using Postgres or any other SQL database for that matter, you could have a field that would allow you to keep the count or maybe do a count(*) on your Tweets model each time you want to get the total number of tweets.

#Assume you have defined a model Tweet
count = Tweet.objects.all().count()

The count(*) option is going to get slower as the table grows, because the database has to count the rows every time you ask.

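#Assume you have defined a model Stat to store the count which has a field tweet_count
from django.db.models import F
#filter() returns a queryset, which has update(); get() returns a single instance, which does not
Stat.objects.filter(hashtag = "#WorldCup2014").update(tweet_count = F('tweet_count') + 1)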

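The next option is to increment the count within a Postgres field. If you read the value, add one in Python and save it back, this has an immense potential to lead you into race conditions, thereby screwing up your count. Pushing the increment into the database with an F() expression avoids that, but you still pay for a database write on every single tweet.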
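To make the race condition concrete, here is a minimal sketch in plain Python. It is a toy model, not Django or Postgres: the `store` dictionary and the helper functions are hypothetical stand-ins for the row that holds the counter, and the two "workers" are interleaved by hand so the outcome is the same on every run.

```python
# Toy model of a lost update; `store` stands in for the database row (hypothetical).
store = {"tweet_count": 0}

def read_count():
    return store["tweet_count"]

def write_count(value):
    store["tweet_count"] = value

# Two workers each see a new tweet at about the same time.
seen_by_a = read_count()      # worker A reads 0
seen_by_b = read_count()      # worker B also reads 0, before A has written
write_count(seen_by_a + 1)    # A writes 1
write_count(seen_by_b + 1)    # B writes 1 again, so A's increment is lost

lost_update_total = store["tweet_count"]

def atomic_increment():
    # Models an increment that the store applies as one indivisible step
    store["tweet_count"] += 1
    return store["tweet_count"]

store["tweet_count"] = 0
atomic_increment()
atomic_increment()
atomic_total = store["tweet_count"]
```

Two tweets came in, yet the first version ends up with a count of one. The second version models the increment as a single step performed by the store itself, which is the behaviour you want from whatever holds your counter.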

So a fast, reliable and persistent solution is to have redis. Believe it or not, you can use this as an actual database because of its persistence. All you need to do is get the redis server up and running on your machine and use the redis-py Python library to increment a “key” by one each time a new tweet comes in. You don’t even need to “initialize” the key. The increment command creates the key if it is not already present and increments it. Really neat. Hence, redis is a persistent, key-value based NoSQL data store.

import redis #We are using the redis-py library
r = redis.StrictRedis()

tweets = fetch_tweets(hashtag = "#WorldCup2014")
for tweet in tweets:
    entities = process_tweet(tweet)
    r.incr("tweets_count", amount = 1)

count = r.get("tweets_count")

Now, persistence is not the only thing that makes Redis useful. Suppose you don’t stop at counting tweets. You count the pictures, videos and other links from within them. Also, you are doing the same with Facebook as well. Now you have two sources and their corresponding fields. Intuitively, a dictionary comes to mind. The name of one dictionary would be “Twitter” and the other one “Facebook”. Each of them will have fields “statuses”, “photos”, “links”, etc.

Guess what? Redis has a dictionary data type, called a hash, and lets you do exactly this. The variety of built-in data types that it provides is fantastic. People tend to call it the data structure server for this reason.

import redis
r = redis.StrictRedis()

tweets = fetch_tweets(hashtag = "#WorldCup2014")
for tweet in tweets:
    entities = process_tweet(tweet)
    r.hincrby("Twitter", "tweets_count", amount = 1)
    if "photo" in entities:
        r.hincrby("Twitter", "photo_count", amount = 1)
    if "video" in entities:
        r.hincrby("Twitter", "video_count", amount = 1)
    if "link" in entities:
        r.hincrby("Twitter", "link_count", amount = 1)

twitter_photos_count = r.hget("Twitter", "photo_count")
...

posts = fetch_fb_posts(hashtag = "#WorldCup2014")
for post in posts:
    entities = process_post(post)
    r.hincrby("Facebook", "posts_count", amount = 1)
    if "photo" in entities:
        r.hincrby("Facebook", "photo_count", amount = 1)
    if "video" in entities:
        r.hincrby("Facebook", "video_count", amount = 1)
    if "link" in entities:
        r.hincrby("Facebook", "link_count", amount = 1)

fb_photos_count = r.hget("Facebook", "photo_count")
...

It supports 5 data types: strings, sets, dictionaries (hashes), sorted sets and lists.

So, one, the persistence and two, the data types. These two are what make Redis special.

Narcissism
———-

Oh and incidentally, I am Haris Ibrahim K. V. and I am from the southernmost state of India, called Kerala. I work as a Computer Science Engineer at a small company called Eventifier. I’ve been a Python developer for only the past 7 months and hence have relatively little experience when it comes to programming. Although I have organized conferences and workshops myself, as a part of my earlier job, this is my first ever talk at one. So there might be a few rough edges. Do bear with me. Also, as a hobby and passion, I love writing.

Alright, enough with the narcissism. Let’s get back to business.

Redis stores its data in a big in-memory dictionary where the keys can only be strings, but the values can be any of the 5 data types that we mentioned earlier. Each of these data structures has its own implementation, which we will come to later. Let us go back to a few more use cases where you can use redis.

LEADER BOARD (using sorted sets)

Let’s talk about the leader board. What I am trying to do here is to give you examples that cover all the 5 data structures that Redis provides so that you will know what to use where and why. Leader board. I am sure you are familiar with the concept of a leader board, but for those among you who are not, it is a place where the top 10 of something is shown. Top 10 or 20, it does not matter. But a list of entities sorted based on their rank.

An example should clarify this right away. Let’s go back to the football world cup example. The tweets are pouring in. Boy, reminds me of the monsoon back at home. Anyway, you want to show the most retweeted tweets in descending order of their retweet count. This will give you an idea of what is trending for that particular hashtag. Now, what do you do? This is where the “sorted set” data type comes into the picture. As the name suggests, it is a set, but sorted.

What is this sorted based on? Ah yes. So when you hear of a sorted set, the picture that should come to your mind is a key with a value that is a list of tuples. I use “tuples” in a loose sense. Once you have that picture in mind, this is what the structure would look like:

key: (score member) (score member)

All you need to do is to define a key called “trending_tweets” and then use the “zadd” redis command to specify the score as the number of retweets and the member as the “tweet text + username” or something.

import redis
r = redis.StrictRedis()

tweets = fetch_tweets(hashtag = "#WorldCup2014") #Use the Twython Library
for tweet in tweets:
    entities = process_tweet(tweet)
    #redis-py 3.0 and later take a {member: score} mapping
    r.zadd("trending_tweets", {tweet.text: tweet.retweet_count})

#zrevrange returns members from the highest score to the lowest
trending_tweets = r.zrevrange("trending_tweets", 0, -1)

You could also store the tweet ids as the members and just do a query on your SQL database to fetch the tweets with those particular ids. This would work much better, since a sorted set is a set and it is expensive to maintain uniqueness on members if they are huge chunks of text.

import redis
r = redis.StrictRedis()

tweets = fetch_tweets(hashtag = "#WorldCup2014") #Use the Twython Library
for tweet in tweets:
    entities = process_tweet(tweet)
    t = Tweet.objects.create(tweet = tweet)
    #redis-py 3.0 and later take a {member: score} mapping
    r.zadd("trending_tweets", {t.id: tweet.retweet_count})

#Highest retweet count first
trending_tweets = r.zrevrange("trending_tweets", 0, -1)
popular_tweet_list = []
for tweet_id in trending_tweets:
    popular_tweet_list.append(Tweet.objects.get(id = tweet_id))

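To retrieve just the top 10, use the “zrevrange” command, which orders members from the highest score down, and specify the indices 0 and 9. The plain “zrange” command orders them from the lowest score up. That should get you going.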
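If the sorted set still feels abstract, here is a rough sketch of the idea in plain Python. It is only a toy model and not redis-py: the `zadd` and `top` functions and the `tweet:N` member names are made up for illustration, and it ignores how Redis orders members with equal scores.

```python
# Toy model of a sorted set: each member appears once and carries a score.
trending = {}  # member -> score

def zadd(member, score):
    # Adding an existing member only updates its score; members stay unique
    trending[member] = score

def top(n):
    # Rank members from the highest score to the lowest and keep the first n
    ranked = sorted(trending.items(), key=lambda pair: pair[1], reverse=True)
    return [member for member, _ in ranked[:n]]

zadd("tweet:1", 120)
zadd("tweet:2", 45)
zadd("tweet:3", 300)
zadd("tweet:2", 500)  # tweet 2 picks up a lot more retweets

top_two = top(2)  # ["tweet:2", "tweet:3"]
```

The second call for `tweet:2` does not create a duplicate. It moves the same member up the ranking, which is exactly what you want from a leader board.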

CACHING (using list)

This introduces a new data type as well as a useful feature.

Redis allows you to set “expire” on certain keys. You can specify the key name and the number of seconds in which the key should expire. You might have already guessed it. Yes, you can implement a caching mechanism with this. The timeout remains valid as long as you only “alter” the keys using operations such as increment, add, etc. However, if you set the key once more or delete it, the deal is off. No timeout for you.

The way to implement this would be to first know what value you want cached. Save that value into redis with a key. Call expire(key, seconds) and you’re done. What goes hand in hand with this is the TTL command, short for Time To Live. As you could guess, this gives you the time left before a certain key expires. It returns -2 if the key has expired or -1 if an expire has not been set on the key to begin with. Pretty handy.

Let’s go back to the Football world cup tweets example once again. Suppose you want to showcase the photos that got retweeted the most every 5 minutes or so. You might have to do something like fetch the popular tweets, get the corresponding photo urls, push them into a list and set an expiry on that list’s name.
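Here is a small sketch of those rules in plain Python, in case seeing them in action helps. Everything in it is hypothetical: `ToyExpiringStore` is a model I made up with a fake clock that you advance by hand, it is not redis-py and it only covers set, expire and TTL.

```python
class ToyExpiringStore:
    """Plain-Python model of SET / EXPIRE / TTL; a sketch, not redis-py."""

    def __init__(self):
        self.now = 0          # fake clock in seconds, advanced by hand
        self.values = {}
        self.deadlines = {}

    def _drop_if_expired(self, key):
        deadline = self.deadlines.get(key)
        if deadline is not None and self.now >= deadline:
            self.values.pop(key, None)
            self.deadlines.pop(key, None)

    def set(self, key, value):
        self.values[key] = value
        self.deadlines.pop(key, None)   # setting the key again clears the timeout

    def expire(self, key, seconds):
        self._drop_if_expired(key)
        if key in self.values:
            self.deadlines[key] = self.now + seconds

    def ttl(self, key):
        self._drop_if_expired(key)
        if key not in self.values:
            return -2                   # key is missing or has already expired
        if key not in self.deadlines:
            return -1                   # key exists but has no expiry set
        return self.deadlines[key] - self.now


store = ToyExpiringStore()
store.set("trending_photos", "cached value")
ttl_without_expire = store.ttl("trending_photos")   # -1

store.expire("trending_photos", 300)
store.now += 100
ttl_after_100s = store.ttl("trending_photos")       # 200

store.set("trending_photos", "fresh value")         # the deal is off
ttl_after_reset = store.ttl("trending_photos")      # -1

store.expire("trending_photos", 300)
store.now += 300
ttl_after_expiry = store.ttl("trending_photos")     # -2
```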

import redis
r = redis.StrictRedis()

tweets = fetch_tweets(hashtag = "#WorldCup2014") #Use the Twython Library
for tweet in tweets:
    entities = process_tweet(tweet)
    t = Tweet.objects.create(tweet = tweet)
    #redis-py 3.0 and later take a {member: score} mapping
    r.zadd("trending_tweets", {t.id: tweet.retweet_count})

#Highest retweet count first
trending_tweets = r.zrevrange("trending_tweets", 0, -1)
popular_tweet_list = []
for tweet_id in trending_tweets:
    popular_tweet_list.append(Tweet.objects.get(id = tweet_id))

#-2 means the list has expired (or never existed), -1 means it has no expiry set
if r.ttl("trending_photos") in [-1, -2]:
    r.delete("trending_photos") #Start from an empty list
    for tweet in popular_tweet_list:
        r.rpush("trending_photos", tweet.media_url)
    #Set the expiry once, after the whole list has been pushed
    r.expire("trending_photos", 300) #Expire in 5 minutes
trending_photos = r.lrange("trending_photos", 0, -1)

The list is a double ended list actually. You can insert at the left or the right. Accordingly you can pop from either side as well.
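If you want a feel for this without a redis server, Python’s own collections.deque behaves in much the same way. This is only an analogy in plain Python, with made-up file names, and the comments name the redis command that each line roughly corresponds to.

```python
from collections import deque

photos = deque()
photos.append("c.jpg")        # roughly RPUSH: insert at the right
photos.append("d.jpg")
photos.appendleft("b.jpg")    # roughly LPUSH: insert at the left
photos.appendleft("a.jpg")

snapshot = list(photos)       # roughly LRANGE 0 -1: the whole list, left to right

first = photos.popleft()      # roughly LPOP: pop from the left
last = photos.pop()           # roughly RPOP: pop from the right
remaining = list(photos)
```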

CREDITS

The first person whom I would like to thank is someone who deserves much more than me to be up on this stage and give this talk. However, he usually prefers to be behind the scenes, getting things done and motivating people to do things. He is my colleague and the CTO of the company I work for, Mr Nazim Zeeshan, and there he is.

The second would be Sripathi. There is a company called HasGeek back in India that organizes technology conferences and workshops. They had organized a Redis miniconf recently where Sripathi gave a talk on Redis memory optimization. What I am going to present next is inspired by him.

Last but not the least, the PyCon Singapore team who organized and made this a reality. Kudos to them!

INTERNAL DATA TYPES

This is something that I picked up from what Sripathi explained. I confess I’m not an expert on this but thought it would spark a few minds if presented. Redis stores all that we talked about right now internally using 6 different data types.

Refer to the slides and video for this part.

—-

Slides:

http://slides.com/harisibrahimkv/redis-what-why-and-where

Video:

https://archive.org/details/IntroductionToRedis

A sunny Saturday at BeaglesLoft.

Siva sent me, Krace, Kartik and Sayan an email asking whether we would be available on the 7th of June to volunteer for the first offline Django meetup. I was only too happy to receive the invitation and replied saying “I believe I can make it”.

The next mail in my inbox is where I found TechBuilders. The email was from the BangPypers mailing list posted by someone called Niranjan. This is the link that was in the mail:

http://techbuildersbayesianreasoning.splashthat.com/

Even during my time at HasGeek last year, I used to keep wondering why there wasn’t any learning related to Math happening among all these computer geeks who were working on Python, JS, Ruby, etc. I even had a decent conversation regarding this with the one person whom I found to be interested in the Math aspect of computers. His name is Abhijith and we became friends at the Fifth Elephant conference last year when he signed up to volunteer for it.

Suffice to say, visiting that link, when I saw that these people were trying to bring Math and Computer Science together, I knew it was something that I could not miss at any cost. I sent Siva and the rest of them an email then and there itself saying I had stumbled upon this TechBuilders meeting and might not be able to make it for the Django workshop.

I love teaching and hence was extremely upset about missing the Django workshop. However, on the other hand, I felt like the TechBuilders people had read my mind. It was, as Paulo Coelho would say, a calling. I could not resist going. Also, I had to give up on my Saturday writing as well.

It was being hosted at Haggle’s office. The people working at Haggle were the ones behind BeaglesLoft (a playground for creators and innovators) and also behind TechBuilders, their initiative to teach the Bangalore tech community something that it is lacking. The office was just a 5 minute walk away from my home.

The mail which we received from Asya, the quick witted community manager at BeaglesLoft, on the day before had asked all of us to be there at the venue exactly at 10:30AM and not to follow the “Indian Standard Time”. Little did they realize the inevitable force they were reckoning with. The meeting started at 11:00AM.

The event was supposed to start off with Sandipan from JustDial giving a talk on how they were using Bayes’ theorem at their company. Unfortunately, he had some emergency and could not make it. So Niranjan, who is the founder of Haggle, took the stage and started off by introducing us to what the whole deal was about.

The thing that I liked about Niranjan was that he was not pretentious. He really observed Math was not a part of the IT culture, along with the liberal arts being treated as a completely separate entity as well. He wanted to create an atmosphere where these things would co-exist and would value each other’s importance. There, he was doing it.

Not just that. I have heard many people twisting their words to indirectly mean “spread the word”. Niranjan directly told us to do it. His conviction to doing this impressed me. Apart from taking the initiative to build the community, I must say he is a really good teacher too. He taught me Math and that, is amazing.

If you had met me before my 4th year of college, I would have told you, without question, that I was going to become a Math teacher. So when he talked about the probability mass function and Bayes’ theorem in a way that I could understand after more than 2 years of staying away from it, it felt really great.

You must read his series of blog posts on Bayesian Reasoning here: http://beaglesblog.tumblr.com/tagged/techbuilders

We were asked to read them before attending the meetup. Having been the college kid, I put it off to the last moment as usual. An hour before the meetup! I finished off all the posts within 45 minutes and it was time to leave in order to reach the venue on time. That dreadful feeling of not having revised what you had learned, the one that dawns upon you on the morning of the exam day, was on me. I know, it is funny. But to know that it was not something to worry about made me feel even more excited to attend the gathering.

Towards the end of his session, he proposed a few use cases where Bayesian reasoning could be applied so that we could break up into teams and work on modelling them.

One was about a rickshaw driver. Suppose you were one and someone came and asked you to take him to Jayanagar. How would you apply Bayesian reasoning to know whether it would be profitable for you to take him there?

Second one was about the problem given on the blog itself, identifying a person whom you meet in the US as being from Bangalore or not.

The third one was the famous Monty Hall problem. Even though I say it is famous, it was the first time I was hearing of it. It is an interesting problem which makes you realize why Math ain’t your gut feeling. It is a bit crazy, but yeah, read it.

We then decided to split up into three teams of 5 each. The decision was followed by an interesting 5 minutes of trying to figure out an algorithm to split us up. Whether the count should start from 1 and go until 5 before the 15 of us were through, or whether it should start from 1 and go until 3 until all of us were through. The confusion was funny enough to have while we were learning Math!

I was in team 2 consisting of:

Sandeep, an IIITB graduate who was going to join Haggle in a few months. He was sharp. The moment we gathered around a table to “brain-buzz”, he came up with this idea of building a recommendation system which would analyse the social media streams of users and figure out what sort of restaurants they preferred to eat at.

Ashray, who was working with Haggle already. A strong and silent person, I would say. He was as keen as the rest of us on learning together.

Ashutosh, who is Sandeep’s junior at IIITB. He is awesome. When I was struggling to get the basics really strong, he took my pen and paper from me and taught me the reasoning from step 1 patiently, with examples and proper explanation. I hope to see more of him over the coming days.

Last but not the least, Fasil. I would define him as exuberant, but not the BSing kind. He was very outspoken but knew exactly well what he was speaking about. He was working on his own startup.

By the time we had discussed and modelled our recommendation system, it was time for presentations.

Oh, and I forgot to mention the drinks and biscuits that were there all along! No, no, seriously. What kind of a chump would I be if I did not mention this after eating 6 of those delicious cream biscuits right under the nose of my team mates while they were busy building the recommendation system!

Asya, Reya and Tania made sure we had the best atmosphere for thinking and solving the problems at hand. These are the times when I really see the importance of good community managers. They make other people’s lives easier. I never saw myself like that when I was at HasGeek I guess. I just hope others did at least.

It was time for the presentations and team 1 was the first one to go in front. They had built a model around detecting whether a person was being sarcastic. After manually analysing a few hundred of a person’s tweets and identifying the sarcastic ones among them, each person was assigned a probability of being sarcastic based on how many times he was sarcastic among his past 100 tweets.

This was done for more than a few users. After having built the prior data, when a new tweet came in, you could use the Bayesian reasoning to find out what was the probability of that tweet being sarcastic given it was from a particular user. They had a few numbers as well for demoing this.

Second one was us. Well, I have already explained what we did. The interesting point that Niranjan made was to use more than just words for our probability calculation. Because if we were to just look at words like “Pizza”, “Burger”, etc, then we would miss out on differences between sentences like “I hate pizza” and “I love pizza”.

Once ours was concluded, team 3 came in. They had a funny use case. I have learnt to take things on a lighter note and I hope people don’t jump around reading the use case. It was about the probability of a girl going out with you given the fact that she smiled at you. As funny as this was, for a few of them to think of something like this would mean that the social media that we have today would have already gone miles ahead in terms of taking advantage of us on similar terms. It was scary.

Niranjan came up to conclude the presentations. This is where he asked us to spread the word and help build the community. He left the rest of the afternoon as an open invitation to do anything sitting together or to move out.

They were taking memberships for the community and I “sold my soul”, as Asya put it. We hung out with each other for an hour or more, getting to know each other better.

I met Samarth, a smart lad who was a Hardware hacker by passion working at Infosys. His face was familiar and there was only one question that I could ask him about it. “Were you there at any HasGeek events?”. Yep, he was there for Droidcon 2013.

Then there was Vamsee, who was a kindred soul when it came to people calling him “Vamshee” adding that all-too-horrible “h” right in the middle! We shared our grief with each other over how inconsiderate people were towards our feelings.

Then there were Ashutosh, Jha (because I really can’t remember the other part of his name), Fasil and Prateek, who asked me, “Hey, aren’t you that guy who wrote that Eventifier blog post? That was amazing”. I was so happy! Jon from Minsh was there. It was good to meet him after such a long time. He was among the first few geeks whom I interacted with as soon as I had joined HasGeek. Definitely a part of what made me grow.

We shook hands and were about to leave when I met this unassuming young fellow at the stairs.

“Hey, don’t I know you?”

“I am Rishab. Umm.. Do you know me?”

I unleashed my secret weapon once again.

“Were you at any HasGeek event before?”

“Oh… Were you at MetaRefresh 2013?”

“Yeah, I was a part of the organizing committee”

“Okay. Maybe you heard about that guy who gave a talk on CSSDeck?”

“Oooh! It was you! Now I remember… Cool man”

So that was him. He had generated a whole lot of buzz with his flash talk at that conference. He said he was working on his own startup now. I bid him goodbye and was on my way.

Now I have an excuse to learn Math. I hope these folks keep at it. It was amazing.

A short review on “On Writing” by Stephen King

When Krace first called me asking, “Hey, what is your take on Stephen King novels?”, I was pretentious.

“Although I have not read any of his works, I must say I have heard not-so-good reviews about them”.

I’m sure God won’t forgive me for saying that. I thought he wanted to buy a book for himself and hold true to his resolution of reading as much as possible. On my last day at HasGeek, I was surprised and happy when he gifted me with Stephen’s “On Writing”.

This was 7 months ago. That is how long my reading has suffered. Religiously, I would take this book with me every time I went home in the hope that I would read it from there. Alas, that never happened. During one of my visits, I was talking to my Dad about his childhood days in the hope of recreating the history of my village through a story. I got what I wanted from him and finished writing the first two paragraphs, setting the scene. My sister saw this unfinished piece of work lying on my bed. She read it. Being a voracious reader, she had already finished reading Stephen’s book. She came to me and said,

“I advise you to read ‘On Writing’ before you start off with this piece”.

I did not pay heed to that advice and hence the book remained unread and I did not make much progress with my story either.

The last time I remember when I could not put a book down and *had* to continue reading it was when I was going through Lord Of The Rings. That was during my second year of college and hence almost 3 years ago. This morning is when I finally found that same spirit again.

The rooftop garden of my office is a lovely place. There is no wifi reception there to begin with. No distractions. It is high enough for one to consider the street sounds to be meager background noise. I reached there about 8 in the morning, made myself comfortable on one of the straw chairs, adjusted the cushion, pulled another chair to put my leg on, took out the book and started reading. I had finished almost 130 pages out of the 285 already.

The next three and a half hours were magical. Stephen describes writing as telepathy. For the ignorant, that would sound like a cool word filled with philosophical mumbo jumbo. However, once you start seeing his study with the writing table, the unfinished manuscripts stacked neatly in the drawers, him sitting on the chair with the door of the room closed, writing away, you realize you went back 16 years in time. Or maybe he came 16 years into the future.

The image of his life and his struggles becomes so vividly clear in your mind that there were moments when I really felt like I was with him during those days. Of course, if it were a piece of writing describing his life story, it would be called a biography. This is not so. As he puts it himself, this book describes how a writer is formed. It maps to any budding author’s life. After all, no author had a red carpet laid in front of him to be a part of the ministry of authors. They all struggled to get there.

Inspiration is overrated in the modern world. Many find that as an excuse. They wait as if it is someone else’s responsibility to get them inspired. Although websites like zenpencils and the kind are doing a pretty good job at it, the point you are missing is that inspiration is not the solution to your problem of creativity. It is only a part of the puzzle. Grit, determination and perseverance are those which will get you where you want to reach.

Coming back from the slight detour, even though there are more than a few moments in the book that you can draw tremendous inspiration from, Stephen gives absolutely practical advice as to what makes great writing and which constructs are the worst that you could use.

I quote, “The road to hell is paved with adverbs”.

Those words are going to hit you like a truck the next time you write ‘he said sarcastically’.

For the past few years, many people have told me to get rid of the passive voice in my writing. Upon asking why that is, I never got a concrete explanation. Stephen just bursts forth with anger and criticism against using the passive voice in your writing unnecessarily.

I quote, “It’s weak, it’s circuitous and it’s frequently torturous, as well. How about this: ‘My first kiss will always be recalled by me as how my romance with Shanya was begun’. Oh, man – who farted, right? A simpler way to express this idea – sweeter and more forceful, as well – might be this: ‘My romance with Shanya began with our first kiss. I’ll never forget it’. I’m not in love with this because it uses ‘with’ twice in four words, but at least we’re out of that awful passive voice.”

He goes on dissecting the construct even further, giving us more reasons to hate and crucify the passive voice. It is for the timid writers, he says. For those who are afraid they are not able to convey what they want to. I was always afraid and I still am. However, I am happy that I wrote the last sentence instead of ‘I have always been afraid’.

I would not recommend this book to those who are interested in factual writing. As much as it offers practical advice like the examples above, it revolves around imagination and creative writing. Taking in things from around you and converting them into what you want them to be. Digging for the fossils, as he puts it. Be patient, be careful. Dig slowly and you will get the entire piece unscathed.

One thing becomes absolutely clear from this book. If you have a passion for writing, the only thing that is stopping you from doing it is your lame excuses. You will want to believe they are real excuses so that you can convince people. But they are lame. You are just too lazy to write. You would rather be entertained and wait for your all important inspiration to shower upon you. Get rid of the fear. Don’t be afraid. Just write.

Thank you Krace for gifting me this book. You knew what I wanted more than I knew it myself.