Did you know that Redis can be used as a proper database for storing and querying JSON efficiently?
Redis has full support for JSON storage, and it can even store vectors for semantic search with machine learning.
In this article, I will cover the core concepts you need to know to supercharge your regular old Redis cache servers into efficient databases for storing and searching JSON.
Why Redis?
Both Solr and Elasticsearch are powerful and efficient stores for search-related tasks; however, both are Java-based, so they naturally consume more memory and are a bit harder to deploy and maintain.
Redis, on the other hand, is fairly lightweight and much easier to install and manage. Furthermore, you still get the same performance benefits as you would using Redis as a cache store.
Setting up
The Redis you install via your package manager runs in-memory by default. Depending on your eviction settings, Redis will delete old keys to make room for new entries so that memory usage stays bounded.
This behavior works well for caching; however, it will not work for our use case, since we want to persist data just like with MySQL or any other database store.
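For reference, the two knobs involved are the eviction policy and on-disk persistence; you can flip both at runtime with redis-cli. Treat this as an illustration and check the defaults your install ships with:
# Reject writes instead of silently evicting keys when memory fills up.
redis-cli CONFIG SET maxmemory-policy noeviction
# Log every write to an append-only file so data survives restarts.
redis-cli CONFIG SET appendonly yes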
To do so, you will need extra Redis modules: RedisJSON for storage and RediSearch for indexing, which ship together as Redis Stack. You can find installation documentation for your operating system here.
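If you use Docker, the official redis/redis-stack image is probably the quickest route; it bundles the JSON and search modules along with the RedisInsight UI on port 8001:
docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 redis/redis-stack:latest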
Building an index
Redis offers several data types in which you can store data. For this article, we will focus on the JSON storage type, because it allows us to store more complex, nested data than other types such as hashes.
Redis provides an object-mapping package, redis-om, that we can use to create our index, load data, and query the index. To get started, you will need to install a few pip packages:
# At the time of writing, the latest redis package
# had issues in my setup on Python 3.9,
# so I'm pinning 4.6.0
pip install redis==4.6.0
pip install redis-om

# This will come in handy later to generate test data.
pip install faker
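Before going further, it's worth confirming that your server actually has the modules loaded. A quick sanity check from Python (assuming Redis is running on localhost:6379):
import redis

r = redis.Redis(host="127.0.0.1", port=6379)
# On a Redis Stack install, the reply should mention "ReJSON" and "search".
print(r.execute_command("MODULE LIST"))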
Next, let's create a models.py:
from redis_om import Field, JsonModel, Migrator, get_redis_connection


class Book(JsonModel):
    book_title: str = Field(index=True, full_text_search=True)
    description: str = Field()
    isbn: str = Field(index=True)
    price: float = Field(index=True)
    category: str = Field(index=True)

    @staticmethod
    def migrate():
        Migrator().run()

    class Meta:
        index_name = "books"
        database = get_redis_connection(host="127.0.0.1", port=6379)
If you've ever worked with Django, this will look very familiar. The class above essentially maps to an index in Redis, similar to a database table in MySQL.
We declare a data type for each field (str, float, etc.) and tell the model which fields should be indexed in our schema so that they can be queried efficiently.
full_text_search, as the name suggests, allows us to efficiently perform "LIKE"-style queries on this field.
The Meta declaration is optional: database and index_name default to localhost and an auto-generated index name, respectively.
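As an aside, instead of hard-coding the connection in Meta, redis-om also reads the REDIS_OM_URL environment variable, which keeps connection details out of your code:
export REDIS_OM_URL=redis://127.0.0.1:6379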
Loading data into our index
To load data, you simply create an instance of the Book model and call its save method.
To keep this code clean and readable, I omitted error handling; in a real-world scenario, wrap the instantiation and save inside a try/except block, since the model will throw an exception if the data is bad or the save fails (a sketch follows the script below).
from faker import Faker

from models import Book

# You only need to run this whenever the model changes.
Book.migrate()

faker = Faker()

for _ in range(100):
    book = Book(
        book_title=faker.sentence(),
        description=faker.text(),
        price=faker.random_int(39, 200),
        isbn=faker.uuid4(),
        category=faker.word(),
    )
    book.save()
Once this script finishes, you should have 100 JSON documents stored in Redis.
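As promised, here is a minimal error-handling sketch. JsonModel is a pydantic model, so bad data raises pydantic's ValidationError at instantiation time, while connection or server problems surface from save() as redis exceptions:
from pydantic import ValidationError
from redis.exceptions import RedisError

from models import Book

try:
    book = Book(
        book_title="Redis as a JSON store",
        description="An example record.",
        price="not a number",  # bad data: fails float validation
        isbn="0000000000",
        category="demo",
    )
    book.save()
except ValidationError as exc:
    print(f"Bad data: {exc}")
except RedisError as exc:
    print(f"Save failed: {exc}")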
To browse the data in the Redis backend, I use a handy desktop client called RedisInsight.
Searching the index
Redis-om provides a very Pythonic way of searching through your data, so there's no need to mess around with the raw query syntax you would normally use in the Redis CLI client.
Here are some common search queries:
from models import Book

# Full-text search: where book_title contains "billion"
for b in Book.find(Book.book_title % "billion"):
    print(b.book_title)

print(">>>>>>>>")

# Numeric range: where price > 100
for b in Book.find(Book.price > 100):
    print(f"Book {b.book_title}, Price: {b.price}")

print(">>>>>>>>")

# Exact match: where category == "management"
for b in Book.find(Book.category == "management"):
    print(f"Book {b.book_title}, Price: {b.price}")

print(">>>>>>>>")

# Fetch a single record by its auto-generated primary key (a ULID)
print(
    Book.get(pk="01HG2QBM88936CYTEXHBQJX83J").book_title
)

print(">>>>>>>>")

# Compound query: (110 < price < 150) OR price == 84
for b in Book.find(
    ((Book.price > 110) & (Book.price < 150))
    | (Book.price == 84)
):
    print(f"Book {b.book_title}, Price: {b.price}")
Conclusion
Redis is by far the most efficient cache database around, in my opinion. Using Redis Stack, we can now take advantage of that raw performance and go beyond caching to improve the snappiness of our apps.
Relational databases such as MySQL scale well and offer the best data integrity for most use cases. Complex querying with joins, subqueries, and so forth is much easier in a relational database, and some complex queries have no equivalent in Redis. Essentially, you should keep a relational database as your primary application database, especially for deeply relational data.
Full-text search, however, is where relational databases tend to fall short: finding a match can mean scanning millions of rows. Since Redis is primarily memory-based and RediSearch maintains an inverted index over your text fields, it can search and retrieve data much faster with less strain on your hardware.
Ultimately, I would use a combination of the two: MySQL for the primary application data, and Redis for caching, searching, and perhaps even log or time-series data.