Sonic the search engine
For reasons that should be abundantly clear, I’ve been poking at alternatives to Elasticsearch. I’m living in a mostly Rust-based ecosystem right now working on Vector, so I started looking within that world. I found Sonic and decided to give it a whirl.
Sonic is a “fast, lightweight, and schema-less search backend.” It’s written in Rust, licensed under MPL 2.0. It’s maintained by Valerian Saliou, who is one of the founders of Crisp.
Sonic is not Elasticsearch: it’s a lot lighter weight and much less fully-featured. Its focus is on normalizing natural language search queries and providing results. Also, unlike Elasticsearch, Sonic is an identifier index rather than a document index. Queries return IDs, which can then find matching documents in an external database. Search terms are stored in collections and organized in buckets; you can use buckets to segregate your data into separate indexes, for example, a bucket per user or the like.
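To make the identifier-index idea concrete, here is a minimal Ruby sketch. The DOCUMENTS hash is a hypothetical stand-in for the external database, and matching_ids is a hard-coded stand-in for what a search would hand back; neither comes from Sonic itself.

```ruby
# Hypothetical stand-in for an external database, keyed by document ID.
DOCUMENTS = {
  1 => { name: 'Kate Smith', email: 'kate@example.com' },
  2 => { name: 'Jim Jones', email: 'jim@example.com' },
  3 => { name: 'Katelyn Brown', email: 'katelyn@example.com' }
}.freeze

# Turn a list of IDs (all an identifier index returns) into full documents.
# IDs with no matching document are skipped.
def resolve(ids, documents = DOCUMENTS)
  ids.map { |id| documents[id] }.compact
end

# Assumed result of a search for "kate": IDs only, no document bodies.
matching_ids = [1, 3]
resolve(matching_ids).each { |doc| puts doc[:name] }
```

The search backend never sees the email addresses or anything else in the documents; it only needs the text you want to be searchable and an ID to hand back.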
Another difference worth mentioning is that Sonic indexes at the word level and not at the sentence level. This approach makes for fast and compact storage. It’s worth taking a look at Sonic’s benchmarks to see just how fast, and reviewing Sonic’s limitations to understand the trade-off you’re making to achieve those results.
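As a rough illustration of what word-level indexing means, here is a toy inverted index in Ruby. It is my own sketch, not Sonic’s storage format or algorithm: the ToyWordIndex class and its tokenizer are hypothetical, and requiring every query word to match is a simplification I chose for the example.

```ruby
require 'set'

# Toy word-level index: each lowercase word maps to the set of IDs whose
# text contains it. An illustration only, not Sonic's storage format.
class ToyWordIndex
  def initialize
    @words = Hash.new { |hash, word| hash[word] = Set.new }
  end

  # Split the text into words and record the ID under each one.
  def push(id, text)
    tokenize(text).each { |word| @words[word] << id }
  end

  # Return the sorted IDs whose text contains every word in the query.
  def query(terms)
    sets = tokenize(terms).map { |word| @words.fetch(word, Set.new) }
    return [] if sets.empty?

    sets.reduce(:&).to_a.sort
  end

  private

  # Lowercase the text and keep runs of letters and digits as words.
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end
end

index = ToyWordIndex.new
index.push(1, 'Kate Smith')
index.push(2, 'Jim Smith')
index.push(3, 'Kate Jones')

p index.query('smith')      # IDs 1 and 2 both contain "smith"
p index.query('kate smith') # only ID 1 contains both words
```

Because only words and IDs are stored, the index stays small, but word order and the original sentences are gone, which is the kind of trade-off the limitations list describes.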
It’s also important to note that Sonic runs on a single node and lacks fault tolerance capabilities like clustering/replication. Although lightweight, Sonic’s single node nature is likely to hit hardware scaling limits at some point.
Installing and Configuring Sonic
Let’s see Sonic in action. We’re going to run Sonic, add some data to it, and then query that data. The fastest way to do this is to run Sonic from its Docker image. All we need to do this is Docker installed, some quick scaffolding, and a sample configuration file.
Let’s create a directory to hold our Sonic test instance and data and change into that directory.
mkdir -p ~/sonic-search
cd ~/sonic-search
Now we’re going to grab the sample configuration file.
wget https://raw.githubusercontent.com/valeriansaliou/sonic/master/config.cfg
Inside the file, you’ll find a default configuration for Sonic. We’re going to change a few things to make it work for our demo. Firstly, by default, Sonic binds to localhost on port 1491. To work inside a Docker container, we need to bind it to all interfaces. To do this, find this line in the config.cfg file.
[channel]
inet = "[::1]:1491"
And change it to:
[channel]
inet = "0.0.0.0:1491"
Next, we want to tell Sonic where to store its indexes. Let’s create some local directories for that now.
mkdir -p ./store/fst/ ./store/kv/
The kv directory contains the Key-Value index, and the fst directory contains a word graph of the data inside Sonic. We’ll be mounting these directories as volumes inside our Docker container, and we need to update our configuration to reference them. Find the following two lines inside config.cfg and update them:
[store.kv]
path = "/var/lib/sonic/store/kv/"
And:
[store.fst]
path = "/var/lib/sonic/store/fst/"
Lastly, let’s turn up Sonic’s logging to get some more feedback from it. To do this, change the log_level option to:
[server]
log_level = "debug"
All other defaults can stay the same.
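If you’d rather script those edits than make them by hand, something like the following Ruby sketch could work. The patch_config helper is hypothetical, and it assumes the sample file still contains the default values matched below (the ./data/store/... paths in particular are my assumption about the sample file); if your copy differs, the substitutions simply won’t apply.

```ruby
# Apply the edits described above to the text of config.cfg.
# Assumes the sample file still contains the default values matched below.
def patch_config(text)
  text
    .sub('inet = "[::1]:1491"', 'inet = "0.0.0.0:1491"')
    .sub('path = "./data/store/kv/"', 'path = "/var/lib/sonic/store/kv/"')
    .sub('path = "./data/store/fst/"', 'path = "/var/lib/sonic/store/fst/"')
    .sub(/^log_level = ".*"$/, 'log_level = "debug"')
end

# Usage, run from ~/sonic-search:
#   File.write('config.cfg', patch_config(File.read('config.cfg')))
```

It’s worth diffing the result against the original before starting the container, so you can confirm each of the four lines actually changed.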
Now let’s run Sonic.
docker run -p 1491:1491 -v ~/sonic-search/config.cfg:/etc/sonic.cfg -v ~/sonic-search/store/:/var/lib/sonic/store/ valeriansaliou/sonic:v1.3.0
We’ve mapped port 1491 outside of the container and mounted our configuration file and store directories into the container. We should see the Sonic server start up:
(INFO) - starting up
(INFO) - started
(DEBUG) - spawn managed thread: tasker
(DEBUG) - spawn managed thread: channel
(INFO) - tasker is now active
(INFO) - listening on tcp://0.0.0.0:1491
And we can then telnet into port 1491 to see if the server responds.
telnet localhost 1491
Trying ::1...
Connected to localhost.
Escape character is '^]'.
CONNECTED <sonic-server v1.3.0>
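The same check can be scripted with Ruby’s standard socket library if you don’t have telnet handy. This is a minimal sketch: sonic_version and read_greeting are hypothetical helper names, and the greeting format is assumed to match the CONNECTED line shown above.

```ruby
require 'socket'

# Extract the version from a greeting such as "CONNECTED <sonic-server v1.3.0>".
# Returns nil when the line does not look like a Sonic greeting.
def sonic_version(greeting)
  match = greeting.match(/\ACONNECTED <sonic-server v([^>]+)>/)
  match && match[1]
end

# Open a TCP connection, read the first line the server sends, and close.
def read_greeting(host = 'localhost', port = 1491)
  TCPSocket.open(host, port) { |socket| socket.gets.to_s.strip }
end

# Usage (needs the Sonic container to be running):
#   puts sonic_version(read_greeting)
```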
And hey presto, we’re up and running. It’s not very exciting without adding some data, so let’s generate some.
Testing Sonic
Sonic comes with a collection of official libraries and community-submitted libraries for languages and frameworks. As it’s Sunday and I am feeling particularly lazy, I will write two quick Ruby scripts: one to send data to Sonic for ingestion and a second to search it. These will both use the Ruby client for Sonic.
Let’s create a new directory to hold our test scripts:
mkdir -p ~/sonic-search/stest/
cd ~/sonic-search/stest/
Now we’ll start our scripts with a Gemfile:
source 'https://rubygems.org'
gem 'sonic-ruby'
gem 'faker'
And use Bundler to install the sonic-ruby and the faker gem we’ll be using to generate some sample data.
bundle install
Ingesting data
Now let’s write a quick script to ingest some sample data. We’ll call it ingest.rb:
touch ~/sonic-search/stest/ingest.rb
And populate it like so:
require 'sonic-ruby'
require 'faker'

# Connect to the Sonic server on localhost:1491
client = Sonic::Client.new('localhost', 1491, 'SecretPassword')

# Connect to the ingest channel
ingest = client.channel(:ingest)

# Add data
10000.times.map { Faker::Name.name }.each_with_index do |name, index|
  ingest.push('users', 'all', index, name)
end
Here we’re using Faker to generate an array of 10,000 names and pushing them into a collection called users and into a bucket called all. We’ll see a flurry of activity from the Sonic server as it indexes all incoming data.
Searching data
We can then write another script to query this data.
touch ~/sonic-search/stest/search.rb
And populate it like so:
require 'sonic-ruby'

if ARGV.length != 1
  puts "Too many names ... or not enough name?"
  exit
else
  name = ARGV[0]
end

# Connect to the Sonic server on localhost:1491
client = Sonic::Client.new('localhost', 1491, 'SecretPassword')

# Connect to the search channel
search = client.channel(:search)

# Search for a matching name and return ID
puts "Matching IDs: " + search.query('users', 'all', name)

# Search for suggested matches and return suggested name
puts "Matching suggestions: " + search.suggest('users', 'all', name)
Our script takes a single name as an input and performs two operations. The first is a straight search of the users collection in the all bucket. If it matches one or more index IDs, it’ll return them on the command line. The second search returns one or more suggested names. Let’s give it a try now:
$ ruby search.rb kate
Matching IDs: 8384 684 79 9886 1514 9538 6445
Matching suggestions: kate katelin katelyn katelynn katerine
...
$ ruby search.rb jim
Matching IDs: 9087 6783 6074 674 9777 9435 8161 6879 5926 2499
Matching suggestions: jim jimmie jimm
We can see Sonic has returned some matching IDs for kate and jim and some suggested variants.
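If you were feeding these results to a search box, you’d probably want them as structured data rather than printed strings. Here is a small sketch of that step. The search_box_response helper is hypothetical, and it assumes both results arrive as space-separated strings, as the output above suggests; I haven’t verified that against the client’s source, so it also tolerates arrays.

```ruby
# Build a small response for a search box from the two raw results.
# Assumes both arrive as space-separated strings, as in the output above;
# Array(...) also tolerates a client that returns an array instead.
def search_box_response(term, raw_ids, raw_suggestions)
  ids = Array(raw_ids).flat_map { |part| part.to_s.split }.map(&:to_i)
  words = Array(raw_suggestions).flat_map { |part| part.to_s.split }
  # Drop the term the user already typed from the suggestions.
  { ids: ids, suggestions: words - [term.downcase] }
end

response = search_box_response('kate', '8384 684 79', 'kate katelin katelyn')
p response
```

The IDs would then be looked up in whatever store holds the real user records, and the suggestions could be offered as completions.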
I think this example shows Sonic’s simplicity and power and how easy it is to wire into a search box and gain suggestions and corrections. I can see use cases in the middle-ground between the search needs of folks who would previously have defaulted to using Elasticsearch and what Sonic provides. Naturally, Sonic’s single node nature, the lack of fault tolerance, and the potential scaling challenges may be an issue for many folks. However, I still think it’s a cool project and worth a look.