Replace Lunr with tinysearch
Recently, I’ve been exploring WebAssembly (WASM) and Rust. However, as a web engineer, I often wonder how to utilize WASM effectively.
One day, while riding my bike to the convenience store, I had an idea: why not use WASM as a front-end full-text search engine?
My blog is built as a JAMstack site, hosted for free on Netlify. Since I don’t want to pay for server instances, I needed a search engine that works entirely on the frontend.
Previously, I was using Lunr.js for this purpose. However, Lunr.js has known issues with Japanese, as mentioned in the following link: https://github.com/MihaiValentin/lunr-languages/issues/45
(Note: There appears to be a workaround now.)
So, I decided to replace Lunr with a WASM-based search engine. That’s when I came across tinysearch, which is written in Rust.
Unfortunately, tinysearch has the same issue as Lunr. It tokenizes text by splitting on spaces, which works for languages like English, but Japanese does not put spaces between words.
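To make the problem concrete, here is a minimal Python sketch of whitespace tokenization. It is only an illustration, not tinysearch's actual Rust code; the function name and the sample sentences are made up for this example.

```python
def whitespace_tokenize(text):
    """Split text on runs of whitespace, as a space-based tokenizer would."""
    return text.split()


if __name__ == "__main__":
    # English: every word becomes its own searchable token.
    print(whitespace_tokenize("full text search in the browser"))

    # Japanese: there are no spaces, so the whole sentence stays one token
    # and a query for a single word inside it cannot match.
    print(whitespace_tokenize("ブラウザで全文検索をする"))
```

Because the Japanese sentence comes back as a single token, a word-level index built this way cannot look up the individual words in it.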
I found a solution here: https://qiita.com/tamurahey/items/8db20ae0b1931c96d54a
Since the patch I found was based on an older version of tinysearch, I forked the project and created a “japanese” branch with the modifications.
I initially tried building the project in Docker, but ran into the following warning when attempting to generate the index:
<jemalloc>: MADV_DONTNEED does not work (memset will be used instead)
<jemalloc>: (This is the expected behaviour if you are running under QEMU)
This is probably related to the CPU architecture: I’m using an M1 Mac, and as the message hints, the container was likely running under QEMU emulation. In the end, I gave up on Docker and added tinysearch as a Git submodule instead.
I also encountered this error:
error: failed to run custom build command for `rust_icu_sys v5.0.0`
I resolved it by setting the environment variable like this:
export PKG_CONFIG_PATH=/opt/homebrew/opt/icu4c/lib/pkgconfig
Next, I created a GitHub Actions workflow that generates the index file from the RSS feed before deploying to Netlify.
...
- name: Build wasm
  run: |
    yq -oj ../_site/feed.xml \
      | jq '.feed.entry | map({title: .title."+content", url: .link."+@href", body: .content."+content"})' \
      | sed -e 's/<[^>]*>//g' \
      | sed -e 's/\\n//g' \
      | cargo run --features=bin /dev/stdin
  working-directory: tinysearch
...
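As a rough illustration of what the jq and sed steps in that pipeline do to each feed entry, here is a small Python sketch. It assumes the entries are already parsed into dictionaries with title, url, and body keys; the function names and the sample entry are hypothetical and are not part of the actual workflow.

```python
import json
import re

# Same pattern as the first sed expression: anything between < and >.
TAG_RE = re.compile(r"<[^>]*>")


def clean(text):
    """Drop HTML tags, then newlines, roughly mirroring the two sed steps."""
    return TAG_RE.sub("", text).replace("\n", "")


def build_index_input(entries):
    """Map feed entries to the title/url/body records passed to the indexer."""
    return [
        {"title": clean(e["title"]), "url": e["url"], "body": clean(e["body"])}
        for e in entries
    ]


if __name__ == "__main__":
    # Hypothetical entry, not taken from the real feed.
    sample = [
        {
            "title": "Hello",
            "url": "https://example.com/hello",
            "body": "<p>Hi <b>there</b></p>\n",
        }
    ]
    print(json.dumps(build_index_input(sample), ensure_ascii=False))
```

One difference from the real pipeline: sed works on the serialized JSON text, so it removes the two-character escape sequence \n, whereas this sketch removes actual newline characters from the parsed strings.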
This workflow generates the index file in the wasm_output directory. The package.json file then pulls it in as a local dependency like this:
...
"dependencies": {
  "jquery": "^3.4.1",
  "tinysearch": "./tinysearch/wasm_output",
...
My blog is open source, so you can check out the full implementation in my pull request on GitHub:
https://github.com/rikuson/rikuson/pull/119