before we embark on upgrading to async, and all the refactoring that
this will bring on us (see #797 & #799), we should keep our `main`
branch as stable and current as possible.
Let's start by upgrading rocket and its dependencies.
* [REFACTORING]Rename whitespace_tokenizer to tag_tokenizer for
registration
Name representing its purpose is preferred.
* Add lindera-tantivy to plume-model's dependencies
* Install lindera-tantivy
* Add SearchTokenizerConfig struct
* Add search tokenizers to config option
* Use CONFIG for tokenizers
* Use enum to hold tokenizer config instead of initializing on config phase
* Use guard instead of duplicate default values
* Use as_deref() instead of guard
* Move SearchTokenizer from plume-models to plume-models::search::tokenizer
* Rename SearchTokenizer to TokenizerKind
* Define SearchTokenierConfig::determine_tokenizer()
* Use determine_tokenizer in SearchTokenizerConfig::init()
* Pass tokenizer config to Searcher methods
* Add LowerCase filter to Lindera tokenizer
* Add test for Lindera tokenizer
* Define SEARCH_LANG env to specify tokenizers set
* Run cargo fmt
* Make Lindera tokenizer optional
* Fix typos
* Upgrade Tantivy to 0.12.0
* Follow Tantivy Tokenizer's new type definition
* Wrap tokenizers with TextAnalyzer to use filter methods
* Replace async IndexWriter::garbage_collect_files with sync functions
* Update Cargo.toml
Per the issue, "runtime-fmt uses perma-unstable rust APIs and is
therefore susceptible to breakage".
This replaces the two calls to rt_format! with .replace() and drops the
dependency.
Fixes#769
* Add test for Persian language hashtags
See https://github.com/Plume-org/Plume/issues/757
* Add regex-syntax with unicode-perl feature to dependencies
* Install regex-syntax
* Allow hashtag to use Unicode word characters
* Run cargo fmt