Bag of Words in the Browser
March 15, 2025
This is a very simple semantic search. It creates an embedding for each record of text on the page. When the user searches, it creates an embedding of the query in the same way, compares it against each record's embedding, and filters the results by their degree of similarity.
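The search loop described above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions, not the page's actual code. The tokenizer, the use of cosine similarity as the "degree of similarity", the `threshold` default of 0.1, and the names `tokenize`, `embed`, `similarity` and `search` are all hypothetical choices made for the example.

```typescript
// Minimal sketch of the search loop. Assumptions (hypothetical, not from
// the original page): a regex tokenizer, cosine similarity as the score,
// and a fixed threshold for filtering.

type Vector = Map<string, number>;

// Hypothetical helper: lowercase the text and split it into word tokens.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// Bag-of-words embedding: count how often each word occurs.
function embed(text: string): Vector {
  const counts: Vector = new Map();
  for (const word of tokenize(text)) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse count vectors (assumed metric).
function similarity(a: Vector, b: Vector): number {
  let dot = 0;
  for (const [word, count] of a) dot += count * (b.get(word) ?? 0);
  const norm = (v: Vector) =>
    Math.sqrt([...v.values()].reduce((sum, c) => sum + c * c, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Embed every record, embed the query the same way, score each record,
// keep those above the (hypothetical) threshold, and sort best-first.
function search(records: string[], query: string, threshold = 0.1) {
  const embedded = records.map((text) => ({ text, vector: embed(text) }));
  const queryVector = embed(query);
  return embedded
    .map(({ text, vector }) => ({ text, score: similarity(queryVector, vector) }))
    .filter((result) => result.score > threshold)
    .sort((x, y) => y.score - x.score);
}
```

For example, `search(["The cat sat on the mat.", "Dogs chase cats.", "Bread rises in the oven."], "cat")` returns only the first record: this tokenizer does no stemming, so "cats" does not count as a match for "cat".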
This uses a very simple "bag-of-words" embedding model, which means the search only finds records that contain the entered word(s). The more times a word appears in a record, the higher that record's similarity score.
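To make the bag-of-words idea concrete, here is a sketch that builds a shared vocabulary from the records and turns each text into an array of word counts indexed by that vocabulary. The vocabulary construction, the cosine scoring, and the sample records are assumptions for illustration only. It shows both behaviours described above: a record without the query word scores zero, and a record that repeats the word scores higher.

```typescript
// Sketch of a dense bag-of-words embedding. Assumptions (hypothetical):
// the vocabulary is built from all record texts, and records are scored
// against the query with cosine similarity.

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9']+/g) ?? [];
}

// Assign each distinct word a fixed position in the vector.
function buildVocabulary(texts: string[]): Map<string, number> {
  const vocab = new Map<string, number>();
  for (const text of texts) {
    for (const word of tokenize(text)) {
      if (!vocab.has(word)) vocab.set(word, vocab.size);
    }
  }
  return vocab;
}

// Count occurrences of each vocabulary word; unknown words are ignored.
function embed(text: string, vocab: Map<string, number>): number[] {
  const vector = new Array<number>(vocab.size).fill(0);
  for (const word of tokenize(text)) {
    const index = vocab.get(word);
    if (index !== undefined) vector[index] += 1;
  }
  return vector;
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

const records = ["apple pie", "apple apple pie", "cherry pie"];
const vocab = buildVocabulary(records);
const query = embed("apple", vocab);
const scores = records.map((record) => cosine(query, embed(record, vocab)));
console.log(scores); // ≈ [0.707, 0.894, 0]
```

Repeating "apple" raises the second record's score above the first, while "cherry pie" never matches because it does not contain the query word at all.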
You may notice that two records can contain the entered word the same number of times yet still have different similarity scores. This is because the embedding also encodes how much "weight" the word carries relative to the rest of the record: if another word appears very frequently in a given record, the relative weight of the entered word is lower.
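One common way to get this "relative weight" effect is to normalize each count vector to unit length, which is what cosine similarity does implicitly. The sketch below assumes that mechanism; the page may compute its weights differently, and the function names and sample records are hypothetical. Both records contain "apple" exactly once, but the repeated "banana" in the second record shrinks the share of the vector that "apple" holds.

```typescript
// Sketch assuming the "weight" comes from L2-normalizing each bag-of-words
// vector (as cosine similarity does). Names and inputs are hypothetical.

type Vector = Map<string, number>;

function embed(text: string): Vector {
  const counts: Vector = new Map();
  for (const word of text.toLowerCase().match(/[a-z0-9']+/g) ?? []) {
    counts.set(word, (counts.get(word) ?? 0) + 1);
  }
  return counts;
}

// Scale a vector so its Euclidean length is 1; each entry becomes the
// word's weight relative to every other word in the same text.
function normalize(v: Vector): Vector {
  const length = Math.sqrt([...v.values()].reduce((s, c) => s + c * c, 0));
  const out: Vector = new Map();
  if (length === 0) return out;
  for (const [word, count] of v) out.set(word, count / length);
  return out;
}

// The dot product of two unit vectors is their cosine similarity.
function score(query: string, record: string): number {
  const q = normalize(embed(query));
  const r = normalize(embed(record));
  let dot = 0;
  for (const [word, weight] of q) dot += weight * (r.get(word) ?? 0);
  return dot;
}

const shortRecord = "apple pie";
const paddedRecord = "apple banana banana banana";
console.log(score("apple", shortRecord)); // 1/√2 ≈ 0.707
console.log(score("apple", paddedRecord)); // 1/√10 ≈ 0.316
```

In the first record "apple" is one of two equally frequent words, so it keeps a large share of the normalized vector; in the second, three occurrences of "banana" dominate, and the same single "apple" scores lower.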