A “Spider” is a computer program. The Spider goes to different websites, explores, finds new things, and makes notes. When finished, its notes are organised into files describing everything it has seen.
The next step is “Word Embedding”. A complicated topic! Word embedding creates a map, with coordinates, and puts each word on the map in its own location. It uses Artificial Intelligence research from written language translation websites. English words that have a similar meaning are grouped into nearby areas on the map. This is important! Similar ideas are in similar areas on the map.
When you search for a word, your computer does this:
- Download a map of all the signs the spider found, and what kind of meaning they have
- Look at the Word Embedding map, and find the word you searched for, remembering the location
- Check every sign, and see how close its meaning is to the word you entered, by how far apart they are on the map
- Make a list, organised so the closest signs are at the top
- Download small videos and descriptions, and create the webpage, with results listed down the page
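The steps above can be sketched in a few lines of Python. This is only a toy illustration, not the code Find Sign actually runs: the map coordinates, the sign entries, and the names `WORD_MAP`, `SIGNS`, and `rank_signs` are all made up for the example. A real Word Embedding map has hundreds of coordinates for each word, not two.

```python
import math

# Made-up toy "map": each word gets invented 2D coordinates.
# Real word embeddings use hundreds of dimensions.
WORD_MAP = {
    "cat": (1.0, 1.0),
    "kitten": (1.2, 0.9),
    "dog": (2.0, 1.5),
    "car": (8.0, 7.0),
}

# Made-up signs the spider found, each tagged with one word for its meaning.
SIGNS = [
    {"title": "CAR", "word": "car"},
    {"title": "DOG", "word": "dog"},
    {"title": "KITTEN", "word": "kitten"},
]


def rank_signs(query, signs=SIGNS, word_map=WORD_MAP):
    """Return the signs sorted so the closest meanings come first."""
    # Find the searched word on the map and remember its location.
    query_location = word_map[query]
    # Check every sign: how far is its word from the searched word?
    return sorted(
        signs,
        key=lambda sign: math.dist(query_location, word_map[sign["word"]]),
    )


# Make a list, organised so the closest signs are at the top.
results = rank_signs("cat")
print([sign["title"] for sign in results])  # ['KITTEN', 'DOG', 'CAR']
```

Searching for “cat” puts KITTEN first, because “kitten” sits closest to “cat” on the toy map, even though no sign is tagged with the exact word “cat”.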
Privacy and Tracking
We use a program called Fathom to see how many people use this website, and to find out what website they came from if they clicked a link to come here. When you do a search, we don’t find out what words you typed into the search box. We might be able to guess what kind of thing you were looking at, by seeing which search results your computer downloaded, but we don’t know exactly what you wrote. We don’t send your information to Google or Facebook at all, but if you come here using one of their apps, like Google Chrome or the Facebook app, they might be able to track you while you use their app.
Open Source and Dataset
Find Sign is open source (mostly under the Unlicense software license). You can get the source code on GitHub.
Find Sign indexes copyrighted data, so the dataset is not open source, but it is intended to be open access for non-commercial, culturally appropriate use. For now, BitTorrent and Hypercore access has been disabled due to performance issues and barely any usage. Slow progress is being made refactoring the site to use simple YAML files to store all the interesting data. A service is planned to also offer JSON, XML, and CBOR translations of these YAML files. When this is finished, the data will be available over a simple HTTP interface. For now, if you want access, send me (@Bluebie on GitHub or Twitter) a message. Maybe we can sort something temporary out.
If you want to apply this search technology to another sign language, you'll need to capture the data for your search index by building some kind of spider program. The easiest way to make this work is to have your spider output either YAML or JSON in the search-data format.
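As a rough sketch of the last step, here is how a spider written in Python might write its notes out as JSON. The field names (`title`, `words`, `link`, `video`), the example.com addresses, and the `to_json` helper are placeholders invented for this example. They are not the real search-data format, so check the format in the source code before relying on any of them.

```python
import json

# Made-up entries a spider might collect. These field names are
# placeholders, NOT the real search-data format.
entries = [
    {
        "title": "HELLO",
        "words": ["hello", "hi"],
        "link": "https://example.com/signs/hello",
        "video": "https://example.com/videos/hello.mp4",
    },
]


def to_json(entries):
    """Turn the spider's notes into JSON text, keeping non-English letters readable."""
    return json.dumps(entries, indent=2, ensure_ascii=False)


print(to_json(entries))
```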