A “Spider” is a computer program. The Spider goes to different websites, explores, finds new things, and makes notes. When finished, its notes are organised into files describing everything it has seen.
The next step is “Word Embedding”. A complicated topic! Word embedding creates a map, with coordinates, and puts each word on the map in its own location. The map is built using Artificial Intelligence research from written language translation websites. English words that have a similar meaning are grouped into nearby areas on the map. This is important! Similar ideas are in similar areas on the map.
When you search for a word, your computer does this:
- Download a map of all the signs the spider found, and what kind of meaning they have
- Look at the Word Embedding map, and find the word you searched for, remembering the location
- Check every sign, and see how close its meaning is to the word you entered, by how far apart they are on the map
- Make a list, organised so the closest signs are at the top
- Download small videos and descriptions, and create the webpage, with results listed down the page
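The ranking steps above can be sketched in a few lines of Python. This is only an illustration, not Find Sign's actual code: the names WORD_MAP, SIGNS, and rank_signs are made up for this example, the coordinates are invented, and the map here has just two dimensions so it is easy to picture. A real word embedding has many more dimensions, and may measure closeness in a different way than the straight-line distance used here.

```python
import math

# Hypothetical word map: each word gets its own location (invented numbers).
WORD_MAP = {
    "happy": (0.90, 0.80),
    "glad": (0.85, 0.75),
    "sad": (-0.80, -0.70),
    "dog": (0.10, -0.90),
}

# Hypothetical notes from the spider: each sign and the words describing it.
SIGNS = [
    {"title": "dog", "words": ["dog"]},
    {"title": "sad", "words": ["sad"]},
    {"title": "glad", "words": ["glad"]},
]


def rank_signs(query, signs, word_map):
    """Return sign titles ordered from closest meaning to furthest."""
    # Find the searched word on the map and remember its location.
    query_location = word_map[query]  # raises KeyError if the word is not on the map
    scored = []
    for sign in signs:
        # Measure how far each of the sign's words is from the searched word.
        distances = [
            math.dist(query_location, word_map[word])
            for word in sign["words"]
            if word in word_map
        ]
        if distances:
            # A sign is as close as its closest word.
            scored.append((min(distances), sign["title"]))
    # Smallest distance first, so the closest signs are at the top.
    scored.sort()
    return [title for _, title in scored]


results = rank_signs("happy", SIGNS, WORD_MAP)
print(results)  # ['glad', 'dog', 'sad']
```

In this toy map, “glad” sits right next to “happy”, so the sign for glad comes first even though the word “happy” never appears in the spider's notes.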
Privacy and Tracking
Find Sign never records what you search for, or who you are. Your usage of Find Sign is not transmitted to any privacy-invading businesses like Google or Facebook. We do record how often the site is used, and you can view this information on our analytics page, but this data never includes any information about the computers or people accessing the website, just the number of searches and other types of queries run. If you’re concerned about privacy, we recommend you never access Find Sign using a web browser built into an app run by a privacy-invading company. Always use apps like Safari or Firefox, instead of apps like Facebook, Instagram, TikTok, or Chrome, to run private searches. For more information about this risk, read the excellent article “iOS Privacy: Instagram and Facebook can track anything you do on any website in their in-app browser” by Felix Krause.
Open Source and Dataset
Find Sign is open source (mostly under the Unlicense software license). You can get the source code on GitHub.
Find Sign indexes copyrighted data, so the dataset is not open source, but it is intended to be open access for non-commercial, culturally appropriate use. You can explore the data Find Sign uses on our dataset website.
If you want to apply this searching technology to another sign language, you'll need to capture the data for your search index by building some kind of spider program. If you can output the data in the same search-data JSON format this Find Sign instance uses, it'll be easy to set up your own version.
If you have quality Auslan vocab or phrase examples that you would like to see featured in Find Sign, you can reach out to us via email, iMessage, or FaceTime for help getting your data into Find Sign.