When I’ve talked to people who’ve attempted to make meme tools, they say that search is a really hard problem. This sort of makes sense: one person might search “communism”, another “bugs bunny”, and another “we” trying to get this image template:
I was thinking about it today, though, and you know what? All of those people who’ve actually tried to build this are wrong. This is a super easy problem.
How this should work:
- A user searches for a meme and doesn’t find it. Keep a record of the query (say it’s “communism”).
- The same user uploads a meme. Now there is a strong possibility that the query the user just ran is a good match for the uploaded image, so associate that image with “communism” and give that pair a score of 1.
- Now diff that image against the others in the database using image recognition to find “equivalent” images. For GIFs, I’m not too familiar with the file format, but I assume something could be done by extracting frames and comparing those as images. (There are a lot of assumptions here.)
- Now associate that query with all “equivalent” images, plus the new image. Then take all of the query terms associated with the existing images and add them to the new image.
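Here’s a rough sketch of those steps in Python, just to show the bookkeeping is small. Every name in it (`MemeIndex`, `record_miss`, `upload`, `find_equivalent`) is made up for illustration, and the image-matching step is only a function you would have to supply yourself. It also assumes that a query propagated to an “equivalent” image starts at a score of 1, which the steps above don’t actually specify.

```python
from collections import defaultdict


class MemeIndex:
    """Toy in-memory index; all names here are hypothetical."""

    def __init__(self, find_equivalent):
        # find_equivalent(image_id, known_ids) -> set of known ids judged
        # "equivalent". Assumed to exist; the image recognition lives there.
        self.find_equivalent = find_equivalent
        self.scores = {}                     # (image_id, query) -> score
        self.queries_for = defaultdict(set)  # image_id -> associated queries
        self.last_miss = {}                  # user_id -> query that found nothing

    def record_miss(self, user_id, query):
        # Step 1: remember the query that did not lead to a result.
        self.last_miss[user_id] = query

    def _associate(self, image_id, query):
        self.queries_for[image_id].add(query)
        # Assumption: every newly created pair starts at a score of 1.
        self.scores.setdefault((image_id, query), 1.0)

    def upload(self, user_id, image_id):
        query = self.last_miss.pop(user_id, None)
        known = set(self.queries_for)
        equivalents = set(self.find_equivalent(image_id, known)) - {image_id}
        self.queries_for[image_id]  # register the image even with no query
        if query is not None:
            # Steps 2 and 4a: tie the query to the new image and its equivalents.
            for img in equivalents | {image_id}:
                self._associate(img, query)
        # Step 4b: copy the equivalents' existing queries onto the new image.
        for img in equivalents:
            for q in list(self.queries_for[img]):
                self._associate(image_id, q)


# Fake matcher: pretend every upload is equivalent to "bugs_v1" once it exists.
idx = MemeIndex(lambda image_id, known: {"bugs_v1"} & known)
idx.record_miss("alice", "bugs bunny")
idx.upload("alice", "bugs_v1")
idx.record_miss("bob", "communism")
idx.upload("bob", "bugs_v2")
print(sorted(idx.queries_for["bugs_v2"]))  # ['bugs bunny', 'communism']
```

Both images end up findable under both queries, even though neither user typed both.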
Next time someone searches for “communism”, show the meme template uploaded above. If they choose that template, increase the (template, "communism") pair score. Whenever someone searches, show them a mix of high-scoring templates for their search term, plus some prospective templates that are still “young.”
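A minimal sketch of that pick-and-mix loop might look like the following. The function names, the `bump` size, and especially the definition of “young” (here: any pair whose score is still below a threshold) are my assumptions for the example, not something settled above.

```python
import random


def record_pick(scores, template, query, bump=1.0):
    """A user picked `template` after searching `query`: raise the pair score."""
    key = (template, query)
    scores[key] = scores.get(key, 0.0) + bump


def search(scores, query, n_top=3, n_young=1, young_below=5.0, rng=random):
    """Return the top established templates plus a few random young ones."""
    ranked = sorted(
        ((s, t) for (t, q), s in scores.items() if q == query),
        reverse=True,
    )
    # Assumption: a pair is "young" while its score is under `young_below`.
    established = [t for s, t in ranked if s >= young_below]
    young = [t for s, t in ranked if s < young_below]
    # Sample young templates at random so each gets a chance to be seen.
    prospects = rng.sample(young, min(n_young, len(young)))
    return established[:n_top] + prospects


scores = {
    ("a", "communism"): 10.0,
    ("b", "communism"): 7.0,
    ("c", "communism"): 1.0,   # freshly uploaded, still young
    ("d", "drake"): 9.0,
}
results = search(scores, "communism", n_top=2, n_young=1)
record_pick(scores, "c", "communism")  # someone chose the young template
```

Sampling the young slots at random is the simplest way to give new uploads exposure; anything smarter (a bandit, say) would slot into the same place.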
In the example above, I assume the user is trustworthy. There’s also a strong possibility that the user is a bot, a malicious actor, or both. So that user’s rep should be tied to whether others use the template/query pairs they propose, and that rep feeds back into how much the user can affect a template’s score.
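One way this could look, purely as a sketch: track how many pairs a user has proposed and how many of those someone else later picked, then use a smoothed ratio as the weight on that user’s score bumps. The `Reputation` class and the add-one smoothing are my own choices for illustration; any formula that rewards adopted proposals would do.

```python
from dataclasses import dataclass


@dataclass
class Reputation:
    proposed: int = 0  # template/query pairs this user introduced
    adopted: int = 0   # of those, how many another user later picked

    @property
    def weight(self) -> float:
        # Add-one smoothing (an assumption): new users start at 0.5 and
        # drift toward their real adoption rate as evidence accumulates.
        return (self.adopted + 1) / (self.proposed + 2)


def weighted_bump(scores, pair, rep, base=1.0):
    """Raise a pair's score by an amount scaled by the acting user's rep."""
    scores[pair] = scores.get(pair, 0.0) + base * rep.weight


newcomer = Reputation()
regular = Reputation(proposed=10, adopted=8)
spammer = Reputation(proposed=50, adopted=0)
print(newcomer.weight, regular.weight)  # 0.5 0.75
print(spammer.weight < 0.02)            # True
```

A bot that floods the system with pairs nobody picks sees its influence shrink with every upload.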
Since memes change over time, you probably also want to overlay some decay function, so if you search for “drake” you get the latest templates, not ones from years ago.
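The simplest decay I can think of is exponential with a half-life, sketched below. The 90-day half-life and the choice to measure age from the last time the pair was picked are both assumptions; I haven’t tuned either.

```python
def decayed_score(score, age_days, half_life_days=90.0):
    """Halve a pair's effective score every `half_life_days`.

    `age_days` is assumed to be the days since the pair was last picked.
    """
    return score * 0.5 ** (age_days / half_life_days)


# An old "drake" template with a big score vs. a recent one with a small score.
old = decayed_score(100.0, age_days=720)  # eight half-lives: about 0.39
new = decayed_score(10.0, age_days=30)    # a third of a half-life: about 7.9
print(new > old)  # True
```

Ranking by the decayed value instead of the raw score lets a recent template overtake one that was popular years ago.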
Now, assuming you have some users, you can set up a “self-labeling” system.