The Forensic Files API, Part 1
It Begins
February 4, 2020
Forensic Files is coming back, y'all!
I cannot describe how happy that makes me. The return of one of my favorite shows was just the boost I needed to finally get a project off the ground that's been sitting in my subconscious for the last few months. It's safe to say I've probably seen every episode of this show (more than once). I can recall several episodes in which a detective or forensic specialist solves the case with something he learned from a Forensic Files episode. That is meta!
My secret dream is to get props for helping an investigation on the show. The odds of that happening are astronomically low, so I opted to cook up a project that will help me learn some new technologies, hone my existing skills, and find a way to combine my love of development with my love of justice.
I'm going to build a tool that lets users query episodes by keyword. If a user types in "algae", they would see a list of related keywords, like "diatom". That would lead them to season 7, episode 3 ("Reel Danger"), in which investigators were able to place a suspect at the scene of a crime with diatom evidence.
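To make the idea concrete, here is a minimal Python sketch of that kind of lookup. Everything in it is an assumption for illustration: the `EPISODES` structure, the function names, and the choice of Python are all hypothetical, and the only data is the "Reel Danger" example above. A real index would be built from the transcriptions.

```python
from collections import defaultdict

# Hypothetical sample data: the single episode and its keywords come from the
# example in this post. Keys are (season, episode) tuples.
EPISODES = {
    (7, 3): {"title": "Reel Danger", "keywords": {"algae", "diatom"}},
}


def build_index(episodes):
    """Map each lowercased keyword to the set of episode ids that carry it."""
    index = defaultdict(set)
    for episode_id, info in episodes.items():
        for keyword in info["keywords"]:
            index[keyword.lower()].add(episode_id)
    return index


def related_keywords(term, episodes, index):
    """Return the other keywords that share at least one episode with `term`."""
    term = term.lower()
    related = set()
    for episode_id in index.get(term, set()):
        related |= {k.lower() for k in episodes[episode_id]["keywords"]}
    related.discard(term)
    return sorted(related)


INDEX = build_index(EPISODES)
```

Calling `related_keywords("algae", EPISODES, INDEX)` surfaces "diatom", and `INDEX["diatom"]` then points back to the episode. Treating keywords that appear in the same episode as "related" is just one simple way to define relatedness, not necessarily the one I'll end up using.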
I still have no idea how I'm going to do this or exactly what technologies I'm going to use. But I have a pretty good idea how to start. Right now, I know I need to do the following things:
- Download all the episodes from YouTube
- Extract the audio from downloaded episodes
- Send audio off to a speech-to-text service to get transcriptions
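The audio extraction step could be sketched roughly like this, assuming the episodes are already downloaded as video files and that ffmpeg is the extraction tool (I haven't committed to either). The function names, the directory layout, and the mono 16 kHz WAV settings are hypothetical placeholders. The right format depends on whichever speech-to-text service gets picked.

```python
from pathlib import Path


def audio_path_for(video_path, audio_dir):
    """Return the .wav path in `audio_dir` that matches a downloaded video file."""
    return Path(audio_dir) / (Path(video_path).stem + ".wav")


def build_ffmpeg_command(video_path, audio_path):
    """Build (but do not run) an ffmpeg command that extracts audio from a video."""
    return [
        "ffmpeg",
        "-i", str(video_path),  # input video file
        "-vn",                  # drop the video stream
        "-ac", "1",             # downmix to mono (assumed service preference)
        "-ar", "16000",         # resample to 16 kHz (assumed service preference)
        str(audio_path),        # output format is inferred from the .wav extension
    ]
```

With ffmpeg installed, the returned list can be passed to `subprocess.run(cmd, check=True)`. Building the command separately from running it makes it easy to inspect or log before processing a whole season.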
I still have the following questions:
- What tools do I use to extract keywords/terms from the transcriptions?
- What kind of database should I use?
- How do I load the data in?
- How do I query said database?
This whole thing may get scrapped, but I think of it like a NASA project. Getting a rover to Mars was a phenomenal achievement, but even if the mission had failed, NASA would still have developed a bunch of new technologies that affect our lives in profound ways. I doubt this will affect anyone's life too profoundly, but I'll learn a bunch of stuff I've been wanting to learn in the best way I know how: by using it to solve a problem I'm interested in solving.
I'll be documenting what I've learned in subsequent blog posts. You can track my progress and see what I've built at the forensic-files-api repository on GitHub. Stay tuned!