The Forensic Files API, Part 4
Whodunit
April 12, 2020

As I was working through each step on my journey to justice, I realized I was writing a lot of the same methods for each asset. Admittedly, I'm writing this post after having worked through the speech recognition, transcript generation, and natural language processing steps. I plan on writing posts for each step, but I figured it'd be wise to deviate from that for a moment because I started getting better at writing Go code (at least I think I did).
In this post, I'm going to walk through a new internal package called `whodunit`, which provides methods for working with the files in the `/assets` directory. In keeping with tradition, I used the title of season 6, episode 12: Whodunit?
Here's the synopsis of that episode, taken from the Forensic Files Wiki:
In 1998, an evening out at an Easton, Maryland murder mystery theatre performance turns into a real life whodunit when the badly burned body of Stephen Hricko is discovered in his hotel room after a fire.
Upon initial investigation, it appeared to be an accidental fire. Lies, greed, and medical trickery can't match the skills of forensic scientists, who bring the curtain down on the real killer, his wife Kimberly Hricko.
Let's get crackin'!
Course of Action
I didn't deliberately plan `whodunit`; it came about organically. I once read about an architect who wouldn't put down sidewalks. They would just plant grass, then come back a few months later and put sidewalks where the paths were worn. I tried looking up who that architect was, but couldn't get a straight answer from the internet. The `whodunit` package came about in a similar way.
I was working with files in the `/assets` directory and noticed that I was duplicating a lot of code. I was writing the same methods whether I was downloading a video, converting it to audio, or getting a recognition from a speech-to-text service. Then I started adopting the idiomatic Go approach of adding methods to structs. Once I saw how it gave me nicer encapsulation and prevented function naming collisions within packages, I wanted to use the same technique for code reuse.
The base entity for any file in the `/assets` directory is an `Episode`. Regardless of the asset type, each file is named with the same convention and grouped into a corresponding `Season`.
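Here's a small sketch of that naming convention. It assumes a name is a zero-padded season number, a zero-padded episode number, and the title lowercased and hyphenated. That format is inferred from example names like `01-02-the-magic-bullet` later in this post. `episodeName` is a hypothetical helper, and the real `Name()` method may treat punctuation differently.

```go
package main

import (
	"fmt"
	"strings"
)

// episodeName is a hypothetical helper that builds a base file name such as
// "01-02-the-magic-bullet": zero-padded season, zero-padded episode, then the
// title lowercased with runs of whitespace replaced by single hyphens.
// Assumption: punctuation in titles is left untouched.
func episodeName(season, episode int, title string) string {
	slug := strings.Join(strings.Fields(strings.ToLower(title)), "-")
	return fmt.Sprintf("%02d-%02d-%s", season, episode, slug)
}

func main() {
	fmt.Println(episodeName(1, 2, "The Magic Bullet")) // 01-02-the-magic-bullet
	fmt.Println(episodeName(1, 11, "Outbreak"))        // 01-11-outbreak
}
```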
With that in mind, I created a new directory, `/internal/whodunit`, with three files: `whodunit.go`, `episode.go`, and `season.go`. The `whodunit.go` file contains constants and methods used for correlating an episode with an asset type. Here's what some of that file looks like with the comments removed:
```go
package whodunit

import (
	"log"
	"os"
	"path/filepath"
)

type AssetStatus int

const (
	AssetStatusAny AssetStatus = iota
	AssetStatusPending
	AssetStatusInProcess
	AssetStatusComplete
	AssetStatusMissing
)

type AssetType int

const (
	AssetTypeAnalysis AssetType = iota
	AssetTypeAudio
	AssetTypeRecognition
	AssetTypeTranscript
	AssetTypeVideo
)

var AssetsDirPath = assetsDirPath()

func (at AssetType) DirPath() string {
	switch at {
	case AssetTypeAnalysis:
		return filepath.Join(AssetsDirPath, "analyses")
	case AssetTypeAudio:
		return filepath.Join(AssetsDirPath, "audio")
	case AssetTypeRecognition:
		return filepath.Join(AssetsDirPath, "recognitions")
	case AssetTypeTranscript:
		return filepath.Join(AssetsDirPath, "transcripts")
	case AssetTypeVideo:
		return filepath.Join(AssetsDirPath, "videos")
	default:
		return ""
	}
}

func (at AssetType) FileExt() string {
	switch at {
	case AssetTypeAnalysis:
		return ".json"
	case AssetTypeAudio:
		return ".mp3"
	case AssetTypeRecognition:
		return ".json"
	case AssetTypeTranscript:
		return ".txt"
	case AssetTypeVideo:
		return ".mp4"
	default:
		return ""
	}
}

func assetsDirPath() string {
	pwd, err := os.Getwd()
	if err != nil {
		log.Fatalf("Error getting pwd: %v", err)
	}
	return filepath.Join(pwd, "assets")
}
```
You may notice that I moved the `AssetsDirPath` variable out of `crimeseen`. I also got rid of the global directory path variables in `crimeseen` and replaced them with a `DirPath()` method on `AssetType`.
Since each asset has an `AssetType` associated with it, getting the corresponding path in `/assets`, along with the file extension for that asset type, is much simpler.
I ended up writing a script to extract the season number, episode number, title, and URL out of the `youtube-links.json` file into an `episodes.json` file in the root `/assets` directory.
Having every episode listed in a single `episodes.json` file made it easier to get all the episodes in a season without having to walk a directory and parse each file. I added the code that reads that JSON file to the `season.go` file shown below.
```go
package whodunit

import (
	"encoding/json"
	"fmt"
	"io/ioutil"
	"os"
	"path/filepath"
	"sort"

	"github.com/mikerourke/forensic-files-api/internal/crimeseen"
)

type Season struct {
	SeasonNumber int
	episodeMap   map[int]*Episode
}

const SeasonCount = 14

func NewSeason(seasonNumber int) *Season {
	// Return a new Season instance associated with the season number...
}

func (s *Season) PopulateEpisodes() error {
	// Loop through episodes.json and populate episodeMap for season...
}

func (s *Season) EpisodeCount() int {
	// Return count of episodes in a season...
}

func (s *Season) AllEpisodes() []*Episode {
	// Return slice of Episodes within season...
}

func (s *Season) Episode(episodeNumber int) *Episode {
	// Return the episode in the season associated with the specified episode number...
}

func (s *Season) EnsureDir(assetType AssetType) error {
	// Ensure the `season-x` directory exists for the specified asset...
}

func (s *Season) AssetDirPath(assetType AssetType) string {
	// Return the absolute path to the `season-x` directory in `/assets`...
}

func (s *Season) DirName() string {
	// Return the directory name for the season (e.g. "season-1")...
}
```
The code for this file should be pretty self-explanatory. There are some convenience methods for dealing with the season directories, but most of the functionality revolves around episodes. I can instantiate a new season by calling `NewSeason()`, populate that season's episodes from the `episodes.json` file, and get an `Episode` instance for each one. To get an episode, I call `season.Episode(1)` instead of instantiating an `Episode` directly.
The meat and potatoes of `whodunit` resides in `episode.go`, shown below.
```go
package whodunit

import (
	"fmt"
	"path/filepath"
	"strconv"
	"strings"

	"github.com/mikerourke/forensic-files-api/internal/crimeseen"
)

type Episode struct {
	SeasonNumber  int    `json:"season"`
	EpisodeNumber int    `json:"episode"`
	Title         string `json:"title"`
	URL           string `json:"url"`
	assetStatus   AssetStatus
	season        *Season
}

func newEpisode(
	season *Season,
	episodeNumber int,
	title string,
	url string,
) *Episode {
	// Return a new episode instance. This isn't public because it's only called from Season...
}

func NewEpisodeFromName(name string) (*Episode, error) {
	// Return a new Episode instance from the specified name (e.g. "01-11-outbreak")...
}

func (e *Episode) DisplayTitle() string {
	// Return a nice title, so "01-02-the-magic-bullet" returns "The Magic Bullet"...
}

func (e *Episode) AssetExists(assetType AssetType) bool {
	// Return true if the asset file exists...
}

func (e *Episode) AssetFilePath(assetType AssetType) string {
	// Return the absolute file path to the associated asset type...
}

func (e *Episode) AssetFileName(assetType AssetType) string {
	// Return the file name for the associated asset type...
}

func (e *Episode) SetAssetStatus(status AssetStatus) {
	// Set the status of the asset...
}

func (e *Episode) AssetStatus(assetType AssetType) AssetStatus {
	// Return the status of the asset (is it Pending? Complete? etc.)...
}

func (e *Episode) Name() string {
	// Return name of the episode (e.g. "01-02-the-magic-bullet")...
}
```
`NewEpisodeFromName()` is a convenience function for getting an `Episode` instance by parsing its name. For example, I can call `NewEpisodeFromName("01-11-outbreak")` instead of calling `NewSeason(1)` and then `season.Episode(11)`. It's used primarily for quick debugging.
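Here's a sketch of how a name like `01-11-outbreak` could be parsed, along with the title formatting that `DisplayTitle()` describes. The parsing rule (two numeric fields followed by a hyphenated slug) is inferred from the example names in this post. `parseEpisodeName` and `displayTitle` are hypothetical stand-ins, not the package's code. The simple capitalization assumes ASCII slugs and won't handle every title perfectly.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseEpisodeName splits a name such as "01-11-outbreak" into its
// season number, episode number, and title slug.
func parseEpisodeName(name string) (season, episode int, slug string, err error) {
	parts := strings.SplitN(name, "-", 3)
	if len(parts) != 3 {
		return 0, 0, "", fmt.Errorf("invalid episode name %q", name)
	}
	if season, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, "", fmt.Errorf("invalid season in %q: %w", name, err)
	}
	if episode, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, "", fmt.Errorf("invalid episode in %q: %w", name, err)
	}
	return season, episode, parts[2], nil
}

// displayTitle turns "the-magic-bullet" into "The Magic Bullet" by
// upper-casing the first byte of each hyphen-separated word (ASCII only).
func displayTitle(slug string) string {
	words := strings.Split(slug, "-")
	for i, w := range words {
		if w != "" {
			words[i] = strings.ToUpper(w[:1]) + w[1:]
		}
	}
	return strings.Join(words, " ")
}

func main() {
	season, episode, slug, err := parseEpisodeName("01-11-outbreak")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(season, episode, displayTitle(slug)) // 1 11 Outbreak
}
```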
With the `whodunit` package in place, it's much easier to set up an interface for each asset, because I know reliably which methods are available.
Here's what the new `audio.go` file in the `visibilityzero` package looks like using the new `Episode` implementation:
```go
package visibilityzero

import (
	"os"
	"strings"
	"time"

	"github.com/mikerourke/forensic-files-api/internal/crimeseen"
	"github.com/mikerourke/forensic-files-api/internal/videodiary"
	"github.com/mikerourke/forensic-files-api/internal/whodunit"
	"github.com/sirupsen/logrus"
)

type Audio struct {
	*whodunit.Episode
}

func NewAudio(ep *whodunit.Episode) *Audio {
	return &Audio{
		Episode: ep,
	}
}

func (a *Audio) Extract(isPaused bool) {
	// Extract the audio from the video file here...
}

func (a *Audio) Open() *os.File {
	// Open the audio file and return it here...
}

func (a *Audio) Exists() bool {
	return a.AssetExists(whodunit.AssetTypeAudio)
}

func (a *Audio) FilePath() string {
	return a.AssetFilePath(whodunit.AssetTypeAudio)
}

func (a *Audio) FileName() string {
	return a.AssetFileName(whodunit.AssetTypeAudio)
}
```
I'll be using the `whodunit` package quite a bit in upcoming posts.
You can view the source on GitHub.
I should probably also note that most of my examples from earlier posts have been completely rewritten, but the functionality is the same.
Now that you know what's going on behind the scenes, I can move on to how I used a speech-to-text service to get the episode transcripts. Hooray!