The Forensic Files API, Part 4

Whodunit

April 12, 2020


As I was working through each step on my journey to justice, I realized I was writing a lot of the same methods for each asset. Admittedly, I'm writing this post now after having worked through the speech recognition, transcript generation, and natural language processing steps. I plan on writing posts for each step, but I figured it'd be wise to deviate from that for a second because I started getting better at writing Go code (at least I think I did).

In this post, I'm going to walk through a new internal package called whodunit, which provides methods for the files in the /assets directory. In keeping with tradition, I used the title of season 6, episode 12: Whodunit? Here's the synopsis of that episode, taken from the Forensic Files Wiki:

In 1998, an evening out at an Easton, Maryland murder mystery theatre performance turns into a real life whodunit when the badly burned body of Stephen Hricko is discovered in his hotel room after a fire.

Upon initial investigation, it appeared to be an accidental fire. Lies, greed, and medical trickery can't match the skills of forensic scientists, who bring the curtain down on the real killer, his wife Kimberly Hricko.

Let's get crackin'!

Course of Action

I didn't deliberately plan whodunit; it kind of came to fruition organically. I once read somewhere about an architect who wouldn't put down sidewalks. They would just plant grass, then come back a few months later and put sidewalks down where the paths were worn. I tried looking up who that architect was, but couldn't get a straight answer from the internet. The whodunit package came about in a similar way.

I was working with files in the /assets directory and noticed that I was duplicating a lot of code. Whether I was downloading a video, converting it to audio, getting a recognition from a speech-to-text service, or something else, I was writing the same methods. After I started moving to the idiomatic Go way of adding methods to structs and saw how it provided nicer encapsulation and prevented function naming collisions in packages, I wanted to use the technique for code reuse.

The base entity for any file in the /assets directory is an Episode. Regardless of the asset type, it's named with the same convention and is grouped into a corresponding Season. With that in mind, I created a new directory: /internal/whodunit with 3 files: whodunit.go, episode.go, and season.go. The whodunit.go file contains constants and methods used for correlating an episode with an asset type. Here's what some of that file looks like with the comments removed:

package whodunit

import (
    "errors"
    "log"
    "os"
    "path/filepath"
)

type AssetStatus int

const (
    AssetStatusAny AssetStatus = iota
    AssetStatusPending
    AssetStatusInProcess
    AssetStatusComplete
    AssetStatusMissing
)

type AssetType int

const (
    AssetTypeAnalysis AssetType = iota
    AssetTypeAudio
    AssetTypeRecognition
    AssetTypeTranscript
    AssetTypeVideo
)

var AssetsDirPath = assetsDirPath()

func (at AssetType) DirPath() string {
    switch at {
    case AssetTypeAnalysis:
        return filepath.Join(AssetsDirPath, "analyses")
    case AssetTypeAudio:
        return filepath.Join(AssetsDirPath, "audio")
    case AssetTypeRecognition:
        return filepath.Join(AssetsDirPath, "recognitions")
    case AssetTypeTranscript:
        return filepath.Join(AssetsDirPath, "transcripts")
    case AssetTypeVideo:
        return filepath.Join(AssetsDirPath, "videos")
    default:
        return ""
    }
}

func (at AssetType) FileExt() string {
    switch at {
    case AssetTypeAnalysis:
        return ".json"
    case AssetTypeAudio:
        return ".mp3"
    case AssetTypeRecognition:
        return ".json"
    case AssetTypeTranscript:
        return ".txt"
    case AssetTypeVideo:
        return ".mp4"
    default:
        return ""
    }
}

func assetsDirPath() string {
    pwd, err := os.Getwd()
    if err != nil {
        log.Fatalf("Error getting pwd: %v", err)
    }
    return filepath.Join(pwd, "assets")
}

You may notice that I moved AssetsDirPath out of crimeseen. I also got rid of the global directory path variables in crimeseen and replaced them with a DirPath() method on AssetType. Since each asset has an AssetType associated with it, getting the corresponding path in /assets, along with the file extension for that asset type, is much simpler.
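To make that concrete, here's a minimal, self-contained sketch of how a directory lookup and an extension lookup on AssetType can be combined into a full asset path. This isn't the code from the repo: the assetPath helper, its root argument, the DirName() method (a simplified stand-in for DirPath()), and the trimmed-down list of asset types are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// AssetType mirrors the enum from whodunit.go, trimmed to three values.
type AssetType int

const (
	AssetTypeAudio AssetType = iota
	AssetTypeTranscript
	AssetTypeVideo
)

// DirName returns the subdirectory under the assets root for the asset type.
// (Simplified stand-in for DirPath(), which returns an absolute path.)
func (at AssetType) DirName() string {
	switch at {
	case AssetTypeAudio:
		return "audio"
	case AssetTypeTranscript:
		return "transcripts"
	case AssetTypeVideo:
		return "videos"
	default:
		return ""
	}
}

// FileExt returns the file extension for the asset type.
func (at AssetType) FileExt() string {
	switch at {
	case AssetTypeAudio:
		return ".mp3"
	case AssetTypeTranscript:
		return ".txt"
	case AssetTypeVideo:
		return ".mp4"
	default:
		return ""
	}
}

// assetPath is a hypothetical helper that joins the assets root, the asset
// type directory, the season directory, and the episode file name.
func assetPath(root string, at AssetType, seasonDir, episodeName string) string {
	return filepath.Join(root, at.DirName(), seasonDir, episodeName+at.FileExt())
}

func main() {
	fmt.Println(assetPath("assets", AssetTypeAudio, "season-1", "01-02-the-magic-bullet"))
	fmt.Println(assetPath("assets", AssetTypeVideo, "season-1", "01-02-the-magic-bullet"))
}
```

Because the asset type carries both pieces of information, the caller only needs to know which type it's dealing with; no per-type path variables are required.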

I ended up writing a script to extract the season number, episode number, title, and URL out of the youtube-links.json file into an episodes.json file in the root /assets directory. This made it easier to get all the episodes in a season without having to walk a directory and parse each file name. I added the code that reads that JSON file to the season.go file shown below.

package whodunit

import (
    "encoding/json"
    "fmt"
    "io/ioutil"
    "os"
    "path/filepath"
    "sort"

    "github.com/mikerourke/forensic-files-api/internal/crimeseen"
)

type Season struct {
    SeasonNumber int
    episodeMap   map[int]*Episode
}

const SeasonCount = 14

func NewSeason(seasonNumber int) *Season {
    // Return a new Season instance associated with the season number...
}

func (s *Season) PopulateEpisodes() error {
    // Loop through episodes.json and populate episodeMap for season...
}

func (s *Season) EpisodeCount() int {
    // Return count of episodes in a season...
}

func (s *Season) AllEpisodes() []*Episode {
    // Return slice of Episodes within season...
}

func (s *Season) Episode(episodeNumber int) *Episode {
    // Return the episode in the season associated with the specified episode number...
}

func (s *Season) EnsureDir(assetType AssetType) error {
    // Ensure the `season-x` directory exists for the specified asset...
}

func (s *Season) AssetDirPath(assetType AssetType) string {
    // Return the absolute path to the `season-x` directory in `/assets`...
}

func (s *Season) DirName() string {
    // Return the directory name for the season (e.g. "season-1")...
}

The code for this file should be pretty self-explanatory. There are some convenience methods for dealing with the season directories, but most of the functionality revolves around episodes. I can instantiate a new season (by calling NewSeason()), populate the episodes in that season from the episodes.json file, and return Episode instances for each one. To get an episode, I call season.Episode(1) rather than instantiating an Episode directly.
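Since the method bodies are elided above, here's a rough, self-contained sketch of how the populate-then-look-up flow could work. Treat it as an illustration under assumptions, not the actual implementation: the JSON shape is guessed from the field names mentioned earlier (season, episode, title, url), the sample data and example.com URLs are made up, and PopulateEpisodes takes the raw JSON as an argument here so the sketch runs without touching the file system.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Episode is trimmed to the fields assumed to be stored in episodes.json.
type Episode struct {
	SeasonNumber  int    `json:"season"`
	EpisodeNumber int    `json:"episode"`
	Title         string `json:"title"`
	URL           string `json:"url"`
}

// Season keeps its episodes in a map keyed by episode number.
type Season struct {
	SeasonNumber int
	episodeMap   map[int]*Episode
}

// sampleEpisodesJSON is made-up stand-in data, not the real episodes.json.
const sampleEpisodesJSON = `[
  {"season": 1, "episode": 11, "title": "outbreak", "url": "https://example.com/b"},
  {"season": 1, "episode": 2, "title": "the-magic-bullet", "url": "https://example.com/a"},
  {"season": 6, "episode": 12, "title": "whodunit", "url": "https://example.com/c"}
]`

// NewSeason returns an empty Season for the given season number.
func NewSeason(seasonNumber int) *Season {
	return &Season{SeasonNumber: seasonNumber, episodeMap: make(map[int]*Episode)}
}

// PopulateEpisodes decodes every episode record and keeps only the ones that
// belong to this season. (Assumption: the real method reads the file itself.)
func (s *Season) PopulateEpisodes(data []byte) error {
	var all []*Episode
	if err := json.Unmarshal(data, &all); err != nil {
		return fmt.Errorf("parsing episodes: %w", err)
	}
	for _, ep := range all {
		if ep.SeasonNumber == s.SeasonNumber {
			s.episodeMap[ep.EpisodeNumber] = ep
		}
	}
	return nil
}

// Episode returns the episode with the given number, or nil if it's missing.
func (s *Season) Episode(episodeNumber int) *Episode {
	return s.episodeMap[episodeNumber]
}

// AllEpisodes returns the season's episodes sorted by episode number, since
// map iteration order is not stable in Go.
func (s *Season) AllEpisodes() []*Episode {
	episodes := make([]*Episode, 0, len(s.episodeMap))
	for _, ep := range s.episodeMap {
		episodes = append(episodes, ep)
	}
	sort.Slice(episodes, func(i, j int) bool {
		return episodes[i].EpisodeNumber < episodes[j].EpisodeNumber
	})
	return episodes
}

// DirName returns the directory name for the season (e.g. "season-1").
func (s *Season) DirName() string {
	return fmt.Sprintf("season-%d", s.SeasonNumber)
}

func main() {
	season := NewSeason(1)
	if err := season.PopulateEpisodes([]byte(sampleEpisodesJSON)); err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, ep := range season.AllEpisodes() {
		fmt.Printf("%s: episode %d, %s\n", season.DirName(), ep.EpisodeNumber, ep.Title)
	}
}
```

Keeping the episodes in a map makes the Episode(n) lookup trivial, while AllEpisodes() sorts on the way out so callers always see a predictable order.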

The meat and potatoes of whodunit resides in episode.go, shown below.

package whodunit

import (
    "fmt"
    "path/filepath"
    "strconv"
    "strings"

    "github.com/mikerourke/forensic-files-api/internal/crimeseen"
)

type Episode struct {
    SeasonNumber  int    `json:"season"`
    EpisodeNumber int    `json:"episode"`
    Title         string `json:"title"`
    URL           string `json:"url"`
    assetStatus   AssetStatus
    season        *Season
}

func newEpisode(
    season *Season,
    episodeNumber int,
    title string,
    url string,
) *Episode {
    // Return a new episode instance. This isn't public because it's only called from Season...
}

func NewEpisodeFromName(name string) (*Episode, error) {
    // Return a new Episode instance from the specified name (e.g. "01-11-outbreak")...
}

func (e *Episode) DisplayTitle() string {
    // Return a nice title, so "01-02-the-magic-bullet" returns "The Magic Bullet"...
}

func (e *Episode) AssetExists(assetType AssetType) bool {
    // Return true if the asset file exists...
}

func (e *Episode) AssetFilePath(assetType AssetType) string {
    // Return the absolute file path to the associated asset type...
}

func (e *Episode) AssetFileName(assetType AssetType) string {
    // Return the file name for the associated asset type...
}

func (e *Episode) SetAssetStatus(status AssetStatus) {
    // Set the status of the asset...
}

func (e *Episode) AssetStatus(assetType AssetType) AssetStatus {
    // Return the status of the asset (Pending, Complete, etc.)...
}

func (e *Episode) Name() string {
    // Return name of the episode (e.g. "01-02-the-magic-bullet")...
}

NewEpisodeFromName() is a convenience function for getting an Episode instance by parsing its name. For example, I can call NewEpisodeFromName("01-11-outbreak") instead of calling NewSeason(1), then season.Episode(11). It's used primarily for quick debugging. With this in place, it's much easier to set up an interface for each asset, knowing reliably which methods are available.
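For a sense of what the name parsing involves, here's a small, self-contained sketch that splits a name like "01-11-outbreak" into its season number, episode number, and title, and turns a hyphenated title into a display title. The parseEpisodeName and displayTitle functions and the parsedName type are hypothetical names made up for this example; the real NewEpisodeFromName() and DisplayTitle() may well work differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsedName holds the pieces of an episode name (hypothetical type).
type parsedName struct {
	SeasonNumber  int
	EpisodeNumber int
	Title         string
}

// parseEpisodeName splits "SS-EE-title" into its three parts. SplitN with a
// limit of 3 keeps any hyphens inside the title intact.
func parseEpisodeName(name string) (parsedName, error) {
	parts := strings.SplitN(name, "-", 3)
	if len(parts) != 3 {
		return parsedName{}, fmt.Errorf("invalid episode name %q", name)
	}
	season, err := strconv.Atoi(parts[0])
	if err != nil {
		return parsedName{}, fmt.Errorf("invalid season in %q: %w", name, err)
	}
	episode, err := strconv.Atoi(parts[1])
	if err != nil {
		return parsedName{}, fmt.Errorf("invalid episode in %q: %w", name, err)
	}
	return parsedName{SeasonNumber: season, EpisodeNumber: episode, Title: parts[2]}, nil
}

// displayTitle capitalizes each hyphen-separated word and joins them with
// spaces. It assumes ASCII titles, which keeps the byte slicing safe.
func displayTitle(title string) string {
	words := strings.Split(title, "-")
	for i, word := range words {
		if word == "" {
			continue
		}
		words[i] = strings.ToUpper(word[:1]) + word[1:]
	}
	return strings.Join(words, " ")
}

func main() {
	parsed, err := parseEpisodeName("01-02-the-magic-bullet")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(parsed.SeasonNumber, parsed.EpisodeNumber)
	fmt.Println(displayTitle(parsed.Title)) // prints "The Magic Bullet"
}
```

Returning an error for a malformed name, rather than a half-filled struct, fits the (*Episode, error) signature shown above and keeps bad names from quietly turning into season 0, episode 0.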

Here's what the new audio.go file in the visibilityzero package looks like using the new Episode implementation:

package visibilityzero

import (
    "os"
    "strings"
    "time"

    "github.com/mikerourke/forensic-files-api/internal/crimeseen"
    "github.com/mikerourke/forensic-files-api/internal/videodiary"
    "github.com/mikerourke/forensic-files-api/internal/whodunit"
    "github.com/sirupsen/logrus"
)

type Audio struct {
    *whodunit.Episode
}

func NewAudio(ep *whodunit.Episode) *Audio {
    return &Audio{
        Episode: ep,
    }
}

func (a *Audio) Extract(isPaused bool) {
    // Extract the audio from the video file here...
}

func (a *Audio) Open() *os.File {
    // Open the audio file and return the file handle here...
}

func (a *Audio) Exists() bool {
    return a.AssetExists(whodunit.AssetTypeAudio)
}

func (a *Audio) FilePath() string {
    return a.AssetFilePath(whodunit.AssetTypeAudio)
}

func (a *Audio) FileName() string {
    return a.AssetFileName(whodunit.AssetTypeAudio)
}
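The trick that makes this work is struct embedding: because Audio embeds *whodunit.Episode, every Episode method is promoted onto Audio and can be called directly. Here's a stripped-down, self-contained sketch of the idea. The Asset interface, the Transcript type, and the way file names are built are assumptions for illustration only, not code from the repo.

```go
package main

import "fmt"

// Episode is a trimmed-down stand-in for whodunit.Episode.
type Episode struct {
	SeasonNumber  int
	EpisodeNumber int
	Title         string
}

// Name returns the episode name (e.g. "01-02-the-magic-bullet").
func (e *Episode) Name() string {
	return fmt.Sprintf("%02d-%02d-%s", e.SeasonNumber, e.EpisodeNumber, e.Title)
}

// Asset is a hypothetical interface that every asset wrapper satisfies.
type Asset interface {
	Name() string     // promoted from the embedded *Episode
	FileName() string // implemented by each asset type
}

// Audio embeds *Episode, so it gets Name() for free.
type Audio struct {
	*Episode
}

// FileName appends the audio extension to the promoted Name().
func (a *Audio) FileName() string {
	return a.Name() + ".mp3"
}

// Transcript is a second, hypothetical asset wrapper.
type Transcript struct {
	*Episode
}

// FileName appends the transcript extension to the promoted Name().
func (t *Transcript) FileName() string {
	return t.Name() + ".txt"
}

func main() {
	ep := &Episode{SeasonNumber: 1, EpisodeNumber: 2, Title: "the-magic-bullet"}
	assets := []Asset{&Audio{Episode: ep}, &Transcript{Episode: ep}}
	for _, asset := range assets {
		fmt.Println(asset.FileName())
	}
}
```

Each wrapper only has to supply what's specific to its asset type, and anything that accepts an Asset can treat audio files and transcripts the same way.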

I'll be using the whodunit package quite a bit in upcoming posts. You can view the source on GitHub. I should probably also note that most of my examples from earlier posts have been completely rewritten, but the functionality is the same.

Now that you know what's going on behind the scenes, I can move on to how I used a speech-to-text service to get the episode transcripts. Hooray!