Tracking DFS Results with Python using Plotly


Fantasy sports come in many different flavors, and daily fantasy sports (DFS) is one of the flavors I enjoy playing. There are numerous paid tools available on the Internet for tracking your overall DFS performance. But why use a paid tool when we can build our own DFS return on investment (ROI) tracker using Python and Plotly?

The two sites I play DFS on, DraftKings and FanDuel, offer downloadable CSV files for your entire entry history. The two sites’ CSV files are in different formats and, if you have been playing DFS for as long as I have, they can be very long or even broken up into multiple files. In other words, these files are unwieldy by themselves. Python is great for quickly parsing CSV data, and Plotly is the perfect complement for graphing our results.

[Image: Simpsons Graph]

Initial Setup

If you have read any of my prior posts you know I am a fan of pipenv for package management. Thus, let’s set up a simple Python project, using pipenv as our packaging tool. The first step is to create a dedicated directory for our project. Open a terminal window and enter the following command to navigate to your Desktop directory (or a different directory of your choice), and create a new dfs_tracker directory which will be our project’s main directory:

$ cd ~/Desktop
$ mkdir dfs_tracker && cd dfs_tracker

Note: I assume you already have Python 3.9 installed as well as pipenv. For guidance with installing Python 3 please visit this site. If you need assistance with pipenv please reference its installation documentation.

Now we can create our Pipfile and activate our virtual environment. Open the project in a code editor of your choice (I suggest either PyCharm or VSCode) and create a new Pipfile:

[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true

[dev-packages]
black = "==21.5b2"

[packages]
plotly = "==4.14.3"

[requires]
python_version = "3.9"

The above Pipfile might appear confusing at first, so let’s go over it. Pipenv will go to "https://pypi.org/simple" for our Python packages. It will install plotly, and will also install black if the --dev flag is included in the install command as seen a bit below. Lastly, the [requires] section tells Pipenv that this project targets Python 3.9.

Note: I always use Black for my code formatting needs. I suggest you do as well. It takes the guesswork out of code formatting, allowing developers to better use their time on more important things.

Next, go back to our terminal window, install everything via pipenv, and activate our virtualenv.

$ pipenv install --dev
$ pipenv shell

Above we are installing everything listed in the Pipfile, including the dev-dependencies. Black is currently our only dev-package, thus if you don’t want to bother with code formatting simply do pipenv install instead. We can now activate our virtualenv via pipenv shell. On to project structure we go and then code!

Project Structure

Lay out the project’s structure like so:

dfs_tracker/
│
├── csv_data/
├── dfs_tracker/
│   ├── __init__.py
│   ├── tracker.py
│   ├── contests/
│   │   ├── __init__.py
│   │   ├── entry.py
│   │   └── sports.py
│   └── dfs_data/
│       ├── __init__.py
│       ├── parse_dfs_data.py
│       └── plot_dfs_data.py
├── Pipfile
└── Pipfile.lock

First, add a dfs_tracker package to our project’s root directory. Next, create a contests package and dfs_data package inside of our dfs_tracker package. Be mindful of the __init__.py files listed above and add them where necessary, as well as all listed Python modules. Our project setup is complete and we are ready to write some code.
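If you would rather not create each file by hand, the layout above can be scaffolded with a few lines of Python. This is only a sketch: the scaffold helper and the LAYOUT mapping are names I made up for illustration, and the example writes into a throwaway temporary directory rather than your real project root. The Pipfile is not created here, since we already wrote it by hand.

```python
import tempfile
from pathlib import Path

# Hypothetical mapping of each package directory to the modules it holds.
LAYOUT = {
    "dfs_tracker": ["tracker.py"],
    "dfs_tracker/contests": ["entry.py", "sports.py"],
    "dfs_tracker/dfs_data": ["parse_dfs_data.py", "plot_dfs_data.py"],
}


def scaffold(root):
    """Create csv_data/ plus every package and empty module under root."""
    root = Path(root)
    (root / "csv_data").mkdir(parents=True, exist_ok=True)
    for package, modules in LAYOUT.items():
        package_dir = root / package
        package_dir.mkdir(parents=True, exist_ok=True)
        # Every package needs an __init__.py alongside its modules
        for name in ["__init__.py", *modules]:
            (package_dir / name).touch()
    return root


# Demonstrated in a temporary directory so nothing on disk is overwritten
with tempfile.TemporaryDirectory() as tmp:
    root = scaffold(tmp)
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))

print("\n".join(created))
```

To use it for real, call scaffold with the path of the outer dfs_tracker directory instead of a temporary one.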

Note: You will need CSV data for this project. If you don’t have any DFS CSVs of your own feel free to grab mine here.

Parsing our CSV Files

Let’s take a look at our CSV files and devise a plan for parsing their data.

The Entry Object

The command line tool we are writing in this post will consume CSV files from both DraftKings and FanDuel. Here are quick previews of each of these files (format seen below is valid as of the writing of this post), and their associated header rows:

DraftKings CSV: [Image: DraftKings CSV]

FanDuel CSV: [Image: FanDuel CSV]

Each row in a CSV file represents a single contest entry. An entry consists of, for our needs, a site identifier (DraftKings or FanDuel), a sport (NFL, NBA, etc.), the contest’s entry fee, any winnings received from the entry, the contest name, and finally the contest date. Our code will contain an Entry object used to track each of our contest entries. These Entry objects need to be instantiated from either a DraftKings or FanDuel CSV row. Open entry.py and implement the Entry class and a couple of additional functions:

from datetime import datetime
from decimal import Decimal
from re import sub


SITE_DRAFTKINGS = "DraftKings"
SITE_FANDUEL = "FanDuel"

DRAFTKINGS_SPORT_COLUMN = "Sport"
DRAFTKINGS_CONTEST_NAME_COLUMN = "Entry"
DRAFTKINGS_CONTEST_DATE_COLUMN = "Contest_Date_EST"
DRAFTKINGS_WINNINGS_COLUMN = "Winnings_Non_Ticket"
DRAFTKINGS_ENTRY_FEE_COLUMN = "Entry_Fee"

FANDUEL_SPORT_COLUMN = "Sport"
FANDUEL_CONTEST_NAME_COLUMN = "Title"
FANDUEL_DATE_COLUMN = "Date"
FANDUEL_ENTRY_FEE_COLUMN = "Entry ($)"
FANDUEL_WINNINGS_COLUMN = "Winnings ($)"


class Entry:
    def __init__(self, site, sport, entry_fee, winnings, contest_name, contest_date):
        self.site = site
        self.sport = sport
        self.contest_name = contest_name
        self.contest_date = contest_date
        self.entry_fee = entry_fee
        self.winnings = winnings


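def get_entry_from_csv_row(csv_row):
    # The winnings column name differs per site, so it identifies the source
    if DRAFTKINGS_WINNINGS_COLUMN in csv_row:
        return parse_draftkings_entry(csv_row)
    elif FANDUEL_WINNINGS_COLUMN in csv_row:
        return parse_fanduel_entry(csv_row)

    raise ValueError("Unrecognized CSV format: expected a DraftKings or FanDuel row")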


def parse_fanduel_entry(csv_entry_row):
    return Entry(
        site=SITE_FANDUEL,
        sport=csv_entry_row[FANDUEL_SPORT_COLUMN].upper(),
        contest_name=csv_entry_row[FANDUEL_CONTEST_NAME_COLUMN],
        contest_date=datetime.strptime(
            csv_entry_row[FANDUEL_DATE_COLUMN], "%Y/%m/%d"
        ).date(),
        entry_fee=_convert_currency_to_decimal(csv_entry_row[FANDUEL_ENTRY_FEE_COLUMN]),
        winnings=_convert_currency_to_decimal(csv_entry_row[FANDUEL_WINNINGS_COLUMN]),
    )


def parse_draftkings_entry(csv_entry_row):
    return Entry(
        site=SITE_DRAFTKINGS,
        sport=csv_entry_row[DRAFTKINGS_SPORT_COLUMN].upper(),
        contest_name=csv_entry_row[DRAFTKINGS_CONTEST_NAME_COLUMN],
        contest_date=datetime.strptime(
            csv_entry_row[DRAFTKINGS_CONTEST_DATE_COLUMN], "%Y-%m-%d %H:%M:%S"
        ).date(),
        entry_fee=_convert_currency_to_decimal(
            csv_entry_row[DRAFTKINGS_ENTRY_FEE_COLUMN]
        ),
        winnings=_convert_currency_to_decimal(csv_entry_row[DRAFTKINGS_WINNINGS_COLUMN]),
    )


def _convert_currency_to_decimal(currency_val):
    return Decimal(sub(r"[^\d.]", "", currency_val))

Above, you might notice we are accessing our CSV rows like a dictionary. This is because we will be reading our CSV data via csv.DictReader from Python’s standard library, which yields each row as a dictionary keyed by the header row.
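To make the dictionary-style access concrete, here is a small standalone sketch that feeds DictReader an in-memory CSV. The two rows are made-up values in the FanDuel column layout used above, not real entry data, and the variable names are mine.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal
from re import sub

# Hypothetical FanDuel-style rows; the values are invented for illustration.
SAMPLE_CSV = (
    "Sport,Title,Date,Entry ($),Winnings ($)\n"
    "nfl,Sunday Million,2020/09/13,$5.00,$12.50\n"
    "nba,Tip-Off Special,2020/12/22,$2.00,$0.00\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
first = rows[0]

# Columns are looked up by header name rather than by position
print(first["Sport"], first["Title"])

# A site-specific column name tells us which site produced the row
is_fanduel = "Winnings ($)" in first
is_draftkings = "Winnings_Non_Ticket" in first
print(is_fanduel, is_draftkings)

# Every value arrives as a string, so currency and dates need converting
entry_fee = Decimal(sub(r"[^\d.]", "", first["Entry ($)"]))
contest_date = datetime.strptime(first["Date"], "%Y/%m/%d").date()
print(entry_fee, contest_date)
```

Because every value comes back as a string, the parsing functions in entry.py do the conversion to Decimal and date before building an Entry.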

Reading the CSV Data

A collection of DFS entries spread across two separate sites can contain data for any number of sports. When reading our CSV data we want to keep track not only of all Entry objects, but also of every unique sport played across those entries. Doing so will allow us to plot data for all of our entries combined as well as for each individual sport.

Open parse_dfs_data.py and implement a couple of functions for reading the CSV data:

import csv
import os

from dfs_tracker.contests.entry import get_entry_from_csv_row
from dfs_tracker.contests.sports import SPORTS_ALL


def parse_all_csv_files(files_dir):
    dfs_entries = []
    sports = {SPORTS_ALL}

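    for file in os.listdir(files_dir):
        # Skip anything that is not a CSV file (e.g. hidden files like .DS_Store)
        if not file.lower().endswith(".csv"):
            continue

        csv_file = os.path.join(files_dir, file)
        cur_dfs_entries, cur_sports = _parse_csv_file(csv_file)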

        dfs_entries += cur_dfs_entries
        sports = sports | cur_sports

    return dfs_entries, sports


def _parse_csv_file(csv_file):
    cur_entries = []
    cur_sports = set()

    # The csv module docs recommend opening files with newline=""
    with open(csv_file, "r", newline="") as f:
        csv_reader = csv.DictReader(f)

        for dfs_entry_row in csv_reader:
            cur_entry = get_entry_from_csv_row(dfs_entry_row)
            cur_entries.append(cur_entry)
            cur_sports.add(cur_entry.sport)

    return cur_entries, cur_sports

The import of the constant SPORTS_ALL might give you some trouble as I have not mentioned it here yet. Open sports.py and define a few constant values for some of the sports you might come across in our DFS entry data:

SPORTS_ALL = "ALL"
SPORTS_MLB = "MLB"
SPORTS_NASCAR = "NASCAR"
SPORTS_NBA = "NBA"
SPORTS_NFL = "NFL"
SPORTS_NHL = "NHL"
SPORTS_PGA = "PGA"
SPORTS_WNBA = "WNBA"
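The way parse_all_csv_files builds up its set of sports can be seen in isolation with a short sketch. The per-file sets below are hypothetical stand-ins for what _parse_csv_file would return for three different CSV files.

```python
SPORTS_ALL = "ALL"

# Hypothetical results: the sports found in each of three CSV files
sports_per_file = [{"NFL", "NBA"}, {"NFL"}, {"MLB", "NBA"}]

# Start with the "ALL" bucket so totals are always tracked
sports = {SPORTS_ALL}
for cur_sports in sports_per_file:
    # Set union keeps a single copy of each sport across files
    sports = sports | cur_sports

print(sorted(sports))
```

Seeding the set with SPORTS_ALL means the combined totals get a slot even when the directory contains no entries at all.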

Running from the Command Line

Python is “batteries included” when it comes to writing runnable command line tools. The argparse module makes it easy to write user-friendly command-line interfaces, and it is the module we are using in this post. The module tracker.py contains our command line interface implementation using argparse:

import sys
from argparse import ArgumentParser
from decimal import Decimal
from os import path


sys.path.append(path.dirname(path.dirname(path.abspath(__file__))))


from dfs_tracker.dfs_data.parse_dfs_data import parse_all_csv_files
from dfs_tracker.contests.sports import SPORTS_ALL


def track_dfs_performance(files_dir):
    dfs_entries, sports = parse_all_csv_files(files_dir)

    entry_fees = {}
    winnings = {}

    for sport in sports:
        entry_fees[sport] = 0
        winnings[sport] = 0

    for entry in dfs_entries:
        entry_fees[SPORTS_ALL] += entry.entry_fee
        entry_fees[entry.sport] += entry.entry_fee

        winnings[SPORTS_ALL] += entry.winnings
        winnings[entry.sport] += entry.winnings

    roi = {
        sport: _calculate_roi(entry_fees[sport], winnings[sport]) for sport in sports
    }

    _summarize_results(dfs_entries, sports, entry_fees, winnings, roi)


def _calculate_roi(initial_investment, final_value):
    if initial_investment == 0 and final_value == 0:
        return 0

    if initial_investment == 0:
        return "Infinity"

    net_return = final_value - initial_investment
    roi = Decimal(net_return / initial_investment * 100)
    return "{0:.2f}".format(roi)


def _summarize_results(dfs_entries, sports, entry_fees, winnings, roi):
    sports_sorted = sorted(sports)

    print(f"Entries Recorded, All Sports: {len(dfs_entries)}\n")

    for sport in sports_sorted:
        print(f"Entry Fees, {sport}: ${entry_fees[sport]}")
        print(f"Winnings, {sport}: ${winnings[sport]}")
        print(f"ROI, {sport}: {roi[sport]}%\n")


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "-f",
        "--files",
        dest="files",
        help="Directory containing all of our CSV files",
        metavar="FILES",
    )
    args = parser.parse_args()

    track_dfs_performance(args.files)

Let’s start from the top. First, the function track_dfs_performance is where all of the main functionality for this module lives. It calls our previously implemented parse_all_csv_files function, totals the fees and winnings across all entries for each sport, and calculates the ROI for each sport. Lastly, it summarizes our data by printing the results to the console.
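The ROI arithmetic itself is short enough to check by hand. The sketch below uses a hypothetical roi_percent helper (not part of tracker.py) and plugs in the NHL figures from the sample summary later in this post: $10.00 in entry fees and $4.50 in winnings.

```python
from decimal import Decimal


def roi_percent(entry_fees, winnings):
    """Return ROI as a percentage: (winnings - fees) / fees * 100."""
    net_return = winnings - entry_fees
    return net_return / entry_fees * 100


# NHL totals taken from the sample console summary in this post
nhl_roi = roi_percent(Decimal("10.00"), Decimal("4.50"))
formatted = f"{nhl_roi:.2f}%"
print(formatted)
```

Unlike _calculate_roi, this helper does not guard against zero entry fees, so it would raise on a sport with no fees recorded.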

We will be invoking tracker.py as a script, thus the line if __name__ == "__main__": is included towards the bottom of our module. Any code that falls within this if statement will be executed but only if the module is invoked as a script. The code found in this if statement instantiates an ArgumentParser object and configures our script to take a --files parameter from the command line. This parameter represents the directory containing our CSV data we want to analyze. Lastly, track_dfs_performance is invoked. Try running our CLI from a console via the following command from our project’s root directory:

$ python dfs_tracker/tracker.py --files=csv_data/

We don’t have plotting implemented yet but we should see a summary printed to our terminal window!
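If you are curious what argparse hands back to our script, parse_args also accepts an explicit list of arguments, which makes it easy to experiment without a shell. The sketch below rebuilds the same parser configuration outside of tracker.py; the variable names are mine.

```python
from argparse import ArgumentParser

parser = ArgumentParser()
parser.add_argument(
    "-f",
    "--files",
    dest="files",
    help="Directory containing all of our CSV files",
    metavar="FILES",
)

# Both spellings of the option end up in args.files
long_form = parser.parse_args(["--files=csv_data/"])
short_form = parser.parse_args(["-f", "csv_data/"])
print(long_form.files, short_form.files)

# With no default set, omitting the option leaves args.files as None
missing = parser.parse_args([])
print(missing.files)
```

Since a missing option yields None, one possible hardening is to pass required=True to add_argument so argparse reports the problem before any files are read.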

Note: This post won’t serve as a tutorial for Plotly. If you’re looking for a Plotly tutorial I suggest its official documentation on getting started before proceeding further.

Plotting with Plotly

The DFS contest data we are working with can contain entries for an undetermined number of sports. My DFS entry data might have entries from five different sports and yours might contain entries from just two different sports. Our plotting implementation has to be flexible enough to handle such scenarios. We will want to make a plot for each sport, or better yet, a subplot:

from itertools import accumulate

import plotly.graph_objects as go
from plotly.subplots import make_subplots

from dfs_tracker.contests.sports import (
    SPORTS_ALL,
    SPORTS_MLB,
    SPORTS_NASCAR,
    SPORTS_NBA,
    SPORTS_NFL,
    SPORTS_NHL,
    SPORTS_PGA,
    SPORTS_WNBA,
)


DFS_TRACKER = "DFS Tracker"
COLOR_LOOKUP = {
    SPORTS_ALL: "navy",
    SPORTS_MLB: "red",
    SPORTS_NASCAR: "gold",
    SPORTS_NBA: "orange",
    SPORTS_NFL: "brown",
    SPORTS_NHL: "black",
    SPORTS_PGA: "green",
    SPORTS_WNBA: "purple",
}
SUBPLOT_HEIGHT = 300


def plot_dfs_results(dfs_entries, sports):
    dfs_entries.sort(key=lambda entry: entry.contest_date)
    sports_sorted = sorted(sports)

    profit_by_day = {sport: {} for sport in sports_sorted}

    for entry in dfs_entries:
        _add_profits_for_day(entry, SPORTS_ALL, profit_by_day)
        _add_profits_for_day(entry, entry.sport, profit_by_day)

    cumulative_profits = {
        sport: list(accumulate(profit_by_day[sport].values()))
        for sport in sports_sorted
    }

    # Build up a list of our subplot text titles. The
    # Plotly figure will expect a list
    subplot_titles = []
    for sport in sports_sorted:
        subplot_titles.append(f"Cumulative Profits, {sport}")

    fig = make_subplots(rows=len(sports_sorted), cols=1, subplot_titles=subplot_titles)

    for count, sport in enumerate(sports_sorted, start=1):
        # Append each subplot to our Plotly figure. If the sport we are
        # plotting is not found in our COLOR_LOOKUP we default to black.
        # Our x-axis is our unique dates played for the current sport,
        # and our y-axis is our cumulative profits over these dates.
        # add_trace is used here because append_trace is deprecated in Plotly
        fig.add_trace(
            go.Scatter(
                x=list(profit_by_day[sport].keys()),
                y=list(cumulative_profits[sport]),
                name=f"{DFS_TRACKER}, {sport}",
                line=dict(
                    color=COLOR_LOOKUP.get(sport, "black"),
                    width=2,
                    shape="spline",
                    smoothing=1.2,
                ),
            ),
            row=count,
            col=1,
        )

        # This line is necessary to label our y-axis as dollars or "$"
        fig.update_yaxes(tickprefix="$", tickformat=",.", row=count)

    # Wrap up with some Plotly configuration
    fig_height = SUBPLOT_HEIGHT * len(sports_sorted)
    fig.update_layout(height=fig_height, width=1000, title_text="DFS Tracker by Sport")
    fig.update_traces(mode="markers+lines")
    fig.show()


def _add_profits_for_day(entry, sport, profit_by_day):
    if entry.contest_date in profit_by_day[sport]:
        profit_by_day[sport][entry.contest_date] += entry.winnings - entry.entry_fee
    else:
        profit_by_day[sport][entry.contest_date] = entry.winnings - entry.entry_fee

The code above is doing a lot and I will quickly go over some of it. The goal is to plot our cumulative profits (or lack thereof) along our y-axis and entry dates along the x-axis. After mapping our individual profits for each sport to their specific dates via the use of dictionaries we then use a dict comprehension and the itertools accumulate function to sum up our cumulative profits by date.

    cumulative_profits = {
        sport: list(accumulate(profit_by_day[sport].values()))
        for sport in sports_sorted
    }
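Here is the same two-step idea on a handful of made-up entries for a single sport: first total the profit per day, then turn those daily totals into a running sum. The tuples below are hypothetical (date, entry fee, winnings) values, already sorted by date as our real entries are.

```python
from datetime import date
from decimal import Decimal
from itertools import accumulate

# Hypothetical entries as (contest_date, entry_fee, winnings), sorted by date
entries = [
    (date(2020, 9, 13), Decimal("5.00"), Decimal("12.50")),
    (date(2020, 9, 13), Decimal("5.00"), Decimal("0.00")),
    (date(2020, 9, 20), Decimal("5.00"), Decimal("0.00")),
    (date(2020, 9, 27), Decimal("2.00"), Decimal("14.25")),
]

# Step 1: total profit per day (dicts preserve insertion order)
profit_by_day = {}
for contest_date, entry_fee, winnings in entries:
    profit = winnings - entry_fee
    profit_by_day[contest_date] = profit_by_day.get(contest_date, Decimal("0")) + profit

# Step 2: running total across the days
cumulative = list(accumulate(profit_by_day.values()))

for day, total in zip(profit_by_day, cumulative):
    print(day, total)
```

The approach relies on dictionaries preserving insertion order, which is why the entries are sorted by date before the daily totals are built.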

We now have all of the data we need to render our subplots. The code then enumerates over our list of sorted sports, keeping track of which row in the Plotly figure to render via the count variable. Within this loop we add each sport’s subplot to our Plotly figure. Each subplot’s x-axis holds the unique dates on which we have entries for the current sport, and its y-axis holds the cumulative profits over those dates. Notice that, in addition to plotting each sport, we also plot the totals for ALL sports combined.
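The row numbering and the color fallback can be tried out without Plotly at all. In this sketch the color table is trimmed down and "XFL" is a hypothetical sport with no color assigned, so it falls back to black.

```python
# Trimmed-down color table for illustration
COLOR_LOOKUP = {"ALL": "navy", "NBA": "orange", "NFL": "brown"}

# "XFL" is a made-up sport that has no entry in COLOR_LOOKUP
sports = {"NFL", "ALL", "XFL", "NBA"}

# Subplot rows are 1-based, hence start=1
rows = [
    (row, sport, COLOR_LOOKUP.get(sport, "black"))
    for row, sport in enumerate(sorted(sports), start=1)
]

for row, sport, color in rows:
    print(row, sport, color)
```

Sorting the sports first keeps the subplot order stable from run to run, since sets have no guaranteed ordering.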

With the plotting implementation behind us we need to go back to tracker.py and make a couple of small changes. First, add the following import statement to the bottom of the other import statements found in tracker.py:

from dfs_tracker.dfs_data.plot_dfs_data import plot_dfs_results

Next, add the following line to the very end of the track_dfs_performance function:

plot_dfs_results(dfs_entries, sports)

That should do it. Run the CLI one more time and we should not just see the summary but our Plotly graphs as well!

$ python dfs_tracker/tracker.py --files=csv_data/

Assuming everything went well, Plotly should open a browser tab for you containing plots similar to these:

[Image: Plotly DFS Plots]


In addition to the graphs, your console should have printed out a summary too:

Entries Recorded, All Sports: 3726

Entry Fees, ALL: $4029.08
Winnings, ALL: $5500.35
ROI, ALL: 36.52%

Entry Fees, MLB: $294.25
Winnings, MLB: $286.67
ROI, MLB: -2.58%

Entry Fees, NBA: $677.59
Winnings, NBA: $791.78
ROI, NBA: 16.85%

Entry Fees, NFL: $3043.24
Winnings, NFL: $4415.40
ROI, NFL: 45.09%

Entry Fees, NHL: $10.00
Winnings, NHL: $4.50
ROI, NHL: -55.00%

Conclusion

That wraps up our introduction to plotting CSV data using Python and Plotly! This post merely scratches the surface regarding Plotly’s capabilities. If you’re looking for the source code for this post you can find it on GitHub, as well as the CSV data used to make the charts found above. Grab the code and the data, and plot away!

[Image: Mr. Burns Plotting]
