Fantasy sports come in many different flavors, and daily fantasy sports (DFS) is one of the flavors I enjoy playing. There are numerous paid tools available on the Internet for tracking your overall DFS performance. However, why use a paid tool when we can build our own DFS Return on Investment (ROI) tracking tool using Python and Plotly?
The two sites I play DFS on, DraftKings and FanDuel, offer downloadable CSV files for your entire entry history. The two sites’ CSV files are in different formats and, if you have been playing DFS for as long as I have, could be very long or even broken up into multiple files. In other words, these files are unwieldy by themselves. Python is great for quickly parsing CSV data, and Plotly is the perfect complement for graphing our results.
Initial Setup
If you have read any of my prior posts you know I am a fan of pipenv for package management. Thus, let’s set up a simple Python project, using pipenv as our packaging tool. The first step is to create a dedicated directory for our project. Open a terminal window and enter the following commands to navigate to your Desktop directory (or a different directory of your choice) and create a new dfs_tracker directory, which will be our project’s main directory:
$ cd ~/Desktop
$ mkdir dfs_tracker && cd dfs_tracker
Note: I assume you already have Python 3.11 installed as well as pipenv. For guidance with installing Python 3 read this article. If you need assistance with pipenv please reference its installation documentation.
Now, we can create our Pipfile and activate our virtual environment. Open up our project in a code editor of your choice (I suggest either PyCharm or VSCode) and create a new Pipfile:
[[source]]
name = "pypi"
url = "https://pypi.org/simple"
verify_ssl = true
[dev-packages]
black = "==22.12.0"
[packages]
plotly = "==5.11.0"
[requires]
python_version = "3.11"
The above Pipfile might appear confusing at first, so let’s go over it. Pipenv will go to "https://pypi.org/simple" for our Python packages. It will install plotly, and will also install black if the --dev flag is included in the install command, as seen a bit below. Lastly, the [requires] section tells pipenv that this project is built for Python 3.11.
Note: I always use Black for my code formatting needs. I suggest you do as well. It takes the guesswork out of code formatting, allowing developers to better use their time on more important things.
Next, go back to our terminal window, install everything via pipenv, and activate our virtualenv.
$ pipenv install --dev
$ pipenv shell
Above we are installing everything listed in the Pipfile, including the dev-packages. Black is currently our only dev-package, so if you don’t want to bother with code formatting simply run pipenv install instead. We then activate our virtualenv via pipenv shell. On to project structure we go and then code!
Project Structure
Lay out the project’s structure like so:
dfs_tracker/
│
├── csv_data/
├── dfs_tracker/
│   ├── __init__.py
│   ├── tracker.py
│   ├── contests/
│   │   ├── __init__.py
│   │   ├── entry.py
│   │   └── sports.py
│   └── dfs_data/
│       ├── __init__.py
│       ├── parse_dfs_data.py
│       └── plot_dfs_data.py
├── Pipfile
└── Pipfile.lock
First, add a dfs_tracker package to our project’s root directory. Next, create a contests package and a dfs_data package inside of our dfs_tracker package. Be mindful of the __init__.py files listed above and add them where necessary, as well as all listed Python modules. Our project setup is complete and we are ready to write some code.
Note: You will need CSV data for this project. If you don’t have any DFS CSVs of your own feel free to grab mine here.
Parsing our CSV Files
Let’s take a look at our CSV files and devise a plan for parsing their data.
The Entry Object
The command line tool we are writing in this post will consume CSV files from both DraftKings and FanDuel. Here are quick previews of each of these files (format seen below is valid as of the writing of this post), and their associated header rows:
DraftKings CSV:
FanDuel CSV:
Each row in a CSV file represents a single contest entry. An entry consists of, for our needs, a site identifier (DraftKings or FanDuel), a sport (NFL, NBA, etc.), the contest’s entry fee, any winnings received from the entry, the contest name, and finally the contest date. Our code will contain an Entry object used to track each of our contest entries. These Entry objects need to be instantiated from either a DraftKings or FanDuel CSV row. Open entry.py and implement the Entry class and a couple of additional functions:
from datetime import datetime
from decimal import Decimal
from re import sub

SITE_DRAFTKINGS = "DraftKings"
SITE_FANDUEL = "FanDuel"

DRAFTKINGS_SPORT_COLUMN = "Sport"
DRAFTKINGS_CONTEST_NAME_COLUMN = "Entry"
DRAFTKINGS_CONTEST_DATE_COLUMN = "Contest_Date_EST"
DRAFTKINGS_WINNINGS_COLUMN = "Winnings_Non_Ticket"
DRAFTKINGS_ENTRY_FEE_COLUMN = "Entry_Fee"

FANDUEL_SPORT_COLUMN = "Sport"
FANDUEL_CONTEST_NAME_COLUMN = "Title"
FANDUEL_DATE_COLUMN = "Date"
FANDUEL_ENTRY_FEE_COLUMN = "Entry ($)"
FANDUEL_WINNINGS_COLUMN = "Winnings ($)"


class Entry:
    def __init__(self, site, sport, entry_fee, winnings, contest_name, contest_date):
        self.site = site
        self.sport = sport
        self.contest_name = contest_name
        self.contest_date = contest_date
        self.entry_fee = entry_fee
        self.winnings = winnings


def get_entry_from_csv_row(csv_row):
    # Each site names its winnings column differently, so the column's
    # presence tells us which site the row came from.
    if DRAFTKINGS_WINNINGS_COLUMN in csv_row:
        return parse_draftkings_entry(csv_row)
    elif FANDUEL_WINNINGS_COLUMN in csv_row:
        return parse_fanduel_entry(csv_row)
    # Fail loudly rather than silently returning None for an unknown format.
    raise ValueError("Unrecognized CSV row: expected a DraftKings or FanDuel entry")


def parse_fanduel_entry(csv_entry_row):
    return Entry(
        site=SITE_FANDUEL,
        sport=csv_entry_row[FANDUEL_SPORT_COLUMN].upper(),
        contest_name=csv_entry_row[FANDUEL_CONTEST_NAME_COLUMN],
        contest_date=datetime.strptime(
            csv_entry_row[FANDUEL_DATE_COLUMN], "%Y/%m/%d"
        ).date(),
        entry_fee=_convert_currency_to_decimal(csv_entry_row[FANDUEL_ENTRY_FEE_COLUMN]),
        winnings=_convert_currency_to_decimal(csv_entry_row[FANDUEL_WINNINGS_COLUMN]),
    )


def parse_draftkings_entry(csv_entry_row):
    return Entry(
        site=SITE_DRAFTKINGS,
        sport=csv_entry_row[DRAFTKINGS_SPORT_COLUMN].upper(),
        contest_name=csv_entry_row[DRAFTKINGS_CONTEST_NAME_COLUMN],
        contest_date=datetime.strptime(
            csv_entry_row[DRAFTKINGS_CONTEST_DATE_COLUMN], "%Y-%m-%d %H:%M:%S"
        ).date(),
        entry_fee=_convert_currency_to_decimal(
            csv_entry_row[DRAFTKINGS_ENTRY_FEE_COLUMN]
        ),
        winnings=_convert_currency_to_decimal(csv_entry_row[DRAFTKINGS_WINNINGS_COLUMN]),
    )


def _convert_currency_to_decimal(currency_val):
    # Strip everything except digits and the decimal point, e.g. "$1,250.50" -> "1250.50".
    return Decimal(sub(r"[^\d.]", "", currency_val))
Above, you might notice we are accessing our CSV rows like a dictionary. This is because we will be reading our CSV data via Python’s DictReader.
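To see that dictionary-style access in isolation, here is a minimal sketch that feeds a tiny in-memory CSV to DictReader. The header names mirror the FanDuel constants above, but the row values (including the contest title) are made up for illustration, and to_decimal is a hypothetical stand-in for _convert_currency_to_decimal.

```python
import csv
import io
from datetime import datetime
from decimal import Decimal
from re import sub

# Hypothetical sample data: one header row and one made-up entry.
SAMPLE_CSV = (
    "Sport,Title,Date,Entry ($),Winnings ($)\n"
    'nfl,Sunday Million,2022/12/04,$5.00,"$1,250.50"\n'
)


def to_decimal(currency_val):
    # Keep only digits and the decimal point before building a Decimal.
    return Decimal(sub(r"[^\d.]", "", currency_val))


# DictReader uses the header row as keys, so each row behaves like a dict.
reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
row = next(reader)

sport = row["Sport"].upper()
contest_date = datetime.strptime(row["Date"], "%Y/%m/%d").date()
entry_fee = to_decimal(row["Entry ($)"])
winnings = to_decimal(row["Winnings ($)"])

print(sport, contest_date, entry_fee, winnings)
```

Using Decimal rather than float keeps the dollar amounts exact, which matters once thousands of small entry fees are summed.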
Reading the CSV Data
A collection of DFS entries spread across two separate sites can contain entry data for any number of sports. When reading our CSV data we want to keep track of not only all Entry objects, but also all unique sports played across all entries. Doing so will allow us to plot data not just for all of our entries in general, but for each individual sport too.
Open parse_dfs_data.py and implement a couple of functions for reading the CSV data:
import csv
import os

from dfs_tracker.contests.entry import get_entry_from_csv_row
from dfs_tracker.contests.sports import SPORTS_ALL


def parse_all_csv_files(files_dir):
    dfs_entries = []
    sports = {SPORTS_ALL}
    for file in os.listdir(files_dir):
        # Skip anything that is not a CSV file (e.g. hidden OS files).
        if not file.lower().endswith(".csv"):
            continue
        csv_file = os.path.join(files_dir, file)
        cur_dfs_entries, cur_sports = _parse_csv_file(csv_file)
        dfs_entries += cur_dfs_entries
        sports = sports | cur_sports
    return dfs_entries, sports


def _parse_csv_file(csv_file):
    cur_entries = []
    cur_sports = set()
    # The csv module recommends opening files with newline="".
    with open(csv_file, "r", newline="") as f:
        csv_reader = csv.DictReader(f)
        for dfs_entry_row in csv_reader:
            cur_entry = get_entry_from_csv_row(dfs_entry_row)
            cur_entries.append(cur_entry)
            cur_sports.add(cur_entry.sport)
    return cur_entries, cur_sports
The import of the constant SPORTS_ALL might give you some trouble as I have not mentioned it here yet. Open sports.py and define a few constant values for some of the sports you might come across in our DFS entry data:
SPORTS_ALL = "ALL"
SPORTS_MLB = "MLB"
SPORTS_NASCAR = "NASCAR"
SPORTS_NBA = "NBA"
SPORTS_NFL = "NFL"
SPORTS_NHL = "NHL"
SPORTS_PGA = "PGA"
SPORTS_WNBA = "WNBA"
Running from the Command Line
Python is “batteries included” when it comes to writing runnable command line tools. The argparse module makes it easy to write user-friendly command-line interfaces, and it is the module we are using in this post. The module tracker.py contains our command line interface implementation using argparse:
import sys
from argparse import ArgumentParser
from decimal import Decimal
from os import path

sys.path.append(path.dirname(path.dirname(path.abspath(__file__))))

from dfs_tracker.dfs_data.parse_dfs_data import parse_all_csv_files
from dfs_tracker.contests.sports import SPORTS_ALL


def track_dfs_performance(files_dir):
    dfs_entries, sports = parse_all_csv_files(files_dir)
    entry_fees = {}
    winnings = {}
    for sport in sports:
        entry_fees[sport] = 0
        winnings[sport] = 0
    for entry in dfs_entries:
        entry_fees[SPORTS_ALL] += entry.entry_fee
        entry_fees[entry.sport] += entry.entry_fee
        winnings[SPORTS_ALL] += entry.winnings
        winnings[entry.sport] += entry.winnings
    roi = {
        sport: _calculate_roi(entry_fees[sport], winnings[sport]) for sport in sports
    }
    _summarize_results(dfs_entries, sports, entry_fees, winnings, roi)


def _calculate_roi(initial_investment, final_value):
    if initial_investment == 0 and final_value == 0:
        return 0
    if initial_investment == 0:
        return "Infinity"
    net_return = final_value - initial_investment
    roi = Decimal(net_return / initial_investment * 100)
    return "{0:.2f}".format(roi)


def _summarize_results(dfs_entries, sports, entry_fees, winnings, roi):
    sports_sorted = sorted(sports)
    print(f"Entries Recorded, All Sports: {len(dfs_entries)}\n")
    for sport in sports_sorted:
        print(f"Entry Fees, {sport}: ${entry_fees[sport]}")
        print(f"Winnings, {sport}: ${winnings[sport]}")
        print(f"ROI, {sport}: {roi[sport]}%\n")


if __name__ == "__main__":
    parser = ArgumentParser()
    parser.add_argument(
        "-f",
        "--files",
        dest="files",
        help="Directory containing all of our CSV files",
        metavar="FILES",
    )
    args = parser.parse_args()
    track_dfs_performance(args.files)
Let’s start from the top. First, the function track_dfs_performance is where all of the main functionality for this module lives. This function calls our previously implemented parse_all_csv_files function, tracks all fees and winnings across all entries for all sports, and calculates the ROI for each sport. Lastly, this function summarizes our data by printing the results to the console.
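The ROI arithmetic itself is simple: net return divided by total entry fees, expressed as a percentage. Here is a small sketch of that formula, using the ALL and NHL figures from the sample summary near the end of this post. The function name calculate_roi and the choice to return None when nothing was wagered are my own for this illustration; the module above returns a formatted string or "Infinity" instead.

```python
from decimal import Decimal


def calculate_roi(entry_fees, winnings):
    """Return ROI as a percentage, or None when nothing was wagered."""
    if entry_fees == 0:
        return None
    # ROI = (winnings - entry fees) / entry fees * 100
    return (winnings - entry_fees) / entry_fees * 100


roi_all = calculate_roi(Decimal("4029.08"), Decimal("5500.35"))
roi_nhl = calculate_roi(Decimal("10.00"), Decimal("4.50"))

print(f"{roi_all:.2f}%")  # 36.52%
print(f"{roi_nhl:.2f}%")  # -55.00%
```

A positive ROI means winnings exceeded entry fees. A negative one, like the NHL figure, means more was spent than won.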
We will be invoking tracker.py as a script, thus the line if __name__ == "__main__": is included towards the bottom of our module. Any code that falls within this if statement is executed only when the module is invoked as a script. The code found in this if statement instantiates an ArgumentParser object and configures our script to take a --files parameter from the command line. This parameter represents the directory containing the CSV data we want to analyze. Lastly, track_dfs_performance is invoked. Try running our CLI from a console via the following command from our project’s root directory:
$ python dfs_tracker/tracker.py --files=csv_data/
We don’t have plotting implemented yet but we should see a summary printed to our terminal window!
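If you want to experiment with the argument parsing without touching any CSV files, the sketch below builds the same parser in a hypothetical build_parser helper and hands parse_args an explicit list in place of the real command line. The required=True and description arguments are my additions rather than part of the module above. With required=True, argparse prints a usage error when the flag is omitted instead of passing None along.

```python
from argparse import ArgumentParser


def build_parser():
    # Hypothetical helper wrapping the parser setup shown in tracker.py.
    parser = ArgumentParser(description="Track DFS return on investment")
    parser.add_argument(
        "-f",
        "--files",
        dest="files",
        required=True,  # assumption: make the directory mandatory
        help="Directory containing all of our CSV files",
        metavar="FILES",
    )
    return parser


# Passing a list to parse_args() stands in for sys.argv[1:].
args_long = build_parser().parse_args(["--files=csv_data/"])
args_short = build_parser().parse_args(["-f", "csv_data/"])

print(args_long.files, args_short.files)
```

Both the long and the short form of the flag end up in args.files, which is the value passed to track_dfs_performance.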
Note: This post won’t serve as a tutorial for Plotly. If you’re looking for a Plotly tutorial I suggest its official documentation on getting started before proceeding further.
Plotting with Plotly
The DFS contest data we are working with can contain entries for an undetermined number of sports. My DFS entry data might have entries from five different sports and yours might contain entries from just two. Our plotting implementation has to be flexible enough to handle such scenarios. We will want to make a plot for each sport, or better yet, a subplot. Open plot_dfs_data.py and implement the following:
from itertools import accumulate

import plotly.graph_objects as go
from plotly.subplots import make_subplots

from dfs_tracker.contests.sports import (
    SPORTS_ALL,
    SPORTS_MLB,
    SPORTS_NASCAR,
    SPORTS_NBA,
    SPORTS_NFL,
    SPORTS_NHL,
    SPORTS_PGA,
    SPORTS_WNBA,
)

DFS_TRACKER = "DFS Tracker"
COLOR_LOOKUP = {
    SPORTS_ALL: "navy",
    SPORTS_MLB: "red",
    SPORTS_NASCAR: "gold",
    SPORTS_NBA: "orange",
    SPORTS_NFL: "brown",
    SPORTS_NHL: "black",
    SPORTS_PGA: "green",
    SPORTS_WNBA: "purple",
}
SUBPLOT_HEIGHT = 300


def plot_dfs_results(dfs_entries, sports):
    # Sort by date so each per-sport dictionary is filled in date order.
    dfs_entries.sort(key=lambda entry: entry.contest_date)
    sports_sorted = sorted(sports)
    profit_by_day = {sport: {} for sport in sports_sorted}
    for entry in dfs_entries:
        _add_profits_for_day(entry, SPORTS_ALL, profit_by_day)
        _add_profits_for_day(entry, entry.sport, profit_by_day)
    cumulative_profits = {
        sport: list(accumulate(profit_by_day[sport].values()))
        for sport in sports_sorted
    }
    # Build up a list of our subplot text titles. The
    # Plotly figure will expect a list
    subplot_titles = []
    for sport in sports_sorted:
        subplot_titles.append(f"Cumulative Profits, {sport}")
    fig = make_subplots(rows=len(sports_sorted), cols=1, subplot_titles=subplot_titles)
    for count, sport in enumerate(sports_sorted, start=1):
        # Add each sport's trace to its own subplot row. If the sport we are
        # plotting is not found in our COLOR_LOOKUP we default to black.
        # Our x-axis is our unique dates played for the current sport,
        # and our y-axis is our cumulative profits over these dates.
        # add_trace replaces the deprecated append_trace.
        fig.add_trace(
            go.Scatter(
                x=list(profit_by_day[sport].keys()),
                y=cumulative_profits[sport],
                name=f"{DFS_TRACKER}, {sport}",
                line=dict(
                    color=COLOR_LOOKUP.get(sport, "black"),
                    width=2,
                    shape="spline",
                    smoothing=1.2,
                ),
            ),
            row=count,
            col=1,
        )
        # This line is necessary to label our y-axis as dollars or "$"
        fig.update_yaxes(tickprefix="$", tickformat=",.", row=count)
    # Wrap up with some Plotly configuration
    fig_height = SUBPLOT_HEIGHT * len(sports_sorted)
    fig.update_layout(height=fig_height, width=1000, title_text="DFS Tracker by Sport")
    fig.update_traces(mode="markers+lines")
    fig.show()


def _add_profits_for_day(entry, sport, profit_by_day):
    if entry.contest_date in profit_by_day[sport]:
        profit_by_day[sport][entry.contest_date] += entry.winnings - entry.entry_fee
    else:
        profit_by_day[sport][entry.contest_date] = entry.winnings - entry.entry_fee
The code above is doing a lot and I will quickly go over some of it. The goal is to plot our cumulative profits (or lack thereof) along our y-axis and entry dates along the x-axis. After mapping our individual profits for each sport to their specific dates via the use of dictionaries, we then use a dict comprehension and the itertools accumulate function to compute our cumulative profits by date.
cumulative_profits = {
    sport: list(accumulate(profit_by_day[sport].values()))
    for sport in sports_sorted
}
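If accumulate is new to you, this small sketch shows what it does to one sport’s daily profits. The dates and dollar amounts are made up for illustration.

```python
from datetime import date
from decimal import Decimal
from itertools import accumulate

# Hypothetical per-day net profit for one sport, already in date order.
profit_by_day = {
    date(2022, 12, 1): Decimal("-5.00"),
    date(2022, 12, 2): Decimal("12.50"),
    date(2022, 12, 4): Decimal("-3.00"),
}

# accumulate() yields running totals: each value is the sum of all
# daily profits up to and including that day.
cumulative = list(accumulate(profit_by_day.values()))

for day, total in zip(profit_by_day, cumulative):
    print(day, total)
```

The running totals here are -5.00, 7.50, and 4.50. Because dictionaries preserve insertion order, the totals line up with the dates only if the entries were inserted in date order, which is why the entries are sorted by contest date before the dictionaries are built.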
We now have all of the data we need to render our subplots. The code then enumerates over our list of sorted sports, keeping track of which row in the Plotly figure to render via the count variable. Within this loop we add each sport’s subplot to our Plotly figure. Each subplot’s x-axis holds the unique dates on which we have entries for the sport being plotted, and its y-axis holds the cumulative profits over those dates. Notice that in addition to plotting each sport we also plot the totals for ALL sports combined.
With the plotting implementation behind us we need to go back to tracker.py and make a couple of small changes. First, add the following import statement below the other import statements found in tracker.py:
from dfs_tracker.dfs_data.plot_dfs_data import plot_dfs_results
Next, add the following line to the very end of the track_dfs_performance function:
plot_dfs_results(dfs_entries, sports)
That should do it. Run the CLI one more time and we should not just see the summary but our Plotly graphs as well!
$ python dfs_tracker/tracker.py --files=csv_data/
Assuming everything went well Plotly should open a browser for you, containing similar plots:
In addition to the graphs, your console should have printed out a summary too:
Entries Recorded, All Sports: 3726
Entry Fees, ALL: $4029.08
Winnings, ALL: $5500.35
ROI, ALL: 36.52%
Entry Fees, MLB: $294.25
Winnings, MLB: $286.67
ROI, MLB: -2.58%
Entry Fees, NBA: $677.59
Winnings, NBA: $791.78
ROI, NBA: 16.85%
Entry Fees, NFL: $3043.24
Winnings, NFL: $4415.40
ROI, NFL: 45.09%
Entry Fees, NHL: $10.00
Winnings, NHL: $4.50
ROI, NHL: -55.00%
Conclusion
That wraps up our introduction to plotting CSV data using Python and Plotly! This post merely scratches the surface regarding Plotly’s capabilities. If you’re looking for the source code for this post you can find it on GitHub, as well as the CSV data used to make the charts found above. Grab the code and the data, and plot away!