May-12-2024, 09:15 AM
Greetings Pythonistas!
I’ve got a data set in CSV format that spans July 2019 to December 2023. It is a collection of productivity intervals that I’ve meticulously recorded while doing social science research like Philosophy, and while learning Python programming, Django, some DevOps, and a little algorithmic design. For example, when I spend 25 minutes of my lunch hour on my tablet watching Udemy course content on the basics of Binary Search Trees, at the end of the break I use an app called aTimeLogger on my Android phone to enter the “Activity” type (discrete subjects such as “Python”, “algorithms”, “Django”, “sysadmin”, or even “Magick” / “writing”). I also enter the date, the start time, and the end time, and the app automatically calculates the time delta. Then I write a 1-2 sentence annotation, which is sort of like a mental note for my future reference. See the attachment for the data set. Note: I purged the “Comment” (annotation) string field from the data set for the sake of this forum thread.
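Because the app derives the time delta from the start and end times, the Duration column should always equal To minus From. Here is a minimal sketch of that consistency check in pandas. It assumes the From, To and Duration columns shown in the `bulk_df.info()` output, and it uses two made-up rows instead of the attached file.

```python
import pandas as pd

# Two made-up rows in the same shape as the export (assumption: From/To are
# timestamps and Duration is an "HH:MM:SS" string before conversion).
log = pd.DataFrame({
    "Activity": ["algorithms", "Python"],
    "From": pd.to_datetime(["2023-11-06 12:05", "2023-11-06 19:30"]),
    "To": pd.to_datetime(["2023-11-06 12:30", "2023-11-06 21:00"]),
    "Duration": ["00:25:00", "01:30:00"],
})

# Same conversion as in the original snippet.
log["Duration"] = pd.to_timedelta(log["Duration"])

# Recompute each delta from the start/end times and flag any disagreement.
recomputed = log["To"] - log["From"]
mismatches = log[recomputed != log["Duration"]]
print(f"{len(mismatches)} entries disagree with To - From")
```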
For additional context, over the term of the data set, I’ve spent a total of ~378 hours doing something “Python” related and ~579 hours working on Django course content (or a Django based web project). Those are the two largest categories. The rest of the Activities aren’t as data dense.
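You can get per-activity totals like these by grouping on Activity and summing the timedeltas. This sketch uses a few made-up rows. It assumes the same Activity and Duration columns as `bulk_df` after `pd.to_timedelta`.

```python
import pandas as pd

# Made-up sample rows (assumption: same columns as bulk_df after conversion).
bulk_df = pd.DataFrame({
    "Activity": ["Python", "Django", "Python", "Django", "algorithms"],
    "Duration": pd.to_timedelta(
        ["00:25:00", "01:30:00", "02:05:00", "00:30:00", "00:45:00"]
    ),
})

# Sum the timedeltas per activity, convert each total to hours,
# and list the largest categories first.
totals_hours = (
    bulk_df.groupby("Activity")["Duration"].sum().dt.total_seconds() / 3600
).sort_values(ascending=False)

print(totals_hours.round(1))
```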
I am trying to plot these activities as line graphs over time. The x-axis is continuous time, and the y-axis is the Duration time delta of each entry. If I take just the ‘Python’ activity and plot every instance as a line graph over the ~5-year span, the data points are noisy, busy, and hard to follow, but they still plot successfully. Here is my initialization of the data:
```python
import pandas as pd
import matplotlib.pyplot as plt

bulk_df = pd.read_csv("data/all-comments-removed.csv", parse_dates=["From", "To"])
bulk_df['Duration'] = pd.to_timedelta(bulk_df['Duration'])
```

Here is some general info about my dataset:
```python
bulk_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2583 entries, 0 to 2582
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Activity  2583 non-null   object
 1   Duration  2583 non-null   timedelta64[ns]
 2   From      2583 non-null   datetime64[ns]
 3   To        2583 non-null   datetime64[ns]
dtypes: datetime64[ns](2), object(1), timedelta64[ns](1)
memory usage: 80.8+ KB
```

Here is my ‘Python’ activity data parsed:
```python
# Create a DataFrame
python = bulk_df[bulk_df["Activity"] == "Python"]  # .rolling(window=90).mean()

# Plotting
python.plot(x='From', y='Duration', kind='line', marker='o', figsize=(14, 8))
```

Here is the output so far:
That kind of works.
But what I really want to do is use pandas’ rolling() method and pass in integers like 182 days for a half year and 90 days for three months (a quarter year). I tried that, and you can see my .rolling() attempt in the code snippet above. My Jupyter Notebook didn’t like it, so I commented it out. I am clearly doing something wrong.

Here is a separate attempt from a different stage of my coding session this morning. I tried to plot the original noisy “Python” data points and contrast them with a rolling window on the same chart:
```python
# Convert timedelta to hours
bulk_df['Duration_hours'] = bulk_df['Duration'].dt.total_seconds() / 3600

# Create a DataFrame
python = bulk_df[bulk_df["Activity"] == "Python"]

# Plotting
# python.plot(x='From', y='Duration_hours', kind='line', marker='o')

# Calculate the rolling mean with a window of 2
rolling_mean = python['Duration_hours'].rolling(window=2).mean()

# Plotting
python.plot(x='From', y='Duration_hours', kind='line', marker='o', figsize=(14, 8))
rolling_mean.plot(x='From', y='Duration_hours', kind='line', marker='o', label='Rolling Mean')
```

Here was the strange output:
How do I tell pandas to show the quarterly and half-year averages of the timedelta data points for the “Python” activity? Thanks!
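Here is a minimal sketch of one approach. It assumes the column names from `bulk_df` above and uses a handful of made-up rows in place of the attached CSV. Two points seem likely to matter:

- `rolling(window=90)` counts 90 rows (log entries), not 90 days. An offset string such as `"90D"` on a sorted DatetimeIndex counts calendar days instead.
- The commented-out `.rolling()` call probably failed because the filtered DataFrame still carries the text Activity column and the timedelta Duration column. Converting Duration to float hours first avoids that.

The sketch collapses entries into daily totals before taking rolling means. It then uses `resample()` to get non-overlapping quarterly and half-year averages.

```python
import pandas as pd

# Made-up rows standing in for the attached CSV (assumption: same column
# names as bulk_df, with Duration already converted by pd.to_timedelta).
bulk_df = pd.DataFrame({
    "Activity": ["Python", "Django", "Python", "Python", "Python"],
    "From": pd.to_datetime([
        "2019-07-01 12:00", "2019-07-02 12:00", "2019-08-15 12:00",
        "2019-10-03 12:00", "2020-01-20 12:00",
    ]),
    "Duration": pd.to_timedelta(
        ["00:25:00", "01:00:00", "00:50:00", "02:00:00", "01:30:00"]
    ),
})

# Keep only the Python entries and index them by start time, so rolling()
# and resample() work with real dates instead of row numbers.
python = (
    bulk_df.loc[bulk_df["Activity"] == "Python"]
    .set_index("From")
    .sort_index()
)

# rolling() is not implemented for timedelta64 values, so work in float hours.
hours = python["Duration"].dt.total_seconds() / 3600

# One total per calendar day; days with no Python entries become 0.0.
daily = hours.resample("D").sum()

# Time-based windows: "90D" spans 90 calendar days of the index,
# whereas window=90 would span 90 rows.
rolling_quarter = daily.rolling("90D").mean()
rolling_half = daily.rolling("182D").mean()

# Non-overlapping bins: average hours per logged session in each period.
# "QS" bins start on calendar quarters (Jan/Apr/Jul/Oct). "6MS" bins start at
# the month of the first entry and step forward six months at a time.
quarterly_mean = hours.resample("QS").mean()
half_year_mean = hours.resample("6MS").mean()

# Swap .mean() for .sum() to get total hours per quarter or half year instead.
print(quarterly_mean)
print(half_year_mean)
```

To draw the daily totals and a rolling mean on one chart, pass the same axes to both calls. For example, call `ax = daily.plot(figsize=(14, 8))`, then `rolling_quarter.plot(ax=ax, label='90-day mean')` and `ax.legend()`.

In the second snippet above, `rolling_mean` keeps the integer row index of `python` rather than the From dates. That is one likely reason the plot looked strange. Setting From as the index before calling `rolling()` keeps the x values as dates.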
Attached file: all-comments-removed.csv (144.36 KB)