December 27th, 2018
If you have ever been in a checkout line and looked at the cover of any magazine, you've no doubt seen headlines along the lines of, "Lose 5 lbs in 7 days." There are very few places you can go where you aren't exposed to advertising that pushes you to lose weight. This constant barrage, along with an increasing population size, makes "Lose Weight" one of the most commonly made New Year's resolutions.
To help people lose weight, the subreddit r/loseit organizes weight loss competitions several times a year as a way to motivate community members in their attempts at losing, maintaining, or sometimes even gaining weight. These challenges last between 6 and 10 weeks and pit teams against each other to see which can lose the most weight during the challenge. The first challenge, which starts shortly after New Year's Day, has the most signups of any challenge of the year. In anticipation of the 2019 New Year's Resolution Challenge, I wanted to go back and look at previous challenges to see how successful participants were in meeting the goals that they set for themselves.
During challenge signups, participants enter information about current and past weight-related measurements and what they would like to achieve during the competition. These goals are both related to the amount they want their weight to change, as well as information about some of the motivations they have for losing weight. After signing up, participants are randomly assigned to a team that they will be a part of for the duration of the challenge. Each week participants enter their current weight into the global spreadsheet. These spreadsheets are what I use for my analysis.
Using Reddit's API, I queried the last 1,000 posts on r/loseit that contained the keyword "loseit challenge tracker" and created a list of all Google Sheets URLs. From there I narrowed the list down to just the challenge trackers. Using the Google Sheets API, I was able to open the spreadsheets and download them into Pandas DataFrames. With the data in Python, I began cleaning it.
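As a rough illustration of the URL-collection step, here is a minimal sketch that pulls Google Sheets links out of post text with a regular expression. The function name `extract_sheet_ids` and the sample posts are made up for this example, and the Reddit and Google Sheets API calls themselves are left out; this is not the exact code used for the analysis.

```python
import re

# Google Sheets links have the form docs.google.com/spreadsheets/d/<sheet id>/...
SHEETS_URL = re.compile(
    r"https?://docs\.google\.com/spreadsheets/d/([A-Za-z0-9_-]+)"
)


def extract_sheet_ids(post_texts):
    """Return the unique spreadsheet IDs found in the posts, in first-seen order."""
    seen = {}
    for text in post_texts:
        for sheet_id in SHEETS_URL.findall(text):
            seen.setdefault(sheet_id, None)  # dict keys preserve insertion order
    return list(seen)


# Made-up post bodies standing in for the text returned by a Reddit search
posts = [
    "Week 1 tracker: https://docs.google.com/spreadsheets/d/abc123_X/edit#gid=0",
    "No link in this post",
    "Same sheet again https://docs.google.com/spreadsheets/d/abc123_X/ and "
    "another one http://docs.google.com/spreadsheets/d/zzz-999/edit",
]
sheet_ids = extract_sheet_ids(posts)
```

Deduplicating on the spreadsheet ID rather than the full URL keeps the same tracker from being downloaded twice when it is linked with different suffixes.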
I started by transforming the column names so they would be consistent across all of the challenges. Once the data was consistently organized, I went through and tried to fill in some of the missing or inconsistent entries. Some entries I was able to correct by using information the same participant had entered in a different challenge.
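To give a sense of what making the column names consistent can look like in Pandas, here is a small sketch. The header variants in `COLUMN_ALIASES` and the helper `normalize_columns` are hypothetical; the real trackers may use different headers.

```python
import pandas as pd

# Hypothetical mapping from header variants to one canonical name
COLUMN_ALIASES = {
    "username": "username",
    "reddit username": "username",
    "starting weight": "start_weight",
    "start weight": "start_weight",
    "goal weight": "goal_weight",
}


def normalize_columns(df):
    """Lower-case and trim headers, then map known variants to canonical names."""
    renamed = {}
    for col in df.columns:
        cleaned = " ".join(str(col).lower().split())  # collapse stray whitespace
        renamed[col] = COLUMN_ALIASES.get(cleaned, cleaned.replace(" ", "_"))
    return df.rename(columns=renamed)


# Two made-up trackers whose headers disagree
a = pd.DataFrame({"Reddit Username ": ["u1"], "Starting  Weight": [200.0]})
b = pd.DataFrame({"username": ["u2"], "Start Weight": [180.0]})

combined = pd.concat(
    [normalize_columns(a), normalize_columns(b)], ignore_index=True
)
```

Once every tracker shares the same column names, the challenges can be stacked into a single DataFrame with `pd.concat`.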
In the challenges I analyzed, people were dropped from the competition if they missed two consecutive weigh-ins. To account for dropout, I removed all participants who missed the final weigh-in for their challenge unless they had an entry for the prior week. In these cases, I used the prior week's weigh-in as their final weight.
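The final weigh-in rule described above can be sketched in a few lines of Pandas. The column names (`week_5`, `week_6`, `final_weight`) and the helper name are assumptions made for this example.

```python
import pandas as pd


def apply_final_weighin_rule(df, week_cols):
    """Use the last weigh-in as the final weight, falling back to the prior week.

    Participants missing both of the last two weigh-ins are dropped.
    """
    last_week, prior_week = week_cols[-1], week_cols[-2]
    final = df[last_week].fillna(df[prior_week])
    result = df.assign(final_weight=final)
    return result.dropna(subset=["final_weight"]).reset_index(drop=True)


# Made-up tracker: "b" missed the final week, "c" missed the last two weeks
tracker = pd.DataFrame({
    "username": ["a", "b", "c"],
    "week_5": [190.0, 175.0, None],
    "week_6": [188.5, None, None],
})
finished = apply_final_weighin_rule(tracker, ["week_5", "week_6"])
```

Here participant "b" keeps the week 5 weight as the final weight, while "c" is removed as a dropout.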
Some simple exploration of the features made it clear that some information had been entered into the spreadsheet incorrectly, e.g., people losing 100 lbs in a 6-week challenge. For categories like this, I tried to set reasonable upper and lower bounds for the data I used in the analysis.
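As an illustration of this kind of bounds check, the sketch below flags weight changes that are implausible for the length of a challenge. The per-week limits are placeholder values chosen for the example, not the bounds used in the analysis.

```python
def is_plausible_change(start_weight, final_weight, weeks,
                        max_weekly_loss=5.0, max_weekly_gain=3.0):
    """Return True if the total change fits within per-week limits (in lbs).

    The default limits are hypothetical placeholders.
    """
    change = final_weight - start_weight  # negative means weight was lost
    return -max_weekly_loss * weeks <= change <= max_weekly_gain * weeks


# Losing 100 lbs in a 6-week challenge is flagged as a likely data-entry error
suspicious = is_plausible_change(250.0, 150.0, weeks=6)
# Losing 8 lbs over the same period passes the check
reasonable = is_plausible_change(200.0, 192.0, weeks=6)
```

Scaling the limits by the number of weeks lets the same check work for both 6-week and 10-week challenges.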
At the time of my analysis, there had been 10 previous r/loseit challenges with a total of 76 teams.
| Challenge Name | Number of Participants Who Finished |
|---|---|
| Spring Into Summer Challenge | 1063 |
| Mythical Creatures Spring Challenge | 1022 |
| Super Mario Brothers Super Challenge | 1013 |
| Super Hero Summer Challenge | 966 |
| New New Year New Goals Challenge | 860 |
| Lord Of The Rings Summer Challenge | 858 |
| The Summer Challenge | 791 |
| Sci-Fi Movies Challenge | 744 |
| Autumn Animal Challenge | 697 |
Having participated in a few different challenges myself, I knew that many people have done multiple challenges. I started by looking at how many people have done more than one challenge, and who has participated in the most challenges. The data shows that 1539 people have participated in more than one challenge under the same username, and 4568 people have participated in only a single challenge so far. Note that this doesn't account for people who sign up using multiple accounts.
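Counting repeat participants comes down to tallying how many challenge rosters each username appears in. Here is a minimal sketch with made-up usernames; comparing names case-insensitively is an assumption of this example.

```python
from collections import Counter


def participation_counts(challenge_rosters):
    """Count the number of challenges each username appears in."""
    counts = Counter()
    for roster in challenge_rosters:
        # A set counts each username at most once per challenge
        counts.update({name.strip().lower() for name in roster})
    return counts


# Made-up rosters for three challenges
rosters = [
    ["Alice", "bob"],
    ["alice", "carol"],
    ["ALICE", "bob ", "dave"],
]
counts = participation_counts(rosters)

repeat = sorted(user for user, n in counts.items() if n > 1)
single = sorted(user for user, n in counts.items() if n == 1)
most_active, n_challenges = counts.most_common(1)[0]
```

Matching on the username alone means that someone who signs up under two different accounts is still counted as two people.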
Next, I wanted to look at the gender distribution of the participants. From the data, it is obvious that r/loseit challenges have a very large gender imbalance. Around 71% of the participants in the challenges are female, 20% are male, 7% are unknown, and just under 1% identify as other. Because the imbalance is so large, for most of my analysis I will look at how the numbers vary between genders, if they vary at all.
Looking at the age distributions, we find that the most common age is 26, the oldest participant is 76, and the youngest has been 13. Separating the distribution by gender, we see that there is not a huge difference between them.
Breaking the participants down by height, we find that the average height is 66.5 in, the shortest participant is 52.0 in, and the tallest is 82.6 in. As expected, when separating by gender we see that the average height is greater for men than it is for women.
At the beginning of the challenges, the average weight is 196.3 lbs. The highest starting weight so far has been 546.8 lbs, and the lowest has been 90.3 lbs. So really, no matter how much you have to lose (or gain), these challenges are more than happy to have you participating!
When signing up for the challenges, the typical amount that people want to lose during the challenge is 10.0 lbs. This field had a few obvious outliers that are incorrect entries, and likely some less obvious ones, so rather than the mean I report the median, which is more robust to outliers.
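A toy example shows why the median is the safer summary here. The numbers below are made up, with one obviously mis-entered goal.

```python
from statistics import mean, median

# Made-up goal losses in lbs; the last value is an obvious data-entry error
goal_losses = [8.0, 10.0, 10.0, 12.0, 15.0, 1000.0]

mean_goal = mean(goal_losses)      # pulled far upward by the single outlier
median_goal = median(goal_losses)  # stays near the typical entry
```

A single bad entry drags the mean well above every realistic goal in the list, while the median stays in the middle of the plausible values.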
At the end of the challenge, the average weight loss across all participants is 6.0 lbs. The winner for losing the most weight during a challenge lost 45.7 lbs!
In these plots, we see that there is a slight correlation between starting BMI and the amount of weight that is lost during the challenges. This relationship seems independent of gender.
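For reference, here is a sketch of how starting BMI and its correlation with weight loss could be computed. It assumes the standard imperial BMI formula (703 × weight in lbs ÷ height in inches squared) and a plain Pearson correlation; the sample numbers are invented and this is not necessarily how the plots were produced.

```python
from math import sqrt


def bmi_imperial(weight_lbs, height_in):
    """Body mass index from pounds and inches (standard imperial formula)."""
    return 703.0 * weight_lbs / height_in ** 2


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Invented participants: (starting weight in lbs, height in inches, lbs lost)
participants = [
    (150.0, 64.0, 3.0),
    (196.3, 66.5, 6.0),
    (240.0, 70.0, 5.5),
    (310.0, 68.0, 11.0),
]
start_bmis = [bmi_imperial(w, h) for w, h, _ in participants]
losses = [lost for _, _, lost in participants]
r = pearson_r(start_bmis, losses)
```

A value of `r` near 0 would mean little linear relationship, while values closer to 1 would mean that higher starting BMIs go together with larger losses.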
In addition to the physical measurements people enter when they sign up, they are also given the option of adding a non-scale victory (NSV) they would like to achieve by the end of the challenge. They can also enter links to food and activity trackers. I wanted to look at how many people provide NSVs and links to activity and food trackers.
- Has NSV: 78.82%
- Gives activity tracker: 20.78%
- Gives food tracker: 57.00%
- No Info: 12.32%
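Shares like these can be computed by checking which optional fields each signup filled in. The sketch below uses made-up signups and hypothetical field names (`nsv`, `activity_tracker`, `food_tracker`).

```python
OPTIONAL_FIELDS = ["nsv", "activity_tracker", "food_tracker"]  # hypothetical names


def has_value(row, field):
    """True if the field is present and not just empty or whitespace."""
    return bool(str(row.get(field) or "").strip())


def share_with_field(signups, field):
    """Percentage of signups that filled in the given field."""
    filled = sum(1 for row in signups if has_value(row, field))
    return 100.0 * filled / len(signups)


def share_with_no_info(signups):
    """Percentage of signups that left every optional field empty."""
    empty = sum(
        1 for row in signups
        if not any(has_value(row, f) for f in OPTIONAL_FIELDS)
    )
    return 100.0 * empty / len(signups)


# Made-up signups
signups = [
    {"nsv": "Fit into old jeans", "activity_tracker": "",
     "food_tracker": "https://example.com/diary"},
    {"nsv": "Run a 5k", "activity_tracker": "https://example.com/activity",
     "food_tracker": ""},
    {"nsv": "", "activity_tracker": "", "food_tracker": ""},
    {"nsv": "More energy", "activity_tracker": None, "food_tracker": None},
]
nsv_share = share_with_field(signups, "nsv")
no_info_share = share_with_no_info(signups)
```

Because one signup can fill in several fields, the individual shares do not need to add up to 100%.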
With so many people providing NSVs, I wanted to create a word cloud to visualize some of the most common words that people use when they describe their reasons for wanting to lose weight.
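A word cloud sizes each word by how often it appears, so the underlying step is a word-frequency count. Here is a minimal sketch of that count using only the standard library; the stop-word list and sample NSVs are made up, and the rendering of the cloud itself is not shown.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; a real one would be much longer
STOPWORDS = {"a", "an", "and", "i", "in", "into", "my", "the", "to", "want"}


def word_frequencies(texts):
    """Count lower-cased words across all texts, skipping stop words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts


# Made-up NSV entries
nsvs = [
    "Fit into my old jeans",
    "Run a 5k",
    "Fit in an airplane seat",
    "I want to run and have more energy",
]
freqs = word_frequencies(nsvs)
top_words = freqs.most_common(2)
```

Filtering out stop words first keeps filler like "the" and "to" from dominating the picture.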
From the plots below, we see that including an NSV, activity tracker, or food tracker does not have much of a correlation with weight loss. One thing I would be curious to see is whether participation in the inter-team challenges is correlated with weight loss.
In the description for r/loseit, it says, "A place for people of all sizes to discuss healthy and sustainable methods of weight loss. Whether you need to lose 2 lbs or 400 lbs, you are welcome here!" and the challenge data bears this out. While most people want to lose weight, there were a couple who wanted to gain weight and still felt like they could be a part of the challenges.