Unmasking Reddit's Snoovatars
Subreddits with more masked Snoovatars tend to be more liberal, those with fewer, more conservative
Confession: I spend far too much time on Reddit. Its former tagline, “The Front Page of the Internet”, and its current one, “Dive Into Anything”, both highlight the infotainment aspect of the website that I find endlessly entertaining.
Reddit’s mascot has long been the “Snoo”, a humanoid alien with an angler-fish-like antenna. Reddit recently began allowing personalized Snoos (called Snoovatars) with customizable skin colors, hair types, and accessories. Since the start of the pandemic, Reddit has added a mask accessory.
Given the controversy surrounding masks, much of which appears to fall along party lines, I thought it would be interesting to see how masking behavior differs across the different Subreddits.
I got ~80,000 Snoovatars tied to Redditor accounts (I will be using “Snoovatar” and “Redditor” interchangeably), and labeled them as “masked” (4,329 Snoos) or “unmasked” (77,149 Snoos). I then determined which Subreddits had the highest and lowest masking frequency among their posters, and whether Subreddits representing states that voted for Biden had more masked users than those representing states that voted for Trump.
IMPORTANT CAVEAT: Reddit is a big place! 80,000 Snoovatars/Redditors is a lot, and I think you’ll find I’m picking up some trends, but while this sample MAY be representative, the results are by NO MEANS conclusive.
Mask Concentration by Subreddit
Across the ~80,000 Snoovatars, there were posts to 147,915 Subreddits, 92% of which had posts from 19 Snoovatars or fewer, which I ignored for the Subreddit analysis. The distribution of mask wearing across the remaining 12,422 subreddits is bi-modal, with most Subreddits having no masks, while the rest is a skewed, long-tailed distribution:
Eighty-two subreddits had a masking concentration >=20%, with 23 subreddits having a masking concentration >=25%:
TABG - 33%
GamerGhazi - 30%
Leica - 30%
antimaskers - 29%
McFarlaneFigures - 29%
ResidentEvil2Remake - 29%
lupinthe3rd - 29%
PokemonSwordShield - 27%
NFCNorthMemeWar - 27%
emacs - 26%
fresno - 26%
CATHELP - 26%
AsABlackMan - 26%
signal - 26%
SoundsLikeMusic - 25%
skyrimvr - 25%
FifaMobile - 25%
FoodLosAngeles - 25%
megaconstrux - 25%
RussiaLago - 25%
CalPolyPomona - 25%
htgawm - 25%
McLounge - 25%
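The per-Subreddit masking concentration described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the `(username, subreddit)` input format, the function name, and the toy data are all assumptions made for the example. Only the 20-Snoovatar cutoff comes from the text.

```python
from collections import defaultdict

MIN_SNOOVATARS = 20  # Subreddits with 19 or fewer Snoovatars are ignored


def masking_concentration(posts, masked_users, min_snoovatars=MIN_SNOOVATARS):
    """Return {subreddit: fraction of unique posters with a masked Snoovatar}.

    posts: iterable of (username, subreddit) pairs (hypothetical input format).
    masked_users: set of usernames whose Snoovatar was labeled "masked".
    """
    posters = defaultdict(set)
    for user, subreddit in posts:
        posters[subreddit].add(user)  # count each Redditor once per Subreddit

    result = {}
    for subreddit, users in posters.items():
        if len(users) < min_snoovatars:
            continue  # too few Snoovatars to say anything useful
        result[subreddit] = len(users & masked_users) / len(users)
    return result


# Toy data: 20 unique posters in one Subreddit (one of them posts twice),
# and a second Subreddit with a single poster that gets filtered out.
posts = [(f"user{i}", "emacs") for i in range(20)]
posts += [("user0", "emacs"), ("user1", "tiny")]
masked = {"user0", "user1", "user2", "user3", "user4"}

concentrations = masking_concentration(posts, masked)
```

Counting unique posters rather than posts keeps one very active Redditor from dominating a Subreddit's score, which matches the "normalized by the number of Snoovatars" framing used throughout.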
While some of these top Subreddits made intuitive sense (antimaskers, despite its name, is a Subreddit dedicated to pro-masking), I was surprised to find such a heavy enrichment of video-game-related subreddits. I was curious to see what the masking concentration was on a smattering of Subreddits with strong opinions, ranging from a strongly pro-Trump Subreddit (TheTrumpZone) to a very liberal Subreddit (AOC), with a few religious Subreddits in the mix. Unsurprisingly, AOC had the greatest fraction of masks of these Subreddits, whereas TheTrumpZone and other conservative subreddits tended to have fewer masks than the overall Subreddit average:
This suggests that perhaps we can use the masking concentration as a proxy for how liberal/conservative a Subreddit is. To see if this trend held across other Subreddits, I plotted the masking concentrations for Subreddits centered on several contentious issues, the first being subreddits with the word “Trump” in their name:
TheTrumpZone is the only pro-Trump Trump-related Subreddit, and it’s the only Subreddit in the collection with below-average masking concentration. Next, looking at Subreddits with the word “Covid” in their name:
With the exception of covidlonghaulers, we see a similar trend, as ChurchOfCOVID is strongly anti-vaccine, and downplays COVID’s impact, whereas CovidVaccine has more pro-vaccine content, though it’s predominantly about self-reported negative reactions to the COVID vaccine. I then looked at Subreddits with the word “politic” in the title:
Again, many conservative subreddits fell well below the average mask fraction, whereas liberal subreddits tended to have a higher masking concentration. At first I was surprised to find TexasPolitics had the highest concentration, but it became clear that it is a very liberal Subreddit, and not representative of the state’s conservative bent. Finally, I plotted the subreddits with either “vacc” or “vaxx” in the title:
Again, we get a similar result, with the Subreddits having lower masking fractions (CovidVaccine and DebateVaccines) being far more vaccine-hesitant than the Subreddits with a higher Snoovatar masking fraction.
Mask Concentration by State Subreddit
I was curious to see whether “Blue” states had a higher fraction of masks than “Red” states (red or blue being defined by the margin of victory in the 2020 election, with data from the NYT). I got the percentage of masked Snoovatars by state Subreddit, normalized by the total number of Snoovatars for the state Subreddit, and plotted the result. I only did this for the contiguous states, which makes the plot easier to read. On the left is the 2020 election result, and on the right is the relative Snoovatar masking concentration by state Subreddit:
Of the 48 contiguous states, South Dakota is (by far) the most masked, and Alabama is the least (not a single mask across 53 Snoovatars). There is no discernible pattern to the masking concentration, nor is there any correlation between Biden’s margin of victory and the level of mask-wearing by Snoovatars:
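For readers who want to run this kind of check themselves, here is a minimal sketch of a correlation between margin of victory and masking concentration. The article does not say which correlation measure was used, so Pearson's r is an assumption here, and the numbers below are made-up toy values, not the real state data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: Biden margin in percentage points (negative = Trump win)
# and masked fraction for the matching state Subreddit.
biden_margin = [-25.0, -10.0, -1.0, 3.0, 16.0, 29.0]
masked_fraction = [0.05, 0.08, 0.03, 0.07, 0.04, 0.06]

r = pearson_r(biden_margin, masked_fraction)
```

A value of r near zero would correspond to the "no correlation" result reported here; values near +1 or -1 would indicate a strong linear relationship.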
Furthermore, if you break states into “Blue” states (Biden win) and “Red” states (Trump win), there’s no statistically significant difference in the mask-wearing in the two groups (even if you ignore South Dakota, the p-value is only 0.07):
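The article does not name the statistical test behind that p-value. One option that makes few assumptions about how the data are distributed is a permutation test on the difference in group means, sketched below. The function name, the permutation count, and the toy masking fractions are all hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Repeatedly shuffles the group labels and counts how often the shuffled
    difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical masked fractions for "Blue" and "Red" state Subreddits.
blue_states = [0.07, 0.05, 0.09, 0.04, 0.06, 0.08]
red_states = [0.05, 0.03, 0.06, 0.04, 0.07, 0.02]

p_value = permutation_p_value(blue_states, red_states, n_permutations=2_000)
```

With only a few dozen states per group and high variance, a test like this needs a large difference in means before the p-value drops below the usual 0.05 threshold.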
It’s worth noting that because there are only fifty states and the variance is high, the effect size would have to be large for us to detect it. The fact that TexasPolitics is a liberal subreddit also suggests that a state’s voting pattern may not be indicative of its Subreddit’s bent. To check this, I used Machine Learning (ML) to determine the political skew of each state Subreddit (note: these sample sizes are very small, so this result is highly preliminary). Briefly, I pulled 100 current, hot posts from each Subreddit, encoded their titles using Sentence-BERT, and applied a logistic regression model, trained on 200,000 labeled Republican/Democrat tweets, to the title embeddings (note: the model’s not much better than a coin-flip, with an accuracy of ~70%). I kept the titles with a class probability >=90%, and split the state Subreddits into more conservative or more liberal depending on their liberal/conservative posts. Remarkably, this created a very similar map to the actual 2020 election (middle plot vs. left plot), suggesting that most state Subreddits align with their actual state’s political ideology. Because it so closely resembles the electoral map, this ML-derived map likewise had no relationship with the masking map.
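The last step of that pipeline, going from per-title class probabilities to a label for the whole Subreddit, can be sketched as below. The embedding and logistic regression steps are left out, and the input is assumed to be a list of P(liberal) values, one per post title. The majority-vote rule and the "undetermined" fallback are my assumptions, since the exact rule for splitting the Subreddits is not spelled out above.

```python
def subreddit_lean(p_liberal, threshold=0.90):
    """Label a Subreddit from per-title probabilities of the "liberal" class.

    p_liberal: list of floats in [0, 1], one per post title (assumed to be
        the output of a binary classifier such as logistic regression).
    threshold: minimum class probability for a title to be counted at all.
    """
    # Keep only titles the classifier is confident about, in either direction.
    liberal = sum(1 for p in p_liberal if p >= threshold)
    conservative = sum(1 for p in p_liberal if (1.0 - p) >= threshold)

    if liberal > conservative:
        return "liberal"
    if conservative > liberal:
        return "conservative"
    return "undetermined"  # tie, or no confident titles at all


# Toy probabilities for three hypothetical state Subreddits.
print(subreddit_lean([0.95, 0.97, 0.50, 0.03]))  # two liberal vs. one conservative
print(subreddit_lean([0.02, 0.05, 0.99, 0.60]))  # two conservative vs. one liberal
print(subreddit_lean([0.50, 0.60, 0.45]))        # nothing passes the 90% cutoff
```

Because a ~70% accurate model is wrong on a lot of individual titles, throwing out everything below the 90% cutoff trades sample size for cleaner labels, which is worth keeping in mind when only 100 titles per Subreddit are available.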
Clearly, the distribution of masked vs. unmasked subreddits is broad, and the signal is weak. Most subreddits have very few unique posters; of those with 20 or more Snoovatars, most have zero masked posters; the highest concentration of masked posters (for common subreddits) is only 1/3; and the average mask concentration is ~6%. Because the signal is weak, it’s important to be careful about our conclusions!
However, we are able to find recurring patterns in the data. It appears that conservative Subreddits and Subreddits related to Trump, COVID disbelief, and vaccine hesitancy tend to have a smaller fraction of posters with masked Snoovatars, whereas liberal Subreddits and Subreddits related to pro-vaccine topics tend to have a larger fraction. While these patterns are not surprising, it’s important to point out that this was not an exhaustive analysis, and there are certainly other latent variables worth exploring.
Finally, there seems to be no connection between a state’s voting pattern and its state Subreddit’s masking concentration. I hypothesize that this is because the topics in a given state Subreddit are far broader than politics and COVID (e.g. they may encompass state images, state events, state tips, sports events, etc.), and therefore the trend is likely too weak to detect when the masking signal is only in the single-digit percentages. However, it IS true that we can (nearly) replicate the 2020 voting map with only 100 post titles from each Subreddit, which casts doubt on this hypothesis.