In January 2021, GameStop shares traded on the New York Stock Exchange experienced a classic ‘short squeeze’ [1,2]. As the price jumped sharply higher, traders who had bet that the price would fall (i.e. who ‘shorted’ the stock) were forced to buy it in order to prevent even greater losses, thus further promoting the price rally [2–4]. Victims of the squeeze were professional hedge funds, particularly Melvin Capital Management, which lost 53% of its investments, for an estimated total of 4.5 billion USD. The short squeeze was initially and primarily triggered by users of the subreddit r/wallstreetbets (
These events garnered huge attention from the media, professionals and financial authorities. Notably, the US Treasury Secretary Janet Yellen convened a meeting of financial regulators, including the heads of the Securities and Exchange Commission, Federal Reserve, Federal Reserve Bank of New York and the Commodity Futures Trading Commission, to examine the GameStop squeeze. Cindicator Capital, a fund specialized in digital assets, published a hiring call for a sentiment trader (i.e. a trader trying to gain an advantage by reading the signals about how other investors are feeling about a particular stock) with 3 years of active trading experience who had been a member of WallStreetBets for more than a year with karma (a Reddit measure of ‘how much good the user has done’ for the community) of more than 1000. Finally, the House Committee on Financial Services of the US Congress held a hearing titled Game Stopped? Who Wins and Loses When Short Sellers, Social Media, and Retail Investors Collide to discuss the events. They called as witness Reddit user Keith Gill, known as
In this paper, we analyse discussions on
|a||8 Dec 2020|
|b||new board||11 Jan 2021|
|c||Citron prediction||19 Jan 2021||Citron Research, a popular stock commentary website, published a piece predicting the value of the
|d||Elon Musk’s tweet||26 Jan 2021||Business magnate Elon Musk tweeted ‘Gamestonk!!’ along with a link to
We show that sustained commitment activity systematically pre-dates the increase in GameStop share returns, while simple measures of public attention towards the phenomenon cannot predict the share increase. We also show that the success of the squeeze operation leads to a growth of the social identity of
1.1. The GameStop saga
By the end of January 2021, Melvin Capital, which had heavily shorted GameStop, declared that it had covered its short position (i.e. closed it by buying the underlying stock). As a result, it had lost 30% of its value since the start of 2021 and suffered a loss of 53% of its investments, i.e. more than 4 billion USD.
Reddit is a public discussion website structured in an ever-growing set of independent subreddits dedicated to a broad range of topics. Users can submit new posts to any subreddit, and other users can add comments to existing posts or comments, thus creating nested conversation threads. One such subreddit is
The topics of discussion on the forum are varied, but there are some common patterns of behaviour, which are also described in the FAQ. When submitting a post, a user can apply a category tag called ‘flair’, which serves as an indication of its content. The allowed flairs, together with a short description, are reported in table 2. The community takes flairs seriously and strictly enforces them (e.g. the FAQ reports that misusing important flairs can lead to a permanent ban). It is thus very common to find posts containing screenshots of an open position on a risky bet tagged with a YOLO flair, all interspersed with unhinged humorous posts and memes.
|YOLO (You Only Live Once)||YOLO flair is for dank trades only. The minimum value at risk must be at least $10 000 in options, or $25 000 in equity.|
|DD (due diligence)||The research you have done on a specific company/sector/trade idea. This is a high effort text post. It should include sources and citations. It should be a long post and not just a link to a submission.|
|discussion||An idea or article that you would like to talk about. Needs to be more involved than ‘up or down today?’|
|gain||Use this flair to show off a solid winning trade. Minimum gain is $2500 for options, $10 000 for shares. You must show or explain your trade. If you have to say something like ‘position in comments’ then it’s a bad screenshot.|
|loss||Show off a brutal, crushing loss. Minimum loss $2500 for options, $10 000 for shares. You must show or explain your trade. If you have to say something like ‘position in comments’ then it’s a bad screenshot.|
The discussion within the subreddit follows a simple post-comment dynamic, where each post separately grows its multi-level comment tree. Each interaction, be it a post or a comment, can additionally receive ‘upvotes’ and ‘downvotes’. While ‘upvoting’ or ‘downvoting’ represents a typical ‘slacktivist’ practice for anonymously expressing one’s position, other users can also choose to ‘award’ prizes to more emphatically recognize a post or comment.
2.1. Collective attention, commitment and identity on WSB
In this work, we mark as commitment events the posts in which the authors provided proof of their financial stake in the GameStop stock. Commitment is established by including specific flair categories (YOLO, gain and loss) and by using computer vision techniques to identify relevant screenshots taken from any online trading applications and posted on
Figure 1 compares the daily returns of
The posting activity in the
), the activity grows noticeably. Posting activity rises exponentially after the second price spike on 26 January (
) and it culminates on 28 January, 2 days after the stock valuation reached its maximum. Public attention to the
The evolution of commitment over time differs considerably from the growth of collective attention. Figure 1b shows the number of daily commitment events measured by counting ‘gain’, ‘loss’, ‘YOLO’ posts (i.e. posts with one of these flairs) and the screenshots that
The presence of commitment in the absence of returns raises the question of whether commitment was supported by other processes endogenous to the
2.2. Commitment and reach of core versus peripheral authors
The sustained flow of commitment events during the weeks preceding the stock price surge indicates the presence of a minority of committed users. As the interaction between the committed minority and the rest of the community is crucial to the success of a collective action [10,13,14], we study the dynamics of social interactions between committed individuals and other
We quantify this structural change by reconstructing networks over a rolling time window of 7 days, and looking at the evolution of two key topological quantities of these networks in time. First, we observe that the heterogeneity of the distribution of the nodes’ out-degree (i.e. the number of different users each user replies to) increases threefold in the span of 20 days after event c, thus reflecting the simultaneous emergence of super-hubs of discussion together with users engaging only in isolated interactions. Second, the direct reciprocity of interaction (i.e. the fraction of replies that are reciprocated within the time window considered) gets roughly halved in the same time span (figure 2c). This signal, combined with the increase in expressions of group identity (figure 1c), is compatible with the emergence of generalized reciprocity, a norm according to which individual messages are not expected to receive direct responses; comments are not perceived as pieces of a conversation but rather as contributions to a collective discussion from which everyone benefits. Indeed, the increase in size of the discussion naturally leads to a more fragmented conversation where direct replies are less frequent, and thus to a decrease in reciprocity.
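As an illustration of the two quantities, the following minimal sketch computes them on a toy list of directed reply edges. The heterogeneity estimator is not spelled out in this passage, so the ratio ⟨k²⟩/⟨k⟩² used below is an assumption, as are the function names and the edge-list representation (each edge `(u, v)` meaning "u replied to v").

```python
from collections import defaultdict


def out_degree_heterogeneity(edges):
    """Assumed estimator: <k^2> / <k>^2, where k is the number of
    distinct users each user replies to (hypothetical choice)."""
    targets = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        targets[src].add(dst)
        nodes.update((src, dst))
    degrees = [len(targets[n]) for n in nodes]
    mean = sum(degrees) / len(degrees)
    if mean == 0:
        return 0.0
    mean_sq = sum(d * d for d in degrees) / len(degrees)
    return mean_sq / (mean * mean)


def direct_reciprocity(edges):
    """Fraction of distinct directed pairs (u, v) in the window for
    which the reverse pair (v, u) is also present."""
    pairs = {(u, v) for u, v in edges if u != v}
    if not pairs:
        return 0.0
    reciprocated = sum(1 for (u, v) in pairs if (v, u) in pairs)
    return reciprocated / len(pairs)


# Toy window: a and b reply to each other, a also replies to c.
toy_edges = [("a", "b"), ("b", "a"), ("a", "c")]
heterogeneity = out_degree_heterogeneity(toy_edges)
reciprocity = direct_reciprocity(toy_edges)
```

In this toy window two of the three directed pairs are reciprocated, so the reciprocity is 2/3; a growing number of one-off replies to hubs would push this value down while raising the heterogeneity.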
The complex and dynamic nature of the social network raises the question of what is the typical position of committed users in the network, and whether this position changes over time. To answer this question, we operationalize the notion of network position with the concept of Dk,l–core shell [28,29]: the set of nodes in which every node is connected with other members of the set with at least k outgoing links and l incoming ones. This measure is a good indicator of a node’s centrality because it directly gauges embeddedness (the density of connections around it), and it is a good proxy for reachability (how quickly it can be reached from any other node of the network).
For each temporal slice of the network, we perform its Dk,l-core decomposition (see Methods), and we measure the level of commitment exhibited in each Dk,l=k-core shell. Borrowing from previous work, we estimate the potential influence that commitment events have on the community at large by measuring not only the volume of commitment events in a shell but also the number of people that these events reach, namely the number of
Figure 3 shows how commitment activity and its reach are distributed between users in the core of the network (high D-core shells) and peripheral users (low D-core shells), as a function of time. First, in figure 3a,b, we show the fraction of commitment activity and reach that are generated by nodes belonging to an increasingly large number of D-core shells, taken from the core to the periphery. To disentangle the effect of the network’s evolution from the distribution of commitment and reach on the network, and meaningfully compare commitment distribution over networks reconstructed in different periods, we contrast the observed commitment activity and its reach with a null model benchmark which preserves the network’s topology but randomizes the distribution of commitment over it (see Methods). In figure 3a,b, the curve being higher (lower) than the benchmark indicates that commitment volume or reach are generated predominantly by nodes in the core (periphery) of the network. For example, in the network of interactions between 19 December and 26 December, central nodes are those who pledge more commitment to the
To get a comprehensive picture of the coreness of committed users over time, we measure the difference between the area below the observed curve and the area below the benchmark curve at a given temporal slice, for all the slices computed on networks reconstructed by using a rolling time window of 7 days. Results are robust to the slicing strategy chosen for constructing the networks (see electronic supplementary material, §A.6, figures S7 and S8), to the size of the slices (see electronic supplementary material, §A.7, figures S9 and S10), and to D-core selection (see electronic supplementary material, §A.8, figures S11 and S12). Figure 3c shows the value of such difference as a function of time. Relative to the benchmark model, both commitment and reach are concentrated in the network’s core until event c (19 January). From that moment onward, the commitment activity obtains a larger reach within the periphery. While always remaining more concentrated in the core, the commitment activity spreads more and more towards the periphery following event c. Therefore, the committed minority which may have triggered the first price increase in the
In this paper, we showed that the collective action that originated on Reddit and culminated in the successful short squeeze of GameStop shares was driven by a small number of committed individuals. We operationalized financial commitment on Reddit as providing proof of stakes in a given asset, often in the form of a screenshot. We then showed that events of commitment pre-dated the initial surge in price, which in turn attracted more participants to the GameStop discussion and thus triggered new events of commitment. Finally, we described how the initial committed users were part of the core of the network of Reddit conversations, and how the social identity of the broader group of Reddit users grew as the collective action unfolded.
Our study focused on a single, unprecedented event of financial collective action. While this is certainly a limitation, as more events would allow us to corroborate or falsify our findings, a prompt investigation of the GameStop events was in order. The main contribution of our paper is framing a novel collective coordination phenomenon within a well-established sociological framework. While we do not explicitly validate any hypothesis, we discard other possibly competing explanations, such as social identity playing a role in the coordination dynamics.
To this aim, we leverage well-established methods from machine learning and network science to introduce a novel approach to studying financial coordination on social media. Specifically: (i) we operationalize financial commitment of Reddit users by classifying their posts (we produce a new ground truth and train a new classifier to do so) and (ii) we analyse the position of committed users in the discussion network and introduce a new network null model to assess fairly any core-periphery shift over time. While the way in which we classify commitment is specific to an individual event, the methods we propose generalize to any platform in which commitment and social interactions can be measured.
The events that unfolded over the course of the few weeks that we analysed in our study caused sustained effects on the market. Seven months later (at the time of writing this manuscript), the value of GameStop stock had risen by 1000% compared with the beginning of 2021. The price increase inflicted enormous financial losses on multiple hedge funds, one of which was forced to shut down.
The influence of retail investors in equity markets is rapidly growing, and now accounts for almost as much volume as hedge and mutual funds combined. This rise has been mainly driven by the emergence of commission-free trading platforms that offer the possibility to trade fractions of shares, so that users can start trading even with very small amounts. Moreover, these platforms allow investors to use leverage, by buying and selling options and accessing cheap margin loans from brokerages, in a gamified user experience. This ‘democratization of trading and investing’ is unlikely to disappear any time soon, so other financial collective actions might be coordinated in the future, possibly through different social media channels.
In this perspective, beyond the role of committed individuals in promoting the coordinated action, our findings have other potential implications to be tested in future research. (i) The fact that initial committed individuals were part of the core of the Reddit discussions implies that the system may be resilient against adversarial attacks where freshly created ‘committed’ bots try to influence the community. (ii) The finding that identity was not the driver of the collective action but, on the contrary, a by-product of it may imply that successive actions that leverage it might be easier to coordinate. (iii) The change in network structure ensuing from the arrival of new users, who joined the discussion motivated by the initial success of the squeeze, and the corresponding shift of the bulk of commitment and reach from the core to the periphery of the network, highlights the role of the system’s openness and the hierarchies that catalyse a successful collective action.
Taken together, our findings highlight that financial collective action cannot be reduced to the impact of social coordination on financial markets. The effect—and, particularly, the success—of an action have profound consequences on the membership, structure and dynamics of the original group, whose evolution may have in its turn consequences on future actions. Thus, the initial committed individuals trigger a behavioural cascade which is self-sustaining and transforms the group itself. More events and data are needed to clarify this interplay between bottom-up processes of social coordination and financial markets, and this is a direction for future work. Our results represent a first step in this direction, and we anticipate that, as financial collective action is expected to acquire even more importance in the future, they will be of interest to researchers, industry professionals and regulators.
We used two main sources of data: the activity on the subreddit r/wallstreetbets and the price of GameStop shares, ticker
Reddit is organized in communities, called subreddits, that share a common topic and a specific set of rules. Users subscribe to subreddits, which contribute new posts to the user’s news feed (their home). Inside each subreddit, a user can publish posts (also called ‘submissions’), or comment on other posts and comments, thus creating trees of discussion that grow over time. Users can attach flairs to posts: a set of community-defined tags that define the semantic scope of the post, thus facilitating content search and filtering. Users can assign awards to posts or comments to recognize their value. Awards are sold by Reddit for money; they come in a variety of types, and some of them reward the recipient with money or perks such as access to exclusive subreddits.
We collected all posts and comments submitted to the r/wallstreetbets subreddit from 1 January 2016 up to the beginning of February 2021. We did so by querying the Pushshift API, which stores all Reddit activity over time, using the PMAW wrapper (see electronic supplementary material, §A.1 for more details). The API returns rich metadata, including the timestamp of submission, the identity of the authors, the text content, and the awards each submission and comment received. In total, we retrieved 1 132 897 posts and 29 566 180 comments submitted to the subreddit by 1 364 080 different authors. We restrict the analysis to posts related to GME by searching for posts containing, either in the title or in the text body, the word ‘GME’ or ‘Gamestop’ (lowercase occurrences included), together with all the comment trees associated with those submissions. This selected set consists of 129 731 posts and 2 575 742 comments. The period on which our study focuses (from 27 November 2020 to 3 February 2021) includes 99% of the posts and 98% of the comments submitted between 1 January 2016 and 3 February 2021.
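The keyword selection described above can be sketched as follows. This is a minimal illustration, not the code used for the study: the field names `title` and `selftext`, the whole-word matching rule, and the helper name `is_gme_post` are assumptions.

```python
import re

# Case-insensitive whole-word match for the two keywords (assumed rule).
GME_PATTERN = re.compile(r"\b(gme|gamestop)\b", re.IGNORECASE)


def is_gme_post(post):
    """Return True if the title or the text body mentions GME or GameStop.

    `post` is a dict with hypothetical keys 'title' and 'selftext';
    missing or None values are treated as empty strings.
    """
    title = post.get("title") or ""
    body = post.get("selftext") or ""
    return bool(GME_PATTERN.search(f"{title} {body}"))


toy_posts = [
    {"title": "GME to the moon", "selftext": ""},
    {"title": "Daily thread", "selftext": "I bought gamestop today"},
    {"title": "AMC discussion", "selftext": None},
]
selected = [p for p in toy_posts if is_gme_post(p)]
```

The comment trees attached to the selected submissions would then be retrieved by submission identifier, a step not shown here.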
We retrieved GameStop daily prices from Yahoo Finance, using the Python library yfinance, and computed the daily price return as the daily relative change, r(t) = (p(t)/p(t − 1)) − 1, where p(t) is the Open price at day t.
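As a worked example of the return definition, the sketch below applies r(t) = p(t)/p(t − 1) − 1 to a short list of opening prices. The prices are made up for illustration and the function name is hypothetical; downloading the actual series is not shown.

```python
def daily_returns(open_prices):
    """Relative day-over-day change of the Open price.

    Returns one value per day starting from the second day,
    r(t) = p(t) / p(t - 1) - 1.
    """
    return [curr / prev - 1.0 for prev, curr in zip(open_prices, open_prices[1:])]


# Illustrative prices only (not real GameStop data).
toy_open = [10.0, 12.0, 9.0]
toy_returns = daily_returns(toy_open)  # a rise of 20%, then a fall of 25%
```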
4.2. Quantifying commitment
One of the widely shared norms in the
We use three flairs to mark posts containing a proof of position: the gain and loss flairs mark gains or losses for a minimum of 2500 USD, and the YOLO flair indicates investment positions with a minimum value at risk of 10 000 USD. Flair-tagged submissions are moderated and are approved only if a relevant screenshot is attached. While it is mandatory for users to attach investment screenshots to have flairs approved, they can also attach screenshots to their submissions without using any flair.
As we are interested in capturing any signal of commitment, regardless of its magnitude, we resort to machine vision to identify commitment screenshots based on their visual content only. We retrieve all the screenshots attached to any of the submissions in our dataset by querying all URLs terminating with common image extensions (e.g. .png, .jpg). Out of this set, we randomly sample 3745 images and manually inspect them. We mark as positive all the screenshots which display gains, losses or orders, and as negative all the remaining images, which include a broad variety of content ranging from screenshots of stock prices to memes.
We label 1042 positive examples and 2703 negative examples. We use this set of labelled images to train a supervised model. Among several off-the-shelf classifiers that we test, the most accurate is a PyTorch implementation of DenseNet, a deep neural network architecture designed for image classification. We initialize DenseNet with weights pre-trained on ImageNet, a widely used reference dataset of 1.2 million labelled images. We then fine-tune the neural network (i.e. update its weights) by training it further on 70% of our labelled images. During fine-tuning, we use the Adam optimizer to minimize cross-entropy loss. We then measure the classifier’s performance on the remaining 30% of the examples by using precision (the fraction of pictures that the classifier labelled as positive that are actually positive), recall (the fraction of positive pictures that the classifier labelled correctly) and F1 score (the harmonic mean of precision and recall). On our validation set, the classifier achieves a precision of 0.85, a recall of 0.73 and an F1 score of 0.77.
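For reference, the three evaluation metrics can be computed from binary labels as in the sketch below. The function name and the toy labels are illustrative assumptions; the sketch is not meant to reproduce the scores reported above.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = commitment screenshot)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)          # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)      # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)      # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy held-out labels: two hits, one miss, one false alarm.
toy_true = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
toy_scores = precision_recall_f1(toy_true, toy_pred)
```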
We run the classifier on all images from
|event type||count||unique count||authors||unique authors|
|YOLO||23 230||21 455||20 107||18 484|
4.3. Quantifying identity
To capture linguistic expression of identity, we use two methods. First, we resort to a simple word count approach using Linguistic Inquiry and Word Count (LIWC). LIWC is a lexicon of words grouped into categories that reflect social processes, emotions and basic functions. It is based on the premise that the words people use provide clues to their psychological states. In particular, the abundant use of words in the LIWC category we (i.e. first-person plural subject pronoun) relative to the use of words from the LIWC category I (i.e. first-person singular subject pronoun) is a validated indicator of group identity. Therefore, we measure identity as the ratio of the number of we pronouns to the total number of we and I pronouns occurring in each submission text body. The results obtained with this particular estimator of identity are robust when compared with two alternative methods, which we discuss in the electronic supplementary material.
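The identity ratio can be illustrated with a short sketch. The LIWC lexicon itself is not reproduced here: the two one-word sets below are simplified stand-ins for the LIWC categories we and I, and the tokenizer and function name are likewise assumptions.

```python
import re

# Simplified stand-ins for the LIWC 'we' and 'I' categories (assumption).
WE_WORDS = {"we"}
I_WORDS = {"i"}


def identity_score(text):
    """Fraction of 'we' pronouns among all 'we' and 'I' pronouns.

    Returns None when the text contains neither pronoun, since the
    ratio is undefined in that case.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    n_we = sum(1 for tok in tokens if tok in WE_WORDS)
    n_i = sum(1 for tok in tokens if tok in I_WORDS)
    if n_we + n_i == 0:
        return None
    return n_we / (n_we + n_i)


toy_score = identity_score("We like the stock. I am holding, we are holding.")
```

Values close to 1 indicate that a text speaks predominantly in the first-person plural, and values close to 0 that it speaks predominantly in the first-person singular.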
4.4. Discussion network on WSB
We reconstructed the network of social interactions on
The time at which posts and comments are published can be used to obtain a description of social interaction dynamics. We modelled these dynamics through temporal slicing. In particular, we considered a rolling time window of 7 days and shifted it by 2 hours throughout the whole timespan of our dataset, for a total of 1092 windows. For each time window, we constructed a network using posts and comments published during that time window. We tested alternative temporal slicing strategies, discussed in electronic supplementary material, §A.6, as well as different window sizes for building the networks, discussed in electronic supplementary material, §A.7.
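The temporal slicing can be sketched as follows. The helper names, the half-open window convention and the toy dates are assumptions made for illustration; the exact boundaries used to obtain the window count reported above are not specified here.

```python
from datetime import datetime, timedelta


def rolling_windows(start, end, width=timedelta(days=7), step=timedelta(hours=2)):
    """List of (left, right) windows of fixed width, shifted by `step`,
    that fit entirely between `start` and `end`."""
    windows = []
    left = start
    while left + width <= end:
        windows.append((left, left + width))
        left += step
    return windows


def slice_events(events, window):
    """Keep the (timestamp, payload) events falling in [left, right)."""
    left, right = window
    return [event for event in events if left <= event[0] < right]


# Toy timespan of 8 days: the 7-day window can be shifted over 24 hours.
toy_windows = rolling_windows(datetime(2021, 1, 1), datetime(2021, 1, 9))
toy_events = [(datetime(2021, 1, 1, 12), "post"), (datetime(2021, 1, 8, 12), "comment")]
first_slice = slice_events(toy_events, toy_windows[0])
```

Each slice would then be turned into a directed reply network, one per window.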
For each slice, we characterized nodes with a number of features, including their age (the time elapsed since their first interaction within the community), in- or out-degree (number of incoming or outgoing edges), their commitment (number of commitment events) and the reach of their commitment (number of users who comment on their commitment events). We also ran Dk,l-core decomposition [28,41] on the network of each temporal slice. The algorithm partitions nodes by their core shell (or core number), i.e. the shell k, l, defined as the maximal subgraph in which every vertex has at least out-degree k and in-degree l. The Dk,l-core decomposition algorithm takes edge directionality into account, but also provides a wider space of cores to explore. For this reason, in the main manuscript, we explore the case in which Dk,l=k, while in the electronic supplementary material, we complement the analysis by providing the results for Dk,0, Dk,1, D0,l and D1,l (see electronic supplementary material, §A.8).
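A minimal sketch of the decomposition, assuming a plain edge list as input: the (k, l)-core is obtained by repeatedly removing nodes whose out-degree falls below k or whose in-degree falls below l, and the diagonal core number of a node is the largest k for which it survives in the (k, k)-core. The function names and the toy graph are hypothetical, and degrees count distinct neighbours.

```python
def d_core(edges, k, l):
    """Nodes of the maximal subgraph in which every node has
    out-degree >= k and in-degree >= l (self-loops ignored)."""
    succ, pred = {}, {}
    for u, v in edges:
        if u == v:
            continue
        succ.setdefault(u, set()).add(v)
        succ.setdefault(v, set())
        pred.setdefault(v, set()).add(u)
        pred.setdefault(u, set())
    alive = set(succ)
    queue = [n for n in alive if len(succ[n]) < k or len(pred[n]) < l]
    while queue:
        n = queue.pop()
        if n not in alive:
            continue
        alive.discard(n)
        # Removing n lowers the in-degree of its successors ...
        for m in succ[n]:
            pred[m].discard(n)
            if m in alive and len(pred[m]) < l:
                queue.append(m)
        # ... and the out-degree of its predecessors.
        for m in pred[n]:
            succ[m].discard(n)
            if m in alive and len(succ[m]) < k:
                queue.append(m)
    return alive


def diagonal_core_numbers(edges):
    """Largest k such that each node belongs to the (k, k)-core."""
    numbers = {}
    k = 0
    while True:
        core = d_core(edges, k, k)
        if not core:
            return numbers
        for n in core:
            numbers[n] = k
        k += 1


# Toy graph: a directed triangle plus one peripheral user replying to it.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
toy_cores = diagonal_core_numbers(toy_edges)
```

In the toy graph the three users in the triangle receive core number 1, while the peripheral user, who receives no replies, stays in shell 0.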
4.4.1. Null model for random commitment activity
When computing the commitment of nodes as a function of their core number, to assess whether committed users are more central or peripheral in the network, it is important to compare with a null model which takes into account the network’s topology. For this reason, we consider a null model of random commitment in which commitment events are reshuffled randomly over the whole network, while the network’s structure is preserved. The empirical commitment of nodes with core number k is then compared with a uniform distribution of commitment across nodes, which is equivalent to averaging the results over an infinite number of random shuffles.
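The comparison with the null model can be sketched as follows. Under a uniform distribution of commitment across nodes, the expected share of commitment held by a set of shells equals the share of nodes in those shells, which gives the benchmark curve. The function names, the toy inputs and the normalization of the area difference by the number of shells are assumptions made for illustration.

```python
def cumulative_curves(core_number, commitment):
    """Cumulative share of commitment (observed) and of nodes (uniform
    benchmark) as shells are added from the core to the periphery."""
    total_commit = sum(commitment.get(n, 0) for n in core_number)
    if total_commit == 0:
        raise ValueError("no commitment events in this slice")
    total_nodes = len(core_number)
    observed, benchmark = [], []
    acc_commit = 0
    acc_nodes = 0
    for shell in sorted(set(core_number.values()), reverse=True):
        members = [n for n, c in core_number.items() if c == shell]
        acc_commit += sum(commitment.get(n, 0) for n in members)
        acc_nodes += len(members)
        observed.append(acc_commit / total_commit)
        benchmark.append(acc_nodes / total_nodes)
    return observed, benchmark


def area_difference(observed, benchmark):
    """Mean gap between the curves (assumed normalization); positive values
    indicate commitment concentrated in the core."""
    return sum(o - b for o, b in zip(observed, benchmark)) / len(observed)


# Toy slice: all commitment comes from the two most central users.
toy_core_number = {"a": 2, "b": 2, "c": 1, "d": 0}
toy_commitment = {"a": 3, "b": 1}
toy_observed, toy_benchmark = cumulative_curves(toy_core_number, toy_commitment)
toy_gap = area_difference(toy_observed, toy_benchmark)
```

The same computation applies to reach by replacing the commitment counts with the number of users reached by each node's commitment events.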
All data used in this work are publicly available on Reddit and can be accessed via the Reddit API service (https://www.reddit.com/wiki/api-terms). Code and processed data to reproduce figures are made available at https://doi.org/10.5281/zenodo.5783894.
L.L.: conceptualization, formal analysis, investigation, methodology, software, writing—original draft, writing—review and editing; L.M.A.: conceptualization, formal analysis, investigation, methodology, software, writing—original draft, writing review and editing; L.A.: conceptualization, formal analysis, investigation, methodology, software, writing original draft, writing—review and editing; G.D.F.M.: conceptualization, formal analysis, investigation, methodology, writing—original draft, writing—review and editing; M.S.: conceptualization, investigation, methodology, validation, writing-original draft, writing—review and editing; A.B.: conceptualization, investigation, methodology, validation, writing-original draft, writing—review and editing.
All authors gave final approval for publication and agreed to be held accountable for the work performed therein.
We declare we have no competing interests.
L.M.A. is partly supported by the Carlsberg Foundation through the COCOONS project (CF21-0432). G.D.F.M. and M.S. acknowledge support from Intesa San Paolo Innovation Center. A.B. acknowledges the 100683EPID Project “Global Health Security Academic Research Coalition” SCH-00001-3391.
Electronic supplementary material is available online at https://doi.org/10.6084/m9.figshare.c.5915102.
© 2022 The Authors.
Published by the Royal Society under the terms of the Creative Commons Attribution License http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, provided the original author and source are credited.
Barberis N, Thaler R. 2005 A survey of behavioral finance. Princeton, NJ: Princeton University Press.
GameStop short squeeze. Wikipedia. 2021 See https://en.wikipedia.org/wiki/GameStop_short_squeeze (accessed on 14 June 2021).
2 short sellers admit defeat, bail out at huge loss as GameStop share surge hits 1000%, CBC News. 2021 See https://www.cbc.ca/news/business/gamestop-wednesday-1.5889652 (accessed on 14 June 2021).
Chung J. Melvin Capital lost 53% in January, hurt by GameStop and other bets. WSJ. See https://www.wsj.com/articles/melvin-capital-lost-53-in-january-hurt-by-gamestop-and-other-bets-11612103117 (accessed on 14 June 2021).
Lawder D, Hunnicutt T. 2021 Exclusive: treasury’s Yellen calls top regulator meeting on GameStop volatility, consults ethics lawyer. Reuters. See https://www.reuters.com/article/us-usa-treasury-yellen-gamestop-exclusiv-idUSKBN2A306A (accessed on 5 February 2021).
Kochkodin B. 2021 Quant fund looks to wallstreetbets to hire sentiment traders. See https://www.bloombergquint.com/markets/quant-fund-looks-to-wallstreetbets-to-hire-sentiment-traders.
Game stopped? Who wins and loses when short sellers, social media, and retail investors collide, Congress.gov, Library of Congress. See https://www.congress.gov/event/117th-congress/house-event/111207?s=1&r=1 (accessed on 14 July 2021).
Democrats eye game-like trading apps at house hearing on markets. Bloomberg. See https://www.bloomberg.com/news/articles/2021-03-17/democrats-eye-game-like-trading-apps-at-house-hearing-on-markets (accessed on 20 December 2021).
Schelling TC. 2006 Micromotives and macrobehavior. New York, NY: WW Norton & Company.
Tajfel H, Turner JC, Austin WG, Worchel S. 1979 An integrative theory of intergroup conflict. Organ. Identity Read. 56, 9780203505984-16.
Mach Z. 1993 Symbols, conflict, and identity: essays in political anthropology. Albany, NY: SUNY Press.
Cook-Huffman C. 2008 The role of identity in conflict. In Handbook of conflict analysis and resolution, vol. 19. London, UK: Routledge.
Newman MEJ. 2010 Networks: an introduction. Oxford, UK: Oxford University Press.
Fletcher L. 2021 Hedge fund that bet against GameStop shuts down. See https://www.ft.com/content/397bdbe9-f257-4ca6-b600-1756804517b6 (accessed on 26 April 2021).
Rise of the retail army: the amateur traders transforming markets. See https://www.ft.com/content/7a91e3ea-b9ec-4611-9a03-a8dd3b8bddb5 (accessed on 20 December 2021).
Baumgartner J, Zannettou S, Keegan B, Squire M, Blackburn J. 2020 The pushshift reddit dataset. In Proc. of the Int. AAAI Conf. on Web and Social Media, vol. 14, pp. 830–839. Burnaby, Canada: AAAI, PKP Publishing Services.
Paszke A et al. 2019 PyTorch: an imperative style, high-performance deep learning library. In Advances in neural information processing systems (eds H Wallach, H Larochelle, A Beygelzimer, F d’Alché-Buc, E Fox, R Garnett), vol. 32. Red Hook, NY: Curran Associates, Inc.
Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. 2017 Densely connected convolutional networks. In Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition, pp. 4700–4708. Manhattan, NY: IEEE.
Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L. 2009 ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conf. on Computer Vision and Pattern Recognition, pp. 248–255. Manhattan, NY: IEEE.
Wasserman S, Faust K. 1994 Social network analysis: methods and applications. Cambridge, UK: Cambridge University Press.