Health on social media: mediatization of subjective experience?

Article information

Health New Media Res. 2021;5(2):220-250
Publication date (electronic) : 2021 December 31
doi : https://doi.org/10.22720/hnmr.2020.5.2.220
1Institute for Social Research
2Institute for Social Research
Address correspondence to Bernard Enjolras, Institute for Social Research, Munthesgt 31, 0208 Oslo, Norway. E-mail: bernard.enjolras@samfunnsforskning.no

Abstract

Social media have arguably transformed the way people communicate about health. Research has documented that social media offer a forum for the expression of subjective experience, feelings, and personal management of illness, marking a groundbreaking shift from illness as a private experience to the personalization of public health debates. Most studies, however, focus on specific patient groups and forums dedicated to health and do not cover the broader range of postings, political debate, and information sharing on social media. Against this backdrop, this paper explores the bigger picture of health communication on social media by applying machine learning methods, including supervised classification and topic modeling, to 280,000 Norwegian social media posts published on Twitter, online forums, and Instagram during the period 2012-2018. The results show that only one-third of the posts can be characterized as personalized communication. Furthermore, there are important differences across social media platforms, with the forums being most personalized and Twitter being less personalized. The topic analysis reveals that health communication on social media reflects three sets of concerns (illness or health conditions, the health care system and its professionals, and lifestyle issues) that display different levels of personalization across platforms and over time.

Introduction

Concern about health and illness is a fundamental part of life and society. Media representations play a fundamental role both in people’s perceptions of health and in their health-related behavior (Barry et al., 2011; Lalazaryan et al., 2014). The rise of social media in the second decade of the twenty-first century has transformed the ecosystem of communication. Whereas the traditional news media were previously a main source of popularized health information, studies confirm that social media platforms now take center stage as providers of knowledge, advice, networks, and support related to health. They supplement, but also partly replace, traditional media outlets (Reuters, 2018). One aspect of the representation of health on social media in particular has attracted scholarly attention. An extensive body of research has documented that subjective experiences and personal management of illness abound in different types of forums, chat rooms, discussion threads, and support groups (e.g., Balfe et al., 2017; Househ, 2011; Myrick et al., 2016; Roblin, 2011; Seale, 2003; Sosnowy, 2014; Wentzer & Bygholm, 2013).

It is argued that this personalization of health marks a groundbreaking transition. With social media as a catalyst, illness has shifted from a largely private to an increasingly public experience (Conrad et al., 2016; 2018), where media platforms offer new resources for self-narration (identity maintenance through narrative), self-representation, and self-maintenance (Couldry & Hepp, 2017). As such, social media have provoked an increasing blurring of the boundaries delimiting the public and the private spheres (ibid.).

Most of these studies, however, by design focus only on those practices that are directly tied to the sharing of personal experience. Few studies try to capture the variety of social media use or the composition of different types of health-related content on social media, posted by actors ranging from lay individuals to health providers, health authorities, interest groups, and commercial enterprises (for exceptions, see Freberg et al., 2013; Lovejoy & Saxton, 2011; Moorhead et al., 2013). The opportunity to analyze the enormous quantities of social media posts through big data analysis and machine learning is rarely used.

Indeed, even if illness in some sense is, and always has been, a deeply personal experience, and the maintenance of health ultimately depends on individual choices, health is also, vitally, a social and political issue, closely related to socioeconomic factors and environmental contexts. How to understand and manage health and illness concerns different health policy alternatives and economic priorities related to different types of health service systems, public versus private spending, types of medical interventions and prevention programs, as well as ethical reflections on the potential limits of medical technology and bodily interventions. These are approaches to health and illness that address the audience as citizens as opposed to merely patients and consumers, building on political and social frames in addition to lifestyle, consumer-oriented, and biomedical frames (Hallin & Briggs, 2017).

Studies of health coverage in the traditional news media provide evidence that, even if health news tends to be biased in certain directions (e.g., a focus on acute illness rather than chronic conditions) and even if a long-term trend of personalization has also been a significant characteristic of health coverage in traditional news media, the “health repertoire” of traditional news media carries a broad range of subjects and formats. These media provide not only consumer-oriented coverage or personal experiences in the form of human-interest stories, but also critical coverage and debates over health-related issues based on a citizen perspective, such as disputes over the financing of health care, policy debates, and ethical dilemmas (Briggs & Hallin, 2016; Eide & Hernes, 1997; Seale, 2002).

For people today, keeping track of new posts in their social media feed has become just as much a part of their daily habits as following the news in mainstream media (Levy et al., 2014). In a hybrid media landscape, where stories circulate between platforms and media outlets, the distinction between traditional media and social media platforms is softening (Moe et al., 2019). Social media seem to be both very personal and unique for every user while at the same time serving as vital access to a steady stream of multipurpose information. Against this background, this paper aims to explore the bigger picture of how health and illness are discussed on social media: What topics are central in the enormous corpus of postings, sharing, commenting, and liking on social media related to health and illness? We argue, following DiMaggio et al. (2013), that this type of aggregate large-N study is needed in order to discuss and test meta-sociological theories related to media and health. This concerns both the argument that new media have become an increasingly vital part of the larger media discourse and the question of to what degree a proposed movement toward personalization and emotionalization of health and illness experience can be empirically observed across a large collection of social media posts.

More precisely, the paper answers the following questions: What kinds of messages do social media users post when they communicate about health? How did health communication evolve over the period 2012-2018 and across the main social media platforms? To what extent does personalization characterize health communication on social media over time?

To answer these questions, we rely on a large collection of social media posts across different social media platforms (big data) and harness machine learning techniques (supervised classification and topic modeling) to make sense of the data. As emphasized by DiMaggio et al. (2013), textual analysis has traditionally played a central role in the study of culture. The digitization of huge quantities of texts and the development of machine learning techniques have opened up new possibilities for the study of culture through machine learning algorithms. Topic modeling, in particular, appears well suited for this purpose. DiMaggio et al. (2013) identify several strengths of this approach for the study of culture. First, topics may be viewed as “frames” or semantic contexts that favor particular interpretations. Second, topic modeling captures polysemy and disambiguates terms on the basis of their context (other terms), insofar as meaning is relational. Third, topic modeling captures the fact that single texts are often characterized by “heteroglossia”, i.e., the co-presence of competing “voices” or perspectives.

The paper is structured as follows. The first section introduces the theoretical framework linking social media, processes of mediation, and the increasing personalization of health communication in public. The second section presents the data and the methods harnessed for the analysis of a large corpus of social media posts. The third section presents the results of the topic modeling analysis and the regression analysis of the relationship between topics and personalization of health communication over time. The last section discusses the results and provides a conclusion.

Literature Review

Analytical framework

The most recent literature on media discourses of health does reflect the digital transformation of communication and the advent of social media platforms. These studies focus mainly on the potential of social media for individual support and patient groups (Barker, 2008; Ziebland & Wyke, 2012), health information seeking (Anker et al., 2011), and the impact of internet use on disease experience and patient-doctor relationship (Broom, 2005). Another question is to what degree social media buttress or strengthen this tendency, and further, to what extent social media focus on health as a strictly personal question, and in that sense reduce the many dimensions of health in the public debate, as discussed below.

Until recently, the research on media representations of health concerned the coverage in traditional mass media, like television, newspapers, and magazines. A wide range of studies have investigated how the news media cover health. One central line of research within the health communication field has focused on how the media distort health information through overly simplified or sensationalist reports favoring dramatic and acute illness over more widespread chronic or non-life-threatening diseases (Seale, 2002). Many studies have focused on the coverage of single diseases like cancer, AIDS, or influenza pandemics (e.g., Clarke & Everest, 2006; Lupton, 2013). Broader content analyses of health coverage are not widespread, but those that exist confirm that health news provides a broad menu to the audience, including different genres, formats, and focuses, and a wide array of sources (Eide & Hernes, 1997; Hallin et al., 2013; PEW, 2008; Picard & Yeo, 2011; Stroobant et al., 2016; Stroobant et al., 2018). A general finding is that even though some types of health problems get a disproportionate amount of coverage, the traditional news media cover a wide range of different types of health issues, from lifestyle-related challenges through to medical interventions and the critical coverage of public or private health services. Common diseases like cancer, heart and lung afflictions, and more recently, mental illness get heavy attention, spread across different sections and formats, including op-eds, debate, news, and human-interest stories.

The coverage of health has traditionally been dominated by expert and elite sources from the medical profession and science. Yet patient and consumer interest groups, along with ordinary persons such as patients and their kin, constitute an increasingly important group of sources (Hinnant et al., 2013). In tandem with this trend, human-interest stories with a focus on the experiences and feelings of those personally afflicted by a disease have become more prevalent (Stroobant et al., 2016). The tendency to personalize issues of health was thus a central characteristic of traditional news coverage of health long before the rise of social media. Yet, knowledge about how social media contribute to the spread of individual and emotional experiences of health and illness is still scattered.

Social Media, Mediatization and the Personalization of Health Communication in Public

Insofar as social media are characterized by specific capabilities or affordances, we can expect that these affordances shape the nature of the interactions they host. Indeed, social media make available particular potentialities for communicative action, what Hepp (2013) calls “the molding force of the media”, i.e., their capacity for shaping action and practices. Two particular outcomes of social media affordances are the blurring of the boundaries between the private and the public as well as the transformation of the conditions of self-formation and maintenance. The combined influence of boundary blurring and of new conditions for the maintenance of the self might be thought of as paving the way for an increasing personalization and publicization of the expression of experiences related to health and illness on social media.

Social media have experienced tremendous development since the mid-2000s. Social media sites provide a digital architecture for interactive communication which is best described in terms of three types of integrated “affordances” (Boyd, 2010): profiles, friends lists, and tools of communication. Profiles constitute the space where gathering and conversation take place. Social media users control their profile, to some extent, by regulating who can access it. Profiles may be public (as is the case with Twitter) or semi-public (Facebook). Friends’ lists materialize and publicly display the social graph and the audience of the social media user. They have a social and strategic function: in choosing whom to confirm as a friend, social media users consider both the costs and benefits of rejecting a person. Friends’ lists are the “imagined audience” or “public” of the social media user. Tools of communication generally allow public, semi-public, and private forms of communication. Public and semi-public tools of communication (such as commenting on a person’s wall on Facebook, or addressing a tweet to a given user) enable mediated public encounters. In addition, these communication tools enable combinations of communicative patterns ranging from one-to-one to many-to-many. Another fundamental characteristic of social media is that they link people within a digital network. Social networks are important because individuals and groups derive benefits from their underlying social structure.

One of the powerful functions fulfilled by networks is to bridge the local and the global, allowing local phenomena to spread across the entire network and to produce global effects. However, this bridging ability is dependent upon the structural characteristics of the network. Social media are part of and constitute a powerful tool in enhancing a more general development generated by the digitalization of media referred to as “media convergence” by Jenkins (2006). A dimension of convergence concerns the alteration of the ways news and entertainment consumers relate interactively with media and are becoming part of media production. Social media users, because of the affordances of social media sites, are able, in a decentralized mode, to produce information (news or entertainment), to transform existing information, and circulate information. Whereas traditional ways of consuming media were passive, social media enable the active consumption and social sharing of media content.

The early days of the internet and digital communication platforms were marked by great optimism about the future of a new and open public debate, involving the free flow of information between democratized networks of users. As this technology has developed, steadily increasing its reach while refining affordances that tempt user activity and enable enormous revenues for platform companies, research has become cautious and critical (Van Dijk, 2014; Van Dijk & Poell, 2016). The tendency for social media to encourage postings and reactions that express emotional states and personal reactions is emphasized by central scholars within the research field. This is a general tendency and not restricted to health issues (Hermida, 2016). Yet, social media as a place for the sharing of a wider range of information and as a platform for political debate and different types of democratic or civic participation have also been widely discussed and scrutinized (Valenzuela, 2013). As social media have become increasingly central to news distribution and consumption, their role as a public and a personal sphere at once has come to the fore (Kümpel, Karnowski, & Keyling, 2015). Social media platforms have indeed become the place where many people get access to news and a platform where media organizations, interest groups, and ordinary users share news content (Costera Meijer & Groot Kormelink, 2015; Kalsnes & Larsson, 2018; Levy et al., 2018).

Boundary blurring

Social media sites hence become a continuation of private space (home) where we “invite” friends and share a common sociability. Paradoxically, however, these private exchanges take place within a public or semi-public space, a fact that may serve to displace the distinction between the private and the public.

Personalization of health communication

The new wave of mediatization spurred by social media has contributed to the blurring of the boundary between the private and the public and to the transformation of the conditions of self-formation and maintenance. Within such a mediatized context, we expect health communication mediated by social media to become increasingly personalized. Indeed, as put by Conrad and Stults (2010), “the Internet has changed the experience of illness” (p. 180). Social media, according to Conrad et al. (2016: 27), have transformed the experience of health and illness by (1) serving as an information source for patients, (2) becoming a repository of experiential knowledge, (3) facilitating communication and support among individuals affected by a particular condition, (4) shaping social movements (e.g., advocacy) around illnesses and collective illness identity, and (5) playing a role in the changing nature of the doctor-patient relationship.

To this list, one might add that social media have opened a channel for representing the healthy or sick self in public. Several of these communicative practices presuppose that social media users publicly express subjective personal experience. Starting from (1) the documented increasing personalization of health communication that actually preceded social media in both news media and health campaigning and (2) the specific affordances of social media that function as catalysts for personalized and emotionalized posts, we expect to find that health issues related to personal experiences will dominate on these platforms. An alternative hypothesis recognizes the limits of such personalized health discourses. The multiple frames and focuses of media representations of health in legacy media, as well as the increasingly all-purpose and hybrid function of social media, indicate that topics of health and illness on social media will also tap into dimensions related to critical debate, citizen perspectives, and ethical reflection beyond personal experiences and witnessing. The remainder of this paper is devoted to inquiring into the extent to which these hypotheses are empirically corroborated by a large-scale analysis of social media communication.

Data and Method

We harness methods based on machine learning such as supervised classification and topic modeling applied to Norwegian social media data (Twitter, Online Forums, and Instagram) during the period 2012-2018, for discovering cultural patterns of social media use and communication about health across social media platforms.

Data Collection and Sampling

To access social media data, we used the capabilities available through the Crimson Hexagon® platform. Crimson Hexagon® offers access to social media data (Facebook public pages, Twitter, Instagram, YouTube, online discussion forums), both historical data back to 2008 and contemporary data streaming, as well as analytic tools for processing large amounts of such data.

The platform allows customizing different monitors based on specific search terms. It offers capabilities for applying an algorithm to the search results in order to classify social media posts into different categories. The platform also enables uploading a selection of posts for further analysis. The data consist of posts selected on the basis of a list of search terms and containing at least one of those terms: “health, health care, illness, cancer, medicine, hospital, healthy, healthy diet, health and care, healthy lifestyle, good health, public health, health policy, doctor, patient, pharmaceutical industry”. The data consist of social media posts written in Norwegian during the period January 1st, 2012 to September 26th, 2018, published on Twitter, Facebook open pages, Instagram, Blogs, and Forums, and whose senders were located in Norway. Altogether, the data are composed of 782 836 social media posts across six social media platforms (blogs, comments, forums, Twitter, Instagram, Facebook open pages).

As shown in Table 1, Twitter with 52.5% of posts, Forums with 41.7% of posts, and Instagram with 3.4% of posts represent the main social media sources containing at least one of our search terms. Blogs, comments, and Facebook open pages together represent less than 3% of all posts. Among the Forums, the most important ones are “kvinneguiden.no” (the forum of an online magazine for women) with 17% of posts, “vgd.no” (the forum of a national newspaper) with 11% of posts, “diskusjon.no” (the forum of a technical magazine) with 4% of posts, “baby.no” (the forum of a website on parenting) with 4% of posts, and “hegnar.no” (the forum of an online financial newspaper) with 1% of posts. The remaining 90 online forums that make up the source “Forum” together represent 4.7% of the posts.

Data Sources and Number of Posts by Source

Figure 1 shows that the volume of health-related social media posts resulting from our search in Crimson Hexagon® databases varies over time but displays a growing tendency, especially since mid-2017. It is unclear to us whether this growth reflects an increasing interest in health issues on social media or whether it reflects the inclusion of new social media sources (especially Forums) in Crimson Hexagon® databases as a result of improvements in data collection.

Figure 1.

Volume of Health-Related Social Media Posts Over Time

In this article, we use machine learning, including a classification algorithm and topic models applied to social media data from three distinct platforms—Twitter, Online Forums, and Instagram—in order to identify patterns of social media use and communication related to health.

Classification

Text classification is a machine learning technique that involves organizing text documents in different categories based on particular features. It is common to distinguish between machine learning techniques using unsupervised and supervised learning. With supervised learning, the algorithm has been trained to identify the features that serve to categorize the material in different classes through a pre-coded training set. First, a quantity of material is manually coded into categories. Then, the algorithm uses the coded material (training set) to learn how to classify the material into these categories.

Crimson Hexagon® implements a supervised classification algorithm developed by Hopkins & King (2010) and freely available in R (Hopkins et al., 2013). The algorithm is designed to estimate the proportion of documents belonging to certain categories in a corpus. It is based on a machine learning classification algorithm, but unlike traditional classification algorithms, it uses a method to estimate the proportions of specific categories, taking into account incorrect classifications and correcting for bias. The purpose of the analysis is to classify social media posts about health into meaningful types. The first phase of the process consists in training the algorithm. For this purpose, we manually coded the posts constituting the training set into three categories: two meaningful categories and an “off-topic” category for the posts that are not related to health and illness. The first category, labeled “Personal experience related to health”, is personal in nature and contains posts that express a subjective experience related to health and illness. That includes telling about experienced symptoms or feelings when being sick, experience with the healthcare system and health personnel, experience related to medical treatment, personal descriptions of one’s illness, the feeling of being healthy related to food or physical activity, etc. The common denominator of this category is the expression of a personal and subjective experience. The second category, “Non-personal messages”, is composed of posts that do not express a subjective experience. These posts are often related to current issues on the political agenda or in the public debate, or to objective information, opinion, and advice about health issues, and are impersonal in nature. Finally, the posts that are not related to health and illness were coded “off-topic”.

Once the algorithm is trained, it is applied to the totality of the data set (from which the training and test sets are excluded). This results in a data set automatically classified according to the categories defined in the training process. To evaluate the results of a machine learning classification algorithm, it is common to measure the algorithm’s accuracy and recall. Given that a post can be classified as true positive (actually positive and classified as positive), false positive (actually negative but classified as positive), false negative (actually positive but classified as negative), or true negative (actually negative and classified as negative), the algorithm’s accuracy measures the proportion of correct predictions (true positives and true negatives) among all predictions. The algorithm applied to a pre-coded test set (a sample of posts) gave the following measures for accuracy: 88% for the category “Personal experience”, and 95% for the category “Non-personal messages about health-related issues”, giving an overall accuracy of 90.5%.
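The evaluation measures mentioned above can be illustrated with a minimal Python sketch. The confusion counts below are hypothetical and chosen only for illustration; they are not the counts from the test set used in this paper, and the names ConfusionCounts, accuracy, and recall are our own.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts of classification outcomes for one category on a test set."""
    tp: int  # actually positive, classified as positive
    fp: int  # actually negative, classified as positive
    fn: int  # actually positive, classified as negative
    tn: int  # actually negative, classified as negative


def accuracy(c: ConfusionCounts) -> float:
    """Proportion of correct predictions among all predictions."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


def recall(c: ConfusionCounts) -> float:
    """Proportion of actual positives that were classified as positive."""
    positives = c.tp + c.fn
    return c.tp / positives if positives else 0.0


# Hypothetical counts, for illustration only.
example = ConfusionCounts(tp=80, fp=10, fn=20, tn=90)
print(f"accuracy = {accuracy(example):.3f}")
print(f"recall   = {recall(example):.3f}")
```

Accuracy summarizes all four cells of the confusion table, whereas recall only concerns the posts that actually belong to the category, which is why the two measures can diverge when categories are of unequal size.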

The purpose of the classification is to generate the dichotomous dependent variable that will be used later in this paper in the regression analysis in order to identify which topics are more personalized. Based on the classification algorithm, posts expressing personal health experiences constitute 37% of the posts, whereas the remaining posts belong to the category “Non-personal messages about health-related issues” and represent 63% of posts.

Figure 2 displays the evolution over time of the volume of social media posts according to these two categories. The volume of posts expressing personal health experience appears relatively stable over the main part of the period but increases significantly starting from 2017. The volume of non-personal posts appears to decrease slightly over time. Here too, it is unclear whether variations over time reflect variations in behavior or variations in the data-collection process.

Figure 2.

Proportion of social media posts by category over time

Since we have no certainty about the constancy over time of the data-collection process, especially regarding the inclusion of new data sources in Crimson Hexagon® databases, we designed a procedure to keep the data sources constant over time in order to apply topic modeling to the data. We selected a random sample of posts from the three most important social media sources, namely Twitter, Online Forums, and Instagram. For each month between January 2012 and September 2018, we randomly sampled approximately 1000 posts from each social media source. This yielded a sample of 284,586 social media posts that constituted our corpus, i.e., the basis for the topic modeling analysis. Since the sample is randomly selected, it is representative of each data source for each month during the period. At the same time, since the sample size is constant over time for each source and each source represents approximately one-third of the data, variations (in terms of category proportions and over time) reflect changes in behavior and not changes affecting the collection of data. As shown in Figure 3, the sampling procedure over-samples the year 2012 and under-samples the year 2018.

Figure 3.

Proportion of posts in the sample compared to the total number of posts by year

Topic modeling

Topic modeling is an algorithmic method for finding topics in large and unstructured text data collections. Topic modeling consists in applying a suite of algorithms to discover the hidden thematic structure in large collections of unlabeled textual data. The results of topic modeling algorithms can be used to summarize, visualize, explore, and interpret a corpus.

Topic modeling considers documents (in our case social media posts) as a mixture of topics, in which a topic is a probability distribution over words, allowing words with similar meanings to be clustered. Hence, a topic model takes a collection of texts as input and produces a set of “topics” (i.e., groups of words that are associated under a single theme) and assesses the degree to which each document exhibits those topics. The simplest topic model is latent Dirichlet allocation (LDA), which is a probabilistic model of texts assuming that there are a fixed number of patterns of terms tending to occur together in documents (topics) and that each document in the corpus exhibits the topics to a varying degree (Blei et al., 2003). A topic is a probability distribution over terms. In each topic, different sets of terms have a high probability, and the topics are typically represented by a list of those terms. A topic model finds the sets of terms that tend to occur together in the corpus. They constitute “topics” because terms that frequently occur together tend to be about the same subject. The objective of topic modeling is to extract latent semantic topics from large volumes of textual documents (i.e., corpora).
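The idea that a document is a mixture of topics, and a topic a probability distribution over terms, can be made concrete with a small Python sketch of the generative story assumed by LDA. The two topics, their vocabularies, and the value of the concentration parameter alpha are invented for illustration and do not come from our data; estimating a topic model amounts to running this story in reverse, inferring topics and proportions from observed documents.

```python
import random


def sample_dirichlet(rng, alphas):
    """Draw topic proportions from a Dirichlet via normalized gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, topics, alpha, n_words):
    """Generate one document under the LDA generative assumptions.

    `topics` is a list of dicts mapping terms to probabilities.
    """
    theta = sample_dirichlet(rng, [alpha] * len(topics))  # document's topic mix
    words = []
    for _ in range(n_words):
        k = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic
        terms, probs = zip(*topics[k].items())
        words.append(rng.choices(terms, weights=probs)[0])  # pick a term
    return theta, words


# Two hypothetical topics, each a probability distribution over terms.
toy_topics = [
    {"hospital": 0.5, "doctor": 0.3, "patient": 0.2},
    {"diet": 0.4, "exercise": 0.4, "healthy": 0.2},
]
rng = random.Random(1)
theta, words = generate_document(rng, toy_topics, alpha=0.5, n_words=8)
print([round(p, 2) for p in theta], words)
```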

In our topic model analysis, we use the structural topic model (STM) framework (Roberts et al., forthcoming). The STM is a mixed-membership topic model (like LDA) with extensions that facilitate the inclusion of document-level metadata. STM differs from other topic-modeling techniques like LDA in allowing document-level covariates to be included in the model as a method for pooling information (Nielsen et al., 2015). The inclusion of this information within the model facilitates hypothesis testing. For estimating the model, we use the freely available R package stm.

The topic modeling method presents several advantages compared to manual content analysis methods. First, whereas manual methods require a subjective assessment of the content of each document, topic modeling is based on objective and quantifiable criteria for identifying topics (probability distributions of words). Second, the algorithmic and automatic nature of the method allows researchers to process large quantities of documents. Third, manual methods presuppose the use of an a priori coding scheme in order to guide the identification and classification of documents into topics. In contrast, topic modeling identifies latent topic categories that are present in the corpus, making the process replicable. Finally, since the unit of analysis in topic modeling is the topic and not the document, topic modeling allows mapping the prevalence over time of the topics present in a corpus and their relative weights (number of documents including a given topic divided by the total number of documents).
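The relative weight defined in parentheses above can be illustrated with a short Python sketch. Deciding when a document "includes" a topic requires a cut-off on the estimated topic proportions; the threshold of 0.2 used here is an assumption made for the example, as are the proportions themselves.

```python
def topic_weights(doc_topic, threshold=0.2):
    """Share of documents that include each topic.

    `doc_topic` holds one row of topic proportions per document. A document
    is counted as including a topic when its proportion for that topic is at
    least `threshold` (a hypothetical cut-off chosen for illustration).
    """
    n_docs = len(doc_topic)
    n_topics = len(doc_topic[0])
    counts = [0] * n_topics
    for proportions in doc_topic:
        for k, p in enumerate(proportions):
            if p >= threshold:
                counts[k] += 1
    return [c / n_docs for c in counts]


# Hypothetical topic proportions for four documents and three topics.
toy_doc_topic = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.5, 0.5, 0.0],
    [0.05, 0.05, 0.9],
]
weights = topic_weights(toy_doc_topic)
print(weights)
```

Since a single document can include several topics, the weights need not sum to one. Computing them separately for each month or each platform gives the prevalence of a topic over time or across sources.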

The topic analysis consists of four steps. First, the data were preprocessed (to exclude stop-words, numbers, and punctuation, to convert characters to lower case, and to stem words having a common root), and a data frame was constructed linking the documents (social media posts) to the available metadata (the classification of each post as personalized or not, and the date of the post). Second, the documents were converted into a document-term matrix for estimating the topic model. Third, we estimated the topic model using the R package stm, and displayed and visualized the fitted models using the R package stminsights. Finally, we validated, selected, and interpreted the topic model presented below, using the capabilities of the interactive application provided by stminsights (diagnostics for evaluating different models, inspection of terms and documents by topic, proportions for each topic, and visualization plots).
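
The first two steps can be sketched in a few lines of Python. This is not the pipeline used in the study, which relied on R; the stop-word list, the suffix-stripping stemmer, and the function names are simplified, hypothetical stand-ins chosen to keep the example self-contained.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (a real one would be language-specific).
STOP_WORDS = {"the", "a", "and", "is", "to", "of", "at", "my"}


def naive_stem(token):
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(post):
    """Lower-case, keep alphabetic tokens only (drops numbers and punctuation), remove stop-words, stem."""
    tokens = re.findall(r"[a-zæøå]+", post.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


def document_term_matrix(posts):
    """Return the sorted vocabulary and one row of term counts per post."""
    counts = [Counter(preprocess(p)) for p in posts]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


vocab, dtm = document_term_matrix(
    ["The doctor is waiting!", "Waiting 3 hours at the doctors"]
)
```

Each row of the resulting matrix corresponds to one post and each column to one stemmed term, which is the input format a topic model expects.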

Results

Topic Model: what are people communicating about?

Topic modeling allows us to identify across thousands of posts the main topics that people are concerned with when they communicate about health-related issues on social media. In the analyses reported here, each text is a social media post that includes at least one of the search terms related to health and illness. The analysis was conducted on a random sample of 284,586 social media posts covering the period January 2012 to September 2018. By doing so, we assume a single underlying structure characterizing discourse about health and illness across social media platforms. This enables us to examine variations across social media platforms and over time, at the expense of investigating variation in the topic structure across sources.

Table 2 displays the 15-topic solution, listing the highest-ranked terms for each topic. Three sets of topics can be distinguished. The first set is related to a given illness or condition and includes topic 4 “Illness Women Children”, topic 6 “Mental Illness”, topic 7 “Pregnant”, topic 8 “Baby”, topic 11 “Pregnancy”, and topic 12 “Cancer”. The second set is related to the healthcare system and its professionals and policies, and includes topic 2 “RGP Doctor”, topic 5 “Medical Research”, topic 9 “Health Finance”, topic 13 “Religious circumcision”, and topic 15 “Hospital”. The last set concerns health and lifestyle issues, and includes topic 1 “Sexuality”, topic 3 “Healthy life”, and topic 10 “Hiking”. One topic, topic 14, is not relevant in terms of health communication and is related to the expression “anonym user” that occurs in many posts on forums.

Topic Model Solution- 15 Topics*

The “Public health report” for Norway, published by the Norwegian Institute of Public Health (2019), is structured around the following classification: “Health and disease”, “Mental health”, “Infectious diseases”, “Lifestyle”, and “Environment and health”. Among the most prevalent topics on social media, three of the themes structuring the public health report are well represented (namely, health and disease, mental health, and lifestyle), whereas the themes related to “Infectious diseases” and “Environment and health” are not prevalent on social media. Within the theme “Health and disease”, the public health report includes several conditions (asthma and allergy, cardiovascular disease, diabetes, dementia, and cancer). Of these, only cancer seems to be prevalent on social media. At the same time, topics related to pregnancy and to women’s and children’s health appear to be very salient on social media. The same applies to lifestyle-related themes, including sexuality, diet, and exercise, which are prevalent in a significant proportion of posts.

The topics related to the health care system are relatively heterogeneous. The topic “RGP Doctor” appears to consist mainly of posts where people report their experience with health services and complain about waiting. The topic “Medical research” includes mainly posts referring to new research findings and their implications for health and disease treatments, whether by linking to news on medical research or by giving advice relating to these research findings. The topic “Health finance” relates to the ongoing policy debates about the shortage of public funding for specific conditions and about the public hospital reform entailing the closing of some local hospitals. Finally, the topic “Religious circumcision” includes posts expressing divergent views in a periodically recurring debate on a ban on the ritual circumcision of boys among the Jewish and Muslim populations.

In sum, the topic modeling analysis shows the prevalence of some themes and the absence of many other health-related themes among the most salient topics posted on social media. Infectious diseases and environmental health issues do not appear among the themes most prevalent on social media. Among the non-infectious diseases, cancer appears to occupy a special place. Issues related to pregnancy, as well as to women’s and children’s health, are often posted on social media. Public health policy issues are concentrated on a few selected themes that are very salient in the media, including waiting lists for access to public health services, the reform of public hospitals, and the ritual circumcision of boys. Some lifestyle topics (such as diet and exercise, and sexuality) are very salient on social media, whereas others, such as alcohol and psychoactive substances, smoking, and overweight, do not appear among the most prevalent topics.

Regression Analysis: to what extent are the messages personalized over time and across platforms?

The regression analysis allows us to investigate the contribution of each topic (topic proportion) to the personalization of expression in social media about health and illness issues. In addition, by including the variable “Year” and the variable “Source” in the model we are able to assess how personalization varies over time and across social media platforms. Table 3 shows the results of a logistic regression with the category of post “Personal Expression” as the dependent variable and the topics, years, and sources of posts as independent variables. Model (1) does not include the variable “Source”, whereas model (2) includes the variable “Source”.
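
The two specifications described above can be written out as follows. The symbols are our own shorthand and an assumption about how the models are parameterized: y_i indicates whether post i is classified as “Personal Expression”, θ_ik is the estimated proportion of topic k in post i, and the indicator terms are year and source dummies with one reference category each.

```latex
\begin{align*}
\text{Model (1):}\quad
\operatorname{logit} \Pr(y_i = 1) &= \alpha + \sum_{k} \beta_k\, \theta_{ik}
  + \sum_{t} \gamma_t\, \mathbf{1}[\mathrm{year}_i = t] \\
\text{Model (2):}\quad
\operatorname{logit} \Pr(y_i = 1) &= \alpha + \sum_{k} \beta_k\, \theta_{ik}
  + \sum_{t} \gamma_t\, \mathbf{1}[\mathrm{year}_i = t]
  + \sum_{s} \delta_s\, \mathbf{1}[\mathrm{source}_i = s]
\end{align*}
```

Comparing the two models shows how the topic and year coefficients change once platform differences are held constant.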

Logistic regression – Personal Expression in topics (Standardized Coefficients)

A positive coefficient indicates that the variable is positively associated with personalization of expression about health issues in social media, whereas a negative coefficient indicates the opposite. The results displayed in Table 3 show that topic 1 “Sexuality”, topic 7 “Pregnant”, topic 11 “Pregnancy”, topic 13 “Religious Circumcision” and topic 14 “Anonymous User” are more likely to contain the personal expression of health and illness issues than the other topics. The results show a pattern whereby personalization decreases over time in the first phase (up to 2016) and then increases in recent years. This might indicate that people are becoming more used to social media and more prone to expose their personal experiences in public. The likelihood of occurrence of personalized posts also varies across platforms: model (2) shows that personalization of expression is more frequent on Forums than on Instagram and Twitter.
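
To give a rough sense of magnitude, a logit coefficient can be converted into an odds ratio by exponentiation. The sketch below applies this to the source coefficients reported for model (2), with Forums as the reference category. It assumes the coefficients can be read on the log-odds scale; because the table reports standardized coefficients, the resulting figures are illustrative only.

```python
import math


def odds_ratio(coef):
    """Convert a logit coefficient into an odds ratio relative to the reference category."""
    return math.exp(coef)


# Source coefficients from model (2); Forums is the reference category.
source_coefs = {"Forums": 0.000, "Instagram": -0.471, "Twitter": -3.850}

for source, coef in source_coefs.items():
    print(f"{source}: odds ratio vs. Forums = {odds_ratio(coef):.3f}")
```

Under this reading, the odds of a personalized post are somewhat lower on Instagram and far lower on Twitter than on Forums.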

Figure 4 displays the marginal effects of topics, for each year, on the probability of a post containing a personal expression related to health issues.
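
For reference, one common definition of the average marginal effect in a logit model is given below. The article does not state how the marginal effects were computed, so this is an assumption; the notation (fitted probability p̂_i, covariate vector x_i, estimated coefficient vector β̂ with topic component β̂_k, topic proportion θ_ik) is introduced for illustration.

```latex
\[
\mathrm{AME}_k
  = \frac{1}{N}\sum_{i=1}^{N} \frac{\partial \hat{p}_i}{\partial \theta_{ik}}
  = \frac{1}{N}\sum_{i=1}^{N} \hat{\beta}_k\, \hat{p}_i\,(1-\hat{p}_i),
\qquad
\hat{p}_i = \frac{1}{1+\exp\!\left(-x_i^{\top}\hat{\beta}\right)}
\]
```

Effects by year would then be obtained by averaging over the posts from each year separately.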

Figure 4.

Average Marginal Effects of Topics and Personal Expression over time

Figure 4 shows that some topics, such as topic 4 “Women and children illness” and topic 12 “Cancer”, are less likely to be personalized than the other topics. Topics related to sexuality, pregnancy, mental illness, lifestyle, and ritual circumcision are more likely to include expressions of personalized subjective experience. Figure 4 also provides a visualization of the contribution of each topic to the personalization of expression on social media over time. The degree of personalization of topic 1 “Sexuality”, topic 7 “Pregnant”, topic 11 “Pregnancy”, and topic 13 “Religious Circumcision” tends to increase over time. This indicates people’s increasing propensity, when it comes to those topics, to express personal subjective experiences on social media. Conversely, the degree of personalization of topic 4 “Women and children illness” and topic 12 “Cancer” has been decreasing since 2016, indicating that people are less inclined to express personal subjective experience on social media in relation to these topics.

Figure 5 displays the marginal effects of interactions between topics and social media platforms, revealing which topics are relatively more likely to be posted on Forums, Instagram, or Twitter. This allows us to investigate the degree of personalization of health communication for each topic relative to the media platform.

Figure 5.

Average Marginal Effects of Interaction between topics and social media platforms

As shown in Figure 5, some topics, such as medical research, children’s issues, and cancer, are less likely to be posted on Instagram than on Forums or on Twitter. Conversely, lifestyle and pregnancy-related topics are more likely to be posted on Instagram. These topics are also likely to be more personalized. Topics related to cancer, women’s and children’s illness, and religious circumcision are more likely to be posted on Forums. In sum, subjective expressions of personal experience are more likely to be posted on Forums and on Instagram, and less likely to be posted on Twitter, but Forums and Instagram attract different health-related topics.

Discussion and Conclusion

In this paper, we have investigated three questions related to the communication of health and illness issues on social media. First, we have inquired into the kind of messages posted on social media when people communicate about health and illness. Using a classification algorithm, we have identified the proportion of two categories of health-related social media posts: “Personal expression of health experiences” (37% of the posts) and “Non-personal messages about health” (63% of the posts).

This confirms the expectation that a significant share of the communication on social media is personalized. We have also found evidence, as expected, that this tendency toward personalization is growing over time as social media become ubiquitous and integrated into daily life. Yet we do not find that personalized messages totally dominate communication about health on social media. Rather, and in line with our alternative hypothesis, we find evidence that health discourses on social media tap into other dimensions of health-related communication: information and advice about health, health policies, and health politics.

Furthermore, as shown by the regression analysis, there are important differences across social media platforms when it comes to the personalization of expression, with the forums being the most personalized and Twitter the least personalized. The topic analysis gives a rough overview of the most central themes that occur in the large corpus of posts related to health on social media. The results show that, when people communicate on social media about health, they are concerned with three sets of topics related to (i) illness or health conditions, (ii) the health care system and its professionals, and (iii) lifestyle issues. These concerns, as captured through the topic model, have been relatively constant over the period 2012-2018, with the exception of the communication about cancer, which displays important variations over time.

Our empirical analysis of a large data set of social media posts presents some limitations. First, the universe that is investigated in this article is defined by the search words that have been used to collect the data. The list of search words is by necessity limited and somewhat arbitrary. The results consequently need to be interpreted in light of this extensive, but still limited, universe. Second, the concept of “social media” covers a range of online platforms that differ in their affordances and are likely to attract different types of users and forms of communication. Our sample is limited to Forums, Instagram, and Twitter and therefore excludes other social media platforms such as Facebook and WhatsApp, where different topics might be discussed in more private settings, allowing for a higher degree of personalization. Third, although topic modeling allowed us to investigate the main topics within a large corpus of social media posts, it lacks nuance and granularity, which means that the method is not well suited to serve as the basis for an in-depth interpretation of the communication on social media platforms. For this purpose, other methods need to be harnessed. However, we have shown that a variety of trends and differences characterizing health communication on social media can be detected despite these limitations. While far from perfect, we believe that this kind of machine-based analysis can complement more focused in-depth studies, with some unique advantages such as the ability to cover a large body of communication messages over time and across social media platforms.

Again, the topic analysis suggests that posts on social media cover a wide range of issues, and that many dimensions of health and illness are represented on social media: questions related to the health care system and health policies, concrete health conditions and different types of treatment, and issues related to lifestyle and ethics. Further nuancing the hypothesis of the all-pervading status of personalized messages about health on social media is the finding that the role of social media as catalysts for personalized and emotionalized expression appears to be limited to specific lifestyle-related topics and occurs mainly on certain types of social media forums. These findings point to the limits of such personalized health discourses in the hybrid media system. Health and illness on social media is as much a political concern beyond personal experiences as it is a subject for subjective expression and emotionalization.

Notes

I.

The classification algorithm is a multinomial logit classifier. The “inputs” are the “bag-of-words” extracted from the preprocessed social media posts. Social media posts are first preprocessed (to exclude stop-words, numbers, and punctuation, to convert characters to lower case, and to stem words having a common root) and then converted into their “bag-of-words” representation, where the (frequency of) occurrence of each word is used as a feature for training the classifier. The “output” is a categorical variable indicating the category to which the post belongs, in this case “personal experience”, “non-personal messages”, or “off-topic”.
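
The prediction step of such a classifier can be sketched as follows. This is a minimal illustration and not the implementation used in the study: the weights, bias terms, and function names are hypothetical, and training (estimating the weights from hand-coded posts) is omitted.

```python
import math

CATEGORIES = ["personal experience", "non-personal messages", "off-topic"]


def softmax(scores):
    """Turn one linear score per category into probabilities that sum to one."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(bag_of_words, weights, biases):
    """Score a post (term -> count) under each category and return the most probable one."""
    scores = [
        b + sum(w.get(term, 0.0) * count for term, count in bag_of_words.items())
        for w, b in zip(weights, biases)
    ]
    probs = softmax(scores)
    return CATEGORIES[probs.index(max(probs))], probs


# Hypothetical weights for illustration only; real weights are learned from labeled posts.
toy_weights = [{"my": 1.0, "pain": 0.5}, {"research": 1.0}, {"football": 1.0}]
toy_biases = [0.0, 0.0, 0.0]
label, probs = predict({"my": 1, "pain": 2}, toy_weights, toy_biases)
```

The classifier assigns each post to the category with the highest predicted probability.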

References

1. Anker A.E., Reinhart A-M., Feeley T.M.. 2011;Health information seeking: A review of measures and methods. Patient Education and Counseling 82:346–354. http://dx.doi.org/10.1016/j.pec.2010.12.008.
2. Balfe M., Keohane K., O'Brien K., Sharp L. J. E.. 2017;Social networks, social support and social negativity: A qualitative study of head and neck cancer caregivers' experiences. European Journal of Cancer Care 26(6)e12619. http://dx.doi.org/10.1111/ecc.12619.
3. Barker K.K.. 2008;Electronic support groups, patient-consumers, and medicalization: The case of contested illness. Journal of Health and Social Behavior 49:20–36. http://dx.doi.org/10.1177/002214650804900103.
4. Barry M. M., Domegan C., Higgins O., Sixsmith J.. 2011. A literature review on health information seeking behaviour on the web: A health consumer and health professional perspective. https://www.ecdc.europa.eu/sites/default/files/media/en/publications/Publications/Literature%20review%20on%20health%20information-seeking%20behaviour%20on%20the%20web.pdf.
5. Blei D. M., Ng A. Y., Jordan M. I.. 2003;Latent dirichlet allocation. Journal of Machine Learning Research 3:993–1022.
6. Boyd D.. 2010. Social network sites as networked publics: Affordances, dynamics, and implications. In : Papacharissi Z., ed. Networked self: Identity, community, and culture on social network sites p. 39–58. Routledge.
7. Briggs C. L., Hallin D. C.. 2016. Making health public: How news coverage is remaking media, medicine, and contemporary life Routledge.
8. Broom A.. 2005;Virtually he@lthy: The impact of Internet use on disease experience and the doctor-patient relationship. Qualitative Health Research 15(3):325–345. http://dx.doi.org/10.1177/1049732304272916.
9. Clarke J. N., Everest M. M.. 2006;Cancer in the mass print media: Fear, uncertainty and the medical model. Social Science & Medicine 62(10):2591–2600. http://dx.doi.org/10.1016/j.socscimed.2005.11.021.
10. Conrad P., Bandini J., Vasquez A.. 2016;Illness and the Internet: From private to public experience. Health 20(1):22–32. http://dx.doi.org/10.1177/1363459315611941.
11. Conrad P., Stults C.. 2010. Internet and the experience of illness. In : Bird C., Conrad P., Fremont A., et al, eds. Handbook of medical sociology p. 179–191. Vanderbilt University Press.
12. Couldry N., Hepp A.. 2017. The mediated construction of reality John Wiley & Sons.
13. Costera Meijer I., Groot Kormelink T. J. D. J.. 2015;Checking, sharing, clicking and linking: Changing patterns of news use between 2004 and 2014. Digital Journalism 3(5):664–679. https://doi.org/10.1080/21670811.2014.937149.
14. DiMaggio P., Nag N., Blei D.M.. 2013;Exploiting affinities between topic modeling and the sociological perspective on culture: Application to newspaper coverage of U.S. government arts funding. Poetics 41(6):570–606. https://doi.org/10.1016/j.poetic.2013.08.004.
15. Eide M., Hernes G.. 1997. Død og pine! Om massemedia og helsepolitikk Oslo: Fafo.
16. Freberg K., Palenchar M. J., Veil S. R.. 2013;Managing and sharing H1N1 crisis information using social media bookmarking services. Public Relations Review 39(3):178–184. https://doi.org/10.1016/j.pubrev.2013.02.007.
17. Guo L., Vargo C.J., Pan Z., Ding W., Ishwar P.. 2016;Big social data analytics in journalism and mass communication: Comparing dictionary-based text analysis and unsupervised topic modeling. Journalism & Mass Communication Quarterly 93(2):332–359. https://doi.org/10.1177/1077699016639231.
18. Hallin D. C., Brandt M., Briggs C. L.. 2013;Biomedicalization and the public sphere: Newspaper coverage of health and medicine, 1960s-2000s. Social Science & Medicine 96:121–128. https://doi.org/10.1016/j.socscimed.2013.07.030.
19. Hepp A.. 2013. Cultures of mediatization John Wiley & Sons.
20. Hermida A.. 2016. Tell everyone: Why we share and why it matters Anchor Canada.
21. Hinnant A., Len-Ríos M. E., Young R.. 2013;Journalistic use of exemplars to humanize health news. Journalism Studies 14(4):539–554. https://doi.org/10.1080/1461670X.2012.721633.
22. Hopkins D., King G.. 2010;A method of automated nonparametric content analysis for social science. American Journal of Political Science 54(1):229–247. https://doi.org/10.1111/j.1540-5907.2009.00428.x.
23. Hopkins D., King G., Knowles M., Melendez S.. 2013. Readme: Software for automated content analysis. Versions 2007-2013. URL: https://gking.harvard.edu/readme.
24. Househ M.. 2011;Sharing sensitive personal health information through Facebook: The unintended consequences. Studies in Health Technology & Informatics 169:616–620. https://doi.org/10.3233/978-1-60750-806-9-616.
25. Househ M., Borycki E., Kushniruk A. J. H.. 2014;Empowering patients through social media: the benefits and challenges. Health Informatics Journal 20(1):50–58. https://doi.org/10.1177/1460458213476969.
26. Jenkins H.. 2006. Convergence culture. Where old and new media collide New York University Press.
27. Jerzak C.T., King G., Streshnev A.. 2018. An improved method of automated nonparametric content analysis in social science. Working Paper https://gking.harvard.edu/category/research-interests/applications/automated-text-analysis.
28. Kalsnes B., Larsson A.. 2018;Understanding news sharing across social media: Detailing distribution on Facebook and Twitter. Journalism Studies 19(11):1669–1688. https://doi.org/10.1080/1461670X.2017.1297686.
29. Kümpel A. S., Karnowski V., Keyling T. J. S.. 2015;News sharing in social media: A review of current research on news sharing users, content, and networks. Social Media and Society 1(2):2056305115610141. https://doi.org/10.1177/2056305115610141.
30. Lalazaryan A., Zare-Farashbandi F. J.. 2014;A review of models and theories of health information seeking behavior. International Journal of Health System & Disaster Management 2(4):193–203. https://doi.org/10.4103/2347-9019.144371.
31. Levy D., Newman N., Fletcher R., Kalogeropoulos A., Nielsen R. K.. 2014. Reuters Institute Digital News Report 2014. Report of the Reuters Institute for the Study of Journalism. Available online: http://reutersinstitute.politics.ox.ac.uk/publication/digital-news-report-2014.
32. Lovejoy K., Saxton G.. 2012;Information, community, and action: How nonprofit organizations use social media. Journal of Computer-Mediated Communication 17(3):337–353. https://doi.org/10.1111/j.1083-6101.2012.01576.x.
33. Lucas C., Nielsen R., Roberts M., Stewart B., Storer A., Tingley D.. 2015;Computer-assisted text analysis for comparative politics. Political Analysis 23(2):254–277. https://doi.org/10.1093/pan/mpu019.
34. Lupton D.. 2013. Moral threats and dangerous desires: AIDS in the news media Routledge.
35. Moe, H., Ytre-Arne, B., Hovden, J. F., Sakariassen, H., Uberg-Nærland, T., Figenschou, T. U. & Thorbjørnsrud K. (2020). Informerte borgere (Informed citizens). Universitetsforlaget.
36. Moorhead S. A., Hazlett D. E., Harrison L., Carroll J. K., Irwin A., Hoving C. J.. 2013;A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. Journal of Medical Internet Research 15(4)e85. https://doi.org/10.2196/jmir.1933.
37. Myrick J. G., Holton A. E., Himelboim I., Love B. J.. 2016;# Stupidcancer: Exploring a typology of social support and the role of emotional expression in a social media community. Health Communication 31(5):596–605. https://doi.org/10.1080/10410236.2014.981664.
38. Norwegian Institute of Public Health. (2019). Public health report. https://fhi.no/en/op/hin/.
39. PEW. (2008). Health news coverage in the U.S. Media. https://www.journalism.org/2008/11/24/health-news-coverage-in-the-u-s-media/.
40. Picard R. G., Yeo M. J.. 2011;Medical and health news and information in the UK media: The current state of knowledge. A report of the Reuters Institute for the study of Journalism https://reutersinstitute.politics.ox.ac.uk/sites/default/files/2017-11/Media%20and%20UK%20Health.pdf.
41. Roberts M., Stewart B., Tingley D.. 2019;stm: R package for structural topic models. Journal of Statistical Software 91(2):1–40. https://doi.org/10.18637/jss.v091.i02.
42. Roblin D. W. J.. 2011;The potential of cellular technology to mediate social networks for support of chronic disease self-management. Journal of Health Communication 16(sup1):59–76. https://doi.org/10.1080/10810730.2011.596610.
43. Seale C.. 2002. Media and health Sage.
44. Seale C.. 2003;Health and media: An overview. Sociology of Health & Illness 25(6):513–531. https://doi.org/10.1111/1467-9566.t01-1-00356.
45. Sosnowy C.. 2014;Practicing patienthood online: Social media, chronic illness, and lay expertise. Societies 4(2):316–329.
46. Stroobant J., De Dobbelaer R., Raeymaeckers K.. 2016. Research report: Health news media monitoring. Quantitative study of Belgian health news in newspapers, magazines, on television, radio and online https://biblio.ugent.be/publication/8539542.
47. Stroobant J., De Dobbelaer R., Raeymaeckers K.. 2018;Tracing the sources: A comparative content analysis of Belgian health news. Journalism Practice 12(3):344–361. https://doi.org/10.1080/17512786.2017.1294027.
48. Valenzuela S. J.. 2013;Unpacking the use of social media for protest behavior: The roles of information, opinion expression, and activism. American Behavioral Scientist 57(7):920–942. https://doi.org/10.1177/0002764213479375.
49. Van Dijck J.. 2014;Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology. Surveillance & Society 12(2):197–208. https://doi.org/10.24908/ss.v12i2.4776.
50. Van Dijck J., Poell T.. 2016;Understanding the promises and premises of online health platforms. Big Data & Society 3(1):2053951716654173. https://doi.org/10.1177/2053951716654173.
51. Wentzer H. S., Bygholm A. J.. 2013;Narratives of empowerment and compliance: Studies of communication in online patient support groups. International Journal of Medical Informatics 82(12):e386–e394. https://doi.org/10.1016/j.ijmedinf.2013.01.008.
52. Ziebland S., Wyke S.. 2012;Health and illness in a connected world: How might sharing experiences on the Internet affect people’s health? The Milbank Quarterly 90(2):219–249. https://doi.org/10.1111/j.1468-0009.2012.00662.x.

Figure 1.

Volume of Health-Related Social Media Posts Over Time

Figure 2.

Proportion of social media posts by category over time

Figure 3.

Proportion of posts in the sample compared to the total number of posts by year

Table 1.

Data Sources and Number of Posts by Source

Sources       Posts  Percent
Blogs          6833     0.87
Comments        458     0.06
Forums       326782    41.74
Twitter      411069    52.51
Instagram     27212     3.48
Facebook      10482     1.34
Total        782836   100.00

Table 2.

Topic Model Solution- 15 Topics*

Topic                              Top words                                                           Proportion
12 Cancer                          Health, cancer, medicine, patient, oncology                         0.239
9 Health finance                   Money, public health, fee, taxes, budget                            0.104
11 Pregnancy                       Pregnant, answer, gynecologist, spontaneous abortion, egg donation  0.099
13 Religious circumcision          Circumcision, Jewish, religious, Bible, Muslim                      0.084
3 Healthy life                     Healthy, food, exercise, eat, diet                                  0.066
15 Hospital (in the news)          Hospital, man, girl, women, news, police                            0.052
5 Medical research (reference to)  Cancer, research, pharmaceutics industry, cannabis, drug            0.051
6 Mental illness                   Psychic, ADHD, Ritalin, Amphetamine, psychiatry, drug               0.051
4 Illness women & children         Disease, children, women, young, school                             0.046
1 Sexuality                        Sex, date, Tinder, feel, orgasm, miss                               0.045
7 Pregnant                         Week, pregnant, day, cycle, gynecologist, midwife                   0.045
2 RGP Doctor                       Doctor, hospital, appointment, pain, waiting                        0.043
10 Hiking                          Walk, stopover, tent, trip, days                                    0.039
14 Not Relevant (Anonym user)                                                                          0.021
8 Baby                             Baby nest, diaper change, baby blanket, allergy, blood circulation  0.009

* The interpretation of topics is based on the top 100 words and a reading of a sample of documents. Here we report the most significant top words.

Table 3.

Logistic regression – Personal Expression in topics (Standardized Coefficients)

Variable                        Model (1)   Model (2)
1 Sexuality                     0.097***    0.116***
2 RGP Doctor                    -0.035***   0.087***
3 Healthy life                  -0.048***   0.196***
4 Illness women & children      -0.325***   0.038**
5 Medical research              -0.096***   0.145***
6 Mental illness                0.020       0.135***
7 Pregnant                      0.100***    0.126***
8 Baby                          0.028**     0.043***
9 Health finance                -0.002      0.239***
10 Hiking                       -0.018      0.088***
11 Pregnancy                    0.154***    0.172***
12 Cancer                       -0.783***   0.443***
13 Religious circumcision       0.199***    0.216***
14 Not Relevant (Anonym user)   0.121***    0.082***
15 Hospital (in the news)       -0.245***   0.208***
Year=2012                       0.000       0.000
Year=2013                       -0.893***   -0.986***
Year=2014                       -1.197***   -1.387***
Year=2015                       -1.577***   -1.543***
Year=2016                       -1.367***   -0.933***
Year=2017                       -0.401***   -0.452***
Year=2018                       1.022***    0.911***
Forums                                      0.000
Instagram                                   -0.471***
Twitter                                     -3.850***
Observations                    284250      284250