The research follows an inductive approach: the goal of this work is an initial, exploratory study that characterizes one citizen science (CS) project in order to infer more general mechanisms that can be applied to a variety of projects. This case study presents the relevant aspects of the Chimp & See project, describes the methodology used, and reports the results obtained.
The Chimp & See project
Zooniverse is one of the largest citizen science web portals. It was launched in 2009 by the Citizen Science Alliance (CSA), whose board members come from institutions such as the Adler Planetarium and Johns Hopkins University. To date, the platform has more than 1.6 million registered volunteers participating in citizen science activities. In June 2020, it listed 99 active, 122 paused, and 46 finished projects. The projects are crowdsourced CS activities with active participation, in which volunteers annotate and classify entities, among other types of activity. A full description of the typology of CS activities on Zooniverse has been published by Michalak (2015).
Chimp & See was started in 2015 by the Max Planck Institute for Evolutionary Anthropology as one of the projects on the Zooniverse platform. Its goal is to gain a better understanding of chimpanzee culture, population size, and demographics in specific regions of Africa. The activity is an annotation and classification task in which volunteers identify species in videos. The web-based interface features a video player with interactive tools for annotating parts of a video, for example to highlight certain behavioural patterns of chimpanzees. Volunteers are not required to have prior knowledge of the field; they receive annotation instructions in an interactive web tutorial.
Goals and indicators
The Zooniverse projects are inherently crowdsourced activities with active participation (Haklay, 2013). From an external perspective, it is neither obvious nor trivial how the prescribed roles correspond to the roles the volunteers actually take in the discourse and communication in the forum (the talk pages). The observable behavioural patterns and communication structures may give clues about the general structure of the community and about how discourse analysis can be used to characterize a citizen science project. The goal of this work is therefore an initial, exploratory study that characterizes one CS project in order to infer more general mechanisms that can be applied to a variety of projects.
To characterize the communication structure, and particularly the role of individual users in the discourse, we use centrality measures, namely (weighted) in-degree, (weighted) out-degree, and eigenvector centrality, each capturing a different type of importance: in-degree reflects how often a user is addressed, out-degree how actively a user addresses others, and eigenvector centrality whether a user is addressed by other central users. In addition, descriptive statistics on activity across the different roles give insight into the communication structure, particularly into who initiates communication and who replies to enquiries.
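The three centrality measures can be illustrated with a small, dependency-free sketch. It computes weighted in-degree and out-degree as sums of edge weights, and eigenvector centrality by power iteration over incoming edges, so that a user scores high when highly scored users reply to or mention them (the convention NetworkX documents for directed graphs). The edge direction (from the writing user to the addressed user) and the toy user names are assumptions for illustration; the actual analysis used NetworkX and Gephi.

```python
import math
from collections import defaultdict


def weighted_degrees(edges):
    """Sum edge weights into weighted in-degree and out-degree per user."""
    indeg, outdeg = defaultdict(float), defaultdict(float)
    for src, dst, w in edges:
        outdeg[src] += w
        indeg[dst] += w
        indeg.setdefault(src, 0.0)   # users who are never addressed get 0
        outdeg.setdefault(dst, 0.0)  # users who never write to others get 0
    return dict(indeg), dict(outdeg)


def eigenvector_centrality(edges, max_iter=1000, tol=1e-10):
    """In-edge eigenvector centrality of a weighted digraph via power iteration.

    Iterating with (A + I) instead of A avoids oscillation when the spectrum
    contains both +lambda and -lambda, e.g. on two-user reply cycles.
    """
    nodes = sorted({n for src, dst, _ in edges for n in (src, dst)})
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        new = dict(x)                    # the "+ I" part of (A + I)
        for src, dst, w in edges:
            new[dst] += w * x[src]       # score flows along src -> dst
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < len(nodes) * tol:
            return new
        x = new
    raise RuntimeError("power iteration did not converge")


# Hypothetical edges: (writer, addressed user, number of replies + mentions).
edges = [("vol1", "sci1", 3), ("vol2", "sci1", 2), ("sci1", "vol1", 1),
         ("mod1", "vol2", 1), ("vol2", "mod1", 1)]
indeg, outdeg = weighted_degrees(edges)
ev = eigenvector_centrality(edges)
print(indeg["sci1"], outdeg["vol2"])  # → 5.0 3.0
print(max(ev, key=ev.get))            # → sci1
```

In this toy graph, vol2 is among the most active writers (out-degree 3) yet ends up with near-zero eigenvector centrality, because only mod1 addresses vol2 and nobody addresses mod1 from outside that loop. This contrast between activity and being addressed by central users is what combining the measures is meant to expose.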
How did we get the results?
The forum data of the Chimp & See talk pages were processed into a dataset for the analysis, comprising 3218 forum threads with 24531 individual posts. The forum involved 575 unique user accounts, which represent 10.1% of all active volunteers of the Chimp & See project. These accounts are split across the following (system) roles: 8 moderators, 25 scientists, and 542 volunteers. The processed discussions span the period from 2015-04-03 to 2019-07-05. Three subforums were analysed: help, science, and chat (“community building”). The average length of a discussion thread is 6.5 posts, varying by subforum (help: 5.7; chat: 5; science: 8.7). The overview page listing all subforums served as the seed for the crawler.
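Descriptive statistics of this kind (mean thread length per subforum, and which roles open threads versus answer them) can be computed from flat post records. The following is a minimal sketch assuming hypothetical field names (subforum, thread, position, role, with position 0 marking the opening post of a thread); these are not the actual fields of Table 1, and the sample records are made up.

```python
from collections import Counter, defaultdict


def mean_thread_length(posts):
    """Mean number of posts per thread, computed separately for each subforum."""
    per_thread = Counter((p["subforum"], p["thread"]) for p in posts)
    lengths = defaultdict(list)
    for (forum, _thread), n_posts in per_thread.items():
        lengths[forum].append(n_posts)
    return {forum: sum(ns) / len(ns) for forum, ns in lengths.items()}


def initiators_vs_repliers(posts):
    """Per role, count posts that open a thread (position 0) versus replies."""
    counts = defaultdict(Counter)
    for p in posts:
        kind = "initiates" if p["position"] == 0 else "replies"
        counts[p["role"]][kind] += 1
    return {role: dict(c) for role, c in counts.items()}


posts = [  # hypothetical sample records
    {"subforum": "help", "thread": 1, "position": 0, "role": "volunteer"},
    {"subforum": "help", "thread": 1, "position": 1, "role": "moderator"},
    {"subforum": "science", "thread": 2, "position": 0, "role": "volunteer"},
    {"subforum": "science", "thread": 2, "position": 1, "role": "scientist"},
    {"subforum": "science", "thread": 2, "position": 2, "role": "volunteer"},
]
print(mean_thread_length(posts))      # → {'help': 2.0, 'science': 3.0}
print(initiators_vs_repliers(posts))
```

Keying threads by (subforum, thread) keeps the sketch correct even if thread identifiers were only unique within a subforum.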
For data collection and processing, a pipeline based on the techniques and tools described in section 2.1.1 was implemented in Python. The crawler uses Selenium (Muthukadan, 2018) with a headless browser to load the forum pages and BeautifulSoup (Richardson, 2020) to extract the relevant data, including pagination for multi-page threads. Table 1 lists all fields extracted by the crawler.
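The pagination logic of such a crawler can be sketched as follows. To keep the example runnable without a browser or third-party packages, it swaps in the standard library's html.parser for BeautifulSoup and takes the page fetcher as a parameter; in a Selenium-based setup, that fetcher would typically call driver.get(url) and return driver.page_source. The markup (div elements with class "post" and a data-author attribute, an a element with class "next") is hypothetical and not the actual Zooniverse Talk HTML.

```python
from html.parser import HTMLParser


class ThreadPageParser(HTMLParser):
    """Collects posts and the next-page link from one (hypothetical) thread page.

    Assumed markup: <div class="post" data-author="..."> text </div> per post
    (without nested <div>s) and <a class="next" href="..."> for paging.
    """

    def __init__(self):
        super().__init__()
        self.posts = []        # (author, text) tuples in page order
        self.next_url = None   # href of the "next page" link, if any
        self._author = None    # author of the post currently being read
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "post" in classes:
            self._author = attrs.get("data-author") or ""
            self._buf = []
        elif tag == "a" and "next" in classes:
            self.next_url = attrs.get("href")

    def handle_data(self, data):
        if self._author is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "div" and self._author is not None:
            self.posts.append((self._author, "".join(self._buf).strip()))
            self._author = None


def crawl_thread(fetch_html, first_url, max_pages=50):
    """Follow next-page links from first_url and return all posts of the thread."""
    posts, url, seen = [], first_url, set()
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guard against pagination loops
        parser = ThreadPageParser()
        parser.feed(fetch_html(url))
        parser.close()
        posts.extend(parser.posts)
        url = parser.next_url
    return posts


# Stand-in for a Selenium-backed fetcher: two fake pages of one thread.
pages = {
    "/t/1": '<div class="post" data-author="alice">Is this a chimp?</div>'
            '<a class="next" href="/t/1?page=2">next</a>',
    "/t/1?page=2": '<div class="post" data-author="bob">Yes, @alice.</div>',
}
print(crawl_thread(pages.get, "/t/1"))
# → [('alice', 'Is this a chimp?'), ('bob', 'Yes, @alice.')]
```

Passing the fetcher in as a function keeps the browser automation separate from the parsing, so the parsing can be tested offline against saved pages.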
Afterwards, the NetworkX library was used to extract the social network from the retrieved forum data, and Gephi was used for the ex-post analysis (centrality measures) and for visualising the extracted network with dynamic graph layouts. The network is modelled as a directed graph in which nodes represent users and node colour indicates the prescribed role (moderator, scientist, volunteer). An edge between two users is created when one user replies to a post of the other or mentions the other with the @ character. Each edge carries a weight equal to the number of replies and mentions between the two users. The weight of a node is its out-degree. To characterize the dynamics, the network is sliced by year.
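The edge construction described above can be sketched in plain Python. Assumptions, all labelled in the code: each post record has hypothetical keys author, year, reply_to (the author of the post being answered, or None) and text; edges point from the writing user to the addressed user; usernames consist of word characters; self-replies and self-mentions are dropped; and node weight is taken as weighted out-degree. A reply that also @mentions the replied-to user adds 2 to that edge, since the weight counts replies and mentions together.

```python
import re
from collections import Counter, defaultdict

MENTION = re.compile(r"@(\w+)")  # assumption: usernames are word characters


def yearly_edge_weights(posts):
    """Return {year: Counter({(source, target): weight})} from replies and @mentions.

    Each reply and each @mention adds 1 to the directed edge from the writing
    user to the addressed user (the direction is an assumption of this sketch).
    """
    slices = defaultdict(Counter)
    for p in posts:
        targets = MENTION.findall(p["text"])
        if p["reply_to"] is not None:
            targets.append(p["reply_to"])
        for target in targets:
            if target != p["author"]:  # assumption: self-loops are dropped
                slices[p["year"]][(p["author"], target)] += 1
    return slices


def node_weights(edge_weights):
    """Node weight within one time slice, taken here as weighted out-degree."""
    weights = Counter()
    for (source, _target), w in edge_weights.items():
        weights[source] += w
    return weights


posts = [  # hypothetical sample posts
    {"author": "alice", "year": 2015, "reply_to": None, "text": "Chimp at 0:42?"},
    {"author": "bob", "year": 2015, "reply_to": "alice",
     "text": "Yes @alice, @carol saw it too."},
    {"author": "carol", "year": 2016, "reply_to": "bob", "text": "Agreed."},
]
slices = yearly_edge_weights(posts)
print(dict(slices[2015]))  # → {('bob', 'alice'): 2, ('bob', 'carol'): 1}
print(dict(node_weights(slices[2015])))  # → {'bob': 3}
```

To hand a yearly slice to Gephi, it could be loaded into a NetworkX DiGraph with add_weighted_edges_from, given a role attribute per node for colouring, and written out with nx.write_gexf.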
The example outlined above is a first case study of how to assess communication structures in the online communities of citizen science projects. Applying methods of social network analysis (SNA) shows the importance of certain roles in mediating citizen science activities. Semantic technologies can help to better understand the artefacts created within these communities, both in the discourse and in the core activity itself. To analyse and compare discourse effectively across different citizen science projects, epistemic network analysis might yield further and deeper insights.
- Haklay, M. (2013). Citizen science and volunteered geographic information: Overview and typology of participation. In Crowdsourcing geographic knowledge (pp. 105-122). Springer, Dordrecht.
- Herodotou, C., Aristeidou, M., Miller, G., Ballard, H., & Robinson, L. (2020). What Do We Know about Young Volunteers? An Exploratory Study of Participation in Zooniverse. Citizen Science: Theory and Practice, 5(1).
- Michalak, K. (2015). Online localization of Zooniverse citizen science projects – on the use of translation platforms as tools for translator education. Teaching English with Technology, 15(3), 61-70.
- Muthukadan, B. (2018). Selenium with Python. Retrieved from https://bit.ly/2F2VDmz
- Richardson, L. (2020). BeautifulSoup. Retrieved from https://bit.ly/2ZkItYG