We take a daily sample of 200 respondents from the Russian population using Random Device Engagement (RDE). The sample is limited to Russian speakers who are 18 or older. During the first several weeks of the survey, we observed a significant gender imbalance in the samples and began quota sampling on gender to achieve a sample that reflects the demographic balance of the Russian population as published in the initial reports of the 2022 Russian census. Other demographics, including age, education, and geography, have remained balanced enough not to require additional quotas, although we do take them into account when constructing weights.
In the following sections, we provide an introduction to RDE sampling and present the methods that we have used to validate the survey sample. These sections focus only on issues related to achieving a nationally representative sample; discussions of issues like preference falsification are deferred to later sections.
In publishing our results, we seek to be as transparent as possible. As a result, we will be frequently updating this page as we take additional steps to validate the survey.
RDE Sampling: An Introduction
Random Device Engagement uses ad-ids to randomly select respondents through their devices (e.g., mobile phones, tablets, computers). Ad-ids are unique identifiers that marketing companies use to match metadata generated online to a specific user, which allows them to provide more personalized advertisements to consumers. Once a device is selected, an ad is pushed to the user’s device in the app she is already using, offering an organic incentive to take the survey. For example, a user who is reading news on a news app may receive an ad that offers a free premium news article as compensation for taking the survey. This sampling method is the online corollary to random digit dialing and has been shown to perform better than many traditional sampling methods (Rothschild and Konitzer 2022).
Ideally, a survey will draw a true probability sample from a population. In a probability sample, the probability that any given individual is sampled is known, which allows researchers to recover a nationally representative estimate. For any survey method, this ideal is exactly that: an ideal. Many problems get in the way, from unreachable portions of the population to people refusing to take the survey. This problem has become increasingly acute over the last several decades as response rates to all survey modes have declined substantially. This means that there is no longer a gold standard for public opinion data.
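The logic behind a probability sample can be sketched in a few lines: when each respondent's inclusion probability is known, the Horvitz-Thompson estimator recovers an unbiased estimate of the population mean. The numbers below are purely illustrative and are not drawn from our survey.

```python
# Sketch of why known inclusion probabilities matter: weight each
# response by the inverse of its sampling probability, then divide
# by the population size. All numbers here are hypothetical.

def horvitz_thompson_mean(responses, inclusion_probs, population_size):
    """Estimate a population mean from a probability sample."""
    total = sum(y / p for y, p in zip(responses, inclusion_probs))
    return total / population_size

# Two strata: the first three respondents were sampled at 1/100,
# the last three at 1/200, from a population of 900.
responses = [1, 1, 0, 1, 0, 0]  # e.g., 1 = supports, 0 = does not
probs = [0.01, 0.01, 0.01, 0.005, 0.005, 0.005]
estimate = horvitz_thompson_mean(responses, probs, population_size=900)
```

When inclusion probabilities are unknown, as in all modern opt-in modes, this estimator is unavailable, which is why the weighting and modeling adjustments described later are needed.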
Validating that a sample is nationally representative is an extremely difficult task, primarily because we often know very little about the background population. As a result, validation exercises must rely on gathering the best available evidence. Beyond the validation work done by Rothschild and Konitzer 2022 on the general performance of RDE samples, we compare our survey sample to other surveys that ask the same questions as we do during the same time period. Agreement across several survey modes should increase our confidence in all of the samples referenced, including Russia Watcher’s.
VCIOM Tracking Survey
VCIOM’s tracking survey has overlapped ours on several dates. Because they collect their estimates of support on a single day, we compare their estimates with the output of our dynamic model. Note that VCIOM asks a binary question about support, whereas we ask for the level of support on a four-point Likert scale. Note also that VCIOM reports the percentages of "yes" and "no" responses after dropping non-respondents from the sample, reporting the proportion of non-responses separately. For comparability, we report our estimates the same way in the tables below. VCIOM's report from which these data are drawn can be found here.
May 26, 2022
| Likely to support | Unlikely to support | Don't know |
| --- | --- | --- |
June 12, 2022
| Likely to support | Unlikely to support | Don't know |
| --- | --- | --- |
June 25, 2022
| Likely to support | Unlikely to support | Don't know |
| --- | --- | --- |
Levada Survey
So far, we have identified one Levada survey that overlaps ours in time frame, with the dates listed, and includes a question matching one of ours. Both surveys ask how closely respondents are following the special military operation between July 21 and 27. As an important note, we have found that the distribution of responses to this question changes almost daily, suggesting that some differences between the surveys could be driven by the concentration of Levada’s respondents on particular days of the survey period. Levada's summary of the data can be found here.
| Very Closely | Closely | Without much interest | I don't follow at all | Haven't heard anything about it |
| --- | --- | --- | --- | --- |
Questionnaire Design
Each daily questionnaire consists of 25 questions. A subset of those are asked daily, an additional subset are rotated in and out on a schedule, and a final subset are saved to ask about developing events. Following ethics guidelines, respondents are presented with an informed consent document prior to taking the survey. Approximately 2% of respondents decide not to continue with the survey after viewing the consent form.
A common problem with online surveys is ensuring high data quality, as it is easier for respondents to answer questions without paying attention in these questionnaires than in other survey modes. To eliminate respondents who are not paying attention, attention check questions are included throughout the questionnaire, and respondents who consistently fail attention checks are removed from the survey.
Survey Weights
While some sampling methods perform better than others, none are perfect. To achieve a more nationally representative estimate, we reweight our data based on population characteristics that we can observe. There are various methods of constructing survey weights, and we primarily rely on two, depending on the use case. When we conduct analyses intended to describe the relationships between variables, or when change over time is less important and we can average responses over a reasonably large period, we use traditional rake weights. When we need an estimate of opinion on a given day, rake weights alone cannot reduce the margin of error enough for a sample of 200 to reveal meaningful change. For these cases, we built a dynamic multilevel regression with poststratification (DMRP) model. Both of these methods are described in detail below.
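To see why a daily sample of 200 is too noisy for precise point estimates, consider the worst-case 95% margin of error for a simple proportion:

```python
import math

# Worst-case (p = 0.5) margin of error at 95% confidence for a
# daily sample of n = 200, ignoring any design effect from weighting.
n, p, z = 200, 0.5, 1.96
moe = z * math.sqrt(p * (1 - p) / n)  # roughly 0.07, i.e. about +/-7 points
```

A margin of roughly seven percentage points swamps the day-to-day movement we care about, which motivates a model that pools information across days.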
We choose rake weights because they have been shown to be highly effective at adjusting samples. An excellent description of the rake weighting procedure by Battaglia, Hoaglin, and Frankel can be found here. We weight on age, education, gender, and region using data from the 2010 Russian census summary tables.
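At its core, raking is iterative proportional fitting: cell weights are repeatedly rescaled so that each demographic margin matches its population target. The toy sketch below rakes a 2x2 gender-by-education table; the counts and targets are made up for illustration and do not reflect our actual weighting variables or census figures.

```python
# Minimal raking (iterative proportional fitting) sketch on a 2x2
# table. Real raking runs over several margins (age, education,
# gender, region); the numbers here are hypothetical.

def rake(table, row_targets, col_targets, iters=50):
    """Rescale cell weights until margins match population targets."""
    t = [row[:] for row in table]
    for _ in range(iters):
        for i, target in enumerate(row_targets):  # match row margins
            s = sum(t[i])
            t[i] = [x * target / s for x in t[i]]
        for j, target in enumerate(col_targets):  # match column margins
            s = sum(t[i][j] for i in range(len(t)))
            for i in range(len(t)):
                t[i][j] *= target / s
    return t

sample = [[30, 70],   # men: lower / higher education (sample counts)
          [60, 40]]   # women
raked = rake(sample, row_targets=[46, 54], col_targets=[50, 50])
# Row sums converge to [46, 54] and column sums to [50, 50].
```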
Following Gelman, Lax, Phillips, Gabry, and Trangucci’s (2018) work, we build a dynamic multilevel regression and poststratification (MRP) model to produce fine-grained estimates of daily public opinion. MRP models break a population down into demographic cells and borrow information across cells to build estimates of public opinion within each group. For example, a survey in the United States may break down the population by gender, age, education, occupation, and state. One cell would contain female factory workers in Ohio between the ages of 25 and 30 who hold a PhD. The survey would likely not observe anyone in this cell, but it would observe other people between the ages of 25 and 30, other women, other women in this age bracket, other people with a PhD, etc. By borrowing data from related groups, we build an estimate of support within each demographic cell. In the poststratification step, we use census data to multiply the cell-level estimate by the census count in each cell and sum over the population to create a nationally representative estimate that accounts for demographic sampling bias. We refer readers to the full paper linked above for an in-depth description of the modeling framework.
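The poststratification step itself reduces to a census-weighted average of the cell-level model estimates. The cells, support estimates, and census counts below are hypothetical placeholders, not output from our model:

```python
# Sketch of the poststratification step: combine cell-level support
# estimates (which a multilevel model would produce) with census
# counts per cell. All numbers are fabricated for illustration.

cells = [
    # (cell-level estimate of support, census count in the cell)
    (0.62, 1_200_000),
    (0.55,   800_000),
    (0.48, 2_000_000),
]
national = sum(est * n for est, n in cells) / sum(n for _, n in cells)
```

Because the weights come from the census rather than the sample, cells that are under-sampled but large in the population still receive their proper share of the national estimate.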
Testing Preference Falsification
To mitigate potential preference falsification, we ask about support for the “special military operation” using periodic list experiments. List experiments are a well-tested way of estimating the true level of agreement with a sensitive statement by guaranteeing that individual respondents' answers cannot be directly observed (Corstange 2009; Imai 2011; Blair and Imai 2012). They do so by providing a control list of statements on non-sensitive topics and a treatment list that includes the same statements along with the sensitive statement. Both groups are asked to give the number of statements that they agree with, rather than answering each individually. A valid estimate of support for the conflict across the sample can be obtained by comparing the outcomes of both groups without requiring any individual respondent to reveal their true preference (e.g., Frye et al. 2017; Chapkovski and Schaub 2022). This design is increasingly popular in surveys conducted in authoritarian regimes as a way of obtaining accurate estimates of responses to sensitive questions (Hale 2021; Pop-Eleches and Way 2021). We also ask the direct version of the question to the control group after the list experiment in order to obtain an estimate of the number of respondents who are falsifying their preferences. This allows us to monitor the degree to which Russians feel that they must hide their true thoughts about the war in response to repressive events or social pressures.
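The standard estimator for a list experiment is a simple difference in means between the two groups: because the only difference between the lists is the sensitive item, the gap in average item counts estimates the share agreeing with it. The counts below are fabricated for illustration.

```python
from statistics import mean

# Difference-in-means estimator for a list experiment. The treatment
# list adds one sensitive item to a 3-item control list; counts are
# how many statements each respondent agreed with. Data is made up.

control_counts = [1, 2, 0, 2, 1, 1]    # 3-item control list
treatment_counts = [2, 2, 1, 3, 1, 2]  # same list + sensitive item

support_estimate = mean(treatment_counts) - mean(control_counts)
```

Asking the direct question to the control group afterwards then lets one compare direct support against this indirect estimate; the gap is a measure of preference falsification.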
In addition to list experiments, we test for preference falsification by asking politically sensitive questions in multiple ways, for example: “Do you support Russia’s decision to conduct a special military operation in Ukraine?” and “Would you call off Russia’s special military operation in Ukraine if you could?” We also present respondents with an informed consent document prior to taking the survey. This consent form, which was approved by the Princeton Institutional Review Board, outlines the potential risks respondents may face when sharing their opinions online and informs them that they may leave the survey or choose “don’t know” or “prefer not to answer” at any time. Respondents who are afraid to answer politically sensitive questions thus have the option to exit the survey. Approximately 2% of respondents decide not to continue after viewing the consent form. While it is possible that some of the remaining respondents provide disingenuous answers, we believe that the controls described above reduce this risk substantially, giving us confidence in the reliability of Russia Watcher's data.
Because demonstrating the absence of preference falsification in our surveys amounts to proving a null hypothesis, it is difficult, if not impossible, to do so conclusively. However, the bulk of the evidence described above suggests that if preference falsification is occurring, it is likely minimal and the estimates produced from the data are largely reliable.