Because the population is dynamic, researchers need to be aware of when the two data sources were collected.

If the two sources are too far apart in time, the data may differ simply because the population has changed in the interim. Because of the privacy issues noted above, researchers must come up with innovative ways to identify which individuals might be in both data sources. No gold standard currently exists for population size estimation. The particular approach chosen should depend on the population of interest and resources available.

Approaches for gathering data on a hidden population include a general population survey with screening questions.

This might not work well for hidden populations who are reluctant to self-identify. Other approaches include venue-based and time-location sampling, online sampling, and respondent-driven sampling (see Chapter 4). The first method McLaughlin described is capture-recapture. The basic procedure involves mapping all sites where the population of interest could be found, similar to venue-based sampling.

Researchers visit these sites and try to make each interaction memorable in some way, perhaps by distributing a small item or token.

The hope is that people who are later asked whether they received this item will remember it. Researchers record three pieces of information: the size of the first sample, the size of the second sample, and the overlap. The overlap is the number of people in the second sample who report having received the memorable item during the first sample. The size of the hidden population is estimated as the product of the sizes of the two samples divided by the number of individuals in the overlap.
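The estimator described above, often called the Lincoln-Petersen estimator, can be sketched in a few lines. This is a minimal illustration only: the function name `capture_recapture_estimate` and the counts are hypothetical and do not come from the workshop.

```python
def capture_recapture_estimate(n_first: int, n_second: int, overlap: int) -> float:
    """Two-sample estimate: (size of sample 1 * size of sample 2) / overlap."""
    if overlap <= 0:
        # With no overlap the ratio is undefined, so no estimate is possible.
        raise ValueError("overlap must be positive")
    if overlap > min(n_first, n_second):
        raise ValueError("overlap cannot exceed either sample size")
    return n_first * n_second / overlap


# Hypothetical counts, for illustration only.
print(capture_recapture_estimate(n_first=200, n_second=150, overlap=30))  # 1000.0
```

With these made-up numbers, 30 of the 150 people in the second sample (20 percent) recall the first encounter, so the 200 people reached the first time are taken to be about 20 percent of the population.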

A lot of overlap suggests that the population is not much larger than the samples. But if the two samples share few members, the population may be quite a bit larger than either sample. McLaughlin pointed to many assumptions associated with capture-recapture.

For example, the method assumes that every member of the hidden population has an equal chance of being sampled. She noted that in venue-based sampling this is unlikely to be true: some people are frequent visitors to the venues, while others may never visit them. Another key assumption is that the matching is reliable, and that everyone in the second sample will recall participation in the first sample and will respond honestly. Another assumption, perhaps the most troubling, is that the two samples are independent.

If researchers return to the same venues, it is likely that the same collection of individuals frequents those venues. Even so, this simple method can provide a ballpark estimate for population size. Multiplier methods also rely on two data sources. The first data source will not be a random sample, but a count of population members who received a service. An object distribution could also be used as the first source of data. Rather than a list of names, only a count of individuals is sought. The second data source is a survey of the population of interest. The sample may not actually be representative, given the nature of the population, but it should be the best representative sample possible.

As part of the survey, individuals would be asked whether they received the service or the object. This approach also has a relatively simple estimator: the count from the first source divided by the proportion of survey respondents who report receiving the service or object. This approach also poses challenges, McLaughlin said. One assumption is that the two data sources are independent, but this may not be so.
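As a rough sketch of the multiplier method, the count from the first source can be divided by the share of survey respondents who report receiving the service or object. The function name `multiplier_estimate` and all numbers below are hypothetical, and the sketch ignores sampling weights and uncertainty.

```python
def multiplier_estimate(service_count: int, n_surveyed: int, n_reporting: int) -> float:
    """Divide the service (or object) count by the surveyed proportion who received it."""
    if n_surveyed <= 0:
        raise ValueError("n_surveyed must be positive")
    if not 0 < n_reporting <= n_surveyed:
        raise ValueError("n_reporting must be between 1 and n_surveyed")
    proportion = n_reporting / n_surveyed
    return service_count / proportion


# Hypothetical numbers: 500 people received the service, and 100 of the
# 400 survey respondents say they were among them.
print(multiplier_estimate(service_count=500, n_surveyed=400, n_reporting=100))  # 2000.0
```

If some of the 500 service recipients are not members of the hidden population, the count is inflated and so is the estimate, which is the concern McLaughlin raised about this method.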

When respondent-driven sampling is used, many assumptions about the population are made, and uncertainty and bias may propagate when the data are used for something else. The timing between the service or object distribution and the survey is important to consider.

Finally, she noted, it is assumed that everyone receiving the service or object is a member of the hidden population. A count of the number of people who received an STI test in the first source does not guarantee that all of those people are also, for example, sex workers. Obtaining the initial data source may be a challenge. McLaughlin explained different variations of network scale-up methods and ongoing research into new variations.

In the basic version, it is assumed that the proportion of respondents' contacts who are members of the hidden population is equal to the population proportion. For example, if general population respondents report that 2 percent of their contacts are members of the hidden population, an estimate for the size of the hidden population would be 2 percent of the size of the general population. This approach requires the assumption that people in the general population are aware of whether or not their friends are members of the hidden population.
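A minimal sketch of this basic scale-up calculation follows. It assumes each respondent reports a total number of contacts and the number of those contacts who belong to the hidden population. The function name `basic_scale_up` and the survey data are hypothetical.

```python
from typing import Sequence, Tuple


def basic_scale_up(
    responses: Sequence[Tuple[int, int]], general_population_size: int
) -> float:
    """Scale the pooled share of hidden-population contacts up to the general population.

    Each response is (total_contacts, hidden_population_contacts).
    """
    total_contacts = sum(total for total, _ in responses)
    hidden_contacts = sum(hidden for _, hidden in responses)
    if total_contacts <= 0:
        raise ValueError("respondents must report at least one contact in total")
    # Multiply before dividing so the illustrative numbers stay exact.
    return general_population_size * hidden_contacts / total_contacts


# Hypothetical survey: 10 of 500 reported contacts (2 percent) are in the hidden population.
responses = [(100, 2), (200, 4), (50, 1), (150, 3)]
print(basic_scale_up(responses, general_population_size=1_000_000))  # 20000.0
```

The sketch pools contacts across respondents rather than averaging each respondent's proportion. That is one common choice, not necessarily the one used in the work McLaughlin described.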

This can be a problematic assumption, particularly for populations that are especially hidden. This approach also assumes that network connections are formed at random. McLaughlin described a generalized scale-up estimator that relies on two data sources. It pairs a general population survey with a hidden population survey, such as respondent-driven sampling, and tries to match up the accounts.

The people in the general population will say they know a certain number of people in the hidden population.

Respondents to the survey of the hidden population are asked how many people in the general population would have reported them as being a member of the hidden population (see "Estimating population size using the network scale up methods," The Annals of Applied Statistics, 9(3)). The second data source is an attempt to improve estimation. However, it still assumes an aggregate awareness about visibility in the hidden population, and this can be problematic.
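One simplified way to picture how the two surveys are matched up is to divide the total number of reports made by the general population by the average number of reports each hidden population member expects to attract. The sketch below assumes unweighted simple random samples. The function name `generalized_scale_up` and the data are hypothetical, and published estimators handle sampling weights and survey design in ways omitted here.

```python
from typing import Sequence


def generalized_scale_up(
    reports: Sequence[int],
    general_population_size: int,
    visibilities: Sequence[float],
) -> float:
    """Estimated total reports about the hidden population / average visibility.

    reports: per-respondent counts of known hidden-population members
             (general population survey).
    visibilities: per-respondent counts of general-population members who would
                  report them (hidden population survey).
    """
    if not reports or not visibilities:
        raise ValueError("both surveys need at least one respondent")
    # Scale the sample's reports up to the whole general population.
    total_reports = general_population_size * sum(reports) / len(reports)
    mean_visibility = sum(visibilities) / len(visibilities)
    if mean_visibility <= 0:
        raise ValueError("mean visibility must be positive")
    return total_reports / mean_visibility


# Hypothetical data, for illustration only.
reports = [0, 1, 0, 2, 0, 0, 1, 0]   # mean of 0.5 reports per respondent
visibilities = [4, 6, 5, 5]          # mean visibility of 5
print(generalized_scale_up(reports, 10_000, visibilities))  # 1000.0
```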

McLaughlin also described successive sampling population size estimation (SS-PSE), which uses only respondent-driven sampling data. She acknowledged Krista Gile as a coauthor on the research. Because it needs a single data source, it may be more cost-effective than an approach that requires two data sources. It can also be done retroactively because no new questions need to be added to the survey instrument. The only information needed is the network size of an individual, which is already routinely collected during respondent-driven sampling. This method uses a Bayesian framework where prior information about population size can be incorporated. It is a good way to combine data from different sources.

For example, if there was a previous estimate from a capture-recapture method, it could serve as a prior estimate for SS-PSE. The Bayesian approach also provides a statistical model for uncertainty. People who are more visible, meaning they have larger network sizes, may be more likely to visit venues or interact with researchers.

They are also more likely to be sampled and to be sampled earlier in respondent-driven sampling. If network size decreases over waves, the population is likely being depleted and population size is not likely to be much larger than the sample size. If the frequency of larger network sizes does not decrease, one conclusion is that there are still many people not sampled and the population size is likely to be larger than the sample size. The network sizes in respondent-driven sampling may not contain a lot of information about the population size, and this method relies on the quality of the respondent-driven sampling data.
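The intuition about network sizes declining over waves can be checked descriptively before any model is fit. The sketch below is not SS-PSE itself, which fits a Bayesian model. It only computes the mean reported network size per wave and a least-squares slope. The function names and the records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_network_size_by_wave(records: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Average reported network size for each wave; records are (wave, network_size)."""
    totals = defaultdict(lambda: [0, 0])  # wave -> [sum of sizes, respondent count]
    for wave, size in records:
        totals[wave][0] += size
        totals[wave][1] += 1
    return {wave: s / c for wave, (s, c) in sorted(totals.items())}


def linear_trend(means_by_wave: Dict[int, float]) -> float:
    """Least-squares slope of mean network size against wave number."""
    waves = list(means_by_wave)
    if len(waves) < 2:
        raise ValueError("need at least two waves to estimate a trend")
    values = [means_by_wave[w] for w in waves]
    w_bar = sum(waves) / len(waves)
    v_bar = sum(values) / len(values)
    numerator = sum((w - w_bar) * (v - v_bar) for w, v in zip(waves, values))
    denominator = sum((w - w_bar) ** 2 for w in waves)
    return numerator / denominator


# Hypothetical respondent-driven sampling records: (wave, reported network size).
records = [(0, 30), (0, 26), (1, 20), (1, 24), (2, 15), (2, 13), (3, 10), (3, 8)]
means = mean_network_size_by_wave(records)
print(means)                # {0: 28.0, 1: 22.0, 2: 14.0, 3: 9.0}
print(linear_trend(means))  # -6.5
```

A clearly negative slope is consistent with the high-visibility members being used up early. A slope near zero would fit the case in which many people remain unsampled.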

If everyone had exactly the same network size across all the waves, the data would carry no information about how much bigger the population is. McLaughlin also noted that this method is not good at handling inconsistent prior values. She has seen examples where. (See "Estimating the size of hidden populations using respondent-driven sampling data: Case examples from Morocco," Epidemiology, 26.)

To conclude, McLaughlin discussed directions for research in population size estimation. The choice of a particular method would depend on the population of interest and the resources available.