
# Sample Size - Science topic

The number of units (persons, animals, patients, specified circumstances, etc.) in a population to be studied. The sample size should be big enough to have a high likelihood of detecting a true difference between two groups. (From Wassertheil-Smoller, Biostatistics and Epidemiology, 1990, p95)

Questions related to Sample Size

1) How should this be handled when the same 2,000 students participated in three different studies and all three studies meet the inclusion criteria?

2) What about the case where a single large study was divided into three separate studies with the same sample of 2,000, and all three are being considered for the review?

This would be a great help for the further development of my study.

I would like advice on procedures for determining the sample size for an Interrupted Time Series design in an Implementation Science study. I would also be grateful if anyone could share references on this topic.

If the research design is qualitative, data collection is via semi-structured interviews, and the analysis includes thematic analysis and fuzzy. What should the optimal sample size be? Is there any rationale behind the number of interviews, or any reference?

Hey all,

I have a question regarding my optimal sample size. Is the formula that I added in the attachments sufficient to determine my sample size, or should I also account for the number of variables in my regression? I thought there was a rule of thumb that you should have at least 10 respondents for each variable in your regression. Moreover, I don't know whether you need 10 respondents for each factor or for each variable (e.g., the variable loyalty consists of 5 factors; does that imply 10 or 50 respondents?).

Thanks in advance for answering my question!

Kind Regards,

Bram
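A minimal sketch of the "10 respondents per variable" rule of thumb, assuming the count refers to the terms actually entered into the regression rather than the items behind them: a 5-item loyalty scale averaged into one score is one predictor, while 5 items entered separately are five. This is only the heuristic, not the formula in the attachment (which is not reproduced here) and not a power analysis; the function name and the predictor counts are illustrative.

```python
def min_n_by_ratio(n_predictors: int, cases_per_predictor: int = 10) -> int:
    """Rule-of-thumb minimum N: a fixed number of respondents per predictor.

    n_predictors counts the terms entered in the regression, not the
    questionnaire items that were combined to build them.
    """
    if n_predictors < 1 or cases_per_predictor < 1:
        raise ValueError("counts must be positive")
    return n_predictors * cases_per_predictor


# Hypothetical model: loyalty plus 3 other predictors.
print(min_n_by_ratio(4))       # loyalty entered as one composite score
print(min_n_by_ratio(3 + 5))   # loyalty's 5 components entered separately
```

If loyalty is instead modelled as a latent factor (e.g., in SEM), the per-predictor rule no longer applies directly and parameter-based guidelines are used.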

I am looking for some valid references for the appropriate sample size for doing a research study based on phenomenology.

I need to calculate the sample size for a case-control study, and I am not sure whether to use the crude or the adjusted odds ratio reported by other studies to estimate it.
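Whichever odds ratio is chosen, it enters the calculation as the expected OR. A minimal sketch using the standard two-proportion normal approximation for an unmatched design with one control per case, assuming the exposure prevalence among controls (p0) is known from the literature; the OR and p0 values below are hypothetical and the function name is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def case_control_n(odds_ratio: float, p0: float, alpha: float = 0.05,
                   power: float = 0.80) -> int:
    """Cases needed (with 1:1 controls) for an unmatched case-control study.

    p0: exposure prevalence among controls.
    p1: exposure prevalence among cases implied by the odds ratio.
    """
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p0 * (1 - p0))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Hypothetical: crude OR 2.0 vs adjusted OR 1.6, 30% exposure among controls.
print(case_control_n(2.0, 0.30))
print(case_control_n(1.6, 0.30))
```

Because the required n grows quickly as the OR approaches 1, whichever of the two ORs is closer to 1 gives the larger, more conservative sample size.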

My two samples have sizes 41 and 12; the data are continuous, normally distributed, and randomly selected. However, the means of both samples (and even of the combined sample of 53) are above the mean score I am comparing against. Both have a standard deviation of around 20. I am using SPSS. My data: I administered a survey to two groups (two languages); 41 people replied in language one and 12 in language two. Thus, my first sample size is statistically significant and my second sample size is not.

I am comparing to a mean of 60; the sample of 41 yields a mean of 80 and the sample of 12 yields a mean of 88. When I run a one-sample t test on each sample, the significance is < .05, which means H0 (that the mean equals the comparison value) is rejected. Yet a two-sample t test yields a significance > .05, which means H0 is not rejected, but this does not make sense to me since both means are much higher than 60. Any advice on how to proceed with the statistical analysis?

The two-tailed independent-samples t test in SPSS shows a significance > .05 in the "equal variances assumed" row. The row beneath it, "equal variances not assumed", has no value for F or significance. To my understanding, if Levene's test gives a significance > .05, I use the equal-variances-assumed row, and that same significance tells me the result is not significantly different from the mean value of 60, since it is > .05. This still does not make sense. In this case, what am I concluding with respect to the mean value I am comparing my data to?
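The two procedures answer different questions: each one-sample test compares one group with 60, while the independent-samples test compares the two groups with each other and says nothing about 60. A minimal sketch using only the summary statistics given in the question (means 80 and 88, SD of about 20, n = 41 and 12) makes the contrast visible; it computes t statistics only, so p-values would still come from SPSS or a t distribution. Function names are illustrative.

```python
from math import sqrt


def one_sample_t(mean: float, sd: float, n: int, mu0: float) -> float:
    """t statistic for H0: the population mean equals mu0."""
    return (mean - mu0) / (sd / sqrt(n))


def pooled_two_sample_t(m1: float, s1: float, n1: int,
                        m2: float, s2: float, n2: int) -> float:
    """Student's t for H0: the two group means are equal (equal variances)."""
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))


t1 = one_sample_t(80, 20, 41, 60)    # language one vs the reference 60
t2 = one_sample_t(88, 20, 12, 60)    # language two vs the reference 60
t12 = pooled_two_sample_t(80, 20, 41, 88, 20, 12)   # group vs group
print(round(t1, 2), round(t2, 2), round(t12, 2))
```

Both groups sit far above 60 (large one-sample t values), yet they are close to each other (small two-sample t), so both results can be true at once.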

Is there a way to arrive at a sample size without calculation when the researcher does not know the size of the population?
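When the population size is unknown but large, a common approach is Cochran's formula for a proportion, n0 = z²p(1 − p)/e², which does not use the population size at all. It still requires choosing a confidence level and a margin of error e; with no prior estimate of the proportion, p = 0.5 is the conservative choice because it maximizes n. A minimal sketch (function name illustrative):

```python
from math import ceil
from statistics import NormalDist


def cochran_n(margin: float, confidence: float = 0.95, p: float = 0.5) -> int:
    """Sample size to estimate a proportion to within +/- margin.

    Treats the population as effectively infinite, so no N is needed.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(cochran_n(0.05))                   # 95% confidence, +/- 5 points
print(cochran_n(0.05, confidence=0.99))  # 99% confidence, +/- 5 points
```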

I am planning to conduct a genetic diversity and population structure study in African zebu cattle. I will use 77k SNP markers to genotype the population. What are your thoughts on the ideal sample size?

Thanks

A. Ali (PhD)

I am trying to estimate the sample size necessary for an RNA-seq study. There are no previous studies I can refer to for an estimate. In this case, how should I approach planning sample collection? Thank you for your help.

I am trying to run an analysis in SmartPLS. Every time I try to run the algorithm or bootstrapping, I get the following error:

"Sample size too small. There must be at least 301 cases/observations."

My sample size is 275. I also checked my data for duplicates in SPSS and found none. Kindly help in this regard.

Thank you

I need to assess healthcare providers' compliance with standard performance through observation, but it is difficult to find a reference for sample size determination.

I have a polymer nanocomposite made via the injection molding route. Nanomaterials were added to the polymer, and then, by melting and injection molding, samples were produced to ASTM standards. What methods are available to prepare solid samples for SEM, FESEM, AFM, and TEM? What is the minimum sample size required for these tests on a non-conductive material? I would like to know about this in detail; kindly share some research papers or books for reference.

Hi,

I am looking at which age group an individual most plausibly belongs to, based on which social group she/he trusts the most.

The population that I want my sample to represent is 8,189,892 people aged between 18 and 100+. I need to calculate the correct sample size (simple random sampling).

Do I choose the confidence interval/margin of error and the confidence level freely and calculate the sample size based on them? If so, how do I go about it? Or:

Do the confidence interval/margin of error and the confidence level need to be calculated first? If so, how do I go about it?

Is there any other variable that needs to be taken into account when calculating a sample size? If so, which one, and how do I go about it?

As may be obvious, I am a beginner. Pretty much every solution I have found online calculates one of these quantities from the others, but I have no values (except the total population size), unless the confidence interval and confidence level can simply be chosen freely (for example, a 2% confidence interval and a 99% confidence level).

Cheers,

Viktor
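Confidence level and margin of error are not calculated from the data; they are chosen by the researcher to reflect the precision the study needs (95% and ±5 percentage points are common defaults). The other input is the expected proportion p, set to 0.5 when unknown because that maximizes the required n. A minimal sketch, assuming simple random sampling and a single proportion as the key estimate: Cochran's n0 followed by the finite population correction. At a population of 8,189,892 the correction barely changes the result. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_fpc(population: int, margin: float, confidence: float = 0.95,
                    p: float = 0.5) -> int:
    """Cochran's n0 for a proportion, then the finite population correction."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2     # infinite-population size
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return ceil(n)


N = 8_189_892
print(sample_size_fpc(N, margin=0.05, confidence=0.95))
print(sample_size_fpc(N, margin=0.02, confidence=0.99))
```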

I would like an idea of how to calculate the sample size, or of the minimum number of participants to recruit for such an intervention.

I plan to conduct a prospective cohort study, but there have been no similar studies on my proposed topic.

I need to know who originally formulated the sample size formula n = N/(1 + Ne^2). Your insights will be helpful.
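This formula is often cited as Yamane's formula, after Taro Yamane's 1967 statistics textbook. Whatever its origin, it can be read as a special case of Cochran's approach: a proportion with p = 0.5, z rounded to 2 (about 95% confidence), and a finite population correction of the form n0/(1 + n0/N). A minimal sketch checking that equivalence numerically; function names are illustrative.

```python
from math import ceil


def yamane_n(population: int, e: float) -> int:
    """n = N / (1 + N * e**2), with e the margin of error as a proportion."""
    return ceil(population / (1 + population * e ** 2))


def cochran_fpc_equivalent(population: int, e: float) -> float:
    """Cochran's n0 with z = 2 and p = 0.5 (so n0 = 1/e**2), corrected by n0/N."""
    n0 = 2 ** 2 * 0.5 * 0.5 / e ** 2
    return n0 / (1 + n0 / population)


print(yamane_n(1000, 0.05))
print(round(cochran_fpc_equivalent(1000, 0.05), 2))
```

The equivalence also shows the formula's built-in assumptions (maximum variability and roughly 95% confidence), which are worth stating when it is used.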

Hi,

Does anyone have suggestions for computing a constrained maximum likelihood estimate (CMLE) instead of the ML estimate in a Wald test in Mplus using the MODEL TEST command? My latent classes are unequal in size and some are small, and this method seems better suited to this type of design.

Here is an example of my syntax for the Wald test:

```
Model Constraint:
New(P1vs2 P1vs3 P2vs3);
P1vs2 = P1 - P2;
P1vs3 = P1 - P3;
P2vs3 = P2 - P3;

Model Test:
0 = P1vs2 - P1vs3;
0 = P1vs2 - P2vs3;
```

Thanks

I want to know how to determine the sample size for a microbiome study in humans.

I am performing factor analysis on a dataset with 16 variables and 22 observations. I realize the sample size is small, but I am getting high factor loadings. The problem is that the correlations among variables loading on one factor are very high, with correlation coefficients above 0.7. Also, some variables loading on one factor are highly correlated with variables loading on another factor. I don't want to drop too many variables because they are important for the study. I read that the correlation pattern among variables is not reliable when the sample size is small. If that is the case, can I proceed with further analysis despite the high correlations? Would multicollinearity be a problem?

Hi,

I am going to treat mice with a drug and want to see whether the intervention has any effect compared with sham controls in a mouse model. Do you know of any a priori power calculation tools/methods used in animal intervention studies?

Many thanks,

Nirmal
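When no pilot data exist to supply an effect size for an a priori power analysis, some animal studies fall back on Mead's resource equation, which keeps the error degrees of freedom E = N − k (N animals, k groups) between 10 and 20. This is a swapped-in alternative to a power calculation, not a power calculation itself; tools such as G*Power remain the usual route when an effect size can be justified. A minimal sketch for equal-sized groups in a one-way design; the function name is illustrative.

```python
from math import ceil, floor


def resource_equation_per_group(n_groups: int) -> tuple[int, int]:
    """Per-group animal numbers keeping E = N - k between 10 and 20.

    With k equal groups of size n, N = k * n and E = k * (n - 1),
    so 10 <= E <= 20 gives 10/k + 1 <= n <= 20/k + 1.
    """
    lowest = ceil(10 / n_groups + 1)
    highest = floor(20 / n_groups + 1)
    return lowest, highest


print(resource_equation_per_group(2))   # drug vs sham control
```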

Can anyone share a research paper in which a power calculation for sample size was conducted and the Demographic and Health Survey (DHS) was used as the dataset?

The outcome variable is not a big concern. I just need to see how the methods section describes a power calculation performed with a DHS dataset.

Thanks in advance!

I need the latest recommendations for using Krejcie and Morgan's (1970) formula.

When I have a binary outcome variable and a binary predictor, I am able to calculate the sample size needed using either G*Power (under z tests -- logistic regression) or R (wp.logistic). However, I do not know how to tackle this when the predictor variable has three groups/categories. Does anyone know how to go about these calculations, and using what software? Thanks in advance.

I would like to calculate the required sample size for a main study based on the results from my pilot study in a psychological experiment.

The hypotheses imply a repeated-measures comparison of multiple means, hence a one-way repeated measures ANOVA. In the pilot study, the normality assumption was violated, so I used a Friedman test instead.

My question: How can I now calculate the required sample size for my main study? I already found the rule of thumb to add 15% of the calculated sample size for the corresponding parametric analysis. Yet, because the normality assumption is violated, I cannot trust this parametric analysis. I am therefore looking for an equation to calculate the required sample size with the output of the Friedman test.

Thanks in advance for your help
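One hedged way to adapt a parametric calculation is through asymptotic relative efficiency (ARE). The "+15%" rule is usually traced to 1/0.864, where 0.864 is the lowest possible ARE of the Wilcoxon/Mann-Whitney test relative to the t test. For the Friedman test, the ARE relative to the repeated-measures F test under normality is 3k/(π(k + 1)) for k conditions. That value is always below 3/π ≈ 0.955 and is only about 0.72 for k = 3, so the inflation can be well above 15%. A sketch under those assumptions; the parametric n is a hypothetical input, and simulating from the pilot data would be a more direct alternative. Function names are illustrative.

```python
from math import ceil, pi
from typing import Optional


def friedman_are(k: int) -> float:
    """ARE of the Friedman test vs the repeated-measures F test for k
    conditions when the data are in fact normal: 3k / (pi * (k + 1))."""
    return 3 * k / (pi * (k + 1))


def inflate_for_friedman(n_parametric: int, k: int,
                         are: Optional[float] = None) -> int:
    """Rough nonparametric target: divide the parametric n by the ARE."""
    if are is None:
        are = friedman_are(k)
    return ceil(n_parametric / are)


# Hypothetical: a power analysis for the one-way RM ANOVA gives n = 30, k = 3.
print(round(friedman_are(3), 3))
print(inflate_for_friedman(30, 3))             # ARE under normality
print(inflate_for_friedman(30, 3, are=0.864))  # the "+15%" style adjustment
```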

I'm looking for a way to test significance and variance homogeneity/heterogeneity.

First of all, I tested my data for normality and they are not always normally distributed, so I think I should use different tests for parametric and nonparametric data (fig 1 and 2).

As I understand it, I can't use repeated measures ANOVA with unequal sample sizes, so any tips would be highly appreciated.

Dear all,

My goal is to predict night light intensity from daytime satellite imagery. I am using data from the **VIIRS** and **Landsat 8** sensors. My study area is shown in the attached image, and I am following the methodology of the attached paper. According to the authors, their daytime images are 400×400 pixels. They also classified the nighttime images into 3 categories (low-, medium-, and high-intensity lights). In my study area, very few daytime pixels correspond to high-intensity lights compared with the other 2 categories (e.g., I can find only 40 daytime pixels for category 3 but can sample many more for the other two). How should I sample (select) my daytime images for each category, taking into account that each sample will have different dimensions? These samples will be the input to the VGG-16 architecture. Good night!

I have questions regarding how to determine the sample size.

The aim of the study is to evaluate whether the mother's diet contributes to obesity rates in the offspring.

The experiment in question has the following design:

Six diets and a control diet will be used as treatments for female mice. After the experiment and mating, we only need the male offspring. Therefore, I need to define the number of female mice to be used in each treatment and the number of male offspring to be selected from each litter to carry out the index tests.

For ethical reasons, I need to use as few animals as possible, and I need a statistical basis for defining this number. A big problem is that we cannot predict how many males each litter will produce.

Can anybody help me?

I am writing a qualitative research paper on the academic writing challenges of EFL graduate students at a university in Turkey where English is the medium of instruction. The research instrument is a semi-structured interview, and thematic analysis (TA) will be used. On what basis should I choose the sample size? What is the best/ideal sample size for reaching saturation?

Hello,

Does anybody have statistical suggestions for justifying comparisons between groups of different sizes?

I've conducted a latent class analysis and would like to compare the classes on a distal variable. Chi-square tests were used to compare the classes. The results are good, but I'm looking for references that could help me justify these comparisons even though the classes differ from each other in size.

Thanks!

I am a medical sonographer and will be performing a clinical research project. I need to calculate the sample size needed for patient recruitment; however, my literature review found no previous studies on my research topic to assist with the calculation. I will be able to recruit 200 patients at my clinical site. How do I justify that 200 patients are sufficient for data analysis and for drawing conclusions?
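When no prior studies exist, one hedged way to justify a fixed, feasible sample is to report the precision it buys instead of a power calculation. A minimal sketch, assuming the key outcome is a proportion (for example, a detection rate) estimated with a Wald 95% confidence interval; the expected proportions are hypothetical and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def ci_half_width(n: int, p: float = 0.5, confidence: float = 0.95) -> float:
    """Half-width of a Wald confidence interval for a proportion from n subjects."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


for p in (0.5, 0.2, 0.1):
    print(f"expected proportion {p:.0%}: +/- {ci_half_width(200, p):.1%}")
```

With 200 patients, any proportion is estimated to within roughly ±7 percentage points at worst (p = 0.5), which can be argued as adequate or not for the clinical question at hand.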

Data sample: n = 206.

Dependent variable: continuous, normally distributed.

Independent variables: sample sizes differ between categories.

Should I use parametric or non-parametric tests?

What is your opinion on how many patients I need for an early feasibility study (EFS) to assess the in-human device functionality of a prototype?

Hello,

I'm trying to estimate a minimum sample size and I'm reading Kline's book on structural equation modeling, where he references the N:Q ratio:

"In ML estimation, Jackson (2003) suggested that researchers think about minimum sample size in terms of the ratio of cases (N) to the number of model parameters that require statistical estimates (q)"

I have my model set up in AMOS, and it shows the parameters for the weights, covariances, and variances. I understand where they come from, but I'm confused about how to use the N:q rule, because Kline says that it's the parameters *that require statistical estimates*. Does this mean the parameters I am interested in for my prediction (i.e., the weights and/or covariances)? Or does it mean all the parameters the statistical model requires to estimate fit (i.e., weights + covariances + variances)? Thanks so much in advance!
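In Kline's usage, q counts every freely estimated parameter, so free variances count alongside free weights and covariances, while parameters fixed to a constant (for example, a loading fixed to 1 for scaling) do not. A minimal sketch of the arithmetic, using the 20:1 (ideal) and 10:1 (less ideal) ratios commonly quoted from Jackson (2003); the parameter counts are hypothetical stand-ins for the free parameters listed in an AMOS parameter summary, and function names are illustrative.

```python
from math import ceil


def free_parameter_count(weights: int, covariances: int, variances: int) -> int:
    """q = every freely estimated parameter (fixed parameters are excluded)."""
    return weights + covariances + variances


def min_n_from_ratio(q: int, ratio: float = 20) -> int:
    """Minimum N for a target N:q ratio."""
    return ceil(q * ratio)


# Hypothetical free-parameter counts.
q = free_parameter_count(weights=12, covariances=3, variances=15)
print(q, min_n_from_ratio(q, 20), min_n_from_ratio(q, 10))
```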

I want to model the mode choice of movement-challenged persons (MCPs) using a random forest. My sample size is 400. It is very difficult to reach MCPs and collect more samples. Will 400 samples be enough for a random forest?

Is there any specific sample size required to apply a random forest to data?

Hello everyone, I have a question regarding sample size and response rate. My population is around 22,000 students. I sent a survey to all of them via email, and around 6% (n = 1,300) participated. Since my sample is large, will a response rate as low as 6% create an issue for my research?

Hello all, I have a question about sample size calculation and would like to ask for help. My project is about "prevalence and associated factors of burnout among end-of-life care nurses in the North East of England", which is a cross-sectional study, and I'm going to use linear regression to analyse the data. However, this is my first time conducting research, and I have little experience and knowledge of statistics. Now I'm stuck on the power calculation. My supervisor asked me to first figure out how many respondents I actually need, but I'm quite confused. I tried reading books from the university library and watching videos on YouTube, but all the information I found feels messy and disorganised. Could you give me any advice on how to do the sample size and power calculation? For example, where can I find good resources to learn from the very beginning (I have limited time now), or what steps do I need to follow? Thanks a million for any advice and instruction.

I have a sample of 138 observations (cross-sectional data) and am running an OLS regression with 6 independent variables.

My adjusted R2 is always negative, even if I include only 1 independent variable in the model. All the beta coefficients, as well as the regression models, are insignificant. The value of R2 is close to zero.

My queries are:

(a) Is a negative adjusted R2 possible? If yes, how should I justify it in my study, and are there any references I can cite to support my results?

(b) What should I do to improve my results? It is not possible to increase the sample size, and I have already checked my data for inconsistencies.
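A negative adjusted R² is possible. Adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), and rearranging shows it falls below zero exactly when R² < k/(n − 1), i.e., when the predictors explain less than they would be expected to by chance alone. With n = 138 and k = 6 that threshold is about 0.044. A minimal sketch; the R² values are hypothetical "close to zero" fits and the function name is illustrative.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 138
for r2, k in [(0.02, 6), (0.003, 1)]:
    print(f"k={k}, R2={r2}: adjusted R2 = {adjusted_r2(r2, n, k):.4f}"
          f" (negative when R2 < {k / (n - 1):.4f})")
```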

Hello,

I have multiple-choice items in my survey questionnaire, and I am having trouble figuring out which statistical analysis is best, since my questionnaire is exploratory and I have no hypothesis.

I have 4 devices and I have 9 feelings (the same feelings repeated for each device) and I want to discover which feeling is the most prominent across all devices.

Example of the item (for the same subject):

Device A: picked feelings 1,4,9

Device B: picked feelings 5

Device C: picked feelings 4,5,7,8

Device D: picked feelings 5

The other item offers 5 choices about what respondents expect from a specific training. The choices varied from one subject to another.

Example of the item (for different subjects):

Subject 1: picked expectation 1, 3, 2

Subject 2: picked expectation 1,2,3,4,5

Subject 3: picked expectation 3

I have used Friedman's test to rank each feeling item and expectation item; however, I want to know whether the differences between these items are statistically significant.

My sample size is 101. However, since these items are multiple choice, the numbers of responses are not equal. For example, Device A has 200 responses while Device B has 109.

Thank you in advance.
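For the "which feeling is most prominent" item, one option (swapped in here, not something the question already uses) is Cochran's Q test, which treats each subject's yes/no pick of each option as related binary responses, so unequal total response counts are not a problem. A minimal sketch for a single device, assuming responses are coded as a subjects × options 0/1 matrix; the data are hypothetical, and pairwise McNemar tests would be the usual follow-up. The function name is illustrative.

```python
def cochrans_q(rows: list[list[int]]) -> float:
    """Cochran's Q for k related yes/no responses (one 0/1 row per subject).

    Under H0 that all k options are equally likely to be picked, Q is
    approximately chi-square distributed with k - 1 degrees of freedom.
    """
    k = len(rows[0])
    col_totals = [sum(row[j] for row in rows) for j in range(k)]
    row_totals = [sum(row) for row in rows]
    grand = sum(row_totals)
    denominator = k * grand - sum(r * r for r in row_totals)
    if denominator == 0:
        raise ValueError("every subject picked all or none of the options")
    numerator = (k - 1) * (k * sum(c * c for c in col_totals) - grand ** 2)
    return numerator / denominator


# Hypothetical picks for one device: 4 subjects x 3 feelings (1 = picked).
picks = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]
q = cochrans_q(picks)
print(f"Q = {q:.2f} on {len(picks[0]) - 1} df")
```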

Hi,

I want to conduct a study of the adverse effects (our primary outcomes) of an old drug used for another disease. There is no previous experience of treating the disease with the old drug in a specific cohort (such as a highly severe group). We are planning to recruit 20-40 participants, give them the treatment, and follow them for x months (3-6). We will then assess the adverse effects (A, B, and C, all binary outcomes, with A the most important) and other outcomes (possibly continuous) at two time points (perhaps at 3 months and at 6 months). For the control group, we plan to select matched controls from our medical database based on patient characteristics.

However, I am not sure whether it is feasible to conduct this as a matched case-control study. If it is, how can I determine the sample size and the matching ratio between the case and control groups? If not, what type of study should I conduct? Any thoughts would be much appreciated!

Thanks in advance!

I would like to conduct a multigroup moderation analysis using AMOS. I have two samples: the first has n = 73 and the second n = 216. Are these sample sizes appropriate for such an analysis?

Note that my overall sample size is 289. The two samples are categorized by the quality of ties with organizational leadership; the smaller sample represents the group with a negative tie with leadership. Negative ties are usually much fewer than positive ties.

Hi,

I am using Demographic and Health Survey (DHS) data from one country. My purpose is to find the prevalence and risk factors of intimate partner violence. One of my supervisors has asked me for a 'power calculation', but I don't know how to reply, since I did not need to estimate the sample size on my own.

As far as I know, the DHS sample design is determined by many factors, including criteria for the standard errors of estimates of the main indicators within the sample strata, which are usually combinations of level 1 administrative units and urban/rural residence.

So how can I reply to my supervisor about this power calculation for a DHS dataset? Can anyone help me figure it out?

Dear colleagues,

I applied the Granger Causality test in my paper and the reviewer wrote me the following:

*the statistical analysis was a bit short – usually the Granger-causality is followed by some vector autoregressive modeling...*

How can I respond in this case?

P.S. I had a small sample size and serious data limitation.

Best

Ibrahim

Hi everyone,

I tested an SEM model with 2 IVs, 4 mediators, and 1 DV on a sample of 1,000 participants (see attached figure). Could you please help me estimate an adequate sample size using power analysis for this multiple-mediator model?

Best,

Robin

My research aims to determine the prevalence of hypoglycemia due to ADRs using a total sampling method. Why does total sampling need a minimum sample size?

What was the study power in this sample size?

I received this comment from a reviewer. I used a questionnaire to collect data from 790 biomedical students on their study experience during COVID-19. Do I need to calculate study power? I was not conducting a medical trial or anything similar. Please suggest how to deal with this issue.

I am conducting a diagnostic accuracy study which is being held in two different study sites within a particular region in Ghana.

Using the formula comprehensively explained by Negida et al. (2019) https://europepmc.org/article/pmc/6683590, I need a prevalence (from a previous study) to arrive at my final sample size.

My question is should I look for a single prevalence from a study held in the region or do I have to look for individual prevalence specific to each study site?
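A minimal sketch assuming the calculation is Buderer's (1996) approach for diagnostic accuracy, which sizes the study so that sensitivity and specificity are each estimated to within ±d and divides by prevalence (for sensitivity) or 1 − prevalence (for specificity). Running it for a regional and for site-specific prevalences shows how much the answer depends on that choice: the lower the prevalence, the larger the sample. All inputs are hypothetical and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def buderer_n(sens: float, spec: float, prevalence: float, d: float,
              confidence: float = 0.95) -> int:
    """Total subjects so that sensitivity and specificity are each estimated to +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n_sens = z ** 2 * sens * (1 - sens) / (d ** 2 * prevalence)
    n_spec = z ** 2 * spec * (1 - spec) / (d ** 2 * (1 - prevalence))
    return ceil(max(n_sens, n_spec))


# Hypothetical inputs: expected Se = Sp = 0.90, precision +/- 0.05.
for label, prev in [("regional", 0.20), ("site A", 0.15), ("site B", 0.30)]:
    print(label, buderer_n(0.90, 0.90, prev, 0.05))
```

If results will be reported per site, a per-site calculation is needed; if they will be pooled, one conservative option is to use the lower of the plausible prevalences.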

How do I determine the sample size for a study of the treatment outcomes of mental health patients in a community house that is a step down from an inpatient unit?

The study will look at outcome measures when participants first enter the house, at discharge (up to 14 days), and at 3 months (after community follow-up).

It will also look at the client satisfaction survey results completed at these time frames.

We will be using a convenience sample.

Currently there are 8 people in the houses at a time.

Also, if anyone knows: how would you determine whether someone is too unwell and should be excluded from the study? Most patients are fairly stable in the house but require additional support before discharge to the community. I've read a number of studies, but none clearly defines how they make that decision.

Thank you for taking the time to help

Kind regards,

Margaret

I need citations from previous studies on including the pilot study sample in the final sample size.

Thank you

I am planning a crossover randomized controlled trial (RCT) on the effect of a certain supplement/medicine on post-exercise muscle pain. To date, there has been no similar study of the effect of this medicine (or similar medicines) on post-exercise muscle pain. However, some studies have examined the effect of this medicine on conditions such as hypertension.

The formulas for estimating sample size that I have found all require information (such as standard deviation, mean, effect size, etc.) from similar studies conducted before.

Is there any way to estimate the sample size for my RCT under these conditions?
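When no directly comparable study exists, one hedged approach is to specify the smallest within-person effect worth detecting (for example, from the minimal clinically important difference of the pain scale) and report the required n across a plausible range of standardized effects, as a sensitivity analysis. A minimal sketch, assuming a 2×2 crossover analysed as paired differences and ignoring period and carryover effects; the dz values are hypothetical and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def crossover_n(dz: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Participants for a 2x2 crossover analysed as paired differences.

    dz = expected mean within-person difference / SD of those differences.
    Normal approximation; an exact t-based calculation adds a participant or two.
    """
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(((z_a + z_b) / dz) ** 2)


# Without prior data, tabulate n over a range of plausible effects.
for dz in (0.3, 0.5, 0.8):
    print(dz, crossover_n(dz))
```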

Hi, I conducted a 2×3 factorial experiment with a pretest and mediators. The sample size is 422. I ran the model in AMOS, and it returned the following error message:

An error occurred while attempting to fit the model.

The sample moment matrix is not positive definite. It could fail to be positive definite for any of the following reasons:

1. The sample covariance matrix or the sample correlation matrix contains a data entry error.

2. The observed variables are linearly dependent (perhaps because the sample size is too small).

3. The sample covariance matrix or sample correlation matrix was computed from incomplete data using the method of “pairwise deletion”.

4. The sample correlation matrix contains correlation coefficients other than product moment correlations (such as tetrachoric correlations).

For maximum likelihood estimation only, it may be appropriate to check “allow non-positive definite sample covariance matrices” in the “Analysis Properties” window, or to use the non-positive method.

I was using raw data so 1 and 4 should not be a problem. I also checked the correlation matrix - there's no correlation higher than .85, so I assumed it was fine.

I added factors one at a time and finally located one factor: the posttest measurement of a mediator (3 items). Adding this factor produced the error message. The other factors (including its pretest) are fine.

I guess it is probably because the model is underpowered.

I was wondering whether I can change the latent mediator into a manifest item, for both its pre- and post-tests, and include it in the model. Are there any articles I could read to find a solution?

Thank you very much in advance!

We are planning a matched case-control study to identify determinants of fetal macrosomia among delivered neonates. We are trying to determine the maximum sample size for this study.

In my master's thesis, I am doing mixed methods research that includes a quantitative analysis of a survey with 21 items measuring a total of 8 elements of a theoretical model. The sample size is approximately 77.

I have been told that, because of my mixed methods approach, I can get enough analysis by sticking to some descriptives and potentially a regression.

My question is therefore whether I can combine items intended to measure the same construct into one variable and use it as the dependent variable without first running some kind of factor analysis. I do not want the quantitative part to dominate my research and would therefore prefer to stick to only a few quantitative analyses, but of course without running any tests that ignore critical prerequisites.

Hi Researchers,

I am doing a computer science dissertation on the topic 'Automate text tool to analysis reflective writing'.

The hypothesis set is 'To what extent is the model valid for assessed reflective writing?' I just want to use the questionnaire (closed-ended questions and one open question) to validate the proposed model.

I have used a 5-point Likert scale to analyse the data, with the options strongly agree, agree, neutral, disagree, and strongly disagree. The sample size is 10 participants. I chose my participants based on their experience, career, and knowledge of reflective writing.

1) Which statistical analysis tool should I use to analyse a sample of 10 to validate the model? Please show me step by step how to analyse the data.

2) What would be the associated hypothesis?

3) Can I use the Content Validity Index with 10 participants on questionnaires using a 5-point Likert scale?

4) Is this step of my research a qualitative or a quantitative method? Why?

**Do you have any suggestions on my hypothesis, the sample size, and the tool I need for the analysis?**

**Thank you in advance!**
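The Content Validity Index is normally computed from expert relevance ratings on a 4-point scale; with a 5-point agreement scale you have to decide which categories count as endorsement, and that decision is an assumption worth stating explicitly. A minimal sketch with hypothetical ratings from 10 experts, computing the item-level CVI (I-CVI, the share of experts endorsing an item) and the scale-level average (S-CVI/Ave); an I-CVI cut-off of about 0.78 for panels of six or more experts is commonly cited. Function names are illustrative.

```python
def item_cvi(ratings: list[int], relevant: set[int]) -> float:
    """Proportion of experts whose rating counts as endorsing one item."""
    return sum(r in relevant for r in ratings) / len(ratings)


def scale_cvi_ave(items: list[list[int]], relevant: set[int]) -> float:
    """S-CVI/Ave: the mean of the item-level CVIs."""
    return sum(item_cvi(r, relevant) for r in items) / len(items)


# Assumption: 'agree' (4) and 'strongly agree' (5) count as endorsement.
RELEVANT = {4, 5}
# Hypothetical ratings: 3 items x 10 experts (1 = strongly disagree ... 5 = strongly agree).
items = [
    [5, 4, 4, 5, 3, 4, 5, 4, 4, 5],
    [4, 4, 3, 2, 4, 5, 3, 4, 4, 3],
    [5, 5, 5, 4, 5, 4, 5, 5, 4, 5],
]
for i, ratings in enumerate(items, start=1):
    print(f"I-CVI item {i}: {item_cvi(ratings, RELEVANT):.2f}")
print(f"S-CVI/Ave: {scale_cvi_ave(items, RELEVANT):.2f}")
```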

I need to determine the sample size for my research using an online power calculator. Can anyone recommend a good, easy-to-use calculator? I am a beginner in statistics.

Hi all, I am currently doing a study of cognitive behavioural therapy (CBT) in POI patients, and my hypothesis is that CBT improves QOL in these patients. My primary outcome is the general health (GH) domain of the SF-36 questionnaire. Based on previous research, the standard deviation of GH in PCOS patients was 21.74, and the minimal clinically important difference is 5 points. Assuming 80% power and a 5% significance level, do I have enough information to calculate the sample size? I've read through multiple papers, but they all use different methods. Do I have to take into account the intraclass correlation or a variance inflation factor?
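Assuming a two-arm parallel trial that randomizes individuals and compares mean GH scores, the SD, the minimal clinically important difference Δ, α, and power are enough for the standard normal-approximation formula n = 2(z₁₋α/₂ + z₁₋β)²σ²/Δ² per group. An intraclass correlation or variance inflation factor only enters if participants are randomized in clusters (e.g., by clinic). A minimal sketch with the values from the question (the SD borrowed from PCOS patients is itself an assumption for a POI population); it adds no allowance for dropout, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(sd: float, delta: float, alpha: float = 0.05,
                power: float = 0.80) -> int:
    """Per-group n for a two-arm parallel trial comparing means (normal approximation)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    return ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2)


n = n_per_group(sd=21.74, delta=5)
print(f"{n} per group, {2 * n} in total")
```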

When calculating the sample size to estimate a mean in a cross-sectional study, I was advised to use the formula n = (Z^2 × σ^2)/d^2, where:

Z = 1.96

σ = standard deviation from a previous study

**Can you help me explain d and how to determine it?**

Many statistical tests require approximate normality (the distribution should look approximately normal). On the other hand, normality tests such as Kolmogorov-Smirnov and Shapiro-Wilk are sensitive to the smallest departure from a normal distribution and are generally not suitable for large samples; they cannot show approximate normality (Source: Applied Linear Statistical Models). In such cases, a Q-Q plot can show approximate normality.

According to the book "Applied Linear Statistical Models", only a severe departure from normality matters: in that case, parametric tests can no longer be used, but if no severe departure is seen, parametric tests can be used.

What methods do you know for detecting **approximate normality** besides a Q-Q plot?

I am doing a retrospective quantitative cohort study and need assistance in calculating a sample size.

In this study, different raters (n) will rate a number (N) of clinical photographs into 3 categories. How do I calculate the sample size? Will it be based on N or n? Is there an online sample size calculator? Can anyone help me work out a solution?

For my research, I have to determine the sample size using G*Power. As I am not interested in conducting "pilot" research to see what sample size will be needed, I have to do an a priori analysis. However, I am not used to G*Power and am a little confused. First, my model consists of one IV with two levels and a moderator. Should I treat the total sample size that G*Power reports as the sample size per level (group) of the IV or as the total for the two levels? Also, is there a "universal" effect size *d* that I can use, since there is no previous research related to mine that suggests an effect size? I really appreciate any help you can provide.

Can anyone please recommend a technique or good reading for estimating the sample size of a quantitative survey-based study in organizational psychology? I need to collect matched data from employees and their supervisors and plan to use Structural Equation Modeling (SEM) for the analysis.

Hello! I'm running a Friedman two-way analysis because my sample is not normally distributed.

I've performed the analyses on different groups, paired across two periods. One of them, despite a considerable difference between the two periods, is not significant (Friedman 1.3 on 1 degree of freedom). I wonder whether this is because this group is smaller (n = 22) than the others.

I've been looking for evidence on the robustness of the Friedman test with respect to sample size, but I haven't found anything substantial.

Thanks!

I am about to use this formula for a discrete choice experiment (DCE) with 4 attributes, each with 3 levels. I have some questions I would be grateful to get answers to:
- How do I fix the true proportion p, i.e., the true choice proportion in the relevant population?
- What relation can be established between the DCE parameters (attributes, profiles, etc.) and the sample size formula when fixing the minimum sample size required for the full study?
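The formula referred to above is not reproduced, so as a swapped-in comparison only: a widely used DCE rule of thumb (Johnson and Orme) links sample size directly to the design through the largest number of levels in any attribute (c), the number of choice tasks per respondent (t), and the number of alternatives per task (a): N ≥ 500c/(ta). A minimal sketch with c = 3 from the question and hypothetical values of t and a; the function name is illustrative.

```python
from math import ceil


def orme_min_n(max_levels: int, tasks: int, alternatives: int) -> int:
    """Johnson-Orme rule of thumb for main-effects DCEs: N >= 500 * c / (t * a)."""
    return ceil(500 * max_levels / (tasks * alternatives))


# 4 attributes with 3 levels each -> c = 3; the t and a values are hypothetical.
print(orme_min_n(max_levels=3, tasks=8, alternatives=2))
print(orme_min_n(max_levels=3, tasks=12, alternatives=3))
```

The rule makes the trade-off explicit: more choice tasks or more alternatives per task reduce the number of respondents needed, at the cost of respondent burden.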

Dear All,

I am working with data in which cost of care is the dependent variable (DV). The data are genuinely skewed, reflecting the socioeconomic gap, and therefore the healthcare financing gap, in the population of a developing country. Because of this skewness, my data violated the normality assumption and were therefore summarized using the median and IQR. But I would like to analyze the predictors of cost of care among these patients.

I need to know whether I can go ahead and use multiple linear regression (MLR) or whether there are alternatives.

The sample size is 1,320, and I am thinking of relying on the Central Limit Theorem.

Thank you in advance for your answers.

Dimeji

I would like to compare the food security status of two groups of people using the Household Hunger Scale. The sample size is large. Which statistical test would be most suitable?
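The Household Hunger Scale score (0 to 6) is commonly grouped into little to no hunger (0-1), moderate hunger (2-3) and severe hunger (4-6). If the two groups are compared on these categories, a chi-square test of independence on the 2 × 3 table is one option; a Mann-Whitney U test on the raw 0-6 scores is another. The minimal sketch below computes the chi-square statistic for hypothetical counts. For a 2 × 3 table df = 2, and the chi-square upper tail with 2 degrees of freedom is exactly exp(−χ²/2), so no statistics library is needed for the p-value.

```python
from math import exp

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical counts: rows = groups, columns = little/none, moderate, severe hunger
table = [[310, 120, 40],
         [250, 150, 70]]
stat, df = chi_square_independence(table)
p_value = exp(-stat / 2)  # valid only because df == 2 here
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```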

I have two samples (both relatively small), with n = 4 and n = 19. What statistical tests can I use to compare them effectively, accounting for the large difference in sample sizes?
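With n = 4 in one group, large-sample approximations are shaky, but unequal group sizes are not a problem in themselves for an exact permutation test. The minimal sketch below enumerates every way of assigning 4 of the 23 pooled values to the first group (8,855 splits) and reports the two-sided p-value for the difference in means. It assumes independent observations that are exchangeable under the null hypothesis, and the data are hypothetical. With only 4 observations, power will be low whatever test is used.

```python
from itertools import combinations
from statistics import fmean

def exact_permutation_test(x, y):
    """Two-sided exact permutation test for a difference in means between two small samples."""
    pooled = list(x) + list(y)
    n_x = len(x)
    total = sum(pooled)
    observed = abs(fmean(x) - fmean(y))
    extreme = count = 0
    for idx in combinations(range(len(pooled)), n_x):
        sum_x = sum(pooled[i] for i in idx)
        diff = abs(sum_x / n_x - (total - sum_x) / (len(pooled) - n_x))
        if diff >= observed - 1e-12:   # tolerance guards against floating-point ties
            extreme += 1
        count += 1
    return extreme / count

# Hypothetical measurements
small_group = [14.2, 15.1, 13.8, 16.0]
large_group = [12.1, 11.8, 13.0, 12.6, 12.9, 11.5, 13.3, 12.2, 12.8, 11.9,
               12.4, 13.1, 12.0, 12.7, 11.7, 12.5, 13.4, 12.3, 12.9]
print(exact_permutation_test(small_group, large_group))
```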

I was initially running a multinomial logistic regression with multiple predictors. However, the standard errors of the parameters turned out to be huge. So I ran a simple logistic regression with just one predictor, but the standard error was still huge. In my opinion, the possible reasons are that they are poor predictors of the outcome, or that the sample size is small.

When using a Propensity Score Matching approach for the impact assessment of a programme intervention, what is an acceptable ratio of participant to non-participant sample sizes? Some have advocated a 1:3 ratio of participants to non-participants to allow for easy matching. I need your input, please. The sample size of programme participants in my study is 2,300. Thanks.

Hello everyone,

I am validating an existing questionnaire (it has been validated in different countries and languages) in my country. Unfortunately, the data we have gathered are unbalanced: 80% of respondents are female and 20% are male. Does this affect the validation process, or am I good to proceed with the data analysis?

Thanks,

Sara

The study would have N examiners, who will be provided with a set of x photographs. They will be asked to identify whether the corneal ulcer depicted in each photograph is caused by fungi, bacteria, or a parasite; only one response is allowed. How many examiners (N) will I need so that the study has a power of 80% at an alpha of 0.05 to show that the answers provided are not due to chance? This is probably related to Fleiss' kappa (κ), which I should take as 0.6. Therefore, should the sample size be found in terms of the number of examiners or the number of photographs, given 3 categories (k)?
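One way to weigh more examiners against more photographs is simulation: generate ratings under an assumed agreement model, compute Fleiss' kappa, and see how precisely κ is estimated for a given design. The minimal sketch below implements Fleiss' kappa for N photographs rated by n examiners into k categories, together with a deliberately simple, hypothetical rating model (each examiner gives the true category with probability `accuracy`, otherwise a uniformly random category). The spread of the simulated kappas shows how the precision of κ changes as you vary the two numbers; it is not a formal power calculation.

```python
import random
from statistics import fmean, stdev

def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = number of raters assigning subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])   # assumes every subject is rated by the same number of raters
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = fmean(p_i)
    p_j = [sum(col) / (n_subjects * n_raters) for col in zip(*counts)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

def simulate_kappas(n_photos, n_raters, k=3, accuracy=0.7, reps=500, seed=1):
    """Simulate kappa under a hypothetical model where raters are right with prob. `accuracy`."""
    rng = random.Random(seed)
    kappas = []
    for _ in range(reps):
        counts = []
        for _ in range(n_photos):
            truth = rng.randrange(k)
            row = [0] * k
            for _ in range(n_raters):
                label = truth if rng.random() < accuracy else rng.randrange(k)
                row[label] += 1
            counts.append(row)
        kappas.append(fleiss_kappa(counts))
    return kappas

for n_photos, n_raters in [(30, 3), (30, 10), (100, 3)]:
    ks = simulate_kappas(n_photos, n_raters)
    print(n_photos, n_raters, round(fmean(ks), 2), round(stdev(ks), 3))
```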

As I understand it, with a large sample size the chi-square test almost always indicates statistical significance, so we should not base our acceptance/rejection decisions on this fit index. What about the chi-square to degrees of freedom ratio (χ²/df)? Are there circumstances where its use is advisable and circumstances where it is not?

Please do not hesitate to suggest very technical literature on this. I am curious about it and want to learn.
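A small calculation suggests why χ²/df inherits the sample-size problem. Under the usual noncentral chi-square approximation, the model χ² has expected value df + (N − 1)·F₀, where F₀ is the population discrepancy, so χ²/df grows with N for any fixed misfit F₀ > 0. RMSEA, √(max(χ² − df, 0)/(df·(N − 1))), rescales by N and estimates √(F₀/df) regardless of sample size. This is a sketch: the df and F₀ values are hypothetical, and some software uses N instead of N − 1 in the RMSEA formula.

```python
from math import sqrt

def expected_chi2(n, df, f0):
    """Expected model chi-square under the noncentral approximation: df + (N - 1) * F0."""
    return df + (n - 1) * f0

def rmsea(chi2, df, n):
    """Point estimate of RMSEA (N - 1 convention)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

df, f0 = 40, 0.10   # hypothetical model: 40 df, modest population misfit
for n in (100, 300, 1000, 5000):
    chi2 = expected_chi2(n, df, f0)
    print(f"N={n:5d}  chi2/df={chi2 / df:6.2f}  RMSEA={rmsea(chi2, df, n):.3f}")
```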

I'm part of a team designing an observational study in which we're going to compare 2 models of intraocular lenses. The main outcome is contrast sensitivity.

I'm having some difficulty understanding how we are going to handle this variable in the statistical analysis, and looking at the literature hasn't helped me much. It's also important because, as it is the main outcome, our sample size calculation depends on it.

Has anyone worked with something similar and can give me some pointers?

We plan to use a Likert-scale survey to explore the impact of an intervention on staff attitudes. How do I calculate the sample size for paired responses (before and after implementation)? I assume we will use the Wilcoxon signed-rank test instead of the paired t-test to analyse the difference, as the data tend to be skewed. Please share any online calculators I can use, or the information I need to calculate the sample size. Thanks!
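A common shortcut for the Wilcoxon signed-rank test is to compute the sample size for a paired t-test and divide it by the test's asymptotic relative efficiency (ARE): 3/π ≈ 0.955 when the differences are normal, and never below 0.864 for continuous distributions, so dividing by 0.864 is the conservative choice. The minimal sketch below uses a normal approximation and an assumed standardized effect d_z (mean paired difference divided by the SD of the differences); d_z = 0.5 is only illustrative. Heavily tied Likert data make the ARE argument less exact, so treat the result as a starting point.

```python
from math import ceil, pi
from statistics import NormalDist

def n_paired_t(d_z, alpha=0.05, power=0.80):
    """Number of pairs for a two-sided paired comparison (normal approximation), unrounded."""
    z = NormalDist().inv_cdf
    return ((z(1 - alpha / 2) + z(power)) / d_z) ** 2

def n_wilcoxon(d_z, are=0.864, alpha=0.05, power=0.80):
    """Inflate the paired-t sample size by the Wilcoxon signed-rank test's ARE."""
    return ceil(n_paired_t(d_z, alpha, power) / are)

print(n_wilcoxon(0.5))               # conservative ARE = 0.864
print(n_wilcoxon(0.5, are=3 / pi))   # ARE under normally distributed differences
```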

Hydrochemical data analysis for significance testing.

I need help from the experts. My research will be conducted in health facilities that deal with tuberculosis patients. The study population is TB patients newly diagnosed in these health facilities. I plan to conduct a quasi-experimental study to assess the effectiveness of an intervention in improving adherence to anti-tuberculosis medication, where the outcome variable is dichotomous (adherence/non-adherence), based on the percentage of medication ingested. The intervention will be conducted at 8 weeks, with pre- and post-test evaluation. How do I calculate the sample size?
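If the intervention group is compared with a separate comparison group on the proportion of adherent patients, the standard formula for two independent proportions, n = [z₁₋α/₂·√(2·p̄·q̄) + z₁₋β·√(p₁q₁ + p₂q₂)]² / (p₁ − p₂)², gives the number per group. The minimal sketch below assumes hypothetical adherence of 60% without and 80% with the intervention, then inflates for an assumed 10% loss to follow-up. If instead the same patients are classified as adherent or non-adherent before and after, the outcomes are paired and a McNemar-based calculation would be needed; this sketch does not cover that case.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n to detect p1 vs p2 with a two-sided test (no continuity correction)."""
    z = NormalDist().inv_cdf
    p_bar = (p1 + p2) / 2
    numerator = (z(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar))
                 + z(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / (p1 - p2) ** 2)

def inflate_for_dropout(n, dropout):
    """Enlarge n so that about n participants remain after the expected dropout."""
    return ceil(n / (1 - dropout))

n = n_two_proportions(0.60, 0.80)     # hypothetical adherence: 60% vs 80%
print(n, inflate_for_dropout(n, 0.10))
```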

I have visualized the alteration of the cell wall during drought stress with a confocal microscope. Now I want to measure cell size at different time points. Does anyone know how many cells I should measure?

I have studied Arabidopsis at 7 time points (days 0,1,2,4,7,15)
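If the aim is to estimate the mean cell size at each time point to within ±E, the usual precision formula n = (z·s/E)² applies, where s is a guess at the standard deviation of cell size (for example, from a few images you already have). The minimal sketch below uses hypothetical values. Cells measured on the same plant are not independent, so it is usually the number of plants (biological replicates) per time point that limits the inference; the same formula can then be applied to plant-level means.

```python
from math import ceil
from statistics import NormalDist

def n_for_mean(sd, margin, confidence=0.95):
    """Measurements needed so the CI half-width for a mean is at most `margin` (normal approx.)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sd / margin) ** 2)

# Hypothetical pilot values: SD of cell area 5 µm², desired precision ±1 µm²
print(n_for_mean(sd=5.0, margin=1.0))
```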

Considering the cut-off, removing items with low loadings may be the best strategy when conducting Exploratory Factor Analysis. But is "scale purification" (the process of eliminating items from multi-item scales) sensitive to the sample size and the sample characteristics?

Hi everybody,

here is my problem:

I have several regions that are represented by different isotope values, and I would like to detect the variability between these regions and values. Basically, the data look like this:

```
a      b      c
0.788  0.759  0.797
0.786  0.756  0.798
...
```

I have performed the Bonett.Seier.test, and it gives me a p-value for a and b. So I assume this test can be used for any distribution and is not strictly tied to the normal distribution, right?

Any suggestions on what would be (more) useful?

Thanks in advance!

Michael

_____

PS: here is the simple code:

```r
# package
library(intervcomp)

# data
data <- read.csv("mydata.csv", header = TRUE)

# test on columns a and b (two-sided, alpha = 0.05)
Bonett.Seier.test(data$a, data$b, "two.sided", 0.05)
```

Output:

```
$Statistic
[1] -1.676137

$p.value
[1] 0.09371146

$Estimate
[1] 2.212373
[1] 0.8741855
[1] 5.599036
```

I'm planning to conduct a pilot feasibility study of a digital health intervention for individuals with low back pain. However, I wonder how to calculate the sample size, given that feasibility will be the primary outcome.

Any help will be really appreciated!
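When the pilot's aim is to detect feasibility problems rather than to estimate an effect, one published approach (Viechtbauer et al., 2015) chooses n so that a problem occurring with probability π is observed at least once with a chosen confidence: n = ln(1 − confidence)/ln(1 − π), which gives 59 for π = 0.05 and 95% confidence. If instead a feasibility proportion such as recruitment or retention is to be estimated, the precision formula for a proportion, n = z²·p(1 − p)/E², can be used with an assumed expected p. The minimal sketch below shows both; the 70% retention and ±15 point margin are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist

def n_to_detect_problem(problem_prob, confidence=0.95):
    """Smallest n such that an event with probability `problem_prob` is seen at least once."""
    return ceil(log(1 - confidence) / log(1 - problem_prob))

def n_for_proportion(p, margin, confidence=0.95):
    """n so that the Wald CI half-width for a proportion is about `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

print(n_to_detect_problem(0.05))              # problem affecting 5% of participants
print(n_for_proportion(p=0.7, margin=0.15))   # hypothetical 70% retention, ±15 points
```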

I just read here that normality tests like the Kolmogorov-Smirnov test and the Shapiro-Wilk test are basically useless: https://www.spss-tutorials.com/spss-shapiro-wilk-test-for-normality/

"Limited Usefulness of Normality Tests

The Shapiro-Wilk and Kolmogorov-Smirnov test both examine if a variable is normally distributed in some population. But why even bother? Well, that's because many statistical tests -including ANOVA, t-tests and regression- require the **normality assumption**: variables must be normally distributed in the population. However, **the normality assumption is only needed for small sample sizes** of -say- N ≤ 20 or so. For larger sample sizes, the sampling distribution of the mean is always normal, regardless how values are distributed in the population. This phenomenon is known as the central limit theorem. And the consequence is that many test results are unaffected by even severe violations of normality. So if sample sizes are reasonable, normality tests are often **pointless**. Sadly, few statistics instructors seem to be aware of this and still bother students with such tests. And that's why I wrote this tutorial anyway.

Hey! But what if sample sizes are small, say N < 20 or so? Well, in that case, many tests *do* require normally distributed variables. However, **normality tests typically have low power in small sample sizes.** As a consequence, even substantial deviations from normality may not be statistically significant. So when you really need normality, normality tests are unlikely to detect that it's actually violated. Which renders them pretty useless."

Is this true? Thanks!
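The quoted claim is partly right: the sampling distribution of the mean approaches normality as N grows, but "always normal" overstates it, and how large N must be depends on how skewed the population is. For an exponential population (skewness 2), the skewness of the sample mean is 2/√n, so it shrinks gradually rather than vanishing at N = 20. The minimal sketch below estimates that skewness by simulation; the exponential population and the replication count are assumptions chosen for illustration.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Moment-based sample skewness."""
    m = fmean(values)
    s = pstdev(values, m)
    return fmean([((v - m) / s) ** 3 for v in values])

def skewness_of_mean(n, reps=10_000, seed=42):
    """Simulated skewness of the mean of n draws from an Exponential(1) population."""
    rng = random.Random(seed)
    means = [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]
    return skewness(means)

for n in (2, 5, 20, 100):
    print(n, round(skewness_of_mean(n), 2), "theory:", round(2 / n ** 0.5, 2))
```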

Hi,

For my master's thesis, I am running an OLS regression (sample size 75, with 4 independent variables). There seems to be a heteroskedasticity problem, and I tried to fix it with robust standard errors, but afterwards my F-statistic was no longer significant, whereas before it was.

Does anyone have tips to fix this?

I have 28 parameters, so I figure I'll need at least 280 participants (10 per parameter)? Any thoughts appreciated. Thanks.

Please guide me to references on the minimum sample required to conduct a bibliometric analysis.

Hi,

I'm running simulations to compute the required sample size for different pre-determined proportions (sensitivity) and different margins of error (i.e., the half-width of the confidence interval). The pre-determined proportions range from 0.65 to 0.99, and the margins of error from 0.05 to 0.20.

For example, for a pre-determined proportion of 0.70 with a margin of error of 0.10, the required sample size is 1.96² × 0.70 × (1 − 0.70) / 0.10² ≈ 80.7, rounded up to 81.

My issue concerns high proportions with large margins of error. For example, if I want to calculate the required sample size for a pre-determined proportion of 0.99 with a margin of error of 0.20, the lower confidence limit would be 0.79 and the upper one 1.19, which does not make sense (a proportion cannot exceed 1).

I think there is a specific formula for computing the required sample size in this situation, but I have not found it yet.

Do you have any ideas on how to do this?

Thank you for your advice.
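The Wald formula rests on an interval that is symmetric around p and can cross 1, which is what happens at p = 0.99 and a margin of 0.20 (it even returns n = 1 there). One commonly used fix is to base the calculation on an interval that respects the [0, 1] bounds, such as the Wilson score interval, and to search for the smallest n whose half-width at the expected proportion is at most the margin. The minimal sketch below does that; it plugs the expected proportion in as if it were observed, which is an approximation for small n, and the Clopper-Pearson (exact) interval is a more conservative alternative.

```python
from math import sqrt
from statistics import NormalDist

def wilson_interval(p_hat, n, confidence=0.95):
    """Wilson score interval for a proportion; it stays within [0, 1]."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z / denom * sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return center - half, center + half

def n_wilson(p, margin, confidence=0.95, n_max=100_000):
    """Smallest n whose Wilson half-width at the expected proportion p is <= margin."""
    for n in range(1, n_max + 1):
        lower, upper = wilson_interval(p, n, confidence)
        if (upper - lower) / 2 <= margin:
            return n
    raise ValueError("n_max too small for the requested margin")

for p, e in [(0.70, 0.10), (0.99, 0.20), (0.99, 0.05)]:
    n = n_wilson(p, e)
    print(p, e, n, [round(b, 3) for b in wilson_interval(p, n)])
```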