Definition/Introduction

The t-test was first described by William Sealy Gosset in 1908, when he published his article under the pseudonym 'Student' while working for a brewery.[1] In simple terms, a Student's t-test is a ratio that quantifies how large the difference between the means of two groups is relative to the variability of the data within those groups.
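As a sketch of what this ratio looks like, the equal-variance (pooled) form of the two-sample statistic can be written as below. The choice of the pooled form and the symbols are assumptions introduced here for illustration: \bar{x}_1 and \bar{x}_2 are the group means, s_1^2 and s_2^2 the sample variances, n_1 and n_2 the group sizes, and s_p the pooled standard deviation.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
s_p^2 = \frac{(n_1 - 1)\, s_1^2 + (n_2 - 1)\, s_2^2}{n_1 + n_2 - 2}
```

The numerator is the difference between the means; the denominator scales that difference by how much the data vary and by how many observations were collected.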

Issues of Concern

Selecting appropriate statistical tests is a critical step in conducting research.[2] There are three forms of Student's t-test of which physicians, particularly physician-scientists, need to be aware: (1) the one-sample t-test; (2) the two-sample t-test; and (3) the two-sample paired t-test. The one-sample t-test evaluates a single set of numbers to test the hypothesis that the mean of that set is equal to a chosen value, for instance, zero. As an example, consider the following question: what is the average serum sodium concentration in adults? Currently, 140 mEq/L serves as the approximate center of a reference range of 135 to 145 mEq/L; thus, the null hypothesis is that the average serum sodium concentration in adults is equal to 140 mEq/L.

If you believe this value is wrong (the alternative hypothesis), or you want to test the original hypothesis, you could collect blood from a set of subjects, measure the sodium concentration in each sample, and then take the mean of this set. If the mean is 140.1 mEq/L, you probably do not have convincing evidence that the value mentioned above is faulty (since 140 and 140.1 are fairly close). Thus, you would fail to reject the null hypothesis. However, if your sample has a mean of 70 mEq/L, this could be preliminary evidence against the null hypothesis (assuming, of course, rigorous methodology), and you could end up rejecting it. The decision would be trickier if the mean of the sample were 134 or 150 mEq/L. The t-test can be used to reduce subjective influence when testing a null hypothesis. Before testing a hypothesis, researchers should choose the alpha and beta values of the test. Loosely, alpha is the accepted probability of a false-positive result (e.g., the actual mean serum sodium concentration is 140 mEq/L, but the t-test rejects the original hypothesis in favor of your new hypothesis), and beta is the accepted probability of a false-negative result (e.g., the true mean serum sodium concentration is 200 mEq/L, but the t-test fails to reject the old hypothesis). Methods for selecting alpha and beta are outside the scope of this article.
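The serum sodium scenario can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a validated analysis: the sodium values are invented, the function name `one_sample_t` is hypothetical, and the critical value 2.262 is the two-sided cutoff for alpha = 0.05 with 9 degrees of freedom taken from a standard t table.

```python
import math
import statistics

def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)      # sample SD (n - 1 in the denominator)
    se = sd / math.sqrt(n)             # standard error of the mean
    return (mean - mu0) / se, n - 1

# Hypothetical serum sodium values in mEq/L (invented for illustration)
sodium = [138, 141, 139, 142, 140, 137, 143, 139, 141, 141]   # mean 140.1
shifted = [x - 6 for x in sodium]                             # mean 134.1

# Two-sided critical value for alpha = 0.05 and df = 9 (standard t table)
T_CRIT = 2.262

t_close, df = one_sample_t(sodium, mu0=140)
t_far, _ = one_sample_t(shifted, mu0=140)

reject_close = abs(t_close) > T_CRIT
reject_far = abs(t_far) > T_CRIT
print(f"mean 140.1: t = {t_close:.2f}, df = {df}, reject H0: {reject_close}")
print(f"mean 134.1: t = {t_far:.2f}, df = {df}, reject H0: {reject_far}")
```

With these made-up values, the sample centered near 140.1 mEq/L does not cross the critical value, while the sample centered near 134.1 mEq/L does. The outcome depends on the spread and size of the sample as well as on the mean, which is why the test is more informative than comparing means by eye.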

While the one-sample t-test allows you to test the mean of a single set of numbers against a specific numeric value, the two-sample t-test compares the means of two groups. In this case, a research question could be: do children and adults have the same mean serum sodium concentration? Testing this hypothesis would require sampling two groups, a group of adults and a group of children, and comparing the mean serum sodium concentrations between these two groups in a manner analogous to the one-sample t-test described above. The paired t-test is used in scenarios where measurements from the two groups are linked to one another. In the example above concerning the mean serum sodium concentration of children and adults, the implicit assumption was that the measurements would all be completed at one point in time in a set of children and a distinct set of adults. However, it would also be possible to measure serum sodium concentrations in a set of children, wait until they are adults, then measure the serum sodium concentrations again. Here, each adult sodium concentration corresponds to exactly one child sodium concentration. A paired two-sample t-test can be used to capture the dependence of measurements between the two groups.
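The difference between the independent and paired designs can be sketched as follows. This is an illustrative example under stated assumptions: the data are invented, the function names `two_sample_t` and `paired_t` are hypothetical, and the independent version uses the equal-variance (pooled) form of the test, which the text above does not specify.

```python
import math
import statistics

def two_sample_t(a, b):
    """Pooled-variance t statistic and df for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) +
                  (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / se, na + nb - 2

def paired_t(first, second):
    """Paired t statistic and df: a one-sample test on within-subject differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [y - x for x, y in zip(first, second)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical serum sodium values in mEq/L (invented for illustration).
# Position i in both lists is the same person, measured as a child and as an adult.
child = [136, 139, 141, 138, 140, 137]
adult = [138, 140, 143, 139, 142, 138]

t_ind, df_ind = two_sample_t(child, adult)   # ignores the pairing
t_pair, df_pair = paired_t(child, adult)     # uses the pairing
print(f"independent: t = {t_ind:.2f}, df = {df_ind}")
print(f"paired:      t = {t_pair:.2f}, df = {df_pair}")
```

In this made-up data set, every subject's sodium rises by 1 or 2 mEq/L, so the within-subject differences are very consistent. The paired statistic is therefore much larger in magnitude than the independent one, which treats the between-subject spread as noise. Which function applies is determined by the study design, not by which gives the larger statistic.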

These variations of Student's t-test use observed or collected data to calculate a test statistic, which can then be used to calculate a p-value. Often misinterpreted, the p-value is equal to the probability of collecting data that is at least as extreme as the observed data in the study, assuming that the null hypothesis is true.[3] This concept is best illustrated by examples, as in the questions that accompany this article. Often, a threshold value is set prior to the study (equal to the alpha mentioned above); if the resulting p-value is below the preset threshold, the null hypothesis is rejected.
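To make the step from test statistic to p-value concrete, the sketch below integrates the density of Student's t distribution numerically with Simpson's rule, using only the Python standard library. The function names `t_pdf` and `two_sided_p`, the example statistic of 2.5 with 9 degrees of freedom, and the alpha of 0.05 are all assumptions chosen for illustration; in practice a statistics package would normally be used instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    # math.gamma overflows for very large df; fine for small illustrative cases
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t_stat, df, steps=10_000):
    """Two-sided p-value P(|T| >= |t_stat|) under H0, via Simpson's rule."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    h = upper / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3           # P(0 <= T <= |t_stat|)
    return max(0.0, 1 - 2 * central_half)  # both tails combined

alpha = 0.05                               # threshold chosen before the study
p_value = two_sided_p(2.5, df=9)           # hypothetical statistic and df
reject_null = p_value < alpha
print(f"p = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject_null}")
```

The p-value here is the area in both tails of the t distribution beyond the observed statistic, which matches the definition above: the probability, assuming the null hypothesis is true, of data at least as extreme as those observed.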

In the above scenarios, before using any form of the t-test, one must ensure that the assumptions for the test have been met. This article does not list or explain these assumptions in detail. Please follow the guidance of a trained statistician when designing research studies and conducting data analysis.

Clinical Significance

Given the rate of research progress, disease management (medical or surgical) continuously evolves. To follow the framework of evidence-based medicine, physicians must be able to read and critically evaluate primary literature.[4][5] The ability to do this successfully requires at least a basic foundation of knowledge in statistics, including common biases (e.g., nonresponse bias), standard study designs (e.g., randomized controlled trials), and common statistical pitfalls researchers face (e.g., statistically significant results that are not clinically significant).[6][7] Understanding Student's t-test is a first step for clinicians toward gaining this necessary foundation of knowledge.


Details

Updated: 1/16/2023 8:12:53 PM

References


[1]

Drummond GB, Tom BD. Statistics, probability, significance, likelihood: words mean what we define them to mean. The Journal of physiology. 2011 Aug 15. [PubMed PMID: 21844004]


[2]

Beath A, Jones MP. Guided by the research design: choosing the right statistical test. The Medical journal of Australia. 2018 Mar 5. [PubMed PMID: 29490219]


[3]

Andrade C. The P Value and Statistical Significance: Misunderstandings, Explanations, Challenges, and Alternatives. Indian journal of psychological medicine. 2019 May-Jun. [PubMed PMID: 31142921]


[4]

Ioannidis JP. Why Most Clinical Research Is Not Useful. PLoS medicine. 2016 Jun. [PubMed PMID: 27328301]


[5]

Ioannidis JP. Why most published research findings are false. PLoS medicine. 2005 Aug. [PubMed PMID: 16060722]


[6]

Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019 Mar. [PubMed PMID: 30894741]


[7]

Lang T. Twenty statistical errors even you can find in biomedical research articles. Croatian medical journal. 2004 Aug. [PubMed PMID: 15311405]