A recent debate on implicit measures of racial attitudes has focused on the relative roles of the person, the situation, and their interaction in determining the measurement outcomes. The chapter describes process models for assessing the roles of the situation and the person-situation interaction on the one hand and stable person-related components on the other hand in implicit measures. Latent state-trait models allow one to assess to what extent the measure is a reliable measure of the person and/or the situation and the person-situation interaction (Steyer, Geiser, & Fiege, 2012). Moreover, trait factor scores as well as situation-specific residual factor scores can be computed and related to third variables, thereby allowing one to assess to what extent the implicit measure is a valid measure of the person and/or the situation and the person-situation interaction. These methods are particularly helpful when combined with a process decomposition of implicit-measure data such as a diffusion-model analysis of the IAT (Klauer, Voss, Schmitz, & Teige-Mocigemba, 2007).
Adequate measurement of psychological phenomena is a fundamental aspect of theory construction and validation. Forming composite scales from individual items has a long and honored tradition, although, for predictive purposes, the power of using individual items should be considered. We outline several fundamental steps in the scale construction process, including (1) choosing between prediction and explanation; (2) specifying the construct(s) to measure; (3) choosing items thought to measure these constructs; (4) administering the items; (5) examining the structure and properties of composites of items (scales); (6) forming, scoring, and examining the scales; and (7) validating the resulting scales.
This chapter focuses on experimental designs, in which one or more factors are randomly assigned and manipulated. The first topic is statistical power or the likelihood of obtaining a significant result, which depends on several aspects of design. Second, the chapter examines the factors (independent variables) in a design, including the selection of levels of a factor and their treatment as fixed or random, and then dependent variables, including the selection of items, stimuli, or other aspects of a measure. Finally, artifacts and confounds that can affect the validity of results are addressed, as well as special designs for studying mediation. A concluding section raises the possibility that traditional conceptualizations of design – generally focusing on a single study and on the question of whether a manipulation has an effect – may be inadequate in the current world where multiple-study research programs are the more meaningful unit of evidence, and mediational questions are often of primary interest.
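The dependence of power on design choices can be illustrated with a small calculation. The following is a minimal sketch, not the chapter's own method: it assumes a two-sided, two-sample z-test with equal group sizes and a standardized effect size, and the function name `two_sample_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (equal group sizes).

    effect_size is the standardized mean difference (an assumption of this
    sketch); n_per_group is the number of participants in each condition.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    shift = effect_size * sqrt(n_per_group / 2)  # expected value of the test statistic
    # Probability of landing in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Power rises with sample size for a fixed effect size.
    for n in (20, 50, 100):
        print(n, round(two_sample_power(0.5, n), 3))
```

Under these assumptions, increasing the sample size per group or the effect size raises power, while a stricter alpha lowers it.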
The procedure for a preliminary ruling is central in the ‘complete system of remedies’ offered by the Union to its citizens. Since Article 263 TFEU grants only very limited standing to ‘non-privileged applicants’, Article 267 TFEU has become the main gate for individuals to bring their claims against the EU before the European Court of Justice. Yet, claims for breaches of fundamental rights by the Union are not at all common in the procedure for a preliminary ruling. This chapter investigates the (real) use and (realistic) potential of Article 267 TFEU as a means for the protection of fundamental rights against breaches by the EU institutions. The chapter maps all instances in which individuals used the procedure for a preliminary ruling to bring a claim against the Union for breaches of their fundamental rights since the coming into force of the Treaty of Lisbon. Using this mapping exercise, the chapter identifies how individuals raise this type of claim in the procedure, discusses the accessibility of the procedure for individual applicants, and assesses the shortcomings of the procedure as a means to redress breaches of fundamental rights by the Union. It argues that these shortcomings have to do with the structure and design of the procedure itself.
Parrots are popular companion animals but show prevalent and at times severe welfare issues. Nonetheless, there are no scientific tools available to assess parrot welfare. The aim of this systematic review was to identify valid and feasible outcome measures that could be used as welfare indicators for companion parrots. From 1,848 peer-reviewed studies retrieved, 98 met our inclusion and exclusion criteria (e.g. experimental studies, captive parrots). For each outcome collected, validity was assessed based on the statistical significance reported by the authors, as other validity parameters were rarely provided for evaluation. Feasibility was assigned by considering the need for specific instruments, veterinary-level expertise or handling the parrot. A total of 1,512 outcomes were evaluated, of which 572 had a significant P-value and were considered feasible. These included changes in behaviour (e.g. activity level, social interactions, exploration), body measurements (e.g. body weight, plumage condition) and abnormal behaviours, amongst others. Many physical and physiological parameters were identified that either require experimental validation, or veterinary-level skills and expertise, limiting their potential use by parrot owners themselves. Moreover, a high risk of bias undermined the internal validity of these outcomes, while a strong taxonomic bias, a predominance of studies on parrots in laboratories, and an underrepresentation of companion parrots jeopardised their external validity. These results provide a promising starting point for validating a set of welfare indicators in parrots.
This chapter is written for conversation analysts and is methodological. It discusses, in a step-by-step fashion, how to code practices of action (e.g., particles, gaze orientation) and/or social actions (e.g., inviting, information seeking) for purposes of their statistical association in ways that respect conversation-analytic (CA) principles (e.g., the prioritization of social action, the importance of sequential position, order at all points, the relevance of codes to participants). As such, this chapter focuses on coding as part of engaging in basic CA and advancing its findings, for example as a tool of both discovery and proof (e.g., regarding action formation and sequential implicature). While not its main focus, this chapter should also be useful to analysts seeking to associate interactional variables with demographic, social-psychological, and/or institutional-outcome variables. The chapter’s advice is grounded in case studies of published CA research utilizing coding and statistics (e.g., those of Gail Jefferson, Charles Goodwin, and the present author). These case studies are elaborated by discussions of cautions when creating code categories, inter-rater reliability, the maintenance of a codebook, and the validity of statistical association itself. Both misperceptions and limitations of coding are addressed.
Demoralization, a prevalent form of psychological distress, significantly impacts patient care, particularly in terminally ill individuals, notably those diagnosed with cancer. This study aimed to assess the psychometric properties of the Farsi version of the Demoralization Scale-II (DS-II) in Iranian cancer patients.
Methods
This was a descriptive-analytical cross-sectional study. The statistical population comprised cancer patients who sought treatment at Imam Khomeini Hospital in Tehran during 2021–2022. In the initial phase of the study, a preliminary sample of 200 patients was selected through convenience sampling. After the study criteria were applied, 160 patients satisfactorily completed the questionnaires, forming the final study sample. They completed a series of questionnaires covering sociodemographic information, the DS-II, the Memorial University of Newfoundland Scale of Happiness (MUNSH), and the Beck Depression Inventory-II (BDI-II). The evaluation included exploratory factor analysis, confirmatory factor analysis (CFA), assessments of convergent validity, and internal consistency reliability.
Results
The CFA revealed a 2-factor model consistent with the original structure. The specific fit indices, including the Comparative Fit Index, Root Mean Square Error of Approximation, and Goodness-of-Fit Index, were 0.99, 0.051, and 0.86, respectively. Significant correlation coefficients (p < 0.05) were found between the DS-II and the Beck Depression and MUNSH Happiness scales. The internal consistency of the DS-II, as measured by Cronbach’s alpha, yielded values of 0.91 for the meaning and purpose factor, 0.89 for the coping ability factor, and 0.92 for the total score.
Significance of results
The Farsi version of DS-II has demonstrated reliability and validity in evaluating demoralization among cancer patients in Iran. This tool can offer valuable insights into the psychological problems of terminally ill patients. Further research opportunities may include conducting longitudinal studies to track demoralization over time and exploring the impact of demoralization on the overall well-being and care of terminally ill patients in Iranian society.
The aim of this study was to develop the Nurse Competency Assessment Scale in Disaster Management (NCASDM) and to evaluate its psychometric properties.
Methods
This was a scale development study. Research data were collected between January and May 2023. Following recommendations in the literature, the target sample size was at least 10 times the number of draft scale items (n = 600). The psychometric properties of the scale were tested with 697 nurses working in four different hospitals. A three-stage structure was used: (1) creating the item pool, (2) preliminary evaluation of items, and (3) refining the scale and evaluating its psychometric properties. The content validity, construct validity, internal consistency, and temporal stability of the scale were evaluated according to the scale development guidelines.
Results
The scale items were obtained from online, semi-structured, in-depth individual interviews conducted with nurses who had experienced disasters or worked in disasters. The content validity index of the scale was 0.95. Exploratory factor analysis showed that the scale consisted of 43 items and two subscales, which together explained 79.094% of the total variance. The fit indices obtained from confirmatory factor analysis were at acceptable to good levels.
Conclusions
The NCASDM was found to be a psychometrically valid and reliable measurement tool. It can be used to evaluate the competency of nurses related to disaster management.
Many books have been written on the topic of second language assessment, but few are easily accessible for both students and practicing language teachers. This textbook provides an up-to-date and engaging introduction to this topic, using anecdotal and real-world examples to illustrate key concepts and principles. It seamlessly connects qualitative and quantitative approaches and the use of technologies, including generative AI, to language assessment development and analysis for students with little background in these areas. Hands-on activities, exercises, and discussion questions provide opportunities for application and reflection, and the inclusion of additional resources and detailed appendices cements understanding. Ancillary resources are available including datasets and videos for students, PowerPoint teaching slides and a teacher's guide for instructors. Packed with pedagogy, this is an invaluable resource for both first and second language speakers of English, students on applied linguistics or teacher education courses, and practicing teachers of any language.
In Chapter 12, the author discusses approaches to judging the effectiveness of both criterion-referenced and norm-referenced performance assessments. The chapter includes guidelines for helping create or evaluate performance assessments for particular contexts. It also describes how to use statistical techniques for the same purpose. The chapter presents actual ratings from a classroom-based group oral discussion test and shows how teachers used statistics to determine both score dependability and reliability. The author discusses how to calculate the agreement coefficient to help determine the dependability of the assessment and Cronbach’s alpha to help determine its reliability. A major point of the chapter is that when certain conditions exist, test users can exploit an assessment to determine test takers’ mastery of language criteria (criterion-referenced purpose) and to compare their abilities (norm-referenced purpose). The author provides an appendix that shows readers how to use Excel software to calculate the statistics in the chapter.
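The two statistics named above can be sketched in a few lines. This is a minimal illustration rather than the chapter's Excel procedure: the ratings are hypothetical, the cut score is invented, and the agreement coefficient is assumed here to mean the proportion of test takers who receive the same mastery or non-mastery decision from two sets of ratings.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row per test taker).

    Each row holds that test taker's scores on the k items (or from k raters).
    """
    k = len(scores[0])
    columns = list(zip(*scores))  # one tuple of scores per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def agreement_coefficient(ratings_a, ratings_b, cut_score):
    """Proportion of test takers classified the same way (master or
    non-master) by two sets of ratings. Assumed definition for this sketch."""
    same = sum((a >= cut_score) == (b >= cut_score)
               for a, b in zip(ratings_a, ratings_b))
    return same / len(ratings_a)


if __name__ == "__main__":
    # Hypothetical ratings: five test takers, three rating criteria each.
    ratings = [[3, 4, 3], [2, 2, 3], [4, 5, 4], [1, 2, 2], [5, 4, 5]]
    print(round(cronbach_alpha(ratings), 3))

    # Hypothetical scores from two raters with an invented cut score of 4.
    rater_a = [3, 2, 4, 1, 5]
    rater_b = [4, 2, 5, 2, 4]
    print(agreement_coefficient(rater_a, rater_b, cut_score=4))
```

Alpha speaks to the norm-referenced question of how consistently test takers are ranked, while the agreement coefficient speaks to the criterion-referenced question of how consistently they are classified against a cut score.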
In Chapter 4, the author introduces the concept of validity. The chapter begins with an exploration of approaches to defining a construct. These approaches include using language theory, a language needs analysis, corpora, and curriculum objectives to help test developers determine what specific language ability they desire to measure. The chapter emphasizes the importance of alignment, which relates to how well the test content and test taker response processes match the construct’s content and the response processes that the test aims to measure. The author uses a detailed example of assessing children’s ability to communicate on a playground in a second language. The major point of the example is that the assessment should require children to use the same kinds of language they use when they communicate on the playground. This alignment helps ensure that the assessment measures the targeted language ability and will lead to positive washback on teaching and learning.
This methodological study aimed to establish the validity and reliability of the Turkish version of the Information Concealment Scale for Caregivers of palliative care patients.
Methods
The study was conducted between January and June 2023 with 155 caregivers who cared for patients hospitalized in the palliative care units of 2 hospitals in Istanbul, Turkey. Exploratory factor analysis and confirmatory factor analysis were performed for validity analysis. Cronbach’s α, item-total correlation, intraclass correlation coefficient (ICC), and Pearson correlation analysis were used for reliability analysis.
Results
Of the participants, 54.2% were female and 69% were married. The mean age was 37.96 ± 12.25 years. According to the exploratory factor analysis, the scale consisted of 3 subscales and 15 items. The first subscale was defined as “misrepresentation of the disease”; the second as “concealment of information”; and the third as “misrepresentation of the real situation.” After the modifications made in confirmatory factor analysis, the goodness-of-fit values were as follows: CMIN/DF (χ²/df) = 175.16/81 = 2.16; GFI = 0.88; CFI = 0.91; RMSEA = 0.079; RMR = 0.070; NFI = 0.90. The Cronbach’s α values of the subscales were between 0.79 and 0.87. ICC values were between 0.90 and 0.95 with a 95% confidence interval. A positive correlation was found between the subscales.
Significance of results
It was determined that the Turkish version of the Information Concealment Scale was a valid and reliable tool for caregivers.
Existing self-rated depression measurement tools have a range of psychometric drawbacks affecting both validity and reliability. The gold-standard self-rated depression scales contain several variable items that are often non-specific, require respondents to have a certain level of language understanding, and offer limited scoring options, resulting in low sensitivity. The Maudsley three-item visual analogue scale (M3VAS) was developed to address these challenges.
Aims
This study aimed to translate the M3VAS into Chinese and test its reliability and validity.
Method
First, both M3VAS scales (assessing current severity and change in severity) were translated according to a standardised protocol to finalise the Chinese version. Reliability and validity were then examined among 550 young people with moderate to severe depression (Patient Health Questionnaire-9 (PHQ-9) score ≥15) in a cross-sectional opportunistic questionnaire survey.
Results
The content validity of each item (six items, across both scales) ranged from 0.83 to 1.00. Exploratory factor analysis identified two common factors, with a variance contribution rate of 64.34%. The total score correlated positively with the PHQ-9 total score (r = 0.241, P < 0.01). The Chinese version of the M3VAS had good reliability and validity values, and the confirmatory factor model fit well.
Conclusions
The psychometric properties of the Chinese version of the M3VAS suggest that this scale can feasibly evaluate depression among young people in China.
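An item-level content validity index of the kind reported above is commonly computed as the share of expert raters who judge an item relevant. The sketch below assumes that definition, a 4-point relevance scale on which ratings of 3 or 4 count as relevant, and a hypothetical panel of six experts; none of these details is stated in the abstract.

```python
def item_cvi(ratings, relevant_min=3):
    """Item-level content validity index: proportion of experts whose
    relevance rating is at or above relevant_min (assumed 4-point scale)."""
    return sum(r >= relevant_min for r in ratings) / len(ratings)


if __name__ == "__main__":
    # Hypothetical ratings from six experts for two items.
    print(round(item_cvi([4, 4, 3, 4, 2, 3]), 2))  # five of six experts rate it relevant
    print(round(item_cvi([4, 3, 4, 4, 3, 4]), 2))  # all six experts rate it relevant
```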
Despite the growing interest in the prevalence and consequences of loneliness, the way it is measured still raises a number of questions. In particular, few studies have directly compared the psychometric properties of very short measures of loneliness to standard measures.
Methods
We conducted a large epidemiological study of midwifery students (n = 1742) and performed a head-to-head comparison of the psychometric properties of the standard (20 items) and short (3 items) versions of the UCLA Loneliness Scale (UCLA-LS). All participants completed the UCLA-LS-20 and the UCLA-LS-3, as well as other measures of mental health, including anxiety and depression.
Results
First, as predicted, we found that the two loneliness scales were strongly associated with each other. Second, when using the dimensional scores of the scales, we showed that the internal reliability, convergent-, discriminant-, and known-groups validities were high and of similar magnitude between the UCLA-LS-20 and the UCLA-LS-3. Third, when the scales were dichotomized, the results were more mixed. The sensitivity and/or specificity of the UCLA-LS-3 against the UCLA-LS-20 were systematically below acceptable thresholds, regardless of the dichotomizing process used. In addition, the prevalence of loneliness was strikingly variable as a function of the cut-offs used.
Conclusions
Overall, we showed that the UCLA-LS-3 provided an adequate dimensional measure of loneliness that is very similar to the UCLA-LS-20. On the other hand, we were able to highlight more marked differences between the scales when their scores were dichotomized, which has important consequences for studies estimating, for example, the prevalence of loneliness.
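The dichotomized comparison can be sketched as follows, treating the long scale as the reference classification. This is an illustration only: the scores and both cut-offs are hypothetical and are not the thresholds used in the study.

```python
def dichotomize(scores, cutoff):
    """Classify each score as lonely (True) when it is at or above the cutoff."""
    return [score >= cutoff for score in scores]


def sensitivity_specificity(reference, candidate):
    """Sensitivity and specificity of candidate labels against reference labels."""
    pairs = list(zip(reference, candidate))
    true_pos = sum(1 for r, c in pairs if r and c)
    false_neg = sum(1 for r, c in pairs if r and not c)
    true_neg = sum(1 for r, c in pairs if not r and not c)
    false_pos = sum(1 for r, c in pairs if not r and c)
    return true_pos / (true_pos + false_neg), true_neg / (true_neg + false_pos)


if __name__ == "__main__":
    # Hypothetical scores for eight respondents on the 20-item and 3-item scales.
    long_scores = [25, 48, 52, 30, 61, 44, 39, 55]
    short_scores = [3, 7, 6, 4, 8, 5, 6, 5]

    # Hypothetical cut-offs; changing them changes both prevalence and accuracy.
    reference = dichotomize(long_scores, cutoff=45)
    candidate = dichotomize(short_scores, cutoff=6)

    sens, spec = sensitivity_specificity(reference, candidate)
    print(sens, spec)
    print(sum(reference) / len(reference))  # prevalence under the reference cut-off
```

Rerunning the sketch with different cut-offs shows how both the agreement between the scales and the estimated prevalence depend on where the thresholds are placed.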
The Personal Need for Structure (PNS) scale assesses individuals’ tendency to seek out clarity and structured ways of understanding and interacting with their environment. The main aim of this study was to adapt the PNS scale to Spanish and assess its psychometric properties. Two versions of the PNS scale are in use, which differ in the number of dimensions (1 vs. 2) and in the number of items (12 vs. 11, because one version excludes Item 5). Therefore, an additional aim of this study was to compare the two existing versions of the PNS scale. This comparison aimed to address the debate regarding the inclusion of Item 5 and the number of dimensions that comprise the PNS scale. A sample of 735 individuals was collected. First, through an approach combining exploratory and confirmatory analyses, evidence was found in favor of the scale being composed of two related but distinguishable factors: Desire for Structure and Response to the Lack of Structure. Scores on these subscales showed acceptable internal consistency and test-retest reliability. Evidence supporting the invariance of the internal structure across sociodemographic variables such as gender and age was found. Validity evidence was also analyzed by examining the relationships with other relevant measures. The results indicated that Item 5 can be excluded without reducing the validity or reliability of the scores, which supports preceding research in the literature. In conclusion, the PNS scale was satisfactorily adapted to and validated in Spanish, and its use in this context is recommended.
The focus of the chapter is on turning concepts into measurable variables, also known as operationalization. Best practices for creating variables based on a concept are covered. The differences between nominal, ordinal, and interval level variables are discussed along with their relevance for statistical tests covered later in the book. The importance of measurement validity and reliability and their implications for research provide another focus of the chapter.
Datasets serve as crucial training resources and model performance trackers. However, existing datasets have exposed a plethora of problems, inducing biased models and unreliable evaluation results. In this paper, we propose a model-agnostic dataset evaluation framework for automatic dataset quality evaluation. We seek the statistical properties of the datasets and address three fundamental dimensions: reliability, difficulty, and validity, following a Classical Test Theory (CTT). Taking the named entity recognition (NER) datasets as a case study, we introduce nine statistical metrics for a statistical dataset evaluation framework. Specifically, we investigate the reliability of a NER dataset with three metrics, including Redundancy, Accuracy, and Leakage Ratio. We assess the dataset difficulty through four metrics: Unseen Entity Ratio, Entity Ambiguity Degree, Entity Density, and Model Differentiation. For validity, we introduce the Entity Imbalance Degree and Entity-Null Rate to evaluate the effectiveness of the dataset in assessing language model performance. Experimental results validate that our evaluation framework effectively assesses various aspects of the dataset quality. Furthermore, we study how the dataset scores on our statistical metrics affect the model performance and appeal for dataset quality evaluation or targeted dataset improvement before training or testing models.
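Two of the named metrics can be illustrated with simple counting. The definitions below are assumptions made for this sketch, since the abstract does not give formulas: Unseen Entity Ratio is taken as the share of test-set entity mentions that never appear in the training set, and Entity-Null Rate as the share of sentences that contain no entity. The data and function names are hypothetical.

```python
def unseen_entity_ratio(train_sentences, test_sentences):
    """Share of test entity mentions absent from the training set (assumed definition)."""
    seen = {entity for sentence in train_sentences for entity in sentence}
    test_mentions = [entity for sentence in test_sentences for entity in sentence]
    unseen = [entity for entity in test_mentions if entity not in seen]
    return len(unseen) / len(test_mentions)


def entity_null_rate(sentences):
    """Share of sentences that contain no entity mention (assumed definition)."""
    return sum(1 for sentence in sentences if not sentence) / len(sentences)


if __name__ == "__main__":
    # Each sentence is represented only by its list of entity mentions (hypothetical data).
    train = [["Paris", "France"], [], ["Acme"]]
    test = [["Paris"], ["Berlin", "Acme"], [], ["Globex"]]

    print(unseen_entity_ratio(train, test))
    print(entity_null_rate(test))
```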
To wit, we have three specific goals here. First, we want to review the activities of the three-hatted pollster. We do this to provide greater context for each type of pollster. Some of us are all three; others are some combination of these. Any pollster worth their salt must at least be a data scientist, or they risk losing credibility.
Second, we explore the role of the pollster in society. Ultimately, what is the purpose of the pollster? In our view, pollsters are critically important in any democracy. We believe this is often overlooked due to the ranking frenzy after every electoral cycle. Here, we put the profession into proper perspective.
And third, we discuss the use of non-survey, or alternative data, inputs as proxy measures for public opinion. We provide a framework for pollsters to think through them in a critical manner. Validation is a key concept which we introduce here – one more tool for the data scientist.
This chapter describes some commonly used nonhuman paradigms for assessing animal behavior and the figures that are used to present those data. The chapter opens with an overview of some animal species used in neuroscience research, a discussion about nonhuman housing, and a description of types of validity that behavioral neuroscientists concern themselves with. The behavioral tests described here are divided into five major categories: motor behaviors; pain; learning and memory; mental disorders such as anxiety, depression, and substance use disorder; and social behaviors. Included is a description of a survival analysis and an explanation of interpreting Kaplan–Meier curves.
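The survival curves mentioned above come from the Kaplan–Meier estimator, which multiplies, at each event time, the fraction of at-risk subjects that survive that time. The following is a minimal sketch with hypothetical data, where an event flag of 0 marks a censored observation; it is not taken from the chapter.

```python
def kaplan_meier(durations, events):
    """Return [(time, survival_probability)] for each time with at least one event.

    durations: observed time for each subject.
    events: 1 if the event occurred at that time, 0 if the subject was censored.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        event_count = 0
        leaving = 0
        # Group all subjects that share the same observed time.
        while i < len(data) and data[i][0] == time:
            event_count += data[i][1]
            leaving += 1
            i += 1
        if event_count:
            survival *= 1 - event_count / at_risk
            curve.append((time, survival))
        # Subjects with events and censored subjects both leave the risk set.
        at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical data for six animals: time in days and event indicator.
    days = [2, 3, 3, 5, 8, 8]
    observed = [1, 1, 0, 1, 0, 1]
    for time, prob in kaplan_meier(days, observed):
        print(time, round(prob, 3))
```

Each step down in a plotted Kaplan–Meier curve corresponds to one entry in the returned list, and censored subjects shrink the risk set without producing a step.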
Contestations about the contents and validity of laws and legal principles are fundamental to the (international) legal profession. After all, when engaging with legal norms, disagreements about their meaning and validity are a central part of the day-to-day work of legal professionals specialising in international law, including legal counsel representing governments, international judges, legal officers working for international organisations and non-governmental organisations, and legal academics. We propose a practice-oriented approach to empirically research such interpretive legal contestations by groups of legal professionals. Using an interdisciplinary perspective, we contribute to IR norms research by drawing on not only IR practice theory, but also Bourdieu-inspired research within the Sociology of International Law and ongoing discussions on legal realism in International Legal Theory, including what we have called European New Legal Realism. After outlining how to implement our approach using either a Bourdieusian perspective or the concept of communities of practice, we use normative contestations in and around climate change law to illustrate its added value. Such an approach not only promises to make interpretive legal contestations visible empirically, but also emphasises how interpretive legal contestations matter as they reflect underlying power dynamics and may result in normative legal change in practice.