Chapter 7 Scale Reliability And Validity

This is further combined with the first- and second-order reliability methods to create a unique reliability analysis framework. To assess this approach, the deterministic computational homogenisation method is combined with the Monte Carlo method as an alternative reliability method. Numerical examples are used to demonstrate the capability of the proposed method in measuring the safety of composite structures. The paper shows that the proposed method provides estimates very close to those from the Monte Carlo method, but is significantly more efficient in terms of computational time. It is advocated that this new method can be a fundamental element in the development of stochastic multi-scale design methods for composite structures.
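To make the comparison between a first-order estimate and Monte Carlo sampling concrete, here is a minimal sketch in Python. It is not the homogenisation-based framework described above. It assumes the simplest possible limit state, g = R - S, with independent normal resistance R and load S, and all parameter values and function names are hypothetical.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def first_order_estimate(mu_r, sd_r, mu_s, sd_s):
    """Reliability index and failure probability for g = R - S.

    R and S are assumed independent and normally distributed, in which
    case the first-order result is exact.
    """
    beta = (mu_r - mu_s) / math.sqrt(sd_r ** 2 + sd_s ** 2)
    return beta, normal_cdf(-beta)


def monte_carlo_estimate(mu_r, sd_r, mu_s, sd_s, n_samples, seed=0):
    """Estimate P(R - S < 0) by direct sampling."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_samples):
        if rng.gauss(mu_r, sd_r) - rng.gauss(mu_s, sd_s) < 0.0:
            failures += 1
    return failures / n_samples


# Hypothetical resistance and load parameters (not taken from the text).
beta, pf_first_order = first_order_estimate(10.0, 1.0, 7.0, 1.0)
pf_monte_carlo = monte_carlo_estimate(10.0, 1.0, 7.0, 1.0, n_samples=200_000)
print(f"beta = {beta:.3f}")
print(f"first-order Pf = {pf_first_order:.5f}")
print(f"Monte Carlo Pf = {pf_monte_carlo:.5f}")
```

The first-order estimate needs a single closed-form evaluation, while the sampling estimate needs hundreds of thousands of draws to reach comparable precision, which is the efficiency gap the paragraph refers to.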

If the construct measures satisfy most or all of the requirements of reliability and validity described in this chapter, we can be assured that our operationalized measures are reasonably adequate and accurate. Herein, a prototype steel box-girder bridge was introduced to illustrate the feasibility of the proposed framework. Parametric studies demonstrated the accuracy and the efficiency of the framework. Influence of an increase in the traffic volume and vehicle weight on the fatigue reliability of the bridge was investigated. The ultimate goal of this study was to apply the stochastic fatigue truck model for probabilistic modeling of fatigue damage and the reliability assessment of welded steel bridge decks. An alternative and more common statistical method used to demonstrate convergent and discriminant validity is exploratory factor analysis.

In order to model the vehicle state more realistically, a cellular automaton based traffic flow simulation technique was proposed to simulate the stochastic live load from traffic for long-span bridges. Research work on the topic of the interaction between vehicles and bridges originated in the middle of the 20th century. In the beginning, the vehicle loads were modeled as a constantly moving force, moving mass, or moving mass-spring. Further progress in this research area led to a fully computerized approach for assembling the equations of motion of the coupled vehicle-bridge system, which was proposed by modeling the vehicles as a combination of a number of rigid bodies connected by a series of springs and dampers. On this basis, a 3-D simulation approach including a 3-D suspension vehicle model and a 3-D dynamic bridge model was developed (Shi et al. 2008).

Composite Scoring And Reliability

By increasing variability in observations, random error reduces the reliability of measurement. In contrast, by shifting the central tendency measure, systematic error reduces the validity of measurement. Systematic error is introduced by factors that affect all observations of a construct across an entire sample in the same way. Unlike random error, which may be positive, negative, or zero across observations in a sample, systematic error tends to be consistently positive or negative across the entire sample. Hence, systematic error is sometimes considered to be “bias” in measurement and should be corrected. Split-half reliability is a measure of consistency between two halves of a construct measure.
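Split-half reliability can be sketched in a few lines of Python. The example below is only an illustration: the response matrix is made up, the odd/even split is one of several possible ways to halve a scale, and the Spearman-Brown step-up (which adjusts the half-length correlation to the full scale length) is an addition that the text above does not mention.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(responses):
    """Correlate odd-item and even-item half scores.

    Returns the raw half-to-half correlation and the Spearman-Brown
    corrected estimate, 2r / (1 + r).
    """
    first_half = [sum(row[0::2]) for row in responses]
    second_half = [sum(row[1::2]) for row in responses]
    r = pearson(first_half, second_half)
    return r, 2.0 * r / (1.0 + r)


# Hypothetical data: six respondents answering four items on a 1-5 scale.
responses = [
    [4, 5, 4, 5],
    [2, 1, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 1],
    [3, 4, 3, 3],
]
r_half, r_full = split_half_reliability(responses)
print(f"half-to-half correlation: {r_half:.3f}")
print(f"Spearman-Brown estimate:  {r_full:.3f}")
```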

This type of reliability also assumes the equality of the true scores of each item measured (the tau-equivalence hypothesis), so that the different estimators of the internal coherence of the test have minimal bias. High reliability suggests strong relationships between the measures/items within the measurement procedure. Concurrent validity examines how well one measure relates to another concrete criterion that is presumed to occur simultaneously. For instance, do students’ scores in a calculus class correlate well with their scores in a linear algebra class? These scores should be related concurrently because they are both tests of mathematics. Unlike convergent and discriminant validity, concurrent and predictive validity are frequently ignored in empirical social science research.

In other words, if we use this scale to measure the same construct multiple times, do we get pretty much the same result every time, assuming the underlying phenomenon is not changing? Quite likely, people will guess differently, the different measures will be inconsistent, and therefore, the “guessing” technique of measurement is unreliable. A more reliable measurement may be to use a weight scale, where you are likely to get the same value every time you step on the scale, unless your weight has actually changed between measurements. In the proposed study, the alpha coefficient obtained is 0.869, which indicates a good ability of the items of the questionnaire to evaluate the same latent factor in subjects, neuroticism in our case. Internal consistency is based on the relationship between the scores of each measure/item and the sum of all the others (Cronbach’s alpha, Guttman indices L1 and L6), and it assumes good homogeneity among the items. To make the demonstration on Cronbach’s alpha possible, SB8, which was a variable previously deleted during factor analysis, was restored in the data set.

Reliability Analysis In Excel

Remember that reliability is a number that ranges from 0 to 1, with values closer to 1 indicating higher reliability. Cronbach’s alpha coefficient, also known as the α coefficient, is used to evaluate the internal consistency of the questions asked in this test. Its value generally lies between 0 and 1 and is considered acceptable when it is higher than 0.70. A complete and adequate assessment of validity must include both theoretical and empirical approaches. As shown in Figure 7.4, this is an elaborate multi-step process that must take into account the different types of scale reliability and validity. If employee morale in a firm is measured by watching whether the employees smile at each other, whether they make jokes, and so forth, then different observers may infer different measures of morale if they are watching the employees on a very busy day or a light day.

This book is suitable for adoption as a textbook or a reference book in an advanced structural reliability analysis course. A large number of long-span bridges are under construction or have been constructed all over the world. The steady increase in traffic volume and gross vehicle weight has caused a threat to the serviceability or even safety of in-service bridges. Therefore, ensuring the safety and serviceability of these bridges has become a growing concern. In particular, long-span suspension bridges support heavy traffic volumes and experience considerable wind loads on the bridge deck on a regular basis. Excessive dynamic responses may cause large deformation and undesirable vibration of the stiffening girders.

  • A second source of unreliable observation is asking imprecise or ambiguous questions.
  • For instance, if you ask people what their salary is, different respondents may interpret this question differently as monthly salary, annual salary, or per hour wage, and hence, the resulting observations will likely be highly divergent and unreliable.
  • Nevertheless, the miscalibrated weight scale will still give you the same weight every time, and hence the scale is reliable.
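The difference between a consistent but biased instrument and an unbiased but noisy one can be illustrated with a small simulation. This is only a sketch: the true weight, the size of the miscalibration, and the noise levels are all hypothetical numbers chosen for illustration.

```python
import random
import statistics


def simulate_measurements(true_value, bias, noise_sd, n, seed):
    """Draw n readings: true value + constant bias + random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


TRUE_WEIGHT = 70.0  # hypothetical true weight

# Miscalibrated scale: large systematic error, almost no random error.
scale_readings = simulate_measurements(TRUE_WEIGHT, bias=10.0, noise_sd=0.1, n=1000, seed=0)

# Guessing by eye: no systematic error, large random error.
guess_readings = simulate_measurements(TRUE_WEIGHT, bias=0.0, noise_sd=5.0, n=1000, seed=1)

for label, readings in [("miscalibrated scale", scale_readings), ("guessing", guess_readings)]:
    mean = statistics.mean(readings)
    spread = statistics.stdev(readings)
    print(f"{label}: mean error = {mean - TRUE_WEIGHT:+.2f}, spread = {spread:.2f}")
```

The scale readings cluster tightly (reliable) around the wrong value (not valid), while the guesses are centered near the true value but scatter widely (unreliable).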

The best items (say 10-15) for each construct are selected for further analysis. Each of the selected items is reexamined by judges for face validity and content validity. If an adequate set of items is not achieved at this stage, new items may have to be created based on the conceptual definition of the intended construct. Two or three rounds of Q-sort may be needed to arrive at reasonable agreement between judges on a set of items that best represents the constructs of interest. Reliability comes to the forefront when variables developed from summated scales are used as predictor components in objective models.

Therefore, the GMM can provide a reliable connection between the monitored traffic data and the probabilistic modeling of structural fatigue stress range. Furthermore, most dynamic analyses reported in this area have focused on time domain analysis, because of the nature of time-varying differential equations in the interaction system, while very little development has been done in the frequency domain. However, incorporating the random vibration in the aforementioned coupled system, which requires the use of spectral analysis, is more important and results in more valuable information in the frequency domain. This includes defining each construct and identifying its constituent domains and/or dimensions. Next, we select items or indicators for each construct based on our conceptualization of these constructs, as described in the scaling procedure in Chapter 5.

Cronbach’s alpha is not a measure of dimensionality, nor a test of unidimensionality. In fact, it’s possible to produce a high \( \alpha \) coefficient for scales of similar length and variance, even if there are multiple underlying dimensions. To check for dimensionality, you’ll perhaps want to conduct an exploratory factor analysis. After accounting for the reverse-worded items, this scale has a reasonably strong \( \alpha \) coefficient of 0.67 based on responses during the 2008 wave of the ANES data collection. In part because of this \( \alpha \) coefficient, and in part because these items exhibit strong face validity and construct validity, I feel comfortable saying that these items do indeed tap into an underlying construct of egalitarianism among respondents. Once you calculate the composite score, you can move forward with conducting a reliability analysis.
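Accounting for reverse-worded items and building a composite score might look like the following sketch. The 1 to 5 response range, the item positions, and the choice of the mean (rather than the sum) as the composite are assumptions made for illustration; they are not taken from the ANES items discussed above.

```python
def reverse_code(value, low=1, high=5):
    """Flip a response on a low..high scale (e.g., 1 <-> 5, 2 <-> 4)."""
    return low + high - value


def composite_score(row, reversed_items, low=1, high=5):
    """Mean of a respondent's items after reverse-coding the flagged ones.

    `reversed_items` holds the zero-based positions of reverse-worded items.
    """
    adjusted = [
        reverse_code(value, low, high) if index in reversed_items else value
        for index, value in enumerate(row)
    ]
    return sum(adjusted) / len(adjusted)


# Hypothetical respondent: items at positions 1 and 3 are reverse-worded.
respondent = [5, 1, 4, 2]
print(composite_score(respondent, reversed_items={1, 3}))
```

Reverse-coding has to happen before any reliability statistic is computed; otherwise the reverse-worded items correlate negatively with the rest and drag the coefficient down.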

Theory Of Measurement

The printed output facilitates the identification of dispensable variables by listing each deleted variable in the first column, together with the expected resultant alpha in the third column of the same row. For this example, the table indicates that if SB8 were deleted, the value of raw alpha would increase from the current .77 to .81. Note that the same variable has the lowest item-total correlation value (.185652). This indicates that SB8 is not measuring the same construct as the rest of the items in the scale. With this process alone, the author was able not only to obtain the reliability index of the “REGULATE” construct but also to improve on it. What this means is that removing SB8 from the scale will make the construct more reliable for use as a predictor variable.
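The "alpha if item deleted" and item-total columns described above can be reproduced with a short Python sketch. The data below are hypothetical (they are not the SB items from the example), and the item-total correlation is computed in its corrected form, i.e., each item against the sum of the remaining items, which is an assumption about how the printed output was produced.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cronbach_alpha(rows):
    """Raw Cronbach's alpha; rows are respondents, columns are items."""
    k = len(rows[0])
    item_variances = [statistics.variance(column) for column in zip(*rows)]
    total_variance = statistics.variance([sum(row) for row in rows])
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


def alpha_if_deleted(rows, item):
    """Alpha recomputed after dropping the item at the given position."""
    reduced = [[v for i, v in enumerate(row) if i != item] for row in rows]
    return cronbach_alpha(reduced)


def item_rest_correlation(rows, item):
    """Correlation of one item with the sum of all the other items."""
    scores = [row[item] for row in rows]
    rest = [sum(row) - row[item] for row in rows]
    return pearson(scores, rest)


# Hypothetical data: the last item does not track the other three.
data = [
    [5, 5, 4, 2],
    [4, 4, 4, 5],
    [3, 3, 2, 1],
    [2, 2, 3, 4],
    [1, 2, 1, 3],
    [4, 5, 5, 3],
]
print(f"alpha (all items): {cronbach_alpha(data):.3f}")
for item in range(len(data[0])):
    print(
        f"item {item}: item-rest r = {item_rest_correlation(data, item):.3f}, "
        f"alpha if deleted = {alpha_if_deleted(data, item):.3f}"
    )
```

An item whose deletion raises alpha and whose item-rest correlation is the lowest in the table is the natural candidate for removal, which is the pattern reported for SB8.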

The integrated approach to measurement validation discussed here is quite demanding of researcher time and effort. Nonetheless, this elaborate multi-stage process is needed to ensure that the measurement scales used in our research meet the expected norms of scientific research. Because inferences drawn using flawed or compromised scales are meaningless, scale validation and measurement remains one of the most important and involved phases of empirical research. Criterion-related validity can also be assessed based on whether a given measure relates well to a current or future criterion; these are called concurrent and predictive validity, respectively. Predictive validity is the degree to which a measure successfully predicts a future outcome that it is theoretically expected to predict. For instance, can standardized test scores (e.g., Scholastic Aptitude Test scores) correctly predict academic success in college (e.g., as measured by college grade point average)?

Note that the different types of validity discussed here refer to the validity of the measurement procedures, which is distinct from the validity of hypothesis testing procedures, such as internal validity, external validity, or statistical conclusion validity. Now that you have completed your data collection and are armed with your raw data, the data management and analysis can begin! Data management is an important step to successfully completing your results chapter. In many quantitative studies, composite scoring and assessing reliability are key steps in the data management and analysis process. We see clearly on this map that the questions N2 to N4 are less correlated with the remaining items of the test.

Reliability Analysis

The reliability analysis will allow you to assess how well the items work together to assess the variable of interest in your sample. Researchers commonly calculate Cronbach’s alpha to evaluate the reliability of the items comprising a composite score. This statistic allows you to make a statement regarding the acceptability of the combination of items to represent your variable. Cronbach’s alphas of at least 0.7 indicate that the combination of items has acceptable reliability (George & Mallery, 2016).
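For reference, the usual textbook form of Cronbach’s alpha for a composite of \( k \) items is given below. The symbols are introduced here for illustration and are not defined elsewhere in this chapter: \( \sigma_{i}^{2} \) is the variance of item \( i \) and \( \sigma_{X}^{2} \) is the variance of the composite (summed) score.

\[
\alpha = \frac{k}{k - 1} \left( 1 - \frac{\sum_{i=1}^{k} \sigma_{i}^{2}}{\sigma_{X}^{2}} \right)
\]

When the items covary strongly, the composite variance \( \sigma_{X}^{2} \) is much larger than the sum of the item variances, the ratio shrinks, and \( \alpha \) approaches 1.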

Bucher and Macke introduced solutions to the first-passage problem by importance sampling. A probability density evolution method capable of capturing the instantaneous PDF of the responses and its evolution was developed by Chen and Li. Applications of first-passage reliability to engineering structures are very interesting since safety assessment and design can be put forward to guarantee structural safety. Park and Ang assessed the probability of damage for a reinforced concrete structure under seismic load. Zhang et al. adopted a pseudo-excitation method and a precise integration method to compute the non-stationary random response of 3-D train-bridge systems subjected to lateral horizontal earthquakes. Significant progress in structural reliability evaluation has been achieved in the last decades utilizing nonlinear stochastic structural dynamics.

For instance, if there are two raters rating 100 observations into one of three possible categories, and their ratings match for 75% of the observations, then inter-rater reliability is 0.75. If the measure is interval or ratio scaled (e.g., classroom activity is being measured once every 5 minutes by two raters on a 1 to 7 response scale), then a simple correlation between measures from the two raters can also serve as an estimate of inter-rater reliability. A measure can be reliable but not valid, if it is measuring something very consistently but is consistently measuring the wrong construct.
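The 75% agreement example above can be reproduced with a minimal Python sketch. The category labels and the way the 25 disagreements are generated are made up for illustration; only the counts (100 observations, three categories, 75 matches) come from the text.

```python
def percent_agreement(ratings_a, ratings_b):
    """Share of observations on which two raters chose the same category."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must rate the same observations")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical ratings: 100 observations, three categories.
rater_a = ["low", "medium", "high", "low"] * 25

# Rater B agrees with rater A except on the first 25 observations.
rater_b = list(rater_a)
for i in range(25):
    rater_b[i] = "high" if rater_a[i] != "high" else "low"

print(percent_agreement(rater_a, rater_b))
```

Simple percent agreement does not correct for matches that would occur by chance, so it is best read as an upper-bound style estimate of inter-rater reliability.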

This is a data reduction technique which aggregates a given set of items to a smaller set of factors based on the bivariate correlation structure discussed above using a statistical technique called principal components analysis. These factors should ideally correspond to the underlying theoretical constructs that we are trying to measure. The general norm for factor extraction is that each extracted factor should have an eigenvalue greater than 1.0.
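The eigenvalue-greater-than-1.0 rule can be sketched with NumPy as follows. The correlation matrix is hypothetical: it describes four items that form two pairs, with strong correlations within each pair and weak correlations across pairs. A full factor analysis would also involve rotation and inspection of loadings, which this sketch leaves out.

```python
import numpy as np


def retained_factors(correlation_matrix, threshold=1.0):
    """Eigenvalues of a correlation matrix (largest first) and the
    number of them that exceed the retention threshold."""
    eigenvalues = np.linalg.eigvalsh(correlation_matrix)[::-1]
    return eigenvalues, int(np.sum(eigenvalues > threshold))


# Hypothetical item correlations: items 1-2 and items 3-4 form two clusters.
correlations = np.array([
    [1.0, 0.7, 0.1, 0.1],
    [0.7, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.6],
    [0.1, 0.1, 0.6, 1.0],
])
eigenvalues, n_factors = retained_factors(correlations)
print("eigenvalues:", np.round(eigenvalues, 3))
print("factors retained:", n_factors)
```

Because the eigenvalues of a correlation matrix sum to the number of items, a factor with an eigenvalue above 1.0 explains more variance than any single item does on its own.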

Interpreting The Results Of A Reliability Analysis In Excel Using Xlstat

In the event that you do not want to calculate \( \alpha \) by hand (!), it is thankfully very easy using statistical software. Let’s assume that the six scale items in question are named Q1, Q2, Q3, Q4, Q5, and Q6, and see below for examples in SPSS, Stata, and R. A reliable measure is one that contains zero or very little random measurement error, i.e., anything that might introduce arbitrary or haphazard distortion into the measurement process, resulting in inconsistent measurements. However, it need not be free of systematic error (anything that might introduce consistent and chronic distortion in measuring the underlying concept of interest) in order to be reliable; it only needs to be consistent. For example, if we try to measure egalitarianism through a precise recording of a person’s height, the measure may be highly reliable, but also wildly invalid as a measure of the underlying concept.
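Alongside the packages named above, the same calculation can be sketched in plain Python. The item names Q1 to Q6 follow the text, but the responses are invented for illustration, and the function is a minimal implementation of the raw (unstandardized) coefficient using sample variances rather than the output of any particular package.

```python
import statistics


def cronbach_alpha(items):
    """Raw Cronbach's alpha from a dict mapping item name -> responses.

    Every list must hold one response per respondent, in the same order.
    """
    columns = list(items.values())
    k = len(columns)
    item_variances = [statistics.variance(column) for column in columns]
    totals = [sum(values) for values in zip(*columns)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical responses from five people on a 1-5 scale.
survey = {
    "Q1": [4, 2, 5, 3, 1],
    "Q2": [5, 2, 4, 3, 2],
    "Q3": [4, 3, 5, 2, 1],
    "Q4": [4, 1, 5, 3, 2],
    "Q5": [5, 2, 4, 4, 1],
    "Q6": [3, 2, 5, 3, 1],
}
alpha = cronbach_alpha(survey)
print(f"Cronbach's alpha = {alpha:.3f}")
```

Any reverse-worded items should be recoded before they are passed to a function like this one.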