Comparison of mastery criteria applied to individual targets and stimulus sets on acquisition of tacts, intraverbals, and listener responses

Maria Clara Cordeiro , Tiffany Kodak, Jessi Reidy, Abigail Stoppleworth, Karly Zelinski and Andrea Jainga
Marquette University
Abstract
Mastery criteria can be applied to individual targets or stimuli organized into sets. Wong et al. (2021) and Wong and Fienup (2022) found that participants who received special education services learned sight words more rapidly when an individual target mastery criterion was applied. The current study replicated and extended these findings across novel skills. Five participants with ASD received tact or intraverbal training in Experiment 1, and 2 participants with ASD received auditory–visual conditional discrimination training (AVCD) in Experiment 2. In both experiments, mastery criteria were applied to targets and stimulus sets to compare sessions to mastery. Results showed the target mastery criterion required fewer sessions of tact training for 3 of 5 participants and AVCD training for both participants. However, overselection of stimuli occurred for 20% of AVCD mastered targets, suggesting a false positive for acquisition of those targets. Maintenance was similar across conditions and experiments.
Key words: autism spectrum disorder, false positives, listener training, mastery criteria, tact training
We thank Lauren Debertin, Kirsten Lloyd, Marisa McKee, Courtney Meyerhofer, Diana Meredith, Alyssa Scott, and Xi’an Williams for their assistance with data collection.
Address correspondence to: Dr. Tiffany Kodak, 525 N 6th St., Marquette University, Milwaukee, WI 53203.
Email: tiffany.kodak@marquette.edu
doi: 10.1002/jaba.946
© 2022 Society for the Experimental Analysis of Behavior (SEAB).
Discrete-trial instruction (DTI) involves the arrangement of antecedents (e.g., motivating operations, discriminative stimuli) to occasion target behavior and a preferred consequence to increase the future likelihood of behavior. Although DTI programs include these components, their arrangements vary. One component that can vary is the arrangement of stimuli into teaching sets and the mastery criteria applied to those stimuli. For example, a set of three targets can be taught simultaneously (Maurice et al., 1996), and the learner completes training of one stimulus set and begins instruction for a second stimulus set once all three targets in that first set meet a mastery criterion (e.g., Kodak et al., 2020). Alternatively, stimuli may be taught individually or as a fluctuating set (Lovaas, 1981), and the mastery criterion may be applied to each target. Once an individual target is mastered, the mastered target is replaced by a new target and training with the newly constituted set continues (Knutson et al., 2019).
The selection of mastery criteria can influence the efficacy of instruction and maintenance of skills. Research suggests that more stringent mastery criteria (e.g., 90%-100%) result in acquisition of skills that maintain for longer durations, in comparison to less stringent mastery criteria (e.g., 80%; Fuller & Fienup, 2018; Richling et al., 2019). Further, the application of mastery criteria to sets of stimuli versus individual targets can influence the efficiency of instruction (Wong et al., 2021; Wong & Fienup, 2022). For example, teaching stimuli in sets may result in extended exposure to instruction if acquisition of one or more targets in the set is delayed while the learner consistently responds correctly to all other targets in the set (e.g., Kodak et al., 2020). In this case, use of mastery criteria applied to targets may prevent the loss of instructional time because any mastered targets can be replaced with novel targets.
Researchers have recently begun to investigate the effects of mastery criteria applied to stimulus sets versus individual targets (e.g., Wong et al., 2021; Wong & Fienup, 2022). Wong et al. (2021) compared the effects of mastery criteria applied to stimulus sets (set analysis; SA) and individual targets (operant analysis; OA) on skill acquisition and maintenance. Four learners who received special education services participated, and textual responses to sight words were targeted for instruction. In the SA condition, each set of four sight words was mastered once correct responding reached 100% for one session. In the OA condition, an individual target within the set of three sight words was mastered once correct responding to that target reached 100% for one session. Once all sight words were mastered in the OA condition, the experimenters applied the OA mastery criterion to the remaining SA targets. All four participants mastered more sight words in the OA condition. Two of the four participants maintained a higher percentage of responding to targets taught in the SA condition, and two participants showed similar maintenance of responses to targets taught in both conditions.
Although novel outcomes were obtained by Wong et al. (2021), the experimenters applied a decision-making protocol to targets that were not quickly mastered, and this modification was applied disproportionately to the OA condition. In addition, the authors discontinued use of the SA mastery criterion once all sight words were taught in the OA condition, thereby preventing an analysis of differences in the efficiency of instruction of targets exposed to varying mastery criteria. Wong and Fienup (2022) addressed these limitations by conducting a replication that maintained the same instructional procedures across SA and OA conditions. The authors also modified the mastery criterion from 100% correct responses for one session to two consecutive sessions to address potential confounds in evaluating maintenance. Three participants who received special education services were taught sight words arranged in SA and OA conditions. All participants acquired textual responses more rapidly in the OA condition, replicating findings in Wong et al. However, participants did not show differences in maintenance across conditions, as observed by Wong et al. Instead, maintenance of responses to targets was similar across conditions.
The findings of Wong et al. (2021) and Wong and Fienup (2022) provide preliminary support for the application of mastery criteria to individual targets. Nevertheless, these studies compared set and target mastery criteria applied to teaching of sight words only. Fienup and Carr (2021) suggested continued research on mastery criteria applied to different skills. Thus, replications of Wong and Fienup should be conducted with other skills that are likely to be grouped into stimulus sets for instruction (e.g., tacts, auditory–visual conditional discrimination [AVCD]) and for which target and set mastery criteria may be arranged in practice. Also, use of mastery criteria applied to targets could result in false positives for acquisition. For example, during AVCD training in practice, a learner could touch the same comparison stimulus across all trials, and that target would be considered mastered according to an individual target mastery criterion. This would be an example of a false positive for acquisition of that target; acquisition of the target occurs when the participant touches the picture (e.g., a hippo) when and only when the auditory stimulus that corresponds to that picture (i.e., “hippo”) is presented. Touching the picture of the hippo should not occur when other auditory stimuli in the stimulus set (e.g., “alligator” and “flamingo”) are presented. This type of false positive could not occur when a set mastery criterion is applied to training, because the learner must engage in correct responses to all stimuli in the set to meet mastery. Further analysis of response patterns during teaching should be included to identify instances of false positives produced by use of certain acquisition mastery criteria. For example, learners who select the same target in every trial during AVCD training would demonstrate responding that is considered mastered according to the target mastery criterion used by Wong et al. (2021) and Wong and Fienup (2022).
The purpose of the current investigation was to systematically replicate and extend Wong et al. (2021) and Wong and Fienup (2022) by investigating the application of mastery criteria to sets and targets across different skills frequently taught in skill-acquisition programs; tact and intraverbal training occurred in Experiment 1, and AVCD training occurred in Experiment 2. In addition, we analyzed data for potential false positives of acquisition during AVCD training.
Experiment 1
Method
Participants and Setting
Five children with a medical diagnosis of autism spectrum disorder (ASD) participated in Experiment 1. All participants were previously exposed to mastery criteria applied to sets of stimuli in daily programming. Two participants (Omar and Tim) also had prior exposure to mastery criteria applied to individual targets during tact training for a brief period more than 2 years prior to the current investigation. All participants engaged in vocal-verbal behavior as their primary mode of communication.
Omar was a 7-year-old, Middle Eastern boy who had received 4 years of behavior-analytic intervention. His tact and listener repertoires were within Level 3 (30-48 months) on the Verbal Behavior Milestones Assessment and Placement Program (VB-MAPP; Sundberg, 2008). Josh was an 8-year-old, European American boy who communicated with one- to three-word phrases and had received 1 year of behavior-analytic intervention. His tact and listener repertoires were within Level 2 (18-30 months) on the VB-MAPP. Tim was an 8-year-old, European American boy who had received 5 years of behavior-analytic intervention to reduce severe problem behavior and increase functional communication. His tact and listener repertoires were within Level 3 (30-48 months) on the VB-MAPP. He received 3 months of telehealth services at the start of the investigation and required caregiver assistance for the entire duration of appointments. Connor was a 9-year-old, European American boy who spoke in complete sentences, participated in general education for academic instruction at school, and had received 1 year of in-person behavior-analytic intervention. He received 3 months of telehealth services prior to his participation and engaged in intervention independently (i.e., with no caregiver assistance). Billy was a 6-year-old, European American boy who had received 2 years of behavior-analytic intervention. His tact and listener repertoires were within Level 3 (30-48 months) on the VB-MAPP.
All sessions were conducted in a quiet room with minimal distractions; the location of sessions remained the same across sessions for each participant. In-person sessions took place in a university-based clinic in the Midwest for Omar, Josh, and Billy, whereas sessions for Tim and Connor occurred in their home via telehealth. A chair, table, and relevant instructional and data collection materials were present during sessions.
Materials
Laminated stimulus cards (approximately 12.7 cm x 9.4 cm) with images printed in color were utilized with Omar, Josh, and Billy (Table 1). Stimuli were delivered in PowerPoint® via the Zoom® screen share function for Tim and Connor (Table 1). Math problems for Connor were presented in 239-point black Calibri font centered on a white slide. Single- and double-digit numbers were presented in a vertical array (i.e., numerator on the top and denominator on the bottom) with the division symbol adjacent to the denominator.
Preferred tangible and edible items were included during teaching. Preferred items were identified via previous paired-stimulus or brief multiple-stimulus-without-replacement preference assessments (Carr et al., 2000; Fisher et al., 1992) conducted daily or multiple times per day, depending on the participant, and participant mands (e.g., Omar’s mand “I want the marble machine” resulted in access to a marble run during the reinforcer interval). Connor received tally marks on a whiteboard following correct responses. Each tally mark was worth 15 s of video game play, and Connor was permitted to exchange the points immediately or accumulate and exchange them at the end of the session.
Table 1
Participant Target Stimuli Across Experiments
Response Measurement
Data were collected on correct independent responses, errors, and no responses. The primary dependent variable was correct independent responses defined as the participant emitting a vocalization within 5 s that corresponded to the discriminative stimulus (SD). An error was defined as the participant emitting a vocalization other than the targeted response (excluding vocal stereotypy) or multiple responses (e.g., saying, “boat-bus” or “bus-boat” for the target bus). No response was defined as no emission of vocalizations within 5 s of the SD. Correct independent responses were converted to a percentage by dividing the number of trials with a correct independent response by the total number of trials and multiplying by 100.
Data also were collected on overtraining at the conclusion of the condition comparison. Overtraining trials in the set condition were calculated using an identical method to Wong et al. (2021) and Wong and Fienup (2022) for their SA condition. That is, data from the set condition were graphed per target stimulus. Once a target in the set condition reached mastery, we counted the number of additional sessions of training completed for that target in order to meet the set mastery criterion. For example, if target 1 in a set met the target mastery criterion in three sessions, but 10 total sessions of training were required to reach the set mastery criterion, we calculated that an additional seven sessions of unnecessary training occurred for target 1. Because each target was presented three times per session, we multiplied the number of additional training sessions conducted past mastery by three (e.g., target 1 had 7 additional training sessions × 3 trials per stimulus = 21 overtraining trials for target 1). We replicated this method with the other two targets in the set to obtain the total number of overtraining trials per set. Then, we summed the number of overtraining trials per set (i.e., added the number of overtraining trials from targets 1, 2, and 3). Finally, the total trials of overtraining were divided by the total trials conducted until the mastery criterion was met and multiplied by 100 to obtain a percentage of overtraining trials per set (i.e., training trials allocated to overtraining). Also, the average number of overtraining trials in the set condition was calculated by dividing the sum of overtraining trials across all sets (e.g., overtraining trials for set 1 + set 2 + set 3, etc.) by the total number of targets in the condition (i.e., 15). Results of this analysis are reported in Table 2 for Experiments 1 and 2.
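The overtraining calculation described above can be expressed as a short script. This is an illustrative sketch only; the function name and the example session counts (taken from the worked example in the text) are ours, and the three-trials-per-target multiplier follows the session structure the authors describe.

```python
# Sketch of the overtraining calculation (hypothetical helper, not the
# authors' software). Each value in the input list is the number of sessions
# a target needed to meet the target-level mastery criterion; the set is
# mastered only when its slowest target meets criterion.
TRIALS_PER_TARGET = 3  # each target was presented three times per session

def overtraining_for_set(sessions_to_target_mastery):
    """Return total overtraining trials for one set of three targets."""
    sessions_to_set_mastery = max(sessions_to_target_mastery)
    extra_sessions = (sessions_to_set_mastery - s for s in sessions_to_target_mastery)
    return sum(e * TRIALS_PER_TARGET for e in extra_sessions)

# Worked example from the text: target 1 mastered in 3 sessions, the full set
# in 10, so target 1 alone accrues (10 - 3) * 3 = 21 overtraining trials.
set_sessions = [3, 8, 10]
total_overtraining = overtraining_for_set(set_sessions)

# Percentage of the set's training trials allocated to overtraining:
total_trials = max(set_sessions) * TRIALS_PER_TARGET * len(set_sessions)
pct_overtraining = total_overtraining / total_trials * 100
```

With the example above, the set accrues 21 + 6 + 0 = 27 overtraining trials out of 90 total training trials (30%).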
Interobserver Agreement and Treatment Integrity
Interobserver agreement (IOA) was calculated using the trial-by-trial method. Two independent observers collected data on all dependent variables from video recordings or in person. An agreement was defined as both observers scoring the same dependent variable (e.g., incorrect response) during a trial. Interobserver agreement was calculated for a minimum of 34% of sessions for each participant. The number of trials with agreements was divided by the total number of trials per session and converted to a percentage. All sessions with IOA were averaged to calculate the mean agreement for each participant (Table 3).
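The trial-by-trial IOA computation can be sketched as follows. This is a minimal illustration with invented observer records, not the authors' data system; any response labels are hypothetical.

```python
# Minimal sketch of trial-by-trial interobserver agreement (hypothetical data).
# Each element is the response category one observer scored for a trial.
def trial_by_trial_ioa(observer_1, observer_2):
    """Percentage of trials on which both observers scored the same category."""
    agreements = sum(a == b for a, b in zip(observer_1, observer_2))
    return agreements / len(observer_1) * 100

obs_1 = ["correct", "error", "correct", "no response", "correct",
         "correct", "error", "correct", "correct"]
obs_2 = ["correct", "error", "correct", "correct", "correct",
         "correct", "error", "correct", "correct"]
session_ioa = trial_by_trial_ioa(obs_1, obs_2)  # 8 of 9 trials agree
```

Session-level percentages like `session_ioa` would then be averaged across all sessions with IOA to obtain each participant's mean agreement.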
Treatment integrity (TI) data were also collected for two measures. First, TI data for the teaching procedure were collected on a trial-by-trial basis for a minimum of 34% of sessions per participant. A trained observer collected data on the experimenter’s implementation of all components in each trial according to a protocol. The trained observer scored whether the experimenter secured the participant’s attending to the target stimulus, presented the correct instruction, presented prompts at the appropriate prompt delay (i.e., 5 s), delivered the appropriate error correction (when necessary), provided praise and tangible reinforcers at the correct times, and did not add additional treatment components. The trial was scored as correct (1) if the experimenter conducted all components of the trial according to the experimental protocol. Deviations from the protocol resulted in a score of 0 for the trial. The number of trials implemented correctly was divided by the number of trials in a session and converted to a percentage (Table 3).
Treatment integrity data also were collected on the correct implementation of the mastery criterion in each condition for each participant. The mastery criterion was scored as correctly applied to a target or set when the experimenter ended teaching once two consecutive sessions with 100% correct responding occurred for the target or set. If the experimenter continued teaching any of the mastered targets or sets, this was scored as an error (0). The number of targets or sets for which the mastery criterion was correctly applied was divided by the total number of targets or sets in the condition. Mean TI was calculated for each condition by dividing the sum of these proportions across participants by the total number of participants and multiplying by 100. Mean TI for correct implementation of the mastery criterion across participants was 100% for the target condition and 100% for the set condition.
Experimental Design
An adapted alternating treatments design (Cariveau & Fetzner, 2022; Sindelar et al., 1985) was used to compare the effects of an individual target versus set mastery criterion on acquisition. Sessions of each condition were alternated until all assigned stimuli were mastered in one condition. Thereafter, sessions of the remaining condition were conducted in succession until all stimuli in the remaining condition were mastered.
Table 2
Overtraining Trials in the Set Condition

Experiment 1

| Overtraining Trials | Omar | Josh | Tim | Connor | Billy |
|---|---|---|---|---|---|
| Set 1 (%) | 3 (7%) | 18 (29%) | 30 (37%) | 21 (33%) | 6 (17%) |
| Set 2 (%) | 3 (8%) | 40 (56%) | 12 (19%) | 18 (40%) | 6 (17%) |
| Set 3 (%) | 0 | 63 (44%) | 12 (22%) | 12 (22%) | 12 (27%) |
| Set 4 (%) | 9 (20%) | 45 (42%) | 6 (6%) | 30 (48%) | 0 |
| Set 5 (%) | 9 (20%) | 27 (38%) | 21 (29%) | 3 (11%) | 6 (17%) |
| Total (%) | 24 (1%) | 273 (55%) | 81 (30%) | 84 (44%) | 30 (20%) |
| Total Targets Mastered | 15 | 15 | 15 | 15 | 15 |
| Average Overtraining Trials per Target | 1.6 | 18.2 | 5.4 | 5.6 | 2 |

Experiment 2

| Overtraining Trials | Omar | Josh |
|---|---|---|
| Set 1 (%) | 45 (50%) | 30 (56%) |
| Set 2 (%) | 9 (25%) | 3 (8%) |
| Set 3 (%) | 3 (11%) | 24 (38%) |
| Set 4 (%) | 3 (11%) | 6 (83%) |
| Set 5 (%) | 3 (11%) | 24 (20%) |
| Total (%) | 63 (35%) | 54 (27%) |
| Total Targets Mastered | 15 | 15 |
| Average Overtraining Trials per Target | 4.2 | 6 |
Identification of Stimuli and Baseline
A baseline assessment was conducted to select target stimuli for inclusion in each condition and to verify similar levels of incorrect responding to each stimulus prior to teaching (Wong et al., 2021; Wong & Fienup, 2022). Stimuli included in the baseline assessment were selected based on participants’ individualized intervention goals (e.g., adjective–noun tacts). The experimenter presented a stimulus approximately 1 m from the participant’s face and asked, “What is it?” Participants had 5 s to respond, and no differential consequences were provided for responses. A minimum of three presentations of each stimulus were conducted, and stimuli were alternated across trials in sessions. The experimenter presented a mastered task trial after approximately every two trials, and correct independent responses to mastered tasks resulted in 20-s access to a tangible item or a small piece of a preferred edible for all participants except Connor. Connor received a point for each correct independent response that could be exchanged immediately or accumulated and exchanged at the end of the session.
Stimuli to which participants emitted correct independent responses were omitted from the investigation. The experimenters used a logical analysis to equate targets in each set (Cariveau et al., 2021; Wolery et al., 2018) based on visual similarity, overlapping sounds, and number of syllables. Thirty targets were identified per participant, and each condition consisted of 15 targets. Targets were grouped into sets of three in the set condition.
Procedure
One to two sessions of each condition were conducted per day, 3 to 5 days per week. Sessions included nine trials with three targets presented three times each. Targets were presented in a block (e.g., each of the three targets was presented once) before presenting the targets again in a different order. Trials were presented exactly as in baseline, except a vocal model prompt was delivered if 5 s elapsed without a correct independent response. Both correct independent and prompted responses resulted in praise and 20-s access to a tangible item, small piece of a preferred edible, or a point (Connor).
An interspersed mastered tact error-correction procedure was implemented with all participants (Plaisance et al., 2016). Following an incorrect response, the experimenter provided a vocal model of the correct response, provided brief praise following a prompted correct response, and presented a mastered-tact trial using the same 5-s prompt-delay procedure used to teach study targets. Once the participant engaged in a correct response to the mastered tact, the experimenter re-presented the target stimulus for the trial. This procedure was repeated until a correct independent response was emitted, or five error-correction trials occurred without a correct independent response.
Condition Comparison
Set Condition. The purpose of this condition was to evaluate acquisition of tacts or intraverbals when a mastery criterion was applied to a set of targets. Stimuli presented in this condition were organized into five sets of three targets (see supporting information for all participants’ stimulus sets). Mastery was met for the set when the participant engaged in 100% correct independent responses across two consecutive sessions. The mastered set of three targets was then replaced with a novel set of three targets for which instruction was introduced. Sessions were conducted until all five sets were mastered.
Target Condition. The purpose of this condition was to evaluate acquisition of tacts or intraverbals when a mastery criterion was applied to individual targets. Mastery was met for this condition when the participant engaged in 100% correct independent responses to one target across two consecutive sessions. The mastered target was then replaced with a novel target and instruction continued. When nearly all the targets were mastered in this condition, a mastered tact or intraverbal from a different program (i.e., none of the 30 targets included in the investigation) was included in the session to ensure that nine trials were conducted during every session. For example, if the participant had mastered 13 of 15 tact targets, and two untrained tact targets remained, a mastered tact not included as a target in either condition was presented along with the two remaining untrained targets during instruction in the nine-trial session.
Maintenance
The purpose of maintenance was to assess if the mastery criterion assigned to each condition resulted in differences in correct responding over time. Maintenance sessions were conducted at 1-, 3-, and 5-week intervals following the day on which the targets were mastered. Once a set of targets in the set condition or an individual target in the target condition was mastered, the set or target was moved to the maintenance phase. Participants did not receive any additional exposure to, nor training of targets outside of maintenance sessions. Maintenance session procedures were like teaching (i.e., correct responses produced reinforcers), except error correction was excluded (i.e., no consequences occurred following an error) and unrelated mastered tasks (tacts or intraverbals not included in either condition) were interspersed approximately every two trials. Correct responses to unrelated mastered tasks resulted in praise and a tangible or point.
Results and Discussion
Participants’ cumulative data across both conditions are presented in Figure 1. Three participants (Josh, Tim, and Connor) acquired targets in fewer teaching sessions in the target condition than in the set condition, whereas differences across conditions were minimal (i.e., difference of ≤ 2 sessions to mastery across conditions) for two participants (Omar and Billy). Josh required 30, Tim required 21, and Connor required eight additional sessions of instruction to reach mastery in the set condition in comparison to the target condition. Omar’s and Billy’s results showed minimal differences between conditions, and they required only one or two additional session(s) of instruction, respectively, to acquire all targets in the set condition in comparison to the target condition. These data show tact and intraverbal responses were acquired in fewer teaching sessions by three participants when a mastery criterion was applied to targets rather than sets.
We observed two patterns in outcomes across conditions for participants, and participants were grouped based on these observed similarities. Figures 2 and 3 show representative data for one participant from each group. Omar (whose data are like Connor’s and Billy’s) showed minimal differences in acquisition of targets assigned to the set mastery criterion (Figure 2). That is, applying a mastery criterion to sets of stimuli did not result in extended exposure to teaching for any targets. In contrast, Josh (whose data are like Tim’s) showed a pattern of extended exposure to teaching for certain targets within stimulus sets (Figure 3). For example, in Set 2, Josh’s correct responding reached 100% for targets 1 and 2 in three teaching sessions. However, increases in correct responding were delayed and correct responding was variable for target 3 in the set. Due to delayed acquisition of target 3, the two other targets (targets 1 and 2) were exposed to an additional 20 sessions of teaching after reaching two consecutive sessions at 100%. Similarly, for Set 3, target 1 reached 100% correct responding in three sessions, and target 3 reached 100% correct responding in eight sessions. Delays in acquisition of target 2 resulted in eight additional sessions of teaching before the entire set met the mastery criterion. This same pattern of delayed mastery based on extended training of one target was observed for all of Josh’s sets.
Table 2 shows overtraining trials in the set condition for each stimulus set and participant. Recall that overtraining trials are additional instruction on individual targets that reached the mastery criterion before the entire stimulus set was mastered. The set condition produced an average of 1.6 to 18.2 overtraining trials per target. This resulted in the allocation of 1% to 55% of instructional time spent teaching targets that were already mastered (individually) due to applying mastery criteria to a set rather than individual targets.

Figure 1: Cumulative Number of Tacts or Intraverbals Mastered Across Conditions in Experiment 1
Figure 4 shows maintenance data at 1-, 3-, and 5-week intervals following mastery for each participant. All participants except Tim demonstrated maintenance of 75% or more of stimuli for both conditions with one exception. In week 5, Josh’s maintenance in the target condition decreased to 7 out of 15 targets, suggesting that less exposure to teaching based on a target mastery criterion could have hindered his long-term maintenance of responding correctly to these targets. An additional phase of maintenance was introduced for Tim due to a pattern of no responses during maintenance probes. In the initial maintenance sessions, Tim responded correctly to the beginning trials of each session. Partway through each session, Tim engaged in a no response and contacted the absence of prompts. Thereafter, he did not respond during the 5-s response interval in the remaining trials and frequently whispered the correct response after the experimenter removed the stimulus at the end of each trial. The modified maintenance phase included prompts so that a response was required for every trial. Following this modification, Tim’s correct independent responding increased to levels that were similar across conditions and consistent with other participants’ maintenance data.
Although the results of Experiment 1 show the benefits of implementing a target mastery criterion to maximize learning opportunities in client programming for three of the five participants, it is possible that these benefits may lead to inaccurate outcomes in certain situations. For example, when teaching AVCD, participants demonstrate mastery of targeted discriminations when they select the target only when the corresponding auditory stimulus is presented and not when other auditory stimuli are presented. Thus, a set mastery criterion may be necessary during some types of training (e.g., AVCD) to accurately measure acquisition. The purpose of the second experiment was to evaluate mastery criteria applied to sets versus targets when teaching AVCD and measure whether false positives occurred.

Figure 2: Percentage of Correct Independent Responses Per Target for Sets 1-5 in the Set Condition for Omar in Experiment 1
Experiment 2
Method
Participants, Setting, Materials, and Experimental Design
Omar and Josh participated in Experiment 2 since they each had intervention goals related to AVCD training. Materials included laminated stimulus cards, data collection materials, and preferred items provided during teaching. The experimental design was identical to that in Experiment 1.
Response Measurement
Data were collected on correct independent responses, errors, and no responses. The primary dependent variable was correct independent responses defined as the participant touching the comparison stimulus that corresponded to the auditory sample stimulus during the 6-s response interval. An error was defined as the participant touching a comparison stimulus that did not correspond to the auditory sample or touching more than one comparison stimulus. No response was defined as the participant refraining from touching any stimulus during the response interval. Correct independent responses were converted to a percentage by dividing the number of trials with a correct independent response by the total number of trials and multiplying by 100.
Data also were collected on the occurrence of a response to a stimulus in each trial of the session. Each stimulus served as an S+ on three trials and an S- on six trials per session. If the participant selected the picture of a dog in five of the nine trials, then this stimulus was scored as being selected in five trials. These data were used to calculate the frequency of overselecting a stimulus in trials during the last two sessions in which the target met the mastery criterion. Overselecting stimuli was only calculated in the target condition; overselecting stimuli was not possible when a set reached mastery because participants had to respond correctly to all targets in the set during 100% of trials. Overselecting a stimulus was defined as selecting a stimulus in more trials than it was programmed as an S+ (i.e., selecting the stimulus on more than three trials). We calculated the percentage of targets for which overselection occurred by dividing the number of targets that were overselected during the final two sessions of training by the total number of stimuli in the condition (i.e., 15) and multiplying by 100. Any occurrence of overselecting stimuli during the two sessions in which the target was mastered was identified as a false positive for acquisition.
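The overselection screen described above can be sketched as a small check. This is a hypothetical illustration (the function name and example counts are ours); it reflects the session structure in the text, in which each stimulus is the S+ on three of nine trials.

```python
# Sketch of the overselection screen (hypothetical helper, not the authors'
# software). Each stimulus serves as the S+ on three of the nine trials per
# session, so selecting a stimulus on more than three trials in a session
# indicates overselection, i.e., a possible false positive for acquisition.
S_PLUS_TRIALS_PER_SESSION = 3

def overselected(selection_counts):
    """selection_counts: number of trials on which the participant selected
    the stimulus, one count per mastery session (the final two sessions)."""
    return any(count > S_PLUS_TRIALS_PER_SESSION for count in selection_counts)

# A learner who touched the "dog" picture on five of nine trials in one
# mastery session overselected that stimulus:
flagged = overselected([5, 3])
```

The percentage of targets with overselection would then be the number of flagged targets divided by the 15 stimuli in the condition, multiplied by 100.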

Figure 3: Percentage of Correct Independent Responses Per Target for Sets 1-5 in the Set Condition for Josh in Experiment 1

Figure 4: Number of Targets Maintained across Weeks in Experiment 1
Note. Asterisks denote weeks in which data for specific targets were not collected due to extraneous circumstances such as appointment cancellation (e.g., quarantines, absences).
Interobserver Agreement and Treatment Integrity
Interobserver agreement (IOA) was calculated using the trial-by-trial method exactly as in Experiment 1 (Table 2). Treatment-integrity (TI) data also were calculated on a trial-by-trial basis for a minimum of 47% of sessions per participant. The trained observer scored whether the experimenter secured the participant’s visual attending to the stimulus array, presented the correct auditory sample stimulus, repeated the sample every 2 s, presented prompts at the appropriate prompt delay (i.e., 6 s), delivered the identity-matching picture prompt (when necessary), provided praise and tangible reinforcers at the correct times, and did not add treatment components. The trial was scored as correct (1) if the experimenter conducted all components of the trial according to the experimental protocol; deviations from the protocol resulted in a score of 0 for the trial. All other procedures were identical to those in Experiment 1.
Treatment-integrity data also were collected on the implementation of the target and set mastery criteria in a manner identical to Experiment 1. Mean TI for correct implementation of the mastery criteria across participants was 100% in both the target and set conditions.
Identification of Stimuli and Baseline
A baseline assessment was conducted to select target stimuli for inclusion in each condition and to verify similar levels of incorrect responding to each stimulus prior to teaching. The experimenter presented a three-stimulus horizontal array placed approximately 15 cm in front of the participant on the table. The experimenter ensured attending to the array, presented the auditory sample stimulus, and repeated it every 2 s (e.g., “cup,” “cup,” “cup” with 2 s between presentations;
Bergmann et al., 2021). Due to the timing of the sample re-presentation, participants had 6 s to engage in a response. No differential consequences were provided for responding. Stimuli were alternated across baseline trials so that each stimulus was targeted as the correct response on some trials and served as an incorrect comparison stimulus on other trials. The experimenter presented a mastered task approximately every two trials, and correct independent responses to mastered tasks resulted in praise and 20-s access to a tangible or a small edible. Inclusion criteria and procedures for the assignment of stimuli to conditions were identical to Experiment 1.
Procedure
One to two sessions of each condition were conducted per day, 3 to 5 days per week. Sessions included nine trials with three stimuli presented three times each. Stimuli were presented in a block exactly as in Experiment 1, except that the locations of the correct (S+) and incorrect (S-) comparison stimuli within the three-stimulus array were also counterbalanced (i.e., the S+ occurred in each position once per session per target).
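One way to program this counterbalancing is sketched below. This is our own illustration under stated assumptions (three stimuli, nine trials, S+ in each array position once per target); it is not the scheduling procedure the experimenters actually used beyond what the text describes.

```python
import random

def build_session(stimuli):
    """Build a nine-trial block: each stimulus is the S+ three times, once with
    the S+ in each position of the three-stimulus array. (Hypothetical sketch.)"""
    trials = []
    for target in stimuli:
        for position in range(3):  # S+ occupies each array position once per target
            array = [None, None, None]
            array[position] = target
            distractors = [s for s in stimuli if s != target]
            random.shuffle(distractors)
            for i in range(3):
                if array[i] is None:
                    array[i] = distractors.pop()
            trials.append({"sample": target, "array": array})
    random.shuffle(trials)  # intermix targets across the session
    return trials

session = build_session(["cup", "dog", "hat"])
print(len(session))  # 9 trials
```

Shuffling the assembled block preserves the positional counterbalance while preventing a predictable trial order.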
Trials were presented exactly as in baseline, except the experimenter implemented error correction by providing an identity-matching picture prompt (IMPP; Fisher et al., 2007) if the participant did not engage in a correct independent response. During the IMPP, the experimenter held up a visual stimulus that matched the S+ in the array, re-presented the auditory sample (e.g., “cup”), and repeated the auditory sample every 2 s during the 6-s response interval. If an incorrect or no response occurred following the IMPP, the experimenter repeated the auditory sample and physically guided the correct response. Following a prompted correct response, the trial was re-presented to provide an opportunity for a correct independent response (data not included in figures). Error correction continued until the participant engaged in a correct independent response or five error-correction trials occurred. Correct independent and prompted responses resulted in praise and 20-s access to a tangible or a small edible.
The mastery criteria for set and target conditions were identical to those in Experiment 1. All maintenance procedures were identical to Experiment 1 (i.e., procedures were like AVCD teaching without error correction).
Results and Discussion
Figure 5 shows participants’ cumulative data across both conditions. Omar and Josh both required fewer sessions to mastery for stimuli in the target condition in comparison to the set condition. Omar required seven additional teaching sessions to acquire targets in the set condition. Josh required 11 additional teaching sessions to acquire targets in the set condition. These results partially replicate those from Experiment 1 with a different skill, although Omar showed minimal difference in sessions to mastery across conditions in Experiment 1. Thus, the mastery criteria applied to targets versus sets when teaching AVCDs in Experiment 2 resulted in larger differences in sessions to mastery across conditions for Omar.
Both participants acquired AVCD responses to stimuli in the target condition in two to six sessions (black bars; Figure 6). White squares plotted on a secondary y-axis show the percentage of overselecting responses per target. These data are representative of the last two sessions (i.e., sessions in which mastery was met). Stimuli should have been selected in 33% of the trials when reaching mastery. Omar and Josh engaged in overselecting responses to three targets (20% of targets). Although the percentage of overselecting responses per target was never at or near 100%, any overselecting responses could be considered problematic because they represent continued errors despite identification of the target as mastered. Thus, 20% of AVCD stimuli in the target condition produced a false positive for acquisition for both participants. Selection responses for each target were also recorded in the set condition (Figures 7 and 8). However, use of a set mastery criterion eliminated instances of overselecting responses during mastery because the participant could not engage in an incorrect response and reach the 100% mastery criterion for the set.
Omar required the most sessions of teaching for set 1 (Figure 8) due to ongoing errors to targets 1 and 3. He required 10 sessions of teaching to meet mastery for set 1. In comparison, Omar’s responding reached the mastery criterion for sets 2-5 in three or four teaching sessions. Josh engaged in overselecting responses to target 1 that delayed mastery of set 3. If the mastery criterion had been applied to a target rather than the set, target 1 would have been considered mastered despite selection of target 1 in five out of nine trials (rather than the programmed three trials in which target 1 was the S+). However, teaching continued for set 3 and overselecting of target 1 decreased, and Josh’s responding met the mastery criterion for set 3 in seven teaching sessions. Josh’s responding met mastery in the other teaching sets in four to six teaching sessions.
Table 3 shows overtraining trials for the set condition for both participants. Omar had an average of four overtraining trials per target, and Josh had an average of six overtraining trials per target. As a result, 35% and 27% of Omar’s and Josh’s teaching time, respectively, was allocated to overtraining trials.
Omar and Josh maintained targets in both conditions across all weeks with one exception (Figure 9). Omar responded incorrectly to one target in the set condition at week 1; however, he responded correctly to this target in subsequent weeks. These results partially replicate maintenance outcomes of Experiment 1, replicate maintenance data in Wong and Fienup (2022), and further suggest that maintenance of responding to targets was similar across conditions.

Figure 5: Cumulative Number of AVCD Targets Mastered Across Conditions in Experiment 2

Figure 6: Number of Sessions to Mastery and the Percentage of Selection Responses Per Stimulus in the Target Condition for Omar and Josh in Experiment 2

Figure 7: Percentage of Correct Independent Responses and Selection Responses Per Stimulus in the Set Condition for Omar in Experiment 2

Figure 8: Percentage of Correct Independent Responses and Selection Responses Per Stimulus in the Set Condition for Josh in Experiment 2

Figure 9: Number of Targets Maintained across Weeks in Experiment 2
General Discussion
A mastery criterion applied to targets resulted in fewer teaching sessions for tacts and intraverbals for three of five participants in Experiment 1 and AVCD targets for both participants in Experiment 2. These results replicate Wong et al. (2021) and Wong and Fienup (2022) with neurodivergent participants and novel skills. In Experiment 1, we observed patterns of responding similar to previous studies in that three participants acquired tacts or intraverbals in fewer sessions in the target condition in comparison to the set condition. Mastery in the set condition was often delayed due to one target remaining in training after the other targets in the set were mastered, as in previous studies (Wong et al., 2021; Wong & Fienup, 2022).
We observed a similar pattern of delayed acquisition for the set condition in Experiment 2, extending the findings of Wong et al. (2021) and Wong and Fienup (2022) to instruction on listener skills. Omar and Josh required an additional seven and 11 sessions of instruction, respectively, in the set condition in comparison to the target condition in Experiment 2 (AVCD training). Interestingly, there were minimal delays in acquisition during AVCD training due to the consistent selection of one target in the set condition (i.e., selection response per stimulus graphs, Figures 7 and 8). Although some learners with ASD may display consistent biases to specific stimuli (e.g., Kodak et al., 2015), neither participant showed exclusive responding to one stimulus in any of the AVCD teaching sessions across sets. Nevertheless, overselection occurred to 20% of targets in the target condition at the point of mastery (Figure 6). Thus, selecting a target more often than it was programmed for instruction during trials occurred for 20% of targets during the two sessions leading to mastery, which resulted in false positives for acquisition in the target condition.
Overselection of a stimulus in the array can be problematic when a target mastery criterion is used because the mastery criterion can be met despite the participant engaging in biased responding to a stimulus in the array (Grow et al., 2011). For example, if presented with pictures of a Boston Terrier, Weimaraner, and Dachshund, the participant’s responding may meet mastery for the Boston Terrier if that stimulus is selected on most or all trials in the session. This pattern of responding would incorrectly identify mastery of the Boston Terrier during AVCD training, which would result in a false positive for acquisition. This same pattern of biased responding also could occur during tact training if the participant engages in the same response during all tact training trials (e.g., the participant says, “Boston Terrier” on all trials). Although we did not evaluate this pattern of biased responding during tact and intraverbal training in Experiment 1, the overselection results during AVCD instruction in Experiment 2 (i.e., false acquisition for 20% of stimuli in the target condition) suggest this behavior should be measured if a target mastery criterion is applied to instruction. Further, because Wong et al. (2021) and Wong and Fienup (2022) did not report overselection during sight word instruction, it is unclear whether and to what extent persistent patterns of biased responding and false positives occurred in previous studies. Future research should further investigate the likelihood of overselection responses that can lead to false positives across tasks when a target mastery criterion is applied during skill-acquisition programs.
If a target mastery criterion is applied to training, practitioners could revise the target mastery criterion used in the current and previous studies to prevent overselection from leading to false positives for acquisition. The target mastery criteria could include criteria related to discriminated responding across targets in addition to correct responses to the target. For example, during AVCD training with a target mastery criterion, practitioners could require learners to respond correctly to the stimulus (e.g., the picture of the hippo) on all trials in which that stimulus is targeted (i.e., when the instructor says, “hippo”) and not respond to the stimulus on trials in which other stimuli are targeted (e.g., when the instructor says, “alligator” or “flamingo”; when the picture of a hippo is an S-). If these revised criteria were applied in the present investigation, participant responding in Experiment 2 would have met the mastery criteria when the participant (1) engaged in 100% correct independent responses to the target and (2) did not touch the target when it was an S- on 100% of trials across two consecutive sessions.
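As a concrete illustration, the revised criterion could be checked with a routine like the following. This is a sketch under our own assumed data format (per-trial dicts with the S+ and the selected stimulus); it is not software from the study.

```python
def meets_revised_mastery(sessions, target):
    """True if, across the two most recent consecutive sessions, the learner
    (1) responded correctly on 100% of trials in which `target` was the S+ and
    (2) never touched `target` on trials in which it was an S-.
    (Hypothetical sketch of the revised criterion.)"""
    if len(sessions) < 2:
        return False
    for trials in sessions[-2:]:
        for t in trials:
            if t["sample"] == target and t["selected"] != target:
                return False  # error on an S+ trial
            if t["sample"] != target and t["selected"] == target:
                return False  # overselection: touched target when it was an S-
    return True

# Two error-free sessions vs. a session ending with an overselection of "cup".
perfect = [{"sample": s, "selected": s} for s in ["cup", "dog", "hat"] * 3]
biased = perfect[:-1] + [{"sample": "hat", "selected": "cup"}]

print(meets_revised_mastery([perfect, perfect], "cup"))  # True
print(meets_revised_mastery([perfect, biased], "cup"))   # False
```

Under this check, a target selected on S- trials cannot be counted as mastered, which removes the false-positive pathway described above.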
Although false positives for acquisition were observed for 20% of stimuli in the target condition in Experiment 2, correct responding to those stimuli maintained up to 5 weeks after the end of training. One of those stimuli was associated with one incorrect response in week 1 of maintenance (i.e., the participant responded to that stimulus when it was an S-); thereafter, that stimulus was not associated with incorrect responses. Thus, identification of these stimuli as being associated with false positives for acquisition did not lead to persistent errors following training. Altering the criteria used to identify stimuli associated with false positives for acquisition may be necessary in future research and practice to avoid this issue.
As in Wong and Fienup (2022), a mastery criterion of 100% correct responses across two consecutive sessions was applied to the set and target conditions in this investigation, whereas Wong et al. (2021) implemented a mastery criterion of one session at 100%. The current findings support previous research suggesting that more stringent mastery criteria lead to higher levels of response maintenance (Fuller & Fienup, 2018; Wong & Fienup, 2022). More stringent mastery criteria likely increase the number of pairings of reinforcers with responses in the presence of relevant discriminative stimuli. Further, more stringent mastery criteria applied to stimulus sets might lead to overtraining because already mastered targets continue to be practiced and reinforced while the remaining target(s) in the set reach mastery. Overtraining can strengthen stimulus classes (Bortoloti et al., 2013), but there are limited evaluations of the effects of overtraining on maintenance of skills (McDougale et al., 2020). Wong et al. observed somewhat lower levels of maintenance in the target condition relative to the set condition (which included overtraining) for two of their four participants when using less stringent mastery criteria, whereas the results of Wong and Fienup and the present study suggest more stringent mastery criteria led to comparable maintenance across target and set conditions. Thus, any benefits of overtraining that occur in the set condition may be reduced when more stringent mastery criteria are applied to instruction. However, future research is needed to further evaluate the effects of more stringent mastery criteria on maintenance (Fienup & Carr, 2021).
Results of two participants in Experiment 1 differ from those of Wong et al. (2021) and Wong and Fienup (2022). Omar and Billy had minimal differences (i.e., no more than two sessions across conditions) in sessions to mastery of targets, suggesting either set or target mastery criteria can be used during tact instruction. Wong and Fienup found large differences in sessions to mastery across conditions, although these differences were typically produced by one or more targets delaying acquisition in the set condition. For Billy and Omar, there were no specific targets in the set condition that required extended training, which resulted in rapid mastery of sets of stimuli. In comparison, consistent errors (i.e., engaging in the same incorrect response across trials; Scott et al., 2021) were a pattern of responding that delayed other participants’ acquisition of specific targets in the set condition. For example, Josh had a consistent error that delayed acquisition in set 2 of the set condition. Consistent errors during instruction can delay acquisition when stimuli are grouped into sets because one target may require many more exposures to prompts of the alternative response while the other targets in the set are quickly mastered. Although we used a common and empirically based method to select targets and assign them to sets in the set condition (i.e., logical analysis; Wolery et al., 2018), consistent errors to stimuli during prestudy probes are not a variable included in this method. Researchers and practitioners who plan to teach stimuli in sets could collect data on consistent errors to specific stimuli during probes and consider either excluding those stimuli from any planned comparison or using target mastery criteria to prevent overtraining of other targets in the set.
In previous experiments, participants learned textual responses to sight words, whereas the current investigation replicated these outcomes with tacts or intraverbals (Experiment 1) and AVCDs (Experiment 2). Taken together, the current and previous studies suggest a mastery criterion applied to individual targets may increase the efficiency of instruction (by reducing sessions to mastery) for these three skills for some neurodivergent learners. Nevertheless, the ease with which the target mastery criterion can be implemented by practitioners is an important consideration. Tracking mastery and replacing mastered targets in practice would likely need to be done by Registered Behavior Technicians® (RBTs®) and behavioral technicians, which could be a cumbersome task requiring training to ensure integrity (Brand et al., 2019). In the current investigation, researchers closely monitored acquisition of targets and sets daily. In practice, daily oversight by supervisors may be less likely. To increase the feasibility of implementation, practices may need to be modified to help RBTs® and behavioral technicians identify the point of mastery of targets. Strategies to increase the feasibility and integrity of implementation of target mastery criteria are an important topic for future research.
References
Bergmann, S., Turner, M., Kodak, T., Grow, L. L., Meyerhofer, C., Niland, H. S., & Edmonds, K. (2021). Replicating stimulus-presentation orders in discrimination training. Journal of Applied Behavior Analysis, 54(2), 793–812. https://doi.org/10.1002/jaba.797
Bortoloti, R., Rodrigues, N. C., Cortez, M. D., Pimentel, N., & de Rose, J. C. (2013). Overtraining increases the strength of equivalence relations. Psychology and Neuroscience, 6(3), 357–364. https://doi.org/10.3922/j.psns.2013.3.13
Brand, D., Henley, A. J., DiGennaro Reed, F. D., Gray, E., & Crabbs, B. (2019). A review of published studies involving parametric manipulations of treatment integrity. Journal of Behavioral Education, 28(1), 1–26. https://doi.org/10.1007/s10864-018-09311-8
Cariveau, T., Batchelder, S., Ball, S., & La Cruz Montilla, A. (2021). Review of methods to equate target sets in the adapted alternating treatments design. Behavior Modification, 45(5), 695–714. https://doi.org/10.1177/0145445520903049
Cariveau, T., & Fetzner, D. (2022). Experimental control in the adapted alternating treatments design: A review of procedures and outcomes. Behavioral Interventions. Advance online publication. https://doi.org/10.1002/bin.1865
Carr, J. E., Nicolson, A. C., & Higbee, T. S. (2000). Evaluation of a brief multiple-stimulus preference assessment in a naturalistic context. Journal of Applied Behavior Analysis, 33(3), 353–357. https://doi.org/10.1901/jaba.2000.33-353
Fienup, D. M., & Carr, J. E. (2021). The use of performance criteria for determining “mastery” in discrete-trial instruction: A call for research. Behavioral Interventions, 36, 756–763. https://doi.org/10.1002/bin.1827
Fisher, W., Piazza, C. C., Bowman, L. G., Hagopian, L. P., Owens, J. C., & Slevin, I. (1992). A comparison of two approaches for identifying reinforcers for persons with severe and profound disabilities. Journal of Applied Behavior Analysis, 25(2), 491–498. https://doi.org/10.1901/jaba.1992.25-491
Fisher, W. W., Kodak, T., & Moore, J. W. (2007). Embedding an identity-matching task within a prompting hierarchy to facilitate acquisition of conditional discriminations in children with autism. Journal of Applied Behavior Analysis, 40(3), 489–499. https://doi.org/10.1901/jaba.2007.40-489
Fuller, J. L., & Fienup, D. M. (2018). A preliminary analysis of mastery criterion level: Effects on response maintenance. Behavior Analysis in Practice, 11(1), 1–8. https://doi.org/10.1007/s40617-017-0201-0
Grow, L. L., Carr, J. E., Kodak, T. M., Jostad, C. M., & Kisamore, A. N. (2011). A comparison of methods for teaching receptive labeling to children with autism spectrum disorders. Journal of Applied Behavior Analysis, 44(3), 475–498. https://doi.org/10.1901/jaba.2011.44-475
Knutson, S., Kodak, T., Costello, D. R., & Cliett, T. (2019). Comparison of task interspersal ratios on skill acquisition and problem behavior for children with autism spectrum disorder. Journal of Applied Behavior Analysis, 52(2), 355–369. https://doi.org/10.1002/jaba.527
Kodak, T., Clements, A., Paden, A. R., LeBlanc, B., Mintz, J., & Toussaint, K. A. (2015). Examination of the relation between an assessment of skills and performance on auditory-visual conditional discriminations for children with autism spectrum disorder. Journal of Applied Behavior Analysis, 48(1), 52–70. https://doi.org/10.1002/jaba.160
Kodak, T., Halbur, M., Bergmann, S., Costello, D. R., Benitez, B., Olsen, M., Gorgan, E., & Cliett, T. (2020). A comparison of stimulus set size on tact training for children with autism spectrum disorder. Journal of Applied Behavior Analysis, 53(1), 265–283. https://doi.org/10.1002/jaba.553
Lovaas, O. I. (1981). Teaching developmentally disabled children: The me book. University Park Press.
Maurice, C., Green, G., & Luce, S. C. (1996). Behavioral interventions for young children with autism: A manual for parents and professionals. PRO-ED.
McDougale, C. B., Richling, S. M., Longino, E. B., & O’Rourke, S. A. (2020). Mastery criteria and maintenance: A descriptive analysis of applied research procedures. Behavior Analysis in Practice, 13(2), 402–410. https://doi.org/10.1007/s40617-019-00365-2
Plaisance, L., Lerman, D., Laudont, C., & Wu, W. (2016). Inserting mastered targets during error-correction when teaching skills to children with autism. Journal of Applied Behavior Analysis, 49(2), 251–264. https://doi.org/10.1002/jaba.292
Richling, S. M., Williams, W. L., & Carr, J. E. (2019). The effects of different mastery criteria on the skill maintenance of children with developmental disabilities. Journal of Applied Behavior Analysis, 52(3), 707–717. https://doi.org/10.1002/jaba.580
Scott, A. P., Kodak, T., & Cordeiro, M. C. (2021). Do targets with persistent responses affect the efficiency of instruction? Analysis of Verbal Behavior, 37, 217–225. https://doi.org/10.1007/s40616-021-00163-4
Sindelar, P. (1985). An adapted alternating treatments design for instructional research. Education and Treatment of Children, 8(1), 67–76.
Sundberg, M. L. (2008). Verbal behavior milestones assessment and placement program: The VB-MAPP. AVB Press.
Wolery, M., Gast, D. L., & Ledford, J. R. (2018). Comparative designs. In J. R. Ledford & D. L. Gast (Eds.), Single case research methodology. Routledge. https://doi.org/10.4324/9781315150666-11
Wong, K. K., Bajwa, T., & Fienup, D. M. (2021). The application of mastery criterion to individual operants and the effects on acquisition and maintenance of responses. Journal of Behavioral Education. Advance online publication. https://doi.org/10.1007/s10864-020-09420-3
Wong, K. K., & Fienup, D. M. (2022). Units of analysis in acquisition-performance criteria for “mastery”: A systematic replication. Journal of Applied Behavior Analysis. Advance online publication. https://doi.org/10.1002/jaba.915
Received December 15, 2021
Final acceptance June 16, 2022
Action Editor, Daniel Fienup