The following is a summary of our publications, including journal papers, conference papers, book chapters, reviews, and volumes:
Jim Talley
Talley, J., & Walesiak, B. (2024). Facilitation of individualized L2 pronunciation assessment and training via novel teacher-assistive technology [Conference paper]. 15th annual Pronunciation in Second Language Learning and Teaching Conference. PSLLT 2024, Ames, United States.
The ideal of fully-individualized, dynamically-adjusted L2 pronunciation learning (Couper, 2022; Derwing and Munro, 2015) crucially depends on diagnostic assessments, where goal-setting and goal-focused activities are enabled by an initial needs assessment with adjustments based on periodic formative assessments. Yet Couper’s research shows “very few teachers use diagnostics for pronunciation” (2022, p. 180). Conducting diagnostic assessments, developing/adjusting plans based upon them, and communicating those results/plans is perceived as prohibitively time consuming. We discuss (and illustrate) how a new pronunciation teaching support system, Accent Explorer (AE), potentially helps remediate that time crunch. Whenever a recording made in AE’s student app is submitted to AE’s teacher app, it is extensively annotated with AI-detected, importance-ranked (supra-)segmental issues. The flagged issues are formulated as suggestions for the teacher which may be accepted, adjusted, augmented, or discarded. The teacher-arbitrated issues (plus any added notes) are returned to the student as targeted feedback which the student can fully explore (graphically and auditorily, in addition to textual explanations). The identified issues are also logged, allowing generation of (date range-constrained) summaries of pronunciation issues, supporting diagnostic assessment. Suggestions for optimizing pedagogical practices via this new technology’s assessment (and other) affordances will be offered.
Walesiak, B., & Talley, J. (2024). Assessment and feedback mechanisms in pronunciation and speech coaching apps [Conference paper]. 15th annual Pronunciation in Second Language Learning and Teaching Conference. PSLLT 2024, Ames, United States.
Recent and ongoing advances in language technologies (such as chatbots, Automatic Speech Recognition, Text-to-Speech, and voice conversion) present exciting opportunities for L2 pronunciation improvement (Gottardi et al., 2022) and potentially provide precise and varied pronunciation feedback (Levis and Rehman, 2022). This research investigates the assessment and feedback mechanisms employed in a wide range of pronunciation and speech coaching (PSC) apps, extending prior work on the affordances of Mobile-Assisted Pronunciation Training (Walesiak, 2021). Its point of departure is a qualitative examination of over 50 web and/or Android apps (some of which are AI-based), focusing on the type of pronunciation assessments they employ and on the quality of feedback they provide, including their feedback modalities (textual, visual, and/or auditory). We then systematically structure this information as a comprehensive matrix that categorises the affordances of the PSC apps with respect to their feedback type (Henrichsen, 2020), the hierarchical priority level (Kang and Hirschi, 2023), their incorporation of High Variability Phonetic Training, and attention to intelligibility. The examined PSC apps recruit a range of feedback strategies for varied assessment targets, yet we frequently observe feedback which is not particularly actionable (e.g., binary, scale-based, raw numeric graphs) and oftentimes directed at segmental targets. This picture which emerges from our systematic analysis highlights the fact that a number of research-indicated assessment targets and feedback strategies remain underrepresented in the current PSC app landscape. We discuss the implications and opportunities of this for educators, researchers, and developers alike.
Talley, Jim (2023). Identification of non-native English speakers’ L1s via patterns of prosodic feature deviance from native speaker norms. In R. Skarnitzl, & J. Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (pp. 2736–2740). Guarant International.
This study is an exploratory data science-based look at the question of whether the first language (L1) of non-native speakers of English can be identified from only a few simple syllable- and utterance-level prosodic features of their speech. Simple machine learning (ML) modeling on these loudness, pitch, and duration cues yields imperfect, but much better than chance, discrimination (1) between each individual L1 and General American English (GAE), and (2) between the five studied L1s. The described modeling is based upon “atypicality scores” (a-Scores) for the prosodic features, representing the degree to which features deviate, or not, from GAE native speaker norms. The prosodic features, their normalizations, and the a-Score characterizations are discussed. Finally, ML-based feature selection analysis examines the individual prosodic features’ relative importance for the individual L1 vs. GAE discrimination tasks and for the 5-way, forced-choice L1 classification task.
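The atypicality-score idea described above can be sketched in a few lines. This is a minimal illustration only, assuming a z-score-style deviation measure and invented duration values; the paper's exact formulation may differ:

```python
import statistics

def a_score(value, native_values):
    """Atypicality of one prosodic feature value relative to
    native-speaker norms, sketched here as an absolute z-score."""
    mu = statistics.mean(native_values)
    sigma = statistics.stdev(native_values)
    return abs(value - mu) / sigma

# Hypothetical syllable-duration values (seconds) from GAE speakers.
native_durations = [0.18, 0.20, 0.22, 0.19, 0.21, 0.20]

print(a_score(0.20, native_durations))  # near the norm
print(a_score(0.31, native_durations))  # clearly atypical
```

A vector of such scores, one per prosodic feature, could then feed any standard classifier for the L1-discrimination tasks.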
Talley, J. (2022). Non-native prosodic deviations from American English norms and their implications for accentedness: The Case of Polish L1 [Conference paper]. 15th International Conference on Native and Non-Native Accents of English Accents 2022, Łódź, Poland.
Across the world’s languages, we find four primary acoustic means of signaling prominence – duration (D), pitch (P), loudness (L), and (non-)reduction of vowels. English, somewhat exceptionally, utilizes all four (Chrabaszcz, et al., 2014) in trading relations (Howell, 1993), potentially making mastery of English prosody challenging for ESL/EFL learners.
In this study, intuitively accessible features representing the three suprasegmental (D/P/L) prosodic signals are extracted from two subsets of the Speech Accent Archive (Weinberger, 2015) – Polish speakers of English (PE) and native General American English (GAE) speakers. Those descriptive features are used to examine ways in which PE prosody deviates from that of GAE speakers, and how predictive those deviations are with respect to the perceived accentedness of the PE sentences.
Research on prosody frequently uses (highly technical) descriptive features – e.g., the 1000’s in the openSMILE suite (Coutinho, et al., 2016). But, for this study, a small set of simple features was chosen instead, given that eventual utilization in ESL/EFL didactic contexts was a consideration. The selected descriptive features are both from whole sentences (pause-to-speech ratio/D, speaking rate/D, pitch dynamism/P, and loudness dynamism/L) and from syllable triplets (D/P/L center syllable values and left & right deltas, plus inter-syllable gaps/D). All of the features are automatically extracted, and normalized, after manual review/correction of word boundaries.
There are three steps in the basic analysis pipeline. First, the GAE data are used to construct a simple statistical model for each prosodic feature – estimating native speaker population norms. Those normative GAE models are, in turn, used to characterize the degree to which the observed PE feature values deviate from native-like productions.
Finally, the collected feature deviances from each PE sentence are analyzed with respect to the sentence’s mean accentedness score (as assessed on a 5-point scale by four GAE speakers).
Results show that the observed PE deviations from GAE prosodic norms generally correlate with judgments of accentedness. However, across the range of D/P/L prosodic features, there is substantial variation in the degree of correlation. This paper examines the relative strengths of association for the features (both individually and sub-grouped), especially with respect to their synergistic combination. It, additionally, gives consideration to (1) the features’ potential utility for predicting human accentedness judgments (e.g., in an automated assessment context) and (2) their potential yield as foci for accent mitigation (e.g., in an ESL/EFL context).
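The final analysis step above, relating per-sentence feature deviance to perceived accentedness, amounts to computing a correlation. A minimal sketch with invented values (the study's actual features and scores are not reproduced here):

```python
def pearson_r(xs, ys):
    """Pearson correlation, used here to relate per-sentence prosodic
    deviance to mean accentedness ratings (illustrative data only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx ** 0.5 * vy ** 0.5)

# Hypothetical values: mean feature deviance per PE sentence vs. its
# mean accentedness score on the 5-point scale described above.
deviance = [0.4, 1.1, 0.7, 1.8, 2.3, 0.9]
accentedness = [1.5, 2.8, 2.0, 3.9, 4.5, 2.4]

print(pearson_r(deviance, accentedness))
```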
Talley, J. (2016). What makes a Bostonian sound Bostonian and a Texan sound Texan? In J. Levis, H. Le, I. Lucic, E. Simpson, & S. Vo (Eds.), Proceedings of the 7th Pronunciation in Second Language Learning and Teaching Conference, ISSN 2380-9566, Dallas, TX, October 2015 (pp. 168-179). Ames, IA: Iowa State University. [pdf] https://www.iastatedigitalpress.com/psllt/article/15286/galley/13634/view/
This paper introduces a preliminary version of a new methodology for the automated, data-driven discovery of acoustic features of speech which potentially contribute to an accent’s distinctiveness. The results discussed herein, while merely illustrative at this stage, provide reason to be optimistic about the prospects of evolving a truly useful and robust automated methodology for cataloging the characteristic acoustic aspects of accented speech. If this line of research were to fully fulfill its promise, the resulting comprehensive catalog of features would contribute to our explicit knowledge of the correlates of accent. The knowledge represented by such a catalog could potentially be directly applied by teachers of second language pronunciation, and it certainly would inform the development of the more capable and individualized computer-assisted pronunciation training (CAPT) tools of the future.
Talley, J. (2006). Bootstrapping new language ASR capabilities: Achieving best letter-to-sound performance under resource constraints. LREC. http://www.lrec-conf.org/proceedings/lrec2006/pdf/436_pdf.pdf [pdf]
One of the most critical components in the process of building automatic speech recognition (ASR) capabilities for a new language is the lexicon, or pronouncing dictionary. For practical reasons, it is desirable to manually create only the minimal lexicon using available native-speaker phonetic expertise and, then, use the resulting seed lexicon for machine-learning-based induction of a high-quality letter-to-sound (L2S) model for generation of pronunciations for the remaining words of the language. This paper examines the viability of this scenario, specifically investigating three possible strategies for selection of lexemes (words) for manual transcription – choosing the most frequent lexemes of the language, choosing lexemes randomly, and selection of lexemes via an information theoretic diversity measure. The relative effectiveness of these three strategies is evaluated as a function of the number of lexemes to be transcribed to create a bootstrapping lexicon. Generally, the newly developed orthographic diversity based selection strategy outperforms the others for this scenario where a limited number of lexemes can be transcribed. The experiments also provide generally useful insight into expected L2S accuracy sacrifice as a function of decreasing training set size.
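A diversity-driven selection strategy of the general kind described above can be sketched as a greedy search. The coverage criterion here (unseen letter bigrams) is a stand-in chosen for illustration; the paper's actual information theoretic measure is not reproduced:

```python
def ngrams(word, n=2):
    """Set of letter n-grams in a word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def diversity_select(lexemes, k, n=2):
    """Greedy stand-in for an orthographic-diversity strategy:
    repeatedly pick the word adding the most unseen letter bigrams,
    so the seed lexicon covers varied spelling contexts."""
    covered, chosen = set(), []
    pool = list(lexemes)
    for _ in range(min(k, len(pool))):
        best = max(pool, key=lambda w: len(ngrams(w, n) - covered))
        chosen.append(best)
        covered |= ngrams(best, n)
        pool.remove(best)
    return chosen

words = ["cat", "cats", "dog", "catalog", "frog"]
print(diversity_select(words, 3))  # "catalog" is picked first
```

The frequency and random baselines in the paper would simply replace the `max(...)` step with frequency ranking or random choice.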
Lee, K-T., Melnar, L., Talley, J., & Wellekens, C. J. (2003). Symbolic speaker adaptation with phone inventory expansion. 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), I-I. https://doi.org/10.1109/ICASSP.2003.1198776 [pdf]
This paper further develops a previously proposed adaptation method for speech recognition called symbolic speaker adaptation (SSA). The basic idea of SSA is to model a speaker’s pronunciation as a blend of speech varieties (SVs) – regional dialects and foreign accents – for which the system has existing pronunciation models. The system determines during an adaptation process the relative applicability of those models, yielding a speech variety profile (SVP) for each speaker. Speaker-dependent lexica for recognition are determined from a speaker’s SVP. In this paper, we discuss a series of experiments designed to analyze how the SSA method is affected by SV-balanced training, expanded phone inventories, reduced amounts of adaptation data, and speech from SVs not modeled by the system. The most dramatic improvements were obtained by using expanded (“SV-inclusive”) phone inventories. SSA was also shown to be effective with a very small number of adaptation sentences. And, SSA’s SV blending scheme yields higher accuracy than using a SV classification scheme for speakers of novel (unseen) SVs.
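The SVP-weighted blending of speech-variety lexica can be sketched as follows. All varieties, weights, and phone-string pronunciations below are invented for illustration; they are not from the paper:

```python
def blend_pronunciations(svp, sv_lexica, word):
    """Blend speech-variety-specific pronunciation probabilities for a
    word using the speaker's speech variety profile (SVP) weights."""
    blended = {}
    for sv, weight in svp.items():
        for pron, p in sv_lexica[sv].get(word, {}).items():
            blended[pron] = blended.get(pron, 0.0) + weight * p
    return blended

# Hypothetical per-variety pronunciation distributions.
sv_lexica = {
    "GAE":     {"water": {"w ao dx er": 0.8, "w ao t er": 0.2}},
    "SouthUS": {"water": {"w ao dx er": 0.5, "w ah dx er": 0.5}},
}
svp = {"GAE": 0.7, "SouthUS": 0.3}  # weights learned during adaptation

print(blend_pronunciations(svp, sv_lexica, "water"))
```

Because each per-variety distribution sums to one and the SVP weights sum to one, the blended distribution is again a proper distribution over variants, which is what makes it usable directly as a speaker-dependent lexicon entry.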
Melnar, L., & Talley, J. (2003). Phone merger specification for multilingual ASR: The Motorola Polyphone Network. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, August 3-9, 2003 (pp. 1337-1340). ISBN 1-876346-48-5. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2003/papers/p15_1337.pdf [pdf]
This paper describes the Motorola Polyphone Network (MotPoly), a hierarchical, universal phone correspondence network that defines allowable phone mergers for shared acoustic modeling in multilingual and multi-dialect automatic speech recognition (ML-ASR). MotPoly’s organization is defined by phonetic similarity and other language-independent phonological factors. Unlike other approaches to shared acoustic modeling, MotPoly can be effectively used in systems where computational resources are limited, such as portable devices. Furthermore, it is less constrained by language data availability than other approaches. With MotPoly as part of an overall strategy, Motorola’s Voice Dialog Systems Lab’s ML-ASR team was able to define a set of multilingual acoustic models whose size was only 23% of the largest monolingual model set but whose overall performance was higher than the monolingual models by 1.4 percentage points.
Melnar, L., & Talley, J. (2002). Phone inventory optimization for multilingual automatic speech recognition. The Journal of the Acoustical Society of America, 112(5), 2305-2305. https://asa.scitation.org/doi/10.1121/1.4779285
This paper describes a phone inventory optimization procedure for application in multilingual automatic speech recognition (ASR). The optimization procedure is based on three knowledge sources that act collectively to guide phonological reduction and selection processes: (1) abstract (language‐independent) phonological universals and tendencies that are used in the construction of a hierarchical structure that specifies phone class reduction paths; (2) language‐dependent knowledge that includes information about the targeted languages’ phone inventories and individual phone frequencies in language data resources; (3) acoustic data that provides phone discriminability and similarity metrics. Using the optimization procedure, the phone inventories of six languages, American English, Mandarin Chinese, Egyptian Colloquial Arabic, Japanese, German, and Spanish, were merged to create an inventory consisting of 64 distinct cross‐phonological units. This reduced phone set was used in all training and testing procedures and resources for the recognition of the six targeted languages. Preliminary recognition results are very encouraging: while purely data‐driven approaches to multilingual ASR fail to reach word‐recognition rates comparable to monolingual applications, the use of the optimized phone inventory in our multilingual ASR program yields recognition rates approximating those of monolingual ASR.
Talley, J. (2002). Context dependencies in vowel identification in ablated CVC syllables. The Journal of the Acoustical Society of America, 112(5), 2249-2249. https://asa.scitation.org/doi/10.1121/1.4778943
In previously reported work [Talley, J. Acoust. Soc. Am. 108, 2601 (2000)], novel results from a new perceptual study of human vowel identification under ablation conditions were discussed. That study, which used ten American English (AE) vowels in each of four simple CVC consonantal contexts, found highly significant effects of ablation condition and consonantal context on vowel identifiability. However, little insight was available at the time regarding the specifics of the vowel–context interactions. This paper extends that work providing detailed analysis of vowel identification sensitivity relative to consonantal context under differing ablation conditions.
Lee, K-T., Melnar, L., & Talley, J. (2002). Symbolic speaker adaptation for pronunciation modeling. [pdf]
This paper presents a method of modeling a speaker’s pronunciation of a given language as a blend of “standard” speech and other non-standard speech varieties (regional dialects and foreign accented pronunciation styles) by way of speaker-dependent modification of a lexicon. In this system, a lexicon of Standard American English (SAE) forms, the “canonical” lexicon, is filtered and transformed via a group of speech variety (SV) dependent rule sets into a speaker-specific set of pronunciation variants (and associated probabilities) for use during recognition. The relative importance of these rule sets depends on the speaker’s pronunciation characteristics and is represented by a Speech Variety Profile (SVP) associated with each speaker. A speaker’s individual SVP is acquired through feedback from an adaptation process. Convergence to a speaker’s SVP represents adaptation of the lexicon (symbolic adaptation) to those SV-specific forms that speaker is likely to utter.
Talley, J. (2000b). Vowel perception in varied symmetric CVC contexts. The Journal of the Acoustical Society of America, 108(5), 2601-2601. https://asa.scitation.org/doi/10.1121/1.4743684
In the three‐decade‐long debate over static versus dynamic specification of vowels, perceptual studies in which subjects are tasked with identifying naturally spoken vowels under various ablation conditions have been a mainstay. While not directly producing an understanding of how humans go about recognizing this major subclass of phones, this type of study [e.g., Strange, Jenkins, and Johnson, J. Acoust. Soc. Am. 74, 695–705 (1983)] has provided compelling results which must be accounted for in any successful theory of vowel perception. This paper presents results from yet another perceptual study of human vowel identification under ablation conditions. This study uses CVC syllables spoken rapidly by three male speakers in a carrier sentence. Syllables consist of ten American English vowels in each of four consonantal contexts (b_b, d_d, g_g, and h_d). The conditions studied are silent centers (SC), centers only (CO), and the control condition (full). A very robust hierarchy of full>CO>SC is found. Consonantal contexts also have a clear ordering (h_d>b_b>d_d>g_g) with respect to the ease with which they are perceived. Interesting interactions between vowels and their contexts are also evident.
Talley, J. (2000a). The establishment of Motorola’s Human Language Data Resource Center: Addressing the criticality of language resources in the industrial setting. LREC. http://www.lrec-conf.org/proceedings/lrec2000/pdf/260.pdf [pdf]
Within the human language technology (HLT) field it is widely understood that the availability (and effective utilization) of voluminous, high quality language resources is both a critical need and a critical bottleneck in the advancement and deployment of cutting edge HLT applications. Recently formed (inter-)national human language resource (HLR) consortia (e.g., LDC, ELRA,…) have made great strides in addressing this challenge by distributing a rich array of pre-competitive HLRs. However, HLT application commercialization will continue to demand that HLRs specific to target products (and complementary to consortially available resources) be created. In recognition of the general criticality of HLRs, Motorola has recently formed the Human Language Data Resource Center (HLDRC) to streamline and leverage our HLR creation and utilization efforts. In this paper, we use the specific case of the Motorola HLDRC to help examine the goals and range of activities which fall into the purview of a company-internal HLR organization, look at ways in which such an organization differs from (and is similar to) HLR consortia, and explore some issues with respect to implementation of a wholly within-company HLR organization like the HLDRC.
Martin, G. L., & Talley, J. (1995). Recognizing handwritten phrases from U.S. census forms by combining neural networks and dynamic programming. Journal of Artificial Neural Networks, 2(3), 167-193. https://dl.acm.org/doi/10.5555/226864.226866
Talley, J. (1994). The PEACC method of characterization of dynamic aspects of speech. The Journal of the Acoustical Society of America, 96(5), 3351-3351. http://dx.doi.org/10.1121/1.410637
In the phonetics/speech perception community, the assertion that dynamic aspects of the speech signal are employed in robust speech decoding is not particularly controversial. Though an increasing number of studies are addressing the various dynamic cues of speech, quantitative, analytical research is hampered somewhat by a lack of established methods. Methods for studying dynamic phenomena are much less well developed than those for more static properties. This paper proposes a new method for characterizing dynamic aspects of speech, the Piecewise Exponential Approximation with Continuity Constraints (PEACC) method, to help remedy this state of affairs. As its name suggests, PEACC performs piecewise fitting of exponential segments—γe^{αx} + β, −∞ < x ≤ 0—to the sampled signal. Dynamic programming is utilized in global sequence optimization where MSE is minimized within the solution space permitted by the constraints on continuity. This method has broad applicability and produces low‐distortion fits at a specifiable level of detail; however, its principal strength, from the perspective of speech science research, is that its resulting signal transition parameters have direct, intuitive interpretations. The paper concludes with a brief examination of the results of applying PEACC to a corpus of formant track data. [Work supported by NSF.]
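The core fitting step for a single PEACC-style piece can be sketched as below. This is a simplified illustration on synthetic data: α is grid-searched and γ, β are solved in closed form by least squares, while the published method's continuity constraints and dynamic programming across multiple pieces are omitted:

```python
import math

def fit_exponential_piece(ys, alphas):
    """Least-squares fit of one piece  y = γ·e^{αx} + β  on samples at
    x = -(n-1), ..., -1, 0.  For each candidate α, the model is linear
    in (γ, β), so those are solved exactly; the best-MSE α wins."""
    n = len(ys)
    xs = [i - (n - 1) for i in range(n)]
    best = None
    for a in alphas:
        u = [math.exp(a * x) for x in xs]
        mu_u, mu_y = sum(u) / n, sum(ys) / n
        var_u = sum((v - mu_u) ** 2 for v in u)
        gamma = sum((v - mu_u) * (y - mu_y) for v, y in zip(u, ys)) / var_u
        beta = mu_y - gamma * mu_u
        mse = sum((gamma * v + beta - y) ** 2 for v, y in zip(u, ys)) / n
        if best is None or mse < best[0]:
            best = (mse, a, gamma, beta)
    return best  # (mse, alpha, gamma, beta)

# Synthetic transition generated from y = 2·e^{0.5x} + 1.
data = [2 * math.exp(0.5 * x) + 1 for x in range(-7, 1)]
mse, alpha, gamma, beta = fit_exponential_piece(
    data, [0.1 * k for k in range(1, 11)])
print(alpha, gamma, beta)
```

Recovering (α, γ, β) from a formant transition is what gives the method its intuitive parameters: curvature, extent, and asymptotic target.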
Talley, J. (1994). Neural network-based analysis of cues for vowel and consonant identification. The Journal of the Acoustical Society of America, 95(5), 2922-2922. http://dx.doi.org/10.1121/1.409202
Many static and dynamic features of the acoustic speech signal have been proposed in the literature as cues for identification of phonetic categories. Ultimately, such features’ cue validity is most appropriately studied via well‐designed perceptual experiments involving human subjects, but such studies are hindered by their expense (especially in terms of time) and inherent confounds (stimuli must be sufficiently speech‐like to garner treatment as such). This paper examines ways in which neural networks (NNs) can be utilized as auxiliary tools in phonetics/speech perception research. Discussion includes the application of NNs to the task of learning relevant speech category discriminations from various restricted characterizations of a corpus of naturally spoken CVC syllables. That task examines learnability as a function of the available featural information. In addition, various “post‐mortem” techniques (e.g., transfer function mapping, weight analysis,…) are discussed which, when applied to trained NNs, yield estimates of the cue validity of (ensembles of) features with respect to phonetic category discriminations. These methods cannot be blindly interpreted as producing valid characterizations of human speech perception; however, they represent useful tools that are inexpensive and highly targetable (confounds can be controlled) and can serve as guides to fruitful experiments with human subjects. [Work supported by NSF.]
Brown, J. R., & Talley, J. (1994). Locating faces in color photographs using neural networks. Proceedings of SPIE: Applications of Artificial Neural Networks V, 2243, 584-590. https://doi.org/10.1117/12.170007
This paper summarizes a research effort in finding the locations and sizes of faces in color images (photographs, video stills, etc.) if, in fact, faces are present. Scenarios for using such a system include localizing skin for automatic color balancing during photo processing, or serving as a front-end, in a customs port-of-entry context, for a system which identifies persona non grata given a database of known faces. The approach presented here is a hybrid system including: a neural pre-processor, some conventional image processing steps, and a neural classifier as the final face/non-face discriminator. Neither the training (containing 17,655 faces) nor the test (containing 1,829 faces) imagery databases were constrained in their content or quality. The results for the pilot system are reported along with a discussion of ways to improve the current system.
Wall, R., & Talley, J. (1993). Understanding English specification of finite state devices. In Lenguajes naturales y lenguajes formales: Actas del IX congreso de lenguajes naturales y lenguajes formales (Reus, 20-22.12.1993) (pp. 79-98). Promociones y Publicaciones Universitarias, PPU. https://dialnet.unirioja.es/servlet/libro?codigo=5283 [pdf]
This paper describes a system which receives a specification of a regular language in English (for example, “all strings over the alphabet {a, b} which do not contain the string ‘bab’”) and constructs a representation of a finite-state automaton (FSA) which accepts the language specified. The system uses a strategy for domain-specific natural language understanding (NLU) where syntactic processing is based on a grammar which conflates syntactic, semantic, and pragmatic concerns and relies on a strong underlying semantic model to simplify the parsing task. The system has possible applications as an aid in teaching automata theory and might also be regarded as a miniature system for constructing computer programs in natural language.
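For the quoted example specification, the target of such a construction would be a small DFA like the following sketch (the paper's internal FSA representation is not known; this just shows the kind of object being built):

```python
def make_avoid_bab_dfa():
    """DFA for the example specification above: all strings over {a, b}
    that do not contain 'bab'.  States track the longest suffix that is
    a prefix of 'bab'; a dead state absorbs strings once 'bab' occurs."""
    delta = {
        ("", "a"): "",        ("", "b"): "b",
        ("b", "a"): "ba",     ("b", "b"): "b",
        ("ba", "a"): "",      ("ba", "b"): "DEAD",
        ("DEAD", "a"): "DEAD", ("DEAD", "b"): "DEAD",
    }
    accepting = {"", "b", "ba"}

    def accepts(s):
        state = ""
        for ch in s:
            state = delta[(state, ch)]
        return state in accepting

    return accepts

accepts = make_avoid_bab_dfa()
print(accepts("abba"))   # True: no 'bab'
print(accepts("ababa"))  # False: contains 'bab'
```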
Talley, J. (1992). Quantitative characterization of vowel formant transitions. The Journal of the Acoustical Society of America, 92(4), 2413-2413. http://dx.doi.org/10.1121/1.404673
This paper presents an acoustic study of vowel formant dynamics and the analysis methods that were developed to carry it out. The main goal of the described study was to bring quantitative, acoustic evidence to bear on competing theories regarding the source(s) of vowel identity specification [W. Strange, J. Acoust. Soc. Am. 85, 2081–2087 (1989)]. A set of 40 CVC syllables is studied: symmetric voiced stop (/bVb/, /dVd/, /gVg/) and “neutral” (/hVd/) contexts × the “monophthongal” vowels of Midwestern American English (/i, ɪ, e, ɛ, æ, ʌ, u, ʊ, o, ɑ/). Three male speakers (speaking normally) contributed two repetitions each. Voice pulse by voice pulse tracks of the first three formant frequencies were measured using GEMS [J. Talley, J. Acoust. Soc. Am. 90, 2274 (A) (1991)] and LPC. PEACC, a new technique for speech coding using exponential pieces, was then applied to the trajectories to automatically segment them into transitions and characterize the segments in terms of intuitive parameters: Δf (“locus-to-target” distance), Δt (duration), α (curvature), and f0 (“target” frequency). This paper discusses the resulting data’s characteristics and the results from analyzing initial and final transitions with respect to intra-category similarity and inter-category distinctiveness using a variety of interesting category boundaries. [Work supported by NSF.]
Talley, J. (1991). Graphical editor for marking spectrograms (GEMS). The Journal of the Acoustical Society of America, 90(4), 2274-2274. https://doi.org/10.1121/1.401197
This paper presents a new computer system, GEMS (or the Graphical Editor for Marking Spectrograms), for interactively collecting frequency and timing data from speech samples. GEMS was originally developed as a data collection aid for ongoing research in vowel formant transition patterns, but it potentially has wider applicability. It takes digitized speech as input, generates a spectrographic display as the background, and yields as output time, frequency, and continuity data on the points (and sequences) marked as interesting by the user. This system, as its name suggests, is primarily a graphical editor for such markings. However, given the editing substrate, it is possible to incorporate less-than-perfect algorithms (true of most) to automate selection/marking of interesting features. The user immediately sees the results of such automatic techniques (e.g., LPC analysis) and has an opportunity to modify the outcome in context before proceeding. A middle road is thereby taken between completely automatic but error-prone data production and time-consuming manual data collection. This framework can also serve as an environment for improvement of algorithms with a learning component. The system is currently implemented in Pascal under VAX/VMS. It is designed with a client/(DSP) server architecture, and implements some automated techniques such as a new heuristic voice pulse localization algorithm. [Work supported by NSF.]
Wittenburg, K., Weitzman, L., & Talley, J. (1991). Unification-based grammars and tabular parsing for graphical languages. Journal of Visual Languages & Computing, 2, 347-370. https://doi.org/10.1016/S1045-926X(05)80004-7
In this paper we present a unification-based grammar formalism and parsing algorithm for the purposes of defining and processing generalizations of concatenative languages such as those found in two-dimensional graphical domains. In order to encompass languages whose elements are combined by operations other than simple string concatenation, we extend the PATR unification-based grammar formalism with functionally specified constraints. In order to parse with these grammars, we extend tabular parsing methods and discuss a bottom-up algorithm that can process input incrementally in a maximally flexible order. This work is currently being applied in the interpretation of handsketched mathematical expressions and structured flowcharts on notebook computers and interactive worksurfaces.
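The unification operation at the heart of PATR-style formalisms can be illustrated as recursive merging of feature structures. The toy sketch below represents feature structures as nested dicts and omits variables, reentrancy, and the functionally specified constraints the paper adds; it is a minimal didactic version, not the paper's formalism.

```python
def unify(a, b):
    """Unify two feature structures represented as nested dicts.
    Returns the merged structure, or None on a feature clash
    (same path, incompatible atomic values)."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        return a if a == b else None      # atomic values must match exactly
    merged = dict(a)
    for key, b_val in b.items():
        if key in merged:
            sub = unify(merged[key], b_val)
            if sub is None:
                return None               # a clash anywhere fails the whole unification
            merged[key] = sub
        else:
            merged[key] = b_val           # features absent in one side carry over
    return merged

# Subject-verb agreement as feature-structure unification:
np = {"cat": "NP", "agr": {"num": "sg"}}
vp = {"agr": {"num": "sg", "per": 3}}
result = unify(np, vp)   # {'cat': 'NP', 'agr': {'num': 'sg', 'per': 3}}
clash = unify({"agr": {"num": "sg"}}, {"agr": {"num": "pl"}})  # None
```

In the graphical-language setting of the paper, the same mechanism carries spatial constraints (via the functional extensions) rather than only string-concatenation order.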
Beata Walesiak
Walesiak, B., & Talley, J. (2024). Assessment and feedback mechanisms in pronunciation and speech coaching apps [Conference paper]. 15th annual Pronunciation in Second Language Learning and Teaching Conference. PSLLT 2024, Ames, United States.
Recent and ongoing advances in language technologies (such as chatbots, Automatic Speech Recognition, Text-to-Speech, and voice conversion) present exciting opportunities for L2 pronunciation improvement (Gottardi et al., 2022) and potentially provide precise and varied pronunciation feedback (Levis and Rehman, 2022). This research investigates the assessment and feedback mechanisms employed in a wide range of pronunciation and speech coaching (PSC) apps, extending prior work on the affordances of Mobile-Assisted Pronunciation Training (Walesiak, 2021). Its point of departure is a qualitative examination of over 50 web and/or Android apps (some of which are AI-based), focusing on the type of pronunciation assessments they employ and on the quality of feedback they provide, including their feedback modalities (textual, visual, and/or auditory). We then systematically structure this information as a comprehensive matrix that categorises the affordances of the PSC apps with respect to their feedback type (Henrichsen, 2020), the hierarchical priority level (Kang and Hirschi, 2023), their incorporation of High Variability Phonetic Training, and attention to intelligibility. The examined PSC apps recruit a range of feedback strategies for varied assessment targets, yet we frequently observe feedback which is not particularly actionable (e.g., binary, scale-based, raw numeric graphs) and oftentimes directed at segmental targets. The picture that emerges from our systematic analysis highlights that a number of research-indicated assessment targets and feedback strategies remain underrepresented in the current PSC app landscape. We discuss the implications and opportunities of this for educators, researchers, and developers alike.
Talley, J., & Walesiak, B. (2024). Facilitation of individualized L2 pronunciation assessment and training via novel teacher-assistive technology [Conference paper]. 15th annual Pronunciation in Second Language Learning and Teaching Conference. PSLLT 2024, Ames, United States.
The ideal of fully-individualized, dynamically-adjusted L2 pronunciation learning (Couper, 2022; Derwing and Munro, 2015) crucially depends on diagnostic assessments, where goal-setting and goal-focused activities are enabled by an initial needs assessment with adjustments based on periodic formative assessments. Yet Couper’s research shows “very few teachers use diagnostics for pronunciation” (2022, p. 180). Conducting diagnostic assessments, developing/adjusting plans based upon them, and communicating those results/plans is perceived as prohibitively time consuming. We discuss (and illustrate) how a new pronunciation teaching support system, Accent Explorer (AE), potentially helps remediate that time crunch. Whenever a recording made in AE’s student app is submitted to AE’s teacher app, it is extensively annotated with AI-detected, importance-ranked (supra-)segmental issues. The flagged issues are formulated as suggestions for the teacher which may be accepted, adjusted, augmented, or discarded. The teacher-arbitrated issues (plus any added notes) are returned to the student as targeted feedback which the student can fully explore (graphically and auditorily, in addition to textual explanations). The identified issues are also logged, allowing generation of (date range-constrained) summaries of pronunciation issues, supporting diagnostic assessment. Suggestions for optimizing pedagogical practices via this new technology’s assessment (and other) affordances will be offered.
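The logging-and-summary flow described above (issues logged per submission, then aggregated over a date range for diagnostic assessment) can be sketched generically. The schema and issue labels below are hypothetical illustrations, not Accent Explorer's actual data model.

```python
from collections import Counter
from datetime import date

# Hypothetical issue log: (submission date, issue label) pairs.
log = [
    (date(2024, 3, 1), "TH-stopping"),
    (date(2024, 3, 8), "word-final devoicing"),
    (date(2024, 4, 2), "TH-stopping"),
    (date(2024, 4, 9), "nuclear stress placement"),
]

def summarize(log, start, end):
    """Count teacher-confirmed issues within [start, end],
    most frequent first - the raw material for a diagnostic summary."""
    counts = Counter(label for d, label in log if start <= d <= end)
    return counts.most_common()

summary = summarize(log, date(2024, 3, 1), date(2024, 4, 30))
# -> [('TH-stopping', 2), ('word-final devoicing', 1),
#     ('nuclear stress placement', 1)]
```

A recurring high-count issue across the date range is exactly the kind of signal that would feed an initial needs assessment or a periodic formative check-in.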
Archer, G., Červinková Poesová, K., Duckinoska-Mihajlovska, I., Rocha, A. P. B., & Walesiak, B. (forthcoming). ‘Finding your tribe’: How membership of a pronunciation-focused teacher association can positively impact classroom instruction and beyond. In A. Kirkova-Naskova, & E. Tergujeff (Eds.), Achievements in Second Language Pronunciation: Good Practices for L2 Teaching and Learning. Cambridge University Press.
Abstract forthcoming.
Zawadzki, Z., Challis, K., Goodale, E., Guskaroska, A., Walesiak, B., & Levis, J. (2023). The inspiration for creating the Best of Teaching Tips. Pronunciation in Second Language Learning and Teaching Proceedings, 1. https://doi.org/10.31274/psllt.16935
This introduction gives a brief overview of the Pronunciation in Second Language Learning and Teaching (PSLLT), describes what teaching tips are, and explains the motivation behind taking on the creation of this collection.
Walesiak, B. (2023). Supporting receptive and productive pronunciation of accents through online tools. In D. Bullock (Ed.), IATEFL 2022 Belfast Conference Selections (pp. 203–205). Faversham: IATEFL. ISBN: 978-1-912588-44-2. https://www.iatefl.org/resources/iatefl-conference-selections-2022-printed-edition
Students enroll in (online) courses hoping not only to understand native speakers better but often also to speak more like them. Many do not seem to be aware that speakers who are perceived as strongly accented can also be highly intelligible (Levis, 2020), and that awareness of the diversity of accents in today’s world, rather than of the standard model exclusively, is key to learning and teaching pronunciation.
Walesiak, B. (2021). Mobile apps for pronunciation training. Exploring learner engagement and retention. In A. Kirkova-Naskova, A. Henderson & J. Fouz-González. (Eds.), English Pronunciation Instruction: Research-based insights (pp. 357-384). John Benjamins. https://doi.org/10.1075/aals.19.15wal
This chapter explores adult learners’ perceptions of using apps for pronunciation training, along with the engagement and retention rates of mobile technologies for pronunciation practice. The study focuses on five apps, selected by the teacher-researcher, that address different areas of pronunciation. The findings show that learners are keen on using apps to learn pronunciation, but the frequency of self-reported mobile app use and the time devoted to it dwindle without the teacher’s guidance. A discussion of the learners’ engagement with and retention of the apps is offered, together with a number of recommendations to facilitate the integration of apps into pronunciation instruction.
Walesiak, B. (2019). English After RP – Standard British Pronunciation Today (Review). Speak Out! Journal of the IATEFL Pronunciation Special Interest Group, 61, 44-47. ISSN: 2313-7703. https://pronsig.iatefl.org/journal/
Walesiak, B. (2018). Beyond Repeat After Me – Teaching pronunciation to English learners (Review). Speak Out! Journal of the IATEFL Pronunciation Special Interest Group, 59, 43-45. ISSN: 2313-7703. https://pronsig.iatefl.org/journal/
Walesiak, B. (2017). Mobile pronunciation apps: a personal investigation. Speak Out! Journal of the IATEFL Pronunciation Special Interest Group, 57, 16-28. ISSN: 2313-7703. https://www.academia.edu/34720473/Mobile_pron_apps_a_personal_investigation
In this article, I provide a subjective list of pronunciation apps that might be useful when practising pronunciation in class, teaching online or assigning out-of-class self-study. This is followed by a few practical tips concerning the ways you can incorporate apps in the process of English language teaching. Whether you teach pronunciation as a separate skill or simply integrate it spontaneously into your lessons, mobile apps can definitely help you create a stimulating classroom environment, improve the quality and effectiveness of your teaching and reinforce to students the need to learn pronunciation.
Walesiak, B. (2015). Fjuczersy, cudofiksingi, market mejkerzy – samples of a speech corpus of the Polish stock market sociolect. Language contact in specialist speech. In D. Stanulewicz (Ed.), Beyond Philology (pp. 59-75). Gdańsk: Wydawnictwo Uniwersytetu Gdańskiego. ISSN: 1732-1220. https://fil.ug.edu.pl/sites/default/files/_nodes/strona-filologiczny/33797/files/beyond_philology_no_12_2015.pdf
Foreign language awareness helps break down communication barriers in the area of business and information in Poland. Market participants, in particular, choose to interweave their Polish utterances with phrasings or loan words of English origin in order to maintain effective and efficient communication. Changes take place irrespective of the efforts made to discourage English borrowings from their transgression into everyday usage, which leaves much room for research in the field of Englishization of business communication in Poland, as well as its effects on the culture of the Polish language as a whole. The article comments on some aspects of this communication on the market in Poland. It also lists some samples of the English borrowings isolated from a speech corpus related to the area of the capital markets and presents informants’ views on interferences recorded in the process of isolating the spoken data.
Dziczek-Karlikowska, H., & Mikołajewska, B. (2015). Computer-assisted awareness raising of L2 phonology: pronunciation in commercials – a pilot study. In A. Turula, B. Mikołajewska & D. Stanulewicz (Eds.), Insights into Technology Enhanced Language Pedagogy (pp. 163-174). Frankfurt: Peter Lang. ISBN: 9783631656693. https://doi.org/10.3726/978-3-653-04995-4
Linguists agree that TV and the Internet are now the sources of the influx of English-originating phrases in today’s Polish, and that commercials in particular may influence the respondents’ vocabulary quite significantly. Advantages of the process notwithstanding, there are also a few problems related to this. Copywriters resort to the use of various linguistic devices – many of them of English origin – in order to meet the demands of the brand. In doing so, they use erroneous pronunciation to achieve marketing goals, which may have a negative effect on the perception of the lexeme and its performance by Polish learners of English (on different levels of advancement). The paper starts by analysing the phenomena described above. Then it goes on to present a computer-aided Moodle-based pronunciation class, the intricacies of its design and some ready-made tasks. It closes with the feedback obtained from participants (students of English Studies and Open University students) in the course of the pilot study.
Co-edited volumes
Zawadzki, Z., Challis, K., Goodale, E., Guskaroska, A., Walesiak, B., & Levis, J. (2023). Pronunciation in Second Language Learning and Teaching Proceedings 1. https://doi.org/10.31274/psllt.16935
Turula, A., Mikołajewska, B., & Stanulewicz, D. (2015). Insights into Technology Enhanced Language Pedagogy. Frankfurt: Peter Lang. ISBN: 9783631656693. https://doi.org/10.3726/978-3-653-04995-4
Łukasik, M., & Mikołajewska, B. (2014). Języki specjalistyczne wczoraj, dziś i jutro [Specialist languages yesterday, today and tomorrow]. Warszawa: Wydawnictwo Naukowe IKL@. ISBN: 978-83-64020-12-4. http://sn.iksi.uw.edu.pl/wp-content/uploads/sites/306/2018/09/SN-17-Marek-%C5%81ukasik-Beata-Miko%C5%82ajewska-red.-J%C4%99zyki-specjalistyczne-wczoraj-dzi%C5%9B-i-jutro.pdf