Our Team of Experts
The team
Our team comprises individuals with diverse and specialized skills, including expertise in speech-related research, engineering, phonetics, and pedagogy. Their collective knowledge and proficiency support a comprehensive understanding and effective implementation of innovative solutions in the fields of speech technology and education.
Jim Talley
Founder & Chief Technologist
Jim founded LingCosms after decades of human language technology (HLT) R&D at the University of Texas at Austin, Microelectronics and Technology Corp. (MCC), Lexicus, and Motorola (see Research below). Though he wears many hats at LingCosms, he primarily identifies as a speech scientist and an applied machine learning (ML) researcher. He has extensive experience producing and delivering solutions across a wide range of HLTs such as pronunciation prediction (letter-to-sound), language modeling, speech recognition, speech analysis tools, computational linguistics (parsing, including 2D graphical parsing), speech-based dialog, handwriting recognition, and speech database creation, among others, with much of that work multi-lingual in nature.
Beata Walesiak
Product Launch Lead
Beata has collaborated with a number of start-ups, corporations, and academic institutions within the domain of educational technologies, pronunciation training, and AI-based speech pedagogy and assessment. At LingCosms, she is responsible for coordinating and overseeing the quality and successful launch of products and services. She has previously worked as a Project Manager, AI Linguist, and Technical Support Specialist, and she has extensive experience in research (see Research below) and in pedagogy as a University Lecturer within the Life Long Learning framework, an EFL Teacher, and a certified Examiner.
Company backstory
One of Jim’s primary motivations in the founding of LingCosms was dissatisfaction with the limited degree of (bi-directional) transfer between the practice of HLT engineering on the one hand and linguistics/phonetics/speech science on the other, and the sense that the potential benefits of such transfer were being left on the table. His vision for LingCosms was to produce performant, economical solutions to existing problems in the space via judicious combinations of (1) any and all appropriate signal processing and ML technologies, (2) the extant knowledge from linguistics and speech sciences, and (3) targeted novel research and development, with each such solution contributing to a growing, symbiotic technology stack supporting research into the nature of language.
For example, though deep learning (DL) based neural networks (NNs) (e.g., large language models [LLMs]) are all the rage currently, and though Jim has been doing R&D with NNs since before NNs were cool (the 1980s!) and almost as long with LMs (1990s), he is not an acolyte of the trend of throwing all eggs into the deep-learning-of-enormous-models basket. While that paradigm has yielded unquestionable advances in HLT capabilities in recent years, it is problematic in a number of ways. Most critical, from Jim’s perspective, is the black box nature of such models. While the end-to-end training enabled by newer DL methods has been a win for the engineering of accurate systems, it makes them extremely opaque to explanation/understanding of their decision processes and, thereby, virtually precludes the cross-feeding between language knowledge and language engineering. Another problem is that the high cost of training (and inference) leads to a situation where a handful of deep-pocketed corporations have de facto control of the HLT landscape. Jim’s pragmatic bias is to take advantage of some of the impressive affordances of the latest DL NNs (e.g., via APIs) but to focus LingCosms’s R&D on (combinations of) technologies appropriate to the characteristics of the problem at hand, with the goal of producing understandable/explainable solutions with economical training/operation.
Our research
The following is a summary of our publications including journal papers, conference papers, book chapters, reviews, volumes, etc.:
Jim Talley
- James (Jim) Talley’s Semantic Scholar
Talley, J., & Walesiak, B. (2024). Comprehensive visual and auditory feedback in support of teaching/learning pronunciation: Introducing Accent Explorer [Conference paper]. 17th International Conference on Native and Non-Native Accents of English. Accents 2024, Łódź, Poland.
In examining methods of reducing intelligibility-impacting pronunciation issues, increasing evidence indicates that explicit pronunciation instruction (PI) and corrective feedback (CF) are beneficial (Lee, Jang, and Plonsky, 2015; Sardegna and McGregor, 2022). While providing one-on-one PI and CF to each student, focusing on the student’s individual pronunciation challenges, is ideal, most pronunciation learning contexts (e.g., university pronunciation classes) involve relatively high student-to-teacher ratios, making extensive one-on-one PI and/or targeted CF prohibitively time consuming.
Computer-assisted pronunciation training (CAPT) offers the promise of mitigating that time crunch. To realize that promise, CAPT needs to serve as a force multiplier for the teacher – helping learners to understand their individual pronunciation issues, offering relevant practice opportunities, and providing useful CF, where “useful” implies that the feedback is targeted and actionable (providing insight on how to improve). Unfortunately, recent research (Walesiak & Talley, 2024) finds that relatively few of the currently available CAPT apps offer significant amounts of targeted/actionable feedback. Some ambitious teachers attempt to fill that feedback gap via available general purpose tools (e.g., Praat, Google speech-to-text, Audacity,…), but the set-up and operation of such tools can be daunting for students, and their outputs (spectrograms, waveforms,…) can be difficult for non-experts to interpret.
This talk discusses and illustrates Accent Explorer (AE) – a new tool designed specifically to help make individualized PI and CF a more manageable endeavor. AE does not attempt to be a pronunciation course, nor is it an instructional methodology. It is just a tool which, via its extensive visualization (and auditorialization) of significant pronunciation related phenomena and its extensive AI-supported annotation capabilities, aims to facilitate student understanding of the various components of accent (and the results of efforts to modify them). AE additionally provides some student management dashboard functionality for teachers. While its AI-based functionality is integral to the attempt to serve as a force multiplier for the teacher, AE is intentional with respect to maintaining teachers’ agency regarding their students’ pronunciation education – i.e., it attempts to assist, not to replace, the teacher.
We will survey the range of affordances incorporated into AE’s student and teacher apps. These include, among others, student/teacher sharing of recordings/feedback, detailed (supra-)segmental issue call-outs, visualization/auditorialization of prosodic elements, narrative feedback regarding observed issues with suggested mitigation strategies, and summarization in support of (diagnostic, formative, and/or summative) assessments by the teacher. Active discussion of potential uses, and missed opportunities, will be encouraged.
Walesiak, B., & Talley, J. (2024). A pronunciation and speech coaching (PSC) apps search engine for teachers: A research-driven solution [Conference paper]. 17th International Conference on Native and Non-Native Accents of English. Accents 2024, Łódź, Poland.
Although language technologies for L2 pronunciation pedagogy are diverse and increasingly sophisticated, some fail to deliver targeted content or genuinely personalised feedback (Fouz-González, 2024; Walesiak & Talley, 2024). The challenges related to incorporating apps into teaching pronunciation are discussed in the literature (García et al., 2020; Inceoglu, 2022), with teachers indicating that they struggle to know which technologies to include in their pedagogy (Metruk, 2022).
To assist teachers in finding suitable resources to meet their needs, we have embarked on a research project (Walesiak & Talley, 2024) devoted to the assessment mechanisms and feedback affordances employed in widely available pronunciation and speech coaching (PSC) apps, i.e., apps which aim to improve users’ articulation, pronunciation, or spoken communication, sometimes via utilization of speech recognition (SR), text-to-speech (TTS), and/or Artificial Intelligence (AI) technologies. Some of these apps evaluate users’ spoken attempts and provide feedback or suggestions for improvement, while others may focus solely on clickable practice material. In the talk, we present how a sequential research design has been employed, beginning with a qualitative phase that included investigating a range of mobile and web PSC apps, followed by a quantitative analysis of a selected subset.
The talk extends prior work on Mobile-Assisted Pronunciation Training affordances (Walesiak, 2021) by introducing a research-driven solution for educators that allows teachers to search for the affordances (Sobkowiak, 2012) and other characteristics of PSC apps, helping them find apps which will appropriately support their didactic needs. The PSC apps search engine currently selects from Android and web apps based upon their content and feedback types. By filling an information gap regarding mobile and web apps, the tool empowers educators to better assess app suitability for practice in class or outside of school settings, encouraging a more informed, research-aligned approach to pronunciation instruction.
Talley, J., & Walesiak, B. (2024). Facilitation of individualized L2 pronunciation assessment and training via novel teacher-assistive technology [Conference paper]. 15th annual Pronunciation in Second Language Learning and Teaching Conference. PSLLT 2024, Ames, United States.
The ideal of fully-individualized, dynamically-adjusted L2 pronunciation learning (Couper, 2022; Derwing and Munro, 2015) crucially depends on diagnostic assessments, where goal-setting and goal-focused activities are enabled by an initial needs assessment with adjustments based on periodic formative assessments. Yet Couper’s research shows “very few teachers use diagnostics for pronunciation” (2022, p. 180). Conducting diagnostic assessments, developing/adjusting plans based upon them, and communicating those results/plans is perceived as prohibitively time consuming. We discuss (and illustrate) how a new pronunciation teaching support system, Accent Explorer (AE), potentially helps remediate that time crunch. Whenever a recording made in AE’s student app is submitted to AE’s teacher app, it is extensively annotated with AI-detected, importance-ranked (supra-)segmental issues. The flagged issues are formulated as suggestions for the teacher which may be accepted, adjusted, augmented, or discarded. The teacher-arbitrated issues (plus any added notes) are returned to the student as targeted feedback which the student can fully explore (graphically and auditorily, in addition to textual explanations). The identified issues are also logged, allowing generation of (date range-constrained) summaries of pronunciation issues, supporting diagnostic assessment. Suggestions for optimizing pedagogical practices via this new technology’s assessment (and other) affordances will be offered.
Walesiak, B., & Talley, J. (2024). Assessment and feedback mechanisms in pronunciation and speech coaching apps [Conference paper]. 15th annual Pronunciation in Second Language Learning and Teaching Conference. PSLLT 2024, Ames, United States.
Recent and ongoing advances in language technologies (such as chatbots, Automatic Speech Recognition, Text-to-Speech, and voice conversion) present exciting opportunities for L2 pronunciation improvement (Gottardi et al., 2022) and potentially provide precise and varied pronunciation feedback (Levis and Rehman, 2022). This research investigates the assessment and feedback mechanisms employed in a wide range of pronunciation and speech coaching (PSC) apps, extending prior work on the affordances of Mobile-Assisted Pronunciation Training (Walesiak, 2021). Its point of departure is a qualitative examination of over 50 web and/or Android apps (some of which are AI-based), focusing on the type of pronunciation assessments they employ and on the quality of feedback they provide, including their feedback modalities (textual, visual, and/or auditory). We then systematically structure this information as a comprehensive matrix that categorises the affordances of the PSC apps with respect to their feedback type (Henrichsen, 2020), the hierarchical priority level (Kang and Hirschi, 2023), their incorporation of High Variability Phonetic Training, and attention to intelligibility. The examined PSC apps recruit a range of feedback strategies for varied assessment targets, yet we frequently observe feedback which is not particularly actionable (e.g., binary, scale-based, raw numeric graphs) and oftentimes directed at segmental targets. This picture which emerges from our systematic analysis highlights the fact that a number of research-indicated assessment targets and feedback strategies remain underrepresented in the current PSC app landscape. We discuss the implications and opportunities of this for educators, researchers, and developers alike.
Talley, Jim (2023). Identification of non-native English speakers’ L1s via patterns of prosodic feature deviance from native speaker norms. In R. Skarnitzl, & J. Volín (Eds.), Proceedings of the 20th International Congress of Phonetic Sciences (pp. 2736–2740). Guarant International.
This study is an exploratory data science-based look at the question of whether the first language (L1) of non-native speakers of English can be identified from only a few simple syllable- and utterance-level prosodic features of their speech. Simple machine learning (ML) modeling on these loudness, pitch, and duration cues yields imperfect, but much better than chance, discrimination (1) between each individual L1 and General American English (GAE), and (2) between the five studied L1s. The described modeling is based upon “atypicality scores” (a-Scores) for the prosodic features, representing the degree to which features deviate, or not, from GAE native speaker norms. The prosodic features, their normalizations, and the a-Score characterizations are discussed. Finally, ML-based feature selection analysis examines the individual prosodic features’ relative importance for the individual L1 vs. GAE discrimination tasks and for the 5-way, forced-choice L1 classification task.
Talley, J. (2022). Non-native prosodic deviations from American English norms and their implications for accentedness: The Case of Polish L1 [Conference paper]. 15th International Conference on Native and Non-Native Accents of English. Accents 2022, Łódź, Poland.
Across the world’s languages, we find four primary acoustic means of signaling prominence – duration (D), pitch (P), loudness (L), and (non-)reduction of vowels. English, somewhat exceptionally, utilizes all four (Chrabaszcz, et al., 2014) in trading relations (Howell, 1993), potentially making mastery of English prosody challenging for ESL/EFL learners.
In this study, intuitively accessible features representing the three suprasegmental (D/P/L) prosodic signals are extracted from two subsets of the Speech Accent Archive (Weinberger, 2015) – Polish speakers of English (PE) and native General American English (GAE) speakers. Those descriptive features are used to examine ways in which PE prosody deviates from that of GAE speakers, and how predictive those deviations are with respect to the perceived accentedness of the PE sentences.
Research on prosody frequently uses (highly technical) descriptive features – e.g., the thousands available in the openSMILE suite (Coutinho, et al., 2016). For this study, however, a small set of simple features was chosen instead, given that eventual utilization in ESL/EFL didactic contexts was a consideration. The selected descriptive features are drawn both from whole sentences (pause-to-speech ratio/D, speaking rate/D, pitch dynamism/P, and loudness dynamism/L) and from syllable triplets (D/P/L center syllable values and left & right deltas, plus inter-syllable gaps/D). All of the features are automatically extracted, and normalized, after manual review/correction of word boundaries.
There are three steps in the basic analysis pipeline. First, the GAE data are used to construct a simple statistical model for each prosodic feature – estimating native speaker population norms. Those normative GAE models are, in turn, used to characterize the degree to which the observed PE feature values deviate from native-like productions.
Finally, the collected feature deviances from each PE sentence are analyzed with respect to the sentence’s mean accentedness score (as assessed on a 5-point scale by four GAE speakers).
Results show that the observed PE deviations from GAE prosodic norms generally correlate with judgments of accentedness. However, across the range of D/P/L prosodic features, there is substantial variation in the degree of correlation. This paper examines the relative strengths of association for the features (both individually and sub-grouped), especially with respect to their synergistic combination. It, additionally, gives consideration to (1) the features’ potential utility for predicting human accentedness judgments (e.g., in an automated assessment context) and (2) their potential yield as foci for accent mitigation (e.g., in an ESL/EFL context).
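The three-step pipeline described in this abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: it assumes each native norm is summarized by a mean and standard deviation, that deviance is an absolute z-score, and that association is measured with Pearson correlation. The function names and all numbers are hypothetical.

```python
from statistics import mean, stdev

def fit_norms(native_values):
    """Step 1: estimate a native-speaker norm (mean, std) for one feature."""
    return mean(native_values), stdev(native_values)

def deviance(value, norm):
    """Step 2: absolute z-score of a learner value against the native norm."""
    mu, sigma = norm
    return abs(value - mu) / sigma

def pearson(xs, ys):
    """Step 3: Pearson correlation between deviances and accentedness."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Made-up speaking rates (syllables/second), for illustration only.
gae_rates = [4.8, 5.1, 5.0, 4.9, 5.2]     # native (GAE) sentences
pe_rates = [4.9, 4.2, 3.8, 4.6]           # learner (PE) sentences
pe_accentedness = [1.5, 3.0, 4.25, 2.0]   # mean ratings on a 5-point scale

norm = fit_norms(gae_rates)
pe_dev = [deviance(v, norm) for v in pe_rates]
r = pearson(pe_dev, pe_accentedness)
print(f"correlation between deviance and accentedness: {r:.2f}")
```

In this toy setup each feature is handled independently; combining several features, as the paper discusses, would require an additional modeling step not shown here.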
Talley, J. (2016). What makes a Bostonian sound Bostonian and a Texan sound Texan? In J. Levis, H. Le, I. Lucic, E. Simpson, & S. Vo (Eds.), Proceedings of the 7th Pronunciation in Second Language Learning and Teaching Conference, ISSN 2380-9566, Dallas, TX, October 2015 (pp. 168-179). Ames, IA: Iowa State University. [pdf] https://www.iastatedigitalpress.com/psllt/article/15286/galley/13634/view/
This paper introduces a preliminary version of a new methodology for the automated, data-driven discovery of acoustic features of speech which potentially contribute to an accent’s distinctiveness. The results discussed herein, while merely illustrative at this stage, provide reason to be optimistic about the prospects of evolving a truly useful and robust automated methodology for cataloging the characteristic acoustic aspects of accented speech. If this line of research were to fully fulfill its promise, the resulting comprehensive catalog of features would contribute to our explicit knowledge of the correlates of accent. The knowledge represented by such a catalog could potentially be directly applied by teachers of second language pronunciation, and it certainly would inform the development of the more capable and individualized computer-assisted pronunciation training (CAPT) tools of the future.
Talley, J. (2006). Bootstrapping new language ASR capabilities: Achieving best letter-to-sound performance under resource constraints. LREC. http://www.lrec-conf.org/proceedings/lrec2006/pdf/436_pdf.pdf [pdf]
One of the most critical components in the process of building automatic speech recognition (ASR) capabilities for a new language is the lexicon, or pronouncing dictionary. For practical reasons, it is desirable to manually create only the minimal lexicon using available native-speaker phonetic expertise and, then, use the resulting seed lexicon for machine learning based induction of a high-quality letter-to-sound (L2S) model for generation of pronunciations for the remaining words of the language. This paper examines the viability of this scenario, specifically investigating three possible strategies for selection of lexemes (words) for manual transcription – choosing the most frequent lexemes of the language, choosing lexemes randomly, and selection of lexemes via an information theoretic diversity measure. The relative effectiveness of these three strategies is evaluated as a function of the number of lexemes to be transcribed to create a bootstrapping lexicon. Generally, the newly developed orthographic diversity based selection strategy outperforms the others for this scenario where a limited number of lexemes can be transcribed. The experiments also provide generally useful insight into expected L2S accuracy sacrifice as a function of decreasing training set size.
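The three lexeme selection strategies compared in this abstract can be sketched as follows. This is only an illustration: the abstract does not specify the information theoretic diversity measure, so the third function substitutes a simple greedy letter-bigram coverage heuristic as a stand-in. All function names and the toy word list are hypothetical.

```python
import random
from collections import Counter

def letter_ngrams(word, n=2):
    """Letter n-grams of a word, with '#' marking the word boundaries."""
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def select_most_frequent(freqs, k):
    """Strategy 1: the k most frequent lexemes."""
    return [w for w, _ in Counter(freqs).most_common(k)]

def select_random(freqs, k, seed=0):
    """Strategy 2: k lexemes drawn uniformly at random."""
    return random.Random(seed).sample(sorted(freqs), k)

def select_diverse(freqs, k):
    """Strategy 3 (stand-in): greedily pick the word that adds the most
    letter bigrams not yet covered by the words already chosen."""
    chosen, seen = [], set()
    candidates = sorted(freqs)
    for _ in range(min(k, len(candidates))):
        best = max(candidates, key=lambda w: len(letter_ngrams(w) - seen))
        chosen.append(best)
        seen |= letter_ngrams(best)
        candidates.remove(best)
    return chosen

# Toy corpus frequencies, made up for illustration.
freqs = {"the": 100, "of": 80, "and": 60,
         "thought": 5, "quickly": 3, "rhythm": 1}

print(select_most_frequent(freqs, 3))
print(select_random(freqs, 3))
print(select_diverse(freqs, 2))
```

The contrast is visible even on toy data: frequency-based selection favors short function words, while the coverage heuristic favors words that exercise more distinct spelling patterns.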
Lee, K-T., Melnar, L., Talley, J., & Wellekens, C. J. (2003) “Symbolic speaker adaptation with phone inventory expansion,” 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. (ICASSP ’03), I-I. https://doi.org/10.1109/ICASSP.2003.1198776 [pdf]
This paper further develops a previously proposed adaptation method for speech recognition called symbolic speaker adaptation (SSA). The basic idea of SSA is to model a speaker’s pronunciation as a blend of speech varieties (SVs) – regional dialects and foreign accents – for which the system has existing pronunciation models. The system determines during an adaptation process the relative applicability of those models, yielding a speech variety profile (SVP) for each speaker. Speaker-dependent lexica for recognition are determined from a speaker’s SVP. In this paper, we discuss a series of experiments designed to analyze how the SSA method is affected by SV-balanced training, expanded phone inventories, reduced amounts of adaptation data, and speech from SVs not modeled by the system. The most dramatic improvements were obtained by using expanded (“SV-inclusive”) phone inventories. SSA was also shown to be effective with a very small number of adaptation sentences. And, SSA’s SV blending scheme yields higher accuracy than using a SV classification scheme for speakers of novel (unseen) SVs.
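The idea of deriving a speaker-dependent lexicon from a speech variety profile can be illustrated with a small sketch. It assumes, purely for illustration, that the SVP is a set of non-negative weights over speech varieties and that pronunciation variant probabilities are combined as a weighted linear mixture; the paper's actual blending scheme may differ. The variety names, phone strings, and probabilities are hypothetical.

```python
from collections import defaultdict

def blend_lexicon(svp, sv_lexica):
    """Mix per-variety pronunciation probabilities using SVP weights.

    svp:       {variety: weight}, weights are normalized internally
    sv_lexica: {variety: {word: {pronunciation: probability}}}
    Assumes every variety's lexicon covers the same words.
    """
    total = sum(svp.values())
    blended = defaultdict(lambda: defaultdict(float))
    for sv, weight in svp.items():
        for word, variants in sv_lexica[sv].items():
            for pron, prob in variants.items():
                blended[word][pron] += (weight / total) * prob
    return {word: dict(variants) for word, variants in blended.items()}

# Hypothetical pronunciation models for two speech varieties.
sv_lexica = {
    "standard": {"car": {"k aa r": 1.0}},
    "boston": {"car": {"k aa": 0.8, "k aa r": 0.2}},
}
# Hypothetical profile for a speaker who is mostly, but not fully, "boston".
svp = {"standard": 0.25, "boston": 0.75}

lexicon = blend_lexicon(svp, sv_lexica)
print(lexicon["car"])
```

A blended profile like this lets a speaker who matches no single modeled variety still receive sensible variant probabilities, which is the motivation the abstract gives for blending over hard classification.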
Melnar, L., & Talley, J. (2003). Phone merger specification for multilingual ASR: The Motorola Polyphone Network. In M. J. Solé, D. Recasens, & J. Romero (Eds.), Proceedings of the 15th International Congress of Phonetic Sciences, Barcelona, Spain, August 3-9, 2003 (pp. 1337-1340). ISBN 1-876346-48-5. https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2003/papers/p15_1337.pdf [pdf]
This paper describes the Motorola Polyphone Network (MotPoly), a hierarchical, universal phone correspondence network that defines allowable phone mergers for shared acoustic modeling in multilingual and multi-dialect automatic speech recognition (ML-ASR). MotPoly’s organization is defined by phonetic similarity and other language-independent phonological factors. Unlike other approaches to shared acoustic modeling, MotPoly can be effectively used in systems where computational resources are limited, such as portable devices. Furthermore, it is less constrained by language data availability than other approaches. With MotPoly as part of an overall strategy, Motorola’s Voice Dialog Systems Lab’s ML-ASR team was able to define a set of multilingual acoustic models whose size was only 23% of the largest monolingual model set but whose overall performance was higher than the monolingual models by 1.4 percentage points.
Melnar, L., & Talley, J. (2002). Phone inventory optimization for multilingual automatic speech recognition. The Journal of the Acoustical Society of America, 112(5), 2305-2305. https://asa.scitation.org/doi/10.1121/1.4779285
This paper describes a phone inventory optimization procedure for application in multilingual automatic speech recognition (ASR). The optimization procedure is based on three knowledge sources that act collectively to guide phonological reduction and selection processes: (1) abstract (language‐independent) phonological universals and tendencies that are used in the construction of a hierarchical structure that specifies phone class reduction paths; (2) language‐dependent knowledge that includes information of the targeted languages’ phone inventories and individual phone frequencies in language data resources; (3) acoustic data that provides phone discriminability and similarity metrics. Using the optimization procedure, the phone inventories of six languages, American English, Mandarin Chinese, Egyptian Colloquial Arabic, Japanese, German, and Spanish, were merged to create an inventory consisting of 64 distinct cross‐phonological units. This reduced phone set was used in all training and testing procedures and resources for the recognition of the six targeted languages. Preliminary recognition results are very encouraging: while purely data‐driven approaches to multilingual ASR fail to reach word‐recognition rates comparable to monolingual applications, the use of the optimized phone inventory in our multilingual ASR program yields recognition rates approximating that of monolingual ASR.
Talley, J. (2002). Context dependencies in vowel identification in ablated CVC syllables. The Journal of the Acoustical Society of America, 112(5), 2249-2249. https://asa.scitation.org/doi/10.1121/1.4778943
In previously reported work [Talley, J. Acoust. Soc. Am. 108, 2601 (2000)], novel results from a new perceptual study of human vowel identification under ablation conditions were discussed. That study, which used ten American English (AE) vowels in each of four simple CVC consonantal contexts, found highly significant effects of ablation condition and consonantal context on vowel identifiability. However, little insight was available at the time regarding the specifics of the vowel–context interactions. This paper extends that work providing detailed analysis of vowel identification sensitivity relative to consonantal context under differing ablation conditions.
Lee, K-T., Melnar, L., & Talley, J. (2002). Symbolic speaker adaptation for pronunciation modeling. [pdf]
This paper presents a method of modeling a speaker’s pronunciation of a given language as a blend of “standard” speech and other non-standard speech varieties (regional dialects and foreign accented pronunciation styles) by way of speaker-dependent modification of a lexicon. In this system, a lexicon of Standard American English (SAE) forms, the “canonical” lexicon, is filtered and transformed via a group of speech variety (SV) dependent rule sets into a speaker-specific set of pronunciation variants (and associated probabilities) for use during recognition. The relative importance of these rule sets depends on the speaker’s pronunciation characteristics and is represented by a Speech Variety Profile (SVP) associated with each speaker. A speaker’s individual SVP is acquired through feedback from an adaptation process. Convergence to a speaker’s SVP represents adaptation of the lexicon (symbolic adaptation) to those SV-specific forms that speaker is likely to utter.
Talley, J. (2000b). Vowel perception in varied symmetric CVC contexts. The Journal of the Acoustical Society of America, 108(5), 2601-2601. https://asa.scitation.org/doi/10.1121/1.4743684
In the three‐decade‐long debate over static versus dynamic specification of vowels, perceptual studies in which subjects are tasked with identifying naturally spoken vowels under various ablation conditions have been a mainstay. While not directly producing an understanding of how humans go about recognizing this major subclass of phones, this type of study [e.g., Strange, Jenkins, and Johnson, J. Acoust. Soc. Am. 74, 695–705 (1983)] has provided compelling results which must be accounted for in any successful theory of vowel perception. This paper presents results from yet another perceptual study of human vowel identification under ablation conditions. This study uses CVC syllables spoken rapidly by three male speakers in a carrier sentence. Syllables consist of ten American English vowels in each of four consonantal contexts (b_b, d_d, g_g, and h_d). The conditions studied are silent centers (SC), centers only (CO), and the control condition (full). A very robust hierarchy of full>CO>SC is found. Consonantal contexts also have a clear ordering (h_d>b_b>d_d>g_g) with respect to the ease with which they are perceived. Interesting interactions between vowels and their contexts are also evident.
Talley, J. (2000a). The establishment of Motorola’s Human Language Data Resource Center: Addressing the criticality of language resources in the industrial setting. LREC. http://www.lrec-conf.org/proceedings/lrec2000/pdf/260.pdf [pdf]
Within the human language technology (HLT) field it is widely understood that the availability (and effective utilization) of voluminous, high quality language resources is both a critical need and a critical bottleneck in the advancement and deployment of cutting edge HLT applications. Recently formed (inter-)national human language resource (HLR) consortia (e.g., LDC, ELRA,…) have made great strides in addressing this challenge by distributing a rich array of pre-competitive HLRs. However, HLT application commercialization will continue to demand that HLRs specific to target products (and complementary to consortially available resources) be created. In recognition of the general criticality of HLRs, Motorola has recently formed the Human Language Data Resource Center (HLDRC) to streamline and leverage our HLR creation and utilization efforts. In this paper, we use the specific case of the Motorola HLDRC to help examine the goals and range of activities which fall into the purview of a company-internal HLR organization, look at ways in which such an organization differs from (and is similar to) HLR consortia, and explore some issues with respect to implementation of a wholly within-company HLR organization like the HLDRC.
Martin, G. L., & Talley, J. (1995). Recognizing handwritten phrases from U. S. census forms by combining neural networks and dynamic programming. Journal of Artificial Neural Networks, (2:3), 167-193. https://dl.acm.org/doi/10.5555/226864.226866
Talley, J. (1994). The PEACC method of characterization of dynamic aspects of speech. The Journal of the Acoustical Society of America, 96(5), 3351-3351. http://dx.doi.org/10.1121/1.410637
In the phonetics/speech perception community, the assertion that dynamic aspects of the speech signal are employed in robust speech decoding is not particularly controversial. Though an increasing number of studies are addressing the various dynamic cues of speech, quantitative, analytical research is hampered somewhat by a lack of established methods. Methods for studying dynamic phenomena are much less well developed than those for more static properties. This paper proposes a new method for characterizing dynamic aspects of speech, the Piecewise Exponential Approximation with Continuity Constraints (PEACC) method, to help remedy this state of affairs. As its name suggests, PEACC performs piecewise fitting of exponential segments of the form $\gamma e^{\alpha x} + \beta$, $-\infty < x \le 0$, to the sampled signal. Dynamic programming is utilized in global sequence optimization where MSE is minimized within the solution space permitted by the constraints on continuity. This method has broad applicability and produces low-distortion fits at a specifiable level of detail; however, its principal strength, from the perspective of speech science research, is that its resulting signal transition parameters have direct, intuitive interpretations. The paper concludes with a brief examination of the results of applying PEACC to a corpus of formant track data. [Work supported by NSF.]
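The sketch below illustrates the general flavor of such a fit: dynamic programming over breakpoints, with each segment fitted by least squares to γ·exp(αx) + β, where x ≤ 0 is time measured back from the segment's end. It is a deliberately simplified stand-in, not PEACC itself: it omits the continuity constraints, searches α over a small fixed grid, and takes the number of segments as given. All function names, the grid, and `min_len` are assumptions.

```python
import math


def fit_segment(ts, ys, alphas=(0.5, 1.0, 2.0, 4.0, 8.0)):
    """Fit gamma*exp(alpha*x)+beta with x = t - ts[-1] <= 0.

    For each candidate alpha (ASSUMED grid), gamma and beta follow from
    ordinary linear regression of y on u = exp(alpha*x).
    Returns (sse, gamma, alpha, beta) for the best alpha.
    """
    n = len(ts)
    best = None
    for alpha in alphas:
        us = [math.exp(alpha * (t - ts[-1])) for t in ts]
        mean_u, mean_y = sum(us) / n, sum(ys) / n
        suu = sum((u - mean_u) ** 2 for u in us)
        suy = sum((u - mean_u) * (y - mean_y) for u, y in zip(us, ys))
        gamma = suy / suu if suu > 1e-12 else 0.0
        beta = mean_y - gamma * mean_u
        sse = sum((gamma * u + beta - y) ** 2 for u, y in zip(us, ys))
        if best is None or sse < best[0]:
            best = (sse, gamma, alpha, beta)
    return best


def segment_trajectory(ts, ys, k, min_len=3):
    """Split the samples into k contiguous segments minimizing total squared error.

    Returns (total_sse, [(start, end), ...]) with half-open index ranges.
    NOTE: no continuity constraint is enforced between neighboring segments.
    """
    n = len(ts)
    if n < k * min_len:
        raise ValueError("not enough samples for the requested segments")
    inf = float("inf")
    # cost[j][i]: best error covering samples [0, i) with j segments
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[None] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j * min_len, n + 1):
            for s in range((j - 1) * min_len, i - min_len + 1):
                if cost[j - 1][s] == inf:
                    continue
                total = cost[j - 1][s] + fit_segment(ts[s:i], ys[s:i])[0]
                if total < cost[j][i]:
                    cost[j][i], back[j][i] = total, s
    # Walk the back-pointers to recover the segment boundaries.
    bounds, i = [], n
    for j in range(k, 0, -1):
        s = back[j][i]
        bounds.append((s, i))
        i = s
    bounds.reverse()
    return cost[k][n], bounds
```

Applied to a formant track, each recovered segment's γ, α, and β would play the role of the interpretable transition parameters the abstract mentions; adding the continuity constraints would restrict which neighboring fits may be combined.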
Talley, J. (1994). Neural network-based analysis of cues for vowel and consonant identification. The Journal of the Acoustical Society of America, 95(5), 2922-2922. http://dx.doi.org/10.1121/1.409202
Many static and dynamic features of the acoustic speech signal have been proposed in the literature as cues for identification of phonetic categories. Ultimately, such features’ cue validity is most appropriately studied via well-designed perceptual experiments involving human subjects, but such studies are hindered by their expense (especially in terms of time) and inherent confounds (stimuli must be sufficiently speech-like to garner treatment as such). This paper examines ways in which neural networks (NNs) can be utilized as auxiliary tools in phonetics/speech perception research. Discussion includes the application of NNs to the task of learning relevant speech category discriminations from various restricted characterizations of a corpus of naturally spoken CVC syllables. That task examines learnability as a function of the available featural information. In addition, various “post-mortem” techniques (e.g., transfer function mapping, weight analysis,…) are discussed which, when applied to trained NNs, yield estimates of the cue validity of (ensembles of) features with respect to phonetic category discriminations. These methods cannot be blindly interpreted as producing valid characterizations of human speech perception; however, they represent useful tools that are inexpensive and highly targetable (confounds can be controlled) and can serve as guides to fruitful experiments with human subjects. [Work supported by NSF.]
Brown, J. R., & Talley, J. (1994). Locating faces in color photographs using neural networks. Proceedings: SPIE, Applications of Artificial Neural Networks V(2243), 584-590. https://doi.org/10.1117/12.170007
This paper summarizes a research effort in finding the locations and sizes of faces in color images (photographs, video stills, etc.) if, in fact, faces are present. Scenarios for using such a system include localizing skin for automatic color balancing during photo processing, or serving as a front end, in a customs port-of-entry context, for a system which identifies persona non grata given a database of known faces. The approach presented here is a hybrid system including a neural pre-processor, some conventional image processing steps, and a neural classifier as the final face/non-face discriminator. Neither the training (containing 17,655 faces) nor the test (containing 1829 faces) imagery databases were constrained in their content or quality. The results for the pilot system are reported along with a discussion of ways to improve the current system.
Wall, R., & Talley, J. (1993). Understanding English specification of finite state devices. In Lenguajes naturales y lenguajes formales: actas del IX congreso de lenguajes naturales y lenguajes formales: (Reus, 20-22.12.1993) (pp. 79-98). Promociones y Publicaciones Universitarias, PPU. https://dialnet.unirioja.es/servlet/libro?codigo=5283 [pdf]
This paper describes a system which receives a specification of a regular language, in English (for example, “all strings over the alphabet {a, b} which do not contain the string ‘bab’”) and constructs a representation of a finite-state automaton (FSA) which accepts the language specified. The system uses a strategy for domain-specific natural language understanding (NLU) where syntactic processing is based on a grammar which conflates syntactic, semantic, and pragmatic concerns and relies on a strong underlying semantic model to simplify the parsing task. The system has possible applications as an aid in teaching automata theory and might also be regarded as a miniature system for constructing computer programs in natural language.
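To make the abstract's example concrete, here is a minimal sketch of the kind of automaton such a system might output for “all strings over the alphabet {a, b} which do not contain the string ‘bab’”. It covers only the target FSA, not the natural language understanding step, and the construction (states track the longest prefix of the forbidden string matched so far) and the function names are assumptions, not the paper's representation.

```python
def build_avoidance_dfa(pattern, alphabet):
    """Build DFA transitions for strings that avoid `pattern`.

    State q (0..len(pattern)) means the last q symbols read equal the first
    q symbols of the pattern; state len(pattern) is a non-accepting trap.
    """
    n = len(pattern)
    trans = {}
    for state in range(n + 1):
        for ch in alphabet:
            if state == n:
                trans[(state, ch)] = n  # once the pattern is seen, stay rejected
                continue
            candidate = pattern[:state] + ch
            # Longest suffix of what was just read that is a prefix of the pattern.
            k = min(len(candidate), n)
            while k > 0 and not candidate.endswith(pattern[:k]):
                k -= 1
            trans[(state, ch)] = k
    return trans


def accepts(trans, trap_state, text):
    """Run the DFA; accept unless the trap state is reached."""
    state = 0
    for ch in text:
        if (state, ch) not in trans:
            return False  # symbol outside the alphabet
        state = trans[(state, ch)]
    return state != trap_state


no_bab = build_avoidance_dfa("bab", "ab")
```

For example, `accepts(no_bab, 3, "abba")` holds, while `accepts(no_bab, 3, "ababa")` does not, since the second string contains ‘bab’.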
Talley, J. (1992). Quantitative characterization of vowel formant transitions. The Journal of the Acoustical Society of America, 92(4), 2413-2413. http://dx.doi.org/10.1121/1.404673
This paper presents an acoustic study of vowel formant dynamics and the analysis methods that were developed to carry it out. The main goal of the described study was to bring quantitative, acoustic evidence to bear on competing theories regarding the source(s) of vowel identity specification [W. Strange, J. Acoust. Soc. Am. 85, 2081–2087 (1989)]. A set of 40 CVC syllables is studied: symmetric voiced stop (/bVb/, /dVd/, /gVg/) and “neutral” (/hVd/) contexts × the “monophthongal” vowels of Midwestern American English (/i, ɪ, e, ɛ, æ, ʌ, u, ʊ, o, a/). Three male speakers (speaking normally) contributed two repetitions each. Voice pulse by voice pulse tracks of the first three formant frequencies were measured using GEMS [J. Talley, J. Acoust. Soc. Am. 90, 2274 (A) (1991)] and LPC. PEACC, a new technique for speech coding using exponential pieces, was then applied to the trajectories to automatically segment them into transitions and characterize the segments in terms of intuitive parameters: Δf (“locus-to-target” distance), Δt (duration), α (curvature), and f0 (“target” frequency). This paper discusses the resulting data’s characteristics and the results from analyzing initial and final transitions with respect to intracategory similarity and intercategory distinctiveness using a variety of interesting category boundaries. [Work supported by NSF.]
Talley, J. (1991). Graphical editor for marking spectrograms (GEMS). The Journal of the Acoustical Society of America, 90(4), 2274-2274. https://doi.org/10.1121/1.401197
This paper presents a new computer system, GEMS (or the Graphical Editor for Marking Spectrograms), for interactively collecting frequency and timing data from speech samples. GEMS was originally developed as a data collection aid for on‐going research in vowel formant transition patterns, but it potentially has wider applicability. It takes digitized speech as input, generates a spectrographic display as the background, and yields as output time, frequency, and continuity data on the points (and sequences) marked as interesting by the user. This system, as its name suggests, is primarily a graphical editor for such markings. However, given the editing substrate, it is possible to incorporate less than perfect algorithms (true of most) to automate selection/marking of interesting features. The user immediately sees the results of such automatic techniques (e.g., LPC analysis) and has an opportunity to modify the outcome in context before proceeding. A middle road is, thereby, taken between completely automatic, but error prone, data production, and time consuming manual techniques of data collection. This framework can also serve as an environment for improvement of algorithms with a learning component. The system is currently implemented in Pascal under VAX/VMS. It is designed with a client/(DSP) server architecture, and implements some automated techniques such as a new heuristic voice pulse localization algorithm. [Work supported by NSF.]
Wittenburg, K., Weitzman, L., & Talley, J. (1991). Unification-based grammars and tabular parsing for graphical languages. Journal of Visual Languages & Computing, 2, 347-370. https://doi.org/10.1016/S1045-926X(05)80004-7
In this paper we present a unification-based grammar formalism and parsing algorithm for the purposes of defining and processing generalizations of concatenative languages such as those found in two-dimensional graphical domains. In order to encompass languages whose elements are combined by operations other than simple string concatenation, we extend the PATR unification-based grammar formalism with functionally specified constraints. In order to parse with these grammars, we extend tabular parsing methods and discuss a bottom-up algorithm that can process input incrementally in a maximally flexible order. This work is currently being applied in the interpretation of handsketched mathematical expressions and structured flowcharts on notebook computers and interactive worksurfaces.
Beata Walesiak
- Beata Walesiak’s ORCID ID
- Beata Walesiak’s Google Scholar
- Beata Walesiak’s Scopus Author ID
Walesiak, B., & Talley, J. (2024). A pronunciation and speech coaching (PSC) apps search engine for teachers: A research-driven solution [Conference paper]. 17th International Conference on Native and Non-Native Accents of English. Accents 2024, Łódź, Poland.
Although language technologies for L2 pronunciation pedagogy are diverse and increasingly sophisticated, some fail to deliver targeted content or genuinely personalised feedback (Fouz-González, 2024; Walesiak & Talley, 2024). The challenges related to incorporating apps into teaching pronunciation are discussed in the literature (García et al., 2020; Inceoglu, 2022), with teachers indicating that they struggle to know which technologies to include in their pedagogy (Metruk, 2022).
To assist teachers in finding suitable resources to meet their needs, we have embarked on a research project (Walesiak & Talley, 2024) devoted to the assessment mechanisms and feedback affordances employed in widely available pronunciation and speech coaching (PSC) apps, i.e. apps which aim to improve users’ articulation, pronunciation or spoken communication, sometimes via utilization of speech recognition (SR), text-to-speech (TTS) and/or Artificial Intelligence (AI) technologies. Some of these apps evaluate users’ spoken attempts and provide feedback or suggestions for improvement, while others may focus solely on clickable practice material. In the talk, we present how a sequential research design has been employed, beginning with a qualitative phase that included investigating a range of mobile and web PSC apps, followed by a quantitative analysis of a selected subset.
The talk extends prior work on Mobile-Assisted Pronunciation Training affordances (Walesiak, 2021) by introducing a research-driven solution for educators that allows teachers to search for the affordances (Sobkowiak, 2012) and other characteristics of PSC apps, helping them find apps which will appropriately support their didactic needs. The PSC apps search engine currently selects from Android and web apps based upon their content and feedback types. By filling an information gap regarding mobile and web apps, the tool empowers educators to better assess app suitability for practice in class or outside of school settings, encouraging a more informed, research-aligned approach to pronunciation instruction.
Talley, J., & Walesiak, B. (2024). Comprehensive visual and auditory feedback in support of teaching/learning pronunciation: Introducing Accent Explorer [Conference paper]. 17th International Conference on Native and Non-Native Accents of English. Accents 2024, Łódź, Poland.
In examining methods of reducing intelligibility-impacting pronunciation issues, increasing evidence indicates that explicit pronunciation instruction (PI) and corrective feedback (CF) are beneficial (Lee, Jang, and Plonsky, 2015; Sardegna and McGregor, 2022). While providing one-on-one PI and CF to each student, focusing on the student’s individual pronunciation challenges, is ideal, most pronunciation learning contexts (e.g., university pronunciation classes) involve relatively high student-to-teacher ratios, making extensive one-on-one PI and/or targeted CF prohibitively time consuming.
Computer-assisted pronunciation training (CAPT) offers the promise of mitigating that time crunch. To realize that promise, CAPT needs to serve as a force multiplier for the teacher – helping learners to understand their individual pronunciation issues, offering relevant practice opportunities, and providing useful CF, where “useful” implies that the feedback is targeted and actionable (providing insight on how to improve). Unfortunately, recent research (Walesiak & Talley, 2024) finds that relatively few of the currently available CAPT apps offer significant amounts of targeted/actionable feedback. Some ambitious teachers attempt to fill that feedback gap via available general purpose tools (e.g., Praat, Google speech-to-text, Audacity,…), but the set-up and operation of such tools can be daunting for students, and their outputs (spectrograms, waveforms,…) can be difficult for non-experts to interpret.
This talk discusses and illustrates Accent Explorer (AE) – a new tool designed specifically to help make individualized PI and CF a more manageable endeavor. AE does not attempt to be a pronunciation course, nor is it an instructional methodology. It is just a tool which, via its extensive visualization (and auditorialization) of significant pronunciation related phenomena and its extensive AI-supported annotation capabilities, aims to facilitate student understanding of the various components of accent (and the results of efforts to modify them). AE additionally provides some student management dashboard functionality for teachers. While its AI-based functionality is integral to the attempt to serve as a force multiplier for the teacher, AE is intentional with respect to maintaining teachers’ agency regarding their students’ pronunciation education – i.e., it attempts to assist, not to replace, the teacher.
We will survey the range of affordances incorporated into AE’s student and teacher apps. These include, among others, student/teacher sharing of recordings/feedback, detailed (supra-)segmental issue call-outs, visualization/auditorialization of prosodic elements, narrative feedback regarding observed issues with suggested mitigation strategies, and summarization in support of (diagnostic, formative, and/or summative) assessments by the teacher. Active discussion of potential uses, and missed opportunities, will be encouraged.
Walesiak, B., & Talley, J. (2024). Assessment and feedback mechanisms in pronunciation and speech coaching apps [Conference paper]. 15th annual Pronunciation in Second Language Learning and Teaching Conference. PSLLT 2024, Ames, United States.
Recent and ongoing advances in language technologies (such as chatbots, Automatic Speech Recognition, Text-to-Speech, and voice conversion) present exciting opportunities for L2 pronunciation improvement (Gottardi et al., 2022) and potentially provide precise and varied pronunciation feedback (Levis and Rehman, 2022). This research investigates the assessment and feedback mechanisms employed in a wide range of pronunciation and speech coaching (PSC) apps, extending prior work on the affordances of Mobile-Assisted Pronunciation Training (Walesiak, 2021). Its point of departure is a qualitative examination of over 50 web and/or Android apps (some of which are AI-based), focusing on the type of pronunciation assessments they employ and on the quality of feedback they provide, including their feedback modalities (textual, visual, and/or auditory). We then systematically structure this information as a comprehensive matrix that categorises the affordances of the PSC apps with respect to their feedback type (Henrichsen, 2020), the hierarchical priority level (Kang and Hirschi, 2023), their incorporation of High Variability Phonetic Training, and attention to intelligibility. The examined PSC apps recruit a range of feedback strategies for varied assessment targets, yet we frequently observe feedback which is not particularly actionable (e.g., binary, scale-based, raw numeric graphs) and is often directed at segmental targets. The picture that emerges from our systematic analysis highlights the fact that a number of research-indicated assessment targets and feedback strategies remain underrepresented in the current PSC app landscape. We discuss the implications and opportunities of this for educators, researchers, and developers alike.
Talley, J., & Walesiak, B. (2024). Facilitation of individualized L2 pronunciation assessment and training via novel teacher-assistive technology [Conference paper]. 15th annual Pronunciation in Second Language Learning and Teaching Conference. PSLLT 2024, Ames, United States.
The ideal of fully-individualized, dynamically-adjusted L2 pronunciation learning (Couper, 2022; Derwing and Munro, 2015) crucially depends on diagnostic assessments, where goal-setting and goal-focused activities are enabled by an initial needs assessment with adjustments based on periodic formative assessments. Yet Couper’s research shows “very few teachers use diagnostics for pronunciation” (2022, p. 180). Conducting diagnostic assessments, developing/adjusting plans based upon them, and communicating those results/plans is perceived as prohibitively time consuming. We discuss (and illustrate) how a new pronunciation teaching support system, Accent Explorer (AE), potentially helps remediate that time crunch. Whenever a recording made in AE’s student app is submitted to AE’s teacher app, it is extensively annotated with AI-detected, importance-ranked (supra-)segmental issues. The flagged issues are formulated as suggestions for the teacher which may be accepted, adjusted, augmented, or discarded. The teacher-arbitrated issues (plus any added notes) are returned to the student as targeted feedback which the student can fully explore (graphically and auditorily, in addition to textual explanations). The identified issues are also logged, allowing generation of (date range-constrained) summaries of pronunciation issues, supporting diagnostic assessment. Suggestions for optimizing pedagogical practices via this new technology’s assessment (and other) affordances will be offered.
Archer, G., Červinková Poesová, K., Duckinoska-Mihajlovska, I., Rocha, A. P. B., & Walesiak, B. (forthcoming). ‘Finding your tribe’: How membership of a pronunciation-focused teacher association can positively impact classroom instruction and beyond. In A. Kirkova-Naskova, & E. Tergujeff (Eds.), Achievements in Second Language Pronunciation: Good Practices for L2 Teaching and Learning. Cambridge University Press.
Abstract soon
Zawadzki, Z., Challis, K., Goodale, E., Guskaroska, A., Walesiak, B., & Levis, J. (2023) “The Inspiration for Creating the Best of Teaching Tips”, Pronunciation in Second Language Learning and Teaching Proceedings 1. https://doi.org/10.31274/psllt.16935
This introduction gives a brief overview of the Pronunciation in Second Language Learning and Teaching (PSLLT), describes what teaching tips are, and explains the motivation behind taking on the creation of this collection.
Walesiak, B. (2023). Supporting receptive and productive pronunciation of accents through online tools. In D. Bullock (Ed.), IATEFL 2022 Belfast Conference Selections. (pp. 203–205). Faversham: IATEFL. ISBN: 978-1-912588-44-2. https://www.iatefl.org/resources/iatefl-conference-selections-2022-printed-edition
Students enroll in (online) courses, hoping not only to understand native speakers better but often to speak more like them. Many do not seem to be aware of the fact that speakers who are perceived as strongly accented can also be highly intelligible (Levis, 2020) and that awareness of diversity of accents in today’s world is key to learning and teaching pronunciation, not the standard model exclusively.
Walesiak, B. (2021). Mobile apps for pronunciation training. Exploring learner engagement and retention. In A. Kirkova-Naskova, A. Henderson & J. Fouz-González. (Eds.), English Pronunciation Instruction: Research-based insights (pp. 357-384). John Benjamins. https://doi.org/10.1075/aals.19.15wal
This chapter explores adult learners’ perceptions towards using apps for pronunciation training and the engagement and retention rates of mobile technologies for pronunciation practice. The study focuses on five apps that address different areas of pronunciation selected by the teacher-researcher. The findings show that learners are keen on using pronunciation apps to learn pronunciation, but the frequency of self-reported mobile app use and time devoted to their use dwindle without the teacher’s guidance. A discussion of the learners’ engagement with and retention of the apps is offered, together with a number of recommendations to facilitate the integration of apps into pronunciation instruction.
Walesiak, B. (2019). English After RP – Standard British Pronunciation Today (Review). Speak Out! Journal of the IATEFL Pronunciation Special Interest Group, 61, 44-47. ISSN: 2313-7703. https://pronsig.iatefl.org/journal/
Walesiak, B. (2018). Beyond Repeat After Me – Teaching pronunciation to English learners (Review). Speak Out! Journal of the IATEFL Pronunciation Special Interest Group, 59, 43-45. ISSN: 2313-7703. https://pronsig.iatefl.org/journal/
Walesiak, B. (2017). Mobile pronunciation apps: a personal investigation. Speak Out! Journal of the IATEFL Pronunciation Special Interest Group, 57, 16-28. ISSN: 2313-7703. https://www.academia.edu/34720473/Mobile_pron_apps_a_personal_investigation
In this article, I provide a subjective list of pronunciation apps that might be useful when practising pronunciation in class, teaching online or assigning out-of-class self-study. This is followed by a few practical tips concerning the ways you can incorporate apps in the process of English language teaching. Whether you teach pronunciation as a separate skill or simply integrate it spontaneously into your lessons, mobile apps can definitely help you create a stimulating classroom environment, improve the quality and effectiveness of your teaching and reinforce to students the need to learn pronunciation.
Walesiak, B. (2015). Fjuczersy, cudofiksingi, market mejkerzy – samples of a speech corpus of the Polish stock market sociolect. Language contact in specialist speech. In D. Stanulewicz (Ed.), Beyond Philology (pp. 59-75). Gdańsk: Wydawnictwo Uniwersytetu Gdańskiego. ISSN: 1732-1220. https://fil.ug.edu.pl/sites/default/files/_nodes/strona-filologiczny/33797/files/beyond_philology_no_12_2015.pdf
Foreign language awareness helps break down communication barriers in the area of business and information in Poland. Market participants, in particular, choose to interweave their Polish utterances with phrasings or loan words of English origin in order to maintain effective and efficient communication. Changes take place irrespective of the efforts made to discourage English borrowings from their transgression into everyday usage, which leaves much room for research in the field of Englishization of business communication in Poland, as well as its effects on the culture of the Polish language as a whole. The article comments on some aspects of this communication on the market in Poland. It also lists some samples of the English borrowings isolated from a speech corpus related to the area of the capital markets and presents informants’ views on interferences recorded in the process of isolating the spoken data.
Dziczek-Karlikowska, H., & Mikołajewska, B. (2015). Computer-assisted awareness raising of L2 phonology: pronunciation in commercials – a pilot study. In A. Turula, B. Mikołajewska & D. Stanulewicz (Eds.), Insights into Technology Enhanced Language Pedagogy (pp. 163-174). Frankfurt: Peter Lang. ISBN: 9783631656693. https://doi.org/10.3726/978-3-653-04995-4
Linguists agree that TV and the Internet are now the sources of the influx of English-originating phrases in today’s Polish, and that commercials in particular may influence the respondents’ vocabulary quite significantly. Advantages of the process notwithstanding, there are also a few problems related to this. Copywriters resort to the use of various linguistic devices – many of them of English origin – in order to meet the demands of the brand. In doing so, they use erroneous pronunciation to achieve marketing goals, which may have a negative effect on the perception of the lexeme and its performance by Polish learners of English (on different levels of advancement). The paper starts by analysing the phenomena described above. Then it goes on to present a computer-aided Moodle-based pronunciation class, the intricacies of its design and some ready-made tasks. It closes with the feedback obtained from participants (students of English Studies and Open University students) in the course of the pilot study.
Co-edited volumes
Zawadzki, Z., Challis, K., Goodale, E., Guskaroska, A., Walesiak, B., & Levis, J. (2023). Pronunciation in Second Language Learning and Teaching Proceedings 1. https://doi.org/10.31274/psllt.16935
Turula, A., Mikołajewska, B., & Stanulewicz, D. (2015). Insights into Technology Enhanced Language Pedagogy. Frankfurt: Peter Lang. ISBN: 9783631656693. https://doi.org/10.3726/978-3-653-04995-4
Łukasik, M., & Mikołajewska, B. (2014). Języki specjalistyczne wczoraj, dziś i jutro. Warszawa: Wydawnictwo Naukowe IKL@. ISBN: 978-83-64020-12-4. http://sn.iksi.uw.edu.pl/wp-content/uploads/sites/306/2018/09/SN-17-Marek-%C5%81ukasik-Beata-Miko%C5%82ajewska-red.-J%C4%99zyki-specjalistyczne-wczoraj-dzi%C5%9B-i-jutro.pdf
Patents
US-9635438-B2 Providing secondary content to accompany a primary content item
Issued: 2017 US (also KR, CA, MX, EP, PCT)
Assignee: General Instrument (acquired by Arris)
While a user views a primary content item (for example, a movie on a television screen), secondary content items are selected and presented to the user, either on the same screen or on a screen of the user’s companion device. To choose selections that are relevant to the user’s current interests, the selection process considers information beyond the realm of primary and secondary content. Over time, the selection process learns to make more relevant selections by monitoring selection choices made by other systems.
US-9081868-B2 Voice web search
Issued: 2015 US
Assignee: Google
A search system will receive a voice query and use speech recognition with a predefined vocabulary to generate a textual transcription of the voice query. Queries are sent to a text search engine, retrieving multiple web page results for each of these initial text queries. The collection of the keywords is extracted from the resulting web pages and is phonetically indexed to form a voice query dependent and phonetically searchable index database. Finally, a phonetically-based voice search engine is used to search the original voice query against the voice query dependent and phonetically searchable index database to find the keywords and/or key phrases that best match what was originally spoken. The keywords and/or key phrases that best match what was originally spoken are then used as a final text query for a search engine. Search results from the final text query are then presented to the user.
US-8838435-B2 Communication processing
Issued: 2014 US
Assignee: General Instrument
Disclosed are methods and apparatus for processing linguistic expressions (e.g., opinionated text documents). The linguistic expressions are processed by, firstly, detecting topics of interest discussed in the linguistic expressions. The sentiment, or sentiments, of an originator with respect to each of the topics detected in the linguistic expressions is then assessed. The originators are then grouped (or clustered) into one or more groups based on the similarities between the originators’ respective sets of detected topics and corresponding sentiments. Semantic information is then associated with a given group. Finally, for a given member of a given group, a profile is created or updated. This profile comprises attributes that may be based on a degree of membership of the given member to the given group and the semantic information associated with the given group.
US-7818166-B2 Method and apparatus for intention based communications for mobile communication devices
Issued: 2010 US (also PCT)
A method and apparatus for intention-based communications in a mobile communication device is disclosed. The method may include receiving an input from a user of the mobile communication device, converting speech portions in the user’s input into linguistic representations, generating a phoneme lattice based on the linguistic representations, scoring stored intention n-grams against the generated phoneme lattice, scoring intentions from the intention n-grams, determining the highest scoring intention, and determining whether the highest scoring intention is above a predetermined threshold, wherein if the highest scoring intention is above the predetermined threshold, the determined intention is executed.
US-7319958-B2 Polyphone network method and apparatus
Issued: 2008 US
Assignee: Motorola
Acoustic phones (preferably drawn 12 from a plurality of spoken languages) are provided 11. A hierarchically-organized polyphone network (20) organizes views of these phones of varying resolution and phone categorization as a function, at least in part, of phonetic similarity (14) and at least one language-independent phonological factor (15). In a preferred approach, a unique transcription system serves to represent the phones using only standard, printable ASCII characters, none of which comprises a special character (such as those characters that have a command significance for common script interpreters such as the UNIX command line).
US-7181397-B2 Speech dialog method and system
Issued: 2007 US (also EP, CN, ES, DE, AT, PCT)
Assignee: Motorola (acquired by Google)
An electronic device (300) for speech dialog includes functions that receive (305, 105) a speech phrase that comprises a request phrase that includes an instantiated variable (215), generate (335, 115) pitch and voicing characteristics (315) of the instantiated variable, and perform speech recognition (319, 125) of the instantiated variable to determine a most likely set of acoustic states (235). The electronic device may generate (335, 140) a synthesized value of the instantiated variable using the most likely set of acoustic states and the pitch and voicing characteristics of the instantiated variable. To disambiguate (425, 430) a newly received instantiated variable, the electronic device may use a table of previously entered values of variables that have been determined to be unique, in which each value is associated with a most likely set of acoustic states and with the pitch and voicing characteristics determined at the receipt of that value.
US-5857173-A Pronunciation measurement device and method
Issued: 1999 US (also CN, JP)
Assignee: Motorola (acquired by Google)
Upon selection of an expression for pronunciation training, a look-up operation is performed in a speaker database (15) to obtain a predetermined model for comparison with a voice of a user received at an input (11). A speech modeling element models speech of a native speaker. The voice input is applied to the modeling element (102-107) and an analysis is carried out of the comparison, in correlation and in duration, between a phoneme or sub-word of the input and a phoneme or sub-word of the native speaker to provide a score, including a score for the correlation and a score for the duration. The score is analyzed with respect to a score for a predetermined speaker in an analysis element (40). An indicator device (16) coupled to the output of the analysis element indicates the result in a graphical illustration. A tracking tool indicates state of progress of the voice of the speaker.
The talks listed below are a selection of presentations, webinars, workshops, courses, meet-ups, and other events:
Jim Talley
- (Dec 2024). Comprehensive visual and auditory feedback in support of teaching/learning pronunciation: Introducing Accent Explorer, 17th International Conference on Native and Non-Native Accents of English. Accents 2024, Łódź, Poland.
- (Dec 2024). A pronunciation and speech coaching (PSC) apps search engine for teachers: A research-driven solution, 17th International Conference on Native and Non-Native Accents of English. Accents 2024, Łódź, Poland.
- (Sep 2024). Facilitation of individualized L2 pronunciation assessment and training via novel teacher-assistive technology, 15th annual Pronunciation in Second Language Learning and Teaching Conference (PSLLT), Iowa State University, Ames, US.
- (Sep 2024). Assessment and feedback mechanisms in pronunciation and speech coaching apps, 15th annual Pronunciation in Second Language Learning and Teaching Conference (PSLLT), Iowa State University, Ames, US.
- (May 2023). Identification of non-native English speakers’ L1s via patterns of prosodic feature deviance from native speaker norms. Talk for Rev.com, Austin, TX. May 11, 2023.
- (Feb 2022). PARP at its simplest: A simple 2D graphical explanation of the PARP Method. Presentation for the Austin Deep Learning Meetup. Feb. 1, 2022. Austin, TX.
- (Aug 2019). Ethics in AI panel discussion. Panelist for Austin Deep Learning, Aug 13, 2019. https://youtu.be/ueingD-R2sw?t=2752
- (Feb 2011). Towards a brighter future for cross-disciplinary ASR. Invited talk given at the Bielefeld Workshop on Developmental Speech Recognition at CITEC, Bielefeld University, Germany. February 17, 2011. http://cit-ec.uni-bielefeld.de/en/events/bielefeld-workshop-developmental-speech-recognition [abstract]
- (Dec 1998). The Voice User Interface (VoiceUI) system. Internal presentation for Motorola Research on a system that I built. Dec. 18, 1998. Austin, TX / Schaumburg, IL.
- (1998). In Transition: The description and analysis of CVC formant trajectories. Linguistic Colloquy for the Department of Linguistics, Univ. of Kansas. Lawrence, KS.
- (Apr 1997). Blackboard technology as a basis for multimodal HCI research. Talk at the Center for Human-Computer Communication at the Oregon Graduate Institute of Science and Technology. Apr. 21, 1997. Beaverton-Hillsboro, Oregon.
- (Oct 1989). Syllable structure in Brazilian Portuguese: An argument for Ambisyllabicity. Talk given at the II Colloquium on Hispanic and Luso-Brazilian Literatures and Romance Linguistics, University of Texas, Austin, TX. Oct. 14, 1989.
- and more
Beata Walesiak
- (Dec 2024). A pronunciation and speech coaching (PSC) apps search engine for teachers: A research-driven solution, 17th International Conference on Native and Non-Native Accents of English. Accents 2024, Łódź, Poland.
- (Dec 2024). Comprehensive visual and auditory feedback in support of teaching/learning pronunciation: Introducing Accent Explorer, 17th International Conference on Native and Non-Native Accents of English. Accents 2024, Łódź, Poland.
- (Sep 2024). Assessment and feedback mechanisms in pronunciation and speech coaching apps, 15th annual Pronunciation in Second Language Learning and Teaching Conference (PSLLT), Iowa State University, Ames, US.
- (Sep 2024). Facilitation of individualized L2 pronunciation assessment and training via novel teacher-assistive technology, 15th annual Pronunciation in Second Language Learning and Teaching Conference (PSLLT), Iowa State University, Ames, US.
- (Dec 2023). AI for pronunciation learning – do apps teach accents?, 16th International Conference on Native and Non-Native Accents of English, Accents2023, University of Łódź, Poland.
- (Oct 2023). English Pronunciation with the Support of AI. Intensive course at UOUW – Open University at University of Warsaw, Poland.
- (May 2023). ChatGPT for teachers of pronunciation? Webinar, CATESOL Teaching of Pronunciation Interest Group.
- (Oct 2022). Speech Analyzer as an AI-based in-class exam predictor. Talk, PronSIG Online Conference.
- (May 2022). Supporting receptive and productive pronunciation of accents through online tools. Presentation, Pronunciation Showcase Day at IATEFL 2022, Belfast, Ireland.
- (Jan 2022). Using mobile apps for pronunciation training. Workshop, PronSwap, Université Grenoble Alpes, France.
- (Dec 2021). Use of mobile apps and other technologies for prosody training. 14th International Conference on Native and Non-Native Accents of English, Accents2021, University of Łódź, Poland.
- (Nov 2021). Didactics of English phonetics using digital tools and mobile applications. Online workshop, University of Rzeszów, Poland.
- (Oct 2020). Technology for pronunciation training and accents? Poster presentation, PronSIG Online Conference, Scotland.
- (Dec 2019). Adult learners on a mobile-assisted pronunciation course: needs analysis and post-course feedback. 13th International Conference on Native and Non-Native Accents of English, Accents2019, University of Łódź, Poland.
- (Nov/Dec 2018). Mobile Pronunciation Apps. 12th International Conference on Native and Non-Native Accents of English Accents2018, University of Łódź, Poland.
- (Jun 2018). Mobile apps for teaching and learning English pronunciation. Presentation, The Flying School of Linguistics Applied, University of Warsaw, Poland.
- (Nov 2017). Teaching Pron with Apps. Plenary talk, 4th International Conference on ESP, EAP, EMI in the Context of Higher Education Internationalization, National University of Science and Technology in Moscow.
- and more