ILLS 11: Some tools for Digital Linguistics

I will be presenting a talk titled 'Some tools for digital linguistics' at the 11th Annual Meeting of the Illinois Language & Linguistics Society, being held at the University of Illinois at Urbana-Champaign from April 26–28, 2019. Check out the abstract below!

Abstract: Some tools for digital linguistics

This talk presents several web-based tools developed to assist linguists with data management and manipulation. It discusses some of the principles underlying these tools and how they support data longevity and Big Data approaches to linguistics, and it offers a brief live demonstration of several of the tools.

Linguists face a number of pervasive problems in managing their linguistic data and putting that data to productive use in their research:

  • Most software for linguistics is platform-specific (i.e. Windows- or Mac-only)
  • Existing linguistics software does not synchronize data across computers easily (or at all)
  • Backup options are limited, and few cloud-based backup services exist
  • Different software programs use different formats for storing data; switching software means converting between data formats (and data is sometimes lost in the process)
  • Most linguistics software is task-specific, designed to accomplish just one thing; switching tasks requires switching programs, and manually keeping data in sync from one program to the next
  • Datasets are not easily citable or shareable with existing technologies

Solving these problems is the domain of digital linguistics: the science of digital data management for linguistics, including the digital storage, representation, manipulation, and dissemination of linguistic data, crucially informed by ethical considerations and best practices in documentary linguistics. Documentary linguists have recognized the need for solutions to these issues since at least Bird & Simons (2003), who lay out seven principles for data management in documentary linguistics, yet little attention has been paid to how such principles should be implemented. In other words, how can these desiderata best be met in practice?

I suggest that one of the best ways to address these concerns is for linguists to begin using the modern web platform to its fullest, taking advantage of existing solutions to similar data management problems on the web. Many of the hurdles listed above are already well known in web software development, and numerous robust solutions to them exist. In this talk I show how adopting these technologies goes a long way toward solving the problems of digital linguistics. I present the Data Format for Digital Linguistics (DaFoDiL) (Hieber 2019), a proposed JSON-based format for storing and disseminating linguistic data on the web, and explain its advantages. I then demonstrate several web-based tools that use this format to manage linguistic data, including a transliteration tool, a tool for converting linguistic data between formats (e.g. FLEx ↔ ELAN), and a lexicon management tool.
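To give a rough sense of what storing linguistic data as JSON can look like, here is a minimal sketch that encodes one glossed utterance (Swahili *ninakupenda* 'I love you') as nested JSON objects using Python's standard `json` module, then renders it as Leipzig-style gloss lines. The field names (`transcription`, `translation`, `words`, `morphemes`, `form`, `gloss`) and the helper functions `to_json`, `from_json`, and `gloss_lines` are hypothetical illustrations, not the actual DaFoDiL schema or any tool's API; the DaFoDiL specification (Hieber 2019) defines the real structure.

```python
import json

# A hypothetical record for one glossed Swahili utterance.
# The field names are illustrative only and are NOT the DaFoDiL schema.
utterance = {
    "transcription": "ninakupenda",
    "translation": "I love you",
    "words": [
        {
            "transcription": "ninakupenda",
            "morphemes": [
                {"form": "ni", "gloss": "1SG"},
                {"form": "na", "gloss": "PRS"},
                {"form": "ku", "gloss": "2SG.OBJ"},
                {"form": "pend", "gloss": "love"},
                {"form": "a", "gloss": "FV"},
            ],
        }
    ],
}


def to_json(record: dict) -> str:
    """Serialize a record as indented, human-readable JSON text."""
    # ensure_ascii=False writes non-ASCII characters (e.g. IPA) as-is
    # rather than as \uXXXX escape sequences.
    return json.dumps(record, ensure_ascii=False, indent=2)


def from_json(text: str) -> dict:
    """Parse JSON text back into nested dicts and lists."""
    return json.loads(text)


def gloss_lines(record: dict) -> tuple[str, str]:
    """Build hyphen-joined morpheme and gloss lines, Leipzig style."""
    forms = [
        "-".join(m["form"] for m in word["morphemes"])
        for word in record["words"]
    ]
    glosses = [
        "-".join(m["gloss"] for m in word["morphemes"])
        for word in record["words"]
    ]
    return " ".join(forms), " ".join(glosses)


if __name__ == "__main__":
    text = to_json(utterance)
    # Parsing the serialized text recovers the original record exactly.
    assert from_json(text) == utterance
    forms, glosses = gloss_lines(utterance)
    print(forms)    # ni-na-ku-pend-a
    print(glosses)  # 1SG-PRS-2SG.OBJ-love-FV
```

Because a record like this is just plain UTF-8 text, it is platform-independent and can be backed up, synced, and served over the web like any other JSON resource. A shared structure of this kind is also one way to ease format conversion: a converter only has to map each format (say, FLEx or ELAN) to and from the shared representation, rather than between every pair of formats directly.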


  • Bird, Steven & Gary Simons. 2003. Seven dimensions of portability for language documentation and description. Language 79(3): 557–582.
  • Hieber, Daniel W. 2019. Data Format for Digital Linguistics. DOI:10.5281/zenodo.1438589