StatFSM 2016

Call for Papers

StatFSM 2016: ACL Workshop on Statistical Natural Language Processing and Weighted Automata

12 August, 2016, Berlin, Germany
Deadline for paper submissions: 8 May, 2016 (11:59pm GMT -12)

This workshop is endorsed by SIGFSM, the Special Interest Group on Finite State Automata and Natural Language Processing, and by OCR-D, the DFG coordination project for the improvement of OCR methods.

Workshop description

The past 20 years have seen a fundamental paradigm shift in the field of automated natural language processing: the field was long dominated by rule-based techniques, but the vast majority of contemporary approaches are now based on statistical models. Many classes of statistical models, such as Hidden Markov Models, have direct connections to graph theory and automata theory. Open research questions remain, however, regarding the formal relation between automata and other popular statistical models such as Conditional Random Fields or Support Vector Machines.

The purpose of the workshop is to bring together researchers interested in statistical natural language processing and in the theory and application of automata. While the interests and methods of these different communities overlap considerably, there has been little institutional recognition of shared problems and techniques.

Special Theme: Automata-based techniques in Optical Character Recognition

Increasing efforts by libraries and publishing houses to digitize sources not originating in electronic form, and the resulting vast quantity of digitally available books, have led in recent years to a commensurate demand for high-quality, flexible, and cost-efficient text transcription techniques. Optical Character Recognition (OCR) is of great interest in this regard. Paralleling the increased use of OCR techniques by text providers, interest in computational linguistic research on the topic has grown as well, since many typical OCR-related tasks touch on the discipline's core issues. Our proposed special theme aims to reflect the growing interest in OCR-related topics in computational linguistics and the digital humanities on the one hand, and to raise awareness of the associated challenges among the automata research community on the other.

Focus of content

We invite researchers to submit papers containing substantial, original, and unpublished research, potentially including strong work in progress. Appropriate topics include (but are not limited to) the following:

  • weighted automata, their theory and applications,
  • statistical NLP, in particular approaches using finite-state techniques,
  • results concerning the relation of statistical models and weighted automata,
  • automata-based formalizations or implementations of statistical methods,
  • machine learning approaches relating to the other topics,
  • machine learning of finite-state models of natural language,
  • systems and frameworks for OCR/OLR with a connection to automata-based methods,
  • statistical approaches to automated page segmentation and document analysis,
  • supervised or unsupervised extraction of lexica, language models, or error models for OCR post-correction, and
  • systems and frameworks for post-correction or post-segmentation of OCR output texts, especially those making use of weighted automata.


All submissions should follow the ACL 2016 style guidelines and must be in PDF format. Style files are available for download from the ACL 2016 website.

Long papers, which describe substantial, original, completed, and unpublished work, may consist of up to eight (8) pages of content, plus references. Short papers, which report focused contributions, ongoing research, negative results, or system descriptions, may consist of up to four (4) pages of content, plus references.

Reviewing will be double-blind, and thus no author information should be included in the papers; self-reference should be avoided as well. Papers that do not conform to these requirements will be rejected without review. Accepted papers will appear in the workshop proceedings.

Papers should be submitted electronically using the Softconf START conference management system. Please choose the appropriate submission type from the submission page.

Submissions must be uploaded by the submission deadline of 8 May, 2016 (11:59pm GMT -12).