Recent work has identified properties of pretrained self-attention models that mirror those of dependency parse structures. In particular, some self-attention heads correspond well to individual dependency types. Inspired by these developments, we propose a new competitive mechanism that encourages these attention heads to model different dependency relations. We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora and the masked language modeling task. Experimental results show that UDGN achieves very strong unsupervised dependency parsing performance without gold POS tags or any other external information. The competitive gated heads show a strong correlation with human-annotated dependency types. Furthermore, the UDGN can also achieve competitive performance on masked language modeling and sentence textual similarity tasks.
2022-01-01
Annual Meeting of the Association for Computational Linguistics (published)
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers) (published)
Syntax is fundamental to our thinking about language. Failing to capture the structure of input language could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with an incremental parser and maintains the conditional probability setting of a standard language model (left-to-right). To train the incremental parser and avoid exposure bias, we also propose a novel dynamic oracle, so that SOM is more robust to wrong parsing decisions. Experiments show that SOM can achieve strong results in language modeling, incremental parsing, and syntactic generalization tests while using fewer parameters than other models.
2021-06-01
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (published)
Pretrained language models have significantly improved the performance of downstream language understanding tasks, including extractive question answering, by providing high-quality contextualized word embeddings. However, training question answering models still requires large amounts of annotated data for specific domains. In this work, we propose a cooperative, self-play learning framework, REGEX, for automatically generating more non-trivial question-answer pairs to improve model performance. REGEX is built upon a masked answer extraction task with an interactive learning environment containing an answer entity REcognizer, a question Generator, and an answer EXtractor. Given a passage with a masked entity, the generator generates a question around the entity, and the extractor is trained to extract the masked entity with the generated question and raw texts. The framework allows the training of question generation and answering models on any text corpora without annotation. We further leverage a reinforcement learning technique to reward generating high-quality questions and to improve the answer extraction model’s performance. Experiment results show that REGEX outperforms the state-of-the-art (SOTA) pretrained language models and transfer learning approaches on standard question-answering benchmarks, and yields the new SOTA performance under given model size and transfer learning settings.
Learning Task Decomposition with Ordered Memory Policy Network
Many complex real-world tasks are composed of several levels of sub-tasks. Humans leverage these hierarchical structures to accelerate the learning process and achieve better generalization. In this work, we study the inductive bias and propose Ordered Memory Policy Network (OMPN) to discover subtask hierarchy by learning from demonstration. The discovered subtask hierarchy could be used to perform task decomposition, recovering the subtask boundaries in an unstructured demonstration. Experiments on Craft and Dial demonstrate that our model can achieve higher task decomposition performance under both unsupervised and weakly supervised settings, compared with strong baselines. OMPN can also be directly applied to partially observable environments and still achieve higher task decomposition performance. Our visualization further confirms that the subtask hierarchy can emerge in our model.
Syntax is fundamental to our thinking about language. Although neural networks are very successful in many tasks, they do not explicitly model syntactic structure. Failing to capture the structure of inputs could lead to generalization problems and over-parametrization. In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM). The model explicitly models the structure with a one-step look-ahead parser and maintains the conditional probability setting of the standard language model. Experiments show that SOM can achieve strong results in language modeling and syntactic generalization tests, while using fewer parameters than other models.
It is commonly believed that knowledge of syntactic structure should improve language modeling. However, incorporating syntactic structure into neural language models both effectively and with computational efficiency has been a challenging topic. In this paper, we make use of a multi-task objective, i.e., the models simultaneously predict words as well as ground truth parse trees in a form called “syntactic distances”, where the two separate objectives share the same intermediate representation. Experimental results on the Penn Treebank and Chinese Treebank datasets show that when ground truth parse trees are provided as additional training signals, the model is able to achieve lower perplexity and induce trees of better quality.
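The abstract above names "syntactic distances" without defining them. As a minimal sketch, assuming one common definition (the distance between two adjacent words is the height of their lowest common ancestor in a binary parse tree), the conversion between a tree and its distance sequence might look like the following. The nested-tuple tree encoding and the function names `tree_to_distances` and `distances_to_tree` are hypothetical choices for illustration and are not taken from the paper.

```python
def tree_to_distances(tree):
    """Return (leaves, distances, height) for a binary tree of nested tuples.

    Assumption: distances[i] is the height of the lowest common ancestor
    of leaves[i] and leaves[i + 1]. A leaf is a plain string with height 0.
    """
    if isinstance(tree, str):
        return [tree], [], 0
    left, right = tree
    l_leaves, l_dist, l_height = tree_to_distances(left)
    r_leaves, r_dist, r_height = tree_to_distances(right)
    height = max(l_height, r_height) + 1
    # The boundary between the two children gets this node's height.
    return l_leaves + r_leaves, l_dist + [height] + r_dist, height


def distances_to_tree(leaves, distances):
    """Rebuild the tree by splitting recursively at the largest distance."""
    if len(leaves) == 1:
        return leaves[0]
    split = max(range(len(distances)), key=lambda i: distances[i])
    left = distances_to_tree(leaves[:split + 1], distances[:split])
    right = distances_to_tree(leaves[split + 1:], distances[split + 1:])
    return (left, right)


example = (("the", "cat"), ("sat", ("on", "mat")))
words, dists, _ = tree_to_distances(example)
print(words, dists)
print(distances_to_tree(words, dists) == example)
```

Under this definition the distance sequence has one entry per gap between adjacent words, so a model can predict it alongside the next word from a shared representation, which is the kind of multi-task setup the abstract describes.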
2020-07-01
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (published)
Stack-augmented recurrent neural networks (RNNs) have been of interest to the deep learning community for some time. However, the difficulty of training memory models remains a problem obstructing the widespread use of such models. In this paper, we propose the Ordered Memory architecture. Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operation of the memory. We also introduce a new Gated Recursive Cell to compose lower-level representations into higher-level representations. We demonstrate that our model achieves strong performance on the logical inference task (Bowman et al., 2015) and the ListOps (Nangia and Bowman, 2018) task. We can also interpret the model to retrieve the induced tree structure, and find that these induced structures align with the ground truth. Finally, we evaluate our model on the Stanford Sentiment Treebank tasks (Socher et al., 2013), and find that it performs comparably to the state-of-the-art methods in the literature.
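To illustrate how a cumulative probability over attention weights can act as a write/erase control, here is a minimal sketch. It assumes scalar memory slots and a simple interpolation rule; the names `cumulative_gate` and `update_memory` are hypothetical, and this is not the paper's exact update equation, only the general idea of a monotone gate derived from a softmax.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cumulative_gate(scores):
    """Running sum of the attention distribution over memory slots.

    The result is non-decreasing and ends at (approximately) 1, so it
    splits the slots into a mostly-kept prefix and a mostly-rewritten suffix.
    """
    gate, running = [], 0.0
    for p in softmax(scores):
        running += p
        gate.append(running)
    return gate


def update_memory(memory, candidate, scores):
    """Assumed rule: gate near 0 keeps the old slot, gate near 1 overwrites it."""
    gate = cumulative_gate(scores)
    return [(1.0 - g) * old + g * new
            for old, new, g in zip(memory, candidate, gate)]


# Uniform scores over four slots give a linearly increasing gate.
print(cumulative_gate([0.0, 0.0, 0.0, 0.0]))
print(update_memory([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]))
```

Because the gate is monotone, a single attention peak decides where erasing starts, which gives the memory a stack-like ordering without a hard, non-differentiable push or pop.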