NLP in the era of generative AI, cognitive sciences, and societal transformation
Join us at Mila in October for a three-day workshop to explore the transformative potential of language technologies and their implications for society.
This program is designed to provide decision-makers, policymakers and professional working in policy with a foundational understanding of AI technology.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Large language models (LLMs) have recently attracted considerable interest for their ability to perform complex reasoning tasks, such as cha… (see more)in-of-thought reasoning. However, most of the existing approaches to enhance this ability rely heavily on data-driven methods, while neglecting the structural aspects of the model's reasoning capacity. We find that while LLMs can manage individual reasoning steps well, they struggle with maintaining consistency across an entire reasoning chain. To solve this, we introduce planning tokens at the start of each reasoning step, serving as a guide for the model, and add their embeddings to the model parameters. Our approach requires a negligible increase in trainable parameters (just 0.001%) and can be applied through either full fine-tuning or a more parameter-efficient scheme. We demonstrate our method's effectiveness by applying it to three different LLMs, showing notable accuracy improvements across three math word problem datasets w.r.t. standard fine-tuning baselines.
One unexpected technique that emerged in recent years consists in training a Deep Network (DN) with a Self-Supervised Learning (SSL) method,… (see more) and using this network on downstream tasks but with its last few projector layers entirely removed. This trick of throwing away the projector is actually critical for SSL methods to display competitive performances on ImageNet for which more than 30 percentage points can be gained that way. This is a little vexing, as one would hope that the network layer at which invariance is explicitly enforced by the SSL criterion during training (the last projector layer) should be the one to use for best generalization performance downstream. But it seems not to be, and this study sheds some light on why. This trick, which we name Guillotine Regularization (GR), is in fact a generically applicable method that has been used to improve generalization performance in transfer learning scenarios. In this work, we identify the underlying reasons behind its success and show that the optimal layer to use might change significantly depending on the training setup, the data or the downstream task. Lastly, we give some insights on how to reduce the need for a projector in SSL by aligning the pretext SSL task and the downstream task.
Greenhouses are a key component of modernised agriculture, aiming for producing high-quality crops and plants. Furthermore, a network of gre… (see more)enhouses has enormous potential as part of demand response programs. Saving energy during off-peak time, reducing power consumption and delaying the start time of subsystems during on-peak time are some strategies that can be used to limit power exchanged with the main grid. In this work, a hierarchical distributed alternating direction method of multipliers-based model predictive control framework is proposed that has two main objectives: 1) providing appropriate conditions for greenhouses' crops and plants to grow, and 2) limiting the total power exchanged with the main grid. At each time step in the framework, an aggregator coordinates the greenhouses to reach a consensus and limit the total electric power exchanged while managing shared resources, e.g., reservoir water. The proposed framework's performance is investigated through a case study.
2023-01-01
IEEE Transactions on Sustainable Energy (published)
This paper is concerned with the identification in the limit from positive data of sub-stitutable context-free languages cfl s) over infinit… (see more)e alphabets. Clark and Eyraud (2007) showed that substitutable cfl s over finite alphabets are learnable in this learning paradigm. We show that substitutable cfl s generated by grammars whose production rules may have predicates that represent sets of potentially infinitely many terminal symbols in a compact manner are learnable if the terminal symbol sets represented by those predicates are learnable, under a certain condition. This can be seen as a result parallel to Argyros and D’Antoni’s work (2018) that amplifies the query learnability of predicate classes to that of symbolic automata classes. Our result is the first that shows such amplification is possible for identifying some cfl s in the limit from positive data.
The restrictions imposed by the COVID-19 pandemic required software development teams to adapt, being forced to work remotely and adjust the… (see more) software engineering activities accordingly. In the studies evaluating these effects, a few have assessed the impact on software engineering activities from a broader perspective and after a period of time when teams had time to adjust to the changes. No studies have been found comparing software startups and established companies either. This paper aims to investigate the impacts of COVID-19 on software development activities after one year of the pandemic restrictions, comparing the results between startups and established companies. Our approach was to design a cross-sectional survey and distribute it online among software development companies worldwide. The participants were asked about their perception of COVID-19’s pandemic impact on different software engineering activities: requirements engineering, software architecture, user experience design, software implementation, and software quality assurance. The survey received 170 valid answers from 29 countries, and for all the software engineering activities, we found that most respondents did not observe a significant impact. The results also showed that software startups and established companies were affected differently since, in some activities, we found a negative impact in the former and a positive impact in the latter. Regarding the time spent on each software engineering activity, most of the answers reported no change, but on those that did, the result points to an increase in time. Thus, we cannot find any relation between the change in time of effort and the reported positive or negative impact.