Kyunghyun Cho

Ruslan Salakhutdinov

Fernando Pereira

2025-12-17

OpenReview (unknown)

Synthetic Data Generation and Joint Learning for Robust Code-Mixed Translation

Hi Bn

Ramakrishna Appicharla

Kamal Kumar

Asif Gupta

Dzmitry Bahdanau

Yoshua Ben

Ondrej Bojar

Christian Buck

Christian Federmann

Yong Cheng

Lu Jiang

Wolfgang Macherey

Alexis Conneau

Guillaume Lample

Yinhan Liu

Jiatao Gu

Naman Goyal

Sergey Xian Li

Carol Myers-Scotton

El Moatez Billah Nagoudi

AbdelRahim Elmadany

Muhammad Abdul-Mageed

Myle Ott

Sergey Edunov

Alexei R Baevski

Parth Patwa

Gustavo Aguilar

Sudipta Kar

Suraj

Srinivas Pandey

Björn Pykl

Gambäck

Tanmoy

Ashish Vaswani

Noam M. Shazeer

Niki Parmar

Łukasz Kaiser

Illia Polosukhin

Genta Indra Winata

Andrea Madotto

Chien-Sheng Wu

Pascale Fung

Felix Wu

Angela Fan

Yann Dauphin

Linting Xue

Noah Constant

Mihir Adam Roberts

Rami Kale

Aditya AlRfou

Aditya Siddhant

Barua

Shuyan Zhou

Xiangkai Zeng

Yingqi Zhou

Antonios Anastasopoulos

Graham Neubig

The widespread online communication in a modern multilingual world has provided opportunities to blend more than one language (aka code-mixed language) in a single utterance. This has resulted in a formidable challenge for computational models due to the scarcity of annotated data and the presence of noise. A potential solution to mitigate the data scarcity problem in a low-resource setup is to leverage existing data in a resource-rich language through translation. In this paper, we tackle the problem of code-mixed (Hinglish and Bengalish) to English machine translation. First, we synthetically develop HINMIX, a parallel corpus of Hinglish to English, with ~4.2M sentence pairs. Subsequently, we propose RCMT, a robust perturbation-based joint-training model that learns to handle noise in real-world code-mixed text by parameter sharing across clean and noisy words. Further, we show the adaptability of RCMT in a zero-shot setup for Bengalish to English translation. Our evaluation and comprehensive analyses qualitatively and quantitatively demonstrate the superiority of RCMT over state-of-the-art code-mixed and robust translation methods.

2024-03-24

ArXiv (preprint)

Dynamic Neural Turing Machine with Continuous and Discrete Addressing Schemes

We extend the neural Turing machine (NTM) model into a dynamic neural Turing machine (D-NTM) by introducing trainable address vectors. This addressing scheme maintains for each memory cell two separate vectors, content and address vectors. This allows the D-NTM to learn a wide variety of location-based addressing strategies, including both linear and nonlinear ones. We implement the D-NTM with both continuous and discrete read and write mechanisms. We investigate the mechanisms and effects of learning to read and write into a memory through experiments on Facebook bAbI tasks using both a feedforward and GRU controller. We provide extensive analysis of our model and compare different variations of neural Turing machines on this task. We show that our model outperforms long short-term memory and NTM variants. We provide further experimental results on the sequential pMNIST, Stanford Natural Language Inference, associative recall, and copy tasks.

2018-03-31

Neural Computation (published)

Fine-grained attention mechanism for neural machine translation

Heeyoul Choi

2018-03-31

Neurocomputing (published)

Boundary Seeking GANs

R Devon Hjelm

Athul Jacob

Adam Trischler

Gerry Che

2018-02-14

International Conference on Learning Representations (published)

On integrating a language model into neural machine translation

2017-08-31

Computer Speech and Language (published)

Multi-way, multilingual neural machine translation

Orhan Firat

Baskaran Sankaran

F. Yarman-Vural

2017-08-31

Computer Speech and Language (published)

ReSeg: A Recurrent Neural Network-based Model for Semantic Segmentation

Francesco Visin

Adriana Romero

Matteo Matteucci

Marco Ciccone

Kyle Kastner

Aaron Courville

We propose a structured prediction architecture, which exploits the local generic features extracted by Convolutional Neural Networks and the capacity of Recurrent Neural Networks (RNN) to retrieve distant dependencies. The proposed architecture, called ReSeg, is based on the recently introduced ReNet model for image classification. We modify and extend it to perform the more challenging task of semantic segmentation. Each ReNet layer is composed of four RNNs that sweep the image horizontally and vertically in both directions, encoding patches or activations, and providing relevant global information. Moreover, ReNet layers are stacked on top of pre-trained convolutional layers, benefiting from generic local features. Upsampling layers follow ReNet layers to recover the original image resolution in the final predictions. The proposed ReSeg architecture is efficient, flexible and suitable for a variety of semantic segmentation tasks. We evaluate ReSeg on several widely-used semantic segmentation datasets: Weizmann Horse, Oxford Flower, and CamVid, achieving state-of-the-art performance. Results show that ReSeg can act as a suitable architecture for semantic segmentation tasks, and may have further applications in other structured prediction problems. The source code and model hyperparameters are available on https://github.com/fvisin/reseg.

2016-06-30

2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) (published)

First Result on Arabic Neural Machine Translation

Amjad Almahairi

Nizar Habash

Aaron Courville

Neural machine translation has become a major alternative to widely used phrase-based statistical machine translation. We notice, however, that much of the research on neural machine translation has focused on European languages despite its language-agnostic nature. In this paper, we apply neural machine translation to the task of Arabic translation (Ar↔En) and compare it against a standard phrase-based translation system. We run an extensive comparison using various configurations in preprocessing Arabic script and show that the phrase-based and neural translation systems perform comparably to each other and that proper preprocessing of Arabic script has a similar effect on both systems. We observe, however, that neural machine translation significantly outperforms the phrase-based system on an out-of-domain test set, making it attractive for real-world deployment.

2016-06-07

ArXiv (preprint)

A Controller Recognizer Framework: How necessary is recognition for control?

Recently there has been growing interest in building active visual object recognizers, as opposed to the usual passive recognizers which classify a given static image into a predefined set of object categories. In this paper we propose to generalize these recently proposed end-to-end active visual recognizers into a controller-recognizer framework. A model in the controller-recognizer framework consists of a controller, which interfaces with an external manipulator, and a recognizer, which classifies the visual input adjusted by the manipulator. We describe the two most recently proposed controller-recognizer models, the recurrent attention model and the spatial transformer network, as representative examples. Based on this description we observe that most existing end-to-end controller-recognizers tightly, or completely, couple a controller and recognizer. We ask whether this tight coupling is necessary, and try to answer this empirically by building a controller-recognizer model with a decoupled controller and recognizer. Our experiments revealed that it is not always necessary to tightly couple them and that, by decoupling a controller and recognizer, there is a possibility of building a generic controller that is pretrained and works together with any subsequent recognizer.

2016-02-17

ICLR.cc/2016/workshop (unknown)

Nicolas Boulanger-Lewandowski

Theano: A Python framework for fast computation of mathematical expressions

Rami Al-Rfou

Guillaume Alain

Amjad Almahairi

Christof Angermueller

Dzmitry Bahdanau

Nicolas Ballas

Frédéric Bastien

Justin Bayer

Anatoly Belikov

Alexander Belopolsky

Josh Bleecher Snyder

Xavier Bouthillier

Alexandre De Brébisson

Olivier Breuleux

Pierre-Luc Carrier

Paul Christiano

Myriam Côté

Yann N. Dauphin

Julien Demouth

Sander Dieleman

Samira Ebrahimi Kahou

Ziye Fan

Mathieu Germain

Matt Graham

Balázs Hidasi

Arjun Jain

Kai Jia

Mikhail Korobov

Vivek Kulkarni

Alex Lamb

Pascal Lamblin

Eric Larsen

César Laurent

Sean Lee

Simon Lefrancois

Simon Lemieux

Nicholas Léonard

Zhouhan Lin

Jesse A. Livezey

Cory Lorenz

Jeremiah Lowin

Qianli Ma

Pierre-Antoine Manzagol

Robert T. McGibbon

Mehdi Mirza

Alberto Orlandi

Christopher Pal

Razvan Pascanu

Mohammad Pezeshki

Colin Raffel

Daniel Renshaw

Matthew Rocklin

Adriana Romero

Markus Roth

Peter Sadowski

John Salvatier

François Savard

Jan Schlüter

John Schulman

Gabriel Schwartz

Iulian Vlad Serban

Dmitriy Serdyuk

Samira Shabanian

Etienne Simon

Sigurd Spieckermann

S. Ramana Subramanyam

Gijs van Tulder

Sebastian Urban

Dustin J. Webb

Matthew Willson

Lijun Xue

Theano is a Python library that allows one to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano has been actively and continuously developed since 2008; multiple frameworks have been built on top of it, and it has been used to produce many state-of-the-art machine learning models. The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.

2015-12-31

arXiv (preprint)

Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks

Aaron Courville

Whereas deep neural networks were first mostly used for classification tasks, they are rapidly expanding in the realm of structured output problems, where the observed target is composed of multiple random variables that have a rich joint distribution, given the input. In this paper we focus on the case where the input also has a rich structure and the input and output structures are somehow related. We describe systems that learn to attend to different places in the input, for each element of the output, for a variety of tasks: machine translation, image caption generation, video clip description, and speech recognition. All these systems are based on a shared set of building blocks: gated recurrent neural networks and convolutional neural networks, along with trained attention mechanisms. We report on experimental results with these systems, showing impressively good performance and the advantage of the attention mechanism.

2015-10-31

IEEE Transactions on Multimedia (published)