GPAI Report & Policy Guide: Towards Substantive Equality in AI
Join us at Mila on November 26 for the launch of the report and policy guide that outlines actionable recommendations for building inclusive AI ecosystems.
We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Publications
Theano: A Python framework for fast computation of mathematical expressions
Theano is a Python library that allows to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficie… (see more)ntly. Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements. Theano is being actively and continuously developed since 2008, multiple frameworks have been built on top of it and it has been used to produce many state-of-the-art machine learning models.
The present article is structured as follows. Section I provides an overview of the Theano software and its community. Section II presents the principal features of Theano and how to use them, and compares them with other similar projects. Section III focuses on recently-introduced functionalities and improvements. Section IV compares the performance of Theano against Torch7 and TensorFlow on several machine learning models. Section V discusses current limitations of Theano and potential ways of improving it.
Associative memories allow the retrieval of previously stored messages given a part of their content. In this paper, we are interested in as… (see more)sociative memories based on c-partite graphs that were recently introduced. These memories are almost optimal in terms of the amount of storage they require (efficiency) and allow retrieving messages with low complexity. We propose a generic implementation model for the retrieval algorithm that can be readily mapped to an integrated circuit and study the retrieval performance when hardware components are affected by faults. We show using analytical and simulation results that these associative memories can be made resilient to circuit faults with a minor modification of the retrieval algorithm. In one example, the memory retains 88% of its efficiency when 1% of the storage cells are faulty, or 98% when 0.1% of the binary outputs of the retrieval algorithm are faulty. When considering storage faults, the fault tolerance exhibited by the proposed associative memory can be comparable to using a capacity-achieving error correction code for protecting the stored information.
2016-02-15
IEEE Transactions on Signal Processing (published)
Associative memories allow the retrieval of previously stored messages given a part of their content. In this paper, we are interested in as… (see more)sociative memories based on c-partite graphs that were recently introduced. These memories are almost optimal in terms of the amount of storage they require (efficiency) and allow retrieving messages with low complexity. We propose a generic implementation model for the retrieval algorithm that can be readily mapped to an integrated circuit and study the retrieval performance when hardware components are affected by faults. We show using analytical and simulation results that these associative memories can be made resilient to circuit faults with a minor modification of the retrieval algorithm. In one example, the memory retains 88% of its efficiency when 1% of the storage cells are faulty, or 98% when 0.1% of the binary outputs of the retrieval algorithm are faulty. When considering storage faults, the fault tolerance exhibited by the proposed associative memory can be comparable to using a capacity-achieving error correction code for protecting the stored information.
2016-02-01
IEEE Transactions on Signal Processing (published)
We introduce a novel probabilistic group testing framework, termed Poisson group testing, in which the number of defectives follows a right-… (see more)truncated Poisson distribution. The Poisson model has a number of new applications, including dynamic testing with diminishing relative rates of defectives. We consider both nonadaptive and semi-adaptive identification methods. For nonadaptive methods, we derive a lower bound on the number of tests required to identify the defectives with a probability of error that asymptotically converges to zero; in addition, we propose test matrix constructions for which the number of tests closely matches the lower bound. For semiadaptive methods, we describe a lower bound on the expected number of tests required to identify the defectives with zero error probability. In addition, we propose a stage-wise reconstruction algorithm for which the expected number of tests is only a constant factor away from the lower bound. The methods rely only on an estimate of the average number of defectives, rather than on the individual probabilities of subjects being defective.
2015-08-15
IEEE Transactions on Signal Processing (published)
We introduce a novel probabilistic group testing framework, termed Poisson group testing, in which the number of defectives follows a right-… (see more)truncated Poisson distribution. The Poisson model has a number of new applications, including dynamic testing with diminishing relative rates of defectives. We consider both nonadaptive and semi-adaptive identification methods. For nonadaptive methods, we derive a lower bound on the number of tests required to identify the defectives with a probability of error that asymptotically converges to zero; in addition, we propose test matrix constructions for which the number of tests closely matches the lower bound. For semiadaptive methods, we describe a lower bound on the expected number of tests required to identify the defectives with zero error probability. In addition, we propose a stage-wise reconstruction algorithm for which the expected number of tests is only a constant factor away from the lower bound. The methods rely only on an estimate of the average number of defectives, rather than on the individual probabilities of subjects being defective.
Polar codes are the first error-correcting codes to provably achieve channel capacity, asymptotically in code length, with an explicit const… (see more)ruction. However, under successive-cancellation decoding, polar codes require very long code lengths to compete with existing modern codes. Nonetheless, the successive cancellation algorithm enables very-low-complexity implementations in hardware, due to the regular structure exhibited by polar codes. In this paper, we present an improved architecture for successive-cancellation decoding of polar codes, making use of a novel semi-parallel, encoder-based partial-sum computation module. We also provide quantization results for realistic code length N=215, and explore various optimization techniques such as a chained processing element and a variable quantization scheme. This design is shown to scale to code lengths of up to N=221, enabled by its low logic use, low register use and simple datapaths, limited almost exclusively by the amount of available SRAM. It also supports an overlapped loading of frames, allowing full-throughput decoding with a single set of input buffers.
2014-10-15
IEEE Transactions on Signal Processing (published)
In this paper, we present energy-efficient architectures for decoders of low-density parity check (LDPC) codes using the differential decodi… (see more)ng with binary message passing (DD-BMP) algorithm and its modified variant (MDD-BMP). We also propose an improved differential binary (IDB) decoding algorithm. These algorithms offer significant intrinsic advantages in the energy domain: simple computations, low interconnect complexity, and very high throughput, while achieving error correction performance up to within 0.25 dB of the offset min-sum algorithm. We report on fully parallel decoder implementations of (273, 191), (1023, 781), and (4095, 3367) finite geometry-based LDPC codes in 65 nm CMOS. Using the MDD-BMP algorithm, these decoders achieve respective areas of 0.28 mm2, 1.38 mm2, and 15.37 mm2, average throughputs of 37 Gbps, 75 Gbps, and 141 Gbps, and energy efficiencies of 4.9 pJ/bit, 13.2 pJ/bit, and 37.9 pJ/bit with a 1.0 V supply voltage in post-layout simulations. At a reduced supply voltage of 0.8 V, these decoders achieve respective throughputs of 26 Gbps, 54 Gbps, and 94 Gbps, and energy efficiencies of 3.1 pJ/bit, 8.2 pJ/bit, and 23.5 pJ/bit. We also report on a fully parallel implementation of IDB for the (2048, 1723) LDPC code specified in the IEEE 802.3an (10GBASE-T) standard. This decoder achieves an area of 1.44 mm2, average throughput of 172 Gbps, and an energy efficiency of 2.8 pJ/bit with a 1.0 V supply voltage; at 0.8 V, it achieves throughput of 116 Gbps and energy efficiency of 1.7 pJ/bit.
2014-02-01
IEEE Transactions on Signal Processing (published)
In this paper, we present energy-efficient architectures for decoders of low-density parity check (LDPC) codes using the differential decodi… (see more)ng with binary message passing (DD-BMP) algorithm and its modified variant (MDD-BMP). We also propose an improved differential binary (IDB) decoding algorithm. These algorithms offer significant intrinsic advantages in the energy domain: simple computations, low interconnect complexity, and very high throughput, while achieving error correction performance up to within 0.25 dB of the offset min-sum algorithm. We report on fully parallel decoder implementations of (273, 191), (1023, 781), and (4095, 3367) finite geometry-based LDPC codes in 65 nm CMOS. Using the MDD-BMP algorithm, these decoders achieve respective areas of 0.28 mm2, 1.38 mm2, and 15.37 mm2, average throughputs of 37 Gbps, 75 Gbps, and 141 Gbps, and energy efficiencies of 4.9 pJ/bit, 13.2 pJ/bit, and 37.9 pJ/bit with a 1.0 V supply voltage in post-layout simulations. At a reduced supply voltage of 0.8 V, these decoders achieve respective throughputs of 26 Gbps, 54 Gbps, and 94 Gbps, and energy efficiencies of 3.1 pJ/bit, 8.2 pJ/bit, and 23.5 pJ/bit. We also report on a fully parallel implementation of IDB for the (2048, 1723) LDPC code specified in the IEEE 802.3an (10GBASE-T) standard. This decoder achieves an area of 1.44 mm2, average throughput of 172 Gbps, and an energy efficiency of 2.8 pJ/bit with a 1.0 V supply voltage; at 0.8 V, it achieves throughput of 116 Gbps and energy efficiency of 1.7 pJ/bit.
2014-02-01
IEEE Transactions on Signal Processing (published)