We use cookies to analyze the browsing and usage of our website and to personalize your experience. You can disable these technologies at any time, but this may limit certain functionalities of the site. Read our Privacy Policy for more information.
Setting cookies
You can enable and disable the types of cookies you wish to accept. However certain choices you make could affect the services offered on our sites (e.g. suggestions, personalised ads, etc.).
Essential cookies
These cookies are necessary for the operation of the site and cannot be deactivated. (Still active)
Analytics cookies
Do you accept the use of cookies to measure the audience of our sites?
Multimedia Player
Do you accept the use of cookies to display and allow you to watch the video content hosted by our partners (YouTube, etc.)?
Publications
Inter- and intra-year forest change detection and monitoring of aboveground biomass dynamics using Sentinel-2 and Landsat
Generative Flow Networks (GFlowNets, GFNs) are a generative framework for learning unnormalized probability mass functions over discrete spa… (see more)ces. Since their inception, GFlowNets have proven to be useful for learning generative models in applications where the majority of the discrete space is unvisited during training. This has inspired some to hypothesize that GFlowNets, when paired with deep neural networks (DNNs), have favourable generalization properties. In this work, we empirically verify some of the hypothesized mechanisms of generalization of GFlowNets. In particular, we find that the functions that GFlowNets learn to approximate have an implicit underlying structure which facilitate generalization. We also find that GFlowNets are sensitive to being trained offline and off-policy; however, the reward implicitly learned by GFlowNets is robust to changes in the training distribution.
In the era of proliferation of large language and image generation models, the phenomenon of "model collapse" refers to the situation whereb… (see more)y as a model is trained recursively on data generated from previous generations of itself over time, its performance degrades until the model eventually becomes completely useless, i.e the model collapses. In this work, we study this phenomenon in the setting of high-dimensional regression and obtain analytic formulae which quantitatively outline this phenomenon in a broad range of regimes. In the special case of polynomial decaying spectral and source conditions, we obtain modified scaling laws which exhibit new crossover phenomena from fast to slow rates. We also propose a simple strategy based on adaptive regularization to mitigate model collapse. Our theoretical results are validated with experiments.
A large number of tutorials for popular software development technologies are available online, and those about the same technology vary wid… (see more)ely in their presentation. We studied the design of tutorials in the software documentation landscape for five popular programming languages: Java, C#, Python, Javascript, and Typescript. We investigated the extent to which tutorial pages, i.e. resources, differ and report statistics of variations in resource properties. We developed a framework for characterizing resources based on their distinguishing attributes, i.e. properties that vary widely for the resource, relative to other resources. Additionally, we propose that a resource can be represented by its resource style, i.e. the combination of its distinguishing attributes. We discuss three techniques for characterizing resources based on our framework, to capture notable and relevant content and presentation properties of tutorial pages. We apply these techniques on a data set of 2551 resources to validate that our framework identifies valid and interpretable styles. We contribute this framework for reasoning about the design of resources in the online software documentation landscape.
2024-02-01
IEEE Transactions on Software Engineering (published)
A large number of tutorials for popular software development technologies are available online, and those about the same technology vary wid… (see more)ely in their presentation. We studied the design of tutorials in the software documentation landscape for five popular programming languages: Java, C#, Python, Javascript, and Typescript. We investigated the extent to which tutorial pages, i.e. resources, differ and report statistics of variations in resource properties. We developed a framework for characterizing resources based on their distinguishing attributes, i.e. properties that vary widely for the resource, relative to other resources. Additionally, we propose that a resource can be represented by its resource style, i.e. the combination of its distinguishing attributes. We discuss three techniques for characterizing resources based on our framework, to capture notable and relevant content and presentation properties of tutorial pages. We apply these techniques on a data set of 2551 resources to validate that our framework identifies valid and interpretable styles. We contribute this framework for reasoning about the design of resources in the online software documentation landscape.
2024-02-01
IEEE Transactions on Software Engineering (published)
Filters are commonly used to enhance specific structures and patterns in images, such as vessels or peritumoral regions, to enable clinical … (see more)insights beyond the visible image using radiomics. However, their lack of standardization restricts reproducibility and clinical translation of radiomics decision support tools. In this special report, teams of researchers who developed radiomics software participated in a three-phase study (September 2020 to December 2022) to establish a standardized set of filters. The first two phases focused on finding reference filtered images and reference feature values for commonly used convolutional filters: mean, Laplacian of Gaussian, Laws and Gabor kernels, separable and nonseparable wavelets (including decomposed forms), and Riesz transformations. In the first phase, 15 teams used digital phantoms to establish 33 reference filtered images of 36 filter configurations. In phase 2, 11 teams used a chest CT image to derive reference values for 323 of 396 features computed from filtered images using 22 filter and image processing configurations. Reference filtered images and feature values for Riesz transformations were not established. Reproducibility of standardized convolutional filters was validated on a public data set of multimodal imaging (CT, fluorodeoxyglucose PET, and T1-weighted MRI) in 51 patients with soft-tissue sarcoma. At validation, reproducibility of 486 features computed from filtered images using nine configurations × three imaging modalities was assessed using the lower bounds of 95% CIs of intraclass correlation coefficients. Out of 486 features, 458 were found to be reproducible across nine teams with lower bounds of 95% CIs of intraclass correlation coefficients greater than 0.75. In conclusion, eight filter types were standardized with reference filtered images and reference feature values for verifying and calibrating radiomics software packages. A web-based tool is available for compliance checking.