Feb 28, 2023
Iryna Gurevych, TU Darmstadt
Digital texts are cheap to produce, fast to update, easy to interlink, and available in vast quantities. The ability to aggregate and critically assess information from connected, evolving texts is at the core of most intellectual work – from education to business and policy-making. Yet humans are not very good at handling large amounts of text. And while modern language models do a good job of finding documents, extracting information from them, and generating natural-sounding language, progress in helping humans read, connect, and make sense of interrelated texts has been limited.
Funded by the European Research Council, the InterText project advances natural language processing (NLP) by developing a general framework for modelling and analysing fine-grained relationships between texts – intertextual relationships. This crucial milestone for AI would allow tracing the origin and evolution of texts and ideas and enable a new generation of AI applications for text work and critical reading. Using scientific peer review as a prototypical model of collaborative knowledge construction anchored in text, this talk will present the foundations of our intertextual approach to NLP, from data modelling and representation learning to task design, practical applications and the intricacies of data collection. We will discuss the limitations of the state of the art, report on our latest findings and outline the open challenges on the path towards general-purpose AI for fine-grained cross-document analysis of texts.
Jan 19, 2023
Emre Uğur, Boğaziçi University
Symbolic planning and reasoning are powerful tools for robots tackling complex tasks. However, the need to manually design the symbols restricts their applicability, especially for robots that are expected to act in open-ended environments. Therefore, symbol formation and rule extraction should be considered part of robot learning, which, when done properly, will offer scalability, flexibility, and robustness. Towards this goal, we propose a novel general method that finds action-grounded, discrete object and effect categories and builds probabilistic rules over them for non-trivial action planning.
Jan 10, 2023
Ali Afşin Bülbül, Meta
The path to a successful applied machine learning project is full of potholes. An ML practitioner will inevitably fall into some of them and, over time, build their own experience and their own list of lessons learnt. In this talk, I'll share my experience in the hope that it helps others avoid some of those pitfalls. Most failed ML projects fail because they attempt to solve an unimportant, unsolvable or already solved problem. Taking the time, before writing the first line of code, to work with the stakeholders side by side and clarify the business problem, constraints, scope, roles and responsibilities is critical for the success of the project. I'll talk about how a typical ML project team is structured and works together to deliver impact. During the execution of the project, there are certain best practices that help the ML practitioner avoid technical debt. In this talk, I will define different types and sources of technical debt and give some practical tips that can help avoid it, or at least minimize it.
Jan 3, 2023
Başak Tosun & Zafer Batık, Wikimedia Türkiye
As internet users, we are all using, or being exposed to, content from Wikimedia projects in our daily lives. This content is also useful as a dataset for advancing artificial intelligence research and applications. In this talk we will present Wikipedia and its sister projects from an editor's perspective, introduce the global movement behind those projects, and briefly describe the Lexicographical data project of Wikidata.
Dec 27, 2022
Serdar Özsoy & Shadi Hamdan, KUIS AI Center, Koç University
Self-supervised learning allows AI systems to learn effective representations from large amounts of data using tasks that do not require costly labeling. Mode collapse, i.e., the model producing identical representations for all inputs, is a central problem to many self-supervised learning approaches, making self-supervised tasks, such as matching distorted variants of the inputs, ineffective. In this talk, we argue that a straightforward application of information maximization among alternative latent representations of the same input naturally solves the collapse problem and achieves competitive empirical results. We propose a self-supervised learning method, CorInfoMax, that uses a second-order statistics-based mutual information measure that reflects the level of correlation among its arguments. Maximizing this correlative information measure between alternative representations of the same input serves two purposes: (1) it avoids the collapse problem by generating feature vectors with non-degenerate covariances; (2) it establishes relevance among alternative representations by increasing the linear dependence among them. An approximation of the proposed information maximization objective simplifies to a Euclidean distance-based objective function regularized by the log-determinant of the feature covariance matrix. The regularization term acts as a natural barrier against feature space degeneracy. Consequently, beyond avoiding complete output collapse to a single point, the proposed approach also prevents dimensional collapse by encouraging the spread of information across the whole feature space. Numerical experiments demonstrate that CorInfoMax achieves better or competitive performance results relative to the state-of-the-art SSL approaches.
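To make the objective concrete, here is a minimal sketch of a CorInfoMax-style loss (not the authors' code; the function name, the eps regularizer and the alpha weight are illustrative choices): a Euclidean distance term between two views of the same input, regularized by the log-determinant of each view's feature covariance.

```python
# Minimal sketch of a CorInfoMax-style objective (illustrative, not the authors' code):
# align two views of the same input while keeping feature covariances non-degenerate.
import torch

def corinfomax_style_loss(z1, z2, eps=1e-3, alpha=1.0):
    """z1, z2: (batch, dim) representations of two augmented views of the same inputs."""
    n, d = z1.shape
    # Center the features before estimating covariances.
    z1c = z1 - z1.mean(dim=0, keepdim=True)
    z2c = z2 - z2.mean(dim=0, keepdim=True)
    cov1 = z1c.T @ z1c / (n - 1) + eps * torch.eye(d, device=z1.device)
    cov2 = z2c.T @ z2c / (n - 1) + eps * torch.eye(d, device=z2.device)
    # Log-determinant terms act as a barrier against covariance degeneracy
    # (dimensional collapse); the distance term aligns the two views.
    logdet_term = torch.logdet(cov1) + torch.logdet(cov2)
    distance_term = ((z1 - z2) ** 2).sum(dim=1).mean()
    return -logdet_term + alpha * distance_term

# Toy usage with random features standing in for encoder outputs.
z1, z2 = torch.randn(256, 64), torch.randn(256, 64)
loss = corinfomax_style_loss(z1, z2)
```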
Dec 22, 2022
İnanç Birol, University of British Columbia (UBC)
The silent pandemic due to superbugs – pathogens resistant to multiple antimicrobial drugs – kills 1.5 million people every year. The threat from superbugs will only grow if the current practice of widespread antibiotic use continues, and if we do not develop new alternatives to replace the ineffective drugs on the market. To fight this trend, drug development efforts are increasingly focusing on members of a biomolecule family called antimicrobial peptides (AMPs). These biomolecules have evolved together with the bacteria in their environment, and are known not to induce resistance to the same extent that conventional antibiotics do.
AMPs are employed by all classes of life, and their sequences are encoded in the species’ genomes. There is a rich repertoire of genomics data waiting to be mined to discover AMPs. In this presentation, I will describe the sequencing, bioinformatics, and testing technologies required to discover and validate AMPs in high throughput. Special emphasis will be on de novo sequence assembly methods and machine learning models for sequence annotation.
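As a toy illustration of machine-learning-based sequence annotation (not the pipeline from the talk; the peptide sequences and labels below are made up), one could classify peptides as antimicrobial or not from simple k-mer count features:

```python
# Toy AMP/non-AMP classifier from dipeptide (2-mer) counts; illustrative only.
from itertools import product
import numpy as np
from sklearn.linear_model import LogisticRegression

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
KMERS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]

def kmer_features(seq, k=2):
    """Count occurrences of each k-mer in a peptide sequence."""
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
    return np.array(list(counts.values()), dtype=float)

# Hypothetical labeled peptides (1 = antimicrobial, 0 = not).
seqs = ["GLFDIIKKIAESF", "KWKLFKKIEKVGQNIRDGIIKAGPAVAVVGQATQIAK", "MASNTVSAQ", "GSHMKDE"]
labels = [1, 1, 0, 0]

X = np.stack([kmer_features(s) for s in seqs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)
print(clf.predict(X))
```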
Dec 13, 2022
Desmond Elliott, University of Copenhagen
Language models are defined over a finite set of inputs, which creates a bottleneck if we attempt to scale the number of languages supported by a model. Tackling this bottleneck usually results in a trade-off between what can be represented in the embedding matrix and computational issues in the output layer. I will present PIXEL, the Pixel-based Encoder of Language, which suffers from neither of these issues. PIXEL is a pretrained language model that renders text as images, making it possible to transfer representations across languages based on orthographic similarity or the co-activation of pixels. PIXEL is trained on predominantly English data from the Wikipedia and BookCorpus datasets to reconstruct the pixels of masked patches instead of predicting a probability distribution over tokens. I will present the results of an 86M parameter model on downstream syntactic and semantic tasks in 32 typologically diverse languages across 14 scripts. PIXEL substantially outperforms BERT when the script is not seen in the pretraining data, but it lags behind BERT when working with Latin scripts. I will finish by showing that PIXEL is robust to noisy text inputs, further confirming the benefits of modelling language with pixels.
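The core idea can be sketched in a few lines, assuming PIL's default bitmap font and 16x16 patches (both illustrative choices, not PIXEL's actual rendering pipeline): render a string to an image, split it into patches, and mask a subset of patches whose pixels the model must reconstruct.

```python
# Sketch of pixel-based text modelling: text -> image -> patches -> masked reconstruction targets.
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_text(text, height=16, width=256):
    """Render a string as a grayscale image (white background, black text)."""
    img = Image.new("L", (width, height), color=255)
    ImageDraw.Draw(img).text((0, 2), text, fill=0, font=ImageFont.load_default())
    return np.asarray(img, dtype=np.float32) / 255.0

def patchify(img, patch=16):
    """Split an (H, W) image into a sequence of (patch, patch) tiles."""
    h, w = img.shape
    return (img.reshape(h // patch, patch, w // patch, patch)
               .transpose(0, 2, 1, 3)
               .reshape(-1, patch, patch))

pixels = render_text("Rendering text as pixels sidesteps a fixed vocabulary.")
patches = patchify(pixels)                     # sequence of image patches = model input
mask = np.random.rand(len(patches)) < 0.25     # roughly 25% of patches are masked
targets = patches[mask]                        # training objective: reconstruct these pixels
```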
Dec 6, 2022
Jacob Chakareski, New Jersey Institute of Technology
The talk reflects the recent paradigm shift in wireless networks research from the traditional objective of enabling ever higher transmission rates at the physical layer to enabling higher network resilience to attacks, robustness to component failures, closer vertical integration with key emerging applications and their quality-of-experience needs, and intelligent self-coordination. The talk will comprise three stories of related recent research (the number three is good). I will first talk about multi-connectivity-enabled NextG wireless multi-user VR systems. Then, I will outline our advances in domain-aware fast RL for IoT systems. Third, I will talk about enabling real-time human AR streaming in NextG classrooms featuring real and virtual participants. The presentation of each of these studies will include a brief outline of the overall NSF project in which it is embedded. Next, I will highlight an interdisciplinary NIH R01 study I lead at the nexus of VR and AI aimed at addressing the societal need of low-vision rehabilitation. Finally, I will leave the floor open for questions and discussion.
Dec 1, 2022
Ekin Akyürek, Massachusetts Institute of Technology (MIT)
Neural sequence models, especially transformers, exhibit a remarkable capacity for in-context learning. They can construct new predictors from sequences of labeled examples (x, f(x)) presented in the input without further parameter updates. We investigate the hypothesis that transformer-based in-context learners implement standard learning algorithms implicitly, by encoding smaller models in their activations, and updating these implicit models as new examples appear in the context. Using linear regression as a prototypical problem, we offer three sources of evidence for this hypothesis. First, we prove by construction that transformers can implement learning algorithms for linear models based on gradient descent and closed-form ridge regression. Second, we show that trained in-context learners closely match the predictors computed by gradient descent, ridge regression, and exact least-squares regression, transitioning between different predictors as transformer depth and dataset noise vary, and converging to Bayesian estimators for large widths and depths. Third, we present preliminary evidence that in-context learners share algorithmic features with these predictors: learners' late layers non-linearly encode weight vectors and moment matrices. These results suggest that in-context learning is understandable in algorithmic terms, and that (at least in the linear case) learners may rediscover standard estimation algorithms. Code and reference implementations are released at this http link.
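For readers unfamiliar with the reference predictors, here is a small sketch (illustrative data and hyperparameters, not the paper's code) of two baselines the trained in-context learners are compared against: closed-form ridge regression and plain gradient descent fitted to the same in-context examples (x, f(x)).

```python
# Reference predictors for in-context linear regression: closed-form ridge vs. gradient descent.
import numpy as np

rng = np.random.default_rng(0)
d, n = 8, 32
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.1 * rng.normal(size=n)        # noisy labeled examples forming the "context"

# Closed-form ridge regression: w = (X^T X + lam I)^{-1} X^T y
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Gradient descent on the least-squares loss, starting from zero weights.
w_gd, lr = np.zeros(d), 0.01
for _ in range(500):
    w_gd -= lr * (X.T @ (X @ w_gd - y)) / n

x_query = rng.normal(size=d)
print(x_query @ w_ridge, x_query @ w_gd)         # predictions a trained in-context learner closely matches
```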
Nov 24, 2022
Utku Günay Acer, Nokia Bell Labs
This talk presents SensiX, a multi-tenant runtime for adaptive model execution with integrated MLOps on edge devices, e.g., a camera, a microphone, or IoT sensors. Through its highly modular componentisation to externalise data operations with clear abstractions and document-centric manifestation for system-wide orchestration, SensiX can serve multiple models efficiently with fine-grained control on edge devices while minimising data operation redundancy, managing data and device heterogeneity, reducing resource contention and removing manual MLOps.
A particular deployment of SensiX is an urban conversational agent. Lingo is a hyper-local conversational agent embedded deeply into the urban infrastructure that provides rich, purposeful, detailed, and in some cases playful information relevant to a neighbourhood. Lingo provides hyper-local responses to user queries; the responses are computed by SensiX, which acts as the information source. These queries are served through a covert communication mechanism over Wi-Fi management frames to enable privacy-preserving proxemic interactions.
Nov 8, 2022
Chi-Chun Lee (Jeremy), National Tsing Hua University
Speech technology has proliferated into our lives, and speech emotion recognition (SER) modules add a humane aspect to the widespread use of speech-based services. Deep learning techniques play a key role in realizing SER for real-life applications. In this talk, we will briefly cover three main components of using deep models for SER – robustness, generalization, and usability – and share several of our recent developments in each of these components.
Nov 1, 2022
Mehmet Esat Belviranli, Colorado School of Mines
Computing systems are becoming more complex by integrating specialized processing units, i.e., accelerators, that are optimized to perform a specific type of operation. This demand is fueled by the need to run distinct workloads on mobile and autonomous platforms. Such systems often embed diversely heterogeneous Systems-on-Chip (SoCs), where an operation can be executed by more than a single type of accelerator with varying performance, energy, and latency characteristics. A hybrid (i.e., multi-accelerator) execution of popular workloads, such as neural network (NN) inference, collaboratively and concurrently on different types of accelerators in a diversely heterogeneous SoC is a relatively new and unexplored scheme. Multi-accelerator execution has the potential to provide unique benefits for computing systems with limited resources. In this talk, we investigate a framework that enables resource-constraint-aware multi-accelerator execution for diversely heterogeneous SoCs. We achieve this by distributing the layers of an NN inference workload across different accelerators so that the trade-off between performance and energy satisfies system constraints. We further explore improving total throughput by concurrently using different types of accelerators to execute NNs in parallel. Our proposed methodology uniquely considers inter-accelerator transition costs, shared-memory contention and accelerator architectures that embed internal hardware pipelines. We employ empirical performance models and constraint-based optimization problems to determine optimal multi-accelerator execution schedules.
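As a toy illustration of the kind of constraint-based optimization involved (not the talk's framework; the accelerator names and all costs below are made up), one can pick an accelerator per NN layer to minimize energy under a latency budget while accounting for inter-accelerator transition costs:

```python
# Toy constraint-aware layer-to-accelerator assignment via exhaustive search.
from itertools import product

ACCELERATORS = ["cpu", "gpu", "dla"]
# Per-layer (latency_ms, energy_mJ) for each accelerator; illustrative numbers.
COSTS = [
    {"cpu": (9, 12), "gpu": (2, 30), "dla": (4, 8)},
    {"cpu": (7, 10), "gpu": (1, 25), "dla": (3, 6)},
    {"cpu": (8, 11), "gpu": (2, 28), "dla": (5, 7)},
]
TRANSITION_MS = 1.5       # cost of moving intermediate tensors between accelerators
LATENCY_BUDGET_MS = 12

best = None
for schedule in product(ACCELERATORS, repeat=len(COSTS)):
    latency = sum(COSTS[i][a][0] for i, a in enumerate(schedule))
    latency += TRANSITION_MS * sum(a != b for a, b in zip(schedule, schedule[1:]))
    energy = sum(COSTS[i][a][1] for i, a in enumerate(schedule))
    if latency <= LATENCY_BUDGET_MS and (best is None or energy < best[0]):
        best = (energy, latency, schedule)

print(best)   # lowest-energy schedule that still meets the latency constraint
```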
Oct 25, 2022
Erol Şahin, METU-ROMER (Center for Robotics and AI)
Industrial robots, a shining example of the success of robotics in the manufacturing domain, were developed as manipulators without any support for human-robot interaction (HRI). However, a new generation of manipulators, called collaborative robots (Cobots), designed with embedded safety features, are being deployed to operate alongside humans. These advances are pushing HRI research, most of which has been conducted on “toy robots that do not do much work,” towards deployment on Cobots. In our two TUBITAK projects, called CIRAK and KALFA, we study how Cobots can be imbued with HRI capabilities in a collaborative assembly task. Based on the observation that the manipulation skills of Cobots are (and will remain, in the near future) inferior to those of human workers, we envision Cobots positioning themselves as unskilled coworkers (hence the names CIRAK and KALFA) that hand the proper tools and parts to the worker. In this talk, I will summarize our work towards imbuing Cobots with HRI skills through the use of animation principles, realized in behaviors such as “breathing” and “gazing”, as well as automatic assembly learning. Finally, I will briefly share recent developments at METU-ROMER.
Oct 18, 2022
Mehmet Doğar, University of Leeds
I will give an overview of our work on robotic object manipulation. First, I will talk about physics-based planning. This refers to robot motion planners that use predictions about the motion of contacted objects. We have particularly been interested in developing such planners for cluttered scenes, where multiple objects might simultaneously move as a result of robot contact. Second, I will talk about a more conventional grasping-based problem, where a robot must manipulate an object for the application of external forceful operations on it. Imagine a robot holding and moving a wooden board for a human, while the human drills holes into the board and cuts parts of it. I will describe our efforts in developing a planner that addresses the geometric, force stability, and human-comfort constraints for such a system.
Oct 4, 2022
Zeyu Wang, Hong Kong University of Science and Technology
Despite advances in computer-aided design (CAD) systems and video editing software, digital content creation for design, storytelling, and interactive experiences remains a challenging problem. This talk introduces a series of studies, techniques, and systems along three thrusts that engage creators more directly and enhance the user experience in authoring digital content. First, we present a drawing dataset and spatiotemporal analysis that provide insight into how people draw by comparing tracing, freehand drawing, and computer-generated approximations. We found a high degree of similarity in stroke placement and types of strokes used over time, which informs methods for customized stroke treatment and emulating drawing processes. We also propose a deep learning-based technique for line drawing synthesis from animated 3D models, where our learned style space and optimization-based embedding enable the generation of line drawing animations while allowing interactive user control across frames. Second, we demonstrate the importance of utilizing spatial context in the creative process in augmented reality (AR) through two tablet-based interfaces. DistanciAR enables designers to create site-specific AR experiences for remote environments using LiDAR capture and new authoring modes, such as Dollhouse and Peek. PointShopAR integrates point cloud capture and editing in a single AR workflow to help users quickly prototype design ideas in their spatial context. Our user studies show that LiDAR capture and the point cloud representation in these systems can make rapid AR prototyping more accessible and versatile. Last, we introduce two procedural methods to generate time-based media for visual communication and storytelling. AniCode supports authoring and on-the-fly consumption of personalized animations in a network-free environment via a printed code. CHER-Ob generates video flythroughs for storytelling from annotated heterogeneous 2D and 3D data for cultural heritage. Our user studies show that these methods can benefit the video-oriented digital prototyping experience and facilitate the dissemination of creative and cultural ideas.
In recent years model sizes have increased substantially, and so has the cost of training them. This is problematic for two reasons: 1) it excludes organizations that do not have thousands of GPUs at hand for training such models, and 2) it is becoming apparent that hardware will not be able to scale along with the growth of the models. Both can be alleviated by improving the efficiency of NLP models. This talk will first provide an overview of where efficiency may be improved within a typical NLP pipeline. We will then take a closer look at methods that improve data efficiency. Finally, we will discuss how we can quantify efficiency using different kinds of metrics.
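As a small illustration of what "different kinds of metrics" can mean in practice (the toy model and all numbers below are illustrative), one might report parameter count, an approximate FLOPs-per-token figure, and measured wall-clock throughput:

```python
# Sketch of three efficiency metrics for a toy model: parameters, FLOPs/token, throughput.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Embedding(30000, 256), nn.Linear(256, 256), nn.Linear(256, 30000))

# Metric 1: parameter count (memory footprint proxy).
n_params = sum(p.numel() for p in model.parameters())

# Metric 2: rough FLOPs per token, counting ~2 * in * out per linear layer.
flops_per_token = sum(2 * m.in_features * m.out_features
                      for m in model.modules() if isinstance(m, nn.Linear))

# Metric 3: measured throughput on a dummy batch.
tokens = torch.randint(0, 30000, (32, 128))
start = time.perf_counter()
with torch.no_grad():
    model(tokens)
throughput = tokens.numel() / (time.perf_counter() - start)

print(f"{n_params:,} params, ~{flops_per_token:,} FLOPs/token, {throughput:,.0f} tokens/s")
```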