Publications

  • 2025

    Maps from Motion (MfM): Generating 2D Semantic Maps from Sparse Multi-view Images

    Matteo Toso, Stefano Fiorini, Stuart James, Alessio Del Bue

    International Conference on 3D Vision (3DV'25) | Singapore

    World-wide detailed 2D maps require enormous collective efforts. OpenStreetMap is the result of 11 million registered users manually annotating the GPS location of over 1.75 billion entries, including distinctive landmarks and common urban objects. At the same time, manual annotations can include errors and are slow to update, limiting the map's accuracy. Maps from Motion (MfM) is a step towards automating this time-consuming map-making procedure by computing 2D maps of semantic objects directly from a collection of uncalibrated multi-view images. From each image, we extract a set of object detections, and estimate their spatial arrangement in a top-down local map centered in the reference frame of the camera that captured the image. Aligning these local maps is not a trivial problem, since they provide incomplete, noisy fragments of the scene, and matching detections across them is unreliable because of repeated patterns and the limited appearance variability of urban objects. We address this with a novel graph-based framework that encodes the spatial and semantic distribution of the objects detected in each image, and learns how to combine them to predict the objects' poses in a global reference system, while taking into account all possible detection matches and preserving the topology observed in each image. Despite the complexity of the problem, our best model achieves global 2D registration with an average accuracy within 4 meters (i.e., below GPS accuracy) even on sparse sequences with strong viewpoint change, on which COLMAP has an 80% failure rate. We provide extensive evaluation on synthetic and real-world data, showing how the method obtains a solution even in scenarios where standard optimization techniques fail.
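
    As an illustrative aside, the geometric core of registering a local object map into a global frame is a 2D rigid (Kabsch/Procrustes) alignment. The sketch below is a minimal NumPy version of that building block only; MfM itself learns the detection matching and the alignment jointly with a graph neural network.

    # Minimal 2D rigid alignment between matched object positions.
    import numpy as np

    def align_2d(local_pts, global_pts):
        """local_pts, global_pts: (N, 2) arrays of matched positions."""
        mu_l, mu_g = local_pts.mean(0), global_pts.mean(0)
        H = (local_pts - mu_l).T @ (global_pts - mu_g)   # 2x2 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
        R = Vt.T @ np.diag([1.0, d]) @ U.T
        t = mu_g - R @ mu_l
        return R, t                                      # global ~ (R @ local.T).T + t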
  • 2024

    Re-assembling the past: The RePAIR dataset and benchmark for real world 2D and 3D puzzle solving

    Theodore Tsesmelis, Luca Palmieri, Marina Khoroshiltseva, Adeela Islam, Gur Elkin, Ofir Itzhak Shahar, Gianluca Scarpellini, Stefano Fiorini, Yaniv Ohayon, Nadav Alali, Sinem Aslan, Pietro Morerio, Sebastiano Vascon, Elena Gravina, Maria Cristina Napolitano, Giuseppe Scarpati, Gabriel Zuchtriegel, Alexandra Spühler, Michel E. Fuchs, Stuart James, Ohad Ben-Shahar, Marcello Pelillo, Alessio Del Bue

    Conference on Neural Information Processing Systems (NeurIPS'24) Datasets and Benchmarks Track | Vancouver, Canada

    This paper proposes the RePAIR dataset, a challenging benchmark for testing modern computational and data-driven methods on puzzle-solving and reassembly tasks. Our dataset has unique properties that are uncommon in current benchmarks for 2D and 3D puzzle solving. The fragments and fractures are realistic, caused by the collapse of a fresco during a World War II bombing at the Pompeii archaeological park. The fragments are also eroded and have missing pieces with irregular shapes and different dimensions, further challenging the reassembly algorithms. The dataset is multi-modal, providing high-resolution images with characteristic pictorial elements, detailed 3D scans of the fragments, and metadata annotated by archaeologists. Ground truth has been generated through several years of unceasing fieldwork, including the excavation and cleaning of each fragment, followed by the manual reassembly by archaeologists of a subset of 1,000 pieces among the 16,000 available. After digitizing all the fragments in 3D, a benchmark was prepared to challenge current reassembly and puzzle-solving methods, which often address more simplistic synthetic scenarios. The tested baselines show that there clearly exists a gap to fill in solving this computationally complex problem.
  • GANzzle++: Generative approaches for jigsaw puzzle solving as local to global assignment in latent spatial representations

    Davide Talon, Alessio Del Bue, and Stuart James

    Pattern Recognition Letters |

    Jigsaw puzzles are a popular and enjoyable pastime that humans can easily solve, even with many pieces. However, solving a jigsaw is a combinatorial problem, and the space of possible solutions grows exponentially with the number of pieces, making pairwise approaches intractable. In contrast to the classical pairwise local matching of pieces based on edge heuristics, we estimate an approximate solution image, i.e., a mental image, of the puzzle and exploit it to guide the placement of pieces as a piece-to-global assignment problem. Therefore, from unordered pieces, we consider conditioned generation approaches, including Generative Adversarial Network (GAN) models, Slot Attention (SA) and Vision Transformers (ViT), to recover the solution image. Given the generated solution representation, we cast jigsaw solving as a 1-to-1 assignment matching problem using Hungarian attention, which places pieces in corresponding positions in the global solution estimate. Results show that the newly proposed GANzzle-SA and GANzzle-ViT benefit from the early fusion strategy where pieces are jointly compressed and gathered for global structure recovery. A single deep learning model generalizes to puzzles of different sizes and improves performance by a large margin. Evaluated on PuzzleCelebA and PuzzleWikiArts, our approaches narrow the gap between deep learning strategies and optimization-based classic puzzle solvers.
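
    To make the piece-to-global step concrete, here is a toy version of the 1-to-1 assignment with random stand-in embeddings: pieces are matched to cells of the estimated solution image by solving a linear assignment. The paper's Hungarian attention is a differentiable counterpart of this non-differentiable solver.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    piece_emb = rng.normal(size=(16, 64))   # 16 pieces, hypothetical 64-d embeddings
    cell_emb = rng.normal(size=(16, 64))    # 16 cells of the generated "mental image"

    a = piece_emb / np.linalg.norm(piece_emb, axis=1, keepdims=True)
    b = cell_emb / np.linalg.norm(cell_emb, axis=1, keepdims=True)
    cost = -(a @ b.T)                       # negative cosine similarity

    rows, cols = linear_sum_assignment(cost)  # optimal 1-to-1 placement of pieces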
  • Positional diffusion: Graph-based diffusion models for set ordering

    Francesco Giuliari, Gianluca Scarpellini, Stefano Fiorini, Stuart James, Pietro Morerio, Yiming Wang, Alessio Del Bue

    Pattern Recognition Letters |

    Positional reasoning is the process of ordering an unsorted set of parts into a consistent structure. To address this problem, we present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models. Using a diffusion process, we add Gaussian noise to the set elements' positions and map them to random positions in a continuous space. Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. To evaluate our method, we conduct extensive experiments on three different tasks and seven datasets, comparing our approach against the state-of-the-art methods for visual puzzle-solving, sentence ordering, and room arrangement, demonstrating that our method outperforms long-lasting research on puzzle solving, with up to +18% improvement over the second-best deep learning method, and performs on par with the state-of-the-art methods on sentence ordering and room arrangement. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. We release our code at https://github.com/IIT-PAVIS/Positional_Diffusion
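
    The forward half of the method can be sketched in a few lines: element positions are progressively mapped toward Gaussian noise, and the attention-based GNN is trained to reverse this. A minimal NumPy illustration with a standard linear noise schedule (the schedule values are assumptions, not the paper's configuration):

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)        # illustrative linear schedule
    alpha_bar = np.cumprod(1.0 - betas)

    x0 = rng.uniform(-1, 1, size=(8, 2))      # 8 set elements, 2-D positions
    t = 500
    eps = rng.normal(size=x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps
    # xt is the noised input the network learns to denoise back towards x0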
  • ArtAI4DS: AI Art and its Empowering Role in Digital Storytelling

    Teresa Fernandes, Valentina Nisi, Nuno Nunes, and Stuart James

    IFIP International Conference on Entertainment Computing (IFIP-ICEC'24) | Manaus/Amazonas, Brazil

    In an era of global interconnections, storytelling is a compelling medium for fostering understanding, building connections, and facilitating cultural exchange. Throughout history, visual imagery has been used to enrich narratives. However, this has been a privilege for those with artistic skills. Artificial Intelligence, specifically Generative AI, has the potential to democratize the process, allowing individuals to bring their narratives to life visually, regardless of their artistic prowess. To address this challenge, we developed an AI-powered tool called ArtAI4DS (Art AI for Digital Storytelling) that employs generative images (i.e., from Stable Diffusion) created from story-derived keywords. ArtAI4DS emerged from a research process starting with a `Wizard of Oz' pre-workshop, which informed the structure of a subsequent co-design workshop. Here, participants' hand-drawn images were compared with AI-generated ones, providing insights into user preferences and tool efficacy. ArtAI4DS then went through four iterative prototypes, drawing valuable insights from various participants. The tool's refinement process balanced the intricate duality of human creativity and technological innovation, culminating in an artistic expression platform that transforms stories into vivid and captivating images. The final tool, evaluated through user interviews and the AttrakDiff questionnaire, showcases its potential as an engaging platform for transforming narratives, with solid user affirmation of its motivational and emotional resonance.
  • Interactive Digital Storytelling Navigating the Inherent Currents of the Diasporic Mind

    Valentina Nisi, Paulo Bala, Miguel Pessoa, Stuart James, Nuno Nunes

    International Conference on Interactive Digital Storytelling (ICIDS'24) | Manaus/Amazonas, Brazil

    Due to a recent increase in conflicts, natural disasters, and economic crises, a growing wave of migrant populations has been searching for asylum in Europe. For this population of asylum seekers, the migration process, like currents and rapids, can be dangerous, uneven, and violent, and the integration into their host communities can add to the preexisting trauma. Extending HCI's increasing attention to the caring understanding of human life values, this paper presents initial research focused on refugees' storytelling activities to support their well-being. Here, we describe and discuss the results from a set of studies with our community partner to design and refine a bespoke interactive digital storytelling authoring tool. This study aims to promote social cohesion and equal participation in European society by using Digital Storytelling to allow migrant communities to share and connect their stories and experiences. The authors contribute a novel digital storytelling prototype tool and the discussion and reflections stemming from the user-centered design approach. The insights gained from this work are relevant for interaction designers and researchers seeking to support vulnerable populations through Interactive Digital Storytelling.
  • 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

    Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue

    European Conference on Computer Vision (ECCV'24) | Milan, Italy

    We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g., iNeRF), which also require an initialization of the camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each of the ellipsoids that parameterize the 3DGS model. Each Ellicell ray is associated with the rendering parameters of its ellipsoid, which are in turn used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best scoring bundle of rays, whose intersection provides the camera center and, in turn, the camera rotation.
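
    The final geometric step, recovering the camera centre from the selected bundle of rays, amounts to a small least-squares problem: find the point closest to all rays. A generic NumPy sketch of that step alone (the learned pixel-ray scoring that selects the bundle is the paper's contribution and is not shown):

    import numpy as np

    def closest_point_to_rays(origins, dirs):
        """origins: (N, 3) ray origins; dirs: (N, 3) unit directions."""
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for o, d in zip(origins, dirs):
            P = np.eye(3) - np.outer(d, d)   # projector orthogonal to the ray
            A += P
            b += P @ o
        return np.linalg.solve(A, b)         # least-squares "intersection"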
  • IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model

    Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue

    International Conference on Robotics and Automation (ICRA'24) | Yokohama, Japan

    We introduce IFFNeRF to estimate the six degrees-of-freedom (6DoF) camera pose of a given image, building on the Neural Radiance Fields (NeRF) formulation. IFFNeRF is specifically designed to operate in real-time and eliminates the need for an initial pose guess proximate to the sought solution. IFFNeRF utilizes the Metropolis-Hastings algorithm to sample surface points from within the NeRF model. From these sampled points, we cast rays and deduce the color for each ray through pixel-level view synthesis. The camera pose can then be estimated as the solution to a Least Squares problem by selecting correspondences between the query image and the resulting bundle. We facilitate this process through a learned attention mechanism, bridging the query image embedding with the embedding of parameterized rays, thereby matching rays pertinent to the image. Through synthetic and real evaluation settings, we show that our method can improve angular and translation accuracy by 80.1% and 67.3%, respectively, compared to iNeRF, while performing at 34 fps on consumer hardware and not requiring an initial pose guess.
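
    For readers unfamiliar with the sampler, a generic Metropolis-Hastings loop with a Gaussian random-walk proposal looks as follows; in IFFNeRF the target density would come from the NeRF model, whereas here a simple stand-in keeps the sketch self-contained.

    import numpy as np

    rng = np.random.default_rng(0)

    def density(x):                          # hypothetical unnormalised target
        return np.exp(-0.5 * np.sum(x ** 2))

    x = np.zeros(3)
    samples = []
    for _ in range(5000):
        proposal = x + 0.5 * rng.normal(size=3)   # symmetric random walk
        if rng.uniform() < density(proposal) / density(x):
            x = proposal                     # accept with MH probability
        samples.append(x)
    samples = np.asarray(samples)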
  • PRAGO: Differentiable multi-view pose optimization from objectness detections

    Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

    International Conference on 3D Vision (3DV'24) | Davos, Switzerland

    Robustly estimating camera poses from a set of images is a fundamental task that remains challenging for differentiable methods, especially in the case of small and sparse camera pose graphs. To overcome this challenge, we propose Pose-refined Rotation Averaging Graph Optimization (PRAGO). From a set of objectness detections on unordered images, our method reconstructs the rotational pose, and in turn, the absolute pose, in a differentiable manner, benefiting from the optimization of a sequence of geometrical tasks. We show how our objectness pose-refinement module in PRAGO is able to refine the inherent ambiguities in pairwise relative pose estimation without removing edges, avoiding early decisions on the viability of graph edges. PRAGO then refines the absolute rotations through iterative graph construction, reweighting the graph edges to compute the final rotational pose, which can be converted into absolute poses using translation averaging. We show that PRAGO is able to outperform non-differentiable solvers on small and sparse scenes extracted from 7-Scenes, achieving a relative improvement of 21% for rotations while achieving similar translation estimates.
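
    As background, the classical (non-differentiable) rotation averaging that PRAGO improves upon can be sketched as a fixed-point iteration: each absolute rotation is re-estimated as the average of its neighbours' predictions through the relative measurements. A toy SciPy version on a fully connected 4-node graph with exact measurements (PRAGO instead refines and reweights the edges differentiably):

    from scipy.spatial.transform import Rotation as Rot

    true = Rot.random(4, random_state=0)              # ground-truth absolute rotations
    rel = {(i, j): true[j] * true[i].inv()            # measurements R_ij = R_j R_i^-1
           for i in range(4) for j in range(4) if i < j}

    est = [Rot.identity() for _ in range(4)]          # node 0 anchored at identity
    for _ in range(20):
        for j in range(1, 4):
            preds = [(rel[(i, j)] if i < j else rel[(j, i)].inv()) * est[i]
                     for i in range(4) if i != j]
            est[j] = Rot.concatenate(preds).mean()    # quaternion averaging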
  • Towards the Reusability and Compositionality of Causal Representations

    Davide Talon, Phillip Lippe, Stuart James, Alessio Del Bue, Sara Magliacane

    Causal Representation Learning Workshop | New Orleans, USA

    Causal Representation Learning (CRL) aims at identifying high-level causal factors and their relationships from high-dimensional observations, e.g., images. While most CRL works focus on learning causal representations in a single environment, in this work we instead propose a first step towards learning causal representations from temporal sequences of images that can be adapted to a new environment, or composed across multiple related environments. In particular, we introduce DECAF, a framework that detects which causal factors can be reused and which need to be adapted from previously learned causal representations. Our approach is based on the availability of intervention targets, which indicate the variables perturbed at each time step. Experiments on three benchmark datasets show that integrating our framework with four state-of-the-art CRL approaches leads to accurate representations in a new environment with only a few samples.
  • 2023

    Inclusive Digital Storytelling: Artificial Intelligence and Augmented Reality to re-centre Stories from the Margins

    Valentina Nisi, Stuart James, Paulo Bala, Alessio Del Bue, Nuno Jardim Nunes

    International Conference on Interactive Digital Storytelling (ICIDS) | Kobe, Japan

    As the concept of the Metaverse becomes a reality, storytelling tools sharpen their teeth to include Artificial Intelligence and Augmented Reality as prominent enabling features. While digitally savvy and privileged populations are well-positioned to use technology, marginalized groups risk being left behind and excluded from societal progress, deepening the digital divide. In this paper, we describe MEMEX, an interactive digital storytelling tool where Artificial Intelligence and Augmented Reality play enabling roles in support of the cultural integration of communities at risk of exclusion. The tool was developed in the context of a three-year EU-funded project, and in this paper we focus on describing its final working prototype with its pilot study.
  • Connected to the people: Social Inclusion & Cohesion in Action through a Cultural Heritage Digital Tool

    Valentina Nisi, Paulo Bala, Vanessa Cesário, Stuart James, Alessio Del Bue, and Nuno Jardim Nunes

    ACM Conference On Computer-Supported Cooperative Work And Social Computing (CSCW) | Minneapolis, USA

  • Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models

    Francesco Giuliari, Gianluca Scarpellini, Stuart James, Yiming Wang, Alessio Del Bue

    arXiv | preprint

    Positional reasoning is the process of ordering unsorted parts contained in a set into a consistent structure. We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models to address positional reasoning. We use the forward process to map elements' positions in a set to random positions in a continuous space. Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. We conduct extensive experiments with benchmark datasets, including two puzzle datasets, three sentence ordering datasets, and one visual storytelling dataset, demonstrating that our method outperforms long-lasting research on puzzle solving with up to +18% compared to the second-best deep learning method, and performs on par with the state-of-the-art methods on sentence ordering and visual storytelling. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. Project website at https://iit-pavis.github.io/Positional_Diffusion/
  • You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization problem and dataset

    Matteo Toso, Matteo Taiana, Stuart James, Alessio Del Bue

    arXiv | preprint

    We introduce Flatlandia, a novel problem for visual localization of an image from object detections, composed of two specific tasks: i) Coarse Map Localization: localizing a single image observing a set of objects with respect to a 2D map of object landmarks; ii) Fine-grained 3DoF Localization: estimating the latitude, longitude, and orientation of the image within a 2D map. Solutions for these new tasks exploit the wide availability of open urban maps annotated with the GPS locations of common objects (e.g., via surveying or crowd-sourcing). Such maps are also more storage-friendly than the standard large-scale 3D models often used in visual localization, while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we provide the Flatlandia dataset, designed for 3DoF visual localization in multiple urban settings and based on crowd-sourced data from five European cities. We use the Flatlandia dataset to validate the complexity of the proposed tasks.
  • Locality-aware subgraphs for inductive link prediction in knowledge graphs

    Hebatallah A. Mohamed, Diego Pilutti, Stuart James, Alessio Del Bue, Marcello Pelillo, Sebastiano Vascon

    Pattern Recognition Letters (PR-L) | Journal

    Recent methods of inductive reasoning on Knowledge Graphs (KGs) transform the link prediction problem into a graph classification task. They first extract a subgraph around each target link based on the k-hop neighborhood of the target entities, encode the subgraphs using a Graph Neural Network (GNN), then learn a function that maps subgraph structural patterns to link existence. Although these methods have witnessed great successes, increasing k often leads to an exponential expansion of the neighborhood, thereby degrading the GNN expressivity due to oversmoothing. In this paper, we formulate the subgraph extraction as a local clustering procedure that aims at sampling tightly-related subgraphs around the target links, based on a personalized PageRank (PPR) approach. Empirically, on three real-world KGs, we show that reasoning over subgraphs extracted by PPR-based local clustering can lead to a more accurate link prediction model than relying on neighbors within fixed hop distances. Furthermore, we investigate graph properties such as average clustering coefficient and node degree, and show that there is a relation between these and the performance of subgraph-based link prediction.
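
    The PPR-based extraction itself is compact: run personalized PageRank seeded on the two endpoints of the target link and keep the top-scoring nodes as the subgraph. A NetworkX sketch on a stand-in graph (the entity ids and subgraph size are placeholders, not the paper's settings):

    import networkx as nx

    G = nx.karate_club_graph()              # stand-in for a knowledge graph
    head, tail = 0, 33                      # hypothetical target link endpoints
    ppr = nx.pagerank(G, alpha=0.85, personalization={head: 0.5, tail: 0.5})
    top = sorted(ppr, key=ppr.get, reverse=True)[:10]
    subgraph = G.subgraph(top)              # tightly-related local cluster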
  • 2022

    Writing with (Digital) Scissors: Designing a Text Editing Tool for Assisted Storytelling using Crowd-Generated Content

    Paulo Bala, Stuart James, Alessio Del Bue, Valentina Nisi

    International Conference on Interactive Digital Storytelling (ICIDS 2022) | Santa Cruz, USA

    Digital Storytelling can exploit numerous technologies and sources of information to support the creation, refinement and enhancement of a narrative. Research on text editing tools has created novel interactions that support authors in different stages of the creative process, such as the inclusion of crowd-generated content for writing. While these interactions have the potential to change workflows, integration of these in a way that is useful and matches users’ needs is unclear. In order to investigate the space of Assisted Storytelling, we designed and conducted a study to analyze how users write and edit a story about Cultural Heritage using an auxiliary source like Wikipedia. Through a diffractive analysis of stories, creative processes, and social and cultural contexts, we reflect and derive implications for design. These were applied to develop an AI-supported text editing tool using crowd-sourced content from Wikipedia and Wikidata.
  • PoserNet: Refining Relative Camera Poses Exploiting Object Detections

    Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

    European Conference on Computer Vision (ECCV 2022) | Tel Aviv, Israel

    The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose Pose Refiner Network (PoserNet), a light-weight Graph Neural Network that refines approximate pair-wise relative camera poses. PoserNet exploits associations between the objectness regions - concisely expressed as bounding boxes - across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across varied sizes of graphs and show how this process can be beneficial to optimisation-based Motion Averaging algorithms, improving the median error on the rotation by 62° with respect to the initial estimates obtained based on bounding boxes. Code and data are available at github.com/IIT-PAVIS/PoserNet.
    @inproceedings{posernet_eccv2022,
    Title = {PoserNet: Refining Relative Camera Poses Exploiting Object Detections},
    Author = {Matteo Taiana and Matteo Toso and Stuart James and Alessio Del Bue},
    booktitle = {Proceedings of the European Conference on Computer Vision ({ECCV})},
    Year = {2022},
    }
  • Geolocation of Cultural Heritage using Multi-View Knowledge Graph Embedding

    Hebatallah A. Mohamed, Sebastiano Vascon, Feliks Hibraj, Stuart James, Diego Pilutti, Alessio Del Bue, Marcello Pelillo

    International Workshop on Pattern Recognition for Cultural Heritage (PatReCH 2022) | Montréal, Québec, Canada

    Knowledge Graphs (KGs) have proven to be a reliable way of structuring data. They can provide a rich source of contextual information about cultural heritage collections. However, cultural heritage KGs are far from being complete. They are often missing important attributes such as geographical location, especially for sculptures and mobile or indoor entities such as paintings. In this paper, we first present a framework for ingesting knowledge about tangible cultural heritage entities from various data sources and their connected multi-hop knowledge into a geolocalized KG. Secondly, we propose a multi-view learning model for estimating the relative distance between a given pair of cultural heritage entities, based on the geographical as well as the knowledge connections of the entities.
  • GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a generative mental image

    Davide Talon, Alessio Del Bue, Stuart James

    IEEE International Conference on Image Processing (ICIP 2022) | Bordeaux, France

    Puzzle solving is a combinatorial challenge due to the difficulty of matching adjacent pieces. Instead, we infer a mental image from all pieces, against which a given piece can then be matched, avoiding the combinatorial explosion. Exploiting advancements in Generative Adversarial methods, we learn how to reconstruct the image given a set of unordered pieces, allowing the model to learn a joint embedding space to match an encoding of each piece to the cropped layer of the generator. Therefore, we frame the problem as an R@1 retrieval task, and then solve the linear assignment using differentiable Hungarian attention, making the process end-to-end. In doing so, our model is puzzle size agnostic, in contrast to prior deep learning methods, which are single-size. We evaluate on two new large-scale datasets, where our model is on par with deep learning methods, while generalizing to multiple puzzle sizes.
  • Emerging Strategies in Asymmetric Sketch Interactions for Object Retrieval in Virtual Reality

    Daniele Giunchi, Riccardo Bovo, Donald Degraen, Stuart James, and Anthony Steed

    Interactive Media, Smart Systems and Emerging Technologies (IMET 2022) | Cyprus

  • Multi-view 3D Objects Localization from Street-level Scenes

    Javed Ahmad, Matteo Taiana, Matteo Toso, Stuart James, and Alessio Del Bue

    International Conference on Image Analysis and Processing (ICIAP 2021) | Lecce, Italy

    This paper presents a method to localize street-level objects in 3D from images of an urban area. Our method processes 3D sparse point clouds reconstructed from multi-view images and leverages 2D instance segmentation to find all objects within the scene and to generate, for each object, the corresponding cluster of 3D points and matched 2D detections. The proposed approach is robust to changes in image size, viewpoint, and object appearance across different views. We validate our approach on challenging street-level crowdsourced images from the Mapillary platform, showing a significant improvement in the mean average precision of object localization for the available Mapillary annotations. These results showcase our method's effectiveness in localizing objects in 3D, which could potentially be used in applications such as high-definition map generation of urban environments.
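
    One step of such a pipeline, grouping the masked 3D points into per-object clusters, can be illustrated with a generic density-based clustering; DBSCAN is used here purely as a stand-in, with synthetic points in place of a reconstructed cloud.

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(0)
    pts = np.concatenate([rng.normal(c, 0.2, size=(50, 3))
                          for c in ([0, 0, 0], [5, 0, 0], [0, 5, 0])])
    labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(pts)   # -1 marks noise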
  • 2021

    Square peg, round hole: A case study on using Visual Question & Answering in Games

    Paulo Bala, Valentina Nisi, Mara Dionísio, Nuno Jardim Nunes, Stuart James

    CHI Play - WIP Track | Virtual

    The discussion about what Artificial Intelligence (AI) can contribute to games has been running for a long time; however, recent advances in AI show promise of providing new kinds of experiences for players and new tools for game developers. In contrast with the traditional Finite State Machine for interaction and response, we consider the scenario of Visual Question & Answering (VQA) - the automatic answering of a textual question about an image. VQA is a tool that can enrich possible answers by combining both visual and textual information, and it extrapolates readily to a game setting without straying from its training domain. In this Work In Progress, we present two original prototypes designed to explore the potential of VQA in games and discuss preliminary findings from a Wizard of Oz (WOz) pilot study using VQA to investigate how people interact with such an AI algorithm.
  • Amnesia in the Atlantic: an AI Driven Serious Game on Marine Biodiversity

    Mara Dionísio, Valentina Nisi, Jin Xin, Paulo Bala, Stuart James, Nuno Jardim Nunes

    IFIP International Conference on Entertainment Computing (IFIP-ICEC) - Work In Progress (WIP) Track | Coimbra, Portugal

    The use of Conversational Interfaces has evolved rapidly in numerous fields; in particular, they are an interesting tool for Serious Games to leverage. Conversational Interfaces can assist Serious Games' goals, namely in presenting knowledge through dialogue. With the global acknowledgment of the joint crisis in nature and climate change, it is essential to raise awareness of the fact that many ecosystems are being destroyed and that the biodiversity of our planet is at risk. Therefore, in this paper, we present Amnesia in the Atlantic, a Serious Game enhanced with a Conversational Interface, embracing the challenge of critically engaging players with marine biodiversity issues.
  • Artificial Intelligence and Art History: A Necessary Debate?

    Mathieu Aubry, Lisandra Costiner, Stuart James

    Histoire de l'art | Debate

  • Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality

    Daniele Giunchi, Alejandro Sztrajman, Stuart James, Anthony Steed

    IMX'21 | New York

    Sketch and speech are intuitive interaction methods that convey complementary information and have been independently used for 3D model retrieval in virtual environments. While sketch has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. To overcome this, we implement a multimodal interface for querying 3D model databases within a virtual environment. We design a new challenging database for sketch comprising 3D chairs where each of the components (arms, legs, seat, back) is independently colored. We base the sketch interaction on the state-of-the-art for 3D Sketch Retrieval, and use a Wizard-of-Oz style experiment to process the voice input. In this way, we avoid the complexities of natural language processing, which frequently requires fine-tuning to be robust. We conduct two user studies and show that hybrid search strategies emerge from the combination of interactions, fostering the advantages provided by both modalities.
    @inbook{10.1145/3452918.3458806,
    author = {Giunchi, Daniele and Sztrajman, Alejandro and James, Stuart and Steed, Anthony},
    title = {Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality},
    year = {2021},
    isbn = {9781450383899},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3452918.3458806},
    booktitle = {ACM International Conference on Interactive Media Experiences},
    pages = {144–155},
    numpages = {12}}
  • Consistent Mesh Colors for Multi-View Reconstructed 3D Scenes

    Mohamed Dahy Elkhouly, Alessio Del Bue, Stuart James

    arXiv | preprint

    We address the issue of creating consistent mesh texture maps captured from scenes without color calibration. We find that the method for aggregating the multiple views is crucial for creating spatially consistent meshes without the need to explicitly optimize for spatial consistency. We compute a color prior from the cross-correlation of observable view faces and the faces per view to identify an optimal per-face color. We then use this color in a re-weighting ratio for the best-view texture, which is identified by prior mesh texturing work, to create a spatially consistent texture map. Despite our method not explicitly handling spatial consistency, our results are qualitatively more consistent than other state-of-the-art techniques while being computationally more efficient. We evaluate on prior datasets and additionally on Matterport3D, showing qualitative improvements.
  • LIGHTS: LIGHT Specularity Dataset for specular detection in Multi-view

    Mohamed Dahy Elkhouly, Theodore Tsesmelis, Alessio Del Bue, Stuart James

    IEEE International Conference on Image Processing (ICIP 2021) | Anchorage, Alaska

    Specular highlights are commonplace in images; however, methods for detecting them, and in turn removing the phenomenon, are particularly challenging. One reason for this is the difficulty of creating a dataset for training or evaluation, as in the real world we lack the necessary control over the environment. Therefore, we propose a novel physically-based rendered LIGHT Specularity (LIGHTS) Dataset for the evaluation of the specular highlight detection task. Our dataset consists of 18 high-quality architectural scenes, where each scene is rendered with multiple views. In total we have 2,603 views with an average of 145 views per scene. Additionally, we propose a simple aggregation-based method for specular highlight detection that outperforms prior work by 3.6% in two orders of magnitude less time on our dataset.
    @inproceedings{ElkhoulyICIP21lights,
    author={Elkhouly, Mohamed Dahy and Tsesmelis, Theodore and Bue, Alessio Del and James, Stuart},
    booktitle={2021 IEEE International Conference on Image Processing (ICIP)},
    title={Lights: Light Specularity Dataset For Specular Detection In Multi-View},
    year={2021},
    volume={},
    number={},
    pages={2908-2912},
    doi={10.1109/ICIP42928.2021.9506354}}
  • 2020

    Machine Learning for Cultural Heritage: A Survey

    Marco Fiorucci, Marina Khoroshiltseva, Massimiliano Pontil, Arianna Traviglia, Alessio Del Bue and Stuart James

    Pattern Recognition Letters (PR-L) | Elsevier

    The application of Machine Learning (ML) to Cultural Heritage (CH) has evolved from basic statistical approaches, such as Linear Regression, to complex Deep Learning models. The question remains how much of this work actively improves on the underlying algorithm versus using it within a 'black box' setting. We survey across the ML and CH literature to identify the theoretical changes that contribute to the algorithm and in turn make it suitable for CH applications. Alternatively, and most commonly, when there are no such changes, we review the CH applications, features and pre/post-processing which make the algorithm suitable for its use. We analyse the dominant divides within ML, Supervised, Semi-supervised and Unsupervised, and reflect on a variety of algorithms that have been extensively used. From such an analysis, we give a critical look at the use of ML in CH and consider why CH has only limited adoption of ML.
    @article{FiorucciPRL20ml4ch, title = "Machine Learning for Cultural Heritage: A Survey",
    journal = "Pattern Recognition Letters",
    volume = "133",
    pages = "102 - 108",
    year = "2020",
    issn = "0167-8655",
    doi = "https://doi.org/10.1016/j.patrec.2020.02.017",
    url = "http://www.sciencedirect.com/science/article/pii/S0167865520300532",
    author = "Marco Fiorucci and Marina Khoroshiltseva and Massimiliano Pontil and Arianna Traviglia and Alessio [Del Bue] and Stuart James",
    keywords = "Artificial Intelligence, Machine Learning, Cultural Heritage, Digital Humanities",
    abstract = "The application of Machine Learning (ML) to Cultural Heritage (CH) has evolved since basic statistical approaches such as Linear Regression to complex Deep Learning models. The question remains how much of this actively improves on the underlying algorithm versus using it within a ‘black box’ setting. We survey across ML and CH literature to identify the theoretical changes which contribute to the algorithm and in turn them suitable for CH applications. Alternatively, and most commonly, when there are no changes, we review the CH applications, features and pre/post-processing which make the algorithm suitable for its use. We analyse the dominant divides within ML, Supervised, Semi-supervised and Unsupervised, and reflect on a variety of algorithms that have been extensively used. From such an analysis, we give a critical look at the use of ML in CH and consider why CH has only limited adoption of ML."}
  • 2019

    Mixing realities for sketch retrieval in Virtual Reality

    Daniele Giunchi, Stuart James, Donald Degraen and Anthony Steed

    VRCAI'19 | Brisbane, Australia

    Users within a Virtual Environment often need support designing the environment around them, with the need to find relevant content while remaining immersed. We therefore focus on the familiar sketch-based interaction to support the process of content placing, and specifically investigate how interactions from a tablet or desktop translate into the virtual environment. To understand sketching interaction within a virtual environment, we compare different methods of sketch interaction, i.e., 3D mid-air sketching, 2D sketching on a virtual tablet, 2D sketching on a fixed virtual whiteboard, and 2D sketching on a real tablet. The user remains immersed within the environment, queries a database containing detailed 3D models, and places them into the virtual environment. Our results show that 3D mid-air sketching is considered to be a more intuitive method to search a collection of models, while the addition of physical devices creates confusion due to the complications of their inclusion within a virtual environment. While we pose our work as a retrieval problem for 3D models of chairs, our results are extendable to other sketching tasks for virtual environments.
    @inproceedings{GiunchiVRCAI19mixingReal,
    author = {Giunchi, Daniele and James, Stuart and Degraen, Donald and Steed, Anthony},
    title = {Mixing Realities for Sketch Retrieval in Virtual Reality},
    year = {2019},
    isbn = {9781450370028},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3359997.3365751},
    doi = {10.1145/3359997.3365751},
    booktitle = {The 17th International Conference on Virtual-Reality Continuum and Its Applications in Industry},
    articleno = {Article 50},
    numpages = {2},
    keywords = {HCI, Sketch, CNN, Virtual Reality},
    location = {Brisbane, QLD, Australia},
    series = {VRCAI ’19}}
  • re-OBJ: Jointly learning the foreground and background for object instance re-identification

    Vaibhav Bansal, Stuart James and Alessio Del Bue

    ICIAP'19 | Trento, Italy (Best Student Paper Award)

    Conventional approaches to object instance re-identification rely on matching appearances of the target objects among a set of frames. However, learning the appearance of an object alone might fail when there are multiple objects with similar appearance or multiple instances of the same object class present in the scene. This paper proposes that partial observations of the background can be utilized to aid the object re-identification task for a rigid scene, especially a rigid environment with many recurring, identical models of objects. Using an extension to the Mask R-CNN architecture, we learn to encode the important and distinct information in the background jointly with the foreground, relevant to rigid real-world scenarios such as an indoor environment where objects are static and the camera moves around the scene. We demonstrate the effectiveness of our joint visual feature in the re-identification of objects in the ScanNet dataset and show a relative improvement of around 28.25% in the rank-1 accuracy over the DeepSORT method.
    @inproceedings{BansalICIAP19reobj,
    author = {Vaibhav Bansal and Stuart James and Alessio {Del Bue}},
    editor = {Elisa Ricci and Samuel Rota Bul{\`{o}} and Cees Snoek and Oswald Lanz and Stefano Messelodi and Nicu Sebe},
    title = {re-OBJ: Jointly Learning the Foreground and Background for Object Instance Re-identification},
    booktitle = {Image Analysis and Processing - {ICIAP} 2019 - 20th International Conference,
    Trento, Italy, September 9-13, 2019, Proceedings, Part{II}},
    series = {Lecture Notes in Computer Science},
    volume = {11752}, pages = {402--413},
    publisher = {Springer},
    year = {2019},
    url = {https://doi.org/10.1007/978-3-030-30645-8\_37},
    doi = {10.1007/978-3-030-30645-8\_37}}
  • Augmenting datasets for Visual Question and Answering for complex spatial reasoning

    Stuart James and Alessio Del Bue

    CVPR Workshop on VQA | California, USA

  • Autonomous 3D reconstruction, mapping and exploration of indoor environments with a robotic arm

    Yiming Wang, Stuart James, Elisavet Konstantina Stathopoulou, Carlos Beltran-Gonzalez, Yoshinori Konishi and Alessio Del Bue

    IEEE Robotics and Automation Letters | Macau

    We propose a novel information gain metric that combines hand-crafted and data-driven metrics to address the next best view problem for autonomous 3D mapping of unknown indoor environments. For the hand-crafted metric, we propose an entropy-based information gain that accounts for the previous view points, preventing the camera from revisiting the same location and promoting motion toward unexplored or occluded areas. For the learnt metric, we adopt a Convolutional Neural Network (CNN) architecture and formulate the problem as a classification problem. The CNN takes as input the current depth image and outputs the motion direction that suggests the largest unexplored surface. We train and test the CNN using a new synthetic dataset based on the SUNCG dataset. The learnt motion direction is then combined with the proposed hand-crafted metric to help handle situations where using only the hand-crafted metric tends to face ambiguities. We finally evaluate the autonomous paths over several real and synthetic indoor scenes, including complex industrial and domestic settings, and show that our combined metric is able to further improve the exploration coverage compared to using only the proposed hand-crafted metric.
    @ARTICLE{WangRAL19explore,
    author={Y. {Wang} and S. {James} and E. K. {Stathopoulou} and C. {Beltrán-González} and Y. {Konishi} and A. {Del Bue}},
    journal={IEEE Robotics and Automation Letters},
    title={Autonomous 3-D Reconstruction, Mapping, and Exploration of Indoor Environments With a Robotic Arm},
    year={2019},
    volume={4},
    number={4},
    pages={3340-3347}}
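
    The hand-crafted half of the metric rests on a standard quantity: the Shannon entropy of occupancy probabilities, which peaks for unknown cells (p near 0.5). A minimal sketch of that term alone, with a placeholder occupancy grid and visibility selection:

    import numpy as np

    def cell_entropy(p):
        p = np.clip(p, 1e-6, 1 - 1e-6)
        return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    occupancy = np.full((32, 32, 32), 0.5)   # fully unknown map: maximal entropy
    visible = occupancy[:16]                 # hypothetical cells seen by a view
    info_gain = cell_entropy(visible).sum()  # higher = more unexplored volume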
  • 2018

    Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning

    Paul Gay, Stuart James, Alessio Del Bue

    ACCV'18 | Perth, Australia

    Recent approaches to visual scene understanding attempt to build a scene graph -- a computational representation of objects and their pairwise relationships. Such a rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when considering complex spatial arrangements in the scene. In contrast, an image sequence conveys useful information through the multi-view geometric relations arising from camera motion. Indeed, in such cases, object relationships are naturally related to the 3D scene structure. To this end, this paper proposes a system that first computes the geometrical location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding such geometrical reasoning. This compelling representation is obtained using a new model where geometric and visual features are merged using an RNN framework. We report results on a dataset we created for the task of 3D scene graph generation in multiple views.
    @InProceedings{GayACCV19vgfm,
    author="Gay, Paul and Stuart, James and Del Bue, Alessio",
    editor="Jawahar, C. V.and Li, Hongdong and Mori, Greg and Schindler, Konrad",
    title="Visual Graphs from Motion (VGfM): Scene Understanding with Object Geometry Reasoning",
    booktitle="Computer Vision -- ACCV 2018",
    year="2019",
    publisher="Springer International Publishing",address="Cham",
    pages="330--346",
    abstract="Recent approaches on visual scene understanding attempt to build a scene graph -- a computational representation of objects and their pairwise relationships. Such rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when considering complex spatial arrangements in the scene. Differently, an image sequence conveys useful information using the multi-view geometric relations arising from camera motions. Indeed, object relationships are naturally related to the 3D scene structure. To this end, this paper proposes a system that first computes the geometrical location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding such geometrical reasoning. Such compelling representation is obtained using a new model where geometric and visual features are merged using an RNN framework. We report results on a dataset we created for the task of 3D scene graph generation in multiple views.",
    isbn="978-3-030-20893-6"}
  • Multi-view Aggregation for Color Naming with Shadow Detection and Removal

    Mohamed Dahy Elkhouly, Stuart James, Alessio Del Bue

    IPAS'18 | Nice, France (Best Paper Award)

    This paper presents a set of methods for classifying the color attribute of objects when multiple images of the same objects are available. This problem is more complex than single-image estimation, since varying environmental effects, such as shadows or specularities from light sources, can result in poor accuracy. These depend primarily on the camera positions and the material type of the objects. Single-image techniques focus on improving the discrimination between colors, whereas in multi-view systems additional information is available but should be utilized wisely. To this end, we propose three methods to aggregate image pixel information in multi-view that boost the performance of color name classification. Moreover, we study the effect of shadows by employing automatic shadow detection and correction techniques on the color naming problem. We tested our proposals on a new multi-view color names dataset (M3DCN), which contains indoor and outdoor objects. The experimental evaluation shows that one of the three presented aggregation methods is very efficient, achieving the highest accuracy in terms of classification results. Also, we experimentally show that addressing visual outliers like shadows in multi-view images improves the performance of the color attribute decision process.
    @INPROCEEDINGS{ElkhoulyIPAS18mvcolor,
    author={M. D. {Elkhouly} and S. {James} and A. {Del Bue}},
    booktitle={2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS)},
    title={Multi-view Aggregation for Color Naming with Shadow Detection and Removal},
    year={2018},
    pages={115-120}}
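
    A minimal example of the kind of multi-view aggregation at stake: per-view colour-name probabilities combined by confidence-weighted voting, so that a single shadowed view cannot flip the decision. The numbers are invented, and this is an illustrative stand-in rather than one of the paper's three methods verbatim.

    import numpy as np

    color_names = ["red", "green", "blue", "brown"]
    probs = np.array([[0.70, 0.10, 0.10, 0.10],    # hypothetical per-view outputs
                      [0.60, 0.20, 0.10, 0.10],
                      [0.20, 0.10, 0.10, 0.60],    # shadowed view drifts to brown
                      [0.80, 0.10, 0.05, 0.05]])
    weights = probs.max(axis=1)                    # per-view confidence
    agg = (weights[:, None] * probs).sum(axis=0)
    print(color_names[int(agg.argmax())])          # -> "red"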
  • 3D Sketching for Interactive Model Retrieval in Virtual Reality

    Daniele Giunchi, Stuart James, Anthony Steed

    Expressive | Victoria, British Columbia, Canada

    Users within a Virtual Environment often need support designing the environment around them, with the need to find relevant content while remaining immersed. We therefore focus on the familiar sketch-based interaction to support the process of content placing, and specifically investigate how interactions from a tablet or desktop translate into the virtual environment. To understand sketching interaction within a virtual environment, we compare different methods of sketch interaction, i.e., 3D mid-air sketching, 2D sketching on a virtual tablet, 2D sketching on a fixed virtual whiteboard, and 2D sketching on a real tablet. The user remains immersed within the environment, queries a database containing detailed 3D models, and places them into the virtual environment. Our results show that 3D mid-air sketching is considered to be a more intuitive method to search a collection of models, while the addition of physical devices creates confusion due to the complications of their inclusion within a virtual environment. While we pose our work as a retrieval problem for 3D models of chairs, our results are extendable to other sketching tasks for virtual environments.
    @inproceedings{10.1145/3229147.3229166,
    author = {Giunchi, Daniele and James, Stuart and Steed, Anthony},
    title = {3D Sketching for Interactive Model Retrieval in Virtual Reality},
    year = {2018},
    isbn = {9781450358927},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3229147.3229166},
    doi = {10.1145/3229147.3229166},
    booktitle = {Proceedings of the Joint Symposium on Computational Aesthetics and Sketch-Based Interfaces and Modeling and Non-Photorealistic Animation and Rendering},
    articleno = {Article 1},
    numpages = {12},
    keywords = {HCI, sketch, virtual reality, CNN},
    location = {Victoria, British Columbia, Canada},
    series = {Expressive ’18}}
  • Model Retrieval by 3D Sketching in Immersive Virtual Reality

    Daniele Giunchi, Stuart James, Anthony Steed

    IEEE VR Poster | Reutlingen, Germany

    We describe a novel method for searching 3D model collections using free-form sketches within a virtual environment as queries. As opposed to traditional Sketch Retrieval, our queries are drawn directly onto an example model. Using immersive virtual reality, the user can express their query through a sketch that demonstrates the desired structure, color and texture. Unlike previous sketch-based retrieval methods, users remain immersed within the environment without relying on textual queries or 2D projections which can disconnect the user from the environment. We show how a convolutional neural network (CNN) can create multi-view representations of colored 3D sketches. Using such a descriptor representation, our system is able to rapidly retrieve models, and in this way we provide the user with an interactive method of navigating large object datasets. Through a preliminary user study we demonstrate that by using our VR 3D model retrieval system, users can perform quick and intuitive searches. Using our system, users can rapidly populate a virtual environment with specific models from a very large database, and thus the technique has the potential to be broadly applicable in immersive editing systems.
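
    Once the CNN has produced descriptors, the retrieval step reduces to nearest-neighbour ranking by cosine similarity. A minimal sketch with random placeholder embeddings standing in for the sketch and model descriptors:

    import numpy as np

    rng = np.random.default_rng(0)
    model_desc = rng.normal(size=(1000, 128))   # hypothetical database descriptors
    query = rng.normal(size=128)                # descriptor of the user's 3D sketch

    m = model_desc / np.linalg.norm(model_desc, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    ranking = np.argsort(-(m @ q))              # best-matching model ids first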
  • 2017

    Texture Stationarization: Turning Photos into Tileable Textures

    Joep Moritz, Stuart James, Tom S.F. Haines, Tobias Ritschel, Tim Weyrich

    Computer Graphics Forum (Proc. Eurographics) | Lyon, France

    Texture synthesis has grown into a mature field in computer graphics, allowing the synthesis of naturalistic textures and images from photographic exemplars. Surprisingly little work, however, has been dedicated to synthesizing tileable textures, that is, textures that when laid out in a regular grid of tiles form a homogeneous appearance suitable for use in memory-sensitive real-time graphics applications. One of the key challenges in doing so is that most natural input exemplars exhibit uneven spatial variations that, when tiled, show as repetitive patterns. We propose an approach to synthesize tileable textures while enforcing stationarity properties that effectively mask repetitions while maintaining the unique characteristics of the exemplar. We explore a number of alternative measures for texture stationarity and show how each measure can be integrated into a standard texture synthesis method (PatchMatch) to enforce stationarity at user-controlled scales. We demonstrate the efficacy of our approach using a database of 118 exemplar images, both from publicly available sources as well as new ones captured under uncontrolled conditions, and we quantitatively analyze alternative stationarity measures for their robustness across many test runs using different random seeds. In conclusion, we suggest a novel synthesis approach that employs local histogram matching to reliably turn input photographs of natural surfaces into tiles well suited for artifact-free tiling.
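
    The local histogram matching named in the conclusion is available off the shelf; below is a minimal scikit-image sketch with synthetic arrays standing in for the photograph (the paper integrates this into PatchMatch-based synthesis rather than applying it in isolation).

    import numpy as np
    from skimage.exposure import match_histograms

    rng = np.random.default_rng(0)
    exemplar = rng.random((256, 256, 3))     # stand-in for the input photo
    tile = exemplar[:128, :128] * 0.6        # simulate uneven local lighting
    matched = match_histograms(tile, exemplar, channel_axis=-1)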
  • Digital Photographic Practices as Expressions of Personhood and Identity: Variations Across School Leavers and Recent Retirees

    K Orzech, W Moncur, A Durrant, S James, J Collomosse

    Journal of Visual Studies |

  • 2016

    Evolutionary Data Purification for Social Media Classification

    Stuart James, John Collomosse

    International Conference on Pattern Recognition (ICPR'16) | Cancun, Mexico

  • Towards Sketched Visual Narratives for Retrieval

    Stuart James

    SketchX - Human Sketch Analysis and its Applications | London, UK

  • 2015

    Visual Narratives: Free-hand Sketch for Visual Search and Navigation of Video

    Stuart James

    PhD Thesis | University of Surrey, Guildford, UK

    Humans have an innate ability to communicate visually; the earliest forms of communication were cave drawings, and children can communicate visual descriptions of scenes through drawings well before they can write. Drawings and sketches offer an intuitive and efficient means for communicating visual concepts. Today, society faces a deluge of digital visual content driven by a surge in the generation of video on social media and the online availability of video archives. Mobile devices are emerging as the dominant platform for consuming this content, with Cisco predicting that by 2018 over 80% of mobile traffic will be video. Sketch offers a familiar and expressive modality for interacting with video on the touch-screens commonly present on such devices. This thesis contributes several new algorithms for searching and manipulating video using free-hand sketches. We propose the Visual Narrative (VN): a storyboarded sequence of one or more actions in the form of sketch that collectively describe an event. We show that VNs can be used both to efficiently search video repositories and to synthesise video clips. First, we describe a sketch based video retrieval (SBVR) system that fuses multiple modalities (shape, colour, semantics, and motion) in order to find relevant video clips. An efficient multi-modal video descriptor is proposed, enabling the search of hundreds of videos in milliseconds. This contrasts with prior SBVR, which lacks an efficient index representation and takes minutes or hours to search similar datasets. This contribution not only makes SBVR practical at interactive speeds, but also enables user refinement of results through relevance feedback to resolve sketch ambiguity, including the relative priority of the different VN modalities. Second, we present the first algorithm for sketch based pose retrieval. A pictographic representation (stick-men) is used to specify a desired human pose within the VN, and similar poses are found within a video dataset. We use archival dance performance footage from the UK National Resource Centre for Dance (UK-NRCD), containing diverse examples of human pose. We investigate appropriate descriptors for sketch and video, and propose a novel manifold learning technique for mapping between the two descriptor spaces and so performing sketched pose retrieval. We show that domain adaptation can be applied to boost the performance of this system through a novel piece-wise feature-space warping technique. Third, we present a graph representation for VNs comprising multiple actions. We focus on the extension of our pose retrieval system to a sequence of poses interspersed with actions (e.g. jump, twirl). We show that our graph representation can be used for multiple applications: 1) to retrieve sequences of video comprising multiple actions; 2) to navigate, in pictorial form, the retrieved video sequences; 3) to synthesise new video sequences by retrieving and concatenating video fragments from archival footage.
  • 2014

    Enhanced Digital Literacy by Multi-modal Data Mining of the Digital Lifespan

    John Collomosse, Stuart James, Abigail Durrant, Diego Trujillo-Pisanty, Wendy Moncur, Kathryn Orzech, Sarah Martindale, Mike Chantler

    DE2015 | London, UK

  • Interactive Video Asset Retrieval using Sketched Queries

    Stuart James and John Collomosse

    CVMP'14 | London

  • Particle Filtering approach to salient video object localization

    C Gray, S James, J Collomosse and P Asente

    ICIP'14 | Switzerland

  • ReEnact: Sketch based Choreographic Design from Archival Dance Footage

    S James, M Fonseca and J Collomosse

    ACM International Conference on Multimedia Retrieval (ICMR'14) | Glasgow, UK

  • Admixed Portrait: Design Intervention to Prompt Reflection on Being Online as a New Parent

    D Trujillo-Pisanty, A Durrant, S Martindale, S James, J Collomosse

    ACM DIS'14 |

  • 2013

    Markov Random Fields for Sketch based Video Retrieval

    R Hu, S James, T Wang and J Collomosse

    ACM International Conference on Multimedia Retrieval (ICMR'13) |

  • 2012

    Skeletons from Sketches of Dancing Poses

    M Fonseca, S James and J Collomosse

    IEEE VL/HCC'12 |

  • Annotated Free-hand Sketches for Video Retrieval using Object Semantics and Motion

    R Hu, S James and J Collomosse

    Springer ACM MultiMedia Modelling (MMM'12) |

  • 2011

    Annotated Sketches for Intuitive Video Retrieval

    Stuart James and John Collomosse

    BMVA/AVA Workshop on Biological and Machine Vision, Perception Journal | Cardiff, UK

  • Sketched Visual Narratives for Content Based Video Retrieval

    Stuart James

    MPhil Transfer Report | University of Surrey, UK