Publications

  • 2024

    ArtAI4DS: AI Art and its Empowering Role in Digital Storytelling

    Teresa Fernandes, Valentina Nisi, Nuno Nunes, and Stuart James

    IFIP International Conference on Entertainment Computing (IFIP-ICEC 2024) | Manaus/Amazonas, Brazil

    In an era of global interconnections, storytelling is a compelling medium for fostering understanding, building connections, and facilitating cultural exchange. Throughout history, visual imagery has been used to enrich narratives; however, this has been a privilege reserved for those with artistic skills. Artificial Intelligence, specifically Generative AI, has the potential to democratize the process, allowing individuals to bring their narratives to life visually regardless of their artistic prowess. To address this challenge, we developed an AI-powered tool called ArtAI4DS (Art AI for Digital Storytelling), which employs generative images (i.e., from Stable Diffusion) created from story-derived keywords. ArtAI4DS emerged from a research process starting with a 'Wizard of Oz' pre-workshop, which informed the structure of a subsequent co-design workshop in which participants' hand-drawn images were compared with AI-generated ones, providing insights into user preferences and tool efficacy. ArtAI4DS then went through four iterative prototypes, drawing valuable insights from various participants. The tool's refinement process balanced the intricate duality of human creativity and technological innovation, culminating in an artistic expression platform that transforms stories into vivid and captivating images. The final tool, evaluated through user interviews and the AttrakDiff questionnaire, shows potential as an engaging platform for transforming narratives, with solid user affirmation of its motivational and emotional resonance.
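
    The keyword-to-image step can be illustrated with the public diffusers library; a minimal sketch, where the checkpoint, prompt template, and keywords are illustrative assumptions rather than the tool's actual configuration:

    # pip install torch diffusers transformers accelerate
    import torch
    from diffusers import StableDiffusionPipeline

    # Load a public Stable Diffusion checkpoint (illustrative choice).
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Hypothetical story-derived keywords and prompt template.
    keywords = ["lighthouse", "storm", "young fisherman"]
    prompt = "A storybook illustration of " + ", ".join(keywords)

    image = pipe(prompt, num_inference_steps=30).images[0]
    image.save("story_scene.png")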
  • 6DGS: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model

    Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue

    European Conference on Computer Vision (ECCV) | Milan, Italy

    We propose 6DGS to estimate the camera pose of a target RGB image given a 3D Gaussian Splatting (3DGS) model representing the scene. 6DGS avoids the iterative process typical of analysis-by-synthesis methods (e.g., iNeRF), which also require an initialization of the camera pose in order to converge. Instead, our method estimates a 6DoF pose by inverting the 3DGS rendering process. Starting from the object surface, we define a radiant Ellicell that uniformly generates rays departing from each of the ellipsoids that parameterize the 3DGS model. Each Ellicell ray is associated with the rendering parameters of its ellipsoid, which in turn are used to obtain the best bindings between the target image pixels and the cast rays. These pixel-ray bindings are then ranked to select the best-scoring bundle of rays, whose intersection provides the camera center and, in turn, the camera rotation.
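
    For intuition on the final geometric step, recovering a camera center as the least-squares intersection of a ray bundle can be written as below; this is an editorial sketch under simplified assumptions, not the authors' code, and intersect_rays is a hypothetical helper.

    import numpy as np

    def intersect_rays(origins, dirs):
        """Least-squares point nearest a bundle of rays (origin o_i, direction d_i)."""
        dirs = dirs / np.linalg.norm(dirs, axis=1, keepdims=True)
        A = np.zeros((3, 3))
        b = np.zeros(3)
        for o, d in zip(origins, dirs):
            P = np.eye(3) - np.outer(d, d)  # projector orthogonal to the ray
            A += P
            b += P @ o
        return np.linalg.solve(A, b)        # camera-center estimate

    Each ray contributes the plane orthogonal to its direction; solving the accumulated normal equations gives the point minimizing the summed squared distances to all rays.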
  • IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model

    Matteo Bortolon, Theodore Tsesmelis, Stuart James, Fabio Poiesi, Alessio Del Bue

    International Conference on Robotics and Automation (ICRA) | Yokohama, Japan

    We introduce IFFNeRF to estimate the six degrees-of-freedom (6DoF) camera pose of a given image, building on the Neural Radiance Fields (NeRF) formulation. IFFNeRF is specifically designed to operate in real-time and eliminates the need for an initial pose guess that is proximate to the sought solution. IFFNeRF utilizes the Metropolis-Hastings algorithm to sample surface points from within the NeRF model. From these sampled points, we cast rays and deduce the color for each ray through pixel-level view synthesis. The camera pose can then be estimated as the solution to a Least Squares problem by selecting correspondences between the query image and the resulting bundle. We facilitate this process through a learned attention mechanism, bridging the query image embedding with the embedding of parameterized rays, thereby matching rays pertinent to the image. Through synthetic and real evaluation settings, we show that our method can improve the angular and translation error accuracy by 80.1% and 67.3%, respectively, compared to iNeRF while performing at 34fps on consumer hardware and not requiring the initial pose guess.
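
    The surface sampling relies on Metropolis-Hastings; a generic random-walk variant, shown below as an editorial sketch assuming a scalar log-density (e.g., derived from NeRF opacity), illustrates the idea.

    import numpy as np

    def metropolis_hastings(log_density, x0, n_samples, step=0.05, rng=None):
        """Random-walk Metropolis-Hastings: samples proportional to exp(log_density)."""
        rng = rng or np.random.default_rng(0)
        x = np.asarray(x0, dtype=float)
        log_p, samples = log_density(x), []
        for _ in range(n_samples):
            cand = x + step * rng.standard_normal(x.shape)    # symmetric Gaussian proposal
            log_p_cand = log_density(cand)
            if np.log(rng.uniform()) < log_p_cand - log_p:    # accept with prob min(1, p'/p)
                x, log_p = cand, log_p_cand
            samples.append(x.copy())
        return np.array(samples)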
  • PRAGO: Differentiable multi-view pose optimization from objectness detections

    Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

    International Conference on 3D Vision (3DV) | Davos, Switzerland

    Robustly estimating camera poses from a set of images is a fundamental task that remains challenging for differentiable methods, especially in the case of small and sparse camera pose graphs. To overcome this challenge, we propose Pose-refined Rotation Averaging Graph Optimization (PRAGO). From a set of objectness detections on unordered images, our method reconstructs the rotational pose, and in turn the absolute pose, in a differentiable manner, benefiting from the optimization of a sequence of geometrical tasks. We show how our objectness pose-refinement module in PRAGO is able to refine the inherent ambiguities in pairwise relative pose estimation without removing edges, avoiding early decisions on the viability of graph edges. PRAGO then refines the absolute rotations through iterative graph construction, reweighting the graph edges to compute the final rotational pose, which can be converted into absolute poses using translation averaging. We show that PRAGO is able to outperform non-differentiable solvers on small and sparse scenes extracted from 7-Scenes, achieving a relative improvement of 21% for rotations while producing similar translation estimates.
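
    For reference, the basic rotation-averaging primitive that such pipelines build on can be illustrated by the chordal L2 mean of a set of rotations; an editorial sketch, not the paper's differentiable pipeline:

    import numpy as np

    def chordal_mean(rotations):
        """Chordal L2 mean: project the arithmetic mean of rotation matrices onto SO(3)."""
        U, _, Vt = np.linalg.svd(np.mean(rotations, axis=0))
        R = U @ Vt
        if np.linalg.det(R) < 0:                      # enforce det(R) = +1 (proper rotation)
            R = U @ np.diag([1.0, 1.0, -1.0]) @ Vt
        return R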
  • 2023

    Inclusive Digital Storytelling: Artificial Intelligence and Augmented Reality to re-centre Stories from the Margins

    Valentina Nisi, Stuart James, Paulo Bala, Alessio Del Bue, Nuno Jardim Nunes

    International Conference on Interactive Digital Storytelling (ICIDS) | Kobe, Japan

    As the concept of the Metaverse becomes a reality, storytelling tools sharpen their teeth to include Artificial Intelligence and Augmented Reality as prominent enabling features. While digitally savvy and privileged populations are well-positioned to use technology, marginalized groups risk being left behind and excluded from societal progress, deepening the digital divide. In this paper, we describe MEMEX, an interactive digital storytelling tool in which Artificial Intelligence and Augmented Reality play enabling roles in support of the cultural integration of communities at risk of exclusion. The tool was developed in the context of a three-year EU-funded project, and in this paper we focus on describing its final working prototype and its pilot study.
  • Connected to the People: Social Inclusion & Cohesion in Action through a Cultural Heritage Digital Tool

    Valentina Nisi, Paulo Bala, Vanessa Cesário, Stuart James, Alessio Del Bue, and Nuno Jardim Nunes

    ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) | Minneapolis, USA

  • Positional Diffusion: Ordering Unordered Sets with Diffusion Probabilistic Models

    Francesco Giuliari, Gianluca Scarpellini, Stuart James, Yiming Wang, Alessio Del Bue

    arXiv | preprint

    Positional reasoning is the process of ordering unsorted parts contained in a set into a consistent structure. We present Positional Diffusion, a plug-and-play graph formulation with Diffusion Probabilistic Models to address positional reasoning. We use the forward process to map elements' positions in a set to random positions in a continuous space. Positional Diffusion learns to reverse the noising process and recover the original positions through an Attention-based Graph Neural Network. We conduct extensive experiments with benchmark datasets including two puzzle datasets, three sentence ordering datasets, and one visual storytelling dataset, demonstrating that our method outperforms long-standing research on puzzle solving by up to +18% over the second-best deep learning method, and performs on par with state-of-the-art methods on sentence ordering and visual storytelling. Our work highlights the suitability of diffusion models for ordering problems and proposes a novel formulation and method for solving various ordering tasks. Project website at https://iit-pavis.github.io/Positional_Diffusion/
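
    The forward process described above is the standard DDPM noising applied to element positions; a minimal sketch, assuming a precomputed cumulative schedule alpha_bar:

    import torch

    def q_sample(x0, t, alpha_bar):
        """Forward diffusion q(x_t | x_0) applied to element positions.

        x0: (N, D) clean positions; t: (N,) timestep indices;
        alpha_bar: (T,) cumulative products of (1 - beta_t).
        """
        a = alpha_bar[t].unsqueeze(-1)                # (N, 1)
        eps = torch.randn_like(x0)                    # noise target for training
        return a.sqrt() * x0 + (1.0 - a).sqrt() * eps, eps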
  • You are here! Finding position and orientation on a 2D map from a single image: The Flatlandia localization problem and dataset

    Matteo Toso, Matteo Taiana, Stuart James, Alessio Del Bue

    arXiv | preprint

    We introduce Flatlandia, a novel problem for visual localization of an image from object detections composed of two specific tasks: i) Coarse Map Localization: localizing a single image observing a set of objects with respect to a 2D map of object landmarks; ii) Fine-grained 3DoF Localization: estimating latitude, longitude, and orientation of the image within a 2D map. Solutions for these new tasks exploit the wide availability of open urban maps annotated with GPS locations of common objects (e.g., via surveying or crowd-sourcing). Such maps are also more storage-friendly than the standard large-scale 3D models often used in visual localization, while additionally being privacy-preserving. As existing datasets are unsuited for the proposed problem, we provide the Flatlandia dataset, designed for 3DoF visual localization in multiple urban settings and based on crowd-sourced data from five European cities. We use the Flatlandia dataset to validate the complexity of the proposed tasks.
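
    For intuition on the fine-grained 3DoF output (position plus orientation on the map), a generic 2D rigid alignment of matched object layouts, not the paper's method, looks like this:

    import numpy as np

    def fit_2d_pose(map_pts, obs_pts):
        """2D rigid alignment (Kabsch): rotation + translation taking the
        observed object layout onto matched map landmarks."""
        mu_m, mu_o = map_pts.mean(axis=0), obs_pts.mean(axis=0)
        H = (obs_pts - mu_o).T @ (map_pts - mu_m)     # 2x2 cross-covariance
        U, _, Vt = np.linalg.svd(H)
        d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
        R = Vt.T @ np.diag([1.0, d]) @ U.T
        t = mu_m - R @ mu_o
        heading = np.arctan2(R[1, 0], R[0, 0])        # orientation on the map
        return heading, t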
  • Locality-aware subgraphs for inductive link prediction in knowledge graphs

    Hebatallah A. Mohamed, Diego Pilutti, Stuart James, Alessio Del Bue, Marcello Pelillo, Sebastiano Vascon

    Pattern Recognition Letters (PR-L) | Journal

    Recent methods of inductive reasoning on Knowledge Graphs (KGs) transform the link prediction problem into a graph classification task. They first extract a subgraph around each target link based on the k-hop neighborhood of the target entities, encode the subgraphs using a Graph Neural Network (GNN), then learn a function that maps subgraph structural patterns to link existence. Although these methods have witnessed great successes, increasing k often leads to an exponential expansion of the neighborhood, thereby degrading the GNN expressivity due to oversmoothing. In this paper, we formulate the subgraph extraction as a local clustering procedure that aims at sampling tightly-related subgraphs around the target links, based on a personalized PageRank (PPR) approach. Empirically, on three real-world KGs, we show that reasoning over subgraphs extracted by PPR-based local clustering can lead to a more accurate link prediction model than relying on neighbors within fixed hop distances. Furthermore, we investigate graph properties such as average clustering coefficient and node degree, and show that there is a relation between these and the performance of subgraph-based link prediction.
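
    A minimal version of PPR-seeded subgraph extraction can be sketched with networkx; the top-k truncation below is a simplification of the paper's local clustering, and ppr_subgraph is a hypothetical helper:

    import networkx as nx

    def ppr_subgraph(G, head, tail, k=50, alpha=0.85):
        """Keep the top-k nodes under personalized PageRank seeded at the
        two endpoints of the target link, yielding a tight local subgraph."""
        scores = nx.pagerank(G, alpha=alpha, personalization={head: 0.5, tail: 0.5})
        keep = set(sorted(scores, key=scores.get, reverse=True)[:k]) | {head, tail}
        return G.subgraph(keep).copy()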
  • 2022

    Writing with (Digital) Scissors: Designing a Text Editing Tool for Assisted Storytelling using Crowd-Generated Content

    Paulo Bala, Stuart James, Alessio Del Bue, Valentina Nisi

    International Conference on Interactive Digital Storytelling (ICIDS 2022) | Santa Cruz, USA

    Digital Storytelling can exploit numerous technologies and sources of information to support the creation, refinement and enhancement of a narrative. Research on text editing tools has created novel interactions that support authors in different stages of the creative process, such as the inclusion of crowd-generated content for writing. While these interactions have the potential to change workflows, integration of these in a way that is useful and matches users’ needs is unclear. In order to investigate the space of Assisted Storytelling, we designed and conducted a study to analyze how users write and edit a story about Cultural Heritage using an auxiliary source like Wikipedia. Through a diffractive analysis of stories, creative processes, and social and cultural contexts, we reflect and derive implications for design. These were applied to develop an AI-supported text editing tool using crowd-sourced content from Wikipedia and Wikidata.
  • PoserNet: Refining Relative Camera Poses Exploiting Object Detections

    Matteo Taiana, Matteo Toso, Stuart James, Alessio Del Bue

    European Conference on Computer Vision (ECCV 2022) | Tel Aviv, Israel

    The estimation of the camera poses associated with a set of images commonly relies on feature matches between the images. In contrast, we are the first to address this challenge by using objectness regions to guide the pose estimation problem rather than explicit semantic object detections. We propose the Pose Refiner Network (PoserNet), a lightweight Graph Neural Network that refines approximate pairwise relative camera poses. PoserNet exploits associations between the objectness regions (concisely expressed as bounding boxes) across multiple views to globally refine sparsely connected view graphs. We evaluate on the 7-Scenes dataset across varied sizes of graphs and show how this process can be beneficial to optimisation-based Motion Averaging algorithms, improving the median error on the rotation by 62° with respect to the initial estimates obtained based on bounding boxes. Code and data are available at github.com/IIT-PAVIS/PoserNet. (A sketch of the rotation-error metric used in such evaluations follows the BibTeX record below.)
    @inproceedings{posernet_eccv2022,
    Title = {PoserNet: Refining Relative Camera Poses Exploiting Object Detections},
    Author = {Matteo Taiana and Matteo Toso and Stuart James and Alessio Del Bue},
    booktitle = {Proceedings of the European Conference on Computer Vision ({ECCV})},
    Year = {2022},
    }
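
    The median rotation error quoted above is typically the geodesic angle between estimated and ground-truth rotations; a minimal reference implementation (ours, not the authors'):

    import numpy as np

    def rotation_error_deg(R_est, R_gt):
        """Geodesic angle (in degrees) between two rotation matrices."""
        cos = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))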
  • Geolocation of Cultural Heritage using Multi-View Knowledge Graph Embedding

    Hebatallah A. Mohamed, Sebastiano Vascon, Feliks Hibraj, Stuart James, Diego Pilutti, Alessio Del Bue, Marcello Pelillo

    International Workshop on Pattern Recognition for Cultural Heritage (PatReCH 2022) | Montréal, Québec

    Knowledge Graphs (KGs) have proven to be a reliable way of structuring data. They can provide a rich source of contextual information about cultural heritage collections. However, cultural heritage KGs are far from being complete. They are often missing important attributes such as geographical location, especially for sculptures and mobile or indoor entities such as paintings. In this paper, we first present a framework for ingesting knowledge about tangible cultural heritage entities from various data sources and their connected multi-hop knowledge into a geolocalized KG. Secondly, we propose a multi-view learning model for estimating the relative distance between a given pair of cultural heritage entities, based on the geographical as well as the knowledge connections of the entities.
  • GANzzle: Reframing jigsaw puzzle solving as a retrieval task using a generative mental image

    Davide Talon, Alessio Del Bue, Stuart James

    IEEE International Conference on Image Processing (ICIP 2022) | Bordeaux, France

    Puzzle solving is a combinatorial challenge due to the difficulty of matching adjacent pieces. Instead, we infer a mental image from all pieces, against which a given piece can then be matched, avoiding the combinatorial explosion. Exploiting advancements in Generative Adversarial methods, we learn how to reconstruct the image given a set of unordered pieces, allowing the model to learn a joint embedding space that matches an encoding of each piece to the cropped layer of the generator. We therefore frame the problem as an R@1 retrieval task, and then solve the linear assignment using differentiable Hungarian attention, making the process end-to-end. In doing so, our model is puzzle-size agnostic, in contrast to prior deep learning methods, which handle a single size. We evaluate on two new large-scale datasets, where our model is on par with deep learning methods while generalizing to multiple puzzle sizes.
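
    At inference, a hard one-to-one matching between piece and slot embeddings is the natural counterpart of the differentiable Hungarian attention used in training; an editorial sketch with SciPy:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def assign_pieces(piece_emb, slot_emb):
        """Cosine-similarity matrix between pieces and generator slots,
        then an optimal one-to-one assignment (Hungarian algorithm)."""
        p = piece_emb / np.linalg.norm(piece_emb, axis=1, keepdims=True)
        s = slot_emb / np.linalg.norm(slot_emb, axis=1, keepdims=True)
        sim = p @ s.T                                 # (n_pieces, n_slots)
        rows, cols = linear_sum_assignment(-sim)      # negate to maximize similarity
        return cols, sim[rows, cols]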
  • Emerging Strategies in Asymmetric Sketch Interactions for Object Retrieval in Virtual Reality

    Daniele Giunchi, Riccardo Bovo, Donald Degraen, Stuart James, and Anthony Steed

    Interactive Media, Smart Systems and Emerging Technologies (IMET 2022) | Cyprus

  • Multi-view 3D Objects Localization from Street-level Scenes

    Javed Ahmad, Matteo Taiana, Matteo Toso, Stuart James, and Alessio Del Bue

    International Conference on Image Analysis and Processing (ICIAP 2021) | Lecce, Italy

    This paper presents a method to localize street-level objects in 3D from images of an urban area. Our method processes 3D sparse point clouds reconstructed from multi-view images and leverages 2D instance segmentation to find all objects within the scene and to generate for each object the corresponding cluster of 3D points and matched 2D detections. The proposed approach is robust to changes in image sizes, viewpoint changes, and changes in the object’s appearance across different views. We validate our approach on challenging street-level crowdsourced images from the Mapillary platform, showing a significant improvement in the mean average precision of object localization for the available Mapillary annotations. These results showcase our method’s effectiveness in localizing objects in 3D, which could potentially be used in applications such as high-definition map generation of urban environments.
  • 2021

    Square peg, round hole: A case study on using Visual Question & Answering in Games

    Paulo Bala, Valentina Nisi, Mara Dionísio, Nuno Jardim Nunes, Stuart James

    CHI Play - WIP Track | Virtual

    The discussion about what Artificial Intelligence (AI) can contribute to games has been running for a long time; however, recent advances in AI show promise of providing new kinds of experiences for players and new tools for game developers. In contrast with the traditional Finite State Machine for interaction and response, we consider the scenario of Visual Question & Answering (VQA): the automatic answering of a textual question about an image. VQA is a tool that can enrich possible answers by combining both visual and textual information, and it extrapolates readily to a game setting without straying from its training domain. In this Work In Progress, we present two original prototypes designed to explore the potential of VQA in games and discuss preliminary findings from a Wizard of Oz (WOz) pilot study using VQA to investigate how people interact with such an AI algorithm.
  • Amnesia in the Atlantic: an AI Driven Serious Game on Marine Biodiversity

    Mara Dionísio, Valentina Nisi, Jin Xin, Paulo Bala, Stuart James, Nuno Jardim Nunes

    International Federation for Information Processing – International Conference on Entertainment Computing (IFIP-ICEC) - Work In Progress (WIP) Track | Coimbra, Portugal

    The use of Conversational Interfaces has evolved rapidly in numerous fields; in particular, they are an interesting tool for Serious Games to leverage. Conversational Interfaces can assist Serious Games' goals, namely in presenting knowledge through dialogue. With the global acknowledgment of the joint crisis in nature and climate change, it is essential to raise awareness of the fact that many ecosystems are being destroyed and that the biodiversity of our planet is at risk. Therefore, in this paper, we present Amnesia in the Atlantic, a Serious Game enhanced with a Conversational Interface that embraces the challenge of critically engaging players with marine biodiversity issues.
  • Artificial Intelligence and Art History: A Necessary Debate?

    Mathieu Aubry, Lisandra Costiner, Stuart James

    Histoire de l'art | Debate

  • Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality

    Daniele Giunchi, Alejandro Sztrajman, Stuart James, Anthony Steed

    IMX'21 | New York

    Sketch and speech are intuitive interaction methods that convey complementary information and have been independently used for 3D model retrieval in virtual environments. While sketch has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. To overcome this, we implement a multimodal interface for querying 3D model databases within a virtual environment. We base the sketch input on the state of the art for 3D Sketch Retrieval, and use a Wizard-of-Oz style experiment to process the voice input; in this way, we avoid the complexities of natural language processing, which frequently requires fine-tuning to be robust. We also design a new challenging database for sketch comprised of 3D chairs where each of the components (arms, legs, seat, back) is independently colored. We conduct two user studies and show that hybrid search strategies emerge from the combination of interactions, fostering the advantages provided by both modalities.
    @inbook{10.1145/3452918.3458806,
    author = {Giunchi, Daniele and Sztrajman, Alejandro and James, Stuart and Steed, Anthony},
    title = {Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality},
    year = {2021},
    isbn = {9781450383899},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3452918.3458806},
    booktitle = {ACM International Conference on Interactive Media Experiences},
    pages = {144–155},
    numpages = {12}}
  • Consistent Mesh Colors for Multi-View Reconstructed 3D Scenes

    Mohamed Dahy Elkhouly, Alessio Del Bue, Stuart James

    arXiv | preprint

    We address the issue of creating consistent mesh texture maps captured from scenes without color calibration. We find that the method of aggregating the multiple views is crucial for creating spatially consistent meshes without the need to explicitly optimize for spatial consistency. We compute a color prior from the cross-correlation of observable view faces and the faces per view to identify an optimal per-face color. We then use this color in a re-weighting ratio for the best-view texture, identified by prior mesh-texturing work, to create a spatially consistent texture map. Despite our method not explicitly handling spatial consistency, our results are qualitatively more consistent than other state-of-the-art techniques while being computationally more efficient. We evaluate on prior datasets and additionally on Matterport3D, showing qualitative improvements.
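
    The per-face color prior above is computed from cross-correlation across views; the simplified stand-in below (a median prior plus agreement weights, an assumption rather than the paper's exact formulation) conveys the aggregation idea:

    import numpy as np

    def face_color_prior(face_colors):
        """Robust per-face color prior: median over per-view observations,
        with a weight per view based on agreement with that prior."""
        prior = np.median(face_colors, axis=0)        # (3,) RGB prior over views
        dist = np.linalg.norm(face_colors - prior, axis=1)
        w = 1.0 / (1.0 + dist)                        # down-weight outlier views
        return prior, w / w.sum()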
  • LIGHTS: LIGHT Specularity Dataset for specular detection in Multi-view

    Mohamed Dahy Elkhouly, Theodore Tsesmelis, Alessio Del Bue, Stuart James

    IEEE International Conference on Image Processing | Anchorage, Alaska

    Specular highlights are commonplace in images; however, methods for detecting them, and in turn removing the phenomenon, are particularly challenging. One reason for this is the difficulty of creating a dataset for training or evaluation, as in the real world we lack the necessary control over the environment. Therefore, we propose a novel physically-based rendered LIGHT Specularity (LIGHTS) dataset for the evaluation of the specular highlight detection task. Our dataset consists of 18 high-quality architectural scenes, where each scene is rendered with multiple views. In total we have 2,603 views, with an average of 145 views per scene. Additionally, we propose a simple aggregation-based method for specular highlight detection that outperforms prior work by 3.6% in two orders of magnitude less time on our dataset.
    @inproceedings{ElkhoulyICIP21lights,
    author={Elkhouly, Mohamed Dahy and Tsesmelis, Theodore and Bue, Alessio Del and James, Stuart},
    booktitle={2021 IEEE International Conference on Image Processing (ICIP)},
    title={Lights: Light Specularity Dataset For Specular Detection In Multi-View},
    year={2021},
    volume={},
    number={},
    pages={2908-2912},
    doi={10.1109/ICIP42928.2021.9506354}}
  • 2020

    Machine Learning for Cultural Heritage: A Survey

    Marco Fiorucci, Marina Khoroshiltseva, Massimiliano Pontil, Arianna Traviglia, Alessio Del Bue and Stuart James

    Pattern Recognition Letters (PR-L) | Elsevier

    The application of Machine Learning (ML) to Cultural Heritage (CH) has evolved from basic statistical approaches, such as Linear Regression, to complex Deep Learning models. The question remains how much of this work actively improves the underlying algorithm versus using it within a 'black box' setting. We survey the ML and CH literature to identify the theoretical changes that contribute to an algorithm and in turn make it suitable for CH applications. Alternatively, and most commonly, when there are no such changes, we review the CH applications, features and pre/post-processing that make the algorithm suitable for its use. We analyse the dominant divides within ML (Supervised, Semi-supervised and Unsupervised) and reflect on a variety of algorithms that have been extensively used. From such an analysis, we take a critical look at the use of ML in CH and consider why CH has seen only limited adoption of ML.
    @article{FiorucciPRL20ml4ch, title = "Machine Learning for Cultural Heritage: A Survey",
    journal = "Pattern Recognition Letters",
    volume = "133",
    pages = "102 - 108",
    year = "2020",
    issn = "0167-8655",
    doi = "https://doi.org/10.1016/j.patrec.2020.02.017",
    url = "http://www.sciencedirect.com/science/article/pii/S0167865520300532",
    author = "Marco Fiorucci and Marina Khoroshiltseva and Massimiliano Pontil and Arianna Traviglia and Alessio [Del Bue] and Stuart James",
    keywords = "Artificial Intelligence, Machine Learning, Cultural Heritage, Digital Humanities",
    abstract = "The application of Machine Learning (ML) to Cultural Heritage (CH) has evolved since basic statistical approaches such as Linear Regression to complex Deep Learning models. The question remains how much of this actively improves on the underlying algorithm versus using it within a ‘black box’ setting. We survey across ML and CH literature to identify the theoretical changes which contribute to the algorithm and in turn them suitable for CH applications. Alternatively, and most commonly, when there are no changes, we review the CH applications, features and pre/post-processing which make the algorithm suitable for its use. We analyse the dominant divides within ML, Supervised, Semi-supervised and Unsupervised, and reflect on a variety of algorithms that have been extensively used. From such an analysis, we give a critical look at the use of ML in CH and consider why CH has only limited adoption of ML."}
  • 2019

    Mixing realities for sketch retrieval in Virtual Reality

    Daniele Giunchi, Stuart James, Donald Degraen and Anthony Steed

    VRCAI'19 | Brisbane, Australia

    Users within a Virtual Environment often need support designing the environment around them, finding relevant content while remaining immersed. We therefore focus on the familiar sketch-based interaction to support the process of content placement, and specifically investigate how interactions from a tablet or desktop translate into the virtual environment. To understand sketching interaction within a virtual environment, we compare different methods of sketch interaction, i.e., 3D mid-air sketching, 2D sketching on a virtual tablet, 2D sketching on a fixed virtual whiteboard, and 2D sketching on a real tablet. The user remains immersed within the environment, queries a database containing detailed 3D models, and places them into the virtual environment. Our results show that 3D mid-air sketching is considered the more intuitive method to search a collection of models, while the addition of physical devices creates confusion due to the complications of their inclusion within a virtual environment. While we pose our work as a retrieval problem for 3D models of chairs, our results are extendable to other sketching tasks for virtual environments.
    @inproceedings{GiunchiVRCAI19mixingReal,
    author = {Giunchi, Daniele and James, Stuart and Degraen, Donald and Steed, Anthony},
    title = {Mixing Realities for Sketch Retrieval in Virtual Reality},
    year = {2019},
    isbn = {9781450370028},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3359997.3365751},
    doi = {10.1145/3359997.3365751},
    booktitle = {The 17th International Conference on Virtual-Reality Continuum and Its Applications in Industry},
    articleno = {Article 50},
    numpages = {2},
    keywords = {HCI, Sketch, CNN, Virtual Reality},
    location = {Brisbane, QLD, Australia},
    series = {VRCAI ’19}}
  • re-OBJ: Jointly learning the foreground and background for object instance re-identification

    Vaibhav Bansal, Stuart James and Alessio Del Bue

    ICIAP'19 | Trento, Italy (Best Student Paper Award)

    Conventional approaches to object instance re-identification rely on matching appearances of the target objects among a set of frames. However, learning the appearance of the objects alone might fail when there are multiple objects with similar appearance or multiple instances of the same object class present in the scene. This paper proposes that partial observations of the background can be utilized to aid the object re-identification task in a rigid scene, especially a rigid environment with many recurring, identical models of objects. Using an extension to the Mask R-CNN architecture, we learn to encode the important and distinct information in the background jointly with the foreground, relevant to rigid real-world scenarios such as an indoor environment where objects are static and the camera moves around the scene. We demonstrate the effectiveness of our joint visual feature in the re-identification of objects in the ScanNet dataset and show a relative improvement of around 28.25% in rank-1 accuracy over the deepSort method.
    @inproceedings{BansalICIAP19reobj,
    author = {Vaibhav Bansal and Stuart James and Alessio {Del Bue}},
    editor = {Elisa Ricci and Samuel Rota Bul{\`{o}} and Cees Snoek and Oswald Lanz and Stefano Messelodi and Nicu Sebe},
    title = {re-OBJ: Jointly Learning the Foreground and Background for Object Instance Re-identification},
    booktitle = {Image Analysis and Processing - {ICIAP} 2019 - 20th International Conference,
    Trento, Italy, September 9-13, 2019, Proceedings, Part{II}},
    series = {Lecture Notes in Computer Science},
    volume = {11752}, pages = {402--413},
    publisher = {Springer},
    year = {2019},
    url = {https://doi.org/10.1007/978-3-030-30645-8\_37},
    doi = {10.1007/978-3-030-30645-8\_37}}
  • Augmenting datasets for Visual Question and Answering for complex spatial reasoning

    Stuart James and Alessio Del Bue

    CVPR Workshop on VQA | California, USA

  • Autonomous 3D reconstruction, mapping and exploration of indoor environments with a robotic arm

    Yiming Wang, Stuart James, Elisavet Konstantina Stathopoulou, Carlos Beltran-Gonzalez, Yoshinori Konishi and Alessio Del Bue

    IEEE Robotics and Automation Letters | Macau

    We propose a novel information gain metric that combines hand-crafted and data-driven metrics to address the next best view problem for autonomous 3D mapping of unknown indoor environments. For the hand-crafted metric, we propose an entropy-based information gain that accounts for previous viewpoints, to avoid the camera revisiting the same location and to promote motion toward unexplored or occluded areas. For the learnt metric, we adopt a Convolutional Neural Network (CNN) architecture and formulate the problem as a classification problem. The CNN takes the current depth image as input and outputs the motion direction that suggests the largest unexplored surface. We train and test the CNN using a new synthetic dataset based on the SUNCG dataset. The learnt motion direction is then combined with the proposed hand-crafted metric to help handle situations where using only the hand-crafted metric tends to face ambiguities. We finally evaluate the autonomous paths over several real and synthetic indoor scenes, including complex industrial and domestic settings, and show that our combined metric is able to further improve the exploration coverage compared to using only the proposed hand-crafted metric. (A minimal sketch of the entropy term follows the BibTeX record below.)
    @ARTICLE{WangRAL19explore, author={Y. {Wang} and S. {James} and E. K. {Stathopoulou} and C. {Beltrán-González} and Y. {Konishi} and A. {Del Bue}}, journal={IEEE Robotics and Automation Letters}, title={Autonomous 3-D Reconstruction, Mapping, and Exploration of Indoor Environments With a Robotic Arm}, year={2019}, volume={4}, number={4}, pages={3340-3347},}
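
    The entropy term of the hand-crafted gain rewards views over uncertain space; a minimal sketch over a voxel occupancy grid (the paper's full metric also penalizes previously visited viewpoints):

    import numpy as np

    def entropy_gain(occupancy_probs):
        """Shannon entropy of voxel occupancy as an exploration gain:
        unknown voxels (p near 0.5) contribute most, known voxels least."""
        p = np.clip(occupancy_probs, 1e-6, 1 - 1e-6)
        H = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
        return H.sum()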
  • 2018

    Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning

    Paul Gay, Stuart James, Alessio Del Bue

    ACCV'18 | Perth, Australia

    Recent approaches on visual scene understanding attempt to build a scene graph -- a computational representation of objects and their pairwise relationships. Such rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when considering complex spatial arrangements in the scene. Differently, an image sequence conveys useful information using the multi-view geometric relations arising from camera motion. Indeed, in such cases, object relationships are naturally related to the 3D scene structure. To this end, this paper proposes a system that first computes the geometrical location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding such geometrical reasoning. Such compelling representation is obtained using a new model where geometric and visual features are merged using an RNN framework. We report results on a dataset we created for the task of 3D scene graph generation in multiple views.
    @InProceedings{GayACCV19vgfm,
    author="Gay, Paul and Stuart, James and Del Bue, Alessio",
    editor="Jawahar, C. V.and Li, Hongdong and Mori, Greg and Schindler, Konrad",
    title="Visual Graphs from Motion (VGfM): Scene Understanding with Object Geometry Reasoning",
    booktitle="Computer Vision -- ACCV 2018",
    year="2019",
    publisher="Springer International Publishing",address="Cham",
    pages="330--346",
    abstract="Recent approaches on visual scene understanding attempt to build a scene graph -- a computational representation of objects and their pairwise relationships. Such rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when considering complex spatial arrangements in the scene. Differently, an image sequence conveys useful information using the multi-view geometric relations arising from camera motions. Indeed, object relationships are naturally related to the 3D scene structure. To this end, this paper proposes a system that first computes the geometrical location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding such geometrical reasoning. Such compelling representation is obtained using a new model where geometric and visual features are merged using an RNN framework. We report results on a dataset we created for the task of 3D scene graph generation in multiple views.",
    isbn="978-3-030-20893-6"}
  • Multi-view Aggregation for Color Naming with Shadow Detection and Removal

    Mohamed Dahy Elkhouly, Stuart James, Alessio Del Bue

    IPAS'18 | Nice, France Best Paper Award

    This paper presents a set of methods for classifying the color attribute of objects when multiple images of the same objects are available. This problem is more complex than single-image estimation, since varying environmental effects, such as shadows or specularities from light sources, can result in poor accuracy. These depend primarily on the camera positions and the material type of the objects. Single-image techniques focus on improving the discrimination between colors, whereas in multi-view systems additional information is available but should be used wisely. To this end, we propose three methods to aggregate image pixel information across views that boost the performance of color-name classification. Moreover, we study the effect of shadows by employing automatic shadow detection and correction techniques on the color-naming problem. We tested our proposals on a new multi-view color names dataset (M3DCN), which contains indoor and outdoor objects. The experimental evaluation shows that one of the three presented aggregation methods is very efficient and achieves the highest classification accuracy. We also show experimentally that addressing visual outliers such as shadows in multi-view images improves the performance of the color attribute decision process. (A sketch of a simple aggregation scheme follows the BibTeX record below.)
    @INPROCEEDINGS{ElkhoulyIPAS18mvcolor, author={M. D. {Elkhouly} and S. {James} and A. {Del Bue}}, booktitle={2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS)}, title={Multi-view Aggregation for Color Naming with Shadow Detection and Removal}, year={2018}, volume={}, number={}, pages={115-120},}
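
    One of the simplest multi-view aggregation schemes, weighted averaging of per-view color-name distributions, can be sketched as below; this is illustrative and not necessarily one of the paper's three methods:

    import numpy as np

    def aggregate_color_votes(per_view_probs, weights=None):
        """Fuse per-view color-name distributions into one object-level
        decision by (optionally weighted) averaging, then argmax."""
        P = np.asarray(per_view_probs)                # (n_views, n_colors)
        w = np.ones(len(P)) if weights is None else np.asarray(weights, float)
        fused = (w[:, None] * P).sum(axis=0) / w.sum()
        return fused.argmax(), fused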
  • 3D Sketching for Interactive Model Retrieval in Virtual Reality

    Daniele Giunchi, Stuart James, Anthony Steed

    Expressive | Victoria, British Columbia, Canada

    Users within a Virtual Environment often need support designing the environment around them, finding relevant content while remaining immersed. We therefore focus on the familiar sketch-based interaction to support the process of content placement, and specifically investigate how interactions from a tablet or desktop translate into the virtual environment. To understand sketching interaction within a virtual environment, we compare different methods of sketch interaction, i.e., 3D mid-air sketching, 2D sketching on a virtual tablet, 2D sketching on a fixed virtual whiteboard, and 2D sketching on a real tablet. The user remains immersed within the environment, queries a database containing detailed 3D models, and places them into the virtual environment. Our results show that 3D mid-air sketching is considered the more intuitive method to search a collection of models, while the addition of physical devices creates confusion due to the complications of their inclusion within a virtual environment. While we pose our work as a retrieval problem for 3D models of chairs, our results are extendable to other sketching tasks for virtual environments.
    @inproceedings{10.1145/3229147.3229166,
    author = {Giunchi, Daniele and James, Stuart and Steed, Anthony},
    title = {3D Sketching for Interactive Model Retrieval in Virtual Reality},
    year = {2018},
    isbn = {9781450358927},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3229147.3229166},
    doi = {10.1145/3229147.3229166},
    booktitle = {Proceedings of the Joint Symposium on Computational Aesthetics and Sketch-Based Interfaces and Modeling and Non-Photorealistic Animation and Rendering},
    articleno = {Article 1},
    numpages = {12},
    keywords = {HCI, sketch, virtual reality, CNN},
    location = {Victoria, British Columbia, Canada},
    series = {Expressive ’18}}
  • Model Retrieval by 3D Sketching in Immersive Virtual Reality

    Daniele Giunchi, Stuart James, Anthony Steed

    IEEE VR Poster | Reutlingen, Germany

    We describe a novel method for searching 3D model collections using free-form sketches within a virtual environment as queries. As opposed to traditional Sketch Retrieval, our queries are drawn directly onto an example model. Using immersive virtual reality, the user can express their query through a sketch that demonstrates the desired structure, color and texture. Unlike previous sketch-based retrieval methods, users remain immersed within the environment without relying on textual queries or 2D projections, which can disconnect the user from the environment. We show how a convolutional neural network (CNN) can create multi-view representations of colored 3D sketches. Using such a descriptor representation, our system is able to rapidly retrieve models, and in this way we provide the user with an interactive method of navigating large object datasets. Through a preliminary user study we demonstrate that by using our VR 3D model retrieval system, users can perform quick and intuitive searches. Using our system, users can rapidly populate a virtual environment with specific models from a very large database, and thus the technique has the potential to be broadly applicable in immersive editing systems.
  • 2017

    Texture Stationarization: Turning Photos into Tileable Textures

    Joep Moritz, Stuart James, Tom S.F. Haines, Tobias Ritschel, Tim Weyrich

    Computer Graphics Forum (Proc. Eurographics) | Lyon, France

    Texture synthesis has grown into a mature field in computer graphics, allowing the synthesis of naturalistic textures and images from photographic exemplars. Surprisingly little work, however, has been dedicated to synthesizing tileable textures, that is, textures that when laid out in a regular grid of tiles form a homogeneous appearance suitable for use in memory-sensitive real-time graphics applications. One of the key challenges in doing so is that most natural input exemplars exhibit uneven spatial variations that, when tiled, show as repetitive patterns. We propose an approach to synthesize tileable textures while enforcing stationarity properties that effectively mask repetitions while maintaining the unique characteristics of the exemplar. We explore a number of alternative measures for texture stationarity and show how each measure can be integrated into a standard texture synthesis method (PatchMatch) to enforce stationarity at user-controlled scales. We demonstrate the efficacy of our approach using a database of 118 exemplar images, both from publicly available sources as well as new ones captured under uncontrolled conditions, and we quantitatively analyze alternative stationarity measures for their robustness across many test runs using different random seeds. In conclusion, we suggest a novel synthesis approach that employs local histogram matching to reliably turn input photographs of natural surfaces into tiles well suited for artifact-free tiling.
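
    The concluding histogram-matching idea can be approximated with scikit-image; a minimal sketch (function names ours, not the paper's pipeline):

    # pip install scikit-image (>= 0.19 for channel_axis)
    import numpy as np
    from skimage.exposure import match_histograms

    def stationarize_tile(tile, exemplar):
        """Push a candidate tile's color statistics toward the exemplar's
        global statistics, suppressing low-frequency drift before tiling."""
        return match_histograms(tile, exemplar, channel_axis=-1)

    def preview_tiling(tile, reps=3):
        """Lay the tile out in a reps x reps grid to inspect seams and repetition."""
        return np.tile(tile, (reps, reps, 1))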
  • Digital Photographic Practices as Expressions of Personhood and Identity: Variations Across School Leavers and Recent Retirees

    K Orzech, W Moncur, A Durrant, S James, J Collomosse

    Visual Studies | Journal

  • 2016

    Evolutionary Data Purification for Social Media Classification

    Stuart James, John Collomosse

    International Conference on Pattern Recognition (ICPR'16) | Cancun, Mexico

  • Towards Sketched Visual Narratives for Retrieval

    Stuart James

    SketchX - Human Sketch Analysis and its Applications | London, UK

  • 2015

    Visual Narratives: Free-hand Sketch for Visual Search and Navigation of Video

    Stuart James

    PhD Thesis | University of Surrey, Guildford, UK

    Humans have an innate ability to communicate visually; the earliest forms of communication were cave drawings, and children can communicate visual descriptions of scenes through drawings well before they can write. Drawings and sketches offer an intuitive and efficient means for communicating visual concepts. Today, society faces a deluge of digital visual content driven by a surge in the generation of video on social media and the online availability of video archives. Mobile devices are emerging as the dominant platform for consuming this content, with Cisco predicting that by 2018 over 80% of mobile traffic will be video. Sketch offers a familiar and expressive modality for interacting with video on the touch-screens commonly present on such devices. This thesis contributes several new algorithms for searching and manipulating video using free-hand sketches. We propose the Visual Narrative (VN): a storyboarded sequence of one or more actions in the form of sketch that collectively describe an event. We show that VNs can be used both to efficiently search video repositories and to synthesise video clips.

    First, we describe a sketch based video retrieval (SBVR) system that fuses multiple modalities (shape, colour, semantics, and motion) in order to find relevant video clips. An efficient multi-modal video descriptor is proposed, enabling the search of hundreds of videos in milliseconds. This contrasts with prior SBVR systems, which lack an efficient index representation and take minutes or hours to search similar datasets. This contribution not only makes SBVR practical at interactive speeds, but also enables user-refinement of results through relevance feedback to resolve sketch ambiguity, including the relative priority of the different VN modalities.

    Second, we present the first algorithm for sketch based pose retrieval. A pictographic representation (stick-men) is used to specify a desired human pose within the VN, and similar poses are found within a video dataset. We use archival dance performance footage from the UK National Resource Centre for Dance (UK-NRCD), containing diverse examples of human pose. We investigate appropriate descriptors for sketch and video, and propose a novel manifold learning technique for mapping between the two descriptor spaces and so performing sketched pose retrieval. We show that domain adaptation can be applied to boost the performance of this system through a novel piece-wise feature-space warping technique.

    Third, we present a graph representation for VNs comprising multiple actions. We focus on the extension of our pose retrieval system to a sequence of poses interspersed with actions (e.g. jump, twirl). We show that our graph representation can be used for multiple applications: 1) to retrieve sequences of video comprising multiple actions; 2) to navigate, in pictorial form, the retrieved video sequences; 3) to synthesise new video sequences by retrieving and concatenating video fragments from archival footage.
  • 2014

    Enhanced Digital Literacy by Multi-modal Data Mining of the Digital Lifespan

    John Collomosse, Stuart James, Abigail Durrant, Diego Trujillo-Pisanty, Wendy Moncur, Kathryn Orzech, Sarah Martindale, Mike Chantler

    DE2015 | London, UK

  • Interactive Video Asset Retrieval using Sketched Queries

    Stuart James and John Collomosse

    CVMP'14 | London

  • Particle Filtering approach to salient video object localization

    C Gray, S James, J Collomosse and P Asente

    ICIP'14 | Paris, France

  • ReEnact: Sketch based Choreographic Design from Archival Dance Footage

    S James, M Fonseca and J Collomosse

    ACM International Conference on Multimedia Retrieval (ICMR'14) | Glasgow, UK

  • Admixed Portrait: Design Intervention to Prompt Reflection on Being Online as a New Parent

    D Trujillo-Pisanty, A Durrant, S Martindale, S James, J Collomosse

    ACM DIS'14 |

  • 2013

    Markov Random Fields for Sketch based Video Retrieval

    R Hu, S James, T Wang and J Collomosse

    ACM International Conference on Multimedia Retrieval (ICMR'13) |

  • 2012

    Skeletons from Sketches of Dancing Poses

    M Fonseca, S James and J Collomosse

    IEEE VL/HCC'12 |

  • Annotated Free-hand Sketches for Video Retrieval using Object Semantics and Motion

    R Hu, S James and J Collomosse

    Springer International Conference on MultiMedia Modelling (MMM'12) |

  • 2011

    Annotated Sketches for Intuitive Video Retrieval

    Stuart James and John Collomosse

    BMVA/AVA Workshop on Biological and Machine Vision, Perception Journal | Cardiff, UK

  • Sketched Visual Narratives for Content Based Video Retrieval

    Stuart James

    MPhil Transfer Report | University of Surrey, UK