Publications

  • 2020

    Machine Learning for Cultural Heritage: A Survey

    Marco Fiorucci, Marina Khoroshiltseva, Massimiliano Pontil, Arianna Traviglia, Alessio Del Bue and Stuart James

    Pattern Recognition Letters (PR-L) | Elsevier

    The application of Machine Learning (ML) to Cultural Heritage (CH) has evolved from basic statistical approaches such as Linear Regression to complex Deep Learning models. The question remains how much of this actively improves on the underlying algorithm versus using it within a ‘black box’ setting. We survey across ML and CH literature to identify the theoretical changes which contribute to the algorithm and in turn make it suitable for CH applications. Alternatively, and most commonly, when there are no changes, we review the CH applications, features and pre/post-processing which make the algorithm suitable for its use. We analyse the dominant divides within ML, Supervised, Semi-supervised and Unsupervised, and reflect on a variety of algorithms that have been extensively used. From such an analysis, we give a critical look at the use of ML in CH and consider why CH has only limited adoption of ML.
    @article{FiorucciPRL20ml4ch,
    title = "Machine Learning for Cultural Heritage: A Survey",
    journal = "Pattern Recognition Letters",
    volume = "133",
    pages = "102 - 108",
    year = "2020",
    issn = "0167-8655",
    doi = "https://doi.org/10.1016/j.patrec.2020.02.017",
    url = "http://www.sciencedirect.com/science/article/pii/S0167865520300532",
    author = "Marco Fiorucci and Marina Khoroshiltseva and Massimiliano Pontil and Arianna Traviglia and Alessio [Del Bue] and Stuart James",
    keywords = "Artificial Intelligence, Machine Learning, Cultural Heritage, Digital Humanities",
    abstract = "The application of Machine Learning (ML) to Cultural Heritage (CH) has evolved since basic statistical approaches such as Linear Regression to complex Deep Learning models. The question remains how much of this actively improves on the underlying algorithm versus using it within a ‘black box’ setting. We survey across ML and CH literature to identify the theoretical changes which contribute to the algorithm and in turn them suitable for CH applications. Alternatively, and most commonly, when there are no changes, we review the CH applications, features and pre/post-processing which make the algorithm suitable for its use. We analyse the dominant divides within ML, Supervised, Semi-supervised and Unsupervised, and reflect on a variety of algorithms that have been extensively used. From such an analysis, we give a critical look at the use of ML in CH and consider why CH has only limited adoption of ML."}
  • 2019

    Mixing realities for sketch retrieval in Virtual Reality

    Daniele Giunchi, Stuart James, Donald Degraen and Anthony Steed

    VRCAI'19 | Brisbane, Australia

    Users within a Virtual Environment often need support designing the environment around them with the need to find relevant content while remaining immersed. We, therefore, focus on the familiar sketch-based interaction to support the process of content placement and specifically investigate how interactions from a tablet or desktop translate into the virtual environment. To understand sketching interaction within a virtual environment, we compare different methods of sketch interaction, i.e., 3D mid-air sketching, 2D sketching on a virtual tablet, 2D sketching on a fixed virtual whiteboard, and 2D sketching on a real tablet. The user remains immersed within the environment and queries a database containing detailed 3D models, placing them into the virtual environment. Our results show that 3D mid-air sketching is considered to be a more intuitive method to search a collection of models; while the addition of physical devices creates confusion due to the complications of their inclusion within a virtual environment. While we pose our work as a retrieval problem for 3D models of chairs, our results are extendable to other sketching tasks for virtual environments.
    @inproceedings{GiunchiVRCAI19mixingReal,
    author = {Giunchi, Daniele and James, Stuart and Degraen, Donald and Steed, Anthony},
    title = {Mixing Realities for Sketch Retrieval in Virtual Reality},
    year = {2019},
    isbn = {9781450370028},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3359997.3365751},
    doi = {10.1145/3359997.3365751},
    booktitle = {The 17th International Conference on Virtual-Reality Continuum and Its Applications in Industry},
    articleno = {50},
    numpages = {2},
    keywords = {HCI, Sketch, CNN, Virtual Reality},
    location = {Brisbane, QLD, Australia},
    series = {VRCAI ’19}}
  • re-OBJ: Jointly learning the foreground and background for object instance re-identification

    Vaibhav Bansal, Stuart James and Alessio Del Bue

    ICIAP'19 | Trento, Italy | Best Student Paper Award

    Conventional approaches to object instance re-identification rely on matching appearances of the target objects among a set of frames. However, learning appearances of the objects alone might fail when there are multiple objects with similar appearance or multiple instances of the same object class present in the scene. This paper proposes that partial observations of the background can be utilized to aid in the object re-identification task for a rigid scene, especially a rigid environment with a lot of recurring identical models of objects. Using an extension to the Mask R-CNN architecture, we learn to encode the important and distinct information in the background jointly with the foreground relevant to rigid real-world scenarios such as an indoor environment where objects are static and the camera moves around the scene. We demonstrate the effectiveness of our joint visual feature in the re-identification of objects in the ScanNet dataset and show a relative improvement of around 28.25% in the rank-1 accuracy over the deepSort method.
    @inproceedings{BansalICIAP19reobj,
    author = {Vaibhav Bansal and Stuart James and Alessio {Del Bue}},
    editor = {Elisa Ricci and Samuel Rota Bul{\`{o}} and Cees Snoek and Oswald Lanz and Stefano Messelodi and Nicu Sebe},
    title = {re-OBJ: Jointly Learning the Foreground and Background for Object Instance Re-identification},
    booktitle = {Image Analysis and Processing - {ICIAP} 2019 - 20th International Conference,
    Trento, Italy, September 9-13, 2019, Proceedings, Part {II}},
    series = {Lecture Notes in Computer Science},
    volume = {11752}, pages = {402--413},
    publisher = {Springer},
    year = {2019},
    url = {https://doi.org/10.1007/978-3-030-30645-8\_37},
    doi = {10.1007/978-3-030-30645-8\_37}}
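
    The following is a minimal, illustrative sketch of the joint encoding idea only: it assumes a foreground (object crop) embedding and a background (surrounding context) embedding are already available from some backbone, and uses a generic concatenate-and-rank scheme as a stand-in for the Mask R-CNN based encoder described in the paper. All names and dimensions are hypothetical.

    import numpy as np

    def joint_descriptor(fg_feat, bg_feat, bg_weight=0.5):
        """Concatenate foreground and background embeddings into one re-ID descriptor."""
        fg = np.asarray(fg_feat, dtype=float)
        bg = np.asarray(bg_feat, dtype=float) * bg_weight
        joint = np.concatenate([fg, bg])
        return joint / (np.linalg.norm(joint) + 1e-12)

    def rank_gallery(query_desc, gallery_descs):
        """Rank gallery detections by cosine similarity to the query (first index = rank-1 match)."""
        sims = gallery_descs @ query_desc
        return np.argsort(-sims)

    # usage with hypothetical 128-D foreground / background features
    query = joint_descriptor(np.random.randn(128), np.random.randn(128))
    gallery = np.stack([joint_descriptor(np.random.randn(128), np.random.randn(128))
                        for _ in range(50)])
    ranking = rank_gallery(query, gallery)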
  • Augmenting datasets for Visual Question Answering for complex spatial reasoning

    Stuart James and Alessio Del Bue

    CVPR Workshop on VQA | California, USA

  • Autonomous 3D reconstruction, mapping and exploration of indoor environments with a robotic arm

    Yiming Wang, Stuart James, Elisavet Konstantina Stathopoulou, Carlos Beltrán-González, Yoshinori Konishi and Alessio Del Bue

    IEEE Robotics and Automation Letters | Macau

    We propose a novel information gain metric that combines hand-crafted and data-driven metrics to address the next best view problem for autonomous 3D mapping of unknown indoor environments. For the hand-crafted metric, we propose an entropy-based information gain that accounts for the previous view points to prevent the camera from revisiting the same location and to promote the motion toward unexplored or occluded areas. Whereas for the learnt metric, we adopt a Convolutional Neural Network (CNN) architecture and formulate the problem as a classification problem. The CNN takes as input the current depth image and outputs the motion direction that suggests the largest unexplored surface. We train and test the CNN using a new synthetic dataset based on the SUNCG dataset. The learnt motion direction is then combined with the proposed hand-crafted metric to help handle situations where using only the hand-crafted metric tends to face ambiguities. We finally evaluate the autonomous paths over several real and synthetic indoor scenes including complex industrial and domestic settings and prove that our combined metric is able to further improve the exploration coverage compared to using only the proposed hand-crafted metric.
    @ARTICLE{WangRAL19explore,
    author={Y. {Wang} and S. {James} and E. K. {Stathopoulou} and C. {Beltrán-González} and Y. {Konishi} and A. {Del Bue}},
    journal={IEEE Robotics and Automation Letters},
    title={Autonomous 3-D Reconstruction, Mapping, and Exploration of Indoor Environments With a Robotic Arm},
    year={2019},
    volume={4},
    number={4},
    pages={3340-3347}}
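
    A minimal sketch of the entropy-based part of the information gain described above, assuming a voxel occupancy grid with per-voxel occupancy probabilities and a precomputed visibility mask per candidate view; the penalty on previously visited views, the grid and the candidate views are all placeholder choices, not the paper's implementation.

    import numpy as np

    def voxel_entropy(p):
        """Per-voxel Shannon entropy of occupancy probabilities (0 and 1 give 0 bits)."""
        p = np.clip(p, 1e-6, 1 - 1e-6)
        return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

    def view_information_gain(occupancy, visible_mask, visit_counts, view_index, penalty=0.5):
        """Toy next-best-view score: summed entropy of the voxels a candidate view would
        observe, discounted for views that have already been visited."""
        gain = voxel_entropy(occupancy)[visible_mask].sum()
        return gain / (1.0 + penalty * visit_counts[view_index])

    # usage with a random 20x20x20 grid and two hypothetical candidate views
    occ = np.random.rand(20, 20, 20)                              # unknown space ~ 0.5
    views = [np.random.rand(20, 20, 20) > 0.7 for _ in range(2)]  # fake visibility masks
    visits = np.zeros(len(views))
    best = max(range(len(views)),
               key=lambda i: view_information_gain(occ, views[i], visits, i))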
  • 2018

    Visual Graphs from Motion (VGfM): Scene understanding with object geometry reasoning

    Paul Gay, Stuart James, Alessio Del Bue

    ACCV'18 | Perth, Australia

    Recent approaches on visual scene understanding attempt to build a scene graph -- a computational representation of objects and their pairwise relationships. Such rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when considering complex spatial arrangements in the scene. Differently, an image sequence conveys useful information using the multi-view geometric relations arising from camera motion. Indeed, in such cases, object relationships are naturally related to the 3D scene structure. To this end, this paper proposes a system that first computes the geometrical location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding such geometrical reasoning. Such compelling representation is obtained using a new model where geometric and visual features are merged using an RNN framework. We report results on a dataset we created for the task of 3D scene graph generation in multiple views.
    @InProceedings{GayACCV19vgfm,
    author="Gay, Paul and Stuart, James and Del Bue, Alessio",
    editor="Jawahar, C. V.and Li, Hongdong and Mori, Greg and Schindler, Konrad",
    title="Visual Graphs from Motion (VGfM): Scene Understanding with Object Geometry Reasoning",
    booktitle="Computer Vision -- ACCV 2018",
    year="2019",
    publisher="Springer International Publishing",address="Cham",
    pages="330--346",
    abstract="Recent approaches on visual scene understanding attempt to build a scene graph -- a computational representation of objects and their pairwise relationships. Such rich semantic representation is very appealing, yet difficult to obtain from a single image, especially when considering complex spatial arrangements in the scene. Differently, an image sequence conveys useful information using the multi-view geometric relations arising from camera motions. Indeed, object relationships are naturally related to the 3D scene structure. To this end, this paper proposes a system that first computes the geometrical location of objects in a generic scene and then efficiently constructs scene graphs from video by embedding such geometrical reasoning. Such compelling representation is obtained using a new model where geometric and visual features are merged using an RNN framework. We report results on a dataset we created for the task of 3D scene graph generation in multiple views.",
    isbn="978-3-030-20893-6"}
  • Multi-view Aggregation for Color Naming with Shadow Detection and Removal

    Mohamed Dahy Elkhouly, Stuart James, Alessio Del Bue

    IPAS'18 | Nice, France | Best Paper Award

    This paper presents a set of methods for classifying the color attribute of objects when multiple images of the same objects are available. This problem is more complex than the single image estimation since varying environmental effects, such as shadows or specularities from light sources, can result in poor accuracy. These depend primarily on the camera positions and the material type of the objects. Single image techniques focus on improving the discrimination between colors, whereas in multi-view systems additional information is available but should be utilized wisely. To this end, we propose three methods to aggregate image pixel information in multi-view that boost the performance of color name classification. Moreover, we study the effect of shadows by employing automatic shadow detection and correction techniques on the color naming problem. We tested our proposals on a new multi-view color names dataset (M3DCN) which contains indoor and outdoor objects. The experimental evaluation shows that one out of the three presented aggregation methods is very efficient and it achieves the highest accuracy in terms of classification results. Also, we experimentally show that addressing visual outliers like shadows in multi-view images improves the performance of the color attribute decision process.
    @INPROCEEDINGS{ElkhoulyIPAS18mvcolor,
    author={M. D. {Elkhouly} and S. {James} and A. {Del Bue}},
    booktitle={2018 IEEE International Conference on Image Processing, Applications and Systems (IPAS)},
    title={Multi-view Aggregation for Color Naming with Shadow Detection and Removal},
    year={2018},
    pages={115-120}}
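
    An illustrative sketch of one way to aggregate per-view color-name evidence, assuming each view yields a probability distribution over the eleven basic color names; the weighted averaging (with optional down-weighting of shadowed views) is a generic choice, not necessarily the aggregation the paper found best.

    import numpy as np

    COLOR_NAMES = ["black", "blue", "brown", "grey", "green", "orange",
                   "pink", "purple", "red", "white", "yellow"]

    def aggregate_color_name(per_view_probs, view_weights=None):
        """Combine per-view color-name distributions (V x 11) into a single decision.
        view_weights can down-weight views affected by shadows or specularities."""
        per_view_probs = np.asarray(per_view_probs, dtype=float)
        if view_weights is None:
            view_weights = np.ones(len(per_view_probs))
        weights = np.asarray(view_weights, dtype=float)
        fused = (weights[:, None] * per_view_probs).sum(axis=0) / weights.sum()
        return COLOR_NAMES[int(fused.argmax())], fused

    # usage: three views of the same object, the second one assumed to be in shadow
    views = np.random.dirichlet(np.ones(11), size=3)
    name, dist = aggregate_color_name(views, view_weights=[1.0, 0.3, 1.0])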
  • 3D Sketching for Interactive Model Retrieval in Virtual Reality

    Daniele Giunchi, Stuart James, Anthony Steed

    Expressive'18 | Victoria, British Columbia, Canada

    Users within a Virtual Environment often need support designing the environment around them with the need to find relevant content while remaining immersed. We, therefore, focus on the familiar sketch-based interaction to support the process of content placement and specifically investigate how interactions from a tablet or desktop translate into the virtual environment. To understand sketching interaction within a virtual environment, we compare different methods of sketch interaction, i.e., 3D mid-air sketching, 2D sketching on a virtual tablet, 2D sketching on a fixed virtual whiteboard, and 2D sketching on a real tablet. The user remains immersed within the environment and queries a database containing detailed 3D models, placing them into the virtual environment. Our results show that 3D mid-air sketching is considered to be a more intuitive method to search a collection of models; while the addition of physical devices creates confusion due to the complications of their inclusion within a virtual environment. While we pose our work as a retrieval problem for 3D models of chairs, our results are extendable to other sketching tasks for virtual environments.
    @inproceedings{10.1145/3229147.3229166,
    author = {Giunchi, Daniele and James, Stuart and Steed, Anthony},
    title = {3D Sketching for Interactive Model Retrieval in Virtual Reality},
    year = {2018},
    isbn = {9781450358927},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3229147.3229166},
    doi = {10.1145/3229147.3229166},
    booktitle = {Proceedings of the Joint Symposium on Computational Aesthetics and Sketch-Based Interfaces and Modeling and Non-Photorealistic Animation and Rendering},
    articleno = {1},
    numpages = {12},
    keywords = {HCI, sketch, virtual reality, CNN},
    location = {Victoria, British Columbia, Canada},
    series = {Expressive ’18}}
  • Model Retrieval by 3D Sketching in Immersive Virtual Reality

    Daniele Giunchi, Stuart James, Anthony Steed

    IEEE VR Poster | Reutlingen, Germany

    We describe a novel method for searching 3D model collections using free-form sketches within a virtual environment as queries. As opposed to traditional Sketch Retrieval, our queries are drawn directly onto an example model. Using immersive virtual reality the user can express their query through a sketch that demonstrates the desired structure, color and texture. Unlike previous sketch-based retrieval methods, users remain immersed within the environment without relying on textual queries or 2D projections which can disconnect the user from the environment. We show how a convolutional neural network (CNN) can create multi-view representations of colored 3D sketches. Using such a descriptor representation, our system is able to rapidly retrieve models and in this way, we provide the user with an interactive method of navigating large object datasets. Through a preliminary user study we demonstrate that by using our VR 3D model retrieval system, users can perform quick and intuitive searches. Using our system users can rapidly populate a virtual environment with specific models from a very large database, and thus the technique has the potential to be broadly applicable in immersive editing systems.
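
    A small sketch of the retrieval step implied above, assuming each database model and the user's colored sketch have already been rendered to multiple views and encoded by some CNN into per-view feature vectors; the mean-pooling and cosine ranking below are generic choices rather than the poster's exact pipeline.

    import numpy as np

    def multiview_descriptor(view_features):
        """Pool per-view CNN features (V x D) into a single L2-normalised descriptor."""
        desc = np.asarray(view_features, dtype=float).mean(axis=0)
        return desc / (np.linalg.norm(desc) + 1e-12)

    def retrieve(sketch_views, database_descriptors, k=5):
        """Rank database models by cosine similarity to the sketch descriptor."""
        query = multiview_descriptor(sketch_views)
        sims = database_descriptors @ query        # rows assumed L2-normalised
        return np.argsort(-sims)[:k], sims

    # usage with hypothetical 512-D features for 12 sketch views and 1000 models
    db = np.random.randn(1000, 512)
    db /= np.linalg.norm(db, axis=1, keepdims=True)
    top_k, scores = retrieve(np.random.randn(12, 512), db)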
  • 2017

    Texture Stationarization: Turning Photos into Tileable Textures

    Joep Moritz, Stuart James, Tom S.F. Haines, Tobias Ritschel, Tim Weyrich

    Computer Graphics Forum (Proc. Eurographics) | Lyon, France

    Texture synthesis has grown into a mature field in computer graphics, allowing the synthesis of naturalistic textures and images from photographic exemplars. Surprisingly little work, however, has been dedicated to synthesizing tileable textures, that is, textures that when laid out in a regular grid of tiles form a homogeneous appearance suitable for use in memory-sensitive real-time graphics applications. One of the key challenges in doing so is that most natural input exemplars exhibit uneven spatial variations that, when tiled, show as repetitive patterns. We propose an approach to synthesize tileable textures while enforcing stationarity properties that effectively mask repetitions while maintaining the unique characteristics of the exemplar. We explore a number of alternative measures for texture stationarity and show how each measure can be integrated into a standard texture synthesis method (PatchMatch) to enforce stationarity at user-controlled scales. We demonstrate the efficacy of our approach using a database of 118 exemplar images, both from publicly available sources as well as new ones captured under uncontrolled conditions, and we quantitatively analyze alternative stationarity measures for their robustness across many test runs using different random seeds. In conclusion, we suggest a novel synthesis approach that employs local histogram matching to reliably turn input photographs of natural surfaces into tiles well suited for artifact-free tiling.
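
    A rough sketch of one possible stationarity check in the spirit of the abstract above: compare local histograms of image tiles against the global histogram, so that a texture whose statistics are uniform across the image scores near zero. The greyscale histogram, tile size and chi-square distance are illustrative choices, not the measures evaluated in the paper.

    import numpy as np

    def patch_histogram(patch, bins=16):
        """Normalised grey-level histogram of a patch with values in [0, 1]."""
        hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
        return hist / max(hist.sum(), 1)

    def stationarity_score(img, tile=32, bins=16):
        """Mean chi-square distance between each tile's histogram and the global one;
        lower means the texture statistics are more uniform across the image."""
        global_h = patch_histogram(img, bins)
        dists = []
        for i in range(img.shape[0] // tile):
            for j in range(img.shape[1] // tile):
                patch = img[i * tile:(i + 1) * tile, j * tile:(j + 1) * tile]
                h = patch_histogram(patch, bins)
                dists.append(0.5 * np.sum((h - global_h) ** 2 / (h + global_h + 1e-8)))
        return float(np.mean(dists))

    # usage on a random greyscale image standing in for a photograph
    print(stationarity_score(np.random.rand(256, 256)))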
  • Digital Photographic Practices as Expressions of Personhood and Identity: Variations Across School Leavers and Recent Retirees

    K Orzech, W Moncur, A Durrant, S James, J Collomosse

    Journal of Visual Studies

  • 2016

    Evolutionary Data Purification for Social Media Classification

    Stuart James, John Collomosse

    International Conference on Pattern Recognition (ICPR'16) | Cancun, Mexico

  • Towards Sketched Visual Narratives for Retrieval

    Stuart James

    SketchX - Human Sketch Analysis and its Applications | London, UK

  • 2015

    Visual Narratives: Free-hand Sketch for Visual Search and Navigation of Video

    Stuart James

    PhD Thesis | University of Surrey, Guildford, UK

    Humans have an innate ability to communicate visually; the earliest forms of communication were cave drawings, and children can communicate visual descriptions of scenes through drawings well before they can write. Drawings and sketches offer an intuitive and efficient means for communicating visual concepts. Today, society faces a deluge of digital visual content driven by a surge in the generation of video on social media and the online availability of video archives. Mobile devices are emerging as the dominant platform for consuming this content, with Cisco predicting that by 2018 over 80% of mobile traffic will be video. Sketch offers a familiar and expressive modality for interacting with video on the touch-screens commonly present on such devices. This thesis contributes several new algorithms for searching and manipulating video using free-hand sketches. We propose the Visual Narrative (VN); a storyboarded sequence of one or more actions in the form of sketch that collectively describe an event. We show that VNs can be used both to efficiently search video repositories and to synthesise video clips.

    First, we describe a sketch based video retrieval (SBVR) system that fuses multiple modalities (shape, colour, semantics, and motion) in order to find relevant video clips. An efficient multi-modal video descriptor is proposed enabling the search of hundreds of videos in milliseconds. This contrasts with prior SBVR approaches that lack an efficient index representation and take minutes or hours to search similar datasets. This contribution not only makes SBVR practical at interactive speeds, but also enables user-refinement of results through relevance feedback to resolve sketch ambiguity, including the relative priority of the different VN modalities.

    Second, we present the first algorithm for sketch based pose retrieval. A pictographic representation (stick-men) is used to specify a desired human pose within the VN, and similar poses are found within a video dataset. We use archival dance performance footage from the UK National Resource Centre for Dance (UK-NRCD), containing diverse examples of human pose. We investigate appropriate descriptors for sketch and video, and propose a novel manifold learning technique for mapping between the two descriptor spaces and so performing sketched pose retrieval. We show that domain adaptation can be applied to boost the performance of this system through a novel piece-wise feature-space warping technique.

    Third, we present a graph representation for VNs comprising multiple actions. We focus on the extension of our pose retrieval system to a sequence of poses interspersed with actions (e.g. jump, twirl). We show that our graph representation can be used for multiple applications: 1) to retrieve sequences of video comprising multiple actions; 2) to navigate, in pictorial form, the retrieved video sequences; 3) to synthesise new video sequences by retrieving and concatenating video fragments from archival footage.
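
    A toy sketch of the multi-modal fusion idea behind the first contribution, assuming per-clip descriptors for shape, colour, semantics and motion are already computed; the weighted concatenation and dot-product ranking (with weights adjustable by relevance feedback) are illustrative, not the thesis' descriptor or index structure.

    import numpy as np

    def fuse_modalities(descriptors, weights):
        """Weighted concatenation of per-modality descriptors into one clip descriptor;
        weights can be re-tuned by relevance feedback to re-prioritise modalities."""
        parts = [np.asarray(descriptors[m], dtype=float) * weights[m]
                 for m in sorted(descriptors)]
        fused = np.concatenate(parts)
        return fused / (np.linalg.norm(fused) + 1e-12)

    def search(query_desc, clip_descs, k=10):
        """Return indices of the k clips most similar to the sketched query."""
        sims = clip_descs @ query_desc
        return np.argsort(-sims)[:k]

    # usage with hypothetical 64-D descriptors per modality for 500 video clips
    mods = ["shape", "colour", "semantics", "motion"]
    weights = {m: 1.0 for m in mods}
    clips = np.stack([fuse_modalities({m: np.random.randn(64) for m in mods}, weights)
                      for _ in range(500)])
    top = search(fuse_modalities({m: np.random.randn(64) for m in mods}, weights), clips)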
  • 2014

    Enhanced Digital Literacy by Multi-modal Data Mining of the Digital Lifespan

    John Collomosse, Stuart James, Abigail Durrant, Diego Trujillo-Pisanty, Wendy Moncur, Kathryn Orzech, Sarah Martindale, Mike Chantler.

    DE2015 | London, UK

  • Interactive Video Asset Retrieval using Sketched Queries

    Stuart James and John Collomosse

    CVMP'14 | London

  • Particle Filtering approach to salient video object localization

    C Gray, S James, J Collomosse and P Asente

    ICIP'14 | Switzerland

    ReEnact: Sketch based Choreographic Design from Archival Dance Footage

    S James, M Fonseca and J Collomosse

    ACM International Conference on Multimedia Retrieval (ICMR'14) | Glasgow, UK

    Admixed Portrait: Design Intervention to Prompt Reflection on Being Online as a New Parent

    D Trujillo-Pisanty, A Durrant, S Martindale, S James, J Collomosse

    ACM DIS'14

  • 2013

    Markov Random Fields for Sketch based Video Retrieval

    R Hu, S James, T Wang and J Collomosse

    ACM International Conference on Multimedia Retrieval (ICMR'13)

  • 2012

    Skeletons from Sketches of Dancing Poses

    M Fonseca, S James and J Collomosse

    IEEE VL/HCC'12

  • Annotated Free-hand Sketches for Video Retrieval using Object Semantics and Motion

    R Hu, S James and J Collomosse

    Springer MultiMedia Modelling (MMM'12)

  • 2011

    Annotated Sketches for Intuitive Video Retrieval

    Stuart James and John Collomosse

    BMVA / AVA Workshop on Biological and Machine Vision, Perception Journal | Cardiff, UK

  • Sketched Visual Narratives for Content Based Video Retrieval

    Stuart James

    MPhil Transfer Report | University of Surrey, UK