Projects: Transformed data

Below we showcase several projects in which SIH has delivered transformed data.

ProteHome: Proteomics Experimental Results Database

The Metabolic Cybernetics Lab at the Charles Perkins Centre has generated disparate Mass Spectrometry datasets for protein metabolism studies. With datasets stored in various formats and places, it has been difficult to search and compare experimental results.

SIH developed ProteHome, a bioinformatics system that provides a centralised data repository with a web-based interface to facilitate:

  • Standardisation of quantitative analysis results, with a common specification of experimental metadata and common formatting of analysis data.
  • Retrieval of those results for protein(s) or modification(s) of interest, regardless of which version of the protein identification number was used in the stored experiment.
  • Management of submitted datasets using a comprehensive hierarchical storage structure.
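
Version-independent retrieval of the kind listed above can be illustrated with a minimal sketch. It assumes, purely for illustration, that a version appears as a numeric suffix such as `.2` or `-2` on the identifier. The record layout and the names `base_accession`, `build_index` and `find_results` are hypothetical and are not taken from ProteHome.

```python
import re
from collections import defaultdict

# Assumption: a version or isoform is a trailing ".<digits>" or "-<digits>".
_VERSION_SUFFIX = re.compile(r"[.\-]\d+$")


def base_accession(identifier: str) -> str:
    """Return the identifier without its trailing version suffix, upper-cased."""
    return _VERSION_SUFFIX.sub("", identifier.strip().upper())


def build_index(records):
    """Group result records by the version-free form of their protein ID."""
    index = defaultdict(list)
    for record in records:
        index[base_accession(record["protein_id"])].append(record)
    return index


def find_results(index, query_id: str):
    """Look up results for a protein, whichever version the query uses."""
    return index.get(base_accession(query_id), [])


# Hypothetical stored results that used different versions of the same ID.
records = [
    {"protein_id": "P12345.1", "experiment": "A"},
    {"protein_id": "P12345.3", "experiment": "B"},
    {"protein_id": "Q99999", "experiment": "C"},
]
index = build_index(records)
matches = find_results(index, "p12345.2")
```

Here `matches` contains the records from experiments A and B, because both stored identifiers and the query reduce to the same base accession.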

Labelling Clause Type at Scale for LCT

LCT studies how knowledge is built through teaching and, to determine the trajectory of knowledge building, proposes categorising each clause in a teaching transcript. SIH made this clause-labelling process much faster and more scalable. First, they developed software that uses natural language processing to convert a lesson transcript into a spreadsheet in which each row contains a clause to be categorised. Second, they developed a machine learning classifier that learns from these spreadsheets and predicts the labels of future clauses. Finally, SIH developed techniques to visualise the trajectory of knowledge building through a lesson whose clauses have been categorised.
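
The first step, turning a transcript into one row per clause, can be sketched as follows. This is only an illustration: a naive regular expression stands in for the natural language processing that SIH used, and the column names and the functions `split_clauses` and `transcript_to_csv` are assumptions made for this example.

```python
import csv
import io
import re

# Naive stand-in for real clause segmentation (an assumption of this sketch):
# split at sentence punctuation, or at a comma followed by a common conjunction.
_BOUNDARY = re.compile(r"[.!?;]+\s*|,\s+(?=(?:and|but|so|because)\b)")


def split_clauses(text: str):
    """Split a transcript into rough clauses, dropping empty pieces."""
    return [piece.strip() for piece in _BOUNDARY.split(text) if piece.strip()]


def transcript_to_csv(text: str) -> str:
    """Return CSV text with one clause per row and an empty label column."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["clause_id", "clause", "label"])
    for clause_id, clause in enumerate(split_clauses(text), start=1):
        writer.writerow([clause_id, clause, ""])
    return buffer.getvalue()


sheet = transcript_to_csv(
    "Plants need light, because they make food. What is this called?"
)
```

The empty `label` column is where a human coder, or later a trained classifier, would record the category of each clause.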

Video Tracking Predator-Prey Interactions in Fish

By video-tracking the interaction between prey mosquitofish, Gambusia holbrooki, and their predator, jade perch, Scortum barcoo, under controlled conditions, we provide some of the first fine-scale characterisation of how prey adapt their behaviour according to their continuous assessment of risk based on both predator behaviour and angular distance to the predator’s mouth. When these predators were inactive and posed less of an immediate threat, prey were often found within the attack cone of the predator showing reductions in speed and acceleration, characteristic of predator-inspection behaviour. However, when predators became active, prey swam faster with greater acceleration and were closer together within the attack cone of predators. Most importantly, this study provides evidence that prey do not adopt a uniform response to the presence of a predator. Instead, we demonstrate that prey are capable of rapidly and dynamically updating their assessment of risk and showing fine-scale adjustments to their behaviour.

Paper: “Fine-scale behavioural adjustments of prey on a continuum of risk”. M.I.A. Kent, J.E. Herbert-Read, G.D. McDonald, A.J. Wood, A.J.W. Ward. Proceedings of the Royal Society B. 2019

Where can deep-sea iron nodules be found?

Potato-sized nodules of iron ore found on the ocean floor are of commercial mining interest. However, the negative ecological effects of mining these nodules are of concern. SIH constructed a global predictive model of nodule occurrence by combining data from thousands of ocean floor samples with global maps of oceanic variables. The environments in which these deposits do and do not occur could then be characterised to generate insight into the potential consequences of proposed mining.

Optimal Image Reconstruction for the SAMI Galaxy Survey

The SAMI Galaxy Survey is a large-scale observational program that targets several thousand galaxies with the University of Sydney-built Sydney-AAO Multi-object Integral field spectrograph (SAMI). A key data challenge is to optimally reconstruct a data cube from ~500 spectra taken at different spatial locations across a galaxy. The previous method produced undesirable artefacts, caused by under-sampling and by astronomical sources changing spatial location within the data due to differential atmospheric refraction. We have developed a novel method using probabilistic image fusion that delivers an optimal combination of the spectral fibre bundle data into a cube with uniform image quality while maintaining spectral details. This innovative technology has further demonstrated super-resolution capabilities and is implemented as a flexible software framework that can eventually be used by a wide range of telescopes worldwide.

Understanding Transgenerational Welfare Dependence

The Transgenerational Dataset 2 Extended (TDS2-e) is an important investment by the Commonwealth in understanding the factors contributing to life outcomes, including the reliance of people on income support. The data contains welfare payments to recipients born between 1987 and 1988 and to their families, parents, children and siblings. The raw data was difficult to work with because it was subject to extensive security requirements, was large in volume, and had an inconvenient shape. SIH engineered software to convert the data into forms that make it accessible to the end user while complying with security and licence requirements. This rich dataset is now available for researchers to explore, and will contribute to the understanding and improvement of the Commonwealth income support systems and of life outcomes for all Australians.

Which treatment might patients with relapsed ovarian cancer respond to?

  • Cristina Mapagu, Westmead Clinical School
  • The University of Sydney Medical School
  • Data Science (Dr Maryam Montazerolghaem)
  • 2018
  • Transformed data

Molecular markers measured within the primary tumour are used to determine whether patients with relapsed ovarian cancer will respond to a particular treatment. SIH helped to identify subsets of genes that are overexpressed or underexpressed in response to treatments by developing statistical methods that include dimensionality reduction and hypothesis testing.

Applying Machine Learning to Criminology

The incidence of crime and the impacts of societal and individual characteristics on criminal behaviour can be explored using modern machine learning methods, answering important questions about crime, such as:

  • What is the probability of a crime occurring at a location?
  • What are the characteristics of the population that affect the incidence of crime?

Our work implements novel Bayesian machine learning techniques to model the dependency between offence data and both demographic characteristics and spatial location. This provides a fully probabilistic approach to modelling crime that reflects all uncertainties in the prediction of offences as well as the uncertainties surrounding model parameters. By using Bayesian updating, these predictions and inferences are dynamic in the sense that they change as new information becomes available. Our model has been applied to offence data, such as domestic violence-related assaults, burglary and motor vehicle theft, in New South Wales (NSW), Australia. The results highlight the strength of the technique by validating the factors that are associated with high and low criminal activity.
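
To illustrate what Bayesian updating means in practice, here is a deliberately simple sketch. It is not the model used in this project. It assumes offence counts per period at a single location follow a Poisson distribution whose rate has a Gamma prior, so the posterior is again a Gamma distribution. The class name `GammaBelief` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GammaBelief:
    """Gamma(shape, rate) belief about the mean number of offences per period."""

    shape: float
    rate: float

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    @property
    def variance(self) -> float:
        return self.shape / self.rate ** 2

    def update(self, counts):
        """Conjugate Gamma-Poisson update: add total count and number of periods."""
        counts = list(counts)
        return GammaBelief(self.shape + sum(counts), self.rate + len(counts))


# Hypothetical prior: about 2 offences per period, held with low confidence.
prior = GammaBelief(shape=2.0, rate=1.0)

# Three periods of made-up counts arrive, then one more unusually high period.
after_three = prior.update([3, 5, 4])
after_four = after_three.update([10])
```

Each call to `update` returns a new belief, so the estimate of the offence rate shifts as data arrive (the mean moves from 2.0 to 3.5 and then to 4.8 here), while the variance records how uncertain that estimate remains.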
