Below we showcase several projects in which SIH has used data description and basic visualization. These range from statistical summaries and basic plots to more sophisticated cluster analysis.
Bayesian Updating for Childhood Obesity Grant Proposal
SIH supported a grant proposal by the Centre for Translational Data Science by demonstrating the value of Bayesian modelling when collecting and analysing longitudinal data on childhood obesity. We built cross-sectional Bayesian variable selection models to select important factors and models for predicting children’s BMI, mental health and sleep quality across multiple ages, for each child in the Longitudinal Study of Australian Children (LSAC). A vector-autoregressive model was then applied to visualise the unexplained variation in the preceding models. We constructed visualisations to demonstrate the importance of understanding uncertainty over the course of data collection, and the potential for using Bayesian adaptive trials during collection.
Labelling Clause Type at Scale for LCT
LCT studies how knowledge is built through teaching and, in order to determine the trajectory of knowledge building, proposes to categorise each clause in a teaching transcript. SIH made this process of labelling clauses much faster and more scalable. First, they developed software that uses natural language processing to convert a lesson transcript into a spreadsheet in which each row contains a clause to be categorised. Second, they developed a machine learning classifier that learns from these spreadsheets and predicts the labels of future clauses. Finally, SIH developed techniques to visualise the trajectory of knowledge building through a lesson whose clauses have been categorised.
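The first step, turning a transcript into one spreadsheet row per clause, can be illustrated with a minimal sketch. This is not the SIH software: the regular-expression splitter is a deliberately naive stand-in for real clause segmentation, and the function names, column names and input format (a list of speaker and utterance pairs) are assumptions made for illustration.

```python
import csv
import io
import re

# Naive stand-in for NLP clause segmentation (an assumption, not the SIH method):
# split at sentence-final punctuation or semicolons, and at a comma that is
# followed by a common conjunction.
CLAUSE_BOUNDARY = re.compile(r"[.;!?]+\s*|,\s+(?=(?:and|but|so|because)\b)")


def split_clauses(text):
    """Return the non-empty clause-like pieces of an utterance."""
    return [part.strip() for part in CLAUSE_BOUNDARY.split(text) if part.strip()]


def transcript_to_csv(turns, out):
    """Write one row per clause, with an empty 'label' column for annotators.

    `turns` is a list of (speaker, utterance) pairs; `out` is a writable text file.
    """
    writer = csv.writer(out)
    writer.writerow(["turn", "speaker", "clause", "label"])
    for turn_no, (speaker, text) in enumerate(turns, start=1):
        for clause in split_clauses(text):
            writer.writerow([turn_no, speaker, clause, ""])


if __name__ == "__main__":
    buffer = io.StringIO()
    transcript_to_csv(
        [("Teacher", "Plants need light, because they photosynthesise. What is light?")],
        buffer,
    )
    print(buffer.getvalue())
```

Rows labelled by hand in such a spreadsheet could then serve as training data for a classifier, which is the second step described above.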
eSCAPE parallel landscape evolution benchmarking
eSCAPE is a parallel landscape evolution model, built to simulate topography dynamics at various space and time scales. SIH benchmarked eSCAPE’s performance across multiple CPUs and nodes on the University of Sydney’s Artemis HPC, visualizing the program’s runtimes as well as the runtimes of specific functions within the program. SIH created reusable scripts so that the researcher can easily reassess eSCAPE’s performance as code development continues.
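As a rough sketch of what a reusable benchmarking script might compute (not the scripts SIH delivered), the example below times a function and derives speedup and parallel efficiency from runtimes measured at different CPU counts. The function names and the sample runtimes are hypothetical.

```python
import time


def time_call(func, *args, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)


def scaling_table(runtimes):
    """Summarise scaling from a dict mapping CPU count to runtime in seconds.

    The smallest CPU count is used as the baseline. Each returned row is
    (cpus, runtime, speedup, efficiency), where speedup is baseline runtime
    divided by runtime, and efficiency is speedup divided by the relative
    increase in CPUs.
    """
    base_cpus = min(runtimes)
    base_time = runtimes[base_cpus]
    rows = []
    for cpus in sorted(runtimes):
        speedup = base_time / runtimes[cpus]
        efficiency = speedup * base_cpus / cpus
        rows.append((cpus, runtimes[cpus], speedup, efficiency))
    return rows


if __name__ == "__main__":
    # Made-up runtimes for illustration only.
    for row in scaling_table({1: 100.0, 2: 50.0, 4: 40.0}):
        print(row)
```

An efficiency well below 1 at higher CPU counts indicates that adding processors is yielding diminishing returns, which is the kind of pattern a runtime plot makes visible.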
Video Tracking Predator-Prey Interactions in Fish
By video-tracking the interaction between prey mosquitofish, Gambusia holbrooki, and their predator, jade perch, Scortum barcoo, under controlled conditions, we provide some of the first fine-scale characterisation of how prey adapt their behaviour according to their continuous assessment of risk based on both predator behaviour and angular distance to the predator’s mouth. When these predators were inactive and posed less of an immediate threat, prey were often found within the attack cone of the predator showing reductions in speed and acceleration, characteristic of predator-inspection behaviour. However, when predators became active, prey swam faster with greater acceleration and were closer together within the attack cone of predators. Most importantly, this study provides evidence that prey do not adopt a uniform response to the presence of a predator. Instead, we demonstrate that prey are capable of rapidly and dynamically updating their assessment of risk and showing fine-scale adjustments to their behaviour.
Paper: “Fine-scale behavioural adjustments of prey on a continuum of risk”. M.I.A. Kent, J.E. Herbert-Read, G.D. McDonald, A.J. Wood, A.J.W. Ward. Proceedings of the Royal Society B. 2019
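To make the notion of angular distance and attack cone concrete, here is a minimal geometric sketch. It is not the paper's analysis code: it assumes 2D tracked positions, treats the predator's tracked position as its mouth, and uses a hypothetical cone half-angle of 30 degrees.

```python
import math


def angular_distance_deg(pred_xy, pred_heading_deg, prey_xy):
    """Absolute angle in degrees (0 to 180) between the predator's heading
    and the direction from the predator to the prey."""
    dx = prey_xy[0] - pred_xy[0]
    dy = prey_xy[1] - pred_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))
    # Wrap the difference into [-180, 180) before taking the magnitude.
    diff = (bearing - pred_heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff)


def in_attack_cone(pred_xy, pred_heading_deg, prey_xy, half_angle_deg=30.0):
    """True if the prey lies within the cone centred on the predator's heading.

    The default half-angle is a placeholder, not a value from the study.
    """
    return angular_distance_deg(pred_xy, pred_heading_deg, prey_xy) <= half_angle_deg


if __name__ == "__main__":
    print(angular_distance_deg((0.0, 0.0), 0.0, (1.0, 1.0)))
    print(in_attack_cone((0.0, 0.0), 90.0, (0.0, 2.0)))
```

Applied frame by frame to tracked trajectories, a measure like this allows prey speed and acceleration to be compared inside and outside the cone.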
Where can deep-sea iron nodules be found?
Potato-sized nodules of iron ore found on the ocean floor are of commercial mining interest. However, the negative ecological effects of mining these nodules are of concern. SIH constructed a global predictive model of nodule occurrence by combining data from thousands of ocean floor samples with global maps of oceanic variables. The environments in which these deposits do and do not occur could then be characterised to generate insight into potential consequences of proposed mining.
Predicting unnecessary CT scans
- Professor Jonathan Morris, Kolling Institute of Medical Research and Sydney Medical School; Dr Felicity Gallimore
- The University of Sydney Medical School
- Data Science (Dr Aldo Saavedra, Dr Madhura Killedar, Dr Joel Nothman and Mr Peter Thiem)
- Predictive modelling, Inferential modelling, Description and basic visualization, Language as data
Diagnostic imaging in hospitals is costly due to expensive machines and their operators, as well as the cost of moving patients in and out of radiography. Published studies of emergency presentations have shown that the number of brain computed tomography (CT-Brain) scans performed is increasing over time, while the proportion of scans giving no cause for concern remains the same and represents the largest category.
We sought to determine whether a substantial portion of CT scans performed in North Sydney LHD were unnecessary. We translated this research question into something determinable from data: identify CT-Brain cases where the unconcerning outcome of the scan could have been predicted from clinical knowledge available prior to the scan. By first constructing a text classifier to label CT scan reports as unconcerning, we were able to use clustering and predictive modelling to identify some patient features that weakly predicted unconcerning CT results.
While the project had the potential to impact clinical policy surrounding the use of CT scans in Emergency Departments, the weak results suggest that, if any excessive expenditure problem exists, it is not simple to resolve. At the same time, we have developed methodologies for performing similar studies towards rationalising diagnostic scan expenditure.
Predicting Crime using a Spatial-Demographic Framework
Responding to domestic violence related assaults dominates much of the NSW Police’s resources. We try to understand the relationships that drive social-demographic change and cause the occurrence of crime using a complex modelling framework. The social-demographic-crime network and its inter-dependencies were modelled using a Bayesian vector autoregression model. We built a collaboration with BOCSAR, drawing on its crime database of all offences in NSW over the last 20 years, and sourced demographic data for multiple census years. The results of this study will help inform policy decision-making by government and police.
Discharge against medical advice in the Sydney Children's Hospital Network
Patients who discharge against medical advice (DAMA) from hospital carry a significant risk of readmission and have increased rates of morbidity and mortality. Using five years of admissions and diagnosis data, we sought to identify the demographic, clinical and administrative characteristics of DAMA patients in the Sydney Children’s Hospital Network. Using a Bayesian logistic regression framework, we found that the statistically significant predictors of DAMA in a given admission were hospital site, a mental health/behavioural diagnosis, Aboriginality, emergency rather than elective admission, a gastrointestinal diagnosis and a history of previous DAMA. Identifying these predictors of DAMA provides opportunities for intervention at a practice and policy level in order to prevent adverse outcomes for patients.
Transforming IMPALA: International Migration Law and Policy Assessment Database
The IMPALA database (http://www.impaladatabase.org/) contains migration and citizenship law and policy across countries and through time, in a form that allows legislation, policy and some statistical data to be easily compared and measured. With one record for each visa type in each year, thousands of records have been entered, mostly manually, into a Qualtrics survey. SIH transformed the unwieldy, manually entered database to improve data exploration. We wrote software to ingest the Qualtrics responses (https://github.com/Sydney-Informatics-Hub/qualtrics-pandas) and to generate a more usable output with consistent metadata.
Disease spectrum and management of children admitted with acute respiratory infection in Viet Nam
- Nguyen Thi Kim Phuong, Respiratory Department, Da Nang Hospital for Women and Children; Professor Ben Marais, The Children’s Hospital at Westmead Clinical School and Deputy Director, Marie Bashir Institute for Infectious Diseases and Biosecurity
- Faculty of Health Sciences
- Data Science (Dr Maryam Montazerolghaem)
- Description and basic visualization
This study aimed to assess the acute respiratory infection (ARI) disease spectrum, duration of hospitalisation and outcome in children hospitalised with an ARI in Viet Nam. The results indicate that acute respiratory infection is a major cause of paediatric hospitalisation in Viet Nam, characterised by prolonged hospitalisation for relatively mild disease. There is huge potential to reduce unnecessary hospital admissions and costs.