Projects [DRAFT]

THIS PAGE INCLUDES DRAFT PROJECT SUMMARIES NOT YET APPROVED FOR PUBLICATION.

PIPE-156

Labelling Clause Type at Scale for LCT
  • Dr. Yaegan Doran, Linguistics and LCT Centre; Professor Karl Maton, LCT Centre for Knowledge Building
  • Faculty of Arts and Social Sciences
  • Data Science (Dr. Joel Nothman)
  • 2019
  • Software Transformed data Report
  • Data collection Predictive modelling Description and basic visualization Language as data

LCT studies how knowledge is built through teaching and, in order to determine the trajectory of knowledge building, proposes to categorise each clause in a teaching transcript. SIH made this process of labelling clauses much faster and more scalable. First, SIH developed software that uses natural language processing to convert a lesson transcript into a spreadsheet in which each row contains a clause to be categorised. Second, SIH developed a machine learning classifier that learns from these spreadsheets and predicts the labels of future clauses. Finally, SIH developed techniques to visualise the trajectory of knowledge building through a lesson whose clauses have been categorised.

PIPE-516

Identifying ram mating behaviour
  • AProf. Simon de Graaf; Dr Jessica Rickard; Ms Emmah Tumeth
  • Faculty of Science
  • Data Science (Dr Madhura Killedar and Dr Alex Judge)
  • 2019
  • Verbal advice Software Report
  • Predictive modelling Time series

Monitoring livestock has historically been labour intensive. The advent of on-animal sensors means this monitoring can be conducted remotely, continuously, and accurately. The ability to identify the precise time when sheep are mating using ram-mounted accelerometer data would unlock unprecedented information on the reproductive performance of these animals. We fit a classifier model to data from collar accelerometers, labelled by videoing rams in the presence of ewes in oestrus. We then wrote code to detect change points in new acceleration data and to predict the occurrence of mating events.
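
To illustrate what detecting change points in a signal can look like, here is a minimal sketch that compares the mean of a window before each sample with the mean of a window after it. This is not the project's method: the function name `detect_change_points`, the window size, the threshold, and the synthetic signal are all assumptions made for this example.

```python
from statistics import mean

def detect_change_points(signal, window=3, threshold=1.0):
    """Return indices where the local mean shifts by more than `threshold`.

    Hypothetical helper: compares the mean of the `window` samples before
    index i with the mean of the `window` samples starting at i.
    """
    scores = {}
    for i in range(window, len(signal) - window + 1):
        before = mean(signal[i - window:i])
        after = mean(signal[i:i + window])
        scores[i] = abs(after - before)

    change_points = []
    run = []  # consecutive indices whose score exceeds the threshold
    for i in sorted(scores):
        if scores[i] > threshold:
            run.append(i)
        elif run:
            # Keep only the strongest index from each above-threshold run.
            change_points.append(max(run, key=scores.get))
            run = []
    if run:
        change_points.append(max(run, key=scores.get))
    return change_points

# Synthetic signal: quiet, then a burst of activity, then quiet again.
signal = [0.0] * 6 + [4.0] * 6 + [0.0] * 6
change_points = detect_change_points(signal)
```

On this synthetic signal the sketch reports the start and end of the burst. Real accelerometer data is noisier and multi-axis, so a practical detector would likely need smoothing and a tuned threshold.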

PIPE-520

eSCAPE parallel landscape evolution benchmarking
  • Dr. Tristan Salles
  • Faculty of Science
  • Data Science (David Kohn)
  • 2019
  • Report
  • Description and basic visualization Creative visualization

eSCAPE is a parallel landscape evolution model, built to simulate topography dynamics at various space and time scales. SIH benchmarked eSCAPE’s performance across multiple CPUs and nodes on the University of Sydney’s Artemis HPC, visualizing the program’s runtimes as well as the runtimes of specific functions within the program. SIH created reusable scripts to allow the researcher to easily assess eSCAPE’s performance in the future as code development continues.
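
Runtimes measured at different CPU counts are often summarised as speedup (single-CPU runtime divided by the runtime on p CPUs) and parallel efficiency (speedup divided by p). The sketch below computes both. Whether the SIH scripts report these particular quantities is an assumption, and the function name `scaling_table` and the runtimes are invented for illustration.

```python
def scaling_table(runtimes):
    """Map CPU count -> (speedup, efficiency) relative to the 1-CPU run.

    `runtimes` maps CPU count to wall-clock seconds; it must include a
    1-CPU entry to serve as the baseline.
    """
    baseline = runtimes[1]
    table = {}
    for cpus, seconds in sorted(runtimes.items()):
        speedup = baseline / seconds
        table[cpus] = (speedup, speedup / cpus)
    return table

# Made-up runtimes in seconds, for illustration only.
runtimes = {1: 120.0, 2: 60.0, 4: 40.0, 8: 30.0}
table = scaling_table(runtimes)

for cpus, (speedup, efficiency) in table.items():
    print(f"{cpus:>2} CPUs: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

Falling efficiency at higher CPU counts, as in these invented numbers, is the kind of pattern such a benchmark is meant to expose.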

PIPE-399

1000-fold speedup in Dynamic Bayesian network model

A Bayesian network is a series of linear models fit to describe the relationships between different variables in a time series. If there are change points in how these variables are related, then the network is dynamic.

SIH helped the researcher by making the R package used to fit the dynamic Bayesian network model 1000 times faster. The R package is now available at https://github.com/FrankD/EDISON/tree/MultipleTimeSeries

PIPE-362

Repackaging software for modelling topic structure in language
  • Eduardo Altmann, Mathematics and Statistics
  • Faculty of Science
  • Data Science (Vijay Raghunath and Dr Joel Nothman)
  • 2019
  • Software

Altmann and Martin Gerlach wanted other researchers to try out their new language modelling technique, so they made it open source. SIH made their work more accessible by following software best practices: restructuring the code so that it conformed to the scikit-learn estimator API; adding automated software testing and continuous integration; extending and publishing documentation; and releasing version 0.1 of the software to the Python Package Index. See http://topsbm.readthedocs.io

PIPE-223

Modelling the Earth's Subsurface with Uncertainty Using Bayesian Inference and Evaluating MCMC Sampler Performance for Geoscientific Applications
  • Richard Scalzo, Dietmar Müller, Sally Cripps, Rohitash Chandra, Gregory Houseman, Hugo Olierook
  • Faculty of Science
  • Data Science (David Kohn)
  • https://github.com/rscalzo/obsidian
  • 2018
  • Software Report Paper
  • Predictive modelling Inferential modelling Description and basic visualization Creative visualization

The “Formation Boundaries” project looks at both the application of a Bayesian inference engine to the Gascoyne region in Western Australia and the exploration of the modelling assumptions of the inference engine.

Bayesian inference is an important tool for understanding uncertainty regarding formation location and rock properties when undertaking mineral exploration.

The project was part of a demonstration of collaboration for a Geosciences Centre of Excellence bid.

SIH has helped apply this Bayesian modelling approach to a novel region in an applications paper, demonstrating the wider applicability of the Bayesian inference approach in geophysical problems beyond the original Moomba region that the inference engine was developed for.

SIH has also helped the geoscience community to understand the inference engine by exploring how the different modelling assumptions of Bayesian inference affect the model output. In addition, SIH has helped to develop novel extensions to the inference engine, including a more efficient MCMC sampling scheme.

PIPE-43

Where can deep-sea iron nodules be found?
  • Dr. Adriana Dutkiewicz, School of Geosciences; Prof. Dietmar Müller, School of Geosciences
  • Faculty of Science
  • Data Science (Dr. Alexander Judge)
  • 2018
  • Software Transformed data Report
  • Predictive modelling Inferential modelling Description and basic visualization

Potato-sized nodules of iron ore found on the ocean floor are of commercial mining interest. However, the negative ecological effects of mining these nodules are of concern. SIH constructed a global predictive model of nodule occurrence by combining data from thousands of ocean floor samples with global maps of oceanic variables. The environments in which these deposits do and do not occur could then be characterised to generate insight into potential consequences of proposed mining.

PIPE-71

Predicting unnecessary CT scans
  • Professor Jonathan Morris, Kolling Institute of Medical Research and Sydney Medical School; Dr Felicity Gallimore
  • The University of Sydney Medical School
  • Data Science (Dr Aldo Saavedra, Dr Madhura Killedar, Dr Joel Nothman and Mr Peter Thiem)
  • 2018
  • Report
  • Predictive modelling Inferential modelling Description and basic visualization Language as data

Diagnostic imaging in hospitals is costly due to expensive machines and their operators, as well as the cost of moving patients in and out of radiography. Published studies of emergency presentations have shown that the number of brain computed tomography (CT-Brain) scans performed is increasing with time, while the proportion of scans giving no cause for concern remains the same and represents the largest category.

We sought to determine whether a substantial portion of CT scans performed in North Sydney LHD were unnecessary. We translated this research question into something determinable from data: identify CT-Brain cases where the unconcerning outcome of scans could be predicted from clinical knowledge available prior to the scan. By first constructing a text classifier to label CT scan reports as unconcerning, we were able to use clustering and predictive modelling to weakly identify some patient features that predicted unconcerning CT results.

While the project had the potential to impact clinical policy surrounding the application of CT scans in Emergency Departments, the weak results suggest that if any excessive expenditure problem exists, it is not simple to resolve. At the same time, we have developed methodologies for performing similar studies towards rationalising diagnostic scan expenditure.

PIPE-17

Optimal Image Reconstruction for the SAMI Galaxy Survey
  • Prof. Scott Croom, School of Physics, and the SAMI team; Dr. Richard Scalzo, Centre for Translational Data Science
  • Faculty of Science
  • Data Science (Sebastian Haan)
  • 2018
  • Software Transformed data
  • Predictive modelling

The SAMI Galaxy Survey is a large-scale observational program to target several thousand galaxies with the University of Sydney-built Sydney-AAO Multi-object Integral field spectrograph (SAMI). A key data challenge is to optimally reconstruct a data cube from ~500 spectra taken at different spatial locations across a galaxy. The previous method resulted in undesirable artefacts due to under-sampling and to the astronomical sources changing spatial location within the data because of differential atmospheric refraction. We have developed a novel method using probabilistic image fusion that delivers an optimal combination of the spectral fibre bundle data into a cube with uniform image quality while maintaining spectral details. This innovative technology has further demonstrated capabilities to achieve super-resolution and is implemented as a flexible software framework that can eventually be used by a wide range of telescopes worldwide.

PIPE-208

Smart Exploration for Mineral Resources

⚠️ Reason in draft: Still in publication process ⚠️

  • Prof. Fabio Ramos (CTDS, University of Sydney); Prof. Dietmar Müller (University of Sydney)
  • Faculty of Engineering and Information Technologies
  • Data Science (Sebastian Haan)
  • 2018
  • Software Paper
  • Predictive modelling Creative visualization

Exploring for new mineral resources is time-consuming and expensive, so we seek to combine a variety of aerial sensor types to optimise recommendations of where to place new drill-core measurements. A probabilistic framework was built to optimise measurement collection given an expensive cost function. Our new method is implemented as a software package and jointly solves multi-linear forward models of 2D-sensor data to 3D-geophysical properties using sparse Gaussian Process kernels. Because the cross-variances between geophysical properties are modelled simultaneously, the reconstructed 3D properties are more accurate than when each is solved for individually. We tested multiple optimisation strategies on a set of synthetic and real geophysical data. This project shows the advantages of this novel method for one use case study, and the same method can be applied to a large range of sensor fusion problems.

PIPE-197

Predicting Crime using a Spatial-Demographic Framework
  • Dr. Roman Marchant, Centre for Translational Data Science
  • Faculty of Engineering and Information Technologies
  • Data Science (Dr. Sebastian Haan)
  • 2018
  • Verbal advice Software
  • Data collection Predictive modelling Inferential modelling Data linkage Description and basic visualization Time series

Responding to domestic violence-related assaults dominates much of the NSW Police’s resources. We try to understand the relationships that drive social-demographic change and cause the occurrence of crime using a complex modelling framework. The social-demographic-crime network and its inter-dependencies were modelled using a Bayesian vector autoregression model. We built a collaboration with BOCSAR to use its crime database of all offences in NSW over the last 20 years, and sourced demographic data for multiple census years. The results of this study will help inform policy decision-making by government and police.

PIPE-94

Understanding Transgenerational Welfare Dependence

  • Professor Deborah Cobb-Clark, School of Economics; Dr Sarah Dahmann
  • Faculty of Arts and Social Sciences
  • Data Science (Mr Peter Thiem)
  • 2018
  • Software Transformed data
  • Data-store development

The Transgenerational Dataset 2 Extended (TDS2-e) dataset is an important investment by the Commonwealth in understanding the factors contributing to life outcomes, including the reliance of people on income support. The data contains welfare payments to recipients born between 1987 and 1988, and to their families, parents, children and siblings. The raw data was difficult to work with because it was subject to extensive security requirements, was large in volume, and had an inconvenient shape. SIH engineered software to convert the data into forms that made it accessible to the end user while complying with security and licence requirements. This rich dataset is now available for researchers to explore, and will contribute to understanding and improving the Commonwealth income support systems and life outcomes for all Australians.

PIPE-13

Automating information curation in the OMIA knowledge base
  • Prof. Frank Nicholas, Faculty of Science at The University of Sydney
  • Faculty of Science
  • Data Science (Joshua Stretton, Di Lu)
  • 2018
  • Software
  • Data collection Data-store development Predictive modelling Language as data

Online Mendelian Inheritance in Animals (OMIA) is an online knowledge base of inherited disorders in animals. It offers a wide range of search and curation functions on the animal genetics database created and maintained by Prof. Frank Nicholas. Frank maintained an annotated bibliography in OMIA by manually searching for the latest articles (~150 per day), but this approach was not sustainable. SIH automated this process to emulate Frank’s existing work. A text-mining pipeline now automatically downloads and shortlists recent publications predicted to have high relevance for OMIA. We developed an interface in which Frank can annotate these publications or exclude them from the knowledge base. This project enables OMIA to continue contributing to the genetic science community as a user-friendly online platform.

PIPE-231

A lecture on text analysis for social science
  • Shaun Ratcliff, United States Studies Centre
  • Faculty of Arts and Social Sciences
  • Data Science (Joel Nothman)
  • 2018
  • Language as data

A new Unit of Study was being developed to teach Data Analysis in the Social Sciences to Masters students. The Unit Coordinator sought the assistance of SIH to develop a lecture and a coding tutorial on language processing and text analysis applied to the social sciences. We developed and delivered an hour-long lecture in consultation with the coordinator.

PIPE-180

Clustering Light Sources: Scaling Up to the Whole Sky

  • Associate Professor Tara Murphy, School of Physics
  • Faculty of Science
  • Data Science (Joel Nothman)
  • 2018
  • Software
  • Data-store development Time series

The Murchison Widefield Array is a state-of-the-art telescope in Western Australia. Over the last four years, researchers have collected an exceptionally large time-series dataset on 300,000 bright objects in the sky, such as supernovae. Analysing the brightness of light sources over time requires matching each source across images from different times and locations in the sky. The astrophysicists had built processing software, a database and a web app to analyse similar datasets, but had never tried to scale them to a dataset of this size. An SIH engineer was able to debug and optimise the software involved, so that the data loading process took 8 hours instead of around 15 hours, and web app load times were reduced from multiple minutes to a few seconds. This enabled further research and analysis of this unique and enormous dataset.

PIPE-226

Detecting animal interactions towards analysing collective behaviour

  • Faculty of Science
  • Data Science
  • 2018

The individual behaviour of animals can be tracked through tags reporting their locations over time. To help identify collective behaviours, SIH developed software that transforms this individual movement data into a dataset of animal interactions, and that can be applied at scale to large numbers of animals and long time periods.
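
As a rough illustration of this kind of transformation (not SIH's actual software), the sketch below turns per-animal location fixes into interaction records by pairing animals observed within a distance threshold at the same time step. The record layout, the function name `find_interactions`, and the threshold of 5 distance units are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations
from math import hypot

def find_interactions(fixes, max_distance=5.0):
    """Return (time, animal_a, animal_b, distance) for each close pair.

    `fixes` is an iterable of (time, animal_id, x, y) tuples; this layout
    and the distance threshold are hypothetical.
    """
    by_time = defaultdict(list)
    for time, animal, x, y in fixes:
        by_time[time].append((animal, x, y))

    interactions = []
    for time in sorted(by_time):
        # Compare every pair of animals observed at the same time step.
        for (a, ax, ay), (b, bx, by) in combinations(sorted(by_time[time]), 2):
            distance = hypot(ax - bx, ay - by)
            if distance <= max_distance:
                interactions.append((time, a, b, distance))
    return interactions

# Toy data: three tagged animals over two time steps.
fixes = [
    (0, "A", 0.0, 0.0), (0, "B", 3.0, 4.0), (0, "C", 50.0, 50.0),
    (1, "A", 0.0, 0.0), (1, "B", 30.0, 40.0), (1, "C", 1.0, 0.0),
]
interactions = find_interactions(fixes)
```

Comparing every pair is quadratic in the number of animals per time step, so applying the idea to large herds and long periods would probably call for a spatial index or grid bucketing instead.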

PIPE-214

Transforming NAPLAN data for stochastic frontier modelling

⚠️ Reason in draft: I need to get feedback from client; request approval from client. ⚠️

  • Dr Diane Dancer, Sydney Business School
  • The University of Sydney Business School
  • Data Science (Dr Maryam Montazerolghaem)
  • 2018
  • Transformed data
  • Data linkage

This study uses 760,000 observations from 9,600 schools in Australia’s NAPLAN testing. It aims to improve the quality and impact of research output by addressing the major education problems, identified from data and modelling, that challenge Australian schools. The size of the dataset and the complicated data structure required for the modelling slowed the progress of the research. SIH provided technical support by transforming the data to suit each model.

PIPE-27

Which treatment might patients with relapsed ovarian cancer respond to?
  • Cristina Mapagu, Westmead Clinical School
  • The University of Sydney Medical School
  • Data Science (Dr Maryam Montazerolghaem)
  • 2018
  • Transformed data

Molecular markers measured within the primary tumour are used to determine whether patients who have relapsed ovarian cancer will respond to a particular treatment. SIH helped to identify subsets of genes that are overexpressed or underexpressed in response to treatments, by developing statistical methods including dimensionality reduction and hypothesis testing.

PIPE-83

Objective Detainee Classification System

⚠️ Reason in draft: Needs images. Approval from clients not yet requested. ⚠️

  • Dr Roman Marchant Matus, Centre for Translational Data Science; Dr Garner Clancey, Sydney Law School
  • The University of Sydney Law School
  • Data Science (David Kohn)
  • 2018
  • Verbal advice Software Transformed data Report
  • Predictive modelling Inferential modelling Description and basic visualization Creative visualization Time series

The “Objective Detainee Classification System” project contributed to a report by the Sydney Law School to Juvenile Justice NSW regarding an assessment of their juvenile risk classification systems.

The Juvenile Justice risk classification system is important because a detainee’s risk classification helps determine what activities the detainee can partake in while in detention, as well as the level of security needed for that detainee.

The way a detainee is treated can have drastic impacts on their recidivism and rehabilitation prospects.

SIH helped to deliver basic analysis of the demographics over the past 20 years of Juvenile Justice detainees as well as visualizations and statistics of the outcomes of the Juvenile Justice detainee risk classification system.

SIH also helped to develop an interactive app to examine the sensitivity of the current risk classification system to changes in the risk classification scoring system.

The report was delivered to Juvenile Justice in early 2018 and was part of a bid by project members to fully examine Juvenile Justice’s classification scope beyond the more limited examination in the delivered report.

PIPE-66

Breast Cancer Dashboard
  • Professor Tim Shaw, Director Research in Implementation Science and eHealth, Charles Perkins Centre; Anna Janssen; Candice Kielly-Carroll
  • Faculty of Health Sciences
  • Data Science (Dr Aldo Saavedra, Joshua Stretton and Peter Thiem)
  • 2017
  • Software

Medical data is under-used for its potential to inform clinical practice. SIH developed a visual dashboard to display information about lymphoedema in breast cancer patients. A prototype web application with an easy-to-use interactive dashboard was developed to help understand a patient’s journey and assess the results of different cohorts of patients. User and expert workshops helped optimise the design.

PIPE-99

Identifying Nerve Function Profiles in Motor Neurodegenerative Disorders

  • Dr Susanna Parks; Tiffany Li
  • The University of Sydney Medical School
  • Data Science (Alex Judge)
  • 2017
  • Software Report
  • Predictive modelling Inferential modelling

Nerve excitability measurements can identify patterns of nerve dysfunction associated with many diseases of the nervous system. The researchers manage a database containing around 20 years of peripheral nerve excitability studies. A software package, QTRAC, is used to generate ~35 properties that are analysed in a research context. Additional information is incorporated to help make a diagnosis, such as clinical survey data and the temperature of the nerve at the time of the test. Importantly, diagnosis of the disorder is not always 100% accurate. SIH used machine learning to predict the likelihood of motor neuron disease (MND) for a patient given nerve excitability measurements. The model had reasonable ability to rank individual cases in order of increasing MND risk. SIH delivered this model in a software package for future use in research as well as in a clinical setting, with the intention of improving the speed and accuracy of MND diagnosis to improve treatment outcomes for patients.

PIPE-91

Scopus Data Preparation

The University’s research output is evaluated, in part, on the basis of publication and citation networks derived from publication metadata archives like Elsevier’s Scopus. While the University has subscribed to Scopus snapshot data for a few years, it lacked an efficient way to load and query the data.

We analysed the snapshot, stored as a collection of XML files, and developed a relational database schema to represent useful portions of the data for efficient access. We developed a script to efficiently load the data from the XML snapshot into a relational database.

This resource now allows the Research Portfolio to calculate metrics over the publication record. The database is now more accessible to researchers, and the loading script is available to external Scopus Snapshot users.
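
The XML-to-relational loading step described above could be sketched roughly as follows, using only the Python standard library. The XML layout, the table names, and the function `load_records` are hypothetical; the real Scopus snapshot schema and SIH's loading script differ.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical record layout; the real Scopus snapshot schema differs.
SAMPLE_XML = """
<records>
  <record id="p1"><title>Paper one</title><year>2015</year>
    <cites ref="p2"/></record>
  <record id="p2"><title>Paper two</title><year>2012</year></record>
</records>
"""

def load_records(xml_text, conn):
    """Parse publication records and load them into two relational tables."""
    conn.execute(
        "CREATE TABLE publication (id TEXT PRIMARY KEY, title TEXT, year INTEGER)"
    )
    conn.execute("CREATE TABLE citation (citing TEXT, cited TEXT)")
    root = ET.fromstring(xml_text)
    for record in root.iter("record"):
        pub_id = record.get("id")
        conn.execute(
            "INSERT INTO publication VALUES (?, ?, ?)",
            (pub_id, record.findtext("title"), int(record.findtext("year"))),
        )
        # One citation row per <cites> child of this record.
        conn.executemany(
            "INSERT INTO citation VALUES (?, ?)",
            [(pub_id, cite.get("ref")) for cite in record.findall("cites")],
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
load_records(SAMPLE_XML, conn)

# Example metric: how many times each publication is cited.
citation_counts = conn.execute(
    "SELECT cited, COUNT(*) FROM citation GROUP BY cited"
).fetchall()
```

Once the data sits in relational tables, metrics such as citation counts become single SQL queries rather than passes over the XML files.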

PIPE-76

Applying Machine Learning to Criminology
  • Dr. Roman Marchant
  • Faculty of Engineering and Information Technologies
  • Data Science (Dr. Sebastian Haan)
  • https://github.com/sebhaan/GPplus
  • 2017
  • Software Transformed data Paper

The incidence of crime and the impacts of societal and individual characteristics on criminal behaviour can be explored using modern machine learning methods, answering important questions about crime, such as:
  • What is the probability of a crime occurring at a location?
  • What are the characteristics of the population that affect the incidence of crime?

Our work implements novel Bayesian machine learning techniques to model the dependency between offence data, demographic characteristics and spatial location. This provides a fully probabilistic approach to modelling crime, which reflects all uncertainties in the prediction of offences as well as the uncertainties surrounding model parameters. By using Bayesian updating, these predictions and inferences are dynamic in the sense that they change as new information becomes available. Our model has been applied to offence data, such as domestic violence-related assaults, burglary and motor vehicle theft, in New South Wales (NSW), Australia. The results highlight the strength of the technique by validating the factors that are associated with high and low criminal activity.

PIPE-120

Pancreatic cancer survival analysis using classification
  • Associate Professor Fabio Ramos, School of Information Technologies and Centre for Translational Data Science
  • The University of Sydney Medical School
  • Data Science (Dr Maryam Montazerolghaem)
  • 2017
  • Predictive modelling

This study attempts to develop a method that uses classification techniques to predict the survival time of patients who were diagnosed with pancreatic cancer and had surgery as treatment.

PIPE-116

Perceptions of Police Worn Body Cameras by Detainees
  • Dr Roman Marchant Matus, Centre for Translational Data Science; Professor Murray Lee, Sydney Law School
  • The University of Sydney Law School
  • Data Science (David Kohn)
  • 2017
  • Transformed data Report
  • Predictive modelling Inferential modelling Description and basic visualization

Police worldwide have adopted the use of body worn cameras but have not reliably examined the attitudes of police detainees towards these cameras. Australian detainee perceptions of police body worn cameras were explored using data from the Drug Use Monitoring in Australia (DUMA) survey. The analysis helped show limited difference in detainee attitudes based on the information collected in the surveys, supporting the need for further study of the issue. The results of the work were presented at the Australian and New Zealand Society of Criminology 2017 conference.

PIPE-88

Research Environment for Ancient Documents (READ) efficiency

  • Ian McCrab; Dr Mark Allon, School of Languages and Cultures
  • Faculty of Arts and Social Sciences
  • Data Science (Joel Nothman)
  • 2017
  • Verbal advice

Research Environment for Ancient Documents (READ) is an integrated Open Source web platform for epigraphical and manuscript research. It allows digital images of texts to be annotated, and for multiple annotations to be maintained for critical analysis.

The READ research and development team consulted SIH because their software performed poorly on long texts. SIH Research Engineers helped them identify the parts of the system that were unreasonably slow. The READ team's software engineers were then able to resolve these bottlenecks, enabling the READ system to be more widely adopted in archaeology, history and manuscript studies.

PIPE-25

Predictive Project Profile

  • Professor Lynn Crawford, School of Civil Engineering; Dr Terry Cooke-Davies; Dr Mike Steele
  • Faculty of Engineering and Information Technologies
  • Data Science (Mr Peter Thiem)
  • 2017
  • Software
  • Predictive modelling

Judging the likelihood of project success is a difficult and important aspect of project management. Projects can fail in many ways, such as overruns in cost or duration, failure to deliver benefit, and failure to satisfy the project goals.

This project used survey data collected over the lifecycle of 1000 projects to build a machine learning model and a prototype web application that would indicate a project's likelihood of success.

The tool would be used to demonstrate and discuss the idea with expert groups of project managers, so that they can develop better, data-driven approaches to successful project management.

PIPE-4

Transforming IMPALA: International Migration Law and Policy Assessment Database

  • Professor Mary Crock, Sydney Law School
  • The University of Sydney Law School
  • Data Science (Joel Nothman)
  • 2017
  • Software
  • Description and basic visualization

The IMPALA database (http://www.impaladatabase.org/) contains migration and citizenship law and policy across countries and through time in a form that allows legislation, policy and some statistical data to be easily compared and measured. With one record for each visa type in each year, thousands of records have been entered, mostly manually, into a Qualtrics survey. SIH transformed the unwieldy, manually entered database to improve data exploration. We wrote software to ingest the Qualtrics responses (https://github.com/Sydney-Informatics-Hub/qualtrics-pandas) and to generate a more usable output with consistent metadata.
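
As a rough illustration of the ingestion step, the sketch below separates the header rows of a survey CSV export from the response rows, so that per-column metadata stays attached to a consistent column ID. It is not the qualtrics-pandas code: the function name, the assumption of three header rows, and the sample data are all hypothetical and chosen only for illustration.

```python
import csv
import io


def read_qualtrics_style_csv(text, n_header_rows=3):
    """Split a survey CSV export into column metadata and response records.

    Assumption (hypothetical): the first row holds column IDs and the
    remaining header rows hold descriptive text for each column.
    """
    rows = list(csv.reader(io.StringIO(text)))
    if len(rows) < n_header_rows:
        raise ValueError("export has fewer rows than the expected header")

    column_ids = rows[0]
    # Keep the extra header rows as metadata keyed by column ID.
    metadata = {
        col: [header_row[i] for header_row in rows[1:n_header_rows]]
        for i, col in enumerate(column_ids)
    }
    # Every row after the header becomes one record (a dict per response).
    records = [dict(zip(column_ids, row)) for row in rows[n_header_rows:]]
    return metadata, records


# Invented sample data: one record per visa type per year.
SAMPLE = (
    "Q1,Q2,Q3\n"
    "Country,Year,Visa type\n"
    "id-Q1,id-Q2,id-Q3\n"
    "AU,2008,Student\n"
    "AU,2009,Student\n"
)

metadata, records = read_qualtrics_style_csv(SAMPLE)
for record in records:
    print(record)
```

Keeping the descriptive header rows in a separate mapping, rather than discarding them, is one simple way to give every output column consistent metadata.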

PIPE-2

Disease spectrum and management of children admitted with acute respiratory infection in Viet Nam

  • Nguyen Thi Kim Phuong, Respiratory Department, Da Nang Hospital for Women and Children; Professor Ben Marais, The Children’s Hospital at Westmead Clinical School and Deputy Director, Marie Bashir Institute for Infectious Diseases and Biosecurity
  • Faculty of Health Sciences
  • Data Science (Dr Maryam Montazerolghaem)
  • 2016
  • Paper
  • Description and basic visualization

This study aims to assess the acute respiratory infection (ARI) disease spectrum, duration of hospitalisation and outcome in children hospitalised with an ARI in Viet Nam. The results indicate that acute respiratory infection is a major cause of paediatric hospitalisation in Viet Nam, characterised by prolonged hospitalisation for relatively mild disease. There is huge potential to reduce unnecessary hospital admissions and costs.