A Deep Learning-Based Pipeline for the Generation of Synthetic Tabular Data

Panfilo, Daniele; Boudewijn, Alexander; Saccani, Sebastiano; ...; Medvet, Eric
2023
  • journal article

Journal
IEEE ACCESS
Abstract
Recent, rapid progress in Machine Learning (ML) tools and methodologies has paved the way for an accessible market of ML services. In principle, small and medium-sized enterprises, as well as large companies, could act as both providers and consumers of services, resulting in an intense exchange in which a consumer may ask many providers for a service preview based on its particular business case, that is, its data. In practice, however, many potential service consumers are reluctant to release their data when seeking ML services because of privacy or intellectual property concerns. As a consequence, the market of ML services is not as fluid as it could be. An alternative to providing real data when looking for an ML service is to generate and release synthetic data. The synthetic data should 1) allow the service provider to preview an ML service whose performance is predictive of the performance the same service will achieve on the real data; and 2) prevent the disclosure of the real data. In this paper, we propose a data synthesis technique tailored to a family of highly relevant business cases: supervised and unsupervised learning on single-table datasets and relational datasets. Our technique is based on generative deep learning models, and we instantiate it in three variants: standard Variational Autoencoders (VAEs), β-VAEs, and Introspective VAEs. We experimentally evaluate the three variants to measure the degree to which they meet the two requirements above, using several performance indexes that capture different aspects of the quality of the generated data. The results suggest that data synthesis is a practical answer to the need to decouple ML service providers and consumers and, hence, can favor the emergence of an active and accessible market of ML services.
DOI
10.1109/ACCESS.2023.3288336
WOS
WOS:001021934000001
Archive
https://hdl.handle.net/11368/3050258
info:eu-repo/semantics/altIdentifier/scopus/2-s2.0-85162882364
https://ieeexplore.ieee.org/document/10158698
Rights
open access
license:creative commons
license uri:http://creativecommons.org/licenses/by-nc-nd/4.0/
FVG url
https://arts.units.it/bitstream/11368/3050258/1/A_Deep_Learning-Based_Pipeline_for_the_Generation_of_Synthetic_Tabular_Data.pdf
Subjects
  • Synthetic data

  • variational auto enco...

  • data privacy

  • tabular data
