Machine learning methods for generating high dimensional discrete datasets

Manco G. • Ritacco E. • Rullo A. • Serra E.
2022
  • journal article

Journal
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Abstract
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. One possible solution is to synthesize datasets that reflect the patterns of real ones through a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z; then, these patterns are used to reconstruct a new dataset X′ that preserves the main characteristics of X. This survey explores two approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former relies on inverse frequent itemset mining (IFM) techniques and consists of generating a dataset that satisfies given support constraints on the itemsets of an input set, typically the frequent ones. The latter draws on recent developments in probabilistic generative modeling (PGM), which model generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for discrete data and discussing their pros and cons.

This article is categorized under:
  • Fundamental Concepts of Data and Knowledge > Big Data Mining
  • Technologies > Machine Learning
  • Algorithmic Development > Structure Discovery
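The constraint-based approach can be illustrated with a deliberately small sketch. Given hypothetical absolute support counts for a set of itemsets Z and a target number of transactions, it assigns to each itemset the number of transactions containing exactly that itemset, processing larger itemsets first and subtracting the transactions already assigned to their strict supersets (a Möbius inversion over the itemset lattice). This construction is chosen for brevity and is not the method of any specific IFM work covered by the survey: it simply returns `None` whenever a count would become negative, whereas IFM is computationally hard in general and practical approaches often formulate it as an integer linear program. The names `inverse_mine` and `support` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional

Itemset = FrozenSet[str]


def inverse_mine(supports: Dict[Itemset, int], n_transactions: int) -> Optional[List[Itemset]]:
    """Toy IFM sketch: build transactions whose supports match `supports` exactly.

    `supports` maps itemsets to absolute support counts (hypothetical input Z).
    Returns None when this simple inversion would need a negative count.
    """
    family = dict(supports)
    family[frozenset()] = n_transactions  # every transaction contains the empty itemset
    # Larger itemsets first, so all strict supersets are assigned before their subsets.
    ordered = sorted(family, key=len, reverse=True)
    exact: Dict[Itemset, int] = {}
    for itemset in ordered:
        # Transactions assigned to strict supersets already contribute to this support.
        covered = sum(c for s, c in exact.items() if itemset < s)
        count = family[itemset] - covered
        if count < 0:
            return None  # constraints not satisfiable by this construction
        exact[itemset] = count
    dataset: List[Itemset] = []
    for itemset, count in exact.items():
        dataset.extend([itemset] * count)  # `count` transactions containing exactly `itemset`
    return dataset


def support(dataset: List[Itemset], itemset: Itemset) -> int:
    """Number of transactions in `dataset` that contain `itemset`."""
    return sum(1 for t in dataset if itemset <= t)


if __name__ == "__main__":
    # Hypothetical support counts for the frequent itemsets of some dataset X.
    z = {
        frozenset("a"): 6,
        frozenset("b"): 5,
        frozenset("c"): 3,
        frozenset("ab"): 4,
        frozenset("ac"): 2,
    }
    x_prime = inverse_mine(z, n_transactions=10)
    assert x_prime is not None
    print(len(x_prime))  # → 10
    for itemset in z:
        print(sorted(itemset), support(x_prime, itemset), z[itemset])
```

With these counts the sketch yields 4 × {a, b}, 2 × {a, c}, 1 × {b}, 1 × {c} and 2 empty transactions, so every listed itemset reaches its target support in X′. Because the input family is downward closed, itemsets outside it, such as {b, c}, end up with support zero.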
DOI
10.1002/widm.1450
WOS
WOS:000744989000001
Archive
https://hdl.handle.net/11390/1248982
Scopus EID: 2-s2.0-85122851101
https://ricerca.unityfvg.it/handle/11390/1248982
Rights
open access
Subjects
  • constraints-based mod...

  • data generation

  • generative adversaria...

  • generative model

  • inverse frequent item...

  • synthetic dataset

  • variational autoencod...
