Universidad Panamericana

A Black-Box Analysis of the Capacity of ChatGPT to Generate Datasets of Human-like Comments

Journal
Computers
ISSN
2073-431X
Publisher
MDPI AG
Date Issued
2025-04-27
Author(s)
Rosete, Alejandro
Sosa-Gómez, Guillermo
Facultad de Ciencias Económicas y Empresariales - CampGDL
Rojas, Omar
Facultad de Ciencias Económicas y Empresariales - CampGDL
Type
Journal article
DOI
10.3390/computers14050162
URL
https://scripta.up.edu.mx/handle/20.500.12552/12173
Abstract
This paper examines the ability of ChatGPT to generate synthetic comment datasets that mimic those produced by humans. To this end, a collection of human-comment datasets, freely available in the Kaggle repository, was compared with comments generated by ChatGPT from prompts designed to provide the context needed to approximate the human results. It was hypothesized that the ChatGPT responses would show a high degree of similarity to the human-generated datasets with regard to vocabulary usage. Two categories of prompts were analyzed, depending on whether they specified the desired length of the generated comments. The evaluation focused primarily on the vocabulary used in each comment dataset, employing several analytical measures. The analysis yielded observations that reflect the current capabilities of ChatGPT in this task. ChatGPT typically uses fewer words than human respondents and tends to give repetitive answers, and its responses vary considerably when the length is specified. ChatGPT also employs a smaller vocabulary, which does not always align with human language, and the proportion of non-stop words in its output is higher than that found in human communication. Finally, the vocabulary of ChatGPT is more similar to the human vocabulary than the vocabularies of the two ChatGPT configurations are to each other; this alignment is particularly evident in the use of stop words. Although ChatGPT does not fully achieve the intended purpose, the generated vocabulary is a reasonable approximation that enables specific applications such as the creation of word clouds.
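
The abstract does not name the specific measures used, so the following is only an illustrative sketch of the kind of vocabulary comparison it describes: vocabulary size, proportion of non-stop words, and overlap between two vocabularies. The stop-word list, the tokenizer, the Jaccard overlap measure, the function names, and the sample comments are all assumptions made for this example and are not taken from the paper.

```python
import re

# Tiny illustrative stop-word list (an assumption; not the list used in the paper).
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "of", "to", "in", "this", "was", "i"}


def tokenize(comment):
    """Lowercase a comment and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", comment.lower())


def vocabulary_stats(comments):
    """Return token count, vocabulary, vocabulary size, and non-stop-word ratio."""
    tokens = [tok for comment in comments for tok in tokenize(comment)]
    vocab = set(tokens)
    non_stop = [tok for tok in tokens if tok not in STOP_WORDS]
    return {
        "tokens": len(tokens),
        "vocab": vocab,
        "vocab_size": len(vocab),
        "non_stop_ratio": len(non_stop) / len(tokens) if tokens else 0.0,
    }


def jaccard(vocab_a, vocab_b):
    """Jaccard overlap of two vocabularies (an assumed similarity measure)."""
    union = vocab_a | vocab_b
    return len(vocab_a & vocab_b) / len(union) if union else 0.0


# Made-up sample comments, for illustration only.
human = ["The movie was great and I loved it", "Not my favorite, the plot is weak"]
generated = ["Great movie", "Great plot"]

human_stats = vocabulary_stats(human)
generated_stats = vocabulary_stats(generated)

print("human vocabulary size:", human_stats["vocab_size"])
print("generated vocabulary size:", generated_stats["vocab_size"])
print("human non-stop ratio:", round(human_stats["non_stop_ratio"], 3))
print("generated non-stop ratio:", round(generated_stats["non_stop_ratio"], 3))
print("vocabulary overlap:", round(jaccard(human_stats["vocab"], generated_stats["vocab"]), 3))
```

With real data, `human` and `generated` would be replaced by the comment columns of the datasets being compared, and a fuller stop-word list would be needed for the ratios to be meaningful.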
