Browsing by Subject "Aprendizado do computador"
Item Uma abordagem baseada em aprendizado de máquina para dimensionamento de requisitos de software (2016-12-13) Fernandes Neto, Eça da Rocha; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964; http://lattes.cnpq.br/6325583065151828
This work proposes automatic sizing of software requirements using a machine learning approach. The dataset is real and was obtained from a company that uses a Scrum-based development process and Planning Poker estimation. The study applied data pre-processing, classification, and attribute selection, together with term frequency–inverse document frequency (tf-idf) and principal component analysis (PCA). Automatic classification was performed with Support Vector Machines (SVM) trained on the available data history. The final tests were run with and without attribute selection by PCA, and accuracy was higher when the best attributes were selected. The final tool can estimate the size of user stories with a generalization of up to 91%. The results were considered suitable for use in a production environment without disruption to the development team.

Item Uma abordagem para o suporte ao diagnóstico de melanoma por imagem via comitês de autoencoders (2021-12-17) Silva, Evele Kelle Lemos; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964
Skin cancer is the most common type of cancer in Brazil, representing about 33% of cases of malignant neoplasms in the country. Melanoma is a type of skin cancer that accounts for only 3% of cancer cases in the organ, but it is considered the most aggressive due to its high likelihood of metastasis, which is the spread of cancer to other organs. Although melanoma is considered the main fatal skin disease, the introduction of new drugs combined with detection of the tumor in its early stages has contributed to a positive prognosis.
Through the ABCDE rule of melanoma, it is possible to identify melanoma by observing certain characteristics of the lesion; however, identification by observation often fails, especially when performed by an inexperienced doctor. Therefore, this work aims to select and use machine learning techniques to propose a model that helps dermatologists identify skin lesions in dermoscopic images, serving as a second opinion on whether or not a lesion is melanoma. The proposed model was compared with techniques widely used in the literature for solving complex problems, with the objective of presenting superior performance. In terms of precision and recall, the proposed model proved comparable to the others, although it had access to only 0.1% of the dimensions of the image, which indicates that the model succeeded in finding the characteristics that discriminate skin lesions.

Item Análise da evasão no ensino superior: predição e prevenção por meio da mineração de dados educacionais (2024-03-05) Ferreira, Rodolfo André Barbosa; Mello, Rafael Ferreira Leite de; http://lattes.cnpq.br/6190254569597745; http://lattes.cnpq.br/2982020271806247
Dropout occurs through abandonment, transfer, or withdrawal from a course: when the student disengages from the institution where they are enrolled, or definitively abandons or does not complete higher education. This article seeks to identify automated methods and techniques that help managers prevent dropout through prediction. The study used Educational Data Mining (EDM), which applies techniques from databases, statistics, and machine learning to education.
The study used data from 5144 students, with characteristics related to course, semester, and demographics, drawn from the database of the Academic Information and Management System (SIGA) of the Federal Rural University of Pernambuco (UFRPE) for the courses of Animal Science, Fisheries Engineering, and Agronomy. The data, except for those containing personal, restricted, and sensitive information, were separated into Academic Characteristics per Semester, General Academic Characteristics, Course-related, Demographic, and Target Characteristics. The study employs the LSTM machine learning algorithm and the SGD and Adam optimizers, exploring different values for the learning rate, momentum, batch size, and number of epochs.

Item Análise da previsibilidade do preço spot do milho na determinação do preço futuro: um estudo utilizando Random Forest (2025-07-21) Lima, Luiz Felipe Dias de; Duarte, Gisleia Benini; http://lattes.cnpq.br/6349616407324519; http://lattes.cnpq.br/2985117696253378
This study investigated the relevance of the corn futures contract price as a predictor of the commodity's spot price for the periods 2018 to 2020 and 2022 to 2024, using daily data for this and all other variables. To that end, the Random Forest algorithm was adopted as the methodology, with the dollar exchange rate, the soybean futures price, and the corn spot price itself as explanatory variables. The main objective was to assess whether the current corn price is a good predictor of the behavior of the futures market.
Random Forest showed high performance in predicting the corn futures contract, indicating good generalization from the spot price, and also showed that the dollar exchange rate is an important variable in the behavior of the corn futures price.

Item Análise da utilização de aprendizado de máquina na redução do volume de alertas benignos (2019) Simião, Augusto Fernando de Melo; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964; http://lattes.cnpq.br/0529129636604731
To aid in combating cyber attacks, Managed Security Services Providers (MSSPs) use SIEMs (Security Information and Event Management). SIEMs are able to aggregate, process, and correlate vast amounts of events from different systems, alerting security analysts to the existence of threats, such as computer viruses and cyber attacks, in computer networks. However, SIEMs are known for their high rates of benign alerts (non-threatening alerts) relative to malicious alerts (threatening alerts). Due to the high volume and prevalence of benign alerts, analysts come to ignore alerts as a whole, including those that represent potential threats, thereby increasing the risk of a network compromise. This phenomenon is known as alert fatigue, and machine learning techniques have frequently been applied to it to reduce the volume of benign alerts. Modern SIEMs use machine learning in the correlation of events so that only alerts that actually represent possible threats are reported. However, this correlation does not consider the analyst’s deliberation, which allows SIEMs to continue generating alerts previously identified as benign. This paper investigates the use of the Naïve Bayesian Learning, Decision Tree, and Random Forest algorithms to reduce the volume of benign alerts using alerts previously identified by analysts, rather than the chain of events that generates such alerts.
The experiments showed that supervised machine learning techniques can be applied to identify alerts previously classified as benign.

Item Análise de sentimentos em publicações do Stackoverflow (2019-08-22) Santos, Luiz Felipe dos; Trindade, Cleyton Carvalho da; http://lattes.cnpq.br/6298429503812388
The use of social networks, forums, and other media has been growing exponentially, which is directly reflected in the amount of data generated on the Internet. A large portion of this data is open and can be accessed and processed. As a result, the possibilities created by open data have attracted many researchers and companies seeking to extract valuable information about their customers. Information extracted from this mass of data can change the strategy of many companies and people. The same pattern can be seen in computing forums, where many people interact and generate information about information technology and related topics. This research goes through the whole sentiment analysis cycle: data capture on the StackOverflow platform, data processing, natural language processing, algorithm training, and classification. The goal is to show the data processing and classification steps, compare the classification approaches, and extract information about the analyzed database. After applying the sentiment analysis cycle, it was possible to compare the results of each classifier and extract information about the analyzed database, about the performance of the unstructured classifiers, and about the difficulty of working with a Portuguese-language database.

Item Análise de sentimentos em reviews de jogos digitais da Plataforma Steam (2024-09-26) Albuquerque, Júlia de Melo; Albuquerque Júnior, Gabriel Alves de; http://lattes.cnpq.br/1399502815770584
Sentiment analysis is an area that investigates the emotional expressions of human language, aiming to understand the underlying needs and opinions expressed in texts.
Its complexity lies in the ability to discern not only the textual content but also the implicit emotional nuances. With technological advancements, publicly expressing opinions has become easy through various channels, and online gaming is a sector that attracts numerous player posts about the available titles. However, this diversity of audiences and topics makes it challenging to understand the sentiment expressed across this universe. The aim of this study is to apply sentiment analysis techniques to digital game reviews, focusing on supervised machine learning algorithms and pre-polarized libraries, in order to identify the classification approach best able to discern the sentiments users express in their reviews. The study considers one approach using all opinions and another focused on each game’s specific genre. The analysis was conducted on data from an online game distribution company (Steam), followed by data preparation to handle the peculiarities present in the records. The results reveal that machine learning models outperform traditional approaches, such as the VADER library, achieving approximately 10% higher precision. A gain of about 20% was observed in metrics such as recall and F1-score. This study represents an analytical contribution to the field of sentiment analysis, highlighting the model’s ability to deal with the complexity of human language.

Item Análise do comportamento através dos dados coletados na internet (2021-04-07) Lima, Priscilla Amarante de; Diniz, Juliana Regueira Basto; http://lattes.cnpq.br/0175193064988810; http://lattes.cnpq.br/7284770857817456
This work presents an analysis of human behavior through data collected on the internet. It addresses the Big Techs and the Cambridge Analytica case study.
We show that easily obtained digital records of behavior, namely Facebook likes, can be used to automatically and accurately predict a range of highly confidential personal attributes, including sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, separation from parents, age, and sex. The analysis is based on a dataset of more than 58,000 volunteers who provided Facebook likes, detailed demographic profiles, and the results of various psychometric tests. The proposed model uses dimensionality reduction to pre-process the likes data, which is then fed into linear regression to predict individual psychodemographic profiles from likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African-Americans and Caucasian Americans in 95% of cases, and Democrats and Republicans in 85% of cases. For the personality trait "Openness", prediction accuracy is close to the test-retest accuracy of a standard personality test. We give examples of associations between attributes and likes and discuss the implications for online personalization and privacy.

Item Análise e predição nas votações de leis federais na Câmara dos Deputados (2022-05-27) Brito, Ranniery Dias de; Brito, Kellyton dos Santos; http://lattes.cnpq.br/8750956715158540; http://lattes.cnpq.br/1061900830319137
This study aims to analyze machine learning and deep learning algorithms for the task of predicting the approval of bills. It follows a post-positivist approach, adopting the quali-quantitative paradigm as its methodology. Experiments were carried out using the data available on the Portal of the Chamber of Deputies, following the steps of bibliographic review, definition of the experimentation environment, descriptive analysis, and prediction.
The study also sought to carry out a descriptive analysis and to predict possible outcomes of the voting process for legislative proposals, focusing on bills that have been voted on.

Item Aprendizado profundo com capacidade computacional reduzida: uma aplicação à quebra de CAPTCHAs (2018-08-16) Melo, Diogo Felipe Félix de; Sampaio, Pablo Azevedo; http://lattes.cnpq.br/8865836949700771; http://lattes.cnpq.br/2213650736070295
During the last decade, Deep Neural Networks have been shown to be a powerful machine learning technique. Generally, to obtain relevant results, these techniques require high computational power and large volumes of data, which can be a limiting factor in some cases. Nevertheless, careful design of the training procedure and architecture may help to reduce these requirements. In this work we present a comparative approach to the application of deep neural networks to text-based CAPTCHAs as a way to cope with these limitations. We studied models that are capable of learning to segment and identify the text content of images based only on examples. By experimenting with different hyperparameters and architectures, we were able to obtain a final model with 96.06% token prediction accuracy in approximately 3 hours of training on a simple personal computer.

Item Aprendizagem de máquina para a identificação de clientes propensos à compra em Inbound marketing (2019-07-12) Silva, Bruno Roberto Florentino da; Monteiro, Cleviton Vinicius Fonsêca; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964; http://lattes.cnpq.br/9362573782715504
The most important point for a company should always be the customer, and getting new customers is not always easy. Digital marketing techniques study how to attract new customers to businesses using digital platforms. As these platforms became popular, strategies had to be adapted to the new possibilities.
With just one click you can reach thousands of individuals, which means many new leads for the company. However, filtering out which of these individuals are really interested in the product or service offered by the company demands a lot of effort from the sales team. This overhead is detrimental in the sense that the company can lose revenue by not targeting the real opportunities. To minimize this problem, the present work proposes the automatic identification of clients reached through digital marketing strategies. It proposes the use of machine learning techniques, in particular the supervised classification algorithms Decision Tree and Naive Bayes, using the Scikit-learn library available for the Python programming language. Because the dataset is imbalanced, it was necessary to apply the SMOTE oversampling algorithm. To optimize the classification, we also used attribute selection and model selection with hyperparameter tuning. Finally, to evaluate the results, we used the confusion matrix, the precision and recall (coverage) metrics, and the precision-recall curve. Due to the imbalance of the data, precision was low, averaging 5.5%, while recall was around 83%. Even with such divergent results among the applied metrics, the present work reached its goal, identifying most of the real opportunities and reporting that this approach could reduce the effort applied by the sales team by up to 85% compared with calling all the leads.
As a consequence, the company may reduce the cost of the resources applied to obtaining new customers, allowing the sales team to find new customers with greater efficiency.

Item Aprendizagem de máquina para classificação de tipos textuais: estudo de caso em textos escritos em português brasileiro (2025-07-30) Barbosa, Gabriel Augusto; Miranda, Péricles Barbosa Cunha de; http://lattes.cnpq.br/8649204954287770; http://lattes.cnpq.br/7161363389816372
Text classification by text type is of great importance for some Natural Language Processing (NLP) applications. In recent years, machine learning algorithms have achieved good results on this task for English texts. However, research on detecting text types in Portuguese is still scarce, and much remains to be studied and discovered in this context. This article therefore proposes an experimental study investigating the use of machine learning algorithms to classify Portuguese texts by text type. To this end, we propose a new corpus composed of Portuguese texts of two text types: narrative and argumentative. Three machine learning algorithms were evaluated on the created corpus in terms of precision, recall, and F1 score. In addition, an analysis of the features involved was carried out to identify which textual characteristics matter most for this task. The results showed that high levels of precision and recall can be achieved in classifying narrative and argumentative texts.
The algorithms achieved similar metric levels, demonstrating the quality of the extracted features.

Item Aprendizagem de máquina quântica e comitê quântico de classificadores (2019-12-02) Araujo, Ismael Cesar da Silva; Nascimento, André Câmara Alves do; Silva, Adenilton José da; http://lattes.cnpq.br/0314035098884256; http://lattes.cnpq.br/0622594061462533; http://lattes.cnpq.br/7125338940009959
Quantum machine learning is a subarea of quantum computing that studies, among other things, the creation of quantum equivalents of classical classifiers. An ensemble of classifiers is a classification model whose output is a combination of the outputs of the classifiers it contains. The premise is that, with a sufficiently large ensemble of average classifiers, good performance can still be obtained. This work investigates the differences in the performance of a quantum equivalent of an ensemble of classifiers, using trained and untrained classifiers. The system was simulated, and performance was measured by calculating the amplitude probabilities of the system. The machine learning models of the ensemble were run on benchmark datasets made available by the scikit-learn library.

Item Aspect term extraction in aspect-based sentiment analysis (2019) Francisco, Alesson Delmiro; Lima, Rinaldo José de; http://lattes.cnpq.br/7645118086647340
The increasing use of the Internet in many directions has created a need to analyze a large quantity of data. A large amount of data is presented as natural language text, which is unstructured and offers many ways to express the same information. Extracting information and meaning from such unstructured content, for example opinions on products or services, is an important task.
The need to extract and analyze the large amount of data created every day on the Internet has surpassed human capabilities. As a result, many text mining applications that extract and analyze textual data produced by humans are available today. One such application is Sentiment Analysis, viewed as a vital task in both the academic and commercial fields: companies and service providers can use the knowledge extracted from textual documents to better understand what their customers think about them, or how their products and services are received. However, analysing unstructured text is difficult, which is why it is necessary to provide coherent information and concise summaries of those reviews. Sentiment Analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine the writer’s attitude towards a particular topic or product. Aspect-Based Sentiment Analysis is a sub-field of Sentiment Analysis that aims to extract more refined and exact opinions by breaking down text into aspects. Most of the current work in the literature does not take advantage of either semantic-based resources or NLP-based analysis in the preprocessing stage. To counter these limitations, these resources are studied with the aim of extracting the features needed to execute the task and finding the best combination for ATE. The main goal of this work is to implement and analyse a method for Aspect Term Extraction (ATE) from user reviews (restaurants and laptops). The proposed method is based on a supervised approach called Conditional Random Fields (CRF), which is able to optimize the use of features for classification; this choice is justified by previous related work that demonstrates the effectiveness of CRF for ATE.
We also investigate the existing methods and features for ABSA, propose new features, and experiment with feature combinations in order to find the best combinations not yet covered in the state of the art. The detailed study experiments with word features, n-grams, and custom-made features using a CRF supervised algorithm to accomplish the task of Aspect Term Extraction, with results reported in terms of Precision, Recall, and F-measure, the standard evaluation metrics adopted in the field. Finally, a comparative assessment of the proposed method for ATE against other related work in the literature shows that the method presented in this work is competitive.

Item Avaliação de métodos de imputação de valores ausentes para a predição de interações fármaco-proteína (2024-03-08) Santos, Victor Vidal dos; Nascimento, André Câmara Alves do; http://lattes.cnpq.br/0622594061462533; http://lattes.cnpq.br/7999257997046465
In the last decade, the study of pharmacological networks has received a lot of attention given its relevance to the drug discovery process. Many different approaches for predicting biological interactions have been proposed, especially in the area of multiple kernel learning (MKL). Such methods comprise integrative approaches that can handle heterogeneous data sources, but suffer from the missing data problem. Missing values in the base kernel matrices are usually handled with simple techniques, such as imputing zeroes or the mean or median of the matrix. In this work, techniques for handling missing values were evaluated in the context of bipartite networks.
Our analyses showed that, depending on the amount of missing data, the k-NN and SVD techniques performed much better than the other techniques, bringing encouraging results, while zero-fill showed the worst performance of all the evaluated methods.

Item Avaliação de métodos para interpolação espacial de dados de precipitação (2019) Neris, Airton Martins; Gonçalves, Glauco Estácio; Medeiros, Victor Wanderley Costa de; http://lattes.cnpq.br/7159595141911505; http://lattes.cnpq.br/6157118581200722; http://lattes.cnpq.br/7254010025661115
Information on the amount of rainfall is essential for a wide range of sectors, such as agriculture and agroforestry. Despite this importance, many areas are still not covered by meteorological stations, which causes a lack of data. To meet this need there are spatial interpolation methods, which use information from correlated points to estimate the missing value in a given area. This work aims to evaluate methods for the interpolation of daily precipitation data. The interpolation techniques used in the experiments were Inverse Distance Weighting, Ordinary Kriging, and Random Forest. For Random Forest two different configurations were used: one that receives the coordinates as input, and another that receives the buffer distance, one of the most recent pre-processing techniques used in the literature for Random Forest to estimate values based on geographical reference. We used rainfall data from the 46 meteorological stations in the state of Pernambuco for the period from 2013 to 2018, and compared the generalization accuracy of the methods using leave-one-out cross-validation. In the results, Inverse Distance Weighting gave the best estimates on all metrics, and Random Forest using coordinates obtained the second-best result.
Random Forest using buffer distance scored lower on the metrics, but its spatialization proved visually superior, giving a smoother result than Random Forest using coordinates.

Item Avaliação de plataformas para o reconhecimento de placas veiculares brasileiras (2021-12-14) Amaral, Carlos Ivan Santos do; Garrozi, Cícero; http://lattes.cnpq.br/0488054917286587; http://lattes.cnpq.br/8099840025648951
With the growing number of private vehicles in Brazil, better methods for managing and inspecting the vehicle fleet are becoming increasingly necessary. License plates (LP) are unique and mandatory objects whose purpose is to identify the vehicle as well as its owner. It is recommended that information on license plates be collected by automated systems for LP detection and recognition. These systems are fundamental for the supervision and management of different activities related to vehicle traffic. In this regard, this paper presents a study that identifies methods for LP detection and recognition with algorithms based on machine learning and deep learning. For the experiment, we collected an image dataset of vehicles at toll plazas located in the municipality of Cabo de Santo Agostinho - PE that provide access to the Governador Eraldo Gueiros Port Industrial Complex - SUAPE. The objective of this work was to compare Microsoft Azure's computer vision service for LP object detection, used in conjunction with Google Vision's Optical Character Recognition (OCR) service, with the YOLO v4 deep learning algorithm.
The experiment showed that, under similar configuration conditions for both models studied, YOLO v4 performed better, achieving a 92% precision rate in license plate detection and recognition.

Item Classificação multi-rótulo para análise de qualidade de feedback (2025-08-06) Batista, Hyan Hugo Noá; Mello, Rafael Ferreira de Leite; http://lattes.cnpq.br/6190254569597745; http://lattes.cnpq.br/4262454011553103
Feedback is a very important factor in the teaching-learning process and is crucial in Distance Education: because teachers and students are separated in space and/or time, it is through feedback that students understand how they are performing in the course and what the next learning steps are. The literature offers feedback models that help teachers structure and provide quality feedback to students. In this work we use the well-regarded feedback model of Hattie and Timperley, which divides feedback into categories (task, task processing, regulation, and personal). Works that automatically analyze feedback based on this model can be found in the literature. However, they use traditional machine learning algorithms and train binary classifiers for each feedback level. This work therefore aims to use deep learning algorithms for multi-class classification of feedback based on the Hattie and Timperley model.

Item Comparação de métodos baseados em redes convolucionais para classificação de fava (2022-05-26) Silva, Erico André da; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964; http://lattes.cnpq.br/0529785947460222
The bean crop has received little attention from research and extension agencies, which has resulted in limited knowledge of the agronomic characteristics of the crop. This has affected the accuracy of classifying them.
Such classification is of great importance because the correct identification of plants allows a good response of the crop in terms of productivity and behavior under different environmental conditions. In this context of limited information about the characteristics, we present a solution that applies the power of computer vision to agronomy, aiming to improve productivity, reduce waste, and assist in decision-making and in selecting the crop that best suits a particular region. Computer vision techniques are a set of methods used to interpret images, extracting patterns and features. Aiming to contribute to technological development in the agronomic sector, this work compares some supervised classification approaches for the automatic identification of broad bean species. The scope of this work is to classify images of seedlings generated by rural producers into two categories of beans: grandma’s ear and cearense. From the comparisons made between classification methods that use convolutional networks as feature extractors, we selected the best method, the combination of a convolutional network with a support vector machine (SVM), and present it as a way to automate the classification of bean images.

Item Comparação de técnicas de classificação para predição de esforço no desenvolvimento de software (2019-01-31) Uehara, Matheus Pitancó de Lima; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964; http://lattes.cnpq.br/2761038597182432
Estimating the effort of development activities is critical in software development, and it is critical for the software to be delivered with quality within the estimated timeframe.
Estimates were taken from the project houses because they were planned in the one-year forecast, although they were facilitated by not being more stringent in the time of development of the activity, while those involving development time tended to be the more assertive in the semester demand more time and more the whole external forecasting process. The work was presented as the learning of auxiliary machines in an automated way in the times in which the improvement of movement diminished the time necessary for its accomplishment. The experiments produced results that validated the feasibility of the technique used for feature extraction and classification in estimating effort from the textual description of the activities. The F-measure values of the classifiers range from 31% to 33%.
