Bacharelado em Sistemas de Informação (Sede)
Permanent URI for this community: https://arandu.ufrpe.br/handle/123456789/12
Collection acronyms:
APP - Artigo Publicado em Periódico (Article Published in a Journal)
TAE - Trabalho Apresentado em Evento (Work Presented at an Event)
TCC - Trabalho de Conclusão de Curso (Undergraduate Thesis)
11 results
Search Results
Item Aprendizagem de máquina para a identificação de clientes propensos à compra em Inbound marketing (2019-07-12) Silva, Bruno Roberto Florentino da; Monteiro, Cleviton Vinicius Fonsêca; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964; http://lattes.cnpq.br/9362573782715504
The most important point for a company should always be the customer, and winning new customers is not always easy. Digital marketing techniques study how to attract new customers to businesses using digital platforms. With the popularization of these media, strategies had to be shaped to the new possibilities. With just one click it is possible to reach thousands of individuals, which means many new leads for the company. However, filtering out which of these individuals are really interested in the product or service offered demands a lot of effort from the sales team. This overhead is detrimental in the sense that the company can lose revenue by not targeting the real opportunities. To minimize this problem, the present work proposes the automatic identification of clients reached through digital marketing strategies. We propose the use of Machine Learning techniques, in particular the supervised classification algorithms Decision Tree and Naive Bayes, implemented with the Scikit-learn library for the Python programming language. In addition, it was necessary to apply the SMOTE oversampling algorithm due to the imbalance of the dataset. To further optimize the classification, we used attribute selection and model selection with hyperparameter tuning. Finally, to evaluate the results, we used the confusion matrix, the precision and recall metrics, and the precision-recall curve. Due to the imbalance of the data, the precision metric did not reach good values, averaging 5.5%.
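The pipeline this abstract describes, SMOTE oversampling followed by Decision Tree and Naive Bayes classifiers from Scikit-learn, can be sketched as below. The lead dataset is synthetic and `smote_like` is a simplified stand-in for the real SMOTE algorithm, since the thesis's data and exact settings are not public:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import precision_score, recall_score

rng = np.random.default_rng(42)

def smote_like(X_min, n_new, rng):
    """Simplified SMOTE: synthesize points along segments between
    pairs of randomly chosen minority-class samples."""
    a = rng.integers(0, len(X_min), n_new)
    b = rng.integers(0, len(X_min), n_new)
    gap = rng.random((n_new, 1))
    return X_min[a] + gap * (X_min[b] - X_min[a])

# Imbalanced synthetic "lead" dataset: 950 non-buyers, 50 buyers
X_neg = rng.normal(0.0, 1.0, (950, 4))
X_pos = rng.normal(1.5, 1.0, (50, 4))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 950 + [1] * 50)

# Oversample the minority class before training
X_bal = np.vstack([X, smote_like(X_pos, 900, rng)])
y_bal = np.concatenate([y, np.ones(900, dtype=int)])

results = {}
for model in (DecisionTreeClassifier(random_state=0), GaussianNB()):
    model.fit(X_bal, y_bal)
    pred = model.predict(X)
    results[type(model).__name__] = (precision_score(y, pred),
                                     recall_score(y, pred))
print(results)
```

A real run would use imbalanced-learn's `SMOTE` and a held-out test split rather than scoring on the training points as done here for brevity.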
In addition, recall was around 83%. Even with such divergent results between the applied metrics, the present work reached its goal, identifying most of the real opportunities and showing that, with this approach, the sales team's effort could be reduced by up to 85% compared with calling every lead. As a consequence, the company may reduce the cost of the resources applied to win new customers, allowing the sales team to find new customers more efficiently.

Item Uma abordagem baseada em aprendizado de máquina para dimensionamento de requisitos de software (2016-12-13) Fernandes Neto, Eça da Rocha; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964; http://lattes.cnpq.br/6325583065151828
This work proposes the automatic sizing of software requirements using a machine learning approach. The database used is real and was obtained from a company whose development process is based on Scrum and Planning Poker estimation. During the studies, data pre-processing, classification, and selection of the best attributes were used along with the term frequency-inverse document frequency (tf-idf) algorithm and principal component analysis (PCA). Machine learning and automatic classification relied on Support Vector Machines (SVM) trained on the available data history. The final tests were performed with and without attribute selection by PCA, and assertiveness proved greater when the best attributes were selected. The final tool can estimate the size of user stories with a generalization of up to 91%.
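A minimal sketch of the tf-idf, PCA, and SVM pipeline described above, on invented user stories (the texts and size labels are illustrative, not taken from the company's database):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Toy "user stories" with coarse size labels (illustrative only)
stories = [
    "add login button to home page",
    "fix typo on login page",
    "rewrite billing module and migrate database",
    "redesign billing and payment workflow end to end",
    "update copy on about page",
    "migrate legacy reports to new database schema",
]
sizes = ["small", "small", "large", "large", "small", "large"]

# 1) tf-idf turns each story description into a weighted term vector
X = TfidfVectorizer().fit_transform(stories).toarray()

# 2) PCA keeps the directions of highest variance (attribute selection)
X_red = PCA(n_components=4).fit_transform(X)

# 3) A linear SVM maps the reduced vectors to size classes
clf = SVC(kernel="linear").fit(X_red, sizes)
train_acc = clf.score(X_red, sizes)
print("training accuracy:", train_acc)
```

Generalization, as reported in the thesis, would be measured on held-out stories; here the score is on the training set only.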
The results were considered fit for use in the production environment without problems for the development team.

Item Comparação de técnicas de classificação para predição de esforço no desenvolvimento de software (2019-01-31) Uehara, Matheus Pitancó de Lima; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964; http://lattes.cnpq.br/2761038597182432
Estimating the effort of development activities is critical in software development, since it is what allows software to be delivered with quality within the estimated timeframe. Estimates are often produced far in advance, during annual planning, when little is known about each activity; estimates made by those involved in the development tend to be more assertive, but demand more time from everyone involved in the forecasting process. This work presents the use of machine learning to estimate, in an automated way, the effort required for an activity from its textual description, reducing the time needed for this task. The experiments produced results that validate the feasibility of the techniques used for feature extraction and classification when estimating effort from the textual description of activities.
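The F-measure cited in the results below is the harmonic mean of precision and recall; a quick illustration on toy effort labels (these toy figures bear no relation to the thesis's numbers):

```python
from sklearn.metrics import f1_score

# Toy ground truth and predictions for three effort classes
y_true = ["low", "low", "mid", "mid", "high", "high", "low", "mid", "high"]
y_pred = ["low", "mid", "mid", "high", "high", "low", "low", "mid", "mid"]

# Macro F-measure averages the per-class F1 scores
macro_f1 = f1_score(y_true, y_pred, average="macro")
print("macro F-measure: %.3f" % macro_f1)
```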
The classifiers achieved F-measure values ranging from 31% to 33%.

Item Uso da ciência de dados para estudo de falhas e fraudes dos abastecimentos de postos de gasolina (2019-12-19) Arruda, Luiz Felipe Ribeiro de; Albuquerque Júnior, Gabriel Alves de; Roullier, Ana; http://lattes.cnpq.br/1399502815770584; http://lattes.cnpq.br/1825682578554550

Item Serviço computacional para interpolação espacial de dados meteorológicos (2019) Antonio, Wellington Luiz; Gonçalves, Glauco Estácio; Medeiros, Victor Wanderley Costa de; http://lattes.cnpq.br/7159595141911505; http://lattes.cnpq.br/6157118581200722; http://lattes.cnpq.br/6454060359445906
Spatial interpolation is an essential technique for several fields, such as meteorology, hydrology, agricultural zoning, characterization of health-risk areas, and sociodemographics, among others. Through interpolation it is possible to model a surface of a spatially distributed variable from a finite set of known data points. In the case of weather data for agriculture, for instance, interpolation allows us to observe how weather variables behave on a given rural property, which could serve as input for irrigation management on that property. Due to the increasing demand for spatial interpolation, this work proposes the development of a scalable service, based on state-of-the-art technologies and standards in distributed systems, for the spatial interpolation of meteorological data associated with agriculture. To achieve this goal, we developed a web service based on three reference evapotranspiration interpolation algorithms: Inverse Distance Weighting (IDW), Ordinary Kriging (OK), and Random Forest (RF). The first two are widely used for the spatialization of reference evapotranspiration and are known to produce low interpolation errors. The third comes from Machine Learning.
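Of the three interpolators offered by the service, IDW is the easiest to illustrate; a self-contained sketch with hypothetical stations (the coordinates and ET0 values are invented):

```python
import numpy as np

def idw(xy_known, z_known, xy_query, power=2.0):
    """Inverse Distance Weighting: each known station contributes with
    weight 1/d^power; a query on top of a station returns its value."""
    out = []
    for q in np.atleast_2d(xy_query):
        d = np.linalg.norm(xy_known - q, axis=1)
        if np.any(d == 0.0):              # exact hit on a station
            out.append(z_known[np.argmin(d)])
            continue
        w = 1.0 / d ** power
        out.append(np.sum(w * z_known) / np.sum(w))
    return np.array(out)

# Hypothetical stations (x, y in km) with reference evapotranspiration (mm/day)
stations = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
et0 = np.array([4.0, 5.0, 6.0, 7.0])

# The grid center is equidistant from all four stations,
# so IDW reduces to their plain average there
center = idw(stations, et0, [[5.0, 5.0]])
print("ET0 at grid center: %.2f mm/day" % center[0])
```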
It has been used in recent studies as an alternative for the spatial interpolation of environmental variables, and has also been obtaining promising results in the estimation of evapotranspiration. The proposed spatial interpolation web service was developed and its performance was evaluated through measurement. The service was deployed in a production environment using Docker containers, and a mobile application was developed to integrate and showcase the main functionalities of the web service. The developed service can be applied in several areas; however, this work paid more attention to the agricultural sector, as this is the sector on which the study is focused. The main beneficiaries of the web service are researchers and developers, who, in turn, are able to develop studies that will benefit farmers through applications of the service. During this work, we also sought to evaluate how the developed service promotes performance and scalability in spatial interpolation computations and in the generation of spatial models. We also highlighted the importance of this software as a support tool for other research or even for other software, such as Aquaprev, which uses, among other parameters, evapotranspiration and spatial interpolation to estimate the irrigation time of a given crop.

Item Estudo comparativo de algoritmos de classificação supervisionada para classificação de polaridade em análise de sentimentos (2019) Albuquerque, Rotsen Diego Rodrigues de; Albuquerque Júnior, Gabriel Alves de; http://lattes.cnpq.br/1399502815770584; http://lattes.cnpq.br/6441716676783585
The huge increase of data on the Internet makes it a rich source for assessing public opinion on a specific subject. Consequently, the number of opinions available makes decision-making impossible if all of them have to be read and analyzed.
Since Machine Learning has been widely adopted, this work presents a comparative study of two algorithms for classifying movie comments using natural language processing and Sentiment Analysis techniques. The data were obtained manually through the competition site Kaggle, comprising about 50,000 comments on various films. The purpose of this study is also to use the concepts of data science, Machine Learning, natural language processing, and sentiment analysis to add more information about the entertainment and film industry. The algorithms were built to show results for the domain of movie comments registered on one of the industry's largest platforms, the well-known IMDB. After training and testing, the model reached an accuracy of 86% in predicting the sentiment of movie comments.

Item Predição de popularidade de podcasts através de características textuais (2019) Santana Júnior, Bernardo de Moraes; Cabral, Giordano Ribeiro Eulalio; http://lattes.cnpq.br/6045470959652684; http://lattes.cnpq.br/9948081717430490
With the tremendous growth of podcasts and the professionalization of their creators, to the point that news networks call this the podcast "golden age", new tools have emerged to assist content producers in building and maintaining their channels. In this context, finding features inside episodes that provide broader reach to the target audience is of great value to both creators and listeners, allowing channels to stay active longer and offer better content quality. Thus, this paper proposes a study of the popularity of Brazilian podcasts using a podcast audience analysis tool on one of the most used channel and episode aggregators in the world, iTunes.
Web scraping tools were used to collect the available and necessary information, along with tools for transcribing the episodes' audio in order to obtain what was said, and metrics were calculated to measure the accuracy of the generated model, enabling an analysis of which information is or is not relevant to predicting a channel's popularity. Results were favorable for the correlation between the text and the categories analyzed individually, whereas in an analysis where categories are not discriminated there is a low relationship between text and popularity, demonstrating that the category of a given channel plays an important role in analyzing its popularity.

Item Aspect term extraction in aspect-based sentiment analysis (2019) Francisco, Alesson Delmiro; Lima, Rinaldo José de; http://lattes.cnpq.br/7645118086647340
The increasing use of the Internet in many directions has created a need to analyze a large quantity of data. A large amount of data is presented as Natural Language Text, which is unstructured, with many ways to express the same information. It is an important task to extract information and meaning from this unstructured content, such as opinions on products or services. The need to extract and analyze the large amount of data created every day on the Internet has surpassed human ability; as a result, many text mining applications that extract and analyze textual data produced by humans are available today. One such application is Sentiment Analysis, viewed as a vital task in both the academic and commercial fields, so that companies and service providers can use the knowledge extracted from textual documents to better understand how their customers think about them, or to know how their products and services are appreciated, or not, by their customers.
However, analyzing unstructured text is difficult, which is why it is necessary to provide coherent information and concise summaries of those reviews. Sentiment Analysis is the process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine the writer's attitude towards a particular topic or product. Aspect-Based Sentiment Analysis (ABSA) is a sub-field of Sentiment Analysis that aims to extract more refined and exact opinions by breaking text down into aspects. Most of the current work in the literature takes advantage of neither semantic-based resources nor NLP-based analysis in the preprocessing stage. To counter these limitations, a study of these resources was conducted, aiming to extract the features needed for the task and to find the best combination for Aspect Term Extraction (ATE). The main goal of this work is to implement and analyze a method for ATE from user reviews (restaurants and laptops). The proposed method is based on a supervised approach called Conditional Random Fields (CRF), which is able to optimize the use of features for classification; this choice is justified by previous related work demonstrating the effectiveness of CRF for ATE. We also investigate the existing methods and features for ABSA, propose new features, and experiment with feature combinations in order to find the best ones not yet covered in the state of the art. The detailed study experiments with word features, n-grams, and custom-made features, using a supervised CRF algorithm to accomplish the ATE task, with results reported in terms of Precision, Recall, and F-measure, the standard evaluation metrics adopted in the field.
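A CRF consumes per-token feature dictionaries; the sketch below shows the kind of word features just described, on a BIO-tagged toy sentence. The feature set and tags are illustrative, not the thesis's exact ones, and a real system would pass `X` and `bio` to a CRF implementation such as sklearn-crfsuite:

```python
# Build a feature dictionary for token i of a sentence, of the kind
# typically fed to a CRF for aspect term extraction.
def word_features(tokens, i):
    w = tokens[i]
    return {
        "word.lower": w.lower(),
        "word.istitle": w.istitle(),       # capitalization clue
        "suffix3": w[-3:],                 # crude morphology
        "prev.lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next.lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

sentence = "The battery life of this laptop is great".split()
# BIO tags mark aspect terms: "battery life" is the aspect here
bio = ["O", "B-ASP", "I-ASP", "O", "O", "O", "O", "O"]

X = [word_features(sentence, i) for i in range(len(sentence))]
print(X[1]["word.lower"], bio[1])
```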
Finally, a comparative assessment between the proposed ATE method and other related work in the literature has shown that the method presented here is competitive.

Item Avaliação de métodos para interpolação espacial de dados de precipitação (2019) Neris, Airton Martins; Gonçalves, Glauco Estácio; Medeiros, Victor Wanderley Costa de; http://lattes.cnpq.br/7159595141911505; http://lattes.cnpq.br/6157118581200722; http://lattes.cnpq.br/7254010025661115
Information on the amount of rainfall is essential for the most varied sectors, such as agriculture and agroforestry. Despite this importance, many areas are still not covered by meteorological stations, which causes a lack of data. To meet this need there are spatial interpolation methods, which use information from correlated points to estimate the value missing in a certain area. Thus, this work aims to evaluate methods for the interpolation of daily precipitation data. The interpolation techniques used in the experiments were Inverse Distance Weighting, Ordinary Kriging, and Random Forest. For Random Forest, two different configurations were used: one that receives the coordinates as input, and another that receives the buffer distance, one of the most recent pre-processing approaches in the literature for making Random Forest estimate values based on geographical reference. We used rainfall data from 46 meteorological stations in the state of Pernambuco covering the period from 2013 to 2018, and to compare the generalization of the methods we used leave-one-out cross-validation. In the results, Inverse Distance Weighting performed best in its estimates for all metrics, and Random Forest using coordinates obtained the second-best result.
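Leave-one-out cross-validation holds out each station in turn and predicts it from the rest; a sketch with invented stations and a Random Forest on raw coordinates, one of the two configurations mentioned (the data here is synthetic, not the Pernambuco series):

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(7)

# Hypothetical stations: coordinates (x, y in km) and one day of rainfall (mm)
coords = rng.uniform(0, 100, (20, 2))
rain = 50 + 0.3 * coords[:, 0] - 0.2 * coords[:, 1] + rng.normal(0, 2, 20)

# Leave-one-out: hold out each station, predict it from all the others
preds = np.empty(len(rain))
for train_idx, test_idx in LeaveOneOut().split(coords):
    rf = RandomForestRegressor(n_estimators=50, random_state=0)
    rf.fit(coords[train_idx], rain[train_idx])
    preds[test_idx] = rf.predict(coords[test_idx])

mae = mean_absolute_error(rain, preds)
print("LOO MAE: %.2f mm" % mae)
```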
Random Forest using buffer distance had a lower result in terms of the metrics, but the quality of its visual spatialization proved superior, offering a visually smoother result than Random Forest using coordinates.

Item Análise da utilização de aprendizado de máquina na redução do volume de alertas benignos (2019) Simião, Augusto Fernando de Melo; Soares, Rodrigo Gabriel Ferreira; http://lattes.cnpq.br/2526739219416964; http://lattes.cnpq.br/0529129636604731
To aid in combating cyber attacks, Managed Security Service Providers (MSSPs) use SIEMs (Security Information and Event Management systems). SIEMs are able to aggregate, process, and correlate vast amounts of events from different systems, alerting security analysts to the existence of threats, such as computer viruses and cyber attacks, in computer networks. However, SIEMs are known for their high rates of benign (non-threatening) alerts relative to malign (threatening) alerts. Due to the high volume and prevalence of benign alerts, analysts tend to ignore alerts as a whole, including those that represent potential threats, thereby increasing the risk of a network compromise. This phenomenon is known as alert fatigue and has been a frequent target for applying machine learning techniques to reduce the volume of benign alerts. Modern SIEMs use machine learning in event correlation so that only alerts that actually represent possible threats are reported. However, this correlation does not consider the analyst's deliberation, allowing SIEMs to keep generating alerts previously identified as benign. This work investigates the use of the Naïve Bayes, Decision Tree, and Random Forest algorithms to reduce the volume of benign alerts using alerts previously labeled by analysts, rather than the chain of events that generates such alerts.
In this way, it was possible to show through experiments that supervised machine learning techniques can be applied to identify alerts previously labeled as benign.
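The core idea, learning from the analyst's past verdicts rather than from the raw event chain, can be sketched as below with one of the three algorithms named, Random Forest; the alert fields, values, and labels are all hypothetical:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer

# Hypothetical alert records, each carrying the analyst's earlier verdict
alerts = [
    {"signature": "port_scan", "src_internal": 0, "count": 40},
    {"signature": "port_scan", "src_internal": 1, "count": 3},
    {"signature": "malware_beacon", "src_internal": 1, "count": 25},
    {"signature": "failed_login", "src_internal": 1, "count": 2},
    {"signature": "malware_beacon", "src_internal": 1, "count": 30},
    {"signature": "failed_login", "src_internal": 0, "count": 60},
]
verdicts = ["malign", "benign", "malign", "benign", "malign", "malign"]

# One-hot encode the categorical fields, then fit on past verdicts
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(alerts)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, verdicts)

# Triage a new alert using what the analysts taught the model
new_alert = {"signature": "failed_login", "src_internal": 1, "count": 1}
verdict = clf.predict(vec.transform([new_alert]))[0]
print(verdict)
```

Alerts the model confidently calls benign could then be suppressed or down-ranked, reducing the volume the analyst has to review.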