Natural Language or Not (NLoN) - package for Software Engineering Text Analysis Pipeline (MSR 2018 - Technical Papers)

Who

Mika Mäntylä, Fabio Calefato, Maëlick Claes

Track

MSR 2018 Technical Papers

Time Zone

The program is currently displayed in (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna.

Use conference time zone: (GMT+02:00) Amsterdam, Berlin, Bern, Rome, Stockholm, ViennaSelect other time zone

The GMT offsets shown reflect the offsets at the moment of the conference.

Time Band

By setting a time band, the program will dim events that are outside this time window. This is useful for (virtual) conferences with a continuous program (with repeated sessions).
The time band will also limit the events that are included in the personal iCalendar subscription service.

Display full programSpecify a time band

Save

When

Tue 29 May 2018 12:08 - 12:15 at E4 room - Machine Learning for SE Chair(s): Alexander Serebrenik

Abstract

The use of natural language processing (NLP) is gaining popularity in software engineering. In order to correctly perform NLP, we must pre-process the textual information to separate natural language from other information, such as log messages, that are often part of the communication in software engineering. We present a simple approach for classifying whether some textual input is natural language or not. Although our NLoN package relies on only 11 language features and character tri-grams, we are able to achieve an area under the ROC curve performances between 0.976-0.987 on three different data sources, with Lasso regression from Glmnet as our learner and two human raters for providing ground truth. Cross-source prediction performance is lower and has more fluctuation with top ROC performances from 0.913 to 0.980. Compared with prior work, our approach offers similar performance but is considerably more lightweight, making it easier to apply in software engineering text mining pipelines. Our source code and data are provided as an R-package for further improvements.

Link to Preprint

https://arxiv.org/pdf/1803.07292.pdf

Mika MäntyläAuthor

University of Oulu

Finland

Fabio CalefatoAuthor

University of Bari

Italy

Maëlick ClaesAuthor