MSR 2018
Mon 28 - Tue 29 May 2018 Gothenburg, Sweden
co-located with ICSE 2018
Tue 29 May 2018 11:30 - 11:36 at E3 room - Data Showcase

In GitHub, the pull-based development model enables community contributors to collaborate more efficiently. However, the distributed and parallel nature of this model creates a risk that developers submit duplicate pull-requests (PRs), which increase the cost of project maintenance. To facilitate further studies that better understand and address the issues introduced by duplicate PRs, we construct a large dataset of historical duplicate PRs extracted from 26 popular open source projects in GitHub using a semi-automatic approach. Furthermore, we present some preliminary applications to illustrate how further research can be conducted based on this dataset.

Tue 29 May

Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna

11:00 - 12:30
Data Showcase at E3 room
11:00
6m
Short-paper
50K-C: A dataset of compilable, and compiled, Java projects
Data Showcase
A: Pedro Martins University of California at Irvine, USA, A: Crista Lopes University of California Irvine, A: Rohan Achar
11:06
6m
Short-paper
JBench: A Dataset of Data Races for Concurrency Testing
Data Showcase
A: Jian Gao School of Software, Tsinghua University, A: Xin Yang, A: Yu Jiang, A: Han Liu, A: Weiliang Ying, A: Xian Zhang
11:12
6m
Short-paper
Bugs.jar: A Large-scale, Diverse Dataset of Real-world Java Bugs
Data Showcase
A: Ripon Saha, A: Yingjun Lyu University of Southern California, A: Wing Lam University of Illinois at Urbana-Champaign, A: Hiroaki Yoshida Fujitsu Laboratories of America, Inc., A: Mukul Prasad Fujitsu Laboratories of America
11:18
6m
Short-paper
A Gold Standard for Emotion Annotation in Stack Overflow
Data Showcase
A: Nicole Novielli University of Bari, A: Fabio Calefato University of Bari, A: Filippo Lanubile University of Bari
Pre-print
11:24
6m
Short-paper
Vulinoss: A Dataset of Security Vulnerabilities in Open-source Systems
Data Showcase
A: Antonios Gkortzis Athens University of Economics and Business, A: Dimitris Mitropoulos, A: Diomidis Spinellis Athens University of Economics and Business
Pre-print
11:30
6m
Short-paper
A Dataset of Duplicate Pull-requests in GitHub
Data Showcase
A: Zhixing Li College of Computer, National University of Defense Technology, Changsha, China, A: Yue Yu National University of Defense Technology, A: Gang Yin National University of Defense Technology, A: Tao Wang National University of Defense Technology, A: Huaimin Wang
Pre-print
11:36
6m
Short-paper
Structured Information on State and Evolution of Dockerfiles on GitHub
Data Showcase
DOI Pre-print
11:42
6m
Short-paper
A Graph-based Dataset of Commit History of Real-World Android apps
Data Showcase
A: Franz-Xaver Geiger, A: Ivano Malavolta Vrije Universiteit Amsterdam, A: Luca Pascarella Delft University of Technology, A: Fabio Palomba, A: Dario Di Nucci Vrije Universiteit Brussel, A: Alberto Bacchelli University of Zurich
DOI Pre-print
11:48
6m
Short-paper
Public Git Archive: a Big Code dataset for all
Data Showcase
A: Vadim Markovtsev source{d}, A: Waren Long source{d}
DOI Pre-print
11:54
6m
Short-paper
Word Embeddings for the Software Engineering Domain
Data Showcase
A: Vasiliki Efstathiou Athens University of Economics and Business, A: Christos Chatzilenas, A: Diomidis Spinellis Athens University of Economics and Business
DOI Pre-print
12:00
6m
Short-paper
npm-miner: An Infrastructure for Measuring the Quality of the npm Registry
Data Showcase
A: Kyriakos Chatzidimitriou Aristotle University of Thessaloniki, A: Michail Papamichail, A: Themistoklis Diamantopoulos Electrical and Computer Engineering Dept, Aristotle University of Thessaloniki, A: Michail Tsapanos, A: Andreas Symeonidis
DOI Pre-print
12:06
6m
Short-paper
CROP: Linking Code Reviews to Source Code Changes
Data Showcase
A: Matheus Paixao University College London, A: Jens Krinke University College London, A: DongGyun Han University College London, A: Mark Harman Facebook and University College London
DOI Pre-print
12:12
6m
Short-paper
Developer Interaction Traces backed by IDE Screen Recordings from Think-aloud Sessions
Data Showcase
A: Aiko Yamashita Oslo Metropolitan University, A: Fabio Petrillo Concordia University, A: Foutse Khomh Polytechnique Montréal, A: Yann-Gaël Guéhéneuc Concordia University and Polytechnique Montréal
Pre-print
12:18
6m
Short-paper
A Multi-level Dataset of Linux Kernel Patchwork
Data Showcase
A: Yulin Xu Peking University, A: Minghui Zhou Peking University
DOI Pre-print
12:24
6m
Short-paper
Documented Unix Facilities Over 48 Years
Data Showcase
A: Diomidis Spinellis Athens University of Economics and Business
Link to publication DOI Media Attached