ICSE 2018 / MSR 2018 / Data Showcase
A Dataset of Duplicate Pull-requests in GitHub
Tue 29 May 2018 11:30 - 11:36 at E3 room - Data Showcase
In GitHub, the pull-based development model enables community contributors to collaborate more efficiently. However, the distributed and parallel nature of this model creates a risk that developers submit duplicate pull-requests (PRs), which adds to the cost of project maintenance. To facilitate further studies that better understand and address the issues introduced by duplicate PRs, we construct a large dataset of historical duplicate PRs extracted from 26 popular open source projects in GitHub using a semi-automatic approach. Furthermore, we present some preliminary applications to illustrate how further research can be conducted based on this dataset.
Tue 29 May
Displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna
11:00 - 12:30
11:00 | 6m | Short-paper | 50K-C: A dataset of compilable, and compiled, Java projects | Pedro Martins (University of California at Irvine, USA), Crista Lopes (University of California Irvine), Rohan Achar
11:06 | 6m | Short-paper | JBench: A Dataset of Data Races for Concurrency Testing | Jian Gao (School of Software, Tsinghua University), Xin Yang, Yu Jiang, Han Liu, Weiliang Ying, Xian Zhang
11:12 | 6m | Short-paper | Bugs.jar: A Large-scale, Diverse Dataset of Real-world Java Bugs | Ripon Saha, Yingjun Lyu (University of Southern California), Wing Lam (University of Illinois at Urbana-Champaign), Hiroaki Yoshida (Fujitsu Laboratories of America, Inc.), Mukul Prasad (Fujitsu Laboratories of America)
11:18 | 6m | Short-paper | A Gold Standard for Emotion Annotation in Stack Overflow | Nicole Novielli (University of Bari), Fabio Calefato (University of Bari), Filippo Lanubile (University of Bari) | Links: Pre-print
11:24 | 6m | Short-paper | Vulinoss: A Dataset of Security Vulnerabilities in Open-source Systems | Antonios Gkortzis (Athens University of Economics and Business), Dimitris Mitropoulos, Diomidis Spinellis (Athens University of Economics and Business) | Links: Pre-print
11:30 | 6m | Short-paper | A Dataset of Duplicate Pull-requests in GitHub | Zhixing Li (College of Computer, National University of Defense Technology, Changsha, China), Yue Yu (National University of Defense Technology), Gang Yin (National University of Defense Technology), Tao Wang (National University of Defense Technology), Huaimin Wang | Links: Pre-print
11:36 | 6m | Short-paper | Structured Information on State and Evolution of Dockerfiles on GitHub | Links: DOI, Pre-print
11:42 | 6m | Short-paper | A Graph-based Dataset of Commit History of Real-World Android apps | Franz-Xaver Geiger, Ivano Malavolta (Vrije Universiteit Amsterdam), Luca Pascarella (Delft University of Technology), Fabio Palomba, Dario Di Nucci (Vrije Universiteit Brussel), Alberto Bacchelli (University of Zurich) | Links: DOI, Pre-print
11:48 | 6m | Short-paper | Public Git Archive: a Big Code dataset for all | Links: DOI, Pre-print
11:54 | 6m | Short-paper | Word Embeddings for the Software Engineering Domain | Vasiliki Efstathiou (Athens University of Economics and Business), Christos Chatzilenas, Diomidis Spinellis (Athens University of Economics and Business) | Links: DOI, Pre-print
12:00 | 6m | Short-paper | npm-miner: An Infrastructure for Measuring the Quality of the npm Registry | Kyriakos Chatzidimitriou (Aristotle University of Thessaloniki), Michail Papamichail, Themistoklis Diamantopoulos (Electrical and Computer Engineering Dept, Aristotle University of Thessaloniki), Michail Tsapanos, Andreas Symeonidis | Links: DOI, Pre-print
12:06 | 6m | Short-paper | CROP: Linking Code Reviews to Source Code Changes | Matheus Paixao (University College London), Jens Krinke (University College London), DongGyun Han (University College London), Mark Harman (Facebook and University College London) | Links: DOI, Pre-print
12:12 | 6m | Short-paper | Developer Interaction Traces backed by IDE Screen Recordings from Think-aloud Sessions | Aiko Yamashita (Oslo Metropolitan University), Fabio Petrillo (Concordia University), Foutse Khomh (Polytechnique Montréal), Yann-Gaël Guéhéneuc (Concordia University and Polytechnique Montréal) | Links: Pre-print
12:18 | 6m | Short-paper | A Multi-level Dataset of Linux Kernel Patchwork | Links: DOI, Pre-print
12:24 | 6m | Short-paper | Documented Unix Facilities Over 48 Years | Links: publication, DOI, media