Call for Papers
Since 2013, the MSR conference has included a Data Showcase. The purpose of the Data Showcase is to provide a forum to share and discuss the important data sets that underpin the work of the Mining Software Repositories community.
Data Showcase papers should describe data sets that are curated by their authors and made available for others to use. Ideally, these data sets should be of value to others in the community, should be preprocessed or filtered in some way, and should provide an easy-to-understand schema. Data Showcase papers are expected to include:
- a description of the data source,
- a description of the methodology used to gather it (preferably with the tool used to create/generate the data),
- a description of the storage mechanism, including a schema if applicable,
- a description of how the data has been used by others,
- ideas for what future research questions could be answered or what further improvements could be made to the data set, and
- any limitations and/or challenges in creating or using this data set.
The data set should be made available at the time of submission of the paper for review, but will be considered confidential until publication of the paper.
Data Showcase papers are not:
- empirical studies,
- tool demos,
- based on poorly explained or untrustworthy heuristics for data collection, or
- simply applying generic tools to generate data that is quick and easy for others to gather.
New this year: We expect all datasets to be accompanied by the source code that was used to create them, along with clear documentation on how to recreate them. The source code should be open source, accompanied by an appropriate license. If you cannot provide the source code or the source code clause is not applicable (e.g. because the dataset consists of qualitative data), please provide a short explanation of why this is not possible.
Submission
Submit your data paper (maximum 4 pages) to EasyChair on or before February 5, 2018. Submitted papers will undergo double-blind peer review, so please remove identifying information from the paper, including author names and funding information. Please refer to your own previous work in the third person: write “This paper extends the work of Smith and Jones (2010)” rather than “This paper extends our previous work (2010)”. If the paper is accepted, the identifying information can be added back before publication.
Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere while under consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission. Submissions should follow ACM formatting guidelines and should be submitted using the EasyChair link.
Upon notification of acceptance, all authors of accepted papers will be asked to complete an ACM Copyright form and will receive further instructions for preparing their camera-ready versions. At least one author of each paper is expected to present the results at the MSR conference. All accepted contributions will be published in the conference electronic proceedings.
A selection of the best papers will be invited to an EMSE special issue.
Important Dates
Papers Due | 23:59 AoE, February 5, 2018
Author Notification | 23:59 AoE, March 2, 2018
Camera Ready | 23:59 AoE, March 16, 2018
Organization
Program Committee Chairs
- Georgios Gousios, TU Delft, Netherlands
- Sarah Nadi, University of Alberta, Canada
Tue 29 May (displayed time zone: Amsterdam, Berlin, Bern, Rome, Stockholm, Vienna)
11:00 - 12:30
Time | Duration | Type | Title | Authors
11:00 | 6m | Short paper | 50K-C: A dataset of compilable, and compiled, Java projects | Pedro Martins (University of California at Irvine, USA); Crista Lopes (University of California Irvine); Rohan Achar
11:06 | 6m | Short paper | JBench: A Dataset of Data Races for Concurrency Testing | Jian Gao (School of Software, Tsinghua University); Xin Yang; Yu Jiang; Han Liu; Weiliang Ying; Xian Zhang
11:12 | 6m | Short paper | Bugs.jar: A Large-scale, Diverse Dataset of Real-world Java Bugs | Ripon Saha; Yingjun Lyu (University of Southern California); Wing Lam (University of Illinois at Urbana-Champaign); Hiroaki Yoshida (Fujitsu Laboratories of America, Inc.); Mukul Prasad (Fujitsu Laboratories of America)
11:18 | 6m | Short paper | A Gold Standard for Emotion Annotation in Stack Overflow | Nicole Novielli (University of Bari); Fabio Calefato (University of Bari); Filippo Lanubile (University of Bari)
11:24 | 6m | Short paper | Vulinoss: A Dataset of Security Vulnerabilities in Open-source Systems | Antonios Gkortzis (Athens University of Economics and Business); Dimitris Mitropoulos; Diomidis Spinellis (Athens University of Economics and Business)
11:30 | 6m | Short paper | A Dataset of Duplicate Pull-requests in GitHub | Zhixing Li (College of Computer, National University of Defense Technology, Changsha, China); Yue Yu (National University of Defense Technology); Gang Yin (National University of Defense Technology); Tao Wang (National University of Defense Technology); Huaimin Wang
11:36 | 6m | Short paper | Structured Information on State and Evolution of Dockerfiles on GitHub
11:42 | 6m | Short paper | A Graph-based Dataset of Commit History of Real-World Android apps | Franz-Xaver Geiger; Ivano Malavolta (Vrije Universiteit Amsterdam); Luca Pascarella (Delft University of Technology); Fabio Palomba; Dario Di Nucci (Vrije Universiteit Brussel); Alberto Bacchelli (University of Zurich)
11:48 | 6m | Short paper | Public Git Archive: a Big Code dataset for all
11:54 | 6m | Short paper | Word Embeddings for the Software Engineering Domain | Vasiliki Efstathiou (Athens University of Economics and Business); Christos Chatzilenas; Diomidis Spinellis (Athens University of Economics and Business)
12:00 | 6m | Short paper | npm-miner: An Infrastructure for Measuring the Quality of the npm Registry | Kyriakos Chatzidimitriou (Aristotle University of Thessaloniki); Michail Papamichail; Themistoklis Diamantopoulos (Electrical and Computer Engineering Dept, Aristotle University of Thessaloniki); Michail Tsapanos; Andreas Symeonidis
12:06 | 6m | Short paper | CROP: Linking Code Reviews to Source Code Changes | Matheus Paixao (University College London); Jens Krinke (University College London); DongGyun Han (University College London); Mark Harman (Facebook and University College London)
12:12 | 6m | Short paper | Developer Interaction Traces backed by IDE Screen Recordings from Think-aloud Sessions | Aiko Yamashita (Oslo Metropolitan University); Fabio Petrillo (Concordia University); Foutse Khomh (Polytechnique Montréal); Yann-Gaël Guéhéneuc (Concordia University and Polytechnique Montréal)
12:18 | 6m | Short paper | A Multi-level Dataset of Linux Kernel Patchwork
12:24 | 6m | Short paper | Documented Unix Facilities Over 48 Years