Abstract:
It is clear that code reuse is important task in software development and maintenance. As a lot of software application and source code have been used as libraries in version control systems, such that Git, SVN, LibreSource and related web sites, such that GitHub.com, sourceforge.net, projectsgeek.com, Googlecode.com, more and more companies, especially Small and Medium Enterprises (SMEs), are reusing open source code to develop their own software. The problem in code reuse is, after download all relevant code, we need to identify most relevant code among pool of code. In this paper we use keyword search with n-gram NLP technique using GitHub Application Program Interface (API). Before search the source code, we retrieve all Repository name in GitHub belongs to particular programing language (JAVA, C++, etc.), as well as we retrieve all .java file name if we search java libraries using GitHub API. Then compare our keyword with this list, if the keyword extracted from Software architecture is connected word, then we will split using Apache Camel Splitter. If the particular keyword related to any project, we download the project. Otherwise using WordNet, get some synonym and do the above process again. For further relevancy, we will use a speech recognition technique (Dynamic Time Warping (DTW)) and a NLP technique (Part of Speech Tagging (POS)). Because of this is a part of the whole research, in this paper we will consider only GitHub API.