OpenKP
Automatically extracting keyphrases that are salient to a document's meaning is an essential step toward semantic document understanding. An effective keyphrase extraction (KPE) system can benefit a wide range of natural language processing and information retrieval tasks. Recent neural methods formulate KPE as a document-to-keyphrase sequence-to-sequence (seq2seq) task, and these seq2seq models have shown promising results compared to previous KPE systems. However, the recent progress in neural KPE has mostly been observed on documents from the scientific domain. In real-world scenarios, most potential applications of KPE deal with diverse documents originating from varied sources. These documents are unlikely to be as well structured or as well written as scientific papers. They often have a much more diverse structure and reside in various domains whose content targets a much wider audience than scientists. To encourage the research community to develop powerful neural models for keyphrase extraction on open domains, we have created OpenKP: a dataset of over 150,000 documents with the most relevant keyphrases generated by expert annotation.
OpenKP
Automatically extracting keyphrases that are salient to a document's meaning is an essential step in semantic document understanding. To facilitate research in this area, we have created OpenKeyPhrase (OpenKP), a large-scale, open-domain keyphrase extraction dataset. The dataset features 148,124 real-world web documents, each with a human annotation indicating the 1-3 most relevant keyphrases. More information about the dataset and our initial experiments can be found in the paper Open Domain Web Keyphrase Extraction Beyond Language Modeling, which will be an oral presentation at EMNLP-IJCNLP 2019. OpenKP is part of the MSMARCO dataset family, and research projects like this power the core document understanding pipeline that Bing uses.
Keyphrase Extraction
Keyphrase extraction is a language problem that can be stated as follows: a document D contains 1 to n keyphrases that can be used to understand what the document is about, find other relevant documents, and improve many downstream NLP tasks. In OpenKP we have formalized this problem to focus on the general web domain. The corpus consists of web pages that were annotated by humans with their most relevant keyphrases. It's worth noting that during expert annotation, judges only copied relevant text from the document, so no language generation is required.
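Because every annotated keyphrase is copied from the document, one way to work with the labels is to locate each keyphrase as a word span in the document text. The snippet below is a minimal sketch of that idea, not part of the OpenKP tooling: the function name `find_keyphrase_spans`, the whitespace tokenization, the case-insensitive matching, and the sample keyphrase are all assumptions made for illustration.

```python
def find_keyphrase_spans(text, keyphrase):
    """Return (start, end) word-index pairs where `keyphrase` occurs in `text`.

    Assumes whitespace tokenization and case-insensitive matching;
    `end` is exclusive.
    """
    words = text.lower().split()
    target = keyphrase.lower().split()
    n = len(target)
    if n == 0:
        return []
    return [
        (i, i + n)
        for i in range(len(words) - n + 1)
        if words[i:i + n] == target
    ]


# A short prefix of a document text; the keyphrase is a hypothetical
# label chosen for illustration, not an actual OpenKP annotation.
text = "April 30 2018 by nikhith P Online Doctor Appointment System Java Project"
spans = find_keyphrase_spans(text, "online doctor appointment system")
print(spans)
```

A phrase that does not occur in the text yields an empty list, which can serve as a quick sanity check that a label is truly extractive.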
Corpus Generation
To generate the corpus, we sampled ~100,000 URLs from the Bing index to get a representative sample of true domain diversity. Additionally, we sampled ~40,000 URLs from the MSMARCO QA corpus, since it can be considered a representative sample of open-domain web document search. Once the URLs were selected, each was given to an expert judge who visited the website, explored its content, and then annotated the 1-3 keyphrases they believed to be most salient to the overall document. The judge pool was trained specifically for this task and received regular quality checks and feedback to ensure a consistent understanding of what a document's relevant keyphrases may be. Once the judges had annotated a website, the HTML was downloaded, parsed, and fed into our CleanBody pipeline. The CleanBody pipeline produces a text representation (without any menus, ads, images, etc.) and a visual representation of the document; more information and specifics can be found below.
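The two representations can be cross-checked against each other. In the example record under Examples, the `VDOM` field is a JSON-encoded string holding a list of nodes, each with `text`, `feature`, `start_idx`, and `end_idx`. The sketch below assumes, based only on that example, that `start_idx` and `end_idx` are whitespace word offsets into the `text` field with an exclusive end. The helper names are hypothetical, and the `feature` arrays are emptied here for brevity.

```python
import json


def vdom_nodes(record):
    """Decode the JSON string stored under 'VDOM' into a list of node dicts."""
    return json.loads(record["VDOM"])


def misaligned_nodes(record):
    """Return the VDOM nodes whose text does not match words[start_idx:end_idx].

    Assumes start_idx/end_idx index whitespace-separated words of
    record['text'], with end_idx exclusive (inferred from the example record).
    """
    words = record["text"].split()
    bad = []
    for node in vdom_nodes(record):
        span = " ".join(words[node["start_idx"]:node["end_idx"]])
        if span != node["text"]:
            bad.append(node)
    return bad


# Trimmed copy of the first three nodes of the example record;
# the 'feature' values are left empty for brevity.
record = {
    "text": "April 30 2018 by nikhith P",
    "VDOM": (
        '[{"Id":0,"text":"April 30 2018","feature":[],"start_idx":0,"end_idx":3},'
        '{"Id":0,"text":"by","feature":[],"start_idx":3,"end_idx":4},'
        '{"Id":0,"text":"nikhith P","feature":[],"start_idx":4,"end_idx":6}]'
    ),
}

nodes = vdom_nodes(record)
bad = misaligned_nodes(record)
print(len(nodes), len(bad))
```

If the offset assumption holds for a record, `misaligned_nodes` returns an empty list; any returned nodes point to places where the text and visual representations disagree.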
Examples
{
'url': 'http://1000projects.org/online-doctor-appointment-system-java-project.html',
'text': 'April 30 2018 by nikhith P Online Doctor Appointment System Java Project Project Title Secure Web Application for Online Doctor Appointment System Online Doctor appointment is a smart web application this provides a registration and login for both doctors and patients Doctors can register by giving his necessary details like timings fee category etc After successful registration the doctor can log in by giving username and password The doctor can view the booking request by patients and if he accepts the patient requests the status will be shown as booking confirmed to the patient He can also view the feedback given by the patient The patients must be registered and log in to book a doctor basing the category and the type The Application has following modules Admin Doctor Patient Admin Admin needs to login with username and password and in the admin home screen he can see the basic functionalities of admin Admin can view the registered doctors and patients He can also view the patients request and doctors requests and he will confirm the patients and doctors requests Doctor Doctor need to be registered by giving the necessary details like experience timing fees etc After registering he need to log in and in the home screen he can view the basic functionalities He can view the patient request forwarded from admin and he can accept and he can also view the feedback given by patients Patient The patient needs to be registered and log in after logging on he can search for the doctor by giving the location the reason or problem Basing on the doctor availability the admin will confirm the booking request and will send to mail that the booking is confirmed he can also view in the status and he can also give feedback basing the performance of the doctor Existing System In the existing system the patient needs to visit the doctor for booking we need to wait and the booking will be done manually so to maintain everything is always a problem Proposed System In the 
proposed system the doctors patients are brought to one platform will allow patients to be more flexible they can register and search for the doctors basing on the location the list of doctors will be shown and patient can book by selecting the time slots and the admin will confirm the booking so everything is computerized an done very fast which will save time Software Requirements NetBeans74JDK 17MySQL 55SQL Yog HTML JavaScript and CSS Screens Home Page This screen shows the basic view of the application home page and the list of modules Admin Login Page In this page admin can log in by giving username and password Admin Home Page After successful login the application shows the admin home page in which the basic functionalities are shown View doctors Page In this page admin can view the list of doctors registered View patients Page In this admin can view the list of patients registered Patients request Page In this page admin can view the requests sent by the user for booking a doctor View doctors request Page In this page the request from a doctor is shown and admin will send the confirmation to the user that the booking is confirmed Doctor registration Page In this page the can register into the application by providing all the necessary details like experience fee timings etc Doctor login Page In this page the doctor can login by giving the username and password Doctor home Page After the successful login the doctor home page shows basic functionalities View request Page In this page the doctor can view the patient requests which are forwarded by the admin and he responds to the request View feedback Page In this page the doctor can view the patients feedback Patient registration Page In this page the patient can register into the application by providing all necessary details Patient login Page In this page patient can log in by giving username and password Patient home Page After successful login the application shows the patient homepage with basic 
functionalities Search Results Page In this page patient can search the doctor by giving the category reason location by selecting on the map In this page after giving the details for searching the doctor the search results will be shown like as in above screen In this page patient can view the status of his booking whether the booking is confirmed or not Feedback Page In this page patient can give the feedback for the doctor based on his performance 201718 Java Projects CSE Projects Java Abstracts Java Based Projects MySQL Projects Previous Venue Booking System Java Project Next Campus Recruitment System Java Project',
'VDOM': '[{"Id":0,"text":"April 30 2018","feature":[48.0,115.0,97.0,14.0,0.0,0.0,0.0,0.0,11.0,0.0,48.0,619.0,96.0,18.0,1.0,0.0,0.0,0.0,11.0,0.0],"start_idx":0,"end_idx":3},{"Id":0,"text":"by","feature":[162.0,105.0,97.0,14.0,0.0,0.0,0.0,0.0,11.0,0.0,48.0,619.0,96.0,18.0,1.0,0.0,0.0,0.0,11.0,0.0],"start_idx":3,"end_idx":4},{"Id":0,"text":"nikhith P","feature":[190.0,77.0,97.0,14.0,0.0,0.0,0.0,0.0,11.0,0.0,48.0,619.0,96.0,18.0,1.0,0.0,0.0,0.0,11.0,0.0],"start_idx":4,"end_idx":6},{"Id":0,"text":"Online Doctor Appointment System Java Project","feature":[48.0,619.0,114.0,36.0,1.0,0.0,1.0,0.0,26.0,0.0,48.0,619.0,114.0,36.0,1.0,0.0,1.0,0.0,26.0,0.0],"start_idx":6,"end_idx":12},{"Id":0,"text":"Project Title","feature":[48.0,96.0,174.0,19.0,0.0,0.0,0.0,0.0,16.0,1.0,48.0,619.0,172.0,24.0,1.0,0.0,0.0,0.0,16.0,1.0],"start_idx":12,"end_idx":14},{"Id":0,"text":"Secure Web Application for Online Doctor Appointment System","feature":[48.0,619.0,172.0,24.0,1.0,0.0,0.0,0.0,16.0,1.0,48.0,619.0,172.0,24.0,1.0,0.0,0.0,0.0,16.0,1.0],"start_idx":14,"end_idx":22},{"Id":0,"text":"Online Doctor appointment is a smart web application this provides a registration and login for both doctors and patients Doctors can register by giving his necessary details like timings fee category etc After successful registration the doctor can log in by giving username and password The doctor can view the booking request by patients and if he accepts the patient requests the status will be shown as booking confirmed to the patient He can also view the feedback given by the patient The patients must be registered and log in to book a doctor basing the category and the type","feature":[48.0,619.0,272.0,336.0,1.0,0.0,0.0,0.0,16.0,0.0,48.0,619.0,272.0,336.0,1.0,0.0,0.0,0.0,16.0,0.0],"start_idx":22,"end_idx":122},{"Id":0,"text":"The Application has following 
modules","feature":[48.0,352.0,634.0,23.0,0.0,0.0,0.0,0.0,19.0,1.0,48.0,619.0,632.0,28.0,1.0,0.0,0.0,0.0,19.0,1.0],"start_idx":122,"end_idx":127},{"Id":0,"text":"Admin","feature":[48.0,619.0,684.0,24.0,1.0,0.0,0.0,0.0,16.0,0.0,48.0,619.0,684.0,24.0,1.0,0.0,0.0,0.0,16.0,0.0],"start_idx":127,"end_idx":128},{"Id":0,"text":"Doctor","feature":[48.0,619.0,708.0,24.0,1.0,0.0,0.0,0.0,16.0,0.0,48.0,619.0,708.0,24.0,1.0,0.0,0.0,0.0,16.0,0.0],"start_idx":128,"end_idx":129},{"Id":0,"text":"Patient","feature":[48.0,619.0,732.0,24.0,1.0,0.0,0.0,0.0,16.0,0.0,48.0,619.0,732.0,24.0,1.0,0.0,0.0,0.0,16.0,0.0],"start_idx":129,"end_idx":130},{"Id":0,"text":"Admin","feature":[48.0,56.0,782.0,19.0,0.0,0.0,0.0,0.0,16.0,1.0,48.0,619.0,780.0,24.0,1.0,0.0,0.0,0.0,16.0,1.0],"start_idx":130,"end_idx":131},{"Id":0,"text":"Admi
