jarPhys

What is jarPhys?

jarPhys is a PDF smart-search app designed to run fast, token-matching searches over a set of images. jarPhys uses OCR to tag images and search through the text in them. As of now, jarPhys works amazingly with plain text, so-so with handwriting, and extremely poorly with mathematical text.

Far from complete, jarPhys is still a small but fully functional project. As of now, jarPhys allows real-time search of pre-cached PDFs. Currently, jarPhys inputs documents as images, so the best solution is to convert PDFs to PNG images using a tool like PDF2PNG.

The real advantage of jarPhys is that you avoid searching through multiple (or very large) PDFs using CTRL-F (or however the "find" feature works in your browser :P). Since jarPhys does token matching, you also don't have to worry about not finding an exact match, similar to the Google/Bing search tools (lol). For the time being, the search interface runs in your device's command prompt and is triggered from the command line, while the results are presented as an HTML file that displays the matched pages from the PDFs.
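To make the difference between an exact "find" and token matching concrete, here is a minimal sketch. It is not jarPhys's actual code: the `tokenize` and `match_score` helpers and the sample page text are made up for illustration, and the real scoring may well differ.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_score(query, page_text):
    """Return the fraction of query tokens that appear anywhere on the page."""
    query_tokens = set(tokenize(query))
    if not query_tokens:
        return 0.0
    page_tokens = set(tokenize(page_text))
    return len(query_tokens & page_tokens) / len(query_tokens)


# Hypothetical OCR output for a single page.
page = "The Schrodinger equation describes how the quantum state evolves in time."
query = "time evolution, Schrodinger equation"

print(query in page)             # exact (CTRL-F style) search finds nothing
print(match_score(query, page))  # three of the four query tokens are on the page
```

The page still scores well even though the query's word order and punctuation don't match the text, which is the whole point of matching on tokens.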

For me personally, a tool like this is helpful during exams/quizzes/tests or when trying to lookup material from a vague memory. A great advantage is being able to search for questions without worrying about not finding "exact" matches. If you're worried about being penalized for using jarPhys (or similar programs), don't be! You're allowed to! Check this section for further details.

At some point, I'll make a complete browser-run downloadable app, but for now this is it. The name itself is a coinage combining Tony Stark's JARVIS bot and Physics (the sad hell in which we are united).

Installing and using jarPhys-search

Requirements

If you have Python installed and have used it a few times before, or did 5CCP211C - Intro. to Numerical Modelling and haven't deleted the installation, you can skip reading this part.

Python 3.8 (or later) installed. You can download it from python.org
At least 3GB of available storage on disk (~1GB for jarPhys, ~2GB for Python files).
An active internet connection during installation.
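If you're not sure whether your machine meets the first two requirements, a quick check along these lines can help. This is only a sketch: the helper names are made up, and the 3GB threshold is simply the figure quoted above.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)
MIN_FREE_BYTES = 3 * 1024**3  # roughly the 3GB quoted in the requirements


def python_ok(version=sys.version_info):
    """True if the (major, minor) version is at least 3.8."""
    return tuple(version[:2]) >= MIN_PYTHON


def disk_ok(path=".", required=MIN_FREE_BYTES):
    """True if the disk holding `path` has at least `required` bytes free."""
    return shutil.disk_usage(path).free >= required


if __name__ == "__main__":
    print("Python 3.8 or later:", python_ok())
    print("At least 3GB free:", disk_ok())
```

Run it from the folder where you plan to install jarPhys, so the disk check looks at the right drive.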

The Stand-Alone Installer

The simplest way to install the application quickly is to run the stand-alone installer.

Download the installer: jarPhys-simple-installer.py.

You can do this by clicking on the embedded link above (or click here), or by downloading the latest release from the Releases section.
Run jarPhys-simple-installer.py. Do this by running

python jarPhys-simple-installer.py

in your terminal, in the same directory. (Don't worry, the files are installed within a nested folder :) ). This should launch a command-line installer right away. Follow the installation instructions.
Et voila! Enjoy :D
Optional (recommended) : Buy me coffee!

Download the source (and scanning your own documents)

Downloading the source gives you access to all the documentation and code. You can rebuild the code for yourself, and try to run the complete application on your own. installer.py is useful for downloading and extracting the complete database, and for setting it up correctly for developers... so use it ;P.

Download the entire project as a ZIP File from the green download button (link here as well). Extract the project to the desired location and follow the install instructions in Install_Instruction.txt. The entire project is also mirrored on Google Drive (link here).
Now you can search the pre-built database (all lecture notes/slides/past papers were provided from KEATs). If you have any more similar resources (not very long books, but rather presentations, cheat-sheets, or past question papers), I'd be more than happy to add them, or guide you on how you can add to the database on your own.
If you'd like to import your own documents, and create your own database:

Install Tesseract OCR. You can look at their repository tesseract-ocr/tesseract if you want the source. The documentation is available at tesseract-ocr/tessdoc. If you're running Windows, you can find an installer here - UB-Mannheim/tesseract. If you're running Linux or macOS, try following this guide by PyImageSearch.
Install OpenCV for Python by running pip install opencv-python.

The installer otherwise handles the other libraries required to run jarPhys, but you may want to check requirements.txt.
If you're running Windows, and did not install Tesseract-OCR in C:/Program Files/Tesseract-OCR/tesseract.exe, change Line 14 in buildDatabase.py in your code to point to where you installed Tesseract-OCR.
Et voilà! Enjoy!
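If you're unsure where Tesseract ended up on your machine, a small lookup like the one below can find the path you'd need for the buildDatabase.py step above. It's a sketch, not the project's code: `find_tesseract` is a made-up helper, and only the default Windows path comes from this README.

```python
import shutil
from pathlib import Path

# Default Windows install location named in the step above.
DEFAULT_WINDOWS_PATH = "C:/Program Files/Tesseract-OCR/tesseract.exe"


def find_tesseract(custom_path=None):
    """Return the first usable Tesseract path, or None if none is found."""
    # 1. A path you pass in yourself, if that file exists.
    if custom_path and Path(custom_path).is_file():
        return str(custom_path)
    # 2. Whatever "tesseract" resolves to on your PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 3. The default Windows location.
    if Path(DEFAULT_WINDOWS_PATH).is_file():
        return DEFAULT_WINDOWS_PATH
    return None


# If the project uses the pytesseract wrapper (an assumption), the result is
# what you would assign to pytesseract.pytesseract.tesseract_cmd.
print(find_tesseract() or "Tesseract not found; check where you installed it")
```

On Linux and macOS the executable is usually on your PATH already, so the second check tends to be the one that succeeds.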

And please, do give us your feedback wherever possible! It's highly appreciated. Please support the project by buying me coffee! That's appreciated as well! :)

Running the application (hands-on guide)

Running the jarPhys installer

Navigate to the folder where you downloaded the installer.
Run the command

python jarPhys-simple-installer.py

This should launch a command line installer with fairly simple instructions you can follow :D

Running jarPhys search

Navigate to the folder where you have installed jarPhys.
Run the command

python jarPhys-search.py

The results will be shown as an HTML file. There are two result sets, one based on direct matching and the other based on frequency of matches, stored in results.html and results.html respectively.
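One way to read the two result sets is sketched below. Treat it as my interpretation rather than the actual implementation: here "direct matching" counts how many distinct query tokens a page contains, while "frequency" counts every occurrence. The `rank_pages` helper and the page names are made up.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_pages(query, pages):
    """Rank page names by distinct tokens matched, and by total occurrences."""
    query_tokens = set(tokenize(query))
    distinct, frequency = {}, {}
    for name, text in pages.items():
        counts = Counter(tokenize(text))
        distinct[name] = sum(1 for token in query_tokens if counts[token] > 0)
        frequency[name] = sum(counts[token] for token in query_tokens)
    by_distinct = sorted(pages, key=lambda name: distinct[name], reverse=True)
    by_frequency = sorted(pages, key=lambda name: frequency[name], reverse=True)
    return by_distinct, by_frequency


# Hypothetical OCR text for two pages.
pages = {
    "lecture3_p2.png": "Energy levels of the harmonic oscillator",
    "lecture7_p5.png": "Energy, energy, energy: conservation of energy",
}
direct, freq = rank_pages("harmonic oscillator energy", pages)
print(direct[0])  # lecture3_p2.png contains all three query tokens
print(freq[0])    # lecture7_p5.png repeats "energy" four times
```

The two rankings can disagree, as they do here, which is a good reason to look at both result sets.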

On Windows, the HTML file will be opened immediately in your default browser, while on macOS/Linux (posix) you may have to open results.html manually.
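If you'd rather not open the file by hand on macOS/Linux, Python's standard webbrowser module can usually do it for you. This is a sketch to run from the folder containing results.html, not something jarPhys ships; `results_uri` and `open_results` are made-up helper names.

```python
import webbrowser
from pathlib import Path


def results_uri(path="results.html"):
    """Turn a (possibly relative) file path into a file:// URI."""
    return Path(path).resolve().as_uri()


def open_results(path="results.html"):
    """Open the results page in the default browser; False if it can't."""
    if not Path(path).is_file():
        return False
    return webbrowser.open(results_uri(path))


if __name__ == "__main__":
    if not open_results():
        print("Could not open results.html; try opening it manually.")
```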

Contributing resources to the public database:

Upload your PDF file(s) to PDF2PNG (pdf2png.com). Sadly, there's a limit of 20 files per run, but you can open it in multiple tabs :D
Once all the files are done converting, click on 'Download All', which gives you a ZIP (.zip) file.
Upload the ZIP file to the 'Resources_to_be_added' Google Drive folder: jarPhys/pdfs/Resources_to_be_added in the relevant subject folder. Please DO NOT delete/modify any existing files here.
The files will be processed and the latest version of the database will be distributed with the next release (~5-6 times a week as of now).

If you have trouble converting your files to a ZIP file, simply upload your PDFs to the relevant folders, and they'll be processed later. (This will take some more time as compared to uploading the ZIP files.)
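If you already have a folder of converted PNG pages on your machine and just need to bundle them into a ZIP for uploading, a sketch like this would do it. The `zip_pngs` helper and the folder name in the usage note are made up.

```python
import zipfile
from pathlib import Path


def zip_pngs(folder, zip_path):
    """Add every .png directly inside `folder` to `zip_path`; return the count."""
    pngs = sorted(Path(folder).glob("*.png"))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for png in pngs:
            # Store only the file name, so the archive has no nested folders.
            archive.write(png, arcname=png.name)
    return len(pngs)
```

For example, `zip_pngs("converted_pages", "my_notes.zip")` would collect the PNGs in a folder called converted_pages into my_notes.zip.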

Reporting and getting help with issues

You can use either of the following methods to report and get help with an issue. In both cases, we will get back to you as soon as possible:

Any issues can be reported using the Issue/Feedback Form.
If you use GitHub, the issue can be submitted directly to the repository's issues tab.

Supporting and Contributing to the repository

You're welcome to create branches, forks, and issues wherever needed to fix/upgrade/add any relevant code/features/functionality. I will obviously consider pull requests which improve the overall app. However, the main branch will remain solely under my control since, at the end of the day, it is my repository.
And of course:

<a href="https://www.buymeacoffee.com/pt420"><img src="https://img.buymeacoffee.com/button-api/?text=Buy me a coffee&emoji=&slug=pt420&button_colour=BD5FFF&font_colour=ffffff&font_family=Lato&outline_colour=000000&coffee_colour=FFDD00"></a>

Am I allowed to use jarPhys for open book exa
