Reminiscence

Self-hosted Bookmark and Archive manager

Features
Installation
- Normal Method
- Using Docker
Documentation
Future Roadmap
Motivation

Features

Bookmark links and edit their metadata (such as title, tags, summary) via the web interface.
Archive link content in HTML, PDF or full-page PNG format.
Automatic archival of links to non-HTML content like pdf, jpg, txt etc.

i.e. bookmarking links to pdf, jpg etc. via the web interface will automatically save those files on the server.
Supports archival of media elements of a web page using third-party download managers.
Directory based categorization of bookmarks
Automatic tagging of HTML links.
Automatic summarization of HTML content.
Special readability mode.
Search bookmarks according to url, title, tags or summary.
Supports multiple user accounts.
Supports public and group directory for every user.
Upload any file from web-interface for archiving.
Easy to use admin interface for managing multiple users.
Import bookmarks from Netscape Bookmark HTML file format.
Supports streaming of archived media elements.
Annotation support for both HTML and its readable version.
Annotation support for both archived and uploaded pdf/epub files.
Remembers last read position of html (and its readable version), pdf and epub.
Rudimentary support for adding custom notes.
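
For reference, the Netscape Bookmark HTML format mentioned in the list above stores each bookmark as an `<A HREF="...">` element inside nested `<DL>` lists. The following is a minimal sketch of reading such a file with Python's standard `html.parser` module. It is not Reminiscence's own importer; the names `BookmarkParser` and `parse_bookmarks` and the sample data are assumptions made for illustration.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Hypothetical helper: collect (url, title) pairs from <A HREF="..."> entries."""

    def __init__(self):
        super().__init__()
        self.bookmarks = []
        self._href = None   # href of the <a> element currently open, if any
        self._title = []    # text chunks seen inside that <a> element

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._title = []

    def handle_data(self, data):
        if self._href is not None:
            self._title.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.bookmarks.append((self._href, "".join(self._title).strip()))
            self._href = None


def parse_bookmarks(html_text):
    """Return a list of (url, title) tuples found in a bookmark export."""
    parser = BookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks


# Made-up sample in the Netscape Bookmark layout.
sample = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>Reading</H3>
    <DL><p>
        <DT><A HREF="https://example.org/a" ADD_DATE="1500000000">First link</A>
        <DT><A HREF="https://example.org/b.pdf">Second link</A>
    </DL><p>
</DL><p>
"""

bookmarks = parse_bookmarks(sample)
for url, title in bookmarks:
    print(url, title)
```

The sketch ignores folder headings (`<H3>`) and attributes such as `ADD_DATE`; a real importer would probably map folders to directories, but how Reminiscence does this is not described here.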

Installation

First make sure that Python 3.9+ (3.10+ recommended) is installed on the system, then install the following packages using the native package manager.

 1. virtualenv

 2. ~wkhtmltopdf (for html to pdf/png conversion)~ deprecated from v4.0+ due to security vulnerability.

     * [hlspy](https://github.com/kanishka-linux/hlspy) is now the default headless browser. It is based on QtWebEngine.

 3. hlspy (mandatory from v4.0+)

 4. redis-server

 5. chromium (optional from v0.2+)

 6. PyQt5

 7. PyQtWebEngine

Installation of above dependencies in Arch or Arch based distros

 $ sudo pacman -S python-virtualenv redis chromium python-pyqt5 qt5-webengine python-pyqtwebengine

Installation of above dependencies in Debian or Ubuntu based distros

 $ sudo apt install virtualenv redis-server chromium-browser python3-pyqt5 python3-pyqt5.qtwebengine

Install hlspy

 $ sudo pip3 install git+https://github.com/kanishka-linux/hlspy

Note: The names of the above dependencies may differ depending on the distro or OS, so install accordingly. Once they are installed, the commands that follow are distro/platform independent.

Now execute following commands in terminal.

$ mkdir reminiscence

$ cd reminiscence

$ virtualenv -p python3 venv

$ python3 -m venv venv (alternative to the previous command, for python3.10+)

$ source venv/bin/activate

$ cd venv

$ git clone https://github.com/kanishka-linux/reminiscence.git

$ cd reminiscence

$ source hlspy.env

$ pip install -r requirements.txt

$ mkdir logs archive tmp

$ python manage.py generatesecretkey

$ python manage.py nltkdownload

$ python manage.py migrate

$ python manage.py createsuperuser

$ python manage.py runserver 0.0.0.0:8000

Open 0.0.0.0:8000 in any browser, log in and start adding links.

**Note:** replace the address above with the local IP address of your server to access the web interface from anywhere on the local network.

Admin interface available at: /admin/

Setting up Celery (mandatory from v0.4 onwards):

Generating PDFs and PNGs is resource intensive and time consuming. We can delegate these tasks to celery, so that they run in the background.
```
 Edit reminiscence/settings.py file and set `USE_CELERY = True`
```

Now open another terminal in the same topmost project directory and execute following commands:

 $ sudo systemctl start redis-server

 $ cd venv

 $ source bin/activate

 $ cd reminiscence

 $ source hlspy.env

 $ celery -A reminiscence worker --loglevel=info -c 4 --detach

Using Docker

Note: Following procedure may not work exactly from v4.0+. The dockerfiles have been updated but it is possible that users may still face some issues, so they are advised to make changes in respective Dockerfile or docker-compose as required.

Using Docker is more convenient than the normal installation method described above. It takes care of configuring and setting up gunicorn, nginx and the postgresql database, along with redis and the worker. (Setting up and running these three things manually can be a bit cumbersome; the manual procedure is described below in a separate section.) It also automatically downloads the headless browser hlspy and the nltk data set, apart from installing Python-based dependencies.

Note: from v4.0+, wkhtmltopdf is replaced with hlspy. Users are advised to migrate to v4.0 due to a security vulnerability in wkhtmltopdf. Users who find it difficult to migrate should at least disable automatic pdf/png generation of web pages in older reminiscence versions and use chromium manually for pdf generation instead.

Install docker and docker-compose
Enable/start the docker service. The instructions for enabling docker may differ between distros. A sample instruction for enabling/starting docker looks like this:
```
 $ systemctl enable/start docker.service
```

clone github repository and enter directory

 $ git clone https://github.com/kanishka-linux/reminiscence.git

 $ cd reminiscence

build and start

 $ sudo docker-compose up --build

 Note: Above instruction will take some time when executed for the first time.

Above step will also create default user: 'admin' with default password: 'changepassword'

If IP address of server is '192.168.1.2' then admin interface will be available at

 192.168.1.2/admin/

 Note: In this method, there is no need to
       attach port number to IP address.

Change the default admin password from the admin interface and create a new regular user. After that, log out and open '192.168.1.2'. Now log in as the regular user for regular activity.
For custom configuration, modify nginx.conf and the dockerfiles available in the repository. After that, execute the 'build and start' step (step 4) again.

Note: Windows users who face problems mounting the data volume for Postgres are advised to refer to this issue.

Note: Ubuntu 16.04 users might have to modify the docker-compose.yml file and change version 3 to 2 (see the related issue).

Note: For setting up celery inside docker, follow these instructions. Sometimes gunicorn doesn't work properly with the default background task handler inside docker. In such cases users can enable celery.

Documentation

Adding Directories And Links

Creating Directory

Users first have to create a directory from the web interface.

Note: Currently '/' and a few other special characters are not allowed in directory names. Users who face problems accessing a directory are advised to rename it and remove the special characters.

Adding Links

Users have to navigate to the required directory and then add links to it. URLs are initially fetched asynchronously from the source to gather metadata. Users have to wait a few seconds, after which the page refreshes automatically and shows the new content. If nothing shows up after the automatic page refresh (e.g. due to slow URL fetching), try refreshing the page manually by clicking on the directory entry again. Maybe in future, I will have to look into django channels and websockets to enable real-time duplex communication between client and server.

Automatic Tagging and Summarization

This feature has been implemented using the NLTK library. The library is used for proper tokenization and for removing stopwords from sentences. Once stopwords are removed, the top K high-frequency words (where the value of K is decided by the user) are used as tags. In order to generate a summary of HTML content, a score is allotted to each sentence based on the frequency of the non-stopwords contained in it. After that highests s
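
The two steps described above (top K frequent non-stopwords as tags, and frequency-based sentence scores) can be sketched roughly as follows. This is an illustration and not the project's actual code. To keep it runnable without downloading NLTK data, a regex tokenizer and a small hand-written stopword set stand in for NLTK. Summing word frequencies to get a sentence score is also an assumption, as are all function names.

```python
import re
from collections import Counter

# Assumption: a tiny hand-written list stands in for NLTK's stopword corpus.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def tokenize(text):
    """Lowercase word tokenizer (a regex stand-in for NLTK tokenization)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_frequencies(text):
    """Count every token that is not a stopword."""
    return Counter(w for w in tokenize(text) if w not in STOPWORDS)


def top_k_tags(text, k):
    """Return the k most frequent non-stopwords, to be used as tags."""
    return [word for word, _ in word_frequencies(text).most_common(k)]


def score_sentences(text):
    """Pair each sentence with the summed frequency of its non-stopwords."""
    freq = word_frequencies(text)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        (sum(freq[w] for w in tokenize(s) if w not in STOPWORDS), s)
        for s in sentences
    ]


sample = (
    "Redis is a fast cache. Celery uses Redis as a broker. "
    "Celery workers poll Redis."
)

tags = top_k_tags(sample, 2)
scores = score_sentences(sample)
print(tags)
for score, sentence in scores:
    print(score, sentence)
```

With the sample text, `redis` (3 occurrences) and `celery` (2 occurrences) come out as the two tags, and the sentences score 5, 7 and 7. How the scores are then turned into a summary is not covered by this sketch.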
