22 skills found
gosom / Google Maps Scraper: Scrape data from Google Maps. Extracts data such as the name, address, phone number, website URL, rating, number of reviews, latitude and longitude, reviews, email, and more for each place.
istresearch / Scrapy Cluster: This Scrapy project uses Redis and Kafka to create a distributed, on-demand scraping cluster.
adar2 / Facebook Posts Automation: Facebook automation, post automation, group social scraping, and scheduled and distributed posts from personal accounts, written in Python and Selenium.
get-set-fetch / Scraper: Node.js web scraper. Contains a command line, Docker container, Terraform module, and Ansible roles for distributed cloud scraping. Supported databases: SQLite, MySQL, PostgreSQL. Supported headless clients: Puppeteer, Playwright, Cheerio, JSdom.
siphonjs / Siphon: First distributed web scraping library for Node.js
Pyscodes-pro / FSOCIETY DDOS: Fsociety DDoS GUI is a **DDoS (Distributed Denial of Service)** attack tool based on **Python + PyQt5**, equipped with **automatic proxy scraping**, **asynchronous attacks**, and an **interactive graphical interface** to make it easier for users to manage attacks effectively.
tenlee2012 / Scrapy Kafka Redis: Distributed crawling/scraping, Kafka- and Redis-based components for Scrapy
ZenRows / Scaling To Distributed Crawling: Repository for the "Mastering Web Scraping in Python: Scaling to Distributed Crawling" blog post, with the final code.
leoncvlt / Etf4u: 📊 Python tool to scrape real-time information about ETFs from the web and mix them together by proportionally distributing their asset allocations
queue-xec / Master: Peer-to-peer distributed/serverless computing solution. Define the problem, the way workers solve it, and the code dependencies. Then the most difficult part is finding more workers/peers. Example use case: distributed web scraper
jianxunio / Yascrapy: A high-performance distributed web crawling and scraping framework written in Golang and Python.
mdxedia / Awsome Cash: Updated January 2016 Note on translation: These Website Terms of Service may have been translated into various languages for the convenience of Cash Loophole Users. While the translation is correct to the best of Cash Loophole knowledge, Cash Loophole is not responsible or liable in the event of an inaccuracy. English is the controlling language of these Terms of Service, and any translation has been prepared for you as a courtesy only. In the event of a conflict between the English-language version of these Terms of Service and a version that has been translated into another language, the English-language version shall control. The Cash Loophole Website, is an online information service with downloadable software, provided by Cash Loophole, and is subject to the terms and conditions set forth below. IMPORTANT: These terms and conditions constitute a legal agreement between you, the User (hereafter “You”, “Your”, or the “User”), and us, Cash Loophole, its affiliates, and all of their respective authorized representatives, officers, directors, employees, agents, shareholders, licensors, attorneys, successors, and assigns (hereafter “Us” or “Cash Loophole”), and together with the Website Privacy Policy and the Software License Agreement, wholly and exclusively govern such relationship. BEFORE ACCESSING OR USING THE SERVICES OFFERED ON FIVEMINUTEEXPERIMENT.CO, PLEASE READ CAREFULLY THE FOLLOWING TERMS AND CONDITIONS CONTAINED IN THIS WEBSITE TERMS OF SERVICE AGREEMENT. THESE TERMS GOVERN YOUR ACCESS TO AND USE OF THE SITE AND ANY PROGRAMS, SERVICES, TOOLS, SOFTWARE, MATERIALS, OR OTHER INFORMATION AVAILABLE THROUGH THE SITE OR USED IN CONNECTION THEREWITH (collectively, “the Site”). Cash Loophole IS WILLING TO LICENSE AND ALLOW THE USE OF THIS SITE ONLY ON THE CONDITION THAT YOU ACCEPT AND AGREE TO ALL OF THE TERMS AND CONDITIONS CONTAINED THEREIN. BY USING THE SITE, YOU THEREFORE AGREE TO BE BOUND BY THE TERMS AND CONDITIONS SET FORTH BELOW. 
IF YOU DO NOT WISH TO BE BOUND BY THESE TERMS AND CONDITIONS, YOU ARE NOT GRANTED PERMISSION TO ACCESS OR OTHERWISE USE THE SITE AND ARE INSTRUCTED TO EXIT THE SITE IMMEDIATELY. Cash Loophole RESERVES THE RIGHT TO MODIFY THIS AGREEMENT AT ANY TIME, WITHOUT NOTICE TO THE USER, AND SUCH MODIFICATIONS SHALL BE EFFECTIVE IMMEDIATELY UPON POSTING OF THE MODIFIED TERMS AND CONDITIONS ON THE SITE. YOU AGREE TO REVIEW THE AGREEMENT PERIODICALLY TO BE AWARE OF SUCH MODIFICATIONS AND YOUR CONTINUED ACCESS OR USE OF THE SITE SHALL BE DEEMED YOUR CONCLUSIVE ACCEPTANCE OF THE MODIFIED AGREEMENT. Revised versions of the Terms and Conditions shall be indicated by the date posted at the top of the Website Terms of Service page (i.e., “Updated [Date]”). PROPRIETARY RIGHTS. All intellectual property of or relating to the Site, including but not limited to content, information, patents, trademarks, copyrights, modules, techniques, know-how, computer code (including html code), algorithms, methods of doing business, user interfaces, graphic design, look and feel, and software; and all developments, derivatives, and improvements thereto, whether registered or not (collectively, “Intellectual Property”), unless otherwise indicated, are owned, controlled and licensed in their entirety by Cash Loophole, its affiliates, its successors and assigns, and/or by third parties who have granted Cash Loophole license to use such Intellectual Property. Publications, products, content or services referenced herein or on the Site are the exclusive trademarks or service-marks of Cash Loophole or their respective owners and are protected by law. Except as expressly provided herein, Cash Loophole does not grant any express or implied right to You or any other person under any intellectual or proprietary rights. 
Any downloadable or printable software, programs, information or materials available through the Site and all copyrights, trade secrets, and know-how related thereto, unless otherwise indicated, are owned by Cash Loophole or third party licensors. The website name, Cash Loophole, its logo, and all other names, logos and icons identifying the Cash Loophole website and its services are proprietary trademarks of Cash Loophole, and any use of such marks, such as domain names, without the express written permission of Cash Loophole is strictly prohibited. LIMITED LICENSE GRANT. The Site is provided by Cash Loophole, and conditional with the acceptance of this Website Terms of Service Agreement, provides You with a personal, revocable, limited, non-exclusive, royalty-free, non-transferable license to use the Site and download any programs, services, tools, materials, or information made available through or from the Site. Please note that access to download and terms of use of Cash Loophole downloadable software is contingent on acceptance of the separate Software License Agreement. The Website Terms of Service permit you to use and access for personal use only the Cash Loophole Website (a) on a single laptop, workstation, or computer and (b) on a mobile device from the Internet or through an on-line network. You may also download information from the Site into your laptop, workstation or computer’s temporary memory (RAM) and print and download materials and information from the Site solely for your personal non-commercial use, provided that all hard copies contain all copyright and other applicable notices. LICENSE RESTRICTIONS. The foregoing license is limited. 
YOU MAY NOT MODIFY, COPY, STORE, REPRODUCE, REPUBLISH, UPLOAD, POST, TRANSMIT, LICENSE, SUBLICENSE, DISPLAY, RENT, LEASE, SELL, COMMERCIALLY EXPLOIT, OR DISTRIBUTE, IN ANY MANNER, ANY DATA, INTELLECTUAL PROPERTY OR MATERIAL PROVIDED BY Cash Loophole THROUGH THE SITE, IN ANY MANNER NOT EXPRESSLY PERMITTED BY THESE TERMS OF SERVICE. THE ABOVE RESTRICTION INCLUDES, BUT IS NOT LIMITED TO TEXT, GRAPHICS, CODE AND/OR SOFTWARE. In addition, you may not modify, translate, decompile, create any derivative work(s) of, disassemble, broadcast, publish, remove or alter any proprietary notices or labels, grant a security interest in, or otherwise use the Site in any manner not expressly permitted herein. Moreover, you may not (i) use any “deep link,” “page scrape,” “robot,” “spider” or other automatic device, program, script, algorithm, or methodology, or any similar or equivalent manual process, to access, acquire, copy, or monitor any portion of the Site or in any way reproduce or circumvent the navigational structure or presentation of the Site to obtain or attempt to obtain any materials, documents, or information through any means not purposely made available through the Site, OR (ii) attempt to gain unauthorized access to any portion or feature of the Site, including, without limitation, the account of any other Authorized User(s), any other systems or networks connected to the Site or its servers, to any of the services offered on or through the Site, by hacking, password “mining”, or any other illegitimate or prohibited means, OR (iii) probe, scan or test the vulnerability of the Site or any network connected to the Site, nor breach the security or authentication measures on the Site or any network connected to the Site, OR (iv) reverse look-up, trace, or seek to trace any information on any other Authorized User of or visitor to the Cash Loophole Site, OR (v) take any action that imposes an unreasonable or disproportionately large load on the infrastructure of the Site, 
the system, networks, or any systems or networks connected thereto, OR (vi) use any device, software, or routine to interfere with the proper working of the Site or transaction conducted on the Site, or with any other person’s use of the Site, OR (vii) forge headers, impersonate a person, or otherwise manipulate identifiers in order to disguise your identity or the origin of any message or transmittal you send to Cash Loophole on or through the Site, OR (viii) use the Site to collect e-mail addresses or other contact or personal information, OR (ix) market, co-brand, private label, appropriate, use the Cash Loophole name, or a name similar thereto on a different domain, separately distribute, resell, or otherwise permit third parties to access and use the Site, in whole or in part, without the express, separate and prior written permission of Cash Loophole, OR (x) use the Site in any other unlawful manner or in a manner that could be perceived to damage, disparage, or otherwise negatively impact Cash Loophole. 4.Moreover, this license is only valid where Cash Loophole is permitted to operate. Access to and use of this site in contravention of any laws or regulations, or where prohibited by law, is unauthorized and not permitted by Cash Loophole. THIRD PARTY INFORMATION/ PRODUCTS/ SERVICES/ LINKS TO OTHER SITES. The Site may contain information, data, links, promotional offers, or other content in any form, including financial information related to third parties. Such information is provided only for Your convenience and as a bonus service, and will not be considered financial advisement. In no case whatsoever shall Cash Loophole be liable for such content or any damages or losses that result from reliance thereon. 
You understand that, except for information, products or services clearly identified as being supplied by Cash Loophole, Cash Loophole is not affiliated with, is not responsible for, and does not operate, control or endorse any information, products or services offered by third parties that are provided on the Site in any way. Cash Loophole makes no representations whatsoever, nor does it guarantee or endorse, the quality, non-infringement, accuracy, completeness or reliability of such third-party materials, programs, products displayed on this Site or which You may access through a link on this Site. Your correspondence or any other dealings with such third parties found on this Site are solely between you and such third party. Accordingly, Cash Loophole EXPRESSLY DISCLAIMS RESPONSIBILITY FOR THE CONTENT, MATERIALS, ACCURACY, AND/OR QUALITY OF THE INFORMATION, PRODUCTS AND/OR SERVICES AVAILABLE THROUGH OR ADVERTISED ON THESE THIRD-PARTY WEBSITES. DISCLAIMER – NO WARRANTIES. You understand and accept that Cash Loophole cannot and does not guarantee or warrant that files available for downloading through the Site will be free of infection or viruses, worms, Trojan horses or other code that manifest contaminating or destructive properties. You are responsible for implementing sufficient procedures and checkpoints on your personal computer to satisfy your particular requirements for accuracy of data input and output, and for maintaining a means external to the Site for the reconstruction of any lost data.YOU UNDERSTAND AND AGREE TO ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THE SITE. Cash Loophole PROVIDES THE SITE AND RELATED INFORMATION “AS IS” AND DOES NOT MAKE ANY EXPRESS OR IMPLIED WARRANTIES, REPRESENTATIONS OR ENDORSEMENTS WHATSOEVER. Cash Loophole SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF TITLE, MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
WITH REGARD TO THE SITE, THE PERSONAL ACCOUNT REPRESENTATIVE SERVICE, OR ANY INFORMATION OR THIRD-PARTY INFORMATION OR LINKS PROVIDED THEREON, Cash Loophole SHALL NOT BE LIABLE FOR ANY COST OR DAMAGE ARISING EITHER DIRECTLY OR INDIRECTLY FROM ANY SUCH TRANSACTION. IT IS SOLELY YOUR RESPONSIBILITY TO EVALUATE THE ACCURACY, COMPLETENESS AND USEFULNESS OF ALL OPINIONS, ADVICE, SERVICES, MERCHANDISE AND OTHER INFORMATION PROVIDED THROUGH THE SERVICE. Cash Loophole DOES NOT WARRANT THAT THE SERVICE WILL BE UNINTERRUPTED OR ERROR-FREE OR THAT DEFECTS IN THE SERVICE WILL BE CORRECTED. YOU UNDERSTAND FURTHER THAT THE PURE NATURE OF THE INTERNET CONTAINS UNEDITED MATERIALS SOME OF WHICH ARE SEXUALLY EXPLICIT OR MAY BE OFFENSIVE TO YOU. YOUR ACCESS TO SUCH MATERIALS IS AT YOUR OWN RISK. Cash Loophole HAS NO CONTROL OVER AND ACCEPTS NO RESPONSIBILITY WHATSOEVER FOR SUCH MATERIALS. LIMITATION OF LIABILITY. YOU EXPRESSLY ABSOLVE AND RELEASE Cash Loophole FROM ANY CLAIM OF HARM RESULTING FOR A CAUSE BEYOND Cash Loophole CONTROL, INCLUDING BUT NOT LIMITED TO FAILURE OF ELECTRONIC OR MECHANICAL EQUIPMENT OR COMMUNICATION LINES FOR ANY REASON, SUCH AS MAINTENANCE, DENIAL OF SERVICE ATTACKS, TELEPHONE OR OTHER COMMUNICATION PROBLEMS, COMPUTER VIRUSES, UNAUTHORIZED ACCESS, THEFT, OPERATOR ERRORS, FORCE MAJEURE EVENT SUCH AS SEVERE WEATHER, EARTHQUAKES, NATURAL DISASTERS, STRIKES, LABOR PROBLEMS, WARS, OR GOVERNMENTAL RESTRICTION OR ACTION. 
MOREOVER, IN NO EVENT WILL Cash Loophole BE LIABLE FOR ANY INCIDENTAL, CONSEQUENTIAL, INDIRECT, PUNITIVE, OR SPECIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, LOSS OF PROGRAMS OR INFORMATION, AND THE LIKE) ARISING OUT OF OR IN ANY WAY CONNECTED WITH THE USE OF OR INABILITY TO USE THE SITE’S SERVICE, OR ANY INFORMATION, OR TRANSACTIONS PROVIDED OR DOWNLOADED FROM THE SITE, OR ANY DELAY OF SUCH INFORMATION OR SERVICE, EVEN IF Cash Loophole HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, OR ANY CLAIM ATTRIBUTABLE TO ERRORS, OMISSIONS, OR OTHER INACCURACIES IN THE SITE AND/OR MATERIALS OR INFORMATION DOWNLOADED THROUGH THE SITE. BECAUSE SOME JURISDICTIONS DO NOT ALLOW THE EXCLUSION OR LIMITATION OF LIABILITY FOR CONSEQUENTIAL OR INCIDENTAL DAMAGES, THE ABOVE LIMITATION MAY NOT APPLY TO YOU. NOTWITHSTANDING THE FOREGOING, TOTAL LIABILITY OF Cash Loophole FOR ANY REASON RELATED TO USE OF THE SITE SHALL NOT EXCEED THE TOTAL AMOUNT PAID BY YOU TO Cash Loophole IN CONNECTION WITH THE SUBJECT MATTER OF THE PARTICULAR DISPUTE DURING THE PRIOR THREE MONTHS. INDEMNIFICATION.You agree to indemnify, defend and hold harmless Cash Loophole, its affiliates, and all of their respective officers, directors, employees, agents, licensors, attorneys, successors, and assigns from and against all claims, proceedings, injuries, liabilities, losses, damages, costs, and expenses, including reasonable attorneys’ fees and litigation expenses, relating to or arising from any breach or violation of this Agreement by You (including negligent or reckless conduct). Each of the above referenced individuals or entities reserves the right to assert and enforce these provisions directly against you, on their own behalf. USER OBLIGATIONS. If you provide any false, inaccurate, untrue, or incomplete information, Cash Loophole reserves the right to terminate immediately Your access to and use of the Site and any downloadable software. 
You agree to abide by all applicable local, state, national, and international laws and regulations with respect to Your use of the Site and its related services. In addition, You acknowledge and agree that use of the Internet and access to or transmissions or communications with the Site is solely at your own risk. While Cash Loophole has endeavored to create a secure and reliable Site, you should understand that the confidentiality of any such communications cannot be guaranteed. Accordingly, Cash Loophole is not responsible for the security, or any breach thereof, of any information transmitted to or from the Site. You agree to assume all responsibility concerning activities related to Your use of the Site, including but not limited to obtaining and paying for all licenses and costs for third-party software and hardware necessary for implementation of the Site and its downloadable software, and maintaining or backing up any data. 10. USER NAME AND PASSWORD POLICY. Registration as an authorized user for access to certain areas of the Site may require both a user name and password. Only one authorized user can use one user name and password and account. Multiple accounts registered by the same individual or entity is not permitted and may result in one, some or all accounts being closed by Cash Loophole. By using the Site, you agree to keep your user name and password as confidential information. You also agree not to use another authorized user’s account. Should you become aware of any loss or theft of your password or any unauthorized use of your name and password, you will immediately notify Cash Loophole. Cash Loophole cannot and will not be liable for any loss or damage arising from your failure to comply with these obligations. Cash Loophole also reserves the right to delete or change (with notice) a user name or password at any time and for any reason. FEEDBACK AND SUBMISSIONS. 
You grant to Cash Loophole team the right to use your name in connection with any materials freely submitted by You and any other information as well as in connection with all advertising, marketing and promotional material related thereto. You agree that you shall have no recourse against Cash Loophole for any alleged or actual infringement or misappropriation of any proprietary right in your communications with the Site. Registered Site Users will have the opportunity to submit feedback and information regarding their trading activity through the software and through the website, which will be subsequently displayed on the website on an anonymous basis. Such information is submitted on a voluntary basis. Cash Loophole maintains no control over the accuracy or correctness of such self-reporting and accordingly disclaims all liability from User reliance on this data. PRIVACY POLICY. You understand, acknowledge and agree that the operation of certain programs, services, tools, materials, or information of the Site requires the submission, use and dissemination of various personal identifying information. Accordingly, if you wish to access and use those programs, services, tools, materials, or information on the Site, you acknowledge and agree that your use of the Site will constitute acceptance of Cash Loophole personal identifying information collection and use practices to protect your personal information. Please read our Privacy Policy before providing any personal data on this Site. VOID WHERE PROHIBITED. Any offer for any product or service made on this Site is void where prohibited. Moreover, Cash Loophole makes no representations regarding the legality of access to or use of the Site or its content in any country. Although the Site may be accessible worldwide, not all features, products or services provided or offered through or on the Site are appropriate or available for use in all countries. 
Cash Loophole reserves the right to limit, in its discretion, the provision and quantity of any feature, product or service to any person or geographic area. If You access the Site from a jurisdiction where prohibited, You do so at your own risk and You are solely responsible for complying with all applicable local regulations. People under 18 years of age are not permitted to use the Cash Loophole website. 15. NO ADVICE. You acknowledge that neither the Site or the Personal Account Representative service, is not authorized to offer any legal, tax, accounting advice, or recommendation regarding suitability, profitability, investment strategy or other matter. 17. ENFORCING SITE SECURITY. Actual or attempted unauthorized use of this Site may result in criminal and/or civil prosecution. Cash Loophole reserves the right to view, monitor, and record activity on the Site without notice or permission from the User, including, without limitation, by archiving notices or communications sent by you through the Site. In addition, Cash Loophole reserves the right, at any time and without notice, to modify, suspend, terminate or interrupt operation of or access to the Site, or any portion thereof, in order to protect the Site or Cash Loophole business. NOTICE OF SECURITY BREACH. In addition to the indemnification obligation stated in these Terms of Service, if you become aware of a breach or potential breach of security with respect to any personally identifiable information provided to or made available by Cash Loophole, or any unauthorized hacking of the Site, you shall (i) immediately notify Cash Loophole of such breach or potential breach, (ii) assist Cash Loophole as reasonably necessary to prevent or rectify any such breach, and (iii) enable Cash Loophole to comply with any applicable laws requiring the provision of notice of a security breach with respect to any impacted personally identifiable information. TERM AND TERMINATION. 
These Terms of Service govern Your right to use the Site will take effect at the moment you access or use the Site and is effective until terminated, as set forth below. This Agreement may be terminated by Cash Loophole without notice, at any time, and for any reason. In addition, Cash Loophole reserves the right at any time and on reasonable grounds, such as any reasonable belief of fraudulent or unlawful activity or actions or omissions that violate any term or condition of these Terms, to deny your access to the Site, in whole or in part, in order to protect its name and goodwill, its business and/or other authorized users, or if you fail to comply with these Terms, subject to the survival rights of certain provisions identified below. Termination is effective without notice. You may also terminate this Agreement at any time by ceasing to use the Site, subject to the survival rights below. Upon termination, You must destroy all copies of any aspect of the Site that you have made and remove downloaded software from Your possession. The following provisions shall survive termination of the Website Terms of Service Agreement for any reason: Proprietary Rights (§1), Limited License Grant (§2), License Restrictions (§3), Third Party Information (§4), Disclaimer (§5), Limitation of Liability (§6), Indemnification (§7), Governing Law (§17), and Miscellaneous (§18). GOVERNING LAW AND DISPUTE RESOLUTION. These Terms of Service and all disputes or claims arising out of or related thereto shall be governed by the laws of Cyprus, without applying conflict of law rules. Any cause of action or claim arising out of use of the Site must be commenced within one (1) year after the claim or cause of action arises, or such claim or cause of action is barred. Claimant and Cash Loophole waive their rights to a jury trial and participation in class action litigation. 
All disputes arising out of or relating to these Terms of Service shall be resolved by binding arbitration, except that Cash Loophole is not required to arbitrate any dispute regarding confidentiality, infringement, misappropriation, or misuse of any intellectual property right, or any other claim where interim relief from a court is sought to prevent serious and irreparable injury to Cash Loophole or any other person or entity. You acknowledge that any breach, threatened or actual, could cause irreparable injury to Cash Loophole that is not quantifiable in monetary damages. You agree that Cash Loophole shall be entitled to seek and be awarded an injunction or other appropriate equitable relief to restrain any breach of Your obligations under these Terms. Accordingly, you waive any requirement that Cash Loophole post any bond or other security in the event that any injunctive or equitable relief is sought by or awarded to Cash Loophole to enforce any provision of these Terms. MISCELLANEOUS. You agree that these Terms are for the benefit of the User, Cash Loophole, and Cash Loophole licensors. Therefore, these Terms are personal to You and not assignable. No joint venture, partnership, employment, or agency relationship exists between You and Cash Loophole as a result of these Terms of Service or arising out of your use of the Site. Cash Loophole failure to insist upon or enforce strict performance of any provision of this Agreement shall not be construed as a waiver of any provision or right under these Terms or at law. Neither the course of conduct between the parties nor trade practice shall act to modify any provision of this Agreement. Cash Loophole may assign its rights and duties under this Agreement to any party and at any time, without notice to the User. Headings herein are for convenience only. 
These Terms of Service, along with Cash Loophole Website Privacy Policy and the Software License Agreement, represent the entire agreement between You and Cash Loophole with respect to use of the Site, and supersedes all prior or contemporaneous communications and proposals, whether electronic, oral, or written between You and Cash Loophole. SEVERABILITY. If any provision of these Terms of Service is ruled invalid or otherwise unenforceable by a court of competent jurisdiction or on account of a conflict with an applicable government regulation, such determination shall not affect the remaining provisions (or parts thereof) contained herein. Any invalid or unenforceable portion should be deemed amended in order to achieve as closely as possible the same effect as the Terms of Service as original drafted. Cash Loophole © 2016 All rights reserved.
SarthakV7 / Clustering Barron S 333 Word List Using Unsupervised Machine Learning: Covering Natural Language Processing (NLP), Term Frequency-Inverse Document Frequency (TF-IDF), Singular Value Decomposition (SVD), K-Means, t-Distributed Stochastic Neighbor Embedding (t-SNE), and many other techniques for data scraping, feature engineering, and data visualization to demonstrate how we can cluster data from scratch.
codeuniversity / Smag Mvp: Social Record - Distributed scraping and analysis pipeline for a range of social media platforms
Jai-Agarwal-04 / Sentiment Analysis With Insights: Sentiment Analysis with Insights using NLP and Dash. This project shows sentiment analysis of text data using NLP and Dash. I used the Amazon reviews dataset to train the model and then scraped reviews from Etsy.com to test it. Prerequisites: Python3, Amazon Dataset (3.6GB), Anaconda. How was this project made? This project was built using Python3 to predict sentiment with machine learning, with an interactive dashboard for testing reviews. To start, I downloaded the dataset and extracted the JSON file. Next, I took out a portion of 792,000 reviews, equally distributed into chunks of 24,000 reviews, using pandas. The chunks were then combined into a single CSV file called balanced_reviews.csv. This balanced_reviews.csv served as the base for training my model and was filtered on the basis of review rating greater than 3 and less than 3. This filtered data was then vectorized using the TF-IDF vectorizer. After training the model to 90% accuracy, reviews were scraped from Etsy.com to test the model. Finally, I built a dashboard in which we can check the sentiment of input given by the user or of reviews scraped from the website. What is CountVectorizer? CountVectorizer is a tool provided by the scikit-learn library in Python. It transforms a given text into a vector on the basis of the frequency (count) of each word that occurs in the entire text. This is helpful when we have multiple such texts and wish to convert each word in each text into vectors (for use in further text analysis). CountVectorizer creates a matrix in which each unique word is represented by a column, and each text sample from the document is a row. The value of each cell is the count of the word in that particular text sample. What is TF-IDF Vectorizer? 
TF-IDF stands for Term Frequency - Inverse Document Frequency and is a statistic that aims to better define how important a word is for a document, while also taking into account its relation to other documents from the same corpus. It is computed by looking at how many times a word appears in a document while also paying attention to how many times the same word appears in other documents in the corpus. The rationale is the following: a word that frequently appears in a document has more relevance for that document, meaning there is a higher probability that the document is about or related to that specific word; a word that frequently appears in many documents may prevent us from finding the right document in a collection, because the word is relevant either for all documents or for none. Either way, it will not help us filter out a single document or a small subset of documents from the whole set. TF-IDF is therefore a score applied to every word in every document in our dataset. For every word, the TF-IDF value increases with every appearance of the word in a document, but gradually decreases with every appearance in other documents. What is Plotly Dash? Dash is a productive Python framework for building web analytic applications. Written on top of Flask, Plotly.js, and React.js, Dash is ideal for building data visualization apps with highly custom user interfaces in pure Python. It's particularly suited for anyone who works with data in Python. Dash apps are rendered in the web browser. You can deploy your apps to servers and then share them through URLs. Since Dash apps are viewed in the web browser, Dash is inherently cross-platform and mobile ready. Dash is an open source library, released under the permissive MIT license. Plotly develops Dash and offers a platform for managing Dash apps in an enterprise environment. What is Web Scraping? 
Web scraping is a term used to describe the use of a program or algorithm to extract and process large amounts of data from the web. Running the project Step 1: Download the dataset and extract the JSON data into your project folder. Make a folder named filtered_chunks and run the data_extraction.py file. This will extract data from the JSON file into equal-sized chunks and then combine them into a single CSV file called balanced_reviews.csv. Step 2: Run the data_cleaning_preprocessing_and_vectorizing.py file. This will clean and filter the data. Next, the filtered data will be fed to the TF-IDF Vectorizer, the model will be pickled in a trained_model.pkl file, and the vocabulary of the trained model will be stored as vocab.pkl. Keep these two files in a folder named model_files. Step 3: Now run the etsy_review_scrapper.py file. Adjust the range of pages and products to be scraped, as it might take a long time to process. A small amount of data is sufficient to check the accuracy of the model. The scraped data will be stored in a CSV file as well as a db file. Step 4: Finally, run the app.py file, which starts the Dash server. We can then check the model either by typing a review or by selecting one of the preloaded scraped reviews.
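The count matrix and TF-IDF weighting described in this entry can be illustrated with a minimal pure-Python sketch. It is not the project's code: the function names, the whitespace tokenizer, and the sample reviews are assumptions made for illustration, and the weighting uses the plain textbook form count × log(N / document frequency). scikit-learn's TfidfVectorizer applies a smoothed IDF and row normalization by default, so its numbers would differ.

```python
import math
from collections import Counter


def count_vectors(texts):
    """Build a count matrix: one row per text, one column per unique word."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokenized = [text.lower().split() for text in texts]
    vocab = sorted({word for doc in tokenized for word in doc})
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def tfidf_vectors(texts):
    """Weight each count by log(N / df), the unsmoothed textbook IDF."""
    vocab, counts = count_vectors(texts)
    n_docs = len(texts)
    # Document frequency: number of texts in which each word appears.
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]
    return vocab, weighted


# Hypothetical sample reviews, not taken from the Amazon or Etsy data.
reviews = ["great product great price", "bad product", "great service"]
vocab, counts = count_vectors(reviews)
_, weights = tfidf_vectors(reviews)
```

In the first sample review, "price" appears once but in only one text, while "product" appears once and in two texts, so "price" receives the larger weight. That is the behaviour the description attributes to TF-IDF: words shared across many documents are down-weighted.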
lewapek / Observability Demo Apps: (4 Scala apps + Postgres + Jaeger + Prometheus + Kafka) used to demonstrate distributed tracing with OpenTelemetry, with some metrics scraped by a Prometheus server
crawlerlab / Aragog: Distributed web scraping framework
Datakult0r / AI Grant Crawler A2a: 24/7 autonomous grant discovery system with MCP web scraping, A2A distributed agents, and Google Sheets automation for AI agencies
chipscoco / OceanMonkey: OceanMonkey is a high-level distributed web crawling and web scraping framework based on multiple processes and coroutines, used to crawl websites and extract structured data from their pages, like the classic Scrapy framework.
BlakeASmith / DistributedWebScraping: A distributed web scraping framework written in Kotlin and Golang that supports using Linux machines and Android devices as nodes.