41 skills found · Page 2 of 2
stoianmihail / CHT: Implementation of the compact "Hist-Tree" (Andrew Crotty). Used in PLEX.
OrysyaStus / UCSD Data Mining Certificate: Modern databases can contain massive volumes of data. Within this data lies important information that can only be effectively analyzed using data mining. Data mining tools and techniques can be used to predict future trends and behaviors, allowing individuals and organizations to make proactive, knowledge-driven decisions. This expanded Data Mining for Advanced Analytics certificate provides individuals with the skills necessary to design, build, verify, and test predictive data models. Newly updated with added data sets, a robust practicum course, a survey of popular data mining tools, and additional algorithms, this program equips students with the skills to make data-driven decisions in any industry. Students begin by learning foundational data analysis and machine learning techniques for model and knowledge creation. Then students take a deep dive into the crucial step of cleaning, filtering, and preparing the data for mining and predictive or descriptive modeling. Building upon the skills learned in the previous courses, students then learn advanced models, machine learning algorithms, methods, and applications. In the practicum course, students use real-life data sets from various industries to complete data mining projects, planning and executing all the steps of data preparation, analysis, learning, and modeling, and identifying the predictive/descriptive model that produces the best evaluation scores. Electives allow students to learn further high-demand techniques, tools, and languages.
meghdadk / DB Unlearning: An implementation of the SIGMOD24 paper "Machine Unlearning in Learned DBs: An Experimental Analysis".
Reena-senthilkumar / MYSQL: 💡 I just started my MySQL learning journey! I've recently begun learning MySQL, a powerful relational database management system. I'm exploring how to create and manage databases, write SQL queries, and understand how data is stored, retrieved, and organized. So far, I've learned SQL commands such as CREATE, SELECT, INSERT, UPDATE, and DELETE.
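The commands the entry above lists cover the basic lifecycle of a table. A minimal sketch of all five, run here against Python's built-in sqlite3 as a stand-in for MySQL (the SQL itself is the same at this level; the `students` table and names are made-up examples):

```python
import sqlite3

# In-memory database as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# CREATE: define a table.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT)")

# INSERT: add a row (parameterized to avoid SQL injection).
cur.execute("INSERT INTO students (name) VALUES (?)", ("Reena",))

# UPDATE: change an existing row.
cur.execute("UPDATE students SET name = ? WHERE id = ?", ("Reena S.", 1))

# SELECT: read the data back.
rows = cur.execute("SELECT id, name FROM students").fetchall()

# DELETE: remove the row again.
cur.execute("DELETE FROM students WHERE id = ?", (1,))
conn.commit()
```

With a real MySQL server the only change would be the connection step (e.g., a MySQL driver instead of `sqlite3.connect`); the five statements read the same.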
var-skip / Var Skip: Code for the variable-skipping paper (ICML 2020).
jasonclark / Youtube Digital Library: MSU Library has created a digital video library using the YouTube API to power our local library channel. It is a complete search-and-browse app with item-level views, microdata, a caching and optimization routine, and a file backup routine. The article discusses applying the YouTube API as a database application layer, the workflow efficiencies gained, and metadata procedures, as well as local backup and optimization procedures. Code samples in PHP, .htaccess examples, and shell commands used in developing the app and routines are explained at length. Finally, a complete prototype app is released on GitHub so that other libraries can get started using the lessons learned. A live version of the app is here: http://www.lib.montana.edu/channel/. The real benefit of this method is the low overhead for smaller shops and the ability to scale production and distribution of digital video.
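The core of "YouTube API as a database application layer" is querying a channel the way you would query a table. A minimal sketch using the current YouTube Data API v3 `search` endpoint (the article's own samples are in PHP; the API key and channel ID here are placeholders you supply):

```python
from urllib.parse import urlencode

# YouTube Data API v3 search endpoint.
API_BASE = "https://www.googleapis.com/youtube/v3/search"

def channel_search_url(api_key, channel_id, query, max_results=25):
    """Build a search request scoped to one channel: the channel acts as
    the 'database', and this is the query against it."""
    params = {
        "part": "snippet",       # return titles, descriptions, thumbnails
        "channelId": channel_id, # restrict results to the library channel
        "q": query,              # the user's search terms
        "maxResults": max_results,
        "type": "video",
        "key": api_key,
    }
    return API_BASE + "?" + urlencode(params)
```

Fetching the URL (e.g., with `urllib.request.urlopen`) returns JSON whose `items` list can be cached locally, which is where the article's caching and backup routines come in.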
ShahadShaikh / Hive Case Study: Problem Statement

Introduction: So far in this course, you have learned about the Hadoop framework, RDBMS design, and Hive querying, and you have understood how to work with an EMR cluster and write optimised queries in Hive. This assignment tests the Hive and Hadoop concepts learned throughout the course. Like a big data analyst, you will be required to extract the data, load it into Hive tables, and gather insights from the dataset.

With online sales gaining popularity, tech companies are exploring ways to improve their sales by analysing customer behaviour and gaining insights about product trends. Websites also make it easier for customers to find the products they need without much scavenging. Needless to say, big data analyst is among the most sought-after job profiles of this decade. As part of this assignment, you will act as a big data analyst, extracting data and gathering insights from a real-life dataset of an e-commerce company.

One of the most popular use cases of big data is in e-commerce companies such as Amazon or Flipkart. Before getting into the details of the dataset, let us understand how e-commerce companies use these concepts to give customers product recommendations. This is done by tracking your clicks on their website and searching for patterns within them; this kind of data is called clickstream data. Clickstream data contains the logs of how you navigated through the website, along with other details such as the time spent on every page. Companies use data-ingestion frameworks such as Apache Kafka or AWS Kinesis to store it in frameworks such as Hadoop. From there, machine learning engineers or business analysts use this data to derive valuable insights.

For this assignment, you will work with a public clickstream dataset of a cosmetics store. Your job is to extract the kind of valuable insights that data engineers generally produce in an e-retail company. You will find the data at the links below:
https://e-commerce-events-ml.s3.amazonaws.com/2019-Oct.csv
https://e-commerce-events-ml.s3.amazonaws.com/2019-Nov.csv
A table describing the dataset's attributes is provided with the assignment.

The implementation phase can be divided into the following parts:
1. Copying the dataset into HDFS: launch an EMR cluster that utilises the Hive services, and move the data from the S3 bucket into HDFS.
2. Creating the database and running Hive queries on your EMR cluster: create the structure of your database, use optimisation techniques to run your queries as efficiently as possible, and show the performance improvement after applying an optimisation to any single query. Run Hive queries to answer the questions below.
3. Cleaning up: drop your database and terminate your cluster.

You are required to answer the following questions:
1. Find the total revenue generated due to purchases made in October.
2. Write a query to yield the total sum of purchases per month in a single output.
3. Write a query to find the change in revenue generated due to purchases from October to November.
4. Find the distinct categories of products. Categories with a null category code can be ignored.
5. Find the total number of products available under each category.
6. Which brand had the maximum sales in October and November combined?
7. Which brands increased their sales from October to November?
8. Your company wants to reward the top 10 users of its website with a Golden Customer plan. Write a query to generate a list of the top 10 users who spend the most.

Notes:
- To write your queries, make the necessary optimisations, such as selecting the appropriate table format and using partitioned/bucketed tables. You will be awarded marks for enhancing the performance of your queries. Each question should have one query only.
- Use a 2-node EMR cluster with both the master and core nodes as m4.large. Make sure you terminate the cluster when you are done working with it. Since EMR can only be terminated, not stopped, always keep a copy of your queries in a text editor so that you can paste them each time you launch a new cluster.
- Do not leave PuTTY idle for too long; do some activity, such as pressing the space bar at regular intervals. If the terminal does become inactive, you do not have to start a new cluster: reconnect to the master node by opening the PuTTY terminal again, entering the host address, and loading the .ppk key file.
- If you are using an emr-6.x release, certain queries might take longer; we suggest using the emr-5.29.0 release for this case study.
- There are different options for storing the data in an EMR cluster, which you can briefly explore in the linked documentation. In the previous module on Hive querying, you copied the data to the local file system, i.e., the master node's file system, and performed the queries there. Since the dataset in this case study is large, it is good practice to load the data into HDFS rather than the local file system. You can revisit the segment on 'Working with HDFS' from the earlier module on 'Introduction to Big Data and Cloud'.
- You may have to use CSVSerde with the default property values to load the dataset into a Hive table, and you may want to prevent the column headers from being inserted into the Hive table; see the linked references on CSVSerde and on skipping headers.
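As a sanity check for questions 1-3 above, the same monthly-revenue aggregation the Hive queries compute can be sketched over the raw CSVs in plain Python. This assumes the column names of the public cosmetics-store clickstream dump (`event_time`, `event_type`, `price`); adjust them if your copy of the dataset differs:

```python
import csv
from collections import defaultdict

def monthly_purchase_revenue(paths):
    """Sum the price of rows whose event_type is 'purchase', grouped by
    the YYYY-MM prefix of event_time. Equivalent in spirit to
    SELECT substr(event_time, 1, 7), SUM(price) ... WHERE event_type = 'purchase'
    GROUP BY substr(event_time, 1, 7)."""
    revenue = defaultdict(float)
    for path in paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                if row["event_type"] == "purchase":
                    # event_time looks like '2019-10-01 00:00:00 UTC'.
                    revenue[row["event_time"][:7]] += float(row["price"])
    return dict(revenue)
```

Running it over both files gives per-month totals, so the October figure, the per-month sums, and the October-to-November change all fall out of one pass.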
Halum / Social Network Database: This is our project for the database lab. We are creating a small social-network database to prove to ourselves that we learned the material well.
lavantien / Chess Repertoire Database: A SCID repertoire database I'm making, based on what I've learned from Chessbrah's Building Habits series.
zjiaqi725 / FLAIR: [ICML 2025] Official repo for the paper "In-Context Adaptation to Concept Drift for Learned Database Operations".
ousstrk / Stack Overflow 2024 Survey Analysis: The Stack Overflow 2024 survey, with 65,437 participants, is summarized in four dashboards analyzing how coding was learned, correlations between tools (languages, databases, platforms, OS, AI), and participants' views on AI.
AditiDande20 / JetWeatherApp: Another app from the Jetpack Compose course, showing current weather and forecasts. Learned the concept of clean architecture along with other advanced concepts such as DI, Retrofit, the Room database, and Jetpack Navigation.
Ara225 / House Price Index Api: A small project with the goal of building an API for the UK's House Price Index. It's all in Node.js, using AWS's CDK framework to define the infrastructure, API Gateway + Lambda for the API, Jest for tests, and a MySQL RDS database to store the information. The choice of dataset is mostly arbitrary: I just wanted a real-life dataset to practice building an API around. It's far from perfect, but I learned a lot about RDS in the process.
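The API Gateway + Lambda pattern the entry above describes boils down to a handler that reads a path parameter, queries the datastore, and returns a JSON response. A minimal sketch (in Python rather than the project's Node.js; the route shape, field names, and the `fetch_rows` stub standing in for the RDS query are all assumptions for illustration):

```python
import json

def get_region_index(region, fetch_rows=None):
    """Handle GET /index/{region}: look up rows for a region and wrap
    them in the response shape API Gateway's Lambda proxy integration
    expects. fetch_rows stands in for the MySQL query so the sketch
    runs without a database."""
    rows = (fetch_rows or (lambda r: []))(region)
    if not rows:
        return {"statusCode": 404,
                "body": json.dumps({"error": "region not found"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"region": region, "index": rows}),
    }

def handler(event, context):
    # API Gateway's proxy integration delivers path parameters here.
    return get_region_index(event["pathParameters"]["region"])
```

Keeping the lookup behind an injected function is also what makes the handler testable under Jest-style unit tests without a live RDS instance.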
inikhilkedia / CS612 Assignment 5: ** This assignment combines a few things you have learned in this class and will require a little learning on your own. Do your best and be creative. If you need help, ask sooner rather than later on Slack; your classmates and I are here to help, so do not wait until the last minute to do this assignment. ** You may work in two-person teams; if you plan to do so, please email me to let me know who you are working with. You may NOT work in multiple teams. ** You may use Python, Java, or Node.js (JavaScript) for this assignment. ** Some tutorials are listed below, but you may need to google more on your own.

What You Will Do: You will create a RESTful web service that runs in a Docker container. Your web service will contain two GET routes:
- One that displays a collection of records
- One that displays a single record that corresponds to an ID
Example: two routes, /customers and /customers/35 (where 35 is the ID of a given customer in my database). The data returned from your web service routes must be in JSON or XML form. If you would like to load your results in a web page, you are welcome to do so; you just need a way to display the data your routes return in a web browser.

You will create a hardcoded JSON-file-based database as the backing datastore for your web service routes. If you are comfortable using a SQL or NoSQL database as your datastore, you may do so, but it is not required. Your data model is something you make up, meaning you can store a collection of cars, customers, food items, restaurants, video games, sports teams, etc. Be creative :) This is similar to what the presenter did in the GraphQL video we watched in our last class; he used a JSON file as a database for his demo.

You will have to present your work to the class, with a live demo or a recorded video of you running your web service from your own computer. This is not optional!

Tutorials:
Docker
- What is Docker: https://www.youtube.com/watch?v=dz5_lsWlfTU
- Installing Docker (Windows): https://www.youtube.com/watch?v=wCTTHhehJbU
- Docker Tutorial (Step by Step): https://www.youtube.com/watch?v=Vyp5_F42NGs
- https://blog.talpor.com/2015/01/docker-beginners-tutorial/
- https://docs.docker.com/engine/getstarted/
- https://hackr.io/tutorials/learn-docker
Python RESTful services using Flask
- https://code.tutsplus.com/tutorials/building-restful-apis-with-flask-diy--cms-26625
- https://impythonist.wordpress.com/2015/07/12/build-an-api-under-30-lines-of-code-with-python-and-flask/
Node + Express REST API example
- https://closebrace.com/tutorials/2017-03-02/creating-a-simple-restful-web-app-with-nodejs-express-and-mongodb
Node simple RESTful API (shows using a JSON file as a DB)
- https://www.tutorialspoint.com/nodejs/nodejs_restful_api.htm
Dockerize your Flask app
- https://www.smartfile.com/blog/dockerizing-a-python-flask-application/
- http://containertutorials.com/docker-compose/flask-simple-app.html
Docker + Spring Boot (Java)
- https://spring.io/guides/gs/spring-boot-docker/

To Submit the Assignment (Read Carefully): ** Please follow all instructions, as not following them will lead to a loss of points.
1. Create a GitHub account.
2. Create a public GitHub repository and add all of your source code for this assignment to the repository. (See the "Getting Started with GitHub" document in the "Course Documents > Tutorials & Cheat Sheets" folder for help.)
3. Add a README file to the root of your repository that describes what your web service does.
4. Create a PowerPoint slide deck that contains 2 slides: a title slide with your name(s) and the name of your web service project, and a slide that talks a little about your data model.
5. Submit the following to the "Submit Assignment" thread in the "RESTful Web Service Implementation + Docker" discussion board: the PowerPoint slide file and a link to your public GitHub repository containing all of the source code, including your JSON database file. If you worked with a classmate for this assignment, please state who you worked with.
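The two required GET routes and the JSON-file-style datastore can be sketched with nothing but Python's standard library (the assignment's tutorials use Flask or Express; `customers` and the record fields here are just the example data model from the assignment text):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hardcoded datastore, standing in for the JSON database file.
DB = {"customers": [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]}

def lookup(path):
    """Resolve the two required routes:
    /customers      -> (200, the whole collection)
    /customers/<id> -> (200, one record)  or (404, error)."""
    parts = [p for p in path.split("/") if p]
    if parts[:1] == ["customers"]:
        if len(parts) == 1:
            return 200, DB["customers"]
        if len(parts) == 2 and parts[1].isdigit():
            for rec in DB["customers"]:
                if rec["id"] == int(parts[1]):
                    return 200, rec
    return 404, {"error": "not found"}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, payload = lookup(self.path)
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

# To serve: HTTPServer(("", 8000), Handler).serve_forever()
```

Dockerizing a script like this needs only a Dockerfile that copies it into a Python base image and runs it, which is what the Flask tutorials linked above walk through in more detail.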
meghdadk / DDUp: An implementation of the SIGMOD23 paper "Detect, Distill and Update: Learned DB Systems Facing Out-of-Distribution Data".
vmwhoami / Natour: This is a solo project. Learned how to work with Node.js, how to build a database in MongoDB, and how to connect that database to a website.
RexAgarwal / EventManagementSystem: Built using Java with MySQL as the database system. This project was done to practice the Java skills I learned throughout my college course. I have used only the basics, with JFrame as the UI. Please check it out.
sudoSanto / ITDEV115 Intermediate Object Oriented Programming: Intermediate programming with C#. Building on what was learned in ITDEV110, this course focuses on intermediate object-oriented concepts such as encapsulation, data hiding, inheritance, and polymorphism. Students will be introduced to file I/O, data abstraction, pointers, and database access.
Henoxx / ASCII ART: This is my first project after studying Python for 20 days, and I tried to use everything I learned as a beginner. In this project, I scrape a website (http://patorjk.com) to build a database of ASCII art, then display user input with another script, which I call ascii_art.
ayoubbouali / ICards: A web application for card management, started from scratch. I first learned how to set up a development environment for the Angular 7 CLI, then learned the fundamentals and basic concepts of Angular and TypeScript. After a basic starter project, I moved on to a much more advanced card management system with authentication, either with an account created in the app or via quick sign-in with a Google account. Data storage is managed with the Firebase platform and Firestore (a real-time database that synchronizes data between client applications and listeners in real time). In the end, I developed a complete website.