Dataquest
<table>
<tr></tr>
<tr>
<td colspan="4"><b>Project</b></td>
</tr>
<tr>
<td colspan="4">
<a href="https://github.com/buswedg/dataquest/tree/master/Python%20Programming/">Python Programming</a>
</td>
</tr>
<tr>
<td><b>Author</b></td>
<td><b>Expertise</b></td>
<td><b>Tool</b></td>
<td><b>Industry</b></td>
</tr>
<tr>
<td>
Darryl Buswell
</td>
<td>
Exploratory Analysis
</td>
<td>
Python
</td>
<td>
Entertainment<br>Environment<br>Government Policy and Planning<br>Information Technology<br>Sciences<br>Securities and Finance<br>Sports and Recreation<br>Transportation
</td>
</tr>
<tr>
<td colspan="4"><b>Description</b></td>
</tr>
<tr>
<td colspan="4">
<p>Foundations of Python and programming, including modules, enumeration, indexing, scopes, object-oriented programming, lambda functions, and exception handling.</p>
<p>Includes:</p>
<ul>
<li>Use of basic Python syntax on Star Wars script data to determine which character speaks most often.</li>
<li>Use of Python to read and parse a raw dataset, convert data types, and apply if statements and for loops in order to find which US city has the lowest rate of violent crime.</li>
<li>Application of Python functions to parse data, apply if statements, and build a dictionary in order to calculate the frequency of different weather conditions in Los Angeles.</li>
<li>Use of Python functions that tokenize string data, check for syntax and index errors, and normalize data dictionaries in order to check text data for spelling errors.</li>
<li>Use of modules and classes in Python to determine the number of wins for a National Football League (NFL) team using data from 2009 to 2013.</li>
<li>Use of the enumerate function, list comprehensions, try/except blocks, and the None type in Python to find the most common names among US congressmen and congresswomen.</li>
<li>Use of a 'while' loop, the 'break' keyword, and named and optional function arguments in Python in order to find which US airlines experience the most delays.</li>
<li>Use of scopes and debugging in Python while analyzing student loan defaults in the US.</li>
<li>Object-oriented programming in Python, including writing well-organized code and implementing comparison operators, to compare the average ages of players on various NBA teams.</li>
<li>Exception handling in Python applied to recorded chopstick 'food pinching efficiency' data.</li>
<li>Advanced string manipulation and anonymous functions in Python in order to assess characteristics of a list of user passwords.</li>
</ul>
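
As a rough illustration of the dictionary-based counting described above (not code taken from the project itself), the sketch below tallies how many lines each speaker has. It assumes a hypothetical "SPEAKER: dialogue" line format and uses toy lines with placeholder speakers rather than the actual layout of the Star Wars script file; `count_speakers` and `most_frequent` are illustrative names.

```python
def count_speakers(lines):
    """Tally lines per speaker, assuming a hypothetical "SPEAKER: dialogue" format."""
    counts = {}
    for line in lines:
        # Skip blank lines and lines with no "SPEAKER:" prefix.
        if ":" not in line:
            continue
        speaker = line.split(":", 1)[0].strip().upper()
        counts[speaker] = counts.get(speaker, 0) + 1
    return counts


def most_frequent(counts):
    """Return the speaker with the most lines (the first one seen wins ties)."""
    return max(counts, key=counts.get)


# Toy input with placeholder speakers, not lines from the real script.
script = [
    "SPEAKER_A: First line.",
    "speaker_b: A reply.",
    "SPEAKER_A: Another line.",
    "",
]

counts = count_speakers(script)
print(counts)                 # {'SPEAKER_A': 2, 'SPEAKER_B': 1}
print(most_frequent(counts))  # SPEAKER_A
```

Normalizing names with `.upper()` is one way to keep "Luke" and "LUKE" from being counted as different speakers; the real script may need a different parsing rule.
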
</td>
</tr>
<tr>
<td colspan="4"><b>Dataset</b></td>
</tr>
<tr>
<td colspan="4">
<ul>
<li>Star Wars Episode IV script. <a href="https://github.com/gastonstat/StarWars/blob/master/Text_files/EpisodeIV_dialogues.txt" target="_blank">[link]</a></li>
<li>Number of incidents of 'violent crime' within each US city for 2013. <a href="https://www.fbi.gov/about-us/cjis/ucr/crime-in-the-u.s/2013/crime-in-the-US-2013/violent-crime/violent-crime-topic-page/violentcrimemain_final" target="_blank">[link]</a></li>
<li>Historic daily weather conditions for Los Angeles. <a href="https://github.com/buswedg/dataquest/blob/master/S1%20Python%20Introduction/Python%20Programming%20Beginner/Dictionaries/data/la_weather.csv" target="_blank">[link]</a></li>
<li>Short story text file with a number of spelling mistakes. <a href="https://github.com/buswedg/dataquest/blob/master/S1%20Python%20Introduction/Python%20Programming%20Beginner/Functions%20and%20Debugging/data/story.txt" target="_blank">[link]</a></li>
<li>National Football League (NFL) win/loss records for each game from 2009 to 2013.</li>
<li>Members of the United States Congress (1789-Present) and congressional committees (1973-Present) in YAML. <a href="https://github.com/unitedstates/congress-legislators" target="_blank">[link]</a></li>
<li>US airline flight delay statistics from the US Department of Transportation's (DOT) Bureau of Transportation Statistics (BTS). <a href="http://www.transtats.bts.gov/OT_Delay/OT_DelayCause1.asp" target="_blank">[link]</a></li>
<li>Student loan debt data (e.g. number of borrowers and defaulted borrowers) for educational institutions within the US.</li>
<li>NBA players data (e.g. player name, position and points per game) from the 2013-2014 season. <a href="http://stats.nba.com/" target="_blank">[link]</a></li>
<li>Recorded 'food pinching efficiency' for 31 male junior college students and 21 primary school pupils who used chopsticks of various lengths.</li>
<li>List of 2,151,220 unique ASCII passwords. <a href="http://datashaping.com/passwords.txt" target="_blank">[link]</a></li>
</ul>
</td>
</tr>
</table>
<br>
<table>
<tr></tr>
<tr>
<td colspan="4"><b>Project</b></td>
</tr>
<tr>
<td colspan="4">
<a href="https://github.com/buswedg/dataquest/tree/master/Data%20Analysis%20with%20Pandas/">Data Analysis with Pandas</a>
</td>
</tr>
<tr>
<td><b>Author</b></td>
<td><b>Expertise</b></td>
<td><b>Tool</b></td>
<td><b>Industry</b></td>
</tr>
<tr>
<td>
Darryl Buswell
</td>
<td>
Exploratory Analysis
</td>
<td>
Python
</td>
<td>
Food, Beverages and Tobacco<br>Government Policy and Planning<br>Transportation
</td>
</tr>
<tr>
<td colspan="4"><b>Description</b></td>
</tr>
<tr>
<td colspan="4">
<p>Use of the Pandas DataFrame object in Python for more efficient data analysis.</p>
<p>Includes:</p>
<ul>
<li>Use of the NumPy library, matrices, and vectors in Python in order to assess alcohol consumption by country.</li>
<li>Application of Python and Pandas to index, retrieve, sort, normalize and run calculations on US Department of Agriculture (USDA) data to discover the most and least nutritious foods.</li>
<li>Application of Python and Pandas to compute summary statistics, create pivot tables, remove missing values, and reindex rows of passenger survival data from the Titanic.</li>
<li>Use of Python and Pandas to manipulate dataframes and calculate summary statistics of employment data from the American Community Survey (ACS) for 2010 to 2012.</li>
</ul>
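
A minimal Pandas sketch of the Titanic steps listed above: removing missing values, reindexing rows, computing a summary statistic, and building a pivot table. The column names are assumed to follow Kaggle's `train.csv` (`Pclass`, `Sex`, `Age`, `Survived`), and the rows are toy values rather than real passenger records.

```python
import numpy as np
import pandas as pd

# Toy rows standing in for the Kaggle Titanic training data
# (column names assumed to follow train.csv).
titanic = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3],
    "Sex":      ["female", "male", "female", "male", "female", "male"],
    "Age":      [38.0, np.nan, 27.0, 22.0, np.nan, 35.0],
    "Survived": [1, 0, 1, 0, 1, 0],
})

# Remove rows with a missing Age, then renumber the rows 0..n-1.
clean = titanic.dropna(subset=["Age"]).reset_index(drop=True)

# Pivot table: mean survival rate for each passenger class.
survival_by_class = clean.pivot_table(index="Pclass", values="Survived", aggfunc="mean")

# Summary statistic: mean age of the passengers that remain.
mean_age = clean["Age"].mean()

print(survival_by_class)
print(mean_age)  # 30.5
```

Note that dropping rows before pivoting changes the survival rates; on real data it may be preferable to impute ages instead, depending on the question being asked.
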
</td>
</tr>
<tr>
<td colspan="4"><b>Dataset</b></td>
</tr>
<tr>
<td colspan="4">
<ul>
<li>Alcohol consumption data (e.g. type of alcohol and amount consumed) for countries from around the world.</li>
<li>Food nutrition data from the US Department of Agriculture (USDA) National Nutrient Database for Standard Reference. <a href="http://www.ars.usda.gov/Services/services.htm?modecode=80-40-05-25" target="_blank">[link]</a></li>
<li>Data on passengers (e.g. age, gender, fare, cabin) who were aboard the Titanic. <a href="https://www.kaggle.com/c/titanic/data" target="_blank">[link]</a></li>
<li>American Community Survey (ACS) results for 2010 to 2012 on job outcomes for recent college graduates, broken down by the major they studied in college. <a href="https://github.com/fivethirtyeight/data/blob/master/college-majors/recent-grads.csv" target="_blank">[link]</a></li>
</ul>
</td>
</tr>
</table>
<br>
<table>
<tr></tr>
<tr>
<td colspan="4"><b>Project</b></td>
</tr>
<tr>
<td colspan="4">
<a href="https://github.com/buswedg/dataquest/tree/master/Data%20Visualization/">Data Visualization</a>
</td>
</tr>
<tr>
<td><b>Author</b></td>
<td><b>Expertise</b></td>
<td><b>Tool</b></td>
<td><b>Industry</b></td>
</tr>
<tr>
<td>
Darryl Buswell
</td>
<td>
Exploratory Analysis
</td>
<td>
Python
</td>
<td>
Education<br>Environment<br>Health Care<br>Information Technology
</td>
</tr>
<tr>
<td colspan="4"><b>Description</b></td>
</tr>
<tr>
<td colspan="4">
<p>Implementation of various techniques to visualize data using Python with Matplotlib and Seaborn.</p>
<p>Includes:</p>
<ul>
<li>Use of the Matplotlib module to plot charts in Python of forest fire data from Montesinho National Park.</li>
<li>Application of Python, Pandas and Matplotlib to explore employment data from the American Community Survey for 2010 to 2012 through visualization.</li>
<li>Production of a visually appealing histogram plot in Python using Seaborn and data from the National Survey of Family Growth, 2002 to 2003.</li>
<li>Use of different components of the Matplotlib module to create customizable data visualizations in Python.</li>
</ul>
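
A short Matplotlib sketch of the kind of chart described for the forest fire data: a scatter plot of burned area against temperature. It assumes the UCI file's `temp` (Celsius) and `area` (hectares) fields and uses toy values; `plot_temp_vs_area` is a hypothetical helper, and the Agg backend is chosen only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def plot_temp_vs_area(temp, area):
    """Scatter burned area against temperature and return (fig, ax)."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.scatter(temp, area, alpha=0.7)
    ax.set_xlabel("Temperature (Celsius)")
    ax.set_ylabel("Burned area (ha)")
    ax.set_title("Forest fires: temperature vs. burned area")
    return fig, ax


# Toy values only; the real data would be read from the UCI CSV file.
temp = [8.2, 14.6, 18.0, 22.8, 26.4]
area = [0.0, 0.0, 0.4, 10.1, 7.0]

fig, ax = plot_temp_vs_area(temp, area)
buf = io.BytesIO()
fig.savefig(buf, format="png")  # write the chart to an in-memory PNG
plt.close(fig)
```

Returning the `fig, ax` pair keeps the helper reusable: callers can add annotations or change scales before saving.
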
</td>
</tr>
<tr>
<td colspan="4"><b>Dataset</b></td>
</tr>
<tr>
<td colspan="4">
<ul>
<li>Meteorological and forest burning data for the northeast region of Portugal, including temperature, humidity, wind speed and area of forest burned. <a href="https://archive.ics.uci.edu/ml/datasets/Forest+Fires" target="_blank">[link]</a></li>
<li>American Community Survey (ACS) results for 2010 to 2012 on job outcomes for recent college graduates, broken down by the major they studied in college. <a href="https://github.com/fivethirtyeight/data/blob/master/college-majors/recent-grads.csv" target="_blank">[link]</a></li>
<li>Survey data from the National Survey of Family Growth, January 2002 to March 2003, including mother's age, pregnancy duration, and birth weight. <a href="http://www.cdc.gov/nchs/nsfg.htm" target="_blank">[link]</a></li>
</ul>
</td>
</tr>
</table>
<br>
<table>
<tr></tr>
<tr>
<td colspan="4"><b>Project</b></td>
</tr>
<tr>
<td colspan="4">
<a href="https://github.com/buswedg/dataquest/tree/master/Data%20Cleaning/">Data Cleaning</a>
</td>
</tr>
<tr>
<td><b>Author</b></td>
<td><b>Expertise</b></td>
<td><b>Tool</b></td>
<td><b>Industry</b></td>
</tr>
<tr>
<td>
Darryl Buswell
</td>
<td>
Exploratory Analysis
</td>
<td>
Python
</td>
<td>
Entertainment
</td>
</tr>
<tr>
<td colspan="4"><b>Description</b></td>
</tr>
<tr>
<td colspan="4">
<p>Basics of using Python to clean and manipulate data.</p>
<p>Includes:</p>
<ul>
<li>Use of Python and Pandas code to clean a dataset of Avengers character deaths, with the aim of making the data more useful for analysis.</li>
</ul>
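
A rough Pandas sketch of one typical cleaning step for data like this: collapsing several text-valued flag columns into a single numeric count. The `Death1` to `Death3` column names and the "YES"/"NO"/missing values are hypothetical stand-ins, since the dataset's exact layout is not described here.

```python
import numpy as np
import pandas as pd

# Toy rows; column names and "YES"/"NO"/missing values are hypothetical
# stand-ins for wide, text-based death flags.
avengers = pd.DataFrame({
    "Name":   ["Hero A", "Hero B", "Hero C"],
    "Death1": ["YES", "NO", "YES"],
    "Death2": ["YES", np.nan, "NO"],
    "Death3": [np.nan, "NO", np.nan],
})

death_cols = ["Death1", "Death2", "Death3"]

# Count "YES" flags per row; "NO" and missing values both compare as False.
avengers["Deaths"] = avengers[death_cols].eq("YES").sum(axis=1)

# Drop the wide text columns now that one numeric column holds the count.
tidy = avengers.drop(columns=death_cols)
print(tidy)
```

Treating a missing flag the same as "NO" is an assumption; if blanks mean "unknown" in the real data, they would need separate handling.
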
</td>
</tr>
<tr>
<td colspan="4"><b>Dataset</b></td>
</tr>
<tr>
<td colspan="4">
<ul>
<li>Details the deaths of Marvel comic book characters between the time they joined the Avenge
