Tidy Tuesday with Python

My weekly (or monthly) data visualisation practice with data from TidyTuesday, using Matplotlib and Python instead of R!

Projects

2 Aug 2022

Frogs spotted in Oregon. I experimented with circle packing, but it was so painful that I will probably never do it again. Anyway, here's the code and the pic.

frogs

Code: Here

19 Jul 2022

This data set is so interesting that I got obsessed with wrangling it and forgot about the visualisation...

Here is a meaningless graph just for fun XD

Code: Here

12 Jul 2022

European flights. I don't know what I'm doing. Ideally this highlights the impact of COVID on air traffic.

Code: Here

5 Jul 2022

SF rent and lease distribution. I tried to do an animation and failed miserably. I still have to get up for work on Wednesday, so I'll try again next week lol.

Code: Here

28 June 2022

UK pay gap.

Code: Here

21 June 2022

In honour of Juneteenth :) Did a lot of text processing to brush up on my regex.

Code: Here

14 June 2022

The data set is on droughts in the US but I focused on California in this part. It isn't going well...

Then I also looked into the general trend for every state in the US, but the result is kind of unclear at first glance...

us_droughts

Code: Here

7 June 2022

Holding companies that donate to anti-LGBTQ politicians accountable. Condensed:

Code: Here

24 May 2022

Women's rugby.

Code: Here

17 May 2022

Eurovision! And the drastic contrast between 2022 and 2021.

2021:

Whereas 2022:

All of our best wishes go to Ukraine <3

Plotted in Python using Matplotlib's Basemap toolkit and Geopy.

Code: Click here

3 May 2022

After a month of random COVID disruptions and UCLA DataFest I am finally back to TidyTuesday!

Today's raw data: https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-05-03

Graphic:

29 Mar 2022

Plotly is so cool!

ncaafunds

22 Mar 2022

Cheesiest plot I've made so far...

Raw data: [https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-03-22/babynames.csv]

Graphic: A graphic showing the trendiness of a selection of feminine baby names across the time span from 1960 to 2017.

XH 22 Mar 2022

08 Mar 2022

PENDING

01 Mar 2022

Tried GeoPandas and Geoplots for the first time! I'd say I would probably rather use seaborn next time though...

Data processing logic:

Raw data: [https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-03-01/stations.csv]
Map the points on the country map using the LONGITUDE and LATITUDE columns.
Colour code the points using the column FUEL_TYPE_CODE distinguishing the fuel types.
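
The two mapping steps above can be sketched roughly as follows. This is a minimal illustration, not the code behind the graphic: the column names LONGITUDE, LATITUDE and FUEL_TYPE_CODE come from the list above, while the helper names `group_by_fuel` and `plot_stations`, the sample rows, and the marker size and alpha values are all assumptions.

```python
import csv
import io
from collections import defaultdict


def group_by_fuel(rows):
    """Collect (longitude, latitude) pairs per FUEL_TYPE_CODE."""
    groups = defaultdict(list)
    for row in rows:
        try:
            lon = float(row["LONGITUDE"])
            lat = float(row["LATITUDE"])
            code = row["FUEL_TYPE_CODE"]
        except (KeyError, ValueError):
            continue  # skip rows with missing or malformed values
        groups[code].append((lon, lat))
    return dict(groups)


def plot_stations(groups, alpha=0.3):
    """Scatter one translucent layer per fuel type (needs Matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 6))
    for code, points in sorted(groups.items()):
        lons, lats = zip(*points)
        ax.scatter(lons, lats, s=4, alpha=alpha, label=code)
    ax.legend(title="Fuel type")
    return fig, ax


# Made-up sample rows standing in for stations.csv.
sample = io.StringIO(
    "LONGITUDE,LATITUDE,FUEL_TYPE_CODE\n"
    "-118.24,34.05,ELEC\n"
    "-122.42,37.77,ELEC\n"
    "-95.37,29.76,CNG\n"
    ",,LPG\n"
)
groups = group_by_fuel(csv.DictReader(sample))
```

Calling `plot_stations(groups)` draws one scatter layer per fuel type, so Matplotlib assigns each type its own colour and legend entry automatically.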

Visual features:

Designs:
- Map of the US (excluding Alaska and islands) as background with faint county borders
- Translucent data points showing the density of the distribution clearly
- Legend showing fuel types
Avenir typesetting! Avenir is the best
Also added Alt text

Issues:

Projection: whenever I employ projection methods the session crashes, so now the map looks kind of squished
The spots on the legend are so faint that it's hard to tell the colours apart
The legend labels are currently acronyms and might work better if I type in the full names

Plans:

Maybe fix the projection issue
Differentiate the colours more
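
One lightweight way to reduce the squished look without a full reprojection is to stretch the y-axis by 1/cos(latitude) at the map's central latitude. This is a general approximation for plots of raw longitude/latitude, not something the current code does; the function name `lonlat_aspect` and the latitude bounds below are assumptions.

```python
import math


def lonlat_aspect(lat_min, lat_max):
    """Aspect ratio (y-unit / x-unit) that roughly undoes the squish
    of plotting raw degrees, evaluated at the middle latitude."""
    mid = math.radians((lat_min + lat_max) / 2)
    return 1 / math.cos(mid)


# Rough latitude range of the contiguous US (assumed values).
aspect = lonlat_aspect(24.5, 49.5)

# With an existing Matplotlib Axes `ax`, this would be applied as:
#     ax.set_aspect(aspect)
```

The correction is only exact at the central latitude, so it suits a quick fix rather than a proper map projection.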

Graphic: A graphic showing the distribution of alternative fuel stations in the US. The shape of the country, excluding Alaska and islands, is shown in the background, with bright points indicating where stations occur in different regions. The colours mark the type of alternative fuel each station supplies.

XH 01 Mar 2022

22 Feb 2022

Happy 22022022 palindrome day!

Content note: The raw data given by TidyTuesday this week compares countries, which might involve some political disputes and/or underlying assumptions. The raw data does not come from me and does not represent my political opinions. Please assess the credibility of the original data using your own judgement.

Data processing logic:

Raw data: [https://github.com/rfordatascience/tidytuesday/blob/master/data/2022/2022-02-22/freedom.csv]
Select the 'Status' column (free, partially free and not free)
Extract the status of each country for every year (1995-2020), count the data and funnel into 3 dictionaries.
Show the trend of the number of countries that are in each status over the 26 years
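
The counting step above could look something like the sketch below. It is an illustration under assumptions rather than the original code: the 'Status' column is named in the list above, but the `year` column name, the status codes `F`, `PF` and `NF`, the helper name and the sample rows are all assumed here.

```python
import csv
import io
from collections import Counter, defaultdict


def count_status_by_year(rows):
    """Map each status to a Counter of {year: number of countries}."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["Status"]][int(row["year"])] += 1
    return counts


# Made-up sample rows standing in for freedom.csv.
sample = io.StringIO(
    "country,year,Status\n"
    "A,1995,F\n"
    "B,1995,NF\n"
    "C,1995,F\n"
    "A,1996,PF\n"
    "B,1996,NF\n"
    "C,1996,F\n"
)
counts = count_status_by_year(csv.DictReader(sample))

# Funnel the counts into three plain dictionaries, one per status.
free = dict(counts["F"])
partly_free = dict(counts["PF"])
not_free = dict(counts["NF"])

# One value per year for plotting; years with no entry count as 0.
years = sorted({y for c in counts.values() for y in c})
free_series = [free.get(y, 0) for y in years]
```

Keeping a shared, sorted `years` list means all three series line up on the same x-axis even when a status has no countries in some year.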

Visual features:

Designs:
- Used mock-ggplot style with some modifications (facecolor etc.)
- Translucent on-graph legend with sharp corners
- All-filling solid colours with different shades
Avenir typesetting! Avenir is the best

Issues:

Sort of boring (I didn't have much time to make it fancier :( )
Would probably work better if the graph were more horizontal (i.e. the height could be decreased)
The grid in the background is useless since the area fills don't have an alpha (quick fix)

Plans:

Add alphas to the filled area under curves
Change graph dimensions
Improving the documentation and styling
Alt text

Graphic:

XH 22 Feb 2022

8 Feb 2022 (actually using the data set from 25 Jan 2022)

I am not interested in random American air force people, so I pulled out an old boardgame data set instead!

** LOGGING IN PROCESS NOT FINISHED **

Data processing logic:

Data on dog breeds and their different traits
Quantifying all qualitative descriptions into scores using text processing
Weighting and categorising each trait into two new parameters 'friendliness' and 'fluffiness'
Plot scatter plot with each point corresponding to a breed on the quadrant of fluffiness-friendliness

Visual features:

Detecting overlapping points or close-by points automatically and wrap/dodge off the labelling (still bugged :( )
Generating a new colour for each data point on the tab20b palette (or any other palette, might change it if in the mood)
Avenir typesetting! Avenir is the best
Annotation of the breed name beside each data point
Legend indexing all 190+ breed names
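
For the overlap detection, one simple approach is a brute-force pairwise distance check followed by stacking the label offsets of crowded points. The sketch below is not the (still bugged) detector used here; the function names, the distance threshold and the offset step are all hypothetical.

```python
import math


def find_close_pairs(points, min_dist):
    """Return index pairs (i, j), i < j, whose points are closer than min_dist."""
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < min_dist:
                pairs.append((i, j))
    return pairs


def label_offsets(points, min_dist, step=8):
    """Vertical label offset per point; crowded labels are stacked upwards."""
    offsets = [0] * len(points)
    for i, j in find_close_pairs(points, min_dist):
        # Push the later label one step above the earlier one.
        offsets[j] = max(offsets[j], offsets[i] + step)
    return offsets


points = [(1.0, 1.0), (1.05, 1.0), (3.0, 2.0), (1.0, 1.08)]
offsets = label_offsets(points, min_dist=0.1)

# With a Matplotlib Axes `ax`, each offset could be used like this:
#     ax.annotate(name, xy=point, xytext=(5, offset), textcoords="offset points")
```

The pairwise check is quadratic in the number of points, which is fine for a couple of hundred breeds but would need a spatial index for much larger data sets.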

Issues:

The overlap detector does not work for a few points for some reason
Graph too huge with too many data points -- hard to read! Don't know if there's a better way to present the data!
Might need to adjust some weightings a bit (as I don't own a dog myself, I am biased!)

Plans:

Indexing the position on the diagram for each breed and incorporating into the legend
Improving the documentation and styling (it is currently unfortunately a mess!)

Graphic:

XH 8 Feb 2022

1 Feb 2022

Data processing logic:

Data on dog breeds and their different traits
Quantifying all qualitative descriptions into scores using text processing
Weighting and categorising each trait into two new parameters 'friendliness' and 'fluffiness'
Plot scatter plot with each point corresponding to a breed on the quadrant of fluffiness-friendliness
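
The scoring and weighting steps above might be sketched as follows. Everything specific here is invented for illustration: the trait names, the word-to-score mapping, the weights and the example breed are assumptions, not the values used for the actual graphic.

```python
# Hypothetical mapping from qualitative descriptions to numeric scores.
WORD_SCORES = {"short": 1, "medium": 3, "long": 5}

# Hypothetical weights; each composite is a weighted average of trait scores.
WEIGHTS = {
    "friendliness": {
        "affectionate": 0.5,
        "good_with_children": 0.3,
        "openness_to_strangers": 0.2,
    },
    "fluffiness": {"coat_length": 0.7, "shedding": 0.3},
}


def to_score(value):
    """Turn a numeric rating or a descriptive word into a float score."""
    if isinstance(value, (int, float)):
        return float(value)
    return float(WORD_SCORES[value.strip().lower()])


def composite(breed, weights):
    """Weighted sum of the breed's trait scores."""
    return sum(w * to_score(breed[trait]) for trait, w in weights.items())


breed = {
    "name": "Example breed",
    "affectionate": 5,
    "good_with_children": 3,
    "openness_to_strangers": 4,
    "coat_length": "Long",
    "shedding": 2,
}

# (x, y) position of this breed on the fluffiness-friendliness plane.
point = (
    composite(breed, WEIGHTS["fluffiness"]),
    composite(breed, WEIGHTS["friendliness"]),
)
```

Because the weights in each group sum to 1, both composites stay on the same scale as the underlying trait scores, which keeps the two axes comparable.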

Visual features:

Detecting overlapping points or close-by points automatically and wrap/dodge off the labell
