I haven’t written a post in a while. I had a lot to do for university, and hobbies like recreational programming and blogging have to suffer during those times. But now I have found some time, and I’ll be adding smaller posts every now and then.
In the Machine Learning course I am taking at university, I was able to use matplotlib to plot functions for my homework submissions, so I have become more familiar with coding plots and graphs in Python since my last post about matplotlib. I wanted to prepare some interactive plots for my blog and show you what I have been able to create so far.
Web Scraping for Dummies
First I wanted to find some interesting data to display. I decided to collect my own data rather than use a publicly available data set, since this can easily be done in Python. For a step-by-step guide on how to scrape data from a web page that is generated on the server side, where you can find your data directly in the HTML code, I recommend chapter 11 of Automate the Boring Stuff with Python by Al Sweigart. The usual module for this type of web scraping is BeautifulSoup, which makes it easy to find certain tags in an HTML file.
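To make the idea of "finding certain tags" concrete, here is a minimal sketch. It deliberately uses the standard library's html.parser instead of BeautifulSoup so that it runs without installing anything. The HTML snippet, the class names and the CellCollector helper are all made up for illustration.

```python
from html.parser import HTMLParser

# Hypothetical server-rendered page: the data sits directly in the markup.
SAMPLE_HTML = """
<table>
  <tr><td class="player">Player A</td><td class="points">21.3</td></tr>
  <tr><td class="player">Player B</td><td class="points">17.8</td></tr>
</table>
"""


class CellCollector(HTMLParser):
    """Collect the text of every <td> whose class equals wanted_class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.inside = False  # True while we are inside a matching <td>
        self.cells = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "td" and dict(attrs).get("class") == self.wanted_class:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.cells.append(data.strip())


collector = CellCollector("player")
collector.feed(SAMPLE_HTML)
player_names = collector.cells
print(player_names)
```

BeautifulSoup wraps this kind of tag matching in a much shorter search API, which is why it is the more convenient choice for real pages.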
I decided that I wanted to collect the stats of all currently active NBA players. Luckily, there is a blog post from Greg Reda that explains exactly how this can be done in Python. This approach to web scraping is different, since a lot of newer sites build the web page on the client side. So you have to find the URL of the request that the page sends to the server. The response you get is often a JSON object, which you can then parse for the information you want.
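As a rough sketch of this second approach, the snippet below builds a request URL from query parameters and parses a JSON body. The endpoint, the parameter names and the sample body are hypothetical and only illustrate the pattern. With a real site you would copy the URL from your browser's network tab and fetch it with requests.get.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint and parameters, standing in for the request
# that a client-side page sends to its server.
base_url = "http://example.com/api/players"
params = {"season": "2015-16", "league": "00"}
request_url = base_url + "?" + urlencode(params)
print(request_url)

# Hypothetical response body; a real one would come from
# requests.get(request_url).text
SAMPLE_BODY = """
{"results": [{"name": "Player A", "team": "AAA"},
             {"name": "Player B", "team": "BBB"}]}
"""

# Parse the JSON text into Python dicts and lists, then pick out a field
data = json.loads(SAMPLE_BODY)
names = [entry["name"] for entry in data["results"]]
print(names)
```

Once the body is parsed, the rest is ordinary work with dictionaries and lists, so no HTML parsing is needed at all.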
The stats.nba.com web page is generated on the client side, so the latter approach was necessary. I first collected the person_ids of every NBA player in the database and then checked their roster status to see whether each player is still actively playing in the NBA (here is the URL for the player list). This is what my code for this task looks like:
```python
import requests
import csv
import sys

# get all players in the database
url_allPlayers = ("http://stats.nba.com/stats/commonallplayers?IsOnlyCurrentSeason"
                  "=0&LeagueID=00&Season=2015-16")

# request the URL and parse the JSON
response = requests.get(url_allPlayers)
response.raise_for_status()
# 'resultSets' is a list; its first entry holds the player table
result = response.json()['resultSets'][0]
players = result['rowSet']

# look up the columns by name (assumes the 'headers' list of the response
# contains 'PERSON_ID' and 'ROSTERSTATUS')
headers = result['headers']
id_col = headers.index('PERSON_ID')
status_col = headers.index('ROSTERSTATUS')

# use the roster status flag to check if a player is still actively playing
active_players = [p for p in players if p[status_col] == 1]
ids = [p[id_col] for p in active_players]

print("Number of Active Players: " + str(len(ids)))
```