Analyzing Pittsburgh dog data in R

(Post edited 4/5/2019 to remove implied recommendation of DataCamp following their poor response to sexual assault by an executive.)

As part of my work during 2018’s Summer of Data Science, I decided to undertake a data project in R on top of my studies with textbooks and MOOCs.

Now, anyone who knows me can attest to how much I want a dog in my life. I figured, until I can make that happen, I could at least look at the dog trends in my new home of Pittsburgh, Pennsylvania. Fortunately, the Western Pennsylvania Regional Data Center has plenty of locally-focused datasets, including multiple years’ worth of dog license information.

Starting with some exploratory analysis on the most recent complete dataset, from 2017, I used the zipcode package to geolocate the zip codes in the data and see which dogs were registered in Pittsburgh vs. the rest of Allegheny County. 39% of all dogs were registered in Pittsburgh, and the most popular breed in the county was the Labrador or Lab mix.
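As a rough sketch of that step: the idea is to join each license record to a zip code lookup table and then take the share of records that land in Pittsburgh. The `licenses` and `zip_lookup` data frames below, and their column names, are made-up stand-ins (the lookup plays the role of the table the zipcode package provides), so the numbers are purely illustrative.

```r
# Toy license records; the real file has many more columns and rows.
# Zip codes are kept as character so they are never treated as numbers.
licenses <- data.frame(
  breed = c("LABRADOR RETRIEVER", "LAB MIX", "BEAGLE", "POODLE", "LAB MIX"),
  zip   = c("15213", "15146", "15217", "15108", "15213"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table standing in for the zip code reference data.
zip_lookup <- data.frame(
  zip  = c("15213", "15217", "15108", "15146"),
  city = c("Pittsburgh", "Pittsburgh", "Coraopolis", "Monroeville"),
  stringsAsFactors = FALSE
)

# Left join: keep every license, attach the city where the zip is known.
located <- merge(licenses, zip_lookup, by = "zip", all.x = TRUE)
located$in_pittsburgh <- located$city == "Pittsburgh"

# Share of licenses registered in Pittsburgh (for this toy data only).
pittsburgh_share <- mean(located$in_pittsburgh)
print(pittsburgh_share)
```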

Count of breeds in Allegheny County. Pittsburgh in Steelers colors, of course.


Next, I wanted to see which dog names were popular, another simple count() measure:
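A minimal sketch of such a name count, using a made-up `dogs` data frame with an assumed `dog_name` column. It uses base R so it runs without extra packages; the comment shows what I assume is the equivalent dplyr call.

```r
# Toy data; the column name dog_name is an assumption for illustration.
dogs <- data.frame(
  dog_name = c("BELLA", "MAX", "BELLA", "BUDDY", "MAX", "BELLA"),
  stringsAsFactors = FALSE
)

# Tally each name and sort from most to least common.
# With dplyr loaded, this would be: count(dogs, dog_name, sort = TRUE)
name_counts <- sort(table(dogs$dog_name), decreasing = TRUE)

print(head(name_counts, 3))
```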


Looking through the list of names a bit, I noticed that not only was Bella a very common name, but ‘Jacob Black’ also made it into the list. This got me thinking: Did the release of Twilight encourage people to name their dogs after the characters?

Now, we don’t have this data from the time the first book was released, but the first movie was released in 2008, and probably got more people interested in the series. I would essentially be conducting an A/B test on dogs before (2007) and after (2009) its release. As both variables are categorical (name in Twilight or not; 2007 or 2009), a chi-squared test was best suited to the task. It returned a p-value of 0.003, suggesting that the increase in these names, from 97 dogs in 2007 to 152 in 2009, was very unlikely to be down to random chance. There was an even further increase to 220 dogs in 2017 (p < 0.001 when compared to 2007). Twilight dogs are everywhere!
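For anyone who wants to try this kind of test, here is a small sketch of a chi-squared test on a 2x2 table of name group by year. The Twilight counts (97 and 152) are the ones quoted above, but the yearly totals are placeholders I made up for illustration, so the p-value it prints will not match the 0.003 from the real data.

```r
# Dogs with Twilight names per year (figures quoted in the post).
twilight <- c("2007" = 97, "2009" = 152)

# HYPOTHETICAL total license counts per year, for illustration only.
total <- c("2007" = 20000, "2009" = 21000)

# 2x2 contingency table: rows are name group, columns are year.
tab <- rbind(twilight = twilight, other = total - twilight)

# chisq.test() applies Yates' continuity correction to 2x2 tables by default.
result <- chisq.test(tab)
print(tab)
print(result$p.value)
```

The test compares the proportion of Twilight names in each year rather than the raw counts, which matters because the total number of licensed dogs also changes from year to year.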

Finally, I wanted to know whether the proportion of dogs to humans was going up in Pittsburgh specifically (as I hoped it was). For this, I had to combine the datasets from all years 2007-2017, and find the populations of the city for those years.
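The combining step could look something like the sketch below: stack the yearly tables with a year column, count dogs per year, join to population, and divide. Everything here is toy data with assumed names (`licenses`, `population`, `dog_name`, `humans`); the real files and population figures are not reproduced.

```r
# Toy yearly license tables, keyed by year; real data would be read from CSVs.
licenses <- list(
  "2007" = data.frame(dog_name = c("BELLA", "MAX", "BUDDY"), stringsAsFactors = FALSE),
  "2008" = data.frame(dog_name = c("BELLA", "LUCY"), stringsAsFactors = FALSE)
)

# Stack all years into one data frame, tagging each row with its year.
all_years <- do.call(rbind, lapply(names(licenses), function(yr) {
  cbind(year = as.integer(yr), licenses[[yr]])
}))

# Number of licensed dogs per year.
dogs <- aggregate(dog_name ~ year, data = all_years, FUN = length)
names(dogs)[2] <- "dogs"

# HYPOTHETICAL city population per year, for illustration only.
population <- data.frame(year = c(2007L, 2008L), humans = c(1000, 1000))

# Join on year and compute the ratio.
ratio <- merge(dogs, population, by = "year")
ratio$dogs_per_human <- ratio$dogs / ratio$humans
print(ratio)
```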


The number of dogs per human tanked after the financial crisis of 2008, and may be recovering. We can only hope so.

Full code posted on GitHub.