When I was applying to graduate data science programs, I felt like I was largely shooting in the dark. I had some idea of the prestige of the schools I was applying to (i.e., whether or not I had heard of them), could find the costs somewhere on their websites, and generally knew that classes labeled “machine learning” would serve me well when I inevitably threw myself into the job market (thanks, COVID-19).
Still, the websites had far more glossy, smiling stock photos than information about the actual programs. For a $50,000+/year investment (in the US, at…
I’ll be the first to admit it: I know nothing about wine. All I know is that occasionally I sip something I like, and most of the time I sip something I really don’t. So although I personally can’t tell the difference between a wine with notes of cedar and one with notes of oak (or, let’s be honest, probably between cedar and bubblegum), I know that something makes me like certain wines; I just have no idea what.
Enter text analysis and a data set of over 130,000 different wine reviews, ranging from $9 Trader Joe’s Chardonnay to $15,000 Dom Perignon…
Grad school SOPs: They can feel like such a beast.
I wasn’t a perfect student when I was applying to M.S. in Data Science programs; in fact, I was far from it. I had never touched Python, never run a machine learning algorithm, and hadn’t had a sexy internship at Google or Facebook to speak of. …
Association analysis is a hot topic in data science right now. By discovering relationships between items within large quantities or networks of data, we can glean insights in many areas. These include uncovering unconscious consumer buying patterns in a particular store’s or app’s purchase records (known as market basket transactions), finding interesting patterns in text mining, and potentially discovering patterns in healthcare, transportation, or survey-based data. These relationships can be represented by what we call association rules, which are typically written in the form {A} → {B}, meaning that transactions containing A also tend to contain B.
This…
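To make the {A} → {B} notation a little more concrete, here’s a minimal Python sketch of three measures commonly used to score an association rule: support (how often A and B appear together), confidence (how often B appears given A), and lift (confidence compared with how common B is on its own). The five shopping baskets are made up purely for illustration, and the helper names are my own, not from any particular library.

```python
from typing import FrozenSet, List, Set

# Hypothetical toy market-basket data: each set is the items in one purchase.
TRANSACTIONS: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(a: FrozenSet[str], b: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Estimated P(B | A) for the rule {A} -> {B}: support(A ∪ B) / support(A)."""
    support_a = support(a, transactions)
    if support_a == 0:
        raise ValueError("A never occurs, so the rule's confidence is undefined")
    return support(a | b, transactions) / support_a


def lift(a: FrozenSet[str], b: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Confidence divided by support(B); values above 1 mean A and B co-occur more than chance."""
    return confidence(a, b, transactions) / support(b, transactions)


if __name__ == "__main__":
    rule_a, rule_b = frozenset({"diapers"}), frozenset({"beer"})
    print(f"support    = {support(rule_a | rule_b, TRANSACTIONS):.2f}")  # 0.60
    print(f"confidence = {confidence(rule_a, rule_b, TRANSACTIONS):.2f}")  # 0.75
    print(f"lift       = {lift(rule_a, rule_b, TRANSACTIONS):.2f}")  # 1.25
```

For the rule {diapers} → {beer} in this toy data, a lift of 1.25 means the two items show up together 25% more often than they would if the purchases were independent.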
It all started when I was practicing technical interview questions. My first term as a master’s data science student was over, and with sweet relief I went on a brief (but respectable) Netflix bender before diving into technical interview prep. I was feeling confident. Cocky, even. Like Han Solo right after saving the galaxy, still looking effortlessly cool.
Oh, how quickly that cockiness was about to fade.
Technical Interviewer: “Given the data table attached, determine if there is a relationship between fitness level and smoking habits.”
The human race has never been better off. No, I’m not being sarcastic, nor am I being paid to say it by <insert big company name here>. For the everyday person across the globe, products are becoming cheaper and more accessible. Women have more access to education and contraception than at any other point in our shared history. Bumps in the road aside, the world is just getting better.
Don’t believe me? Most people don’t. The news is determined to make you think the world is ending, because immediate doom-and-gloom sells. While the news is right on some counts, such as climate change…
Let’s just get these facts out of the way: it’s a weird time to be alive right now. For job seekers especially, it may be hard to focus on seemingly trivial things like drilling 10 more technical interview questions, earning another online certification or building your data science portfolio, as you also apply and interview for jobs (I’m convinced the only reason medieval kings didn’t use the job interview as a form of torture was because they hadn’t thought of it yet).
While it can be tempting to give in to the collective slump, consuming low-calorie ice cream by the…
I recently wrote an introductory article on geospatial visualization in R, which quickly became my most-read article by a factor of five (thanks, btw!). It was fantastic seeing how interested people were in geospatial data visualization, because at this point it’s become one of my favorite things on the planet.
That being said, I felt like I had left some cards on the table. For one, I hadn’t discussed ggplot2, and better yet, this second way of mapping is even prettier.
In this tutorial, we’re going to be using 2010 Census data to show off the wonders of graphing with R’s ggplot2…
As an undergraduate I studied economics, which meant I studied a lot of regressions. They were basically 90% of the curriculum (when we weren’t discussing supply and demand curves, of course). The effect of corruption on sumo wrestling? Regression. The effect of minimum wage changes on a Wendy’s in NJ? Regression. Or maybe The Zombie Lawyer Apocalypse is more your speed (O.K., not a regression, but the title was cool).
Either way, my undergrad taught me three things: 1) supply-and-demand, 2) regressions are life, and 3) economists think they are gosh darn hilarious.
But what if your regression fails you? What…
I always wondered how people created beautiful geospatial visualizations in R. I would stare googly-eyed at the perfect gradients that could make a dashboard or presentation pop, wondering just how the hell people were doing this in the programming language I associated with p-values and regression lines.
Surprisingly, you can actually accomplish this with but a few lines of code — I’m here today to show you how.
First, you’ll need access to the Census Bureau’s API. Go to this link to request an API key. After about 10 seconds you should get an email with your key and a…
Grad Student in Data Science at UVA. Putting the elation in relational database management. I draw pretty graphs. 🌈 https://www.linkedin.com/in/amanda-west123/