My View: “Curiosity About Data” Is Most Important Skill for Data Scientists
Meet Catie Williams, Program Director for Bellevue University’s new Bachelor of Science in Data Science program and Product Director for InEight.
Q. Tell me about your current role at InEight.
A. I currently oversee the Connected Analytics product stack. We provide data and analytics solutions that improve project profitability, identify and mitigate risk, and increase productivity.
Q. Just how big is “Big Data”?
A. I do think Big Data is still a bit of a buzzword – typically used as a wrapper for the architecture and infrastructure required to process really large volumes of data quickly. Traditional hardware and software run into issues processing data fast enough to provide real-time analytics – so Big Data is used as a term to differentiate what is required from an infrastructure perspective. I don’t usually consider data large until it is well into the petabyte range. Datasets with millions of records are the norm today rather than being considered Big Data. Typically the industries generating this volume of data are retail, social media, and medical – anything with logging, sensors, etc.
Q. What are some everyday examples of how businesses collect and use data? What kind of decisions do they make with it?
A. The amount of data an organization generates and stores has increased significantly over the last 10 to 20 years. Storage used to be very expensive, so an organization didn’t always have centralized systems or applications for each portion of its business. Now, you can find a tool for any business process, and typically these tools provide insights out of the box, without manual intervention. With the addition of mobile devices, data can now be collected anywhere and aggregated to a centralized location to provide analytics and reporting.
An everyday scenario is when we use our debit or credit card and get a phone call from the bank asking us to verify whether a transaction is fraudulent. This is possible because a machine learning algorithm has been learning our patterns and has detected a potential anomaly that needs to be validated. The more data it consumes, the better it becomes at understanding your pattern.
Before it was possible to collect and consolidate standard information, most decisions were driven by experience, intuition, and an individual’s level of confidence. Now, experience can be validated by data – for example, when determining how much product should be purchased, or which products complement each other. These decisions no longer have to be made via manual observation or calculation, but instead can be built into programs.
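The bank example in the answer above can be illustrated with a very small sketch. This is not how any particular bank's system works: it swaps the machine learning model for a simple z-score check against a customer's past purchase amounts, and the function name `is_anomalous`, the threshold of 3.0, and the sample amounts are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def is_anomalous(history, amount, threshold=3.0):
    """Return True if `amount` is far outside the customer's usual spending.

    `history` is a list of past purchase amounts (hypothetical data).
    The check is a plain z-score, a stand-in for a trained model.
    """
    if len(history) < 2:
        return False  # not enough history to judge either way
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past purchase was identical, so any other amount is unusual.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


# Made-up purchase history for one customer.
past_purchases = [12.50, 40.00, 23.75, 18.20, 35.10, 27.40]

typical = is_anomalous(past_purchases, 30.00)
unusual = is_anomalous(past_purchases, 950.00)
```

With this history, a 30.00 purchase falls within the usual range while a 950.00 purchase is flagged for verification. As more purchases are added to the history, the mean and spread describe the customer's pattern more reliably, which mirrors the point that more data leads to a better understanding of the pattern.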
Q. What are the most important skills for data scientists?
A. Whenever I am asked this question, I immediately answer ‘curiosity about data.’ Being passionate about data, finding answers, and not taking something at face value is going to set you apart from your peers. The data science space is different from traditional reporting and analytics because it comes with an expectation that a data scientist will uncover something unknown – a golden nugget in the data that no one ever realized was there and that traditional reports wouldn’t have detected. There is a high level of ambiguity in this field, because the goal is to discover hidden insights, correlations, and patterns in the data that a person didn’t think to ask for.
Reports are typically defined by a person who asks to see “x” and “y” on a report. A data scientist doesn’t start with a pre-defined list of requirements; they typically just start with the data. The field requires tenacity, problem solving, and the grit to dig into the details and accept that most projects are never really finished, but constantly evolve as we uncover more information.
Q. What don’t people realize about the field of data science?
A. I think most people expect there to be a lot of math required, which there is, but there is also a lot of programming. Before you get very far, you will realize the data is extremely messy and requires a lot of massaging to get it into a workable state. If you think about the number of systems generating data, and then about trying to blend those together when they each have different data standards, it quickly becomes the place where most time is spent. Doing the analysis manually or “by hand” is not feasible either.
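As a rough illustration of the blending problem described above, the sketch below combines records from two systems that use different field names, date formats, and name formatting. The two sources (`crm_rows`, `billing_rows`), their field names, and the sample values are all hypothetical; real exports would need many more rules than this.

```python
from datetime import datetime

# Hypothetical exports from two systems describing the same kind of record.
crm_rows = [{"Customer Name": "  ada LOVELACE ", "Signup": "03/15/2021"}]
billing_rows = [{"cust": "Grace Hopper", "signup_date": "2021-04-02"}]


def clean_name(raw):
    """Collapse stray whitespace and apply consistent capitalization."""
    return " ".join(raw.split()).title()


def normalize_crm(row):
    """Map a CRM row (US-style dates) onto the shared schema."""
    return {
        "name": clean_name(row["Customer Name"]),
        "signup": datetime.strptime(row["Signup"], "%m/%d/%Y").date(),
    }


def normalize_billing(row):
    """Map a billing row (ISO-style dates) onto the shared schema."""
    return {
        "name": clean_name(row["cust"]),
        "signup": datetime.strptime(row["signup_date"], "%Y-%m-%d").date(),
    }


# One list, one schema: every record now has the same keys and types.
combined = [normalize_crm(r) for r in crm_rows] + [
    normalize_billing(r) for r in billing_rows
]
```

Each source gets its own small translation function, and analysis only begins once everything shares one schema. Writing and maintaining these rules for every system is where much of the time goes.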
Q. What advice do you have for students who want to go into the field of data science today?
A. Data science is a broad field, with multiple roles that tend to get lumped together under a single umbrella. My advice would be to not feel overwhelmed by the possibilities and to focus instead on what you feel most passionate about. That could be working as an analyst, on the development side, in data engineering or data wrangling, or in storytelling. Most organizations have a myriad of individuals fulfilling these roles, each focusing on a different specialty. While the program will expose you to most roles, it is possible to specialize and really home in on a specific piece. For example, my passion is data visualization and helping someone find an answer very quickly, so I am very interested in new developments in this space and in the psychology of data viz – this is where I tend to focus my time because it interests me the most.
Q. What does a day in the life of a data scientist look like?
A. A data scientist can typically expect to be part of a project team, either with other data scientists or with a business area (Finance, HR, Operations, etc.), and is likely given projects to work on following a certain methodology. Scrum and agile are the most popular right now, but there are methodologies specific to data science as well.
On a day-to-day basis, one could expect to be providing a daily status on progress, interacting with technical resources to help with data access, deploying code, or just being heads-down on development for that project. A good portion of time on a data science project is usually spent figuring out where the data is going to come from, how to easily get to it, whether the process should be automated, whether it is repeatable, and then what steps need to be taken to blend it with other data. Data cleaning and preparation is estimated to take about 80% of a data scientist’s time, because there are so many applications that all have data in different formats. A lot of time must be spent identifying the rules to follow, the data that needs cleaning, etc.
A data scientist will also likely have a stakeholder group or executive committee that they regularly keep in the loop on their progress for specific projects. They should expect to present findings and a visual analysis to these groups on a regular basis.
Q. How flexible is a data science degree? Are you limited to a specific role once you graduate?
A. Having a data science degree will position you in the job market as a person who understands analysis, data structure, and programming, which opens up a wide range of job opportunities. The ability to manage and analyze data is a skill most organizations think is hard to teach. With a strong portfolio of data science projects after completing the program, it will be evident that you as a candidate have this ability.
Q. How big a role do machine learning and AI play in data science for businesses today? Will they be a bigger deal in the future?
A. Both machine learning and AI (artificial intelligence) are important in data science because they remove some of the manual work required, and employers are always looking for opportunities to automate and increase efficiency. I think we will continue to see new use cases for both in ways we can’t even imagine now. Self-driving cars seemed impossible several years ago, but they are a near reality. Other applications, like going to the store without having to physically check out, or having groceries and household goods automatically ordered and shipped, are also hard to imagine, but they are very possible in the short term.