3 Data Careers Decoded: stakes, challenges, realities and recommendations.
In our data-driven world, with more than 4 billion Internet users and
1.75 billion cell phone users, it is a better time than ever to pursue a
career in data and leverage data analysis to make effective decisions.
Cheng Han Lee invites you to take stock of your three main career options:
data analyst, data scientist, and data engineer.
Data analyst
A data analyst is essentially a junior data scientist. It’s the perfect
place to start if you’re new to a career in data and eager to cut your teeth.
Data analysts don’t have the mathematical or research background to invent
new algorithms, but they have a strong understanding of how to use existing
tools to solve problems.
Skills and tools
Data analysts need to have a baseline understanding of five core competencies:
programming, statistics, machine learning, data munging, and data
visualization.
Beyond technical skill, attention to detail and the ability to present
results effectively are equally important to success as a data analyst.
How it translates
Data analysts take direction from more experienced data professionals
in their organization. Based on that guidance, they acquire, process, and
summarize data. Data analysts are the ones managing the quality assurance of
data scraping, regularly querying databases for stakeholder requests, and
triaging data issues to reach timely resolutions. They then package the
data to provide digestible insights in narrative or visual form.
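To make the acquire, process, and summarize loop concrete, here is a minimal sketch in Python. It is an illustration only: the `requests` table, the `summarize_requests` helper, and the sample rows are all made up for this example, and a real analyst would query an existing database rather than an in-memory one.

```python
import sqlite3


def summarize_requests(rows):
    """Clean (department, days_to_resolve) rows, load them into a
    throwaway in-memory database, and summarize them per department.

    The table and column names are hypothetical, chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE requests (department TEXT, days_to_resolve REAL)")

    # Process: drop rows with missing values and normalize department names.
    clean = [
        (dept.strip().title(), float(days))
        for dept, days in rows
        if dept and days is not None
    ]
    conn.executemany("INSERT INTO requests VALUES (?, ?)", clean)

    # Summarize: one line per department with a count and an average.
    cursor = conn.execute(
        "SELECT department, COUNT(*), AVG(days_to_resolve) "
        "FROM requests GROUP BY department ORDER BY department"
    )
    summary = [(dept, count, round(avg, 1)) for dept, count, avg in cursor]
    conn.close()
    return summary


# Made-up sample data, including two incomplete rows that get dropped.
sample = [("streets", 3), ("Streets ", 5), ("parks", 2), ("parks", None), (None, 4)]

for dept, count, avg in summarize_requests(sample):
    print(f"{dept}: {count} requests, {avg} days on average")
```

The final loop is the "packaging" step in miniature: the query result becomes a short, readable line for each department that a stakeholder can take in at a glance.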
Lauren Ancona, about to embark on a career as a data analyst
with the city of Philadelphia’s Open Data Office, is most excited about the
ability of data to effect civic change. She said of her passion for exploring
data, “I spent the summer, often late into the night, learning about map tiles,
database theory, JavaScript, and data visualization.”
An enduring curiosity about data and close examination of evolving best
practices and tools serve all data professionals well, no matter the level of
seniority.
Data scientist
Some companies treat the titles of “data scientist” and “data analyst” as
synonymous. But there’s really a distinction between the two in terms of skill
set and experience.
Though data scientists and data analysts have the same mission in an
organization—to glean insight from the massive pool of data available—a data
scientist’s work requires more sophisticated skills to tackle a higher volume
and velocity of data.
As such, a data scientist is someone who can do undirected research and
tackle open-ended problems and questions. Data scientists typically have
advanced degrees in a quantitative field, like computer science, physics,
statistics, or applied mathematics, and they have the knowledge to invent new algorithms
to solve data problems.
Data scientists are extremely valuable to their companies, as their work
can uncover new business opportunities or save the organization money by
identifying hidden patterns in data (for example, highlighting surprising
customer behavior or finding potential storage cluster failures).
Skills and tools
Whereas a data analyst might look at data from only a single source, a data
scientist explores data from many different sources. Data scientists use tools
like Hadoop (the most widely used framework for distributed file system
processing), they use programming languages like Python and R, and they apply
the practices of advanced math and statistics.
The exact set of skills differs by organization and project, but this
example from Data Science
The most valuable nontechnical skill a data
scientist brings to the table is an
intense inquisitiveness. Data scientists have to be driven to pose
questions and hunt down solutions, and in so doing to unearth information that
could transform a business.
As data scientist Gaëlle Recourcé, CSO at Evercontact, said, “I love the
power of metrics and tracking user behaviors, because it gives me the
opportunity to test personal intuitions and then have real empirical results
that allow our team to make data-driven decisions and continually improve our
product.”
How it translates
Data scientists essentially leverage data to solve business problems. They
interpret, extrapolate from, and prescribe from data to deliver actionable
recommendations. A data analyst summarizes the past; a data scientist
strategizes for the future.
Data scientists could identify precisely how to optimize websites for
better customer retention, how to market products for stronger customer
lifecycle value, or how to fine-tune a delivery process for speed and minimal
waste.
Data engineer
A data engineer builds a robust, fault-tolerant data pipeline that cleans,
transforms, and aggregates unorganized and messy data into databases or data
sources. Data engineers are typically software engineers by trade.
Rather than performing data analysis, data engineers are responsible for
compiling and installing database systems, writing complex queries, scaling
to multiple machines, and putting disaster recovery systems into place.
Data engineers essentially lay the groundwork for a data analyst or data
scientist to easily retrieve the needed data for their evaluations and
experiments.
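As a rough illustration of what such a pipeline does, the sketch below cleans, transforms, and aggregates a handful of messy records and loads the result into a database. Everything in it is an assumption made for the example: the `RAW_EVENTS` lines, the `daily_spend` table, and the `clean`, `aggregate`, and `load` helpers are hypothetical, and SQLite stands in for whatever database an organization actually uses. Production pipelines add logging, retries, and scheduling on top of this basic shape.

```python
import sqlite3

# Made-up raw input: inconsistent spacing and casing, plus two bad records.
RAW_EVENTS = [
    "2024-01-05, alice , 19.99",
    "2024-01-05,BOB,5.00",
    "not a valid record",
    "2024-01-06,alice,not-a-number",
    "2024-01-06,Alice,10.01",
]


def clean(lines):
    """Yield (day, customer, amount) tuples, skipping malformed records
    so that one bad line does not bring the whole run down."""
    for line in lines:
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 3:
            continue  # wrong number of fields
        day, customer, amount = parts
        try:
            yield day, customer.lower(), float(amount)
        except ValueError:
            continue  # amount is not numeric


def aggregate(records):
    """Sum amounts per (day, customer)."""
    totals = {}
    for day, customer, amount in records:
        key = (day, customer)
        totals[key] = totals.get(key, 0.0) + amount
    return totals


def load(totals, conn):
    """Write the aggregated totals to a table; reruns overwrite old rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS daily_spend ("
        "day TEXT, customer TEXT, total REAL, PRIMARY KEY (day, customer))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO daily_spend VALUES (?, ?, ?)",
        [(day, customer, round(total, 2)) for (day, customer), total in totals.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
load(aggregate(clean(RAW_EVENTS)), conn)
rows = conn.execute(
    "SELECT day, customer, total FROM daily_spend ORDER BY day, customer"
).fetchall()
print(rows)
```

Once the table is loaded, an analyst or scientist can query `daily_spend` directly instead of wrestling with the raw lines, which is the groundwork described above.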
Skills and tools
Whereas data scientists extract value from data, data engineers are
responsible for making sure that data flows smoothly from source to destination
so that it can be processed.
As such, data engineers have deep knowledge of and expertise in:
- Hadoop-based technologies like MapReduce, Hive, and Pig
- SQL-based technologies like PostgreSQL and MySQL
- NoSQL technologies like Cassandra and MongoDB
- Data warehousing solutions
How it translates
“My responsibilities are quite various,” said Social Searcher Data Engineer
Dmitry Novikov. “They range from designing the system architecture and separate
modules, to algorithm implementation and infrastructure requirements.”
Data engineers do the behind-the-scenes work that enables data analysts and
data scientists to do their jobs more effectively.
Chris Beland, who leads the data engineering team at Allclasses, describes
what his team does, why it matters, and why he loves it:
“In my work right now, I do a
lot of natural language processing, turning semi-structured, human-readable web
content into highly structured machine-readable databases. My favorite thing to
do is to teach the computer something concrete about the real world, like how
humans write calendar dates and what they mean, or how the universe of class
topics breaks down into categories and subcategories.
Then I come up with some
algorithms so my machine can exploit that new knowledge to parse and sort text
and make sense of it just a little bit like a human would. I feel a bit like a
proud parent when I can check the resulting database, give the program a
virtual pat on the head for getting all the right answers, despite getting a
lot of inputs I never anticipated, and with a satisfying click ship the data
out to people who need it.”
The Bottom Line
You have many options when it comes to a career working with data. If
you’re interested in exploring such a career, your three major options are data
analyst, data scientist, and data engineer.
Sanjay Venkat, co-founder of big data analytics and visualization startup
Datavore Labs, crystallizes why and how this subdividing has occurred: “Data
analysts have morphed into these three or more specialized disciplines. I
believe it is the same specialization that doctors went through at the birth of
modern medicine. First there was your village leader or elder who played the
main role, but as tools of the trade have become more and more specialized, we
now have GPs, surgeons, and neurosurgeons.”
If you’re new to the field of data science, you’ll want to start by aiming
for the GP in Venkat’s analogy, an analyst job. As you develop your skills and
gain experience, you’ll be able to progress to data scientist or data engineer.