How to prepare for the Google Cloud Data Engineer Certification

In case you don’t know it yet, the GCP Professional Data Engineer certification demonstrates your ability to design and build data processing systems and create machine learning models on GCP. I recently took the Professional Data Engineer exam from Google Cloud and I’m happy to have passed it. I’ve written a short guide for anyone attempting to take this exam after March 29 (when the exam requirements were last updated). Read on to find out what resources worked for me.

A little background about me: I have three years of experience as a big data solution architect and six in data processing. Even with this background, it took me almost 2 months to prepare for the exam. I strongly recommend having previous experience in big data processing and ML before attempting to take this exam.

The resources I used, and recommend you use as well, are:

  • Coursera Big Data Specialization – this specialization features 5 courses with plenty of hands-on labs that should cover more than 50% of your training. However, going through them once is not enough, so I encourage you to go through them more than once (it can be at a higher speed, 1.5x or 2x).
  • Preparing for the Google Cloud Professional Data Engineer Exam – a Coursera course for preparing for the exam; at the end, there is a very useful sample of 20 questions that gives you a feel for what the exam will look like. Take them and see how prepared you are.
  • https://www.braincert.com – also provides practice questions; you will find 150 practice tests with quite detailed answers and explanations. Unless you score more than 90% on each of them, do not book an exam spot. Each wrong answer will show you a weak spot, and I advise you to go over those topics again.
  • Read the official docs for every Google Cloud product: Concepts, Guides, Tutorials and Best Practices. Without hands-on practice it will be really difficult to answer complex questions in the exam, so do take the time to test each product on your own.
  • Linux Academy - more practice tests for the Google Cloud Certified Professional Data Engineer exam.
  • Cloud Academy - I took this course 2-3 days before the exam and I can say it helped me a lot. Not only did it help me refresh a lot of information, but it also covered a lot of new material that I had not focused on much in the beginning.
  • Last but not least – you should obviously take the Google practice exam.

Now, if you score above 90-95% on the practice tests (both Braincert and Google) you should consider taking the exam.

Some information about the actual exam:

  • there are 50 multi-select questions and you have 2 hours to answer them
  • you must take the test at a Kryterion certified location, and it costs $200
  • Google does not share any feedback with you; it only lets you know whether you passed or failed
  • if you fail you can retake it after 2 weeks; fail again and you can retake it after 3 months; if it so happens that you fail once more, be aware that you will have to wait for 1 year before you can retake the test
  • the questions are difficult, and I am not joking - each question is basically a use case, a problem to solve on its own
  • although the case study questions are not included anymore, I still encourage you to study them (I am talking about MJTelco and Flowlogistic)
  • since the case studies have been taken out, there is a higher percentage of machine learning questions
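
The retake waiting periods listed above can be sketched as a small Python helper for planning a next attempt. This is only an illustration: the function name, the day counts (14, 90 and 365 days as stand-ins for 2 weeks, 3 months and 1 year) and the example date are my own assumptions, so check the official retake policy for the exact rules.

```python
from datetime import date, timedelta

# Waiting periods after the 1st, 2nd and 3rd failed attempt, as listed above.
# The day counts are approximations chosen for this sketch
# (2 weeks = 14 days, 3 months ~ 90 days, 1 year ~ 365 days).
RETAKE_WAIT_DAYS = (14, 90, 365)


def earliest_retake(failed_on: date, failed_attempts: int) -> date:
    """Return the earliest date for the next attempt after a failed one."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be at least 1")
    # Assumption: any failure beyond the third keeps the longest waiting period.
    index = min(failed_attempts, len(RETAKE_WAIT_DAYS)) - 1
    return failed_on + timedelta(days=RETAKE_WAIT_DAYS[index])


if __name__ == "__main__":
    # Hypothetical date of a first failed attempt.
    print(earliest_retake(date(2019, 6, 1), failed_attempts=1))
```

Passing the date of the failed attempt and the number of failures so far returns the first day on which, under these assumptions, the exam could be booked again.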

I will not go into detail on every technology and what to focus on, as there are many resources available on this topic (I will leave a list at the end of this article). However, I’d like to mention that, from what I noticed when comparing the practice tests with the actual exam, there is a general shift towards ML and AI in the overall exam.

You will still need to master every database technology available on GCP, as well as BigQuery, Dataflow and Pub/Sub, but also AutoML, TensorFlow and AI Platform. As noted in the updated exam guide, there is a shift in the Data Engineer job role, which will soon be well defined and specialized into Data Scientist, Data Analyst and Machine Learning Engineer.

Here is a list of related articles that I strongly recommend reading:

I hope this article was useful, and good luck with getting certified soon!

Cosmin Pintoiu, Big Data Solution Architect at Lentiq, passionate about distributed computing and machine learning at scale.

