A little background about me: I have three years of experience as a big data solution architect and six in data processing. Even so, it took me almost 2 months to prepare for the exam. I strongly recommend having previous experience in big data processing and ML before attempting it.
The resources I followed, and recommend you follow as well, are:
- Coursera Big Data Specialization – this specialization features 5 courses with plenty of hands-on labs that should cover more than 50% of your training. However, going through them once is not enough, so I encourage you to follow them more than once (the repeat passes can be at a higher speed, 1.5x or 2x).
- Preparing for the Google Cloud Professional Data Engineer Exam – a Coursera course for preparing for the exam; at the end, there is a very useful sample of 20 questions that give you a feel for what the exam will look like. Take them and see how prepared you are.
- https://www.braincert.com – also provides practice questions; you will find 150 practice tests with quite detailed answers and explanations. Do not book an exam spot until you score more than 90% on each of them. Each wrong answer will reveal a weak spot, and I advise you to go over those topics again.
- Read the official docs for every Google Cloud product: Concepts, Guides, Tutorials and Best Practices. Without practice it will be really difficult to answer complex questions on the exam, so do take the time to test each product on your own.
- Linux Academy - more practice tests for the Google Cloud Certified Professional Data Engineer exam.
- Cloud Academy - I actually took this course 2-3 days before the actual exam and I can say it helped me a lot. Not only did it help to refresh a lot of info, but there was also a lot of new information that I did not focus on too much in the beginning.
- Last but not least – you should obviously take the Google practice exam.
Now, if you score above 90-95% on the practice tests (both Braincert and Google) you should consider taking the exam.
Some information about the actual exam:
- there are 50 questions (multi-select), with a 2-hour time limit
- you must take the test at a Kryterion certified location, and it costs $200
- Google does not share any feedback with you; it only lets you know whether you passed or failed
- if you fail, you can retake it after 2 weeks; fail again and you can retake it after 3 months; if you fail a third time, you will have to wait 1 year before you can retake the test
- the questions are difficult, and I am not joking - each question is basically a use case, a problem to solve on its own
- although the case study questions are no longer included, I still encourage you to study the case studies (MJTelco and Flowlogistic)
- since the case studies have been taken out, there is a higher percentage of machine learning questions
I will not go into detail on every technology and what to focus on, as there are many resources available on this topic (I will leave a list at the end of this article). However, I’d like to mention that, from what I noticed comparing the practice tests with the actual exam, there is a general shift towards ML and AI in the exam overall.
You will still need to master every database technology available on GCP, as well as BigQuery, Dataflow and Pub/Sub, but also AutoML, TensorFlow and AI Platform. As they noted in the updated exam guide, there is a shift in the Data Engineer job role, which will soon be well defined and specialized into Data Scientist, Data Analyst and Machine Learning Engineer.
Here is a list of related articles that I strongly recommend reading:
I hope this article was useful, and good luck getting certified soon!
Cosmin Pintoiu, Big Data Solution Architect at Lentiq, passionate about distributed computing and machine learning at scale.