How to crawl websites with Selenide and JDK 14+
Sometimes we find ourselves in a situation where we need certain data that would otherwise have to be fetched manually from some website. As developers, automation is our friend, so instead of gathering all this information by hand, we can write an automated crawler. I’ve recorded a video in which I fetch some data from my blog website and transform it into CSV format, using Selenide and newer Java features such as records.
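As a rough sketch of the target format, a Java record (previewed in JDK 14 and finalized in JDK 16) can model one crawled blog entry and render itself as a CSV line. The field names here are illustrative assumptions, not the exact ones from the video:

```java
// A record (previewed in JDK 14, finalized in JDK 16) modeling one crawled blog entry.
// The fields are illustrative -- they'd match whatever data you extract from the page.
record BlogEntry(String title, String url) {

    // Render this entry as one quoted CSV line, e.g. "My Post","https://..."
    String toCsvLine() {
        return '"' + title + "\",\"" + url + '"';
    }
}
```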
Please keep in mind to be a good citizen: only use such techniques on websites and in situations where you’re allowed to do so, and make sure your actions don’t disrupt any service.
You can find the code example on GitHub: Selenium Playground
What we’re doing is use Selenide, with its helpful queries and methods, together with Java records and streams, to map the entries of my blog to a desired output format. The difference to using a web API is that we have to be a bit more creative in how we identify and extract the individual parts, since the data isn’t necessarily structured for automated consumption.
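A minimal sketch of that approach could look as follows, reusing the BlogEntry record from above. The URL, the CSS selector, and the BlogCrawler class name are illustrative assumptions, and the stream() call assumes a Selenide 5.x version in which ElementsCollection implements List:

```java
import static com.codeborne.selenide.Selenide.$$;
import static com.codeborne.selenide.Selenide.open;

import java.util.stream.Collectors;

public class BlogCrawler {

    public static void main(String[] args) {
        // Selenide starts and manages the underlying WebDriver for us.
        // The URL is a placeholder -- point it at the site you're crawling.
        open("https://blog.sebastian-daschner.com/");

        // ".entry a" is an assumed selector -- inspect the actual page markup
        // and adjust it to the elements that contain the entries you're after
        String csv = $$(".entry a").stream()
                .map(link -> new BlogEntry(link.text(), link.attr("href")))
                .map(BlogEntry::toCsvLine)
                .collect(Collectors.joining("\n"));

        System.out.println(csv);
    }
}
```

This is exactly where the creativity comes in: instead of a documented API contract, the selectors and the mapping to record fields are derived from whatever structure the page happens to have.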