Extracting domain names from URLs using Java regular expressions

01 Oct 2023

RegularExpressions

To start with, let’s define what a domain name is in the context of a URL. A domain name typically consists of a series of labels separated by dots, such as “example.com” or “www.google.com”. We will focus on extracting the domain name without any preceding protocol (e.g., http://) or path (e.g., /resources/page.html).

To achieve this, we can use Java regular expressions. Regular expressions provide a powerful way to match patterns in text. Here’s an example code snippet that demonstrates how to extract domain names from URLs using regular expressions in Java:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DomainExtractor {
    public static void main(String[] args) {
        String url = "https://www.example.com/resources/page.html";
        
        // Regex pattern to match domain names
        String domainPattern = "(?<=://|\\.)[^/]+(?=/|$)";
        
        Pattern pattern = Pattern.compile(domainPattern);
        Matcher matcher = pattern.matcher(url);
        
        if (matcher.find()) {
            String domain = matcher.group();
            System.out.println("Domain name: " + domain);
        } else {
            System.out.println("No domain name found");
        }
    }
}

In the code above, we define a regular expression pattern "(?<=://|\\.)[^/]+(?=/|$)" to match domain names. Let’s break down the pattern:

(?<=://|\\.) -> Lookbehind assertion to check for either “://” or “.” before the domain name starts.
[^/]+ -> Match one or more characters that are not a slash (“/”). This captures the domain name itself.
(?=/|$) -> Lookahead assertion to check for either a slash (“/”) or the end of the string after the domain name ends.

We use the Pattern class from the java.util.regex package to compile the regular expression pattern. Then, we create a Matcher instance by invoking the matcher() method on the pattern and passing the URL string as input.

Next, we use the find() method on the matcher to search for the first occurrence of the domain name pattern within the URL string. If a match is found, we retrieve the matched domain name using the group() method and print it to the console. Otherwise, we indicate that no domain name was found.

You can replace the url variable with any desired URL to extract the domain name from. This code snippet gives you a starting point to extract domain names from URLs using Java regular expressions.

By using this regular expression approach, you can now easily extract domain names from URLs in your Java applications. This can be beneficial in various scenarios, such as URL parsing, web scraping, or any situation where you need to work with domain names extracted from URLs. Happy coding!

#Java #RegularExpressions