libtld 1.2.0

tld.c File Reference

#include "tld.h"
#include "tld_data.h"
#include <malloc.h>
#include <limits.h>

Functions

int cmp (const char *a, const char *b, int n)
 Compare two strings, one of which is limited by length.
int search (int i, int j, const char *domain, int n)
 Search for the specified domain.
enum tld_result tld (const char *uri, struct tld_info *info)
 Get information about the TLD for the specified URI.
const char * tld_version ()
 Return the version of the library.

Function Documentation

int cmp ( const char *  a,
const char *  b,
int  n 
)

This internal function was created to handle a simple string (no locale) comparison with one string being limited in length.

The comparison does not require a locale since all characters are ASCII (a URI with Unicode characters encodes them in UTF-8 and replaces each of those bytes with %XX.)

The length applies to the string in b. This allows us to make use of the input string all the way down to the cmp() function. In other words, we avoid a copy of the string.

The string in a is 'nul' (\0) terminated. This means a may be longer or shorter than b. In other words, the function is capable of returning the correct result with a single call.
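
The contract described above can be sketched as follows. This is an illustration only, not the code found in tld.c: the name cmp_sketch is hypothetical, and only the behavior stated on this page (a is NUL terminated, b is limited to n characters, the result is -1, 0 or 1) is assumed.

```c
/* Hypothetical sketch, not the libtld source: compare the
 * NUL-terminated string a with the first n characters of b. */
int cmp_sketch(const char *a, const char *b, int n)
{
    /* walk both strings while b still has characters left */
    for(; n > 0; ++a, ++b, --n) {
        if(*a == '\0') {
            /* a ended first, so a is a proper prefix of b: a < b */
            return -1;
        }
        if((unsigned char) *a < (unsigned char) *b) {
            return -1;
        }
        if((unsigned char) *a > (unsigned char) *b) {
            return 1;
        }
    }
    /* b is exhausted; the strings are equal only if a ends here too */
    return *a == '\0' ? 0 : 1;
}
```

With this sketch, comparing "co" against the first two characters of "co.uk" reports equality without copying ".co" out of the user string, whereas "com" against the same two characters reports that a is larger.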

Parameters:
[in] a  The pointer to an f_tld field of the tld_descriptions.
[in] b  A pointer directly into the user domain string.
[in] n  The number of characters that can be checked in b.
Returns:
-1 if a < b, 0 when a == b, and 1 when a > b

Definition at line 110 of file tld.c.

Referenced by search(), and test_compare().

int search ( int  i,
int  j,
const char *  domain,
int  n 
)

This function executes one search for one domain. The search is binary, which means the tld_descriptions are expected to be 100% in order at all levels.

The i and j parameters represent the boundaries of the current level to be checked. For a given TLD, there is a start and an end boundary that are used to define i and j. So except for the top level, the bounds are limited to one TLD, sub-TLD, etc. (for example, .uk has a sub-layer with .co, .ac, etc. and that group is limited to the second level entries accepted within the .uk TLD.)

This function does one search at one level. If sub-levels are available for that TLD, then it is the responsibility of the caller to call the function again to find out whether one of those sub-domain names is in use.

When the TLD cannot be found, the function returns -1.
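
A minimal sketch of a binary search over one level, with i included and j excluded, is shown below. It is not the libtld implementation: sketch_table is a made-up, sorted stand-in for one level of tld_descriptions, search_sketch is a hypothetical name, and strncmp() plus a length check replaces the cmp() function.

```c
#include <string.h>

/* Hypothetical sorted table standing in for one level of tld_descriptions. */
static const char *const sketch_table[] = { "ac", "co", "gov", "ltd", "org" };

/* Binary search in sketch_table[i..j) for the first n characters of domain.
 * Returns the offset of the entry found, or -1 when not found. */
int search_sketch(int i, int j, const char *domain, int n)
{
    while(i < j) {
        int p = i + (j - i) / 2;          /* middle of the current range */
        const char *entry = sketch_table[p];
        int r = strncmp(entry, domain, (size_t) n);
        if(r == 0 && strlen(entry) > (size_t) n) {
            r = 1;                        /* entry is longer than the domain */
        }
        if(r == 0) {
            return p;
        }
        if(r < 0) {
            i = p + 1;                    /* entry < domain: search upper half */
        }
        else {
            j = p;                        /* entry > domain: search lower half */
        }
    }
    return -1;
}
```

Because the search is binary, the table must be sorted in the same order the comparison uses; an unsorted entry would simply never be found.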

Parameters:
[in] i  The start point of the search (included.)
[in] j  The end point of the search (excluded.)
[in] domain  The domain name to search.
[in] n  The length of the domain name.
Returns:
The offset of the domain found, or -1 when not found.

Definition at line 169 of file tld.c.

References cmp(), tld_description::f_tld, tld(), and tld_descriptions.

Referenced by test_search(), test_search_array(), and tld().

enum tld_result tld ( const char *  uri,
struct tld_info *  info 
)

The tld() function searches for the specified URI in the TLD descriptions. The results are saved in the info parameter for later interpretation (i.e. extraction of the domain name, sub-domains and the exact TLD.)

The function extracts the last extension of the URI. For example, in the following:

 example.co.uk

the function first extracts ".uk". With that extension, it searches the list of official TLDs. If not found, an error is returned and the info parameter is set to unknown.

When found, the function checks whether that TLD (".uk" in our previous example) accepts sub-TLDs (second, third, fourth level TLDs.) If so, it extracts the next TLD entry (the ".co" in our previous example) and searches for that second level TLD. If found, we try again with the third level, etc. until all the possible TLDs are exhausted. At that point, we return the last TLD we found. In the case of ".co.uk", we return the information of the ".co" second level TLD.
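
The right-to-left extraction of extensions described above can be sketched with a small helper. The name previous_label is hypothetical and the code is only an illustration of the walk, not the parsing done inside tld().

```c
#include <stddef.h>

/* Hypothetical helper: return a pointer to the period that starts the
 * label ending at 'end' (excluded), or NULL when there is no period
 * between 'uri' and 'end'. */
const char *previous_label(const char *uri, const char *end)
{
    const char *s = end;

    while(s > uri) {
        --s;
        if(*s == '.') {
            return s;
        }
    }
    return NULL;
}
```

For "example.co.uk", a first call with end pointing at the terminating NUL returns the period of ".uk" (offset 10). A second call using that pointer as the new end returns the period of ".co" (offset 7). A third call returns NULL because "example" is not preceded by a period.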

The info structure includes:

  • f_category -- the category of TLD, unless set to TLD_CATEGORY_UNDEFINED, it is considered valid
  • f_status -- the status of the TLD; unless set to TLD_STATUS_UNDEFINED, it was defined from the tld_data.xml file. Only TLDs marked as TLD_STATUS_VALID are considered to be currently in use; your software can make use of the other statuses one way or another, but such TLDs should not be accepted as valid in a URI
  • f_country -- if the category is set to TLD_CATEGORY_COUNTRY then this pointer is set to the name of the country
  • f_tld -- is set to the full TLD of your domain name; this is a pointer WITHIN your uri string so make sure you keep your URI string valid if you intend to use this f_tld string
  • f_offset -- the offset to the first period within the domain name TLD (i.e. in our previous example, it would be the offset to the first period in ".co.uk", so in "example.co.uk" the offset would be 7. Assuming you prepend "www." to have the URI "www.example.co.uk" then the offset would be 11.)
Note:
In our previous example, the ".uk" TLD is properly used: it includes a second level domain name (".co".) The URI "example.uk" should return TLD_RESULT_INVALID since .uk by itself is not supposed to be acceptable. However, in that special case, some companies still use second level domain names, so we accept "example.uk". On the other hand, ".bd" does not accept names at the second level, so "example.bd" returns an error (TLD_RESULT_INVALID).

Assuming that you always get valid URIs, you should get one of those results:

  • TLD_RESULT_SUCCESS -- success! the URI is valid and the TLD was properly determined; use the f_tld or f_offset to extract the TLD domain and sub-domains
  • TLD_RESULT_INVALID -- known TLD, but not currently valid; this result is returned when we know that the TLD is not to be accepted

Other results are returned when the input string is considered invalid.

Note:
The function only accepts a bare URI, in other words: no protocol, no path, no anchor, no query string. Also, it should not start and/or end with a period or you are likely to get an invalid response. (i.e. don't use ".example.co.uk.")
 // from example.c
 #include "tld.h"
 #include <stdio.h>

 int main()
 {
   const char *uri = "www.example.co.uk";
   struct tld_info info;
   enum tld_result r;

   r = tld(uri, &info);
   if(r == TLD_RESULT_SUCCESS) {
     const char *s = uri + info.f_offset - 1;
     while(s > uri) {
       if(*s == '.') {
         ++s;
         break;
       }
       --s;
     }
     // here uri points to your sub-domains, the length is "s - uri"
     // if uri == s then there are no sub-domains
     // s points to the domain name, the length is "info.f_tld - s"
     // and info.f_tld points to the TLD
     //
     // When TLD_RESULT_SUCCESS is returned the domain cannot be an
     // empty string; also the TLD cannot be empty, however, there
     // may be no sub-domains.
     printf("Sub-domain(s): \"%.*s\"\n", (int)(s - uri), uri);
     printf("Domain: \"%.*s\"\n", (int)(info.f_tld - s), s);
     printf("TLD: \"%s\"\n", info.f_tld);
   }
 }
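
The note about bare URIs suggests screening the input before calling tld(). The following is a rough sketch of such a pre-check. The helper name looks_like_bare_uri is hypothetical and not part of libtld; it only tests the conditions listed in the note and does not reproduce whatever validation tld() performs itself.

```c
#include <string.h>

/* Hypothetical pre-check, not part of libtld: returns 1 when s looks
 * like a bare host name as described in the note, 0 otherwise. */
int looks_like_bare_uri(const char *s)
{
    size_t len;

    if(s == NULL || *s == '\0') {
        return 0;
    }
    len = strlen(s);
    /* must not start or end with a period */
    if(s[0] == '.' || s[len - 1] == '.') {
        return 0;
    }
    /* ':' or '/' reveals a protocol, port or path, '?' a query string,
     * and '#' an anchor */
    if(strpbrk(s, ":/?#") != NULL) {
        return 0;
    }
    return 1;
}
```

A caller would pass the string to tld() only when this function returns 1; a string that passes the check may still be rejected by tld().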
Parameters:
[in] uri  The URI to be checked.
[out] info  A pointer to a tld_info structure to save the result.
Returns:
One of the TLD_RESULT_... enumeration values.

Definition at line 315 of file tld.c.

References tld_description::f_category, tld_info::f_category, tld_description::f_country, tld_info::f_country, tld_description::f_end_offset, tld_description::f_exception_apply_to, tld_description::f_exception_level, tld_info::f_offset, tld_description::f_start_offset, tld_description::f_status, tld_info::f_status, tld_info::f_tld, search(), TLD_CATEGORY_UNDEFINED, tld_descriptions, tld_end_offset, tld_max_level, TLD_RESULT_BAD_URI, TLD_RESULT_INVALID, TLD_RESULT_NO_TLD, TLD_RESULT_NOT_FOUND, TLD_RESULT_NULL, TLD_RESULT_SUCCESS, tld_start_offset, TLD_STATUS_EXCEPTION, TLD_STATUS_UNDEFINED, and TLD_STATUS_VALID.

Referenced by main(), snap::output_tlds(), snap::read_tlds(), search(), test_all(), test_invalid(), test_tlds(), and test_unknown().

const char* tld_version ( )

This function returns the version of this library. The version is defined with three numbers: <major>.<minor>.<patch>.

You should be able to use the version string to compare different libtld versions and know which one is the newest.
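
One way to compare two such version strings is sketched below. The helper compare_versions is hypothetical and not part of libtld; it assumes both strings follow the <major>.<minor>.<patch> form exactly.

```c
#include <stdio.h>

/* Hypothetical helper: compare two "<major>.<minor>.<patch>" strings.
 * Returns -1, 0 or 1, or -2 when either string cannot be parsed. */
int compare_versions(const char *a, const char *b)
{
    int va[3], vb[3], i;

    if(sscanf(a, "%d.%d.%d", &va[0], &va[1], &va[2]) != 3
    || sscanf(b, "%d.%d.%d", &vb[0], &vb[1], &vb[2]) != 3) {
        return -2;
    }
    /* the first differing number, from major to patch, decides */
    for(i = 0; i < 3; ++i) {
        if(va[i] != vb[i]) {
            return va[i] < vb[i] ? -1 : 1;
        }
    }
    return 0;
}
```

The numbers are compared as integers, so "1.10.0" is correctly seen as newer than "1.2.0", which a plain string comparison would get wrong. A call such as compare_versions(tld_version(), "1.2.0") would compare the running library against this release.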

Returns:
A constant string with the version of the library.

Definition at line 432 of file tld.c.

References LIBTLD_VERSION.

Referenced by main().


This document is part of the libtld Project.

Copyright by Made to Order Software Corp.