libtld 1.2.0

tld.h.in File Reference

Classes

struct  tld_info
 Set of information returned by the tld() function.

Defines

#define LIB_TLD_H
#define LIBTLD_VERSION   "@LIBTLD_VERSION_MAJOR@.@LIBTLD_VERSION_MINOR@.@LIBTLD_VERSION_PATCH@"
#define LIBTLD_VERSION_MAJOR   @LIBTLD_VERSION_MAJOR@
#define LIBTLD_VERSION_MINOR   @LIBTLD_VERSION_MINOR@
#define LIBTLD_VERSION_PATCH   @LIBTLD_VERSION_PATCH@

Enumerations

enum  tld_category {
  TLD_CATEGORY_INTERNATIONAL, TLD_CATEGORY_PROFESSIONALS, TLD_CATEGORY_LANGUAGE, TLD_CATEGORY_GROUPS,
  TLD_CATEGORY_REGION, TLD_CATEGORY_TECHNICAL, TLD_CATEGORY_COUNTRY, TLD_CATEGORY_ENTREPRENEURIAL,
  TLD_CATEGORY_UNDEFINED
}
enum  tld_result {
  TLD_RESULT_SUCCESS, TLD_RESULT_INVALID, TLD_RESULT_NULL, TLD_RESULT_NO_TLD,
  TLD_RESULT_BAD_URI, TLD_RESULT_NOT_FOUND
}
 The result returned by tld().
enum  tld_status {
  TLD_STATUS_VALID, TLD_STATUS_PROPOSED, TLD_STATUS_DEPRECATED, TLD_STATUS_UNUSED,
  TLD_STATUS_RESERVED, TLD_STATUS_INFRASTRUCTURE, TLD_STATUS_UNDEFINED, TLD_STATUS_EXCEPTION = 100
}
 Defines the current status of the TLD.

Functions

enum tld_result tld (const char *uri, struct tld_info *info)
 Get information about the TLD for the specified URI.
const char * tld_version ()
 Return the version of the library.

Define Documentation

#define LIB_TLD_H

Definition at line 19 of file tld.h.in.

#define LIBTLD_VERSION   "@LIBTLD_VERSION_MAJOR@.@LIBTLD_VERSION_MINOR@.@LIBTLD_VERSION_PATCH@"

Definition at line 28 of file tld.h.in.

Referenced by tld_version().

#define LIBTLD_VERSION_MAJOR   @LIBTLD_VERSION_MAJOR@

Definition at line 25 of file tld.h.in.

#define LIBTLD_VERSION_MINOR   @LIBTLD_VERSION_MINOR@

Definition at line 26 of file tld.h.in.

#define LIBTLD_VERSION_PATCH   @LIBTLD_VERSION_PATCH@

Definition at line 27 of file tld.h.in.
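
The three numeric macros lend themselves to a compile-time style version check. The following is a minimal sketch, not part of libtld: the helper name libtld_at_least() is hypothetical, and the fallback defines (1, 2, 0, taken from the release named at the top of this page) exist only so the sketch compiles without the generated tld.h.

```c
/* Fallback values so this sketch compiles without the generated tld.h.
 * In real code, include "tld.h" and drop these three defines. */
#ifndef LIBTLD_VERSION_MAJOR
#define LIBTLD_VERSION_MAJOR 1
#define LIBTLD_VERSION_MINOR 2
#define LIBTLD_VERSION_PATCH 0
#endif

/* Hypothetical helper (not part of libtld): returns 1 when the version
 * seen at compile time is at least major.minor.patch, 0 otherwise. */
int libtld_at_least(int major, int minor, int patch)
{
    if (LIBTLD_VERSION_MAJOR != major) {
        return LIBTLD_VERSION_MAJOR > major;
    }
    if (LIBTLD_VERSION_MINOR != minor) {
        return LIBTLD_VERSION_MINOR > minor;
    }
    return LIBTLD_VERSION_PATCH >= patch;
}
```

Note that this reflects the header the code was compiled against, which may differ from the library loaded at run time.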


Enumeration Type Documentation

enum tld_category

Defines the category of the TLD. The best-known categories are international TLDs (such as .com and .info) and country TLDs (such as .us, .uk, .fr, etc.)

IANA offers, and is working on, other extensions such as .pro for professionals and .arpa for its internal infrastructure.

Enumerator:
TLD_CATEGORY_INTERNATIONAL 

International TLDs.

This category represents TLDs that can be used by anyone anywhere in the world. In some cases, these have some limits (e.g. only a museum can register a .museum domain.) However, the best-known international extension is .com, and this one has no restrictions at all.

TLD_CATEGORY_PROFESSIONALS 

Professional TLDs.

This category is offered to professionals. Some countries already offer second-level domain name registrations for professionals; either way, they are not used very much. These are reserved for people such as accountants, attorneys, and doctors.

Only people who hold a government-issued license can register a .pro domain name.

TLD_CATEGORY_LANGUAGE 

Language specific TLDs.

At time of writing, there is one language extension: .cat for the Catalan language. The idea of the language extensions is to offer a language, rather than a country, a way to have a website that all the people on the Earth can read in their language.

TLD_CATEGORY_GROUPS 

Groups specific TLDs.

The concept of groups is similar to the language grouping, but in this case it may refer to a specific group of people (but not based on anything such as ethnicity.)

Examples of groups are Kids, Gay people, Ecologists, etc. This is only proposed at this point.

TLD_CATEGORY_REGION 

Region specific TLDs.

It has been proposed, as with .eu, to have extensions based on well defined regions such as .asia for all of Asia. We currently also have .aq for Antarctica. Some proposed regions are .africa and city names such as .paris and .wien.

Old TLDs that belonged to countries that no longer exist (in general, the country was split in two and both new countries have different names) also appear in this category, as do future regions.

We keep old TLDs because they are likely to show up every now and then, and this way your software can cleanly refuse them.

TLD_CATEGORY_TECHNICAL 

Technical extensions are considered internal.

These are likely valid (e.g. .arpa is valid) but are used for technical reasons and not for regular URIs. They are present in the data but must be ignored by your software.

To avoid returning TLD_RESULT_SUCCESS when a TLD with such a category is found, we mark these with the TLD_STATUS_INFRASTRUCTURE.

TLD_CATEGORY_COUNTRY 

A country extension.

Most of the extensions are country extensions. Country extensions are generally further broken down with second-level domain names. Some countries even have third, fourth, and fifth level domain names.

TLD_CATEGORY_ENTREPRENEURIAL 
TLD_CATEGORY_UNDEFINED 

The TLD was not found.

This category is used to initialize the information structure and is used to show that the TLD was not found.

Definition at line 30 of file tld.h.in.

enum tld_result

This enumeration defines all the possible results of the tld() function.

Only the TLD_RESULT_SUCCESS is considered to represent a valid result.

The TLD_RESULT_INVALID represents a TLD that was found but is not currently marked as valid (it may be deprecated or proposed, for example.)
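
As an illustration of how a caller might treat these results, here is a minimal sketch. The helpers describe_tld_result() and result_is_acceptable() are hypothetical and not part of libtld, and the enumeration is copied locally (in the order listed on this page) only so the sketch compiles on its own; real code would include "tld.h" instead.

```c
/* Local copy of the enumeration so the sketch is standalone.
 * Real code should include "tld.h" and remove this definition. */
enum tld_result {
    TLD_RESULT_SUCCESS,
    TLD_RESULT_INVALID,
    TLD_RESULT_NULL,
    TLD_RESULT_NO_TLD,
    TLD_RESULT_BAD_URI,
    TLD_RESULT_NOT_FOUND
};

/* Hypothetical helper: short human-readable text for each result. */
const char *describe_tld_result(enum tld_result r)
{
    switch (r) {
    case TLD_RESULT_SUCCESS:   return "valid TLD";
    case TLD_RESULT_INVALID:   return "known TLD, not currently valid";
    case TLD_RESULT_NULL:      return "empty or NULL URI";
    case TLD_RESULT_NO_TLD:    return "no period in URI";
    case TLD_RESULT_BAD_URI:   return "unacceptable characters in URI";
    case TLD_RESULT_NOT_FOUND: return "TLD not in the TLD data";
    }
    return "unexpected result value";
}

/* Hypothetical helper: only TLD_RESULT_SUCCESS represents a valid result. */
int result_is_acceptable(enum tld_result r)
{
    return r == TLD_RESULT_SUCCESS;
}
```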

Enumerator:
TLD_RESULT_SUCCESS 

Success! The TLD of the specified URI is valid.

This result is returned when the URI includes a valid TLD. The function further includes valid results in the tld_info structure.

You can accept this URI as valid.

TLD_RESULT_INVALID 

The TLD was found, but it is marked as invalid.

This result represents a TLD that is not valid as is for a URI, but it was defined in the TLD data. The function includes further information in the tld_info structure. There you can check the category, status, and other parameters to determine what the TLD really represents.

It may be possible to use such a TLD, although as far as web addresses are concerned, these are not considered valid. As mentioned in the statuses, some may mean that the TLD can be changed for another and work (e.g. a country name that changed.)

TLD_RESULT_NULL 

The input URI is empty.

The tld() function returns this value whenever the input URI pointer is NULL or the empty string (""). Obviously, no TLD is found in this case.

TLD_RESULT_NO_TLD 

The input URI has no TLD defined.

Whenever the URI does not include at least one period (.), this error is returned. Local URIs are considered valid and don't generally include a period (e.g. "localhost", "my-computer", "johns-computer", etc.) We expect that the tld() function would not be called with such URIs.

A valid Internet URI must include a TLD.

TLD_RESULT_BAD_URI 

The URI includes characters that are not accepted by the function.

This value is returned if a character is found to be incompatible or a sequence of characters is found incompatible.

At this time, tld() returns this error if two periods (.) are found one after another. Over time, more checks will be added to detect invalid characters (anything outside of [-a-zA-Z0-9.%].)

Note that the URI should not start or end with a period. This error will also be returned (at some point) when the function detects such problems.
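
Since some of these checks are described as future work, a caller may want to pre-validate the string before calling tld(). The following is a minimal sketch under that assumption; uri_looks_clean() is a hypothetical helper, not part of libtld, and it only applies the rules stated above (allowed character set, no two periods in a row, no leading or trailing period).

```c
#include <stddef.h>

/* Hypothetical pre-check (not part of libtld). Returns 1 when the string
 * is non-empty, uses only characters from [-a-zA-Z0-9.%], contains no
 * two periods in a row, and neither starts nor ends with a period. */
int uri_looks_clean(const char *uri)
{
    const char *s;

    if (uri == NULL || *uri == '\0' || *uri == '.') {
        return 0;
    }
    for (s = uri; *s != '\0'; ++s) {
        char c = *s;
        if (c == '.') {
            /* reject ".." and a trailing period */
            if (s[1] == '.' || s[1] == '\0') {
                return 0;
            }
        } else if (!((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                  || (c >= '0' && c <= '9') || c == '-' || c == '%')) {
            return 0;
        }
    }
    return 1;
}
```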

TLD_RESULT_NOT_FOUND 

The URI has a TLD that could not be determined.

The TLD of the URI was searched in the TLD data and could not be found there. This means the TLD is not a valid Internet TLD.

Definition at line 59 of file tld.h.in.

enum tld_status

Each TLD has a status. By default, a TLD is generally considered valid; however, many TLDs are either proposed or deprecated.

Proposed TLDs are not yet officially accepted by the official entities taking care of those TLDs. They should be refused, but may become available later.

Deprecated TLDs were in use before but got dropped. They may be dropped because a country doesn't follow up on their Internet TLD, or because the extension is found to be boycotted.

Enumerator:
TLD_STATUS_VALID 

The TLD is currently valid.

This status represents a TLD that is currently fully valid and supported by the owners.

These can be part of URIs representing valid resources.

TLD_STATUS_PROPOSED 

The TLD was proposed but not yet accepted.

The TLD is nearly considered valid; at least, it is in the process of being accepted. The TLD will not work until officially accepted.

No valid URIs can include this TLD until it becomes TLD_STATUS_VALID.

TLD_STATUS_DEPRECATED 

The TLD was once in use.

This status is used by TLDs that were valid (TLD_STATUS_VALID) at some point in time and were then replaced by another TLD, rendering the old one useless (or incorrect in the case of a country name change.)

This status means such URIs are not to be considered valid. However, it may be possible to emit a 301 (in terms of HTTP protocol) to fix the problem.

TLD_STATUS_UNUSED 

The TLD was officially assigned but not put to use.

This special status is used for all the TLDs that were assigned to a specific entity, but never actually put to use. Many smaller countries (especially islands) are assigned this status.

Unused TLDs are not valid in any URI until marked valid.

TLD_STATUS_RESERVED 

The TLD is reserved so no one can use it.

This special case forces the specified TLDs into a "do not use" list. Such TLDs may be used by people who wish they were official, but they are not considered legal.

A reserved TLD may represent a second TLD that was assigned to a specific country or other category. It may be possible to do a transfer from that TLD to the official TLD (e.g. Great Britain was assigned .gb, but instead uses .uk; URIs with .gb could be rewritten with .uk and checked for validity.)
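
The .gb to .uk transfer mentioned above could be sketched as follows. The helper replace_tld() is hypothetical and not part of libtld; it does a plain, case-sensitive suffix swap and assumes the caller already knows which replacement TLD applies. The rewritten string would still need to be checked again with tld().

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libtld): copies uri into out with the
 * trailing from_tld replaced by to_tld. Returns 1 on success, 0 when uri
 * does not end with from_tld, has nothing before it, or out is too small. */
int replace_tld(const char *uri, const char *from_tld, const char *to_tld,
                char *out, size_t out_size)
{
    size_t uri_len = strlen(uri);
    size_t from_len = strlen(from_tld);
    size_t to_len = strlen(to_tld);
    size_t keep;

    if (from_len == 0 || uri_len <= from_len) {
        return 0;
    }
    keep = uri_len - from_len;          /* characters before the old TLD */
    if (strcmp(uri + keep, from_tld) != 0) {
        return 0;
    }
    if (keep + to_len + 1 > out_size) {
        return 0;
    }
    memcpy(out, uri, keep);
    memcpy(out + keep, to_tld, to_len + 1);   /* includes the final '\0' */
    return 1;
}
```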

TLD_STATUS_INFRASTRUCTURE 

These TLDs are reserved for the Internet infrastructure.

These TLDs cannot be used with standard URIs. These are used to make the Internet functional instead.

All URIs for standard resources must refuse these URIs.

TLD_STATUS_UNDEFINED 

Special status to indicate we did not find the TLD.

The info structure is returned with an undefined status whenever the TLD could not be found in the list of existing TLDs. This means the URI is completely invalid. (The only exception would be if you support some internal TLDs.)

URIs that cannot get a TLD_STATUS_VALID should all be considered invalid, but those marked as TLD_STATUS_UNDEFINED are completely invalid. That being said, you may want to make sure you passed the correct string. The URI must contain only the sub-domains, the domain, and the TLDs. No protocol, slashes, colons, paths, query strings, or anchors are accepted in the URI.

TLD_STATUS_EXCEPTION 

Definition at line 43 of file tld.h.in.


Function Documentation

enum tld_result tld ( const char *  uri,
struct tld_info *  info 
)

The tld() function searches for the specified URI in the TLD descriptions. The results are saved in the info parameter for later interpretation (i.e. extraction of the domain name, sub-domains and the exact TLD.)

The function extracts the last extension of the URI. For example, in the following:

 example.co.uk

the function first extracts ".uk". With that extension, it searches the list of official TLDs. If not found, an error is returned and the info parameter is set to unknown.

When found, the function checks whether that TLD (".uk" in our previous example) accepts sub-TLDs (second, third, fourth level TLDs.) If so, it extracts the next TLD entry (the ".co" in our previous example) and searches for that second level TLD. If found, we again try with the third level, etc. until all the possible TLDs were exhausted. At that point, we return the last TLD we have found. In the case of ".co.uk", we return the information of the ".co" TLD, second-level domain name.
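
The right-to-left walk described above can be illustrated with a toy version. This is a sketch under stated assumptions, not the library's implementation: the table toy_tlds and the functions toy_known() and toy_tld_offset() are hypothetical, the table holds only a handful of suffixes, and the real function additionally handles statuses and exceptions.

```c
#include <stddef.h>
#include <string.h>

/* Toy data, NOT the real libtld table: a few suffixes for demonstration. */
static const char *const toy_tlds[] = { "com", "fr", "uk", "co.uk", "ac.uk" };

/* Returns 1 when suffix (without its leading period) is in the toy table. */
static int toy_known(const char *suffix)
{
    size_t i;

    for (i = 0; i < sizeof(toy_tlds) / sizeof(toy_tlds[0]); ++i) {
        if (strcmp(toy_tlds[i], suffix) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Walks the periods from right to left and keeps the longest known suffix.
 * Returns the offset of the period that starts that suffix, or -1 when
 * even the last extension is unknown (or there is no period at all). */
int toy_tld_offset(const char *uri)
{
    int best = -1;
    size_t pos = strlen(uri);

    while (pos > 0) {
        --pos;
        if (uri[pos] != '.') {
            continue;
        }
        if (!toy_known(uri + pos + 1)) {
            break;              /* the next level is not a TLD: stop here */
        }
        best = (int)pos;        /* one more level matched */
    }
    return best;
}
```

With this toy table, "example.co.uk" first matches "uk", then "co.uk", and stops at the start of the string, so the returned offset points at the period before "co".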

The info structure includes:

  • f_category -- the category of TLD, unless set to TLD_CATEGORY_UNDEFINED, it is considered valid
  • f_status -- the status of the TLD, unless set to TLD_STATUS_UNDEFINED, it was defined from the tld_data.xml file; however, only those marked as TLD_STATUS_VALID are considered to currently be in use, all the other statuses can be used by your software, one way or another, but it should not be accepted as valid in a URI
  • f_country -- if the category is set to TLD_CATEGORY_COUNTRY then this pointer is set to the name of the country
  • f_tld -- is set to the full TLD of your domain name; this is a pointer WITHIN your uri string so make sure you keep your URI string valid if you intend to use this f_tld string
  • f_offset -- the offset to the first period within the domain name TLD (i.e. in our previous example, it would be the offset to the first period in ".co.uk", so in "example.co.uk" the offset would be 7. Assuming you prepend "www." to have the URI "www.example.co.uk" then the offset would be 11.)
Note:
In our previous example, the ".uk" TLD is properly used: it includes a second level domain name (".co".) The URI "example.uk" should return TLD_RESULT_INVALID since .uk by itself is not supposed to be acceptable. However, in that special case, there are still some companies using second level domain names, so we would accept "example.uk". In contrast, ".bd" is not accepted at second level, so "example.bd" returns an error (TLD_RESULT_INVALID).

Assuming that you always get valid URIs, you should get one of those results:

  • TLD_RESULT_SUCCESS -- success! the URI is valid and the TLD was properly determined; use the f_tld or f_offset to extract the TLD domain and sub-domains
  • TLD_RESULT_INVALID -- known TLD, but not currently valid; this result is returned when we know that the TLD is not to be accepted

Other results are returned when the input string is considered invalid.

Note:
The function only accepts a bare URI, in other words: no protocol, no path, no anchor, no query string. Also, it should not start and/or end with a period or you are likely to get an invalid response. (e.g. don't use ".example.co.uk.")
 // from example.c
 #include "tld.h"
 #include <stdio.h>

 int main(void)
 {
   const char *uri = "www.example.co.uk";
   struct tld_info info;
   enum tld_result r;

   r = tld(uri, &info);
   if(r == TLD_RESULT_SUCCESS) {
     // start on the last character of the domain name and search
     // backward for the period that ends the sub-domains, if any
     const char *s = uri + info.f_offset - 1;
     while(s > uri) {
       if(*s == '.') {
         ++s;
         break;
       }
       --s;
     }
     // here uri points to your sub-domains, the length is "s - uri"
     // (this length includes the period that follows the sub-domains)
     // if uri == s then there are no sub-domains
     // s points to the domain name, the length is "info.f_tld - s"
     // and info.f_tld points to the TLD
     //
     // When TLD_RESULT_SUCCESS is returned the domain cannot be an
     // empty string; also the TLD cannot be empty, however, there
     // may be no sub-domains.
     printf("Sub-domain(s): \"%.*s\"\n", (int)(s - uri), uri);
     printf("Domain: \"%.*s\"\n", (int)(info.f_tld - s), s);
     printf("TLD: \"%s\"\n", info.f_tld);
   }
   return 0;
 }
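
Because tld() only accepts a bare host name (no protocol, path, query string, or anchor), a caller holding a full URL needs to trim it first. The following is a minimal sketch of such trimming; extract_host() is a hypothetical helper, not part of libtld, and it deliberately ignores cases such as "user:password@" prefixes and bracketed IPv6 addresses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libtld): copies the host part of url
 * into out, dropping any "scheme://" prefix and anything from the first
 * '/', ':', '?' or '#' that follows. Returns 1 on success, 0 when the
 * host is empty or out is too small. */
int extract_host(const char *url, char *out, size_t out_size)
{
    const char *start = url;
    size_t scheme_len = strcspn(url, ":/?#");
    size_t len;

    /* only treat "://" as a scheme separator when it comes first */
    if (strncmp(url + scheme_len, "://", 3) == 0) {
        start = url + scheme_len + 3;
    }
    len = strcspn(start, "/:?#");
    if (len == 0 || len + 1 > out_size) {
        return 0;
    }
    memcpy(out, start, len);
    out[len] = '\0';
    return 1;
}
```

The resulting string is what would be passed as the uri parameter.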
Parameters:
[in]   uri   The URI to be checked.
[out]  info  A pointer to a tld_info structure to save the result.
Returns:
One of the TLD_RESULT_... enumeration values.

Definition at line 315 of file tld.c.

References tld_description::f_category, tld_info::f_category, tld_description::f_country, tld_info::f_country, tld_description::f_end_offset, tld_description::f_exception_apply_to, tld_description::f_exception_level, tld_info::f_offset, tld_description::f_start_offset, tld_description::f_status, tld_info::f_status, tld_info::f_tld, search(), TLD_CATEGORY_UNDEFINED, tld_descriptions, tld_end_offset, tld_max_level, TLD_RESULT_BAD_URI, TLD_RESULT_INVALID, TLD_RESULT_NO_TLD, TLD_RESULT_NOT_FOUND, TLD_RESULT_NULL, TLD_RESULT_SUCCESS, tld_start_offset, TLD_STATUS_EXCEPTION, TLD_STATUS_UNDEFINED, and TLD_STATUS_VALID.

Referenced by main(), snap::output_tlds(), snap::read_tlds(), search(), test_all(), test_invalid(), test_tlds(), and test_unknown().

const char* tld_version ( )

This function returns the version of this library. The version is defined with three numbers: <major>.<minor>.<patch>.

You should be able to use this version string to compare different libtld versions and know which one is the newest.

Returns:
A constant string with the version of the library.
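
Comparing two such version strings could look like the sketch below. The helper compare_versions() is hypothetical and not part of libtld; it assumes the <major>.<minor>.<patch> layout described above and treats missing or unparsable components as 0.

```c
#include <stdio.h>

/* Hypothetical helper (not part of libtld): compares two
 * "major.minor.patch" strings. Returns -1, 0 or 1 when a is older than,
 * equal to, or newer than b. */
int compare_versions(const char *a, const char *b)
{
    int va[3] = { 0, 0, 0 };
    int vb[3] = { 0, 0, 0 };
    int i;

    /* components that fail to parse keep their default of 0 */
    (void)sscanf(a, "%d.%d.%d", &va[0], &va[1], &va[2]);
    (void)sscanf(b, "%d.%d.%d", &vb[0], &vb[1], &vb[2]);

    for (i = 0; i < 3; ++i) {
        if (va[i] != vb[i]) {
            return va[i] < vb[i] ? -1 : 1;
        }
    }
    return 0;
}
```

A caller could then write, for example, compare_versions(tld_version(), "1.2.0") >= 0 to require at least that release at run time.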

Definition at line 432 of file tld.c.

References LIBTLD_VERSION.

Referenced by main().

This document is part of the libtld Project.

Copyright by Made to Order Software Corp.