What is 'content filtering'? - LinuxQuestions.org

By rsean at 2007-06-29 11:17

To answer that question, let me begin by taking you back in history a bit. Did you know that the WWW as we know it today evolved out of an "Internet" that was originally conceived quite differently? The exchange of information and files was always there, but it happened in other ways; in fact, the WWW arrived much later than email. Naturally, security issues and their solutions evolved along the same path. We started using anti-virus software to check the content of emails, and spam filters to manage the nuisance of unwanted email; together, we refer to these two as content filtering for email. A very similar situation presents itself today as we access the WWW, and we use web content filtering to safeguard ourselves.

The entire evolution happened on two fronts (or layers, as we call them technically): the network and the application. Routers were built to interconnect various networks, and firewalls were built to ensure that connections happened exactly as desired. Similarly, at the application layer, proxy servers were created to serve the needs of the various applications, and content filters were built to ensure that the content was of an acceptable nature. Technically speaking, the statement "firewalls are of two types: network layer and application layer" is accurate. From the security perspective, both forms of firewall are required, and each has a different job to do. But we'll come to that in a moment.

Content filtering helps to prevent abuse, misuse and other security breaches when users and their applications access the WWW. Paradoxically, "content filtering" is itself a much-abused term, which has led to a lot of general confusion. Simply put, it means defining "what access may be allowed or denied".

A legacy content filter lets you define just this "what", in terms of a set of web-site addresses. By contrast, modern content filtering software or an application layer firewall, such as SafeSquid (http://www.safesquid.com/), lets you define this "what" more holistically, and so comprehensively addresses the need to relax or apply rules depending on the context.

This "what" therefore needs to be expressed in many more terms than just web-site addresses: it can be defined in terms of the actual nature of the content, and the definition is not restricted to the web site's address.

Every proxy server is basically an Application Layer Firewall (ALF). Each of the various filters in an ALF is governed by a global Allow or Deny rule, and exceptions to that rule are set in the ALF's configuration to precisely reflect the business needs of the implementation. Each filter addresses one specific aspect of the content. This is quite similar in essence to a modern Network Layer Firewall (NLF). Primitive NLFs merely allowed or denied connections based on the source or destination IP address and port. More sophisticated NLFs also let you specify the protocol as a parameter, along with other factors such as the time of day, and provide more complete security by handing the transported data packets to anti-virus software or similar technologies to check them for malware. However, inspecting the content is primarily the function and responsibility of the ALF. Some NLFs offer these functions as an additional feature, because doing so makes the NLF more attractive from a TCO (total cost of ownership) perspective.
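To make the "global rule plus exceptions" idea concrete, here is a minimal Python sketch. It assumes each filter has exactly one default action and a list of exception predicates that invert that action for matching requests, and that a request passes only if every filter allows it. The `Filter` class, the `evaluate` function and the request keys (`host`, `mime_type`) are hypothetical illustrations, not SafeSquid's configuration model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A request is modelled as a dict of attributes. The keys used below
# ("host", "mime_type") are illustrative assumptions, not a product's schema.
Request = Dict[str, str]


@dataclass
class Filter:
    """One ALF filter: a global Allow/Deny rule plus exceptions to it."""
    name: str
    default_allow: bool  # the filter's global rule
    exceptions: List[Callable[[Request], bool]] = field(default_factory=list)

    def allows(self, request: Request) -> bool:
        # If any exception matches, the global rule is inverted for this request.
        if any(rule(request) for rule in self.exceptions):
            return not self.default_allow
        return self.default_allow


def evaluate(filters: List[Filter], request: Request) -> bool:
    """A request passes only if every filter allows it."""
    return all(f.allows(request) for f in filters)


# Example policy: the MIME filter allows by default but denies Windows executables;
# the URL filter denies by default but allows one trusted domain.
mime_filter = Filter("mime", default_allow=True,
                     exceptions=[lambda r: r.get("mime_type") == "application/x-msdownload"])
url_filter = Filter("url", default_allow=False,
                    exceptions=[lambda r: r.get("host", "").endswith(".example.com")])

print(evaluate([mime_filter, url_filter], {"host": "www.example.com", "mime_type": "text/html"}))  # True
print(evaluate([mime_filter, url_filter], {"host": "other.org", "mime_type": "text/html"}))  # False
```

Keeping the default and its exceptions together in one object mirrors the way each filter is described above: the global rule states the usual case, and the exceptions capture the business-specific deviations.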

Modern application layer firewalls have a comprehensive set of individual filters or processes that together give you access and content control over the way your resources are used. Each filter serves a specific purpose. Some filters analyze the content parametrically, in real time, and then take appropriate action, whereas others do not need the content to be downloaded at all before they act. Thus the focus is on the logic behind an activity rather than merely on the act itself.
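The difference between filters that need the content and filters that do not can be sketched in Python as two passes: metadata-only filters run first, and the body is fetched only if they all allow the request. This is a minimal illustration under that assumption; `PhasedFilter`, `decide` and the sample filters are hypothetical names, and real ALFs interleave these phases in product-specific ways.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Headers = Dict[str, str]


@dataclass
class PhasedFilter:
    name: str
    needs_body: bool  # False: headers alone are enough to decide
    check: Callable[[Headers, Optional[bytes]], bool]  # True means "allow"


def decide(filters: List[PhasedFilter], headers: Headers,
           fetch_body: Callable[[], bytes]) -> str:
    # Metadata-only filters run first; if one denies, the body is never fetched.
    for f in filters:
        if not f.needs_body and not f.check(headers, None):
            return f"denied by {f.name} (body not downloaded)"
    body = fetch_body()
    # Content filters inspect the downloaded body.
    for f in filters:
        if f.needs_body and not f.check(headers, body):
            return f"denied by {f.name}"
    return "allowed"


downloads: List[int] = []


def fetch_body() -> bytes:
    downloads.append(1)  # record that a download actually happened
    return b"<html>a page with a forbidden word</html>"


filters = [
    PhasedFilter("mime", needs_body=False,
                 check=lambda h, _b: not h.get("Content-Type", "").startswith("video/")),
    PhasedFilter("keyword", needs_body=True,
                 check=lambda _h, b: b is not None and b"forbidden" not in b),
]

print(decide(filters, {"Content-Type": "video/mp4"}, fetch_body))  # denied by mime (body not downloaded)
print(len(downloads))  # 0
print(decide(filters, {"Content-Type": "text/html"}, fetch_body))  # denied by keyword
print(len(downloads))  # 1
```

Running the cheap checks first saves bandwidth: a request rejected on its content type never costs a download.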

Almost all modern ALFs minimally provide virus scanning of all transferred content, and so work well as a gateway anti-virus. But a typical HTTP application is made up of a variety of independent or inter-linked factors, and a specific filter addresses a specific factor. Some ALFs, like SafeSquid, let you frame rules that define policies in terms of all of these factors. Commonly applicable factors are "profiled", and each profile is then either subjected to (or immunized against) the appropriate filters. These filters are either static or dynamic. Here is a list of some very important filters and their specific functions. Notice that each function is directly related to its conditional parameters.

* Access Restriction: Allow or deny access to a user, and create a profile.

Basic Conditional Parameters: username, IP address.

The profile can also offer additional privileges, such as:
o Global bypass of one or more filters.
o Access to the browser-based GUI.
o Any other privileges a user must always (uniquely) enjoy.

* URL Filter: Allow or deny access to content from a particular URL.

Basic Conditional Parameters: hostname, IP address, file name.

* URL Blacklists: Allow or deny access to content from web sites listed under a specific category.

Basic Conditional Parameters: category.

* MIME Filter: Allow or deny access to content of a particular content type.

Basic Conditional Parameters: MIME type, file-name extensions.

* Cookie Filter: Allow or deny the exchange of cookies to or from a particular domain.

Basic Conditional Parameters: the cookie's Domain attribute, Path attribute, expiry time (year, month, hour, minute), direction (inbound, outbound).

* Keyword Filter: Deny access to web sites containing unacceptable words or phrases.

Basic Conditional Parameters: patterns of words and phrases, score.

* Document Rewrite: Replace or modify unacceptable portions of a web page.

Basic Conditional Parameters: content patterns that should be replaced, pattern of the replacement content.

* Image Filter: Deny access to pornographic images.

Basic Conditional Parameters: probability threshold above which an image may be treated as pornographic.

* DNS Blacklist: Deny access to content served from malicious (mala fide) servers.

Basic Conditional Parameters: the IP address (as reported for each malicious category).
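Several of the filters in the list above (access restriction with profiles and global bypass, URL blacklists by category, and the MIME filter with file-name extensions) can be combined in one short Python sketch. Every profile, network, domain, category and pattern below is a hypothetical example, and looking up parent domains in a category table is an assumed matching strategy, not how any particular product's blacklists work.

```python
import fnmatch
import ipaddress
import posixpath
from dataclasses import dataclass
from typing import Dict, FrozenSet
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Profile:
    name: str
    bypass: FrozenSet[str] = frozenset()  # filters this profile is immunized against


# Hypothetical policy data; every name, network and category is illustrative.
PROFILES_BY_USER: Dict[str, Profile] = {
    "alice": Profile("admins", frozenset({"url_blacklist", "mime"})),
}
PROFILES_BY_NET = {ipaddress.ip_network("10.0.5.0/24"): Profile("lab")}
DEFAULT_PROFILE = Profile("default")

CATEGORY_OF_DOMAIN = {"casino.example": "gambling", "games.example": "games"}
BLOCKED_CATEGORIES = {"gambling"}
BLOCKED_MIME_PATTERNS = ["video/*", "application/x-msdownload"]
BLOCKED_EXTENSIONS = {".exe", ".scr"}


def profile_for(user: str, client_ip: str) -> Profile:
    """Access restriction: pick a profile by username first, then by IP address."""
    if user in PROFILES_BY_USER:
        return PROFILES_BY_USER[user]
    addr = ipaddress.ip_address(client_ip)
    for net, prof in PROFILES_BY_NET.items():
        if addr in net:
            return prof
    return DEFAULT_PROFILE


def category_of(host: str) -> str:
    """URL blacklist: look up the host and each parent domain in the category table."""
    labels = host.lower().split(".")
    for i in range(len(labels)):
        category = CATEGORY_OF_DOMAIN.get(".".join(labels[i:]))
        if category:
            return category
    return "uncategorized"


def check(user: str, client_ip: str, url: str, content_type: str) -> str:
    profile = profile_for(user, client_ip)
    parts = urlsplit(url)
    if ("url_blacklist" not in profile.bypass
            and category_of(parts.hostname or "") in BLOCKED_CATEGORIES):
        return "deny: blacklisted category"
    if "mime" not in profile.bypass:
        ext = posixpath.splitext(parts.path)[1].lower()
        if (any(fnmatch.fnmatchcase(content_type, p) for p in BLOCKED_MIME_PATTERNS)
                or ext in BLOCKED_EXTENSIONS):
            return "deny: mime type or extension"
    return "allow"


print(check("bob", "192.168.1.20", "http://www.casino.example/", "text/html"))  # deny: blacklisted category
print(check("alice", "192.168.1.20", "http://www.casino.example/", "text/html"))  # allow
print(check("bob", "10.0.5.7", "http://files.example.org/setup.EXE",
            "application/octet-stream"))  # deny: mime type or extension
```

The profile is resolved once per request, and each filter consults its bypass set before acting; that is one simple way to express "subjected to, or immunized against" a filter.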
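The cookie filter's conditional parameters from the list above (Domain attribute, Path attribute, expiry, direction) map naturally onto a small rule matcher. The Python sketch below assumes a rule denies a cookie when its domain is the rule's domain or a subdomain of it, its path starts with the rule's path prefix, the direction matches, and (optionally) the cookie would live longer than a maximum lifetime. Expressing the expiry condition as a lifetime is an assumption, and `Cookie`, `CookieRule` and `cookie_denied` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Cookie:
    domain: str                          # cookie's Domain attribute
    path: str = "/"                      # cookie's Path attribute
    expires: Optional[datetime] = None   # None means a session cookie
    direction: str = "inbound"           # "inbound" (Set-Cookie) or "outbound" (Cookie)


@dataclass
class CookieRule:
    domain: str
    path_prefix: str = "/"
    max_lifetime: Optional[timedelta] = None  # only deny cookies living longer than this
    direction: Optional[str] = None           # None matches both directions


def domain_matches(cookie_domain: str, rule_domain: str) -> bool:
    """True if the cookie domain equals the rule domain or is a subdomain of it."""
    d = cookie_domain.lower().lstrip(".")
    r = rule_domain.lower().lstrip(".")
    return d == r or d.endswith("." + r)


def cookie_denied(cookie: Cookie, rule: CookieRule, now: datetime) -> bool:
    if not domain_matches(cookie.domain, rule.domain):
        return False
    if not cookie.path.startswith(rule.path_prefix):
        return False
    if rule.direction is not None and cookie.direction != rule.direction:
        return False
    if rule.max_lifetime is not None:
        # Session cookies and short-lived cookies are let through.
        return cookie.expires is not None and cookie.expires - now > rule.max_lifetime
    return True


rule = CookieRule("tracker.example", max_lifetime=timedelta(days=30), direction="inbound")
now = datetime(2007, 6, 29)
print(cookie_denied(Cookie(".ads.tracker.example", expires=datetime(2008, 6, 29)), rule, now))  # True
print(cookie_denied(Cookie("ads.tracker.example"), rule, now))  # False (session cookie)
print(cookie_denied(Cookie("eviltracker.example", expires=datetime(2008, 6, 29)), rule, now))  # False
```

Matching on a "." boundary keeps a rule for `tracker.example` from accidentally catching an unrelated domain such as `eviltracker.example`.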
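The keyword filter ("patterns of words and phrases, score") and the document rewrite filter ("patterns to replace, replacement pattern") from the list above can both be sketched with Python's standard `re` module. The weights, threshold and patterns below are hypothetical, and summing occurrences times weight against a threshold is an assumed scoring scheme, not a documented product algorithm. Note also that regular expressions are a fragile way to edit HTML; this only illustrates the idea.

```python
import re

# Hypothetical weighted patterns: (compiled regex, score added per occurrence).
KEYWORD_RULES = [
    (re.compile(r"\bfree\s+casino\b", re.IGNORECASE), 40),
    (re.compile(r"\bjackpot\b", re.IGNORECASE), 15),
]
KEYWORD_THRESHOLD = 50  # pages scoring at or above this are denied


def keyword_score(text: str) -> int:
    """Sum of (occurrences x weight) over all keyword patterns."""
    return sum(len(pattern.findall(text)) * weight for pattern, weight in KEYWORD_RULES)


def keyword_filter_allows(text: str) -> bool:
    return keyword_score(text) < KEYWORD_THRESHOLD


# Hypothetical rewrite rules: (pattern to replace, replacement text).
REWRITE_RULES = [
    # Strip inline <script> elements entirely.
    (re.compile(r"<script\b[^>]*>.*?</script>", re.IGNORECASE | re.DOTALL), ""),
    # Mask an unacceptable word.
    (re.compile(r"\bdarn\b", re.IGNORECASE), "****"),
]


def rewrite(html: str) -> str:
    """Document rewrite: apply each (pattern, replacement) rule in order."""
    for pattern, replacement in REWRITE_RULES:
        html = pattern.sub(replacement, html)
    return html


page = "Jackpot! Play free casino games. Another jackpot awaits."
print(keyword_score(page))          # 70
print(keyword_filter_allows(page))  # False
print(rewrite("<p>darn</p><script>track()</script>"))  # <p>****</p>
```

Scoring rather than blocking on the first match lets a single innocent mention pass while a page dense with unacceptable phrases is denied.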
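A DNS blacklist, as listed above, is usually queried by reversing the octets of the server's IPv4 address and looking the result up under the list's zone; the returned 127.0.0.x address then indicates the category (the convention described in RFC 5782). The Python sketch below shows that mechanism. The zone `dnsbl.example.net` and the code-to-category table are hypothetical, since each real list documents its own zone and return codes.

```python
import ipaddress
import socket
from typing import Dict, Optional

# Hypothetical zone and return-code table; each real list documents its own codes.
DNSBL_ZONE = "dnsbl.example.net"
CATEGORY_BY_CODE: Dict[str, str] = {"127.0.0.2": "malware", "127.0.0.3": "phishing"}


def dnsbl_query_name(ip: str, zone: str = DNSBL_ZONE) -> str:
    """Reverse the IPv4 octets and append the list's zone (RFC 5782 convention)."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def classify(answer: Optional[str]) -> Optional[str]:
    """Map the A record returned by the list to a category; None means 'not listed'."""
    if answer is None:
        return None
    return CATEGORY_BY_CODE.get(answer, "listed (unknown code)")


def lookup(server_ip: str) -> Optional[str]:
    """Live query. Any resolution failure (usually NXDOMAIN) is treated as 'not listed',
    so this sketch fails open if the DNS server itself is unreachable."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(server_ip))
    except socket.gaierror:
        answer = None
    return classify(answer)


print(dnsbl_query_name("192.0.2.10"))  # 10.2.0.192.dnsbl.example.net
print(classify("127.0.0.3"))           # phishing
```

Because the verdict arrives as an ordinary DNS answer, the proxy can check the server's address before fetching any content from it.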

1 comment

by narainhere on Thu, 2007-07-26 23:44

Can you please explain the term "probability threshold" used to block pornographic images? How does this work?
