Filed Under (Anti spam) by admin on 18-05-2007
A Review on Anti-Spam Technique
To prevent e-mail spam, both end users and administrators of e-mail systems use various anti-spam techniques. None of the techniques is a complete solution to the spam problem, and each involves trade-offs between incorrectly rejecting legitimate e-mail and failing to reject all spam, along with associated costs in time and effort.
Anti-spam techniques can be broken into two broad categories: those that require actions by individuals, and those that can be automated.
End-user anti-spam techniques
There are a number of techniques that individuals can use to reduce the problems associated with spam.
Enable and Configure Automatic Techniques
Many Internet service providers and e-mail clients have automated anti-spam systems installed, or can have optional systems added. Since all anti-spam techniques can cause legitimate e-mail to be incorrectly identified as spam, many anti-spam systems are either not enabled by default or are configured to be very conservative about what will be identified as spam.
E-mail Address Harvesting
Address harvesting refers to the methods that spammers use to obtain the e-mail addresses of real people. If spammers cannot learn an address, that address is less likely to be sent spam.
Most people want new people to be able to contact them via e-mail and many people cannot hide their e-mail addresses. While preventing spammers from obtaining email addresses does not solve the spam problem any more than avoiding the high crime areas of a city solves crime, individuals need to weigh the risks.
One way that spammers obtain email addresses to target is to trawl the Web and Usenet for strings which look like addresses, using a spambot. Contact forms and address munging are good ways to prevent email addresses from appearing on these forums.
There are other ways that spammers can get addresses, such as dictionary attacks in which the spammer generates a number of likely-to-exist addresses out of names and common words. For instance, if there is someone with the address adam@example.com, where ‘example.com’ is a popular ISP or mail provider, it is likely that he frequently receives spam.
Address munging
Posting anonymously, or with a fake name and address, is one way to avoid “address harvesting,” but users should ensure that the fake address is not valid. Users who want to receive legitimate email regarding their posts or Web sites can alter their addresses so that humans can figure them out but spambots cannot. For instance, joe@example.net might post as joeNOS@PAM.example.net.invalid, or display his email address as an image instead of text. Address munging, however, can cause legitimate replies to be lost. And if the posted address is not the user’s valid address, it has to be truly invalid; otherwise someone or some server will still get the spam for it.
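As a rough illustration of munging, the Python sketch below reproduces the joe@example.net example and also shows a spelled-out variant. The function names and the harvester pattern are assumptions made for this example, not taken from any real spambot or tool. The .invalid suffix is a reserved top-level domain, so a munged address can never be delivered to a real mailbox.

```python
import re

# Simplified pattern of the kind a harvesting spambot might use
# (an assumption for illustration; real harvesters vary).
ADDRESS_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def munge_address(address: str) -> str:
    """Insert a marker a human can remove and append the reserved .invalid TLD."""
    local, domain = address.split("@", 1)
    return f"{local}NOS@PAM.{domain}.invalid"


def spell_out(address: str) -> str:
    """Replace the punctuation that address patterns depend on."""
    return address.replace("@", " at ").replace(".", " dot ")


munged = munge_address("joe@example.net")
spelled = spell_out("joe@example.net")

# The munged form still looks like an address to the pattern, but it is undeliverable.
print(munged, bool(ADDRESS_PATTERN.search(munged)))
# The spelled-out form no longer matches the pattern at all.
print(spelled, bool(ADDRESS_PATTERN.search(spelled)))
```

A human reader can undo either transformation, which is the point; the cost, as noted above, is that some legitimate replies will be lost.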
Contact Forms
Contact forms allow users to send email by filling out forms in a web browser. The web server takes the form data and forwards it to an email address. The user never sees the email address. Contact forms have the drawback that they require a website that supports server-side scripts. They are also inconvenient for message senders, who cannot use their preferred e-mail client. Finally, if the software used to run the contact forms is badly designed, the forms can become spam tools in their own right.
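One common way a badly designed form becomes a spam tool is header injection, where a visitor smuggles line breaks into a field so that extra recipients are added. The Python sketch below shows one possible server-side handler that avoids this; the addresses, the function name and the field names are hypothetical, and actually sending the message is left out.

```python
from email.message import EmailMessage

# Hypothetical addresses: the owner's address stays on the server and is
# never shown to the visitor.
SITE_OWNER = "owner@example.com"
FORM_SENDER = "webform@example.com"


def build_contact_message(sender_name: str, reply_to: str, text: str) -> EmailMessage:
    """Turn submitted form fields into a message for the site owner."""
    # Reject line breaks in anything that ends up in a header, so a visitor
    # cannot add headers such as Bcc and use the form to relay spam.
    for value in (sender_name, reply_to):
        if "\r" in value or "\n" in value:
            raise ValueError("line breaks are not allowed in header fields")

    msg = EmailMessage()
    msg["To"] = SITE_OWNER          # the recipient is fixed, never user-supplied
    msg["From"] = FORM_SENDER
    msg["Reply-To"] = reply_to
    msg["Subject"] = f"Contact form message from {sender_name}"
    msg.set_content(text)
    return msg


message = build_contact_message("Alice", "alice@example.org", "Hello there")
print(message["To"])
```

The key design choice is that the recipient is a server-side constant: the form can only ever write to the site owner, whatever the visitor submits.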
Disposable e-mail addresses
Many email users sometimes need to give an address to a site without complete assurance that the site will not send out spam. One way to mitigate the risk is to provide a disposable email address—a temporary address which forwards email to a real account, which the user can disable or abandon. A number of services provide disposable address forwarding. Addresses can be manually disabled, can expire after a given time interval, or can expire after a certain number of messages have been forwarded.
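The three ways an address can stop working (manual disabling, a time limit, a message limit) can be sketched as a small Python data structure. This is only an illustration under assumed names; it does not describe how any particular forwarding service is implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DisposableAddress:
    """A forwarding alias with optional expiry rules (all names are hypothetical)."""
    alias: str
    forward_to: str
    expires_at: Optional[datetime] = None   # time-based expiry
    max_messages: Optional[int] = None      # count-based expiry
    disabled: bool = False                  # manual switch
    forwarded: int = 0

    def accept(self, now: datetime) -> bool:
        """Return True and count the message if it should still be forwarded."""
        if self.disabled:
            return False
        if self.expires_at is not None and now >= self.expires_at:
            return False
        if self.max_messages is not None and self.forwarded >= self.max_messages:
            return False
        self.forwarded += 1
        return True


start = datetime(2007, 5, 18)
address = DisposableAddress(
    alias="shop-signup@forwarder.example",
    forward_to="joe@example.net",
    expires_at=start + timedelta(days=30),
    max_messages=2,
)
print([address.accept(start) for _ in range(3)])  # the third message is refused
```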
to be continued…
Source: en.wikipedia.org
Filed Under (Anti spam) by admin on 18-05-2007
SpamExperts Home is an automatic spam filter that works with any email program and automatically intercepts spam - no configuration or changes to your existing email setup are required! Even though the program does require an initial training phase (you simply need to tell it which messages are good and which are spam), it does not rely on keyword lists to detect spam, but automatically learns from the content of the messages that you accept or reject within a few days. The email that is identified as spam is either tagged in the subject line and can then be sorted into a spam folder (using your email rules), or you can choose to block it completely, in which case it is not downloaded to your email program but can still be reviewed and recovered (if needed) from the SpamExperts cache. SpamExperts Home can also check your email periodically, filtering spam in the background, and maintain a list of blocked and allowed senders.
Pros: No email configuration needed; very accurate results after a few days; spam does not need to be downloaded; filtered spam can be recovered if needed. Multilingual interface is available. Currently, the following languages are supported: English, German, Russian, Italian, Spanish, Polish and Dutch.
Cons: Does not remember interface size; high memory usage
Downloading of the application is available at the Download Link. There is a commercial version of the application - SpamExperts Professional - too.
After installation, the program asks you to classify about 10 messages as spam and about 10 messages as non-spam (valid mail); this is the initial learning period. The program also asks the user to confirm the Periodic Mail Delivery flag. If this flag is set, the program will automatically check your e-mail accounts every 5 minutes for new incoming mail.
In the Main Window of the application there is a list of incoming mail in the form of a grid; the data and content of the currently selected message are shown below it. On the left of the Main Window there are buttons for manually classifying incoming messages as Spam, Not Spam and Unsure. There are also buttons for the whitelist (Allowed Senders) and blacklist (Blocked Senders).
The Main Menu of the application has three options: File, Edit and Help.
The File option opens a menu that contains only two options: Close and Exit. Close closes the Main Window of the application, but the application remains active in the Windows taskbar. Exit closes the application altogether.
The Edit option opens a popup menu that also contains two options: Settings and Reset SpamExperts User Data.
Here is the view of the Settings window:
Finally, the Help option of the Main Menu opens a Help popup. It contains the options Open FAQ Page, Open Basic Instructions, Status, Submit bug report, Check for SpamExperts Updates, Update to Professional and About SpamExperts.
Basically, the application intercepts all incoming mail before it gets into your Inbox and compares it with the e-mail “fingerprints” in its spam database. If as a result of this analysis the program decides the incoming mail is valid it directs it into your Inbox; if the mail is classified as spam, it will be directed into your Trashcan.
Source: snapfiles.com
Filed Under (Anti spam) by admin on 17-05-2007
- The Bayesian method is multi-lingual and international - A Bayesian anti-spam filter, being adaptive, can be used for any language required. Most keyword lists are available in English only and are therefore quite useless in non-English speaking regions. The Bayesian filter also takes into account certain language deviations or the diverse usage of certain words in different areas, even if the same language is spoken. This intelligence enables such a filter to catch more spam.
- A Bayesian filter is difficult to fool, as opposed to a keyword filter - An advanced spammer who wants to trick a Bayesian filter can either use fewer words that usually indicate spam (such as free, Viagra, etc.), or more words that generally indicate valid mail (such as a valid contact name, etc.). Doing the latter is impossible because the spammer would have to know the e-mail profile of each recipient - and a spammer can never hope to gather this kind of information from every intended recipient. Using neutral words, for example the word “public”, would not work since these are disregarded in the final analysis. Breaking up words associated with spam, such as using “m-o-r-t-g-a-g-e” instead of “mortgage”, will only increase the chance of the message being classified as spam, since a legitimate user will rarely write the word “mortgage” as “m-o-r-t-g-a-g-e”.
Bayesian filters or updated keyword lists?
Some types of anti-spam software regularly download new keyword files. While this is, of course, better than not updating keyword lists, it is in fact a rather patchy approach that is easily circumvented. Downloading updates makes circumvention a little bit harder, but the system is flawed in principle compared to a Bayesian filter.
What’s the catch?
Bayesian filtering, if implemented the right way and tailored to your company, is by far the most effective technology to combat spam. Is there a downside? Well, in a way there is one downside, but this can easily be overcome: Before you can use and judge the Bayesian filter, you have to wait at least two weeks for it to learn, or else create the ham and spam databases yourself. This task can be quite complex, so it is best to wait until the filter has had time to learn. Over time, the Bayesian filter becomes more and more effective as it learns more about your organization’s e-mail habits. To quote the old saying, good things come to those who wait.
It is important, therefore, to keep this in mind when evaluating anti-spam software. If the product has advanced, customized Bayesian analysis, then it can only be judged after a few weeks. It is probable that basic anti-spam software might perform better initially, but after a few weeks the Bayesian filter catches up and well outperforms the conventional anti-spam filters once and for all.
About GFI MailEssentials
GFI MailEssentials for Exchange/SMTP offers spam protection at server level and eliminates the need to install and update anti-spam software on each desktop. GFI MailEssentials offers a fast set-up and a high spam detection rate using Bayesian analysis and other methods - no configuration required, very low false positives through its automatic whitelists, and the ability to automatically adapt to your e-mail environment to constantly tune and improve spam detection. It also enables you to sort spam to users’ junk mail folders. GFI MailEssentials also adds key e-mail tools to your mail server: disclaimers, reporting, mail archiving and monitoring, server-based auto replies and POP3 downloading. More information and a full evaluation version are available at http://www.gfi.com/mes/
About GFI
GFI is a leading software developer that provides a single source for network administrators to address their network security, content security and messaging needs. With award-winning technology, an aggressive pricing strategy and a strong focus on small-to-medium sized businesses, GFI is able to satisfy the need for business continuity and productivity encountered by organizations on a global scale. Founded in 1992, GFI has offices in Malta, London, Raleigh, Hong Kong, Adelaide, Hamburg and Cyprus which support more than 160,000 installations worldwide. GFI is a channel-focused company with over 10,000 partners throughout the world. GFI is also a Microsoft Gold Certified Partner. More information about GFI can be found at http://www.gfi.com.
Source: the white paper why-bayesian-filtering.pdf on www.gfi.com
Filed Under (Anti spam) by admin on 17-05-2007
Creating the spam database
Besides ham mail, the Bayesian filter also relies on a spam data file. The spam data file must include a large sample of known spam and must be constantly updated with the latest spam by the anti-spam software. This will ensure that the Bayesian filter is aware of the latest spam tricks, resulting in a high spam detection rate (note: this is achieved once the required initial two-week learning period is over).
How the actual filtering is done
Once the ham and spam databases have been created, the word probabilities can be calculated and the filter is ready for use.
When a new mail arrives, it is broken down into words and the most relevant words - i.e., those that are most significant in identifying whether the mail is spam or not - are singled out. From these words, the Bayesian filter calculates the probability of the new message being spam or not. If the probability is greater than a threshold, say 0.9, then the message is classified as spam.
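The white paper does not spell out how the individual word probabilities are combined. The Python sketch below assumes the naive Bayes combination described in Paul Graham’s article A Plan for Spam: keep the tokens whose probability lies furthest from the neutral 0.5, then compare the product of their probabilities with the product of the complements. The tokenizer, the limit of 15 tokens, the value 0.4 for unknown words and the sample probabilities are all assumptions for illustration.

```python
import math
import re

SPAM_THRESHOLD = 0.9  # the example threshold mentioned in the text


def spam_probability(message, word_probs, top_n=15, unknown=0.4):
    """Combine the probabilities of the most 'interesting' tokens in a message."""
    tokens = set(re.findall(r"[a-z0-9$'-]+", message.lower()))
    # Clamp to avoid a product of exactly 0 or 1 (assumed bounds 0.01 and 0.99).
    probs = [min(0.99, max(0.01, word_probs.get(t, unknown))) for t in tokens]
    # The most relevant words are those furthest from the neutral value 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    top = probs[:top_n]
    spam_score = math.prod(top)
    ham_score = math.prod(1.0 - p for p in top)
    return spam_score / (spam_score + ham_score)


# Hypothetical word probabilities, as a trained filter might hold them.
word_probs = {"free": 0.95, "cash": 0.90, "alice": 0.01}

for text in ("Free cash now", "Free cash report from Alice"):
    p = spam_probability(text, word_probs)
    print(text, "->", "spam" if p > SPAM_THRESHOLD else "not spam")
```

In the second message the known contact name pulls the combined probability down, so the words “free” and “cash” alone do not push the message over the threshold.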
The Bayesian approach to spam is highly effective - a May 2003 BBC article reported that spam detection rates of over 99.7% can be achieved with a very low number of false positives.
Why is Bayesian filtering better?
- The Bayesian method takes the whole message into account - it recognizes keywords that identify spam, but it also recognizes words that denote valid mail. For example, not every e-mail that contains the words “free” and “cash” is spam. The advantage of the Bayesian filter is that it considers the most interesting words (as defined by their deviation from the mean) and comes up with a probability that a message is spam. The Bayesian method would find the words “cash” and “free” interesting, but it would also recognize the name of the business contact who sent the message and thus classify the message as legitimate; that is, it allows words to “balance” each other out. Bayesian filtering is therefore a much more intelligent approach because it examines all aspects of a message, as opposed to keyword checking, which classifies a mail as spam on the basis of a single word.
- A Bayesian filter is constantly self-adapting - By learning from new spam and new valid outbound mails, the Bayesian filter evolves and adapts to new spam techniques. For example, when spammers started using “f-r-e-e” instead of “free” they succeeded in evading keyword checking until “f-r-e-e” was also included in the keyword database. The Bayesian filter, on the other hand, automatically notices such tactics; in fact, if the word “f-r-e-e” is found, it is an even better spam indicator, since it is unlikely to occur in a ham mail. Another example would be using the word “5ex” instead of “Sex”. You would probably not find the word “5ex” in a ham mail, and therefore the likelihood that the message is spam increases.
- The Bayesian technique is sensitive to the user - it learns the e-mail habits of the company and understands that, for example, the word “mortgage” might indicate spam if the company running the filter is, say, a car dealership, whereas it would not indicate it as spam if the company is a financial institution dealing with mortgages.
…to be continued…
Source: the white paper why-bayesian-filtering.pdf on www.gfi.com
Filed Under (Anti spam) by admin on 17-05-2007
ePrompter is an e-mail notification program that automatically checks up to sixteen password-protected e-mail accounts for AOL, Hotmail, Juno, Netscape, USA.net, Rediffmail, Yahoo, POP3, AltaVista, Email.com, GO.com, iname, Lycos, Mail.com, MSN, and many other e-mail domains–simultaneously.
ePrompter not only lets you compose, forward, and reply to your retrieved messages, but also gives you the ability to delete unwanted spam or suspicious-looking mail without having to launch your e-mail program or having to go to your Web mail’s site.
ePrompter notification features include a unique rotating tray icon and a choice of five screensavers that let you know at a glance the number of new messages in each account, as well as audio alerts for new messages. ePrompter also includes autodial for automatic message retrieval.
Version 2.0 build 2 adds support for Gmail.
ePrompter lets you delete unwanted spam or suspicious-looking mail - the kind that might contain viruses - again without having to launch your email program or go to your webmail’s site.
Feature Overview
- Read and delete email without opening your email client
- Unique Rotating Tray Icon shows # of new messages in each account.
- Choice of five screensavers that show # of new messages per account.
- Autodial goes online and retrieves messages at scheduled intervals.
- Choice of two audio alerts for email notification of new messages.
- Optional password protection for individual accounts.
- Message headers or full messages available.
- Delete messages while online or in autodial mode.
- Separate retrieval intervals for online and autodial modes.
- The ability to print messages.
- The ability to read hyperlinks.
- The ability to work with standard firewalls and proxy servers.
- The ability to retain retrieved messages when a computer is turned off.
- Lean installation - Minimal computer requirements.
- Easy download, setup and uninstall.
- No personal information required.
The program can be downloaded from the Download Page. During installation, the Installation Wizard collects data from your computer about your Internet connection, e-mail accounts and preferences.
ePrompter even features its own FAQ.
The small Main Window of the application features three option buttons: Menu, Update and Help.
The Help option displays Help topics. The Update option lets you manually update information about incoming mail. The Menu option opens a popup menu with options for managing your e-mail accounts (Update All Mail, Interrupt Updating, Address Book, Setup, etc.).
Sources: gigenestuff.homestead.com, eprompter.com
Filed Under (Anti spam) by admin on 17-05-2007
Creating a tailor-made Bayesian word database
Before mail can be filtered using this method, the user needs to generate a database of words and tokens (such as the $ sign, IP addresses and domains, and so on), collected from a sample of spam mail and valid mail (referred to as ‘ham’).

A probability value is then assigned to each word or token; the probability is based on calculations that take into account how often that word occurs in spam as opposed to legitimate mail (ham). This is done by analyzing the user’s outbound mail and by analyzing known spam: All the words and tokens in both pools of mail are analyzed to generate the probability that a particular word points to the mail being spam.
This word probability is calculated as follows: If the word “mortgage” occurs in 400 of 3,000 spam mails and in 5 out of 300 legitimate emails, for example, then its spam probability would be 0.8889 (that is, [400/3000] divided by [5/300 + 400/3000]).
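The calculation above can be reproduced in a few lines of Python. The function name and the neutral fallback value for a word that appears in neither pool are assumptions made for this sketch.

```python
def word_spam_probability(spam_hits, spam_total, ham_hits, ham_total):
    """Spam probability of a word from how often it occurs in spam and in ham."""
    spam_rate = spam_hits / spam_total   # e.g. 400 / 3000
    ham_rate = ham_hits / ham_total      # e.g. 5 / 300
    if spam_rate + ham_rate == 0:
        return 0.5  # assumed neutral value for a word seen in neither pool
    return spam_rate / (spam_rate + ham_rate)


# The "mortgage" example from the text.
print(round(word_spam_probability(400, 3000, 5, 300), 4))  # 0.8889
```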
Creating the ham database (tailored to your company)
It is important to note that the analysis of ham mail is performed on the organization’s mail, and is therefore tailored to that particular organization. For example, a financial institution might use the word “mortgage” many times over and would get a lot of false positives if using a general anti-spam rule set. On the other hand, the Bayesian filter, if tailored to your company through an initial training period, takes note of the company’s valid outbound mail (and recognizes “mortgage” as being frequently used in legitimate messages), and therefore has a much better spam detection rate and a far lower false positive rate.
Note that some anti-spam software with very basic Bayesian capabilities, such as the Outlook spam filter or the Internet Message Filter in Exchange Server, does not create a tailored ham data file for your company, but ships a standard ham data file with the installation. Although this method does not require an initial learning period, it has two major flaws:
- The ham data file is publicly available and can thus be hacked by professional spammers and therefore bypassed. If the ham data file is unique to your company, then hacking the ham data file is useless. For example, there are hacks available to bypass the Microsoft Outlook 2003 or Exchange Server spam filter.
- Such a ham data file is a general one and thus not tailored to your company; it cannot be as effective, and you will suffer from noticeably higher false positives.
Source: the white paper why-bayesian-filtering.pdf on www.gfi.com
Filed Under (Anti spam) by admin on 15-05-2007
Anti-Spam technique - Statistical content filtering
Statistical filtering was first proposed in 1998 by Mehran Sahami et al., at the AAAI-98 Workshop on Learning for Text Categorization. A statistical filter is a kind of document classification system, and a number of machine learning researchers have turned their attention to the problem. Statistical filtering was popularized by Paul Graham’s influential 2002 article A Plan for Spam, which proposed the use of naive Bayes classifiers to predict whether messages are spam or not, based on collections of spam and nonspam (“ham”) email submitted by users.
Statistical filtering, once set up, requires no maintenance per se: instead, users mark messages as spam or nonspam and the filtering software learns from these judgements. Thus, a statistical filter does not reflect the software author’s or administrator’s biases as to content, but it does reflect the user’s biases as to content; a biochemist who is researching Viagra won’t have messages containing the word “Viagra” flagged as spam, because “Viagra” will show up often in his or her legitimate messages. Spam emails containing the word “Viagra”, however, do get filtered because of their unique content compared to legitimate messages. A statistical filter can also respond quickly to changes in spam content, without administrative intervention. Statistical filters should also look at message headers, thereby considering not just the content but also peculiarities of the transport mechanism of the email.
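The learning loop described above, in which the user marks messages and the filter updates its statistics, might look roughly like the Python sketch below. The class name, the tokenizer and the neutral value 0.5 for unseen words are assumptions for illustration; real filters such as those listed in this article are considerably more elaborate.

```python
import re
from collections import Counter


class TrainableFilter:
    """Keeps per-word counts from messages the user has marked (illustrative only)."""

    def __init__(self):
        self.spam_counts = Counter()
        self.ham_counts = Counter()
        self.spam_total = 0
        self.ham_total = 0

    @staticmethod
    def tokens(text):
        # Each distinct word counts once per message.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def mark(self, text, is_spam):
        """Record the user's judgement about one message."""
        if is_spam:
            self.spam_counts.update(self.tokens(text))
            self.spam_total += 1
        else:
            self.ham_counts.update(self.tokens(text))
            self.ham_total += 1

    def word_probability(self, word):
        """Share of the word's occurrence rate that comes from spam."""
        if self.spam_total == 0 or self.ham_total == 0:
            return 0.5  # not enough training data yet
        spam_rate = self.spam_counts[word] / self.spam_total
        ham_rate = self.ham_counts[word] / self.ham_total
        if spam_rate + ham_rate == 0:
            return 0.5  # word never seen
        return spam_rate / (spam_rate + ham_rate)


# The biochemist from the example: "viagra" appears in legitimate mail too.
f = TrainableFilter()
for _ in range(3):
    f.mark("Viagra trial results for the lab", is_spam=False)
f.mark("Buy Viagra cheap", is_spam=True)
print(f.word_probability("viagra"), f.word_probability("cheap"))
```

For this user the word “viagra” ends up neutral, while the surrounding words of the spam message carry the signal.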
Spammers have attempted to fight statistical filtering by inserting many random but valid “noise” words or sentences into their messages while attempting to hide them from view, making it more likely that the filter will classify the message as neutral. (See Word salad (computer science).) Attempts to hide the noise words include setting them in tiny font or the same colour as the background. However, these noise countermeasures seem to have been largely ineffective.
Software programs that implement statistical filtering include Bogofilter, DSPAM, SpamBayes, the e-mail programs Mozilla and Mozilla Thunderbird, Mailwasher, and later revisions of SpamAssassin. Another interesting project is CRM114, which hashes phrases and performs Bayesian classification on the phrases.
There is also the free mail filter POPFile, which uses Bayesian filtering to sort mail into as many categories as you want (family, friends, co-workers, spam, whatever).
Source: en.wikipedia.org
Filed Under (Anti spam) by admin on 15-05-2007
Anti-Spam technique - Rule-based content filtering
Until recently, content filtering techniques relied on mail administrators specifying lists of words or regular expressions disallowed in mail messages. Thus, if a site receives spam advertising “herbal Viagra”, the administrator might place these words in the filter configuration. The mail server would then reject any message containing the phrase.
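A static filter of this kind amounts to little more than a list of patterns checked against each message. The Python sketch below is a minimal illustration; the pattern list and function name are made up for this example and do not reflect any particular mail server’s configuration format.

```python
import re

# Administrator-maintained list of disallowed phrases (illustrative).
BLOCKED_PATTERNS = [
    re.compile(r"herbal\s+viagra", re.IGNORECASE),
]


def is_rejected(body: str) -> bool:
    """Reject the message if any configured pattern occurs in the body."""
    return any(pattern.search(body) for pattern in BLOCKED_PATTERNS)


print(is_rejected("Buy Herbal Viagra today"))   # matches the configured phrase
print(is_rejected("Buy herbal V1agra today"))   # a trivial respelling slips through
```

The second call shows why such lists need constant maintenance: every new spelling requires a new rule.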
Content based filtering can also filter based on content other than the words and phrases that make up the body of the message. Primarily, this means looking at the header of the email, the part of the message that contains information about the message, and not the body text of the message. Spammers will often spoof fields in the header in order to hide their identities, or to try to make the email look more legitimate than it is; many of these spoofing methods can be detected. Also, spam sending software often produces a header that violates the RFC 2822 standard on how the email header is supposed to be formed.
Disadvantages of this static filtering are threefold: First, it is time-consuming to maintain. Second, it is prone to false positives. Third, these false positives are not equally distributed: manual content filtering is prone to reject legitimate messages on topics related to products advertised in spam. A system administrator who attempts to reject spam messages which advertise mortgage refinancing may easily inadvertently block legitimate mail on the same subject.
Finally, spammers can change the phrases and spellings they use, or employ methods to try to trip up phrase detectors. This means more work for the administrator. However, it also has some advantages for the spam fighter. If the spammer starts spelling “Viagra” as “V1agra” (see leet) or “Via_gra”, it makes it harder for the spammer’s intended audience to read their messages. If they try to trip up the phrase detector, by, for example, inserting an invisible-to-the-user HTML comment in the middle of a word (“Via<!-- -->gra”), this sleight of hand is itself easily detectable, and is a good indication that the message is spam. And if they send spam that consists entirely of images, so that anti-spam software can’t analyze the words and phrases in the message, the fact that there is no readable text in the body can be detected.
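Detecting an HTML comment placed in the middle of a word can be as simple as a single regular expression. The Python sketch below is one possible check; the pattern and function name are assumptions, and a production filter would more likely work on parsed HTML than on raw text.

```python
import re

# A word character, an HTML comment, then another word character:
# the comment is invisible when rendered but splits the word in the source.
SPLIT_WORD = re.compile(r"\w<!--.*?-->\w", re.DOTALL)


def has_comment_inside_word(html: str) -> bool:
    """Flag HTML in which a comment interrupts a run of letters or digits."""
    return SPLIT_WORD.search(html) is not None


print(has_comment_inside_word("Cheap Via<!-- -->gra here"))
print(has_comment_inside_word("<p>Hello</p> <!-- footer --> <p>Bye</p>"))
```

Ordinary comments sit between tags or whitespace rather than between letters, so the second example is not flagged.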
However, content filtering can also be implemented by examining the URLs present (i.e. spamvertising) in an email message. This form of content filtering is much harder to disguise as the URLs must resolve to a valid domain name. Extracting a list of such links and comparing them to published sources of spamvertised domains is a simple and reliable way to eliminate a large percentage of spam via content analysis.
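URL-based filtering can be sketched as two steps: extract the hosts that a message links to, then look each one up in a list of spamvertised domains. In the Python below the domain list is a hypothetical in-memory set; a real deployment would consult a published list instead, and the URL pattern is deliberately simple.

```python
import re
from urllib.parse import urlsplit

URL_PATTERN = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)

# Hypothetical list of spamvertised domains.
SPAMVERTISED = {"pills.example", "cheap-loans.example"}


def linked_hosts(body: str) -> set:
    """Collect the host names of all http(s) links in the message body."""
    hosts = set()
    for url in URL_PATTERN.findall(body):
        host = urlsplit(url).hostname
        if host:
            hosts.add(host.rstrip(".").lower())
    return hosts


def is_listed(host: str, blocklist: set) -> bool:
    """Match the host itself or any parent domain against the list."""
    parts = host.split(".")
    return any(".".join(parts[i:]) in blocklist for i in range(len(parts)))


def contains_spamvertised_link(body: str, blocklist: set = SPAMVERTISED) -> bool:
    return any(is_listed(host, blocklist) for host in linked_hosts(body))


print(contains_spamvertised_link("Visit http://www.pills.example/buy?id=1 today"))
print(contains_spamvertised_link("See https://www.example.org/news for details"))
```

Checking parent domains as well as the exact host means a spammer cannot evade the list simply by adding a new subdomain.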
Methods
Mail filters can be installed by the user, either as separate programs (see links below), or as part of their e-mail program (e-mail client). In e-mail programs, users can make personal, “manual” filters that then automatically filter mail according to the chosen criteria. Most e-mail programs now also have an automatic spam filtering function. Internet service providers can also install mail filters in their mail transfer agents as a service to all of their customers. Corporations often use them to protect their employees and their information technology assets.
Customization
Mail filters have varying degrees of configurability. Sometimes they make decisions based on matching a regular expression. Other times, keywords in the message body are used, or perhaps the e-mail address of the sender of the message. Some more advanced filters, particularly anti-spam filters, use statistical document classification techniques such as the naive Bayes classifier. Image filtering can also be used; it applies complex image analysis algorithms to detect skin tones and specific body shapes normally associated with adult (pornographic) images.
Source: en.wikipedia.org
Filed Under (Anti spam) by admin on 14-05-2007
Why Bayesian filtering is the most effective anti-spam technology?
Introduction
This white paper describes how Bayesian mathematics can be applied to the spam problem, resulting in an adaptive, ’statistical intelligence’ technique that achieves very high spam detection rates.
It also explains why the Bayesian approach is the best way to tackle spam once and for all, as it overcomes the obstacles faced by more static technologies such as blacklist checking, comparing against databases of known spam, and keyword checking. These technologies are not obsolete, but cannot be relied upon without a Bayesian filter.
Current spam detection technologies
Spam is an ever-increasing problem. The number of spam mails is increasing daily: studies show that over 50% of all current mail is spam, and the Radicati Group predicts it will reach 70% by 2007. Added to this, spammers are becoming more sophisticated and are constantly managing to outsmart ‘static’ methods of fighting spam.
The techniques currently used by most anti-spam software are static, meaning that they are fairly easy to evade by tweaking the message a little. To do this, spammers simply examine the latest anti-spam techniques and find ways to dodge them.
To effectively combat spam, an adaptive new technique is needed. The method must be familiar with spammers’ tactics as they change over time. It must also be able to adapt to the particular organization that it is protecting from spam. The answer lies in Bayesian mathematics.
How the Bayesian spam filter works
Bayesian filtering is based on the principle that most events are dependent and that the probability of an event occurring in the future can be inferred from the previous occurrences of that event. (More information about the mathematical basis of Bayesian filtering is available at Bayesian Parameter Estimation:
http://www-ccrma.stanford.edu/~jos/bayes/Bayesian_Parameter_Estimation.html
and An Introduction to Bayesian Networks and their Contemporary Applications:
http://www.niedermayer.ca/papers/bayesian/bayes.html)
This same technique can be used to classify spam. If some piece of text occurs often in spam but not in legitimate mail, then it would be reasonable to assume that an email containing it is probably spam.
to be continued…
Source: the white paper why-bayesian-filtering.pdf on www.gfi.com
Filed Under (Software) by admin on 14-05-2007
Spam Terrier (FreeWare)

Size: 1.58 MB
Type: Freeware
Requirements:
• Windows 98/Me/2000/XP
• Windows 2003 Server
• MS Outlook 2000
• MS Outlook Express 5.0
Spam Terrier brings comprehensive antispam protection to users of varying computer skills. It comes free and doesn’t stress even a low-end computer. Effortlessly integrating into Outlook and Outlook Express, the program provides a wide array of filtering options, including a self-learning Bayesian filter that can be trained to individual preferences. Other filtering criteria include senders’ whitelists and blacklists and customized message content that is analyzed when the mail arrives. An on-demand spam scan allows scanning of already received messages in any folder. These, plus a handful of other innovative features, make Spam Terrier the ideal antispam solution.
The utility can be downloaded from the Download Link.
Source: freeware4pc.com