GritBot
Software for identifying anomalous data ahead of data mining
identify data anomalies

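GritBot Overview:
GritBot is a fully automatic tool that finds anomalies and errors in large datasets. It is well suited for use alongside See5 and Cubist, because a key principle of data analysis is "garbage in, garbage out" and GritBot can uncover problems before modeling begins. It handles both discrete (nominal) and continuous data and can process hundreds of thousands of records. Every anomaly it finds is reported together with an explanation of why the value looks anomalous. Because GritBot runs almost entirely automatically, you do not need any knowledge of statistics or data analysis. Our company also offers related training courses to help you get started.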

GritBot Application Examples:
Medical Data, Census Income Data, Telecommunications Churn, Crystallography Data, Marine Biology Data, Genetics Data, Agricultural Data

GritBot Features:
‧GritBot has been designed to analyze substantial databases containing tens or hundreds of thousands of records and many numeric or nominal fields.
‧Possibly anomalous values that GritBot identifies are reported, together with an explanation of why each value seems surprising.
‧The patterns found by GritBot can be saved and used to check new data. Potential anomalies found in new data can differ from the types of anomalies originally identified.
‧GritBot is virtually automatic -- the user does not require a knowledge of Statistics or Data Analysis.
‧GritBot is available for Windows 2000/XP/Vista/7 and Linux.

SAMPLE 1: Census Income Data
This application uses data from the 1994 US Census extracted by Kohavi and Becker. Fourteen demographic attributes cover matters such as age, sex, marital status, education, employment class, hours per week, capital gain or loss, and income band (below or above $50,000). There are 48,841 cases divided between data and test files. GritBot takes 6.8 seconds to find 75 possible anomalies such as the following:

data case 7110:  [0.000]
        sex = Female  (19716 cases, 99.99% `Male')
          relationship = Husband

data case 576:  [0.002]
        sex = Male  (2331 cases, 99.87% `Female')
          relationship = Wife

test case 5954:  [0.002]
        class = >50K  (4449 cases, 99.96% `<=50K')
          age <= 21 [20]
          education-num <= 12 [8]
          marital-status = Never-married
          capital-gain <= 7000 [0]

data case 15377:  [0.004]
        class = <=50K  (1357 cases, 99.71% `>50K')
          age > 36 and <= 59 [55]
          capital-gain > 7000 [34095]

The first two of these are obviously data errors and illustrate GritBot's ability to find suspicious nominal (discrete) values as well as odd-looking numerical values.
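
To make the report format concrete, the following minimal Python sketch shows the kind of check each entry summarizes: inside a subset picked out by some conditions, nearly all cases share one value and the flagged case does not. This is not GritBot's algorithm, which the text above does not describe. The function name `find_subset_outliers`, the `min_purity` threshold, and the toy records are all hypothetical and exist only for illustration.

```python
from collections import Counter

def find_subset_outliers(records, condition, target, min_purity=0.99):
    """Flag cases whose `target` value disagrees with a near-unanimous subset.

    Hypothetical helper for illustration only; not GritBot's method.
    Returns tuples of (index, value, subset_size, majority_value, purity).
    """
    # Keep only the cases that satisfy the subset condition.
    subset = [(i, r) for i, r in enumerate(records) if condition(r)]
    if not subset:
        return []
    counts = Counter(r[target] for _, r in subset)
    majority, majority_count = counts.most_common(1)[0]
    purity = majority_count / len(subset)
    # Only report when the subset is almost, but not entirely, uniform.
    if purity < min_purity:
        return []
    return [
        (i, r[target], len(subset), majority, purity)
        for i, r in subset
        if r[target] != majority
    ]

# Toy data (invented): 199 male husbands, 1 female husband, 50 wives.
records = [{"sex": "Male", "relationship": "Husband"} for _ in range(199)]
records.append({"sex": "Female", "relationship": "Husband"})
records += [{"sex": "Female", "relationship": "Wife"} for _ in range(50)]

outliers = find_subset_outliers(
    records, lambda r: r["relationship"] == "Husband", "sex"
)
for idx, value, size, majority, purity in outliers:
    print(f"case {idx}: sex = {value}  ({size} cases, {purity:.2%} `{majority}')")
```

In this toy run the only flagged record is the one female case among the 200 with relationship = Husband, which mirrors the shape of the first entry above. The percentages GritBot prints may be computed differently from the plain ratio used here.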

 

SAMPLE 2: Telecommunications Churn
The MLC++ site at SGI contains several interesting datasets including simulated telecommunications churn data. ("Churn" here has nothing to do with making butter -- it's about customers changing providers.)

The training and test files contain a total of 5000 cases, each described by 21 attributes. GritBot analyzes them in 2.4 seconds and finds two possible anomalies:

test case 1570:  [0.001]
        voice mail plan = yes  (3678 cases, 99.97% `no')
          number vmail messages <= 0 [0]

data case 15:  [0.016]
        class = 0  (75 cases, 99% `1')
          total day minutes <= 135 [120.7]
          number customer service calls > 3 [4]

The first highlights someone paying for a voice mail plan who has received no voice mail messages. The second describes a non-churning customer who is a light user but has numerous service calls.
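
If the reports need to be post-processed, entries in this layout can be turned into structured records. The sketch below is an assumption-laden example: the regular expressions are inferred only from the sample entries shown on this page, the function `parse_entry` and the dictionary keys are hypothetical names, and the bracketed number in the header is stored as `score` because the page does not define it.

```python
import re

# Patterns inferred from the sample entries on this page (an assumption,
# not an official description of the report format).
HEADER = re.compile(r"^(?:(data|test) )?case (\d+):\s+\[([\d.]+)\]$")
VALUE = re.compile(r"^(.+?) = (.+?)\s+\((\d+) cases, ([\d.]+)% `(.+?)'\)$")

def parse_entry(text):
    """Parse one anomaly entry into a dict (hypothetical helper)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    header = HEADER.match(lines[0])
    value = VALUE.match(lines[1])
    if header is None or value is None:
        raise ValueError("unrecognized report entry")
    return {
        "file": header.group(1),           # "data", "test", or None
        "case": int(header.group(2)),
        "score": float(header.group(3)),   # bracketed number, meaning not stated here
        "attribute": value.group(1),
        "value": value.group(2),
        "subset_size": int(value.group(3)),
        "majority_pct": float(value.group(4)),
        "majority_value": value.group(5),
        "conditions": lines[2:],           # remaining lines define the subset
    }

entry = """test case 1570:  [0.001]
voice mail plan = yes  (3678 cases, 99.97% `no')
number vmail messages <= 0 [0]"""

parsed = parse_entry(entry)
print(parsed["attribute"], "=", parsed["value"], "in a subset of", parsed["subset_size"])
```

Lines are stripped before matching, so the parser works whether or not the entry is indented.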

 

SAMPLE 3: Crystallography Data
The data for this example were provided by Dr John Rodgers of National Research Council Canada. The data and test files contain a total of 34,641 cases, each describing 122 properties of a substance such as the number of atoms of each element that it contains, the number of atoms belonging to each periodic table family, density, crystal structure group, and whether it is magnetic. GritBot requires 21.7 seconds to identify just one potential anomaly:

test case 4190: (label AL2562/Al8 Dy Fe4/MN12 Th/tI26)  [0.006]
        Magnetic = neg  (352 cases, 99.4% `pos')
          Fe > 3 [4]
          Group = tI26

GritBot has found a subset of 352 cases, most of them magnetic, among which this non-magnetic case stands out. Since only 7% of the cases in the entire dataset are noted as being magnetic, this potential anomaly is indeed interesting.
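
A quick calculation shows how strongly this subset departs from the dataset as a whole. The snippet uses only the figures quoted above (352 cases, 99.4% magnetic in the subset, 7% magnetic overall); the variable names are illustrative.

```python
subset_size = 352        # cases with Fe > 3 and Group = tI26
subset_rate = 0.994      # share of the subset reported as magnetic
overall_rate = 0.07      # share of the whole dataset noted as magnetic

# How many magnetic cases a subset this size would contain at the overall rate.
expected_at_base_rate = overall_rate * subset_size

# How many times more common magnetism is in the subset than overall.
rate_ratio = subset_rate / overall_rate

print(f"expected magnetic cases at the overall rate: {expected_at_base_rate:.1f}")
print(f"subset rate is about {rate_ratio:.1f} times the overall rate")
```

At the overall rate, a group of 352 substances would contain roughly 25 magnetic cases, whereas nearly all cases in this subset are magnetic, which is why a single non-magnetic member stands out.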

 

SAMPLE 4: Genetics Data
This application's data, assembled by Towell, Noordewier, and Shavlik, concern splicing sites in genes. There are 3190 cases, each described by 61 attributes representing a "window" of 60 residues (nucleotides, normally A, G, T, or C) and information on whether the center of the window is a splice junction (intron-exon, exon-intron) or not.

GritBot finds two possible anomalies in these cases (in 1.4 seconds):

case 550:  [0.009]
        A30 = C  (657 cases, 99.8% `G')
          A34 = G
          class = EI

case 839:  [0.010]
        A28 = T  (606 cases, 99.8% `A')
          A27 = C
          class = IE

There are 657 exon-intron junction cases that have G in position A34, all of them (except case 550) also having a G in position A30. Similarly, among the 606 cases that are intron-exon junctions and for which the residue in position A27 is C, all except case 839 have A in position A28.