GLCM/PGLCM: Efficient Parallel Mining of Closed Frequent Gradual Itemsets

GLCM/PGLCM are efficient algorithms for mining gradual closed frequent itemsets. Given a numerical dataset (e.g., biological databases, survey databases, data streams or sensor), a gradual itemset can be expressed as the covariation of several attributes, for example: “the higher the age, the higher the salary, the lower the loan”. GLCM/PGLCM inherit the ppc-extension from LCM algorithm by Hiroki Arimura and Takeaki Uno. See here for the original C code of LCM. LCM is the winner of the FIMI'04 competition about frequent itemset miners.

GLCM is the sequential version of the algorithm. It has the same strong advantages of LCM: linear time complexity and constant memory consumption.

PGLCM is the parallel implementation. This parallel implementation can take advantages of the strength of recent multi-process computer. It scales well with the number of available cores for complex datasets, where such computing power is really needed.

The low memory requirements of our two algorithms allow it to handle large real world datasets, which could not be handled by existing algorithms due to memory saturation. Our algorithms thus removed the lock that prevented the use of gradual patterns analysis in realistic applications.

You can find more details about GLCM/PGLCM, as well as a detailed experimental study about its performance in our ICDM 2010 paper: PDF.

Download:

codeGLCM_PGLCM.tar.gz

Compilation and Run:

Read the README.txt file in the download pakage.

GLCM/PGLCM is the Master Thesis work of Trong Dinh Thac Do, during is master internship between LIG laboratory, University of Grenoble and LIRMM laboratory, University of Montpellier
Current contact: Alexandre Termier. Feel free to contact me by email if you have any inquiry!