Tuesday, May 17, 2016

ELK . OpenAM . OpenDJ

We deployed ELK (Elasticsearch, Logstash and Kibana) for a long-time OpenAM/OpenDJ customer of ours some time back. The idea is not new; similar solutions have been deployed by other ForgeRock folks and partners.


We know ELK on its own can only show trends; it cannot send notifications. (OK, OK, Elastic does offer Watcher in its commercial version.) What we intend to do is add a Notification Service alongside ELK. No, we do not intend to keep all OpenAM/OpenDJ data in Elasticsearch and trigger alerts from there. Some data is not worth keeping in Elasticsearch, e.g. the total entry count on every OpenDJ server, which is used to determine whether replication is operating optimally. We just need a simple cache layer (e.g. Ehcache) to hold this kind of "data in transit" so that alerts can be sent to administrators/operators.
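Purely to illustrate the replication example, here is a rough shell sketch of the kind of check such a Notification Service might run. It is not our implementation: it assumes the entry counts have already been collected from each OpenDJ replica (how is left open) and arrive as "host count" pairs, the function name `check_replica_counts` and the host names are made up, and the Ehcache layer is left out entirely.

```shell
# Hypothetical helper, not part of OpenAM/OpenDJ or ELK.
# Reads "host entry_count" pairs on stdin (collected elsewhere, one per replica)
# and reports whether all replicas currently hold the same number of entries.
# Exit status: 0 = counts match, 1 = counts differ, 2 = no input.
check_replica_counts() {
  awk '
    NF >= 2 {
      n++
      count = $2 + 0
      if (n == 1 || count < min) min = count
      if (n == 1 || count > max) max = count
      seen = seen " " $1 "=" $2
    }
    END {
      if (n == 0) { print "UNKNOWN: no counts received"; exit 2 }
      if (min == max) { print "OK:" seen; exit 0 }
      spread = max - min
      print "ALERT: entry counts differ (spread " spread "):" seen
      exit 1
    }'
}
```

For example, `printf 'dj1 1000\ndj2 990\n' | check_replica_counts` prints `ALERT: entry counts differ (spread 10): dj1=1000 dj2=990` and exits with status 1. Equal counts do not prove the replicas hold identical data, and short-lived differences are normal while changes propagate, so a real alert would probably fire only when a mismatch persists across several readings. Keeping those recent readings around is exactly the job of a small cache.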

I'll talk more about this next time.

But so far, how useful has ELK been to the customer? The feedback has been pretty good.


"Login Failed Server Trend Live" - This is a live trend in which Logstash agents monitor the amAuthentication.error log on every OpenAM server and send the data in as it arrives. It tracks user login failure events.




If the login failure count is high on a particular day for a particular OpenAM node, we can zoom into that node's amAuthentication.error log to find out more.
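Outside Kibana, a rough shell equivalent of this per-day view for a single node might look like the sketch below. It assumes, based on the sample log lines later in this post, that Login Failed events carry the message ID AUTHENTICATION-200 as a separate field and that each event line starts with a quoted "YYYY-MM-DD hh:mm:ss" timestamp; `daily_login_failures` is a hypothetical helper name.

```shell
# Hypothetical helper: per-day count of Login Failed events for one OpenAM node.
# Assumptions (from the sample lines in this post): Login Failed events carry the
# message ID AUTHENTICATION-200 as a separate field, and every event line starts
# with a quoted "YYYY-MM-DD hh:mm:ss" timestamp. Other lines are ignored.
daily_login_failures() {
  awk '
    /^"[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] / {
      for (i = 1; i <= NF; i++) {
        if ($i == "AUTHENTICATION-200") {
          count[substr($0, 2, 10)]++   # key on the date part of the timestamp
          break
        }
      }
    }
    END { for (day in count) print day, count[day] }
  ' "$@" | sort
}
```

Run on a node with something like `daily_login_failures amAuthentication.error.201605*`, it prints one "date count" line per day, so the unusually busy day stands out before you open the raw log.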




"Invalid Password Server Trend Live" - This tracks user Invalid Password events.

This trend is different from the previous one. An Invalid Password event happens when the user ID is correct but the password is wrong.
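Because both trends come from the same log, one way to keep them apart is to key on the message ID instead of the message text. In the sample lines in this post, Login Failed events carry AUTHENTICATION-200 and Invalid Password events carry AUTHENTICATION-201; the sketch below assumes that mapping holds generally, and `auth_failure_summary` is a made-up name.

```shell
# Hypothetical helper: total Login Failed vs Invalid Password events, keyed on the
# message ID field rather than the message text.
# Assumed mapping (read off the sample lines in this post):
#   AUTHENTICATION-200 -> Login Failed, AUTHENTICATION-201 -> Invalid Password
auth_failure_summary() {
  awk '
    {
      for (i = 1; i <= NF; i++) {
        if ($i == "AUTHENTICATION-200") failed++
        else if ($i == "AUTHENTICATION-201") invalid++
      }
    }
    END {
      printf "Login Failed (AUTHENTICATION-200): %d\n", failed
      printf "Invalid Password (AUTHENTICATION-201): %d\n", invalid
    }
  ' "$@"
}
```

Matching on the ID field also means the counts do not depend on which language the message text was logged in.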



"2016-05-10 13:00:43"   "Invalid Password"      "Not Available" UID=ntustc001,ou=XXXX,o=xxx.sg    202.83.xx.xxx   INFO    o=xxx.sg  AUTHENTICATION-201      "cn=dsameuser,ou=DSAME Users,o=xxx.sg"       "Not Available" LDAP    202.83.xx.xxx

A quick zoom into the amAuthentication.error log reveals that one particular user was attempting to log in with an invalid password, over and over.

[amuser@f1]$ cat amAuthentication.error.20160510 | wc -l
18475

[amuser@f1]$ cat amAuthentication.error.20160510 | grep -i ntustc001 | wc -l
18039


More than 18,000 invalid login attempts (18,039 of the day's 18,475 log entries) from a single user. That's quite unusual.

This is where customer service staff can call up the paying customer to find out exactly what happened and whether he or she needs a password reset.

Proactive customer engagement model!
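The grep above targets a user we had already spotted. A minimal sketch for finding such users in the first place ranks user IDs by their Invalid Password (AUTHENTICATION-201) events and keeps those at or above a threshold. It assumes the user DN appears as uid=<id>,... (both UID= and uid= appear in this post, so matching is case-insensitive); the helper name and any threshold you pass are illustrative only.

```shell
# Hypothetical helper: rank user IDs by Invalid Password events and keep those at
# or above a threshold. Usage: top_invalid_password_users THRESHOLD [FILE...]
# Assumptions: Invalid Password events carry the message ID AUTHENTICATION-201,
# and the user DN appears as uid=<id>,... in either case (UID= or uid=).
top_invalid_password_users() {
  local threshold=$1
  shift
  grep -hw 'AUTHENTICATION-201' "$@" |
    grep -oiE 'uid=[^,[:space:]]+' |     # e.g. UID=ntustc001
    cut -d= -f2 |
    tr '[:upper:]' '[:lower:]' |         # treat NTUSTC001 and ntustc001 as one user
    sort | uniq -c |
    awk -v t="$threshold" '$1 >= t { print $1, $2 }' |
    sort -k1,1nr -k2,2                   # highest count first
}
```

For example, `top_invalid_password_users 100 amAuthentication.error.20160510` lists every user with 100 or more Invalid Password events in that file, highest first, which gives customer service a ready-made call list.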



By the way, if you look at amAuthentication.error in depth, you might see some Chinese text such as 登录失败 (Login Failed) and 无效密码 (Invalid Password). This is traffic from browsers using a Chinese locale.


"2016-05-10 07:00:33"   登录失败        "Not Available" "Not Available" 118.176.xx.xxx  INFO    o=xxx.sg  AUTHENTICATION-200      "cn=dsameuser,ou=DSAME Users,o=xxx.sg"    "Not Available" LDAP    118.176.xx.xxx

"2016-05-10 08:56:22"   无效密码        "Not Available" uid=A480,ou=xxx,o=xxx.sg    202.4.xxx.xx    INFO    o=xxx.sg  AUTHENTICATION-201      "cn=dsameuser,ou=DSAME Users,o=xxx.sg"    "Not Available" LDAP    202.4.xxx.xx
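If you would rather see these events grouped with their English counterparts in Kibana, one option is to rewrite the localized text before the lines are shipped. The sketch below maps only the two strings shown in this post, and it quotes the replacement because the English entries in this post appear in quotes (e.g. "Invalid Password" in the 13:00:43 line) while the Chinese ones do not. Other locales would need their own entries, and the function name is hypothetical. The same substitution could equally live in a Logstash filter.

```shell
# Hypothetical helper: rewrite the two localized labels seen in this post to the
# quoted English form used by the English log lines, e.g. "Invalid Password".
# Only these two strings are mapped; other locales would need their own entries.
normalize_auth_labels() {
  sed -e 's/登录失败/"Login Failed"/g' \
      -e 's/无效密码/"Invalid Password"/g' \
      "$@"
}
```

For instance, `normalize_auth_labels amAuthentication.error.20160510 > normalized.log` writes a copy of that day's log with both labels in English.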


