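The "Rhino-Bird Fund" (犀牛鸟基金) was jointly launched by CCF and Tencent in 2013 to help young scholars worldwide pursue innovative research and to promote the application and transfer of research results, "turning great ideas into real-world impact". Over the past five years, CCF and Tencent have jointly organized more than 20 sessions of the "Rhino-Bird Salon" (犀牛鸟沙龙) and "Rhino-Bird Xuewen" (犀牛鸟学问) forums. As part of the fund's fifth-anniversary event series, the "Future Data Intelligence" international academic forum, co-hosted by CCF YOCSEF Shenzhen and Tencent University Relations, will be held in Shenzhen on March 29. The invited speakers are Professor Vipin Kumar (Fellow of the ACM, IEEE, AAAS, and SIAM), Professor Lei Chen of HKUST, Professor Shouling Ji of Zhejiang University, and Associate Professor Reynold Cheng of the University of Hong Kong, who will present the latest research in data intelligence. All are welcome!
Lei Kai (雷凯), Chair of CCF YOCSEF Shenzhen, 2017-2018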
Talk 1: Big Data in Climate and Earth Sciences: Challenges and Opportunities for Machine Learning
University of Minnesota, Professor
Talk 2: Human-Powered Machine Learning
Hong Kong University of Science and Technology, Professor
Talk 3: Price TAG: Towards Automatically Discovery Tactics, Techniques and Procedures of E-Commerce Cyber Threat Intelligence
Talk 4: Meta Paths and Meta Structures: Analyzing Large Heterogeneous Information Networks
Reynold C.K. Cheng
Big Data in Climate and Earth Sciences: Challenges and Opportunities for Machine Learning
University of Minnesota
The climate and earth sciences have recently undergone a rapid transformation from a data-poor to a data-rich environment. In particular, massive amounts of data about Earth and its environment are now continuously being generated by a large number of Earth-observing satellites as well as physics-based earth system models running on large-scale computational platforms. These massive and information-rich datasets offer huge potential for understanding how the Earth's climate and ecosystem have been changing and how they are being impacted by human actions. This talk will discuss various challenges involved in analyzing these massive datasets, as well as the opportunities they present for advancing both machine learning and the science of climate change, in the context of monitoring the state of tropical forests and surface water on a global scale.
Vipin Kumar is a Regents Professor and holds the William Norris Chair in the Department of Computer Science and Engineering at the University of Minnesota. His research interests include data mining, high-performance computing, and their applications in climate/ecosystems and health care. He is currently leading an NSF Expedition project on understanding climate change using data science approaches. He has authored over 300 research articles and co-edited or coauthored 10 books, including the widely used textbooks "Introduction to Parallel Computing" and "Introduction to Data Mining". Kumar has served as chair/co-chair for many international conferences and workshops in the area of data mining and parallel computing, including the 2015 IEEE International Conference on Big Data, the IEEE International Conference on Data Mining (2002), and the International Parallel and Distributed Processing Symposium (2001). Kumar is a Fellow of the ACM, IEEE, AAAS, and SIAM. Kumar's research has been honored by the ACM SIGKDD 2012 Innovation Award, which is the highest award for technical excellence in the field of Knowledge Discovery and Data Mining (KDD), and the 2016 IEEE Computer Society Sidney Fernbach Award, one of IEEE Computer Society's highest awards in high performance computing.
Human-Powered Machine Learning
Hong Kong University of Science and Technology
Recently, machine learning has become quite popular and attractive, not only to academia but also to industry. The success stories of machine learning in AlphaGo and Texas hold 'em games have raised significant interest in machine learning. The question is whether machine learning can do everything perfectly. In this talk, I will first give several examples where current machine learning techniques have difficulty performing well. Then, I will show that by putting humans in the machine-learning loop, the results can be significantly improved. After that, I will discuss the challenges and opportunities for this human-powered machine learning paradigm.
Lei Chen received the BS degree in computer science and engineering from Tianjin University, Tianjin, China, in 1994, the MA degree from the Asian Institute of Technology, Bangkok, Thailand, in 1997, and the PhD degree in computer science from the University of Waterloo, Canada, in 2005. He is currently a full professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. His research interests include human-powered machine learning, crowdsourcing, social media analysis, probabilistic and uncertain databases, and privacy-preserving data publishing. The system developed by his team won the excellent demonstration award at VLDB 2014. He received the SIGMOD Test-of-Time Award in 2015. He has served as PC track chair for SIGMOD 2014, VLDB 2014, ICDE 2012, CIKM 2012, and SIGMM 2011, and as a PC member for SIGMOD, VLDB, ICDE, SIGMM, and WWW. Currently, he serves as Editor-in-Chief of the VLDB Journal and as an associate editor-in-chief of IEEE Transactions on Knowledge and Data Engineering. He is the secretary of the VLDB Endowment.
Price TAG: Towards Automatically Discovery Tactics, Techniques and Procedures of E-Commerce Cyber Threat Intelligence
In Cyber Threat Intelligence (CTI), tactics, techniques and procedures (TTPs) characterize attack patterns, infrastructures or victim targeting associated with specific threat actors (e.g., bulletproof hosting used for malware hosting). Collecting TTPs helps organizations identify, mitigate and respond to cyber threats effectively. In this project, we take the first step toward semi-automatically extracting TTPs from e-commerce threat intelligence corpora. We build a system called TTP Automatic Generator (TAG), which specializes the NLP techniques of topic term extraction and named entity recognition for TTP recognition. Starting from a seed set of 34 crawler keywords, TAG collects 22,380 e-commerce threat corpora across 8 months (from 2017/05 to 2018/01) and successfully identifies 1,013 black keywords and 2,352 TTPs. After clustering the black keywords into 60 groups and mapping all the TTPs to each group, we find that TTPs extend and evolve in many e-commerce threats. We also discover 694 illicit websites and some attack campaigns. Further, we shed light on how the e-commerce threat landscape evolves over time. Moreover, TAG reveals many new TTPs, which have been confirmed and applied in the anti-threat system of the e-commerce company Alibaba.
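Biography: Shouling Ji (纪守领) received a PhD in Electrical and Computer Engineering from the Georgia Institute of Technology and a PhD in Computer Science from Georgia State University. He is currently a "Hundred Talents Program" research professor, doctoral supervisor, and head of the Information Security program at Zhejiang University. He also serves as Research Faculty at the Georgia Institute of Technology, assistant to the director of the Zhejiang University Cyberspace Security Research Center, and member of the technical committee of the Zhejiang University-ZTE Joint Innovation Center. He was selected for the national "Young Thousand Talents" program and the Zhejiang Province "Thousand Talents Program". His research interests include data-driven security, AI security, big data security and privacy, and adversarial learning. He has led projects funded by, among others, the National Natural Science Foundation of China (General Program), the "Cyberspace Security" key special project of the Zhejiang Province Key R&D Program, the CCF-Tencent "Rhino-Bird" research fund, the CCF-NSFOCUS "Kunpeng" research fund, the CCF-Venustech "Hongyan" research fund, and the Alibaba research fund, and has taken part in 8 US NSF projects as technical lead or key member. He has published more than 80 high-level international papers, including nearly 30 CCF-A papers (e.g., in the ACM/IEEE Transactions ToN, TDSC, TIFS, and TMC, and at ACM CCS and USENIX Security), and has published 4 English-language monographs and edited volumes. He received tenure-track assistant professor offers from the US universities Virginia Tech, Case Western Reserve University, Lehigh University, and the University of Georgia. His honors include the Chinese Government Award for Outstanding Self-Financed Students Abroad, three best/outstanding paper awards, the GSU Outstanding Research Award, and the ELSEVIER highly cited paper award.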
Meta Paths and Meta Structures: Analysing Large Heterogeneous Information Networks
Hong Kong University
A heterogeneous information network (HIN) is a graph model in which objects and edges are annotated with types. Large and complex databases, such as YAGO and DBLP, can be modeled as HINs. A fundamental problem in HINs is the computation of closeness, or relevance, between two HIN objects. Relevance measures, such as PCRW, PathSim, and HeteSim, can be used in various applications, including information retrieval, entity resolution, and product recommendation. These metrics are based on the use of meta-paths, each of which is essentially a sequence of node classes and edge types between two nodes in a HIN. In this tutorial, we will give a detailed review of meta-paths, as well as how they are used to define relevance. In a large and complex HIN, retrieving meta-paths manually can be complex, expensive, and error-prone. Hence, we will explore systematic methods for finding meta-paths. In particular, we will study a solution based on the Query-by-Example (QBE) paradigm, which allows us to discover meta-paths in an effective and efficient manner.
We further generalise the notion of meta-path to "meta structure", which is a directed acyclic graph of object types with edge types connecting them. A meta structure, which is more expressive than a meta-path, can describe a complex relationship between two HIN objects (e.g., two papers in DBLP share the same authors and topics). We develop three relevance measures based on meta structures. Due to the computational complexity of these measures, we also study an algorithm, with supporting data structures, for evaluating them. Finally, we will examine solutions for performing query recommendation based on meta-paths. We will also discuss future research directions in HINs.
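As a rough illustration of how a meta-path can define relevance, the sketch below computes PathSim, one of the measures named in the abstract, along the symmetric meta-path Author-Paper-Author. The toy author-to-paper data and the function names `count_apa_paths` and `pathsim` are assumptions made for this example, not material from the talk. PathSim scores two objects x and y as twice the number of path instances between them, divided by the number of path instances from x to itself plus the number from y to itself.

```python
from collections import defaultdict


def count_apa_paths(author_papers):
    """Count Author-Paper-Author path instances for every author pair.

    author_papers maps an author name to the set of papers they wrote
    (hypothetical toy data, standing in for a bibliographic HIN).
    """
    # Invert the mapping: paper -> set of its authors.
    paper_authors = defaultdict(set)
    for author, papers in author_papers.items():
        for paper in papers:
            paper_authors[paper].add(author)

    # Each shared paper contributes one path instance a -> paper -> b.
    counts = defaultdict(int)
    for authors in paper_authors.values():
        for a in authors:
            for b in authors:
                counts[(a, b)] += 1
    return counts


def pathsim(counts, x, y):
    """PathSim under a symmetric meta-path, given path-instance counts."""
    denom = counts.get((x, x), 0) + counts.get((y, y), 0)
    if denom == 0:
        return 0.0
    return 2 * counts.get((x, y), 0) / denom


author_papers = {
    "alice": {"p1", "p2"},
    "bob": {"p2", "p3"},
    "carol": {"p4"},
}
counts = count_apa_paths(author_papers)
print(pathsim(counts, "alice", "bob"))    # one shared paper out of two each
print(pathsim(counts, "alice", "carol"))  # no shared paper
```

The normalisation by self-path counts is what keeps a very prolific author from looking similar to everyone; a different meta-path (for example Author-Paper-Venue-Paper-Author) would only change how the counts are gathered.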
Dr. Reynold Cheng is an Associate Professor in the Department of Computer Science at the University of Hong Kong. He was an Assistant Professor at HKU in 2008-11. He received his BEng (Computer Engineering) in 1998 and his MPhil (Computer Science and Information Systems) in 2000, both from the Department of Computer Science at the University of Hong Kong. He then obtained his MSc and PhD from the Department of Computer Science of Purdue University in 2003 and 2005, respectively. Dr. Cheng was an Assistant Professor in the Department of Computing of the Hong Kong Polytechnic University during 2005-08. He was a visiting scientist at the Institute of Parallel and Distributed Systems at the University of Stuttgart during the summer of 2006.
Dr. Cheng was granted an Outstanding Young Researcher Award 2011-12 by HKU. He was the recipient of the 2010 Research Output Prize in the Department of Computer Science of HKU. He also received the U21 Fellowship in 2011. He received the Performance Reward in 2006 and 2007 from the Hong Kong Polytechnic University. He is the Chair of the Department Research Postgraduate Committee, and was the Vice Chairperson of the ACM (Hong Kong Chapter) in 2013. He is a member of the IEEE, the ACM, the Special Interest Group on Management of Data (ACM SIGMOD), and the UPE (Upsilon Pi Epsilon Honor Society). He is an editorial board member of TKDE, DAPD and IS, and was a guest editor for TKDE, DAPD, and Geoinformatica. He is an area chair of ICDE 2017, a senior PC member for DASFAA 2015, PC co-chair of APWeb 2015, area chair for CIKM 2014, area chair for the Encyclopedia of Database Systems, program co-chair of SSTD 2013, and a workshop co-chair of ICDE 2014. He received an Outstanding Service Award at the CIKM 2009 conference. He has served as a PC member and reviewer for top conferences (e.g., SIGMOD, VLDB, ICDE, EDBT, KDD, ICDM, and CIKM) and journals (e.g., TODS, TKDE, VLDBJ, IS, and TMC).
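The first IEEE conference on information-centric future networks (IEEE HotICN2018) will be held from August 15 to 17 at Peking University Shenzhen Graduate School in Shenzhen. The conference welcomes papers in three areas: information-centric future networks, blockchain technology, and knowledge graphs. HotICN2018 is dedicated to research problems in the design, construction, management, and evaluation of future network systems. It is a leading forum for researchers, practitioners, developers, and users to explore cutting-edge ideas and to exchange techniques, tools, and experience. We cordially invite submissions of original research.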