Project 6: Machine Learning to Estimate Crop Productivity from Multi-Sensor Fused Satellite Data

Co-Mentors: Kaiyu Guan (Natural Resources & Environmental Sciences) and Jian Peng (Computer Science)
Social Impact: Developing reliable forecasting systems that predict crop type and crop yield ahead of harvest, for use by farming communities and government agencies.

Project description: Reliable forecasting systems that predict crop type and crop yield ahead of harvest are critically valuable to farming communities and government agencies. Field-level estimation of crop yield is particularly useful for understanding how crop productivity responds to various management requirements and environmental factors. With continued climate variability (e.g., the 2012 Midwest drought) and ongoing climate change, farming communities and government agencies require better information to monitor crop growth and near-term crop prospects. The U.S. Corn Belt, the most important staple food production area, produces half of the world's corn and soybeans combined and is of significant importance to the regional, national, and global economy and to food security. However, despite the great need for such forecasting information, no forecasting system for the U.S. Corn Belt is available for public use.

This project will develop scalable machine-learning methods that integrate satellite remote sensing data with other auxiliary information to make accurate yet cost-effective predictions of crop type. First, we will generate a daily, cloud-free data stack at 30-meter resolution (2000-present; 10-meter for the post-2014 period) for three major Corn Belt states (Illinois, Iowa, and Indiana) by integrating three major satellite datasets (Landsat, MODIS, and the new Sentinel-2). We will then build upon this data stack to develop a machine-learning forecasting system for crop type, with a lead time of 1 to 2 months before harvest and a particular focus on rain-fed corn and soybean. We will fully leverage satellite information, including spectral, phenological, and field-level texture information, to achieve field-level predictions with advanced deep learning approaches.
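To make the idea of a daily, cloud-free data stack concrete, the sketch below fills cloudy days in one pixel's time series by linear interpolation between clear observations. This is only an illustration under stated assumptions: the project description does not specify its fusion algorithm, simple temporal interpolation is a stand-in and not a multi-sensor fusion method, and the function name `fill_cloud_gaps` and the sample values are hypothetical.

```python
import numpy as np

def fill_cloud_gaps(days, values, cloudy):
    """Fill cloudy observations in one pixel's time series.

    Assumption: linear interpolation between the nearest clear days.
    `days` must be sorted in increasing order.
    """
    days = np.asarray(days, dtype=float)
    values = np.asarray(values, dtype=float)
    clear = ~np.asarray(cloudy, dtype=bool)
    if not clear.any():
        # No clear observation at all: nothing to interpolate from.
        return np.full_like(values, np.nan)
    # np.interp holds the first/last clear value constant outside their range.
    return np.interp(days, days[clear], values[clear])

# Hypothetical vegetation-index samples; cloudy days carry placeholder zeros.
days = [1, 2, 3, 4, 5]
index_values = [0.2, 0.0, 0.4, 0.0, 0.6]
cloudy = [False, True, False, True, False]

filled = fill_cloud_gaps(days, index_values, cloudy)
print(filled)
```

In a real stack the same operation would run per pixel and per band, and observations from different sensors would first need to be brought to a common grid and radiometric scale, which this sketch does not attempt.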

In this project, the REU students will implement a computational pipeline that extracts real-time satellite images and deposits the data into a database, and will develop data fusion and machine learning components for predictive tasks. Because of the sheer volume of satellite data and the heavy computational requirements, the students will need to distribute storage- and computation-intensive modules on the Blue Waters supercomputer. The students will meet with both Guan and Peng on a regular basis, participate in the group meetings and reading groups organized in their research labs, and interact with graduate students. The students will gain advanced knowledge of remote sensing and machine learning and improve their scientific research skills.
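As a rough illustration of the "deposit data into a database" step of the pipeline, the sketch below records scene metadata in a small catalog and selects scenes with low cloud cover for downstream fusion. Everything here is an assumption made for illustration: the project does not name its database or schema, so SQLite, the `scenes` table, the scene identifiers, and the 0.2 cloud threshold are all hypothetical.

```python
import sqlite3

def open_catalog(path=":memory:"):
    """Open (or create) a scene catalog. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scenes (
               scene_id       TEXT PRIMARY KEY,
               sensor         TEXT NOT NULL,
               acquired       TEXT NOT NULL,  -- ISO date, sorts chronologically
               cloud_fraction REAL NOT NULL
           )"""
    )
    return conn

def register_scene(conn, scene_id, sensor, acquired, cloud_fraction):
    """Insert a scene record, replacing any earlier record with the same id."""
    conn.execute(
        "INSERT OR REPLACE INTO scenes VALUES (?, ?, ?, ?)",
        (scene_id, sensor, acquired, cloud_fraction),
    )
    conn.commit()

def usable_scenes(conn, max_cloud=0.2):
    """Return ids of scenes at or below the cloud threshold, oldest first."""
    rows = conn.execute(
        "SELECT scene_id FROM scenes WHERE cloud_fraction <= ? ORDER BY acquired",
        (max_cloud,),
    )
    return [row[0] for row in rows]

# Made-up records standing in for downloaded scenes.
conn = open_catalog()
register_scene(conn, "demo-landsat-001", "Landsat", "2012-07-01", 0.05)
register_scene(conn, "demo-modis-001", "MODIS", "2012-07-02", 0.60)
register_scene(conn, "demo-sentinel2-001", "Sentinel-2", "2016-07-03", 0.10)

selected = usable_scenes(conn)
print(selected)
```

Keeping only metadata in the database and leaving the imagery itself on the file system is one common design choice for large archives; the project may well organize its storage differently.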