Data Project for Research Reporting


NSF_Data_Diagram.jpg
Quick View
Summary: 

With the intent of adding other sources of research activity to the NSF Survey, other data sources need to be consolidated and centralized. The Data project will consist of creating the extraction queries, transformation processes, and load procedures to migrate this data to an environment that can be used by the individuals responsible for the reporting.

The revised expectations are that the Research Dashboard show the "University Investment" along with the other expenditures made in support of research activities, and that this be available as soon as possible to display the fiscal year end results. Tables and other objects will be created in the UTO SQL Server consolidated environment, and will eventually be accessible by staff and officers who need to use the data to report university research activity completely and consistently. The project is a high-priority need.

See initial case (Customer's request) in CRM 417908

Start Date: 
July 2, 2009
Go Live: 
December 24, 2009
Current Milestone: 
Nov. 30, 2009: Review of data by principals. In progress.
Stage: 
On track
Lead Customer Comments: 
Tamara Deuser (8-18 email): I do agree to the extension, it is important that we are all comfortable with the data before we press on. 
People
Sponsor/Champion: 
R.F. Rick Shangraw, Vice President of Research and Economic Affairs
Project Manager: 
Rick Bolin
Contact for more information: 
Database.God@asu.edu
Associate VP University Technology: 
John Rome
More Info
Source: 
Executive
Priority: 
High
Scope: 

This project will develop the means by which payroll data can be incorporated with research-related reports.

The initial request (from customer Steve Beguin on 6/24/09) was to create two new tables in the SPONSORED data warehouse, with pay detail and cost share detail.
From initial CRM tickets (417908 and 417910):

  • Description: Contain an extraction of all pay detail including job code for entire University.
  • Instructions: Build new table from PS_ASU_PAY_EXP_VW and ASU_EMP_JOB_PROFILE_VW
  • Structure: Refer to attached Excel example [worksheet available from Rick Bolin--partial image here]
  • Points of Contact: Justin Poll, Steve Beguin
  • Description: Same logic will be used that creates the Certified Cost Share view on the Sponsored Effort Reporting Dashboard.
  • Instructions: See Letty Butner for SQL and source tables.
  • Structure: Same fields as the dashboard view.
  • Points of Contact: Justin Poll, Steve Beguin

Additional details of tables/views required for addition of imputed data into OVPREA reporting (from Steve Beguin email attachment 7/14/09):

 

IDC_RATE_SCHEDULE Source Description
FISCAL_YEAR Manual Entry Data will be manually entered once a year depending on Federally
Negotiated Rate. This rate will be used to calculate Unreimbursed IDC.
EFFECTIVE_DATE Manual Entry
RATE_STATUS Manual Entry
RATE_TYPE Manual Entry
IDC_ACTIVITY_TYPE Manual Entry
RATE Manual Entry
ON_OFF_CAMPUS Manual Entry
     
RA_OTHER_GRA_EXPENSES Source Description
FISCAL_YEAR Fiscal Year Employee Pay Detail views accessible through MyReports as
detailed on "Emp Pay" tab of this workbook. Will also need to join
to the ACCOUNT table for "Investigator" data. This view will be an
aggregate of only Non-Research GRA expenses (Job codes = 989002 or 989557)
FISCAL_MONTH Fiscal Month
SPONSORED_DEPARTMENT_CODE Account.HR_DEPARTMENT_CODE
SPONSOR_CODE SPONSOR_CODE = 999999 (ASU)
INVESTIGATOR_PERSON_ID Account.MANAGER_AFFILIATE_ID
INVESTIGATOR_PERSON_FULL_NAME Account.MANAGER_FULL_NAME
PI_AFFILIATE_ID Account.MANAGER_AFFILIATE_ID
PI_FULL_NAME Account.MANAGER_FULL_NAME
AWARD_NUMBER Account Number
AWARD_TITLE Account.Desciption
RECOGNITION_ALLOCATION_PERCENT 100%
RECOGNITION_$ Amount
     
RA_UNREIMBURSED_IDC Source Description
FISCAL_YEAR   This is calculated based on the current IDC rate for on campus
organized research. It will be calculated on Sponsored, GRA, TRIF and State
and Local Accounts expenditures. Column values will depend on which source is
being used.
FISCAL_MONTH  
SPONSORED_DEPARTMENT_CODE  
SPONSOR_CODE  
INVESTIGATOR_PERSON_ID  
INVESTIGATOR_PERSON_FULL_NAME  
PI_AFFILIATE_ID  
PI_FULL_NAME  
AWARD_NUMBER  
AWARD_TITLE  
RECOGNITION_ALLOCATION_PERCENT  
RECOGNITION_$  
     
RA_INSTITUTIONAL_COST_SHARE Source Description
FISCAL_YEAR Extrapulated from Begin Date? Query will be identical to the one which populates the Certified
Cost Share Screen on the Effort Reporting dashboard. SQL available directly
from dashboard. When brought over to view an ERE rate of 15% will be applied
to each line's total.
FISCAL_MONTH Extrapulated from Begin Date?
SPONSORED_DEPARTMENT_CODE Employee Dept Code or Project Dept
Code?
SPONSOR_CODE 999999
INVESTIGATOR_PERSON_ID ID
INVESTIGATOR_PERSON_FULL_NAME Employee
PI_AFFILIATE_ID Award.PI_AFFILIATE_ID?
PI_FULL_NAME Award.PI_FULL_NAME?
AWARD_NUMBER Project
AWARD_TITLE Description
RECOGNITION_ALLOCATION_PERCENT Cost Share %?
RECOGNITION_$ Cost Share Amount?
     
RA_STATE_AND_LOCAL_RESEARCH Source Description
FISCAL_YEAR Expenditure.FISCAL_YEAR All account expenses on State and Local accounts coded with an
activity type of "Research"
FISCAL_MONTH Expenditure.FISCAL_MONTH
SPONSORED_DEPARTMENT_CODE Account.HR_DEPT_CODE
SPONSOR_CODE SPONSOR_CODE = 999999 (ASU)
INVESTIGATOR_PERSON_ID Account.MANAGER_AFFILIATE_ID
INVESTIGATOR_PERSON_FULL_NAME Account.MANAGER_FULL_NAME
PI_AFFILIATE_ID Account.MANAGER_AFFILIATE_ID
PI_FULL_NAME Account.MANAGER_FULL_NAME
AWARD_NUMBER Account Number
AWARD_TITLE Account.Desciption
RECOGNITION_ALLOCATION_PERCENT 100%
RECOGNITION_$ Expenditure.Expenditure_$

 

The data warehouse for SPONSORED activity reporting is housed on the ENTERPRISE1 Sybase data server. The tables presently used for research reporting include data from the ASU Foundation (input manually), TRIF income and expenditures (extracted from the FINANCIAL data warehouse), as well as sponsored projects (the SPONSORED database). Payroll information resides in the Oracle-based EPM. The Research Dashboard uses a SQL Server data server as its source.

After meeting with the principals for the project, it was determined that the best strategy for successful deployment to the dashboard would be to place the data in the enterprise consolidated SQL Server cluster. Later, these tables will be made available (perhaps with other objects) to assist University staff and officers with preparing reports of research activity.

All Milestones and Schedule: 

July 2, 2009: Initial meeting with George Raudenbush (Justin Poll on phone), Matt Sheppard, Don Chiurazzi Completed

July 17, 2009: Scope definition, documentation of deliverable items, preparation of work effort estimates Completed

July 25, 2009: TRIF revenue and cost data corrected in ENTERPRISE1 SPONSORED database, using revised queries from accounting Completed

July 29, 2009: Generate extraction queries for all relevant data from payroll and other financial systems -- Completed; Tables generated, ETL Drafted, connections to data sources being crafted/approved 

Note: Alternative method of connecting with the data was devised by ePM directors, and some data validation and import strategies could be drafted

August 10: Tables populated in MS2K5DEV1 (development server) from EPM for review Completed and customer advised

Note: the grain of the data for payroll expenditures is ANNUAL, not MONTHLY as required by the Dashboard. Additionally, many values are missing in the new tables (such as department and investigator ID), which will block drill-down and cause incorrect roll-up values. Until resolved, the full deployment to PRODUCTION can not occur.

August 11, 2009: All tables modified to roll up the University Investment costs along with other source. Complete. Work in progress to incorporate feedback and guidance from customer regarding data issues identified above.

From Steve Beguin (8-19 email): Are you able to extrapolate the Fiscal Month from the Pay_End_Date as you do with the Proposal Report Date?

You have the appropriate join between the ASU_AGNCYORG and ACCOUNT. Unmatched values  should not be the case as long as you are joining on Account and Fiscal Year in the Account table. Can you provide a few examples of unmatched AGNCYORGS?

August 22-24, 2009: Source data (EPM) load issues block efforts to develop and test extract procedures for pay data. Granularity issues addressed in consultation with financial services, regarding the use of a crosswalk to determine appropriate fiscal month for transaction. Configured.

August 24--November 30, 2009: Coding for ETL process to development server for Dashboard. Connection to development and production HR data is completed. State & Local Research, TRIF data, and Sponsored data rolled into Unreimbursed IDC table, per above chart and discussion with principals. Package developed for "Other GRA Expenses", using guidelines established in the email threads. Completed. Review by principals in progress.

November 1--December 1, 2009: Discussions and initial queries developed for the extraction of the Cost Share data. Results being reviewed. Fine-tuning and integration with other ETL pending.

December 15, 2009: Research Dashboard procedures (queries) to be modified to report "University Investment Only" on all grids and charts; dashboard drop-down lists modified to include this option. Pending. Functional tests of dashboard begin when complete in DEV.

December 18, 2009: Deployment of functional capability and data objects to production-class servers. Validation and verification testing on QA systems.

December 20, 2009: Changes migrated into PRODUCTION along with all supporting data.

Deliverables: 

TABLES: IDC_RATE_SCHEDULE, RA_OTHER_GRA_EXPENSES, RA_UNREIMBURSED_IDC, RA_INSTITUTIONAL_COST_SHARE (Note: Not included in present scope of work due to incompatible reporting periods for pay data and delayed reporting requirements), RA_STATE_AND_LOCAL_RESEARCH


Queries and Processes: Extraction from source systems into local tables. Computation of summary values in aggregate (fact) tables. Display of filtered values in dashboard. Display of "University Investment Only" values on dashboard.


Deferred slightly Make data in these tables available to reporting staff and officers throughout the University.



Note: INSTITUTIONAL_COST_SHARE table is now to be part of the delivered set (November 4, 2009), per conversation with Steve Beguin.

Risk & Threats: 

The data exists in the EPM, not in the Sybase SPONSORED data warehouse; there are several reasons for not simply copying large amounts of data into the SPONSORED system. New development in Sybase-hosted systems has been discouraged for years. The systems are limited in connections, data flow alternatives, and storage capacity. If we can not identify an alternative data system (or means of providing the data to the customer) the project will be delayed.

Note: The data shall be extracted to, and maintained in, the UTO SQL Server consolidated environment. This places it in an optimal location for extraction and integration with the other tables of source and use data, and makes development on this compressed time schedule possible.

The approval process for application ID and permission to connect with the source HR data could take longer than a few days to complete. Firewalls might also not be opened where necessary for the connection from the host systems to the ETL server. Either of these security-related circumstances could delay the completion of the initial queries because the data would not yet be directly accessible by the server. Final completion date of the project (the go-live date) should not be affected unless these delays are substantial (weeks, instead of days).

AttachmentSize
July 2009 Data Diagram for NSF Imputed Values.pdf14.08 KB
Syndicate content