CREATE FUNCTION [dbo].[fnCalculateDistance] (@Lat1 float, @Long1 float, @Lat2 float, @Long2 float)
-- User-defined function that calculates the direct distance between two geographical coordinates.
RETURNS float
AS
BEGIN
    DECLARE @distance decimal(28, 10)
    -- Convert to radians
    SET @Lat1 = @Lat1 / 57.2958
    SET @Long1 = @Long1 / 57.2958
    SET @Lat2 = @Lat2 / 57.2958
    SET @Long2 = @Long2 / 57.2958
    -- Calculate distance
    SET @distance = (SIN(@Lat1) * SIN(@Lat2)) + (COS(@Lat1) * COS(@Lat2) * COS(@Long2 - @Long1))
    -- Convert to miles
    IF @distance <> 0
    BEGIN
        SET @distance = 3958.75 * ATAN(SQRT(1 - POWER(@distance, 2)) / @distance);
    END
    RETURN @distance
END
GO
The function is a scalar-valued function: it returns a single data value of a predefined type.
It takes latitude and longitude values as inputs, obtained from trip pickup and drop-off locations. It converts the
coordinates from degrees to radians and uses those values to compute the direct distance in miles between the two
locations. The tutorial refers to this as the Haversine formula; the expression in the code is the spherical law of
cosines, which yields the same great-circle distance.
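To sanity-check the arithmetic outside SQL Server, the following Python sketch mirrors fnCalculateDistance step by step and compares it with a textbook haversine implementation. The Python function names and the sample coordinates are invented for illustration and are not part of the tutorial. Results from SQL Server may differ slightly, because the T-SQL version stores the intermediate value in a decimal(28,10) variable.

```python
import math

DEGREES_PER_RADIAN = 57.2958   # rounded constant used by the T-SQL function
EARTH_RADIUS_MILES = 3958.75   # Earth radius used by the T-SQL function


def calculate_distance(lat1, long1, lat2, long2):
    """Mirror of dbo.fnCalculateDistance: degrees in, miles out."""
    lat1, long1, lat2, long2 = (
        v / DEGREES_PER_RADIAN for v in (lat1, long1, lat2, long2))
    # Cosine of the central angle between the two points
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(long2 - long1))
    if cos_angle == 0:
        return 0.0  # like the T-SQL function, skip the conversion in this case
    # Clamp so float rounding cannot make 1 - cos^2 negative (assumption:
    # the T-SQL version avoids this through its decimal(28,10) variable)
    sin_angle = math.sqrt(max(0.0, 1.0 - cos_angle ** 2))
    return EARTH_RADIUS_MILES * math.atan(sin_angle / cos_angle)


def haversine_distance(lat1, long1, lat2, long2):
    """Textbook haversine formula with the same constants, for comparison."""
    lat1, long1, lat2, long2 = (
        v / DEGREES_PER_RADIAN for v in (lat1, long1, lat2, long2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((long2 - long1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical pickup and drop-off points (not taken from nyctaxi_sample)
pickup = (40.75, -73.99)
dropoff = (40.76, -73.98)
print(calculate_distance(*pickup, *dropoff))
print(haversine_distance(*pickup, *dropoff))
```

For short city trips like this one, the two formulas should agree to well within a thousandth of a mile.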
To add the computed value to a table that can be used for training the model, you'll use another function, fnEngineerFeatures.
CREATE FUNCTION [dbo].[fnEngineerFeatures] (
    @passenger_count int = 0,
    @trip_distance float = 0,
    @trip_time_in_secs int = 0,
    @pickup_latitude float = 0,
    @pickup_longitude float = 0,
    @dropoff_latitude float = 0,
    @dropoff_longitude float = 0)
RETURNS TABLE
AS
RETURN
(
    -- Add the SELECT statement with parameter references here
    SELECT
        @passenger_count AS passenger_count,
        @trip_distance AS trip_distance,
        @trip_time_in_secs AS trip_time_in_secs,
        [dbo].[fnCalculateDistance](@pickup_latitude, @pickup_longitude, @dropoff_latitude,
            @dropoff_longitude) AS direct_distance
)
GO
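As a rough illustration of what fnEngineerFeatures produces, the Python sketch below builds the same single row of features from the same seven inputs. It is only an analogy for the table-valued function, not how SQL Server evaluates it; the names engineer_features and calculate_distance are hypothetical, and the distance helper is a compact restatement of fnCalculateDistance.

```python
import math


def calculate_distance(lat1, long1, lat2, long2):
    """Compact mirror of dbo.fnCalculateDistance (degrees in, miles out)."""
    lat1, long1, lat2, long2 = (v / 57.2958 for v in (lat1, long1, lat2, long2))
    c = (math.sin(lat1) * math.sin(lat2)
         + math.cos(lat1) * math.cos(lat2) * math.cos(long2 - long1))
    if c == 0:
        return 0.0
    # max() guards against a tiny negative value caused by float rounding
    return 3958.75 * math.atan(math.sqrt(max(0.0, 1 - c * c)) / c)


def engineer_features(passenger_count=0, trip_distance=0.0, trip_time_in_secs=0,
                      pickup_latitude=0.0, pickup_longitude=0.0,
                      dropoff_latitude=0.0, dropoff_longitude=0.0):
    """Return one feature row, analogous to the one-row table in T-SQL."""
    return {
        "passenger_count": passenger_count,
        "trip_distance": trip_distance,
        "trip_time_in_secs": trip_time_in_secs,
        # The raw coordinates are replaced by a single derived feature
        "direct_distance": calculate_distance(
            pickup_latitude, pickup_longitude,
            dropoff_latitude, dropoff_longitude),
    }


# Hypothetical trip: the meter reported 0 miles, but the endpoints differ
row = engineer_features(1, 0.0, 600, 40.75, -73.99, 40.76, -73.98)
print(row)
```

Note that the four coordinate inputs do not appear in the output; only the derived direct_distance column does, alongside the three pass-through columns.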
2. To verify that this function works, you can use it to calculate the geographical distance for those trips where the metered
distance was 0 but the pickup and drop-off locations were different.
SELECT tipped, fare_amount, passenger_count, (trip_time_in_secs/60) as TripMinutes,
    trip_distance, pickup_datetime, dropoff_datetime,
    dbo.fnCalculateDistance(pickup_latitude, pickup_longitude, dropoff_latitude,
        dropoff_longitude) AS direct_distance
FROM nyctaxi_sample
WHERE pickup_longitude != dropoff_longitude and pickup_latitude != dropoff_latitude
    and trip_distance = 0
ORDER BY trip_time_in_secs DESC
As you can see, the distance reported by the meter doesn't always correspond to geographical distance. This is why
feature engineering is so important.
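The filter logic of the verification query can be tried without SQL Server. The sketch below is a hedged stand-in: it uses Python's built-in sqlite3 module with an in-memory table, registers a Python version of the distance function under the same name, and runs a trimmed-down form of the query. The three rows are invented for illustration and do not come from nyctaxi_sample, and SQLite is not T-SQL, so treat this as a demonstration of the WHERE clause only.

```python
import math
import sqlite3


def calculate_distance(lat1, long1, lat2, long2):
    """Compact mirror of dbo.fnCalculateDistance (degrees in, miles out)."""
    lat1, long1, lat2, long2 = (v / 57.2958 for v in (lat1, long1, lat2, long2))
    c = (math.sin(lat1) * math.sin(lat2)
         + math.cos(lat1) * math.cos(lat2) * math.cos(long2 - long1))
    if c == 0:
        return 0.0
    return 3958.75 * math.atan(math.sqrt(max(0.0, 1 - c * c)) / c)


conn = sqlite3.connect(":memory:")
# Expose the Python function to SQL as a four-argument scalar function
conn.create_function("fnCalculateDistance", 4, calculate_distance)
conn.execute("""
    CREATE TABLE nyctaxi_sample (
        trip_distance REAL, trip_time_in_secs INTEGER,
        pickup_latitude REAL, pickup_longitude REAL,
        dropoff_latitude REAL, dropoff_longitude REAL)
""")
rows = [  # made-up rows, one per case the WHERE clause distinguishes
    (0.0, 600, 40.75, -73.99, 40.76, -73.98),  # meter 0, endpoints differ
    (0.0, 120, 40.75, -73.99, 40.75, -73.99),  # meter 0, same endpoint
    (1.2, 300, 40.75, -73.99, 40.76, -73.98),  # meter reported a distance
]
conn.executemany("INSERT INTO nyctaxi_sample VALUES (?, ?, ?, ?, ?, ?)", rows)

result = conn.execute("""
    SELECT trip_time_in_secs / 60 AS TripMinutes, trip_distance,
           fnCalculateDistance(pickup_latitude, pickup_longitude,
                               dropoff_latitude, dropoff_longitude) AS direct_distance
    FROM nyctaxi_sample
    WHERE pickup_longitude != dropoff_longitude
      AND pickup_latitude != dropoff_latitude
      AND trip_distance = 0
    ORDER BY trip_time_in_secs DESC
""").fetchall()
print(result)
```

Only the first row passes the filter: its metered distance is 0, yet the computed direct distance is clearly nonzero. Because the conditions are joined with AND, a trip is selected only when both its longitude and its latitude changed.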
In the next step, you'll learn how to use these data features to train a machine learning model using R.
Next Step
Step 5: Train and Save a Model using T-SQL
Previous Step
Step 3: Explore and Visualize the Data
See Also
In-Database Advanced Analytics for SQL Developers Tutorial
SQL Server R Services Tutorials
© 2016 Microsoft